
Online HTML report (upload)

A very common use case is sharing an HTML report between several people. Typically, a website owner or QA engineer analyzes a website and wants to share the report with the developers.

We have therefore created a service that makes it very easy to upload an HTML report to our infrastructure and provides a secure unique URL for the report.

Features of the online HTML report

  • a unique and unguessable URL is generated for each report (the number of possible combinations has more than 20 digits);
  • you can set the retention, i.e. how long the HTML report is kept online: from 1 hour, through days and months, to years or infinity*. If you set the retention to 1 day and open the report URL after 2 days, you will see a message that the report has expired;
  • you can optionally set a password that must be entered (together with the username crawler) to display the report. The password is stored on the server only as a non-reversible hash, using Bcrypt or Argon2id;
  • as part of the upload, potentially sensitive information is removed from the HTML report (e.g. hostname or absolute paths on the PC/server where the crawler is running are replaced by %path%);
  • the service contains several anti-abuse features, described in the following chapters.
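The first two features can be illustrated with a minimal Python sketch. It is not the service's actual implementation, which is not described here. The function names, the 16-byte token length, and the use of None for infinite retention are assumptions made for illustration.

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional


def new_report_token(num_bytes: int = 16) -> str:
    """Return a URL-safe random token (hypothetical scheme).

    16 random bytes give 2**128 possible values, a number with 39 decimal
    digits, so well above the "more than 20 digits" mentioned above.
    """
    return secrets.token_urlsafe(num_bytes)


def is_expired(uploaded_at: datetime,
               retention: Optional[timedelta],
               now: datetime) -> bool:
    """Return True once the retention period has passed.

    Assumption: retention=None stands for "infinity" and never expires.
    """
    if retention is None:
        return False
    return now >= uploaded_at + retention


uploaded = datetime(2024, 1, 1, tzinfo=timezone.utc)
token = new_report_token()
# A report kept for 1 day and opened 2 days later is expired.
expired = is_expired(uploaded, timedelta(days=1), uploaded + timedelta(days=2))
```

The token comes from the secrets module rather than random because the URL itself acts as the access credential when no password is set.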

* This is a free service and we cannot predict its future development and usage. If expanding the storage became prohibitively expensive for us, we would introduce some intelligent way of deleting reports with infinite retention that are very large and have not been accessed for a long time.

Security mechanisms

To minimize misuse of this tool for purposes other than intended, we have implemented several security mechanisms:

  • from one IP address you cannot upload more than 10 reports in 5 minutes, or more than 100 reports in total per day;
  • you cannot upload more than 250 MB of reports in 1 day from one IP address;
  • firewalls are set to prevent brute-force attempts and other attacks (various combinations of rate-limiting and connection-limiting).
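The per-IP limits above could be enforced with a sliding-window counter along the following lines. This is only a sketch under stated assumptions: the class name is hypothetical, state is kept in memory, "1 day" is treated as a rolling 24-hour window, and 1 MB is taken as 1024 × 1024 bytes. How the real service implements these limits is not documented here.

```python
from collections import defaultdict, deque

FIVE_MINUTES = 5 * 60
ONE_DAY = 24 * 60 * 60
MAX_UPLOADS_PER_5_MIN = 10
MAX_UPLOADS_PER_DAY = 100
MAX_BYTES_PER_DAY = 250 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


class UploadLimiter:
    """Hypothetical in-memory limiter for the three per-IP limits."""

    def __init__(self) -> None:
        # Per IP: (timestamp, size_bytes) of uploads accepted in the last day.
        self._history = defaultdict(deque)

    def allow(self, ip: str, size_bytes: int, now: float) -> bool:
        history = self._history[ip]
        # Forget uploads that fell out of the rolling 24-hour window.
        while history and now - history[0][0] >= ONE_DAY:
            history.popleft()

        recent = sum(1 for ts, _ in history if now - ts < FIVE_MINUTES)
        bytes_today = sum(size for _, size in history)

        if recent >= MAX_UPLOADS_PER_5_MIN:
            return False
        if len(history) >= MAX_UPLOADS_PER_DAY:
            return False
        if bytes_today + size_bytes > MAX_BYTES_PER_DAY:
            return False

        history.append((now, size_bytes))
        return True


limiter = UploadLimiter()
# Eleven uploads within a few seconds: the first ten pass, the eleventh is refused.
burst = [limiter.allow("192.0.2.1", 1000, now=float(i)) for i in range(11)]
```

Only accepted uploads are recorded, so a refused request does not count against the client's later quota.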

Because this is an open-source project, it is not possible to implement fully reliable protection, such as a form of “signing” of the uploaded HTML. There are, however, protections that verify, in ways we do not specify here, that the uploaded HTML comes from the SiteOne Crawler. They are not 100% reliable. For that reason, the mechanisms listed below are also in place to minimize the risk of misuse through modification of the uploaded HTML:

  • when the HTML report is displayed, all available security HTTP headers are set to meaningful values, which helps minimize misuse;
  • the most important of these is the CSP (Content-Security-Policy) header. The policy is very strict: the only external requests the browser can make when displaying the report are image requests to the analyzed domain and its subdomains. Images can also be loaded from some commonly used CDNs or image-editing services, if their use is detected on the given website;
  • even if an attacker managed to modify the HTML of the report before uploading it, they should not be able to use, for example, XSS to send information to their own endpoint. The CSP should prevent this;
  • crucially, the CSP allows only <script> tags whose sha256 hashes match those of the given crawler version; browsers will refuse to execute any other JavaScript.
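To make the hash-based script allow-listing concrete, here is a minimal sketch of how such a Content-Security-Policy value could be assembled. The function names and the exact set of directives are assumptions, not the policy the service actually sends. A real policy would need further directives (for styles, for example), which are left out here.

```python
import base64
import hashlib
from typing import Iterable


def script_hash_source(script_text: str) -> str:
    """Return a CSP source expression such as 'sha256-<base64 digest>'.

    The hash must be computed over the exact text between <script> and
    </script>; any modification of the script changes the hash.
    """
    digest = hashlib.sha256(script_text.encode("utf-8")).digest()
    return "'sha256-" + base64.b64encode(digest).decode("ascii") + "'"


def build_csp(allowed_scripts: Iterable[str], analyzed_domain: str) -> str:
    """Assemble a strict policy (illustrative directive set only)."""
    script_sources = " ".join(script_hash_source(s) for s in allowed_scripts)
    return "; ".join([
        # Deny everything that is not explicitly allowed below.
        "default-src 'none'",
        # Only inline scripts with a known hash may run.
        "script-src " + (script_sources or "'none'"),
        # Images may come from the analyzed domain and its subdomains.
        f"img-src {analyzed_domain} *.{analyzed_domain}",
    ])


known_script = "console.log('report loaded');"
csp = build_csp([known_script], "example.com")
```

With such a policy, a script injected into a tampered report has a different hash than any listed one, so the browser blocks it. The default-src 'none' fallback also blocks requests the script might have used to send data elsewhere.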

Note: If you are a security professional, we would appreciate your help in improving security.

How to set up your own upload service

  1. If you use the command-line version, just add --upload --upload-to=https://my.domain.tld/my-service to the command. If you use the desktop application, enter your upload service URL in the Upload to URL field on the Full settings tab.
  2. Your service will receive a POST request after crawling is complete. The htmlBody field contains the gzipped HTML report, the retention field contains the requested retention, and the optional password field contains the password required to view the report. Your service must respond with JSON containing a url property with the absolute URL at which the report can be displayed.
  3. If your upload service is implemented correctly, the integration in the CLI and desktop application will also work, including the button to display the online report.
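The core of such a service could look like the sketch below. It deliberately starts from already-parsed field values, because the encoding of the POST body is not specified here. The function name, the file layout, the token scheme, and the base URL are all hypothetical. The retention value is stored exactly as received, since its format is not described in this section.

```python
import gzip
import json
import secrets
from pathlib import Path
from typing import Optional


def handle_upload(html_body_gz: bytes,
                  retention: str,
                  password: Optional[str],
                  storage_dir: Path,
                  base_url: str) -> str:
    """Store one uploaded report and return the JSON response body.

    html_body_gz -- raw value of the htmlBody field (gzipped HTML)
    retention    -- raw value of the retention field, stored as received
    password     -- value of the optional password field, or None
    """
    html = gzip.decompress(html_body_gz)

    # Hypothetical naming scheme: one random token per report.
    token = secrets.token_urlsafe(16)
    storage_dir.mkdir(parents=True, exist_ok=True)
    (storage_dir / f"{token}.html").write_bytes(html)

    # Never store the plain password. A real service would keep a Bcrypt or
    # Argon2id hash (third-party library); this sketch only records the flag.
    meta = {"retention": retention, "password_protected": bool(password)}
    (storage_dir / f"{token}.json").write_text(json.dumps(meta))

    # The crawler expects JSON with an absolute URL in the "url" property.
    return json.dumps({"url": f"{base_url.rstrip('/')}/{token}"})
```

A web framework handler would read the three fields from the request, call this function, and send the returned string with a JSON content type.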

Command-line options

See Upload options for all settings.

Further development ideas

We see the ability to compare two or more reports of the same website over time as a useful potential feature in this area.

The HTML report would also contain the main metrics and scores from the individual areas of analysis as hidden JSON. Comparing two or more historical reports would then make it possible to visualize long-term trends and see how SEO, security, accessibility, performance, best-practices, and other parameters improve or deteriorate over time.

For this, website owners or developers would need to run a regular task using cron that analyzes their website once a day and uploads the report online, simply by adding --upload to the command. That is a few minutes of work for a highly useful result.