FAQ

Who is this tool for?

This tool is intended for any website owner, developer, QA engineer, administrator/DevOps engineer, or consultant who cares about the quality of their website.

You can read some typical use-cases.

What is the difference between this tool and other tools?

It’s free.

It is made with love for IT and web development.

It combines the functionality and added value of several different tools into one and adds unique features, such as the ability to convert an entire website to markdown. It pairs a command-line interface (CLI) for advanced users with a graphical interface for everyone else.

Can I share the report online?

Yes. You can use the online HTML report feature, which uploads the HTML report to our infrastructure and gives you a unique URL for viewing it. It’s free.

Is this tool difficult to use?

No.

System requirements are very low, and for most users the desktop application will do just fine.

Advanced users can install the command-line tool and run the command:
./crawler --url=https://crawler.siteone.io/

Is this tool safe to use?

Yes.

The source code of the tool is publicly available and auditable. The desktop application is also open-source.

The tool does not send any data to the Internet unless you choose to use the optional --upload feature. However, as part of its normal operation, it visits the requested web page to the extent you configure, downloads its content locally and, if you request it through the parameters, also downloads the content of linked pages.

It also looks up information such as the DNS settings of the website and analyzes which SSL/TLS protocols the server supports for encrypted communication.

As for load: with the default settings, crawling is very considerate and should not cause a problematic load. By default, the crawler makes no more than 3 requests at the same time and no more than 10 requests per second. Even the most basic web hosting can easily handle such a load.
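To make the two default limits concrete, here is a minimal Python sketch of how a crawler could enforce them. It is not the tool's actual implementation: crawl_politely, fetch, and the constant names are hypothetical, and only the values 3 and 10 come from the text above.

```python
import asyncio
import time

MAX_CONCURRENT = 3    # default from the FAQ: at most 3 requests in flight
MAX_PER_SECOND = 10   # default from the FAQ: at most 10 requests per second


async def crawl_politely(urls, fetch):
    """Call the (hypothetical) coroutine fetch(url) for every URL under both limits."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    pace_lock = asyncio.Lock()
    min_interval = 1.0 / MAX_PER_SECOND
    next_slot = time.monotonic()  # earliest moment the next request may start

    async def worker(url):
        nonlocal next_slot
        async with semaphore:          # caps the number of concurrent requests
            async with pace_lock:      # spaces request starts about 0.1 s apart
                delay = next_slot - time.monotonic()
                if delay > 0:
                    await asyncio.sleep(delay)
                next_slot = time.monotonic() + min_interval
            return await fetch(url)

    # Results come back in the same order as the input URLs.
    return await asyncio.gather(*(worker(url) for url in urls))
```

It would be driven with asyncio.run(crawl_politely(urls, fetch)). Under the rate cap alone, 1,000 URLs need at least about 100 seconds (1,000 / 10), no matter how fast the server responds.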

The crawler also reads and respects the /robots.txt file and does not crawl URLs that are prohibited from crawling.

How can I prevent SiteOne Crawler from crawling my website?

If you want to deny crawling of the whole website, just add these 2 lines to your robots.txt:

User-agent: SiteOne-Crawler
Disallow: /

In addition, SiteOne Crawler respects all Disallow: rules defined for User-agent: *.
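If you want to check your robots.txt rules before relying on them, the behavior described above can be approximated with Python's standard urllib.robotparser. This is a sketch under assumptions, not the crawler's own parser: may_crawl and the example paths are hypothetical, and the "also respect User-agent: *" behavior is modeled by checking both groups explicitly.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_txt: str, url: str, agent: str = "SiteOne-Crawler") -> bool:
    """Return True only if neither the agent's own group nor the '*' group disallows url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Python's parser applies only the first matching group, so the '*' group
    # is checked separately to mirror the behavior the FAQ describes.
    return parser.can_fetch(agent, url) and parser.can_fetch("*", url)


# The two lines from the FAQ: deny the whole website to SiteOne Crawler.
DENY_ALL = """\
User-agent: SiteOne-Crawler
Disallow: /
"""

# Hypothetical mixed rules: one path for SiteOne Crawler, one for everyone.
MIXED = """\
User-agent: SiteOne-Crawler
Disallow: /internal/

User-agent: *
Disallow: /private/
"""

print(may_crawl(DENY_ALL, "https://example.com/page"))      # blocked everywhere
print(may_crawl(MIXED, "https://example.com/private/a"))    # blocked via the '*' group
print(may_crawl(MIXED, "https://example.com/blog"))         # not disallowed by either group
```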

What are the key features of this tool?

For the best descriptive summary, we recommend reading the Key Features section.

What are the known limitations of this tool?

SiteOne Crawler works great on Linux and macOS. On Windows, due to the use of Cygwin, the tool may fail to process an entire website or may not support all features. For maximum performance and stability, we recommend using WSL (Windows Subsystem for Linux).

SiteOne Crawler does not interpret JavaScript, so pages built solely as SPAs (Single-Page Applications) without functional SSR (Server-Side Rendering) may not work properly. In the future, we are considering adding the option for advanced users to interpret JavaScript through Chromium/Puppeteer running in Docker.
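The JavaScript limitation is easier to see with an example. The Python sketch below is not the crawler's parser; LinkCollector, extract_links, and both HTML snippets are made up for illustration. It shows what any client that does not execute JavaScript finds in a server-rendered page versus an SPA shell.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags found in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html: str) -> list:
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Server-rendered page: the links are present in the HTML itself.
SSR_PAGE = '<html><body><nav><a href="/docs">Docs</a> <a href="/faq">FAQ</a></nav></body></html>'

# SPA shell without SSR: the content only appears after /app.js runs in a browser.
SPA_SHELL = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'

print(extract_links(SSR_PAGE))   # links a non-JavaScript client can follow
print(extract_links(SPA_SHELL))  # nothing to follow without running the script
```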

What are the future plans for this tool?

I describe thoughts and considerations for further development of the tool in the section Ideas and Roadmap.

How can I contribute to this tool?

You can find more information about contributing in the section Contribution and Development.

How can I report a bug or request a new feature?

You can report bugs or request new features in the Issues section of our GitHub repository. You can also use the other communication channels described in the section Contact and Community.

How can I contact the author?

You can write to me on X (Twitter) or e-mail me at jan.reges@siteone.cz. Other communication channels are mentioned in the section Contact and Community.