FAQ
Who is this tool for?
This tool is intended for any website owner, developer, QA engineer, administrator/DevOps engineer, or consultant who cares about the quality of their website.
You can read some typical use-cases.
What is the difference between this tool and other tools?
It's free.
It is made with love for IT and web development.
It tries to combine the functionality and added value of several different tools into one, and also brings unique functionality, such as the ability to convert an entire website to markdown. It combines a CLI (command-line approach) for advanced users with a graphical interface for ordinary users.
Can I easily share analysis results with colleagues?
Yes. You can use the online HTML report feature, which will upload the HTML report to our infrastructure and provide you with a unique URL to view the report. It's free.
Is this tool difficult to use?
No.
System requirements are very low, and for most users the desktop application will do just fine.
Advanced users can install the command-line tool and run the command:
./crawler --url=https://crawler.siteone.io/
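For scripted runs, the command above could be wrapped in a few lines of Python. This is only a sketch under assumptions: the helper names `build_command` and `run_crawl` are hypothetical, the binary is assumed to sit at `./crawler` as in the command above, and `--upload` (mentioned elsewhere in this FAQ) is assumed to be a bare switch that takes no value.

```python
import subprocess

# Path taken from the example command above; adjust it to your installation.
CRAWLER_BINARY = "./crawler"


def build_command(url, upload=False):
    """Build the argument list for one crawl (hypothetical helper)."""
    command = [CRAWLER_BINARY, f"--url={url}"]
    if upload:
        # Assumption: --upload is a switch without a value.
        command.append("--upload")
    return command


def run_crawl(url, upload=False):
    """Run the crawler and return its exit code (hypothetical helper)."""
    completed = subprocess.run(build_command(url, upload), check=False)
    return completed.returncode
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when the URL contains characters such as `&` or `?`.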
Is this tool safe to use?
Yes.
The source code of the tool is publicly available and auditable. The desktop application is also open-source.
The tool does not send any data to the Internet unless you choose to use the --upload feature. However, as part of its normal operation, it visits the requested web page, downloads its content locally and, if you request it through the parameters, also downloads the content of associated pages.
It also looks up, for example, the DNS settings of the site and analyzes the SSL/TLS protocols it supports for encrypted communication.
In terms of load, the default settings keep crawling very considerate, so it should not cause problems for the server. By default, the crawler makes no more than 3 concurrent requests and no more than 10 requests per second. Even the most basic web hosting can easily handle such a load.
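To make the two default limits concrete, here is a minimal sketch of how a crawler could cap both concurrency and request rate. It is not SiteOne Crawler's actual implementation: the `Throttle` class and the `demo` helper are hypothetical, and the fake requests only sleep instead of touching the network.

```python
import asyncio
import time


class Throttle:
    """Caps in-flight requests and the rate at which new ones start."""

    def __init__(self, max_concurrency=3, max_per_second=10):
        self._slots = asyncio.Semaphore(max_concurrency)  # concurrency cap
        self._gate = asyncio.Lock()                       # serializes starts
        self._interval = 1.0 / max_per_second             # minimum start spacing
        self._next_start = 0.0

    async def run(self, request):
        async with self._slots:
            async with self._gate:
                # Wait until the minimum spacing since the last start has passed.
                delay = self._next_start - time.monotonic()
                if delay > 0:
                    await asyncio.sleep(delay)
                self._next_start = time.monotonic() + self._interval
            return await request()


async def demo(total=5, duration=0.35):
    """Run fake requests and report their start times and peak concurrency."""
    throttle = Throttle()
    starts = []
    in_flight = 0
    peak = 0

    async def fake_request():
        nonlocal in_flight, peak
        starts.append(time.monotonic())
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(duration)  # stands in for network time
        in_flight -= 1

    await asyncio.gather(*(throttle.run(fake_request) for _ in range(total)))
    return starts, peak
```

Calling `asyncio.run(demo())` returns the recorded start times and the peak number of simultaneous requests, which should never exceed 3 with these settings.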
The crawler also reads and respects the /robots.txt file and does not crawl URLs that are prohibited from crawling.
How can I prevent SiteOne Crawler from crawling my website?
If you want to deny crawling of the whole website, just add these 2 lines to your robots.txt:
User-agent: SiteOne-Crawler
Disallow: /
In addition, SiteOne Crawler also respects all Disallow: rules defined for User-agent: *.
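If you want to check the effect of such rules before deploying them, Python's standard `urllib.robotparser` can evaluate a robots.txt offline. The sketch below uses it as a stand-in; it is not the crawler's own parser, and its matching details may differ. The helper `is_allowed`, the `example.com` URLs, and the `/admin/` path are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# The two lines from the answer above: block SiteOne Crawler everywhere.
BLOCK_SITEONE = """\
User-agent: SiteOne-Crawler
Disallow: /
"""

# A generic rule for all user agents (the path is an illustrative assumption).
GENERIC_RULES = """\
User-agent: *
Disallow: /admin/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(BLOCK_SITEONE, "SiteOne-Crawler", "https://example.com/any/page"))
    print(is_allowed(GENERIC_RULES, "SiteOne-Crawler", "https://example.com/admin/users"))
    print(is_allowed(GENERIC_RULES, "SiteOne-Crawler", "https://example.com/blog/"))
```

The first rule set blocks only the named agent, while the second applies to every agent, including SiteOne Crawler.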
What are the key features of this tool?
For the best descriptive summary, we recommend reading the Key Features section.
What are the known limitations of this tool?
SiteOne Crawler works great on Linux and macOS. On Windows, because the tool relies on Cygwin, it may be unable to process the entire website or provide all functionality. For maximum performance and stability, we recommend using WSL (Windows Subsystem for Linux).
SiteOne Crawler does not interpret JavaScript, so pages built purely as an SPA (Single-Page Application) without functional SSR (Server-Side Rendering) may not be crawled properly. In the future, we are considering adding an option for advanced users to interpret JavaScript through Chromium/Puppeteer running in Docker.
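The JavaScript limitation can be illustrated with a small sketch. A crawler that only reads the HTML it receives finds links in a server-rendered page, but finds none in an SPA shell whose content is built later by JavaScript. The `LinkCollector` class, the `extract_links` helper, and both HTML snippets are hypothetical examples, not part of the tool.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags in static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets visible without running JavaScript."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Server-rendered page: the links are already present in the markup.
SSR_PAGE = (
    '<html><body><nav><a href="/pricing">Pricing</a>'
    '<a href="/docs">Docs</a></nav></body></html>'
)

# SPA shell: the markup is empty until /app.js runs in a browser.
SPA_SHELL = (
    '<html><body><div id="root"></div>'
    '<script src="/app.js"></script></body></html>'
)
```

Here `extract_links(SSR_PAGE)` yields the two navigation links, while `extract_links(SPA_SHELL)` yields an empty list, so a crawl of such a site would stop at the first page.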
What are the future plans for this tool?
I describe thoughts and considerations for further development of the tool in the section Ideas and Roadmap.
How can I contribute to this tool?
You can find more information about how to contribute in the section Contribution and Development.
How can I report a bug or request a new feature?
You can report bugs or request new features in the Issues section of our GitHub repository. You can also use other communication channels described in the section Contact and Community.
How can I contact the author?
You can write to me on X (Twitter) or e-mail me at jan.reges@siteone.cz. Other communication channels are mentioned in the section Contact and Community.