System Requirements
Hardware Requirements
CPU
The crawler runs comfortably on a single core of any common Intel/AMD CPU from the last 10 years. ARM CPUs are also supported.
Memory (RAM)
Depending on the size of your website, the crawler may require anywhere from hundreds of MB to a few GB of memory.
You can override the default memory limit of 2048M by using the --memory-limit option.
Disk
The crawler stores the following files on disk during operation:
tmp/*.[html|json|txt]
- HTML/JSON/TXT reports are saved directly to this folder, by default with the domain and a timestamp in the file name. These files are usually only a few MB in size.
tmp/http-client-cache/
- cache of the HTTP responses for all crawled files. If you also crawl all assets and your whole website is 1 GB in size, this folder will also take up about 1 GB. You can disable the HTTP cache by using --http-cache-dir=''. You can delete the contents of this folder at any time; however, on later runs the crawler will then have to download all content again.
tmp/result-cache/
- cache files of internal crawler results. This folder is used by default (you can override it with --result-storage-dir) when you use --result-storage=file. The default is --result-storage=memory, which does not use the disk at all. You can delete the contents of this folder at any time.
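The options mentioned in this section can be collected programmatically before launching the crawler. The following is only a minimal sketch: the function name build_crawler_options is hypothetical, the --memory-limit=VALUE form with an equals sign is assumed by analogy with the other options, and the check that result storage is either memory or file covers only the two values named above.

```python
from typing import List, Optional


def build_crawler_options(
    memory_limit: Optional[str] = None,
    disable_http_cache: bool = False,
    result_storage: Optional[str] = None,
    result_storage_dir: Optional[str] = None,
) -> List[str]:
    """Return command-line options for the settings described in this section.

    Hypothetical helper; only the option names come from the documentation.
    """
    options: List[str] = []
    if memory_limit is not None:
        # Assumed syntax: same "--name=value" form as the other options.
        options.append(f"--memory-limit={memory_limit}")
    if disable_http_cache:
        # An empty value disables the HTTP cache. In an argument list
        # (no shell involved) the quotes from --http-cache-dir='' are not needed.
        options.append("--http-cache-dir=")
    if result_storage is not None:
        # Only the two values named in the documentation are accepted here.
        if result_storage not in ("memory", "file"):
            raise ValueError("result_storage must be 'memory' or 'file'")
        options.append(f"--result-storage={result_storage}")
    if result_storage_dir is not None:
        options.append(f"--result-storage-dir={result_storage_dir}")
    return options


if __name__ == "__main__":
    print(build_crawler_options(memory_limit="4096M", result_storage="file"))
```

The returned list can be appended to whatever command you use to start the crawler; the executable name and the target URL option are deliberately left out because they are not covered in this section.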
For optimal performance, we recommend using an SSD, ideally an NVMe drive.
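To check how much disk space the cache folders listed above currently occupy, a small script can add up the file sizes. This is a rough sketch, not part of the crawler: the function names are hypothetical, and it assumes the default tmp/ folder relative to the current working directory.

```python
from pathlib import Path
from typing import Dict


def directory_size_bytes(path: Path) -> int:
    """Sum the sizes of all regular files below path (0 if it does not exist)."""
    if not path.is_dir():
        return 0
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def report_cache_usage(base: Path = Path("tmp")) -> Dict[str, int]:
    """Return the size in bytes of each cache folder under base."""
    # Folder names are taken from the list of files the crawler stores on disk.
    return {
        name: directory_size_bytes(base / name)
        for name in ("http-client-cache", "result-cache")
    }


if __name__ == "__main__":
    for name, size in report_cache_usage().items():
        print(f"tmp/{name}/: {size / (1024 * 1024):.1f} MB")
```

Because both folders can be deleted at any time, a report like this can help decide when clearing them is worthwhile.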
Network/Internet
The crawler needs to be able to access the website you want to crawl (unless you are crawling only a site on your development localhost).
If you want to crawl all assets, the crawler needs to be able to download all assets of the website.
The speed of the crawler also depends on the speed and latency of your internet connection. By default, the crawler supports brotli/gzip compression for the most efficient data transfer.
Software Requirements
The crawler has no non-standard requirements and works on common Linux (x64, arm64), macOS (x64, arm64) and Windows (x64) systems. On Windows it relies on Cygwin, whose stability is not optimal, so we recommend using WSL (Windows Subsystem for Linux) instead.
The easiest option is to use the ready-to-use pre-built releases; more advanced users can use manual installation.