Configuration file
Instead of passing every option on the command line, you can store options in a configuration file and load them with --config-file. This is handy for shared team settings, repeatable runs, and keeping long commands readable.
Loading a configuration file
Use --config-file=<path> to load options from a specific file:
./siteone-crawler --config-file=./team-crawler.conf --url=https://mydomain.tld/
If --config-file is not provided, the crawler auto-discovers a configuration file in this order and uses the first one that exists:
- ~/.siteone-crawler.conf (the current user's home directory)
- /etc/siteone-crawler.conf (system-wide)
If neither exists, the crawler simply runs with the options you passed on the command line.
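The lookup order described above can be sketched in a few lines of Python. This is only an illustration and not the crawler's actual code: the function name resolve_config_path and the DEFAULT_CANDIDATES list are hypothetical. The sketch also assumes that an explicit --config-file value is used as given, because the text does not say what happens when that file is missing.

```python
from pathlib import Path
from typing import List, Optional

# Auto-discovery candidates in the documented order (home directory first).
DEFAULT_CANDIDATES: List[Path] = [
    Path.home() / ".siteone-crawler.conf",
    Path("/etc/siteone-crawler.conf"),
]


def resolve_config_path(explicit: Optional[str], candidates: List[Path]) -> Optional[Path]:
    """Return the configuration file to load, or None if there is none.

    Hypothetical helper: mirrors the documented order, not the real implementation.
    """
    if explicit is not None:
        # Assumption: an explicit --config-file value skips auto-discovery.
        return Path(explicit)
    for candidate in candidates:
        # The first candidate that exists wins.
        if candidate.is_file():
            return candidate
    # No file found: only command-line options apply.
    return None
```

Calling resolve_config_path(None, DEFAULT_CANDIDATES) would return the home-directory file if it exists, otherwise the system-wide one, otherwise None.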
File format
The format is intentionally simple, with one option per line in the same --option=value form you would use on the command line:
- One option per line (e.g. --workers=5 or a boolean flag like --no-cache).
- Blank lines are ignored.
- Lines starting with # are treated as comments and ignored.
- Leading and trailing whitespace on each line is trimmed.
- A leading UTF-8 BOM (common on files saved on Windows) is stripped automatically.
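To make the rules above concrete, here is a minimal Python sketch of a reader for this format. It is an illustration under stated assumptions, not the crawler's parser: the function name parse_config_bytes is hypothetical, the file is assumed to be UTF-8, and whitespace is assumed to be trimmed before the comment check (the text does not specify the order).

```python
import codecs
from typing import List


def parse_config_bytes(raw: bytes) -> List[str]:
    """Return the option lines found in a configuration file's raw bytes.

    Hypothetical helper that follows the documented rules only.
    """
    # Strip a leading UTF-8 BOM if one is present.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]

    options: List[str] = []
    for line in raw.decode("utf-8").splitlines():
        # Trim leading and trailing whitespace.
        line = line.strip()
        # Skip blank lines and comment lines.
        # Assumption: the comment check runs after trimming.
        if not line or line.startswith("#"):
            continue
        options.append(line)
    return options
```

For a file containing a BOM, a comment, a blank line, and the lines --workers=5 and --no-cache, this sketch returns just the two option lines in their original order.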
Example configuration file:
# Shared crawler configuration for our team
# Concurrency and rate limiting
--workers=5
--max-reqs-per-sec=20
# Output
--output=json
# Reuse this comment line - it is ignored
--max-depth=3
Precedence: CLI overrides the file
Options from the configuration file are merged before the arguments you pass on the command line, so CLI arguments always win. This lets you keep stable defaults in the file and override individual options per run.
For example, with the team-crawler.conf shown above:
./siteone-crawler --config-file=./team-crawler.conf --url=https://mydomain.tld/ --workers=2
The crawl uses --workers=2 (from the command line), while --max-reqs-per-sec=20, --output=json, and --max-depth=3 come from the file.
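One way to picture this precedence is as a single argument list in which the file options come first and a later occurrence of an option replaces an earlier one. The Python sketch below models that idea only; merge_options is a hypothetical name, the last-one-wins rule is an assumption drawn from the description above, and options that the real crawler might allow to repeat are not handled.

```python
from typing import Dict, List, Union


def merge_options(file_options: List[str], cli_args: List[str]) -> Dict[str, Union[str, bool]]:
    """Merge file options and CLI arguments, letting CLI arguments win.

    Hypothetical helper: an assumed last-one-wins model, not the real parser.
    """
    merged: Dict[str, Union[str, bool]] = {}
    # File options go first so that CLI arguments overwrite them.
    for arg in list(file_options) + list(cli_args):
        name, sep, value = arg.partition("=")
        # A flag without "=" (e.g. --no-cache) is stored as True.
        merged[name] = value if sep else True
    return merged


file_options = ["--workers=5", "--max-reqs-per-sec=20", "--output=json", "--max-depth=3"]
cli_args = ["--url=https://mydomain.tld/", "--workers=2"]
effective = merge_options(file_options, cli_args)
```

With the inputs shown, effective["--workers"] ends up as "2" because the command-line value is processed last, while the other file values are kept unchanged.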
Example: shared team config
A team can keep a single team-crawler.conf in version control:
# team-crawler.conf
--workers=5
--max-reqs-per-sec=20
--max-depth=3
--output=json
--output-html-report=tmp/%domain%.report.%datetime%.html
--hide-progress-bar
Run it against any site:
./siteone-crawler --config-file=./team-crawler.conf --url=https://mydomain.tld/
This is equivalent to the following full command line:
./siteone-crawler \
  --url=https://mydomain.tld/ \
  --workers=5 \
  --max-reqs-per-sec=20 \
  --max-depth=3 \
  --output=json \
  --output-html-report=tmp/%domain%.report.%datetime%.html \
  --hide-progress-bar
See the full command-line options reference for every option you can place in a configuration file.