Configuration file

Instead of passing every option on the command line, you can store options in a configuration file and load them with --config-file. This is handy for shared team settings, repeatable runs, and keeping long commands readable.

Use --config-file=<path> to load options from a specific file:

./siteone-crawler --config-file=./team-crawler.conf --url=https://mydomain.tld/

If --config-file is not provided, the crawler auto-discovers a configuration file in this order and uses the first one that exists:

  1. ~/.siteone-crawler.conf (the current user's home directory)
  2. /etc/siteone-crawler.conf (system-wide)

If neither exists, the crawler simply runs with the options you passed on the command line.
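The lookup order above can be sketched in a few lines of Python. This is only an illustration of "first file that exists wins", not the crawler's actual implementation; the helper names discover_config and default_candidates are hypothetical, and treating "exists" as "is a regular file" is an assumption.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def default_candidates() -> List[Path]:
    """Search order taken from the list above (home directory first, then /etc)."""
    return [
        Path.home() / ".siteone-crawler.conf",
        Path("/etc/siteone-crawler.conf"),
    ]


def discover_config(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that exists as a file, or None if none do.

    Hypothetical helper: the real crawler may test existence differently.
    """
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = discover_config(default_candidates())
    if found is None:
        print("no configuration file found; only CLI options apply")
    else:
        print(f"would load options from {found}")
```

Running the script on your own machine shows which of the two locations, if any, would be picked up when --config-file is omitted.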

The format is intentionally simple: one option per line, in the same --option=value form you would use on the command line:

  • One option per line (e.g. --workers=5 or a boolean flag like --no-cache).
  • Blank lines are ignored.
  • Lines starting with # are treated as comments and ignored.
  • Leading and trailing whitespace on each line is trimmed.
  • A leading UTF-8 BOM (common on files saved on Windows) is stripped automatically.
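To make the rules above concrete, here is a minimal Python sketch of a parser that follows them. It is not the crawler's own code; parse_config_text is a hypothetical name, and the sketch assumes whitespace is trimmed before the comment check, so an indented # line also counts as a comment.

```python
from typing import List


def parse_config_text(text: str) -> List[str]:
    """Turn configuration file text into a list of CLI-style options.

    Hypothetical helper that mirrors the documented rules only.
    """
    # Strip a leading UTF-8 BOM, which decodes to U+FEFF.
    if text.startswith("\ufeff"):
        text = text[1:]

    options: List[str] = []
    for line in text.splitlines():
        line = line.strip()  # trim leading and trailing whitespace
        if not line:
            continue  # blank lines are ignored
        if line.startswith("#"):
            continue  # comment lines are ignored
        options.append(line)  # one option per line
    return options


if __name__ == "__main__":
    sample = "\ufeff# team defaults\n\n  --workers=5  \n--no-cache\n"
    print(parse_config_text(sample))  # ['--workers=5', '--no-cache']
```

The result is a plain list of option strings in file order, which is the same shape as the arguments you would otherwise type on the command line.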

Example configuration file:

# Shared crawler configuration for our team
# Concurrency and rate limiting
--workers=5
--max-reqs-per-sec=20
# Output
--output=json
# Reuse this comment line - it is ignored
--max-depth=3

Options from the configuration file are applied first, and the arguments you pass on the command line are applied after them. When the same option appears in both places, the command-line value wins. This lets you keep stable defaults in the file and override individual options per run.

For example, with the team-crawler.conf shown above:

./siteone-crawler --config-file=./team-crawler.conf --url=https://mydomain.tld/ --workers=2

The crawl uses --workers=2 (from the command line), while --max-reqs-per-sec=20, --output=json and --max-depth=3 come from the file.
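The precedence in this example can be modelled as "apply file options first, then CLI options, and let later values overwrite earlier ones". The Python sketch below illustrates that idea under the assumption that every option is single-valued; effective_options is a hypothetical name, and the crawler may treat repeatable options differently.

```python
from typing import Dict, List, Optional


def effective_options(file_opts: List[str], cli_opts: List[str]) -> Dict[str, Optional[str]]:
    """Apply file options, then CLI options, so the CLI value wins on conflict.

    Hypothetical helper: assumes each option carries at most one value.
    """
    result: Dict[str, Optional[str]] = {}
    for opt in file_opts + cli_opts:
        # Split at the first "=" only, so URLs containing "=" stay intact.
        name, sep, value = opt.partition("=")
        result[name] = value if sep else None  # None marks a boolean flag
    return result


# Options from the example configuration file and the example command line.
file_opts = ["--workers=5", "--max-reqs-per-sec=20", "--output=json", "--max-depth=3"]
cli_opts = ["--url=https://mydomain.tld/", "--workers=2"]

merged = effective_options(file_opts, cli_opts)
print(merged["--workers"])    # 2 (from the command line)
print(merged["--max-depth"])  # 3 (from the file)
```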

A team can keep a single team-crawler.conf in version control:

# team-crawler.conf
--workers=5
--max-reqs-per-sec=20
--max-depth=3
--output=json
--output-html-report=tmp/%domain%.report.%datetime%.html
--hide-progress-bar

Run it against any site:

./siteone-crawler --config-file=./team-crawler.conf --url=https://mydomain.tld/

This is equivalent to the following full command line:

./siteone-crawler \
--url=https://mydomain.tld/ \
--workers=5 \
--max-reqs-per-sec=20 \
--max-depth=3 \
--output=json \
--output-html-report=tmp/%domain%.report.%datetime%.html \
--hide-progress-bar

See the full command-line options reference for every option you can place in a configuration file.