
Contribution and Development

SiteOne Crawler is an open-source project written in Rust that welcomes contributions from the community. Whether you’re fixing bugs, adding features, improving documentation, or suggesting ideas, your help is valuable in making the tool better for everyone.

  1. Install Rust (1.94 or later) via rustup:

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

    The repository pins the exact build toolchain in rust-toolchain.toml (currently 1.96.0, with the rustfmt and clippy components), so rustup will automatically fetch and use that toolchain when you build inside the project directory. This keeps local builds and CI on the same compiler and lint set.

  2. Fork and clone the repository:

    git clone https://github.com/YOUR_USERNAME/siteone-crawler.git
    cd siteone-crawler

  3. Build the project:

    cargo build # debug build (fast compile, slower runtime)
    cargo build --release # optimized release build

    The compiled binary is at ./target/debug/siteone-crawler or ./target/release/siteone-crawler.

For more on build targets (including a lean build without browser rendering and a static musl binary), see Build from Source.

The project uses Rust’s built-in test framework. Most source files include unit tests in a #[cfg(test)] mod tests block, and there are offline integration tests as well.

# Run all unit tests and offline integration tests
cargo test
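
As a rough illustration of the #[cfg(test)] mod tests layout described above, the sketch below pairs a small function with its unit tests in the same file. The strip_fragment helper is hypothetical and is not taken from the crawler's source; only the module layout is the point.

```rust
/// Hypothetical helper (not from the crawler's source):
/// returns the URL without its `#fragment` part.
pub fn strip_fragment(url: &str) -> &str {
    match url.find('#') {
        Some(idx) => &url[..idx],
        None => url,
    }
}

// Compiled only for `cargo test`, so it adds nothing to normal builds.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn removes_fragment() {
        assert_eq!(
            strip_fragment("https://example.com/page#top"),
            "https://example.com/page"
        );
    }

    #[test]
    fn leaves_url_without_fragment_untouched() {
        assert_eq!(
            strip_fragment("https://example.com/page"),
            "https://example.com/page"
        );
    }
}
```

Keeping tests next to the code they cover lets them call private functions through use super::*, which is the usual reason for this layout.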

Network-dependent integration tests live in tests/integration_crawl.rs and are marked #[ignore] by default, so the default cargo test stays fast and offline. To run them explicitly (they crawl a live site):

cargo test --test integration_crawl -- --ignored --test-threads=1
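
The split between offline and network-dependent tests could look roughly like the sketch below. It is not a copy of tests/integration_crawl.rs: the is_success helper and both test names are hypothetical, and the ignored test uses a placeholder value where a real test would perform a request.

```rust
/// Hypothetical helper used only for this illustration.
pub fn is_success(status: u16) -> bool {
    (200..300).contains(&status)
}

// Runs with a plain `cargo test`: no network access needed.
#[test]
fn classifies_status_codes_offline() {
    assert!(is_success(200));
    assert!(!is_success(404));
}

// Skipped by default; runs only when `-- --ignored` is passed.
#[test]
#[ignore = "requires network access"]
fn crawls_live_site() {
    // Placeholder: a real integration test would fetch a live page here.
    let status: u16 = 200;
    assert!(is_success(status));
}
```

The --test-threads=1 flag in the command above runs the ignored tests one at a time instead of in parallel.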

Add or update tests that cover your changes before submitting a pull request.

Format your code and run the linter before committing:

# Format with rustfmt (edition 2024; see rustfmt.toml — max_width = 120)
cargo fmt
# Lint with clippy (uses the pinned 1.96 toolchain from rust-toolchain.toml)
cargo clippy

CI treats clippy warnings as errors (cargo clippy -- -D warnings), so make sure your changes are clippy-clean. The pinned toolchain ensures the same lint set runs locally and in CI, avoiding “passes locally, fails in CI” drift.
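
To make the effect of -D warnings concrete, here is a small sketch written in the clippy-clean form, with comments naming the default-on lints that the more naive spelling would trigger. The function names are hypothetical; len_zero and needless_return are real clippy lints.

```rust
/// Hypothetical helper for illustration.
/// Writing `urls.len() == 0` here would trigger clippy's `len_zero` lint;
/// under `-D warnings` that warning fails the build.
pub fn has_pending_urls(urls: &[String]) -> bool {
    !urls.is_empty()
}

/// Ending each branch with `return ...;` would trigger `needless_return`;
/// the tail-expression form below is what clippy expects.
pub fn queue_summary(urls: &[String]) -> String {
    if has_pending_urls(urls) {
        format!("{} URL(s) pending", urls.len())
    } else {
        String::from("queue is empty")
    }
}
```

Running cargo clippy -- -D warnings locally before pushing reproduces the CI behaviour described above.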

The Rust source lives under src/, organized into focused modules:

| Module | Responsibility |
| --- | --- |
| engine/ | Crawl lifecycle: the initiator (CLI parsing → manager), the crawl manager, fetcher, URL queue, and orchestration. |
| analysis/ | Analyzers (SEO, security, accessibility, best practices, caching, DNS, SSL/TLS, redirects, 404s, headers, fastest/slowest URLs, external links, and more). |
| content_processor/ | Content-type processors that extract URLs and rewrite HTML/CSS/JS/XML (including Astro, Next.js, and Svelte framework handling). |
| export/ | Exporters: HTML report, offline website clone, markdown, sitemap, file output, SMTP mailer, and upload. |
| ai/ | Optional LLM features: providers, per-page actions, and the executive summary. |
| browser/ | Chromium rendering over the Chrome DevTools Protocol (only compiled with the browser Cargo feature). |
| scoring/ | Quality score (0.0–10.0) across five categories and the --ci quality gate. |
| wizard/ | Interactive no-arguments wizard with preset crawl modes. |
| options/ | CLI argument parsing, config-file handling, and option validation. |
| output/ | Console (text), JSON, and text-file output. |
| result/ | Result storage (in-memory or file-backed) and crawl status/statistics. |

Other top-level items include components/ (shared UI building blocks such as the table renderer), server.rs (the built-in HTTP server for browsing exports), and main.rs/lib.rs (entry points).

For a deeper walkthrough of how analyzers, exporters, and content processors are defined and registered, see Extending.

The crawler has one optional Cargo feature:

  • browser: enables browser-rendering mode (--browser), screenshots, and console/JS/network diagnostics via the chromiumoxide (CDP) engine. It is on by default (default = ["browser"] in Cargo.toml), so the default build and the pre-built binaries already include it. The feature adds only the ~6 MB CDP client, not a browser. For a lean binary without it, build with cargo build --release --no-default-features.
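
Code that depends on an optional Cargo feature is normally gated with cfg attributes. The sketch below shows the general pattern for a feature named browser; the function names are hypothetical and are not the crawler's actual API.

```rust
/// Compiled only when the `browser` feature is enabled (the default build).
#[cfg(feature = "browser")]
pub fn render_mode() -> &'static str {
    "browser rendering available"
}

/// Fallback compiled for `--no-default-features` builds.
#[cfg(not(feature = "browser"))]
pub fn render_mode() -> &'static str {
    "HTTP-only build (no browser rendering)"
}

/// `cfg!` evaluates to a compile-time boolean, which is handy for
/// reporting which variant was built.
pub fn browser_compiled_in() -> bool {
    cfg!(feature = "browser")
}
```

Exactly one of the two render_mode definitions exists in any given build, so callers need no runtime check.
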
  1. Create a branch for your feature or bugfix:

    git checkout -b feature/my-feature # for features
    git checkout -b fix/issue-123 # for bugfixes

  2. Make your changes, following the existing code style. Run cargo fmt and cargo clippy and keep both clean.

  3. Add or update tests, then run cargo test.

  4. Use conventional commits for clear, machine-readable history, for example:

    feat: add custom analyzer for hreflang tags
    fix: correct redirect chain detection for 308 responses
    docs: document the browser-rendering build

  5. Keep your fork up to date with the upstream master branch before opening a PR:

    git remote add upstream https://github.com/janreges/siteone-crawler.git
    git fetch upstream
    git rebase upstream/master

  6. Open a pull request against the repository’s master branch. Include a clear description of what your change does and why, and reference any related issues. Maintainers will review your code; address any feedback, and once approved your PR will be merged.
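
The commit subjects in step 4 follow the Conventional Commits shape type(scope): description, where the scope is optional. The sketch below checks that shape for a single subject line. It is only an illustration: the function name and the list of accepted types are assumptions, and the project does not necessarily enforce commits this way.

```rust
/// Commit types accepted by this sketch (an assumed subset, not a project rule).
const TYPES: [&str; 6] = ["feat", "fix", "docs", "refactor", "test", "chore"];

/// Returns true if `subject` looks like `type: description`
/// or `type(scope): description`, optionally with a `!` before the colon.
pub fn is_conventional_subject(subject: &str) -> bool {
    // The colon must be followed by a space.
    let Some((prefix, description)) = subject.split_once(": ") else {
        return false;
    };
    // Optional "!" marks a breaking change.
    let prefix = prefix.strip_suffix('!').unwrap_or(prefix);
    // Optional "(scope)" after the type; an empty scope is rejected.
    let commit_type = match prefix.split_once('(') {
        Some((ty, rest)) => match rest.strip_suffix(')') {
            Some(scope) if !scope.is_empty() => ty,
            _ => return false,
        },
        None => prefix,
    };
    TYPES.contains(&commit_type) && !description.trim().is_empty()
}
```

For example, "feat: add custom analyzer for hreflang tags" passes this check, while a subject such as "updated stuff" does not.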

Report bugs and request features on GitHub Issues.

When reporting a bug, please include:

  • Your operating system and the crawler version (siteone-crawler --version).
  • The exact command-line options you used.
  • The error message or unexpected behavior, with example output if possible.
  • Steps to reproduce.

For feature requests, describe the problem you want to solve, a possible approach, and any alternatives you considered.