Contribution and Development
SiteOne Crawler is an open-source project written in Rust that welcomes contributions from the community. Whether you’re fixing bugs, adding features, improving documentation, or suggesting ideas, your help is valuable in making the tool better for everyone.
Development Environment Setup
1. Install Rust (1.94 or later) via rustup:

       curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

   The repository pins the exact build toolchain in rust-toolchain.toml (currently 1.96.0, with the rustfmt and clippy components), so rustup automatically fetches and uses that toolchain when you build inside the project directory. This keeps local builds and CI on the same compiler and lint set.

2. Fork and clone the repository:

       git clone https://github.com/YOUR_USERNAME/siteone-crawler.git
       cd siteone-crawler

3. Build the project:

       cargo build            # debug build (fast compile, slower runtime)
       cargo build --release  # optimized release build

   The compiled binary is at ./target/debug/siteone-crawler (debug build) or ./target/release/siteone-crawler (release build).
For more on build targets (including a lean build without browser rendering and a static musl binary), see Build from Source.
Testing
The project uses Rust’s built-in test framework. Most source files include unit tests in a #[cfg(test)] mod tests block, and there are offline integration tests as well.

    # Run all unit tests and offline integration tests
    cargo test

Network-dependent integration tests live in tests/integration_crawl.rs and are marked #[ignore] by default, so a plain cargo test stays fast and offline. To run them explicitly (they crawl a live site, and --test-threads=1 runs them one at a time):

    cargo test --test integration_crawl -- --ignored --test-threads=1

Add or update tests that cover your changes before submitting a pull request.
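The test layout described above is shown in the following minimal sketch: a source file with an inline #[cfg(test)] mod tests block and one network-dependent test marked #[ignore]. The function normalize_path and all test names are hypothetical and are not taken from the SiteOne Crawler codebase. In the real project, the network tests live in tests/integration_crawl.rs rather than inline.

```rust
/// Hypothetical helper (not part of SiteOne Crawler): strips trailing
/// slashes from a URL path, keeping "/" for the root.
pub fn normalize_path(path: &str) -> String {
    let trimmed = path.trim_end_matches('/');
    if trimmed.is_empty() {
        "/".to_string()
    } else {
        trimmed.to_string()
    }
}

// Compiled only when running `cargo test`, so release builds do not include it.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn strips_trailing_slash() {
        assert_eq!(normalize_path("/docs/"), "/docs");
    }

    #[test]
    fn keeps_root_path() {
        assert_eq!(normalize_path("/"), "/");
        assert_eq!(normalize_path(""), "/");
    }

    // Skipped by a plain `cargo test`; runs only with `-- --ignored`.
    #[test]
    #[ignore = "needs network access"]
    fn fetches_live_page() {
        // A real network test would crawl a live site here.
    }
}
```

Running cargo test executes the first two tests and reports the third as ignored. Running cargo test -- --ignored executes only the ignored one.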
Code Quality
Format your code and run the linter before committing:

    # Format with rustfmt (edition 2024; see rustfmt.toml, which sets max_width = 120)
    cargo fmt

    # Lint with clippy (uses the pinned 1.96 toolchain from rust-toolchain.toml)
    cargo clippy

CI treats clippy warnings as errors (cargo clippy -- -D warnings), so make sure your changes are clippy-clean. Running that command locally is the quickest way to catch warnings before you push. Because the toolchain is pinned, the same lint set runs locally and in CI, which avoids “passes locally, fails in CI” drift.
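To show what "clippy-clean" means in practice, the sketch below contrasts two patterns that clippy's default style lints flag with the forms clippy suggests instead. The functions count_non_blank and has_results are hypothetical examples and do not come from the repository.

```rust
/// Hypothetical helper: counts the entries in `urls` that are not blank.
pub fn count_non_blank(urls: &[&str]) -> usize {
    // Flagged form: `for i in 0..urls.len() { if !urls[i].trim().is_empty() { n += 1; } }`
    // triggers clippy::needless_range_loop. An iterator chain avoids it, and
    // ending with a tail expression (no `return ...;`) avoids clippy::needless_return.
    urls.iter().filter(|url| !url.trim().is_empty()).count()
}

/// Hypothetical helper: reports whether a crawl produced any results.
pub fn has_results(results: &[String]) -> bool {
    // Flagged form: `results.len() > 0` triggers clippy::len_zero.
    !results.is_empty()
}
```

Under cargo clippy -- -D warnings, the flagged forms in the comments would fail the build, while the versions shown pass.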
Project Structure
The Rust source lives under src/, organized into focused modules:

| Module | Responsibility |
|---|---|
| engine/ | Crawl lifecycle: the initiator (CLI parsing → manager), the crawl manager, fetcher, URL queue, and orchestration. |
| analysis/ | Analyzers (SEO, security, accessibility, best practices, caching, DNS, SSL/TLS, redirects, 404s, headers, fastest/slowest URLs, external links, and more). |
| content_processor/ | Content-type processors that extract URLs and rewrite HTML/CSS/JS/XML (including Astro, Next.js, and Svelte framework handling). |
| export/ | Exporters: HTML report, offline website clone, markdown, sitemap, file output, SMTP mailer, and upload. |
| ai/ | Optional LLM features: providers, per-page actions, and the executive summary. |
| browser/ | Chromium rendering over the Chrome DevTools Protocol (only compiled with the browser Cargo feature). |
| scoring/ | Quality score (0.0–10.0) across five categories and the --ci quality gate. |
| wizard/ | Interactive no-arguments wizard with preset crawl modes. |
| options/ | CLI argument parsing, config-file handling, and option validation. |
| output/ | Console (text), JSON, and text-file output. |
| result/ | Result storage (in-memory or file-backed) and crawl status/statistics. |
Other top-level items include components/ (shared UI building blocks such as the table renderer), server.rs (the built-in HTTP server for browsing exports), and main.rs/lib.rs (entry points).
For a deeper walkthrough of how analyzers, exporters, and content processors are defined and registered, see Extending.
Optional Features
The crawler has one optional Cargo feature:

- browser: enables browser-rendering mode (--browser), screenshots, and console/JS/network diagnostics via the chromiumoxide (CDP) engine. It is on by default (default = ["browser"] in Cargo.toml), so the default build and the pre-built binaries already include it. The feature adds only the ~6 MB CDP client and does not bundle a browser. For a lean binary without it, build with cargo build --release --no-default-features.
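Code that depends on an optional Cargo feature is usually gated with #[cfg(feature = "...")] so that it is compiled out of a --no-default-features build. The sketch below shows that standard pattern using the browser feature. The functions rendering_backend and browser_supported are hypothetical, and this is not necessarily how SiteOne Crawler structures its own gating beyond the browser/ module being compiled only with this feature.

```rust
/// Hypothetical function: describes the rendering backend compiled into this build.
/// Exactly one of the two definitions below exists in any given build.
#[cfg(feature = "browser")]
pub fn rendering_backend() -> &'static str {
    "chromium via CDP"
}

#[cfg(not(feature = "browser"))]
pub fn rendering_backend() -> &'static str {
    "none (built with --no-default-features)"
}

/// `cfg!` evaluates the same condition to a compile-time boolean. Unlike
/// `#[cfg]`, it does not remove code, so both branches of an `if` must compile.
pub fn browser_supported() -> bool {
    cfg!(feature = "browser")
}
```

A default cargo build selects the first definition of rendering_backend. Building with cargo build --release --no-default-features selects the second.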
Contribution Workflow
1. Create a branch for your feature or bugfix:

       git checkout -b feature/my-feature   # for features
       git checkout -b fix/issue-123        # for bugfixes

2. Make your changes, following the existing code style. Run cargo fmt and cargo clippy, and keep both clean.

3. Add or update tests, then run cargo test.

4. Use conventional commits for a clear, machine-readable history, for example:

       feat: add custom analyzer for hreflang tags
       fix: correct redirect chain detection for 308 responses
       docs: document the browser-rendering build

5. Keep your fork up to date with the upstream master branch before opening a PR. The git remote add step is only needed once:

       git remote add upstream https://github.com/janreges/siteone-crawler.git
       git fetch upstream
       git rebase upstream/master

6. Open a pull request against the repository’s master branch. Include a clear description of what your change does and why, and reference any related issues. Maintainers will review your code. Address any feedback, and once your PR is approved it will be merged.
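The commit messages in the workflow follow the Conventional Commits header form <type>[(scope)][!]: <description>. To illustrate that form, here is a small parser for a single header line. CommitHeader and parse_commit_header are hypothetical, and this page does not say that the repository checks commit messages automatically.

```rust
/// Hypothetical type: a parsed Conventional Commits header line.
#[derive(Debug, PartialEq)]
pub struct CommitHeader<'a> {
    pub kind: &'a str,          // e.g. "feat", "fix", "docs"
    pub scope: Option<&'a str>, // e.g. Some("parser") in "fix(parser): ..."
    pub breaking: bool,         // true when "!" appears right before the colon
    pub description: &'a str,
}

/// Parses `<type>[(scope)][!]: <description>`; returns `None` if malformed.
pub fn parse_commit_header(line: &str) -> Option<CommitHeader<'_>> {
    // The header must contain ": " separating the prefix from the description.
    let (prefix, description) = line.split_once(": ")?;
    let description = description.trim();
    if description.is_empty() {
        return None;
    }

    // An optional "!" right before the colon marks a breaking change.
    let (prefix, breaking) = match prefix.strip_suffix('!') {
        Some(rest) => (rest, true),
        None => (prefix, false),
    };

    // An optional, non-empty "(scope)" may follow the type.
    let (kind, scope) = match prefix.split_once('(') {
        Some((kind, rest)) => {
            let scope = rest.strip_suffix(')')?;
            if scope.is_empty() {
                return None;
            }
            (kind, Some(scope))
        }
        None => (prefix, None),
    };

    // The type itself must be a non-empty alphabetic word.
    if kind.is_empty() || !kind.chars().all(|c| c.is_ascii_alphabetic()) {
        return None;
    }

    Some(CommitHeader { kind, scope, breaking, description })
}
```

For example, "feat: add custom analyzer for hreflang tags" parses with kind "feat" and no scope. "updated some stuff" returns None because it lacks the "type: " prefix.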
Reporting Issues
Report bugs and request features on GitHub Issues.

When reporting a bug, please include:

- Your operating system and the crawler version (siteone-crawler --version).
- The exact command-line options you used.
- The error message or unexpected behavior, with example output if possible.
- Steps to reproduce.
For feature requests, describe the problem you want to solve, a possible approach, and any alternatives you considered.