Contribution and Development

SiteOne Crawler is an open-source project that welcomes contributions from the community. Whether you’re fixing bugs, adding features, improving documentation, or suggesting ideas, your help is valuable in making the tool better for everyone.

  1. Fork the repository: Fork the SiteOne Crawler repository on GitHub to your own account.

  2. Clone your fork:

    git clone https://github.com/YOUR_USERNAME/siteone-crawler.git
    cd siteone-crawler
  3. Install dependencies: The project keeps its dependencies minimal; install them with Composer:

    composer install
  4. Build the project:

    php build.php

The main components of the project are organized as follows:

  • /src: Main source code
    • /Crawler: Core classes for crawling
    • /Crawler/Analysis: Analyzers for different aspects of websites
    • /Crawler/ContentProcessor: Content type processors
    • /Crawler/Export: Export formats and generators
    • /Crawler/HttpClient: HTTP client implementation
    • /Crawler/Options: Command-line options handling
    • /Crawler/Result: Results storage and statistics
  • /tests: Unit and integration tests
  • /bin: Build scripts and executable outputs
  • /assets: Static assets for the project

The typical development workflow is:

  1. Create a branch: Create a new branch for your feature or bugfix:

    git checkout -b feature/my-feature # for features
    git checkout -b fix/issue-123 # for bugfixes
  2. Make your changes: Implement your feature or fix, following the coding standards described below.

  3. Write/update tests: Add tests that cover your changes. The project uses PHPUnit for testing.

  4. Run tests: Ensure all tests pass before submitting a pull request:

    ./vendor/bin/phpunit
  5. Build and validate: Rebuild the project to ensure your changes integrate correctly:

    php build.php
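
The test and build steps above can be chained so that the build only runs once the tests pass. This is a minimal Bash sketch, assuming it is run from the repository root after `composer install`; the function name `pre_pr_check` is hypothetical and not part of the project.

```shell
# Hypothetical helper: run the test suite, then rebuild.
# Stops at the first failing step and returns a non-zero status.
pre_pr_check() {
  echo "Running tests..."
  ./vendor/bin/phpunit || {
    echo "Tests failed; fix them before opening a pull request." >&2
    return 1
  }

  echo "Rebuilding..."
  php build.php || {
    echo "Build failed." >&2
    return 1
  }

  echo "All checks passed."
}
```

Calling `pre_pr_check` before pushing gives a single pass/fail signal for both steps.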

SiteOne Crawler follows the PSR-12 coding standard, with a few project-specific conventions:

  1. Code Style:

    • Use strict type declarations (declare(strict_types=1);)
    • Follow PSR-12 style conventions
    • Use meaningful variable and method names
    • Add PHPDoc comments for classes, methods, and properties
  2. Architecture:

    • Follow existing design patterns in the codebase
    • Use interfaces for extensibility
    • Implement dependency injection where appropriate
    • Maintain backward compatibility when possible
  3. Error Handling:

    • Use exceptions for exceptional conditions
    • Validate input parameters
    • Provide clear error messages
    • Declare parameter and return types
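
The strict-types convention can be checked mechanically. The sketch below lists PHP files that lack the `declare(strict_types=1);` line. The function name `find_missing_strict_types` and the default `src` directory are assumptions for illustration. The check is a plain text search, not a PHP parse, so it will miss differently spaced variants of the declaration.

```shell
# Hypothetical helper: print every *.php file under a directory
# (default: src) that does not contain the strict_types declaration.
find_missing_strict_types() {
  find "${1:-src}" -type f -name '*.php' | while IFS= read -r file; do
    # -q: quiet, -F: treat the pattern as a fixed string, not a regex.
    grep -qF 'declare(strict_types=1);' "$file" || echo "$file"
  done
}
```

Running `find_missing_strict_types src` prints nothing when every file carries the declaration.
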

To submit your changes as a pull request:

  1. Update your fork: Before submitting a pull request, make sure your fork is up to date with the main repository:

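    # One-time setup: register the original repository as "upstream"
    git remote add upstream https://github.com/janreges/siteone-crawler.git
    # Fetch the latest upstream history and replay your branch on top of it
    git fetch upstream
    git rebase upstream/master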
  2. Submit your PR: Create a pull request against the main repository’s master branch.

  3. PR Description: Include a clear description of your changes, referencing any related issues. Explain what problem your PR solves and how it implements the solution.

  4. Review Process:

    • Maintainers will review your code
    • Address any feedback or requested changes
    • Once approved, your PR will be merged

The project’s documentation lives in three places:

  1. In-code documentation: PHPDoc comments within the source code
  2. README.md: Basic overview and quick start guide
  3. Documentation website: Comprehensive guides and references built with Astro and Starlight

When contributing to documentation:

  1. Be clear and concise: Use simple language and avoid jargon
  2. Include examples: Provide code examples for complex features
  3. Keep it up-to-date: Ensure documentation matches the current functionality
  4. Add screenshots: Include screenshots for UI-related features
  5. Follow Markdown conventions: Use Markdown formatting consistently

When reporting bugs:

  1. Check existing issues: Make sure the bug hasn’t already been reported
  2. Use the issue template: Follow the bug report template
  3. Provide details: Include OS, PHP version, command-line options, and steps to reproduce
  4. Add example output: Share the error message or unexpected behavior
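
The environment details requested above can be gathered with a couple of commands. This is a small sketch rather than a project tool; the function name `collect_env_info` is hypothetical. It covers only the OS and PHP version, so the command-line options and reproduction steps still need to be added by hand.

```shell
# Hypothetical helper: print OS and PHP version for a bug report.
collect_env_info() {
  # Kernel name and release, e.g. "Linux 6.x" or "Darwin 23.x".
  echo "OS: $(uname -sr)"

  if command -v php >/dev/null 2>&1; then
    echo "PHP: $(php -r 'echo PHP_VERSION;')"
  else
    echo "PHP: not found on PATH"
  fi
}
```

The output of `collect_env_info` can be pasted directly into the issue.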

For feature requests:

  1. Describe the problem: Explain what problem your feature would solve
  2. Suggest a solution: Provide a potential implementation approach
  3. Consider alternatives: Mention any alternative solutions you’ve considered

SiteOne Crawler is designed to handle large websites efficiently. Keep these considerations in mind:

  1. Memory usage: Minimize memory allocation, especially for large crawls
  2. CPU efficiency: Optimize algorithms for CPU-intensive operations
  3. Concurrency: Ensure thread safety in concurrent operations
  4. Scalability: Design components to scale with website size

Keep these security practices in mind as well:

  1. Input validation: Validate all user inputs
  2. Safe file operations: Use secure file handling practices
  3. Network security: Implement secure HTTP client practices
  4. Dependency management: Keep dependencies updated and secure

The project follows semantic versioning:

  • Major version: Incompatible API changes
  • Minor version: New features in a backward-compatible manner
  • Patch version: Backward-compatible bug fixes
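
The three bump rules can be expressed as a short Bash function. This is an illustrative sketch, not the project's release tooling. The name `bump_version` is hypothetical, and the function assumes a well-formed `MAJOR.MINOR.PATCH` string with no pre-release or build suffix.

```shell
# Hypothetical helper: print the next version for a MAJOR.MINOR.PATCH string.
# Usage: bump_version <version> <major|minor|patch>
bump_version() {
  local version="$1" part="$2"
  local major minor patch rest

  # Split "1.4.2" into 1, 4 and 2 using parameter expansion.
  major="${version%%.*}"
  rest="${version#*.}"
  minor="${rest%%.*}"
  patch="${rest#*.}"

  case "$part" in
    major) major=$((major + 1)); minor=0; patch=0 ;;  # incompatible API change
    minor) minor=$((minor + 1)); patch=0 ;;           # backward-compatible feature
    patch) patch=$((patch + 1)) ;;                    # backward-compatible bug fix
    *) echo "unknown part: $part" >&2; return 1 ;;
  esac

  echo "${major}.${minor}.${patch}"
}
```

For example, `bump_version 1.4.2 minor` prints `1.5.0`: a bump resets every lower-order component to zero.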

The release process includes:

  1. Version bump: Update version numbers in code
  2. Changelog update: Document changes in CHANGELOG.md
  3. Build release artifacts: Generate binaries for all platforms
  4. Create release tag: Tag the release in Git
  5. Publish release: Create GitHub release with notes

Join the community discussion:

  • GitHub Issues: For bug reports and feature discussions
  • Pull Requests: For code contributions
  • Discussions: For general questions and ideas