Website to Markdown Converter
The SiteOne Crawler can export or convert an entire website with all subpages to browsable markdown. This is particularly useful for feeding website content (like documentation) into AI tools that often handle markdown more effectively than raw HTML.
Features
- Exports the entire website with all subpages to browsable markdown.
- Optionally includes images and other files (PDF, etc.).
- Allows removing unwanted elements from the exported markdown using CSS selectors.
- Can move content before the main H1 heading to the end of the markdown.
- Implements code block detection and syntax highlighting.
- Converts HTML tables to markdown tables.
- Can combine all exported markdown files into a single large markdown file.
- Includes smart removal of duplicate website headers and footers in the combined single markdown file.
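
Several of the features above correspond to options listed under Command-line Options. As a rough sketch, the script below assembles one invocation that exports a site, combines it into a single file, and excludes header markup. Only the `--markdown-*` option names come from this page. The `./crawler` binary path, the `--url` option, the `--option=value` syntax, the example URL, the output paths, and the `.cookie-banner` selector are assumptions for illustration. By default the script only prints the command so the flags can be reviewed.

```shell
# Assumed for illustration (not from this page): binary path, --url option,
# example URL, output paths, and the .cookie-banner selector.
CRAWLER="${CRAWLER:-./crawler}"
OUT_DIR="${OUT_DIR:-./export/markdown}"

# Collect the options as positional parameters so each stays one argument.
set -- \
  "--url=https://example.com/" \
  "--markdown-export-dir=${OUT_DIR}" \
  "--markdown-export-single-file=${OUT_DIR}/site.md" \
  "--markdown-exclude-selector=header" \
  "--markdown-exclude-selector=.cookie-banner" \
  "--markdown-move-content-before-h1-to-end" \
  "--markdown-remove-links-and-images-from-single-file"

# Human-readable form of the command, for review only.
cmd_line="$CRAWLER $*"

if [ "${RUN:-0}" = "1" ]; then
  "$CRAWLER" "$@"            # execute the crawler with the collected options
else
  printf '%s\n' "$cmd_line"  # dry run: print the command instead
fi
```

Setting `RUN=1` in the environment executes the command instead of printing it. Since `--markdown-export-single-file` requires `--markdown-export-dir`, the sketch always passes both.
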
Command-line Options
Parameter | Description | Default |
---|---|---|
--markdown-export-dir | Path to the directory where the markdown version of the website is saved. The directory is created if it doesn’t exist. | |
--markdown-export-single-file | Path to a file in which all exported markdown files are combined into one document. Requires --markdown-export-dir to be set. Ideal for AI tools that need to process the entire website content in one go. | |
--markdown-move-content-before-h1-to-end | Move all content before the main H1 heading (typically the header with the menu) to the end of the markdown. | |
--markdown-disable-images | Do not export and show images in markdown files. Images are enabled by default. | |
--markdown-disable-files | Do not export and link files other than HTML/CSS/JS/fonts/images, e.g. PDF, ZIP, etc. These files are enabled by default. | |
--markdown-remove-links-and-images-from-single-file | Remove links and images from the combined single markdown file. Useful for AI tools that don’t need these elements. Requires --markdown-export-single-file to be set. | |
--markdown-exclude-selector | Exclude page content (DOM elements) from the markdown export by CSS selector, such as ‘header’, ‘.header’, ‘#header’, etc. Can be specified multiple times. | |
--markdown-replace-content | Replace text content using the simple format ‘foo -> bar’ or a regexp in PREG format, e.g. /card[0-9]/i -> card. | |
--markdown-replace-query-string | Instead of replacing the query string with a short hash in the filename, only replace some of its characters. Use the simple format ‘foo -> bar’ or a regexp in PREG format, e.g. /([a-z]+)=([^&]*)(&|$)/i -> $1__$2. | |
--markdown-export-store-only-url-regex | For debugging: when set, it activates debug mode and stores only URLs that match one of these PCRE regexes. Can be specified multiple times. | |
--markdown-ignore-store-file-error | Ignore any file storing errors so that the export process continues. | |
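
The two regexp examples in the table can be tried out in isolation before handing them to the crawler. The sketch below approximates them with `sed -E`, a stand-in chosen for illustration: it shows only what each pattern does to a sample string, not how the crawler builds filenames or rewrites pages. The sample inputs are made up. PREG's `$1` becomes `\1` in sed, and the `/i` flag is emulated with explicit character classes.

```shell
# Hypothetical sample inputs, not taken from the crawler.
query='page=2&sort=asc'
text='Card7 next to card2'

# Approximates: /([a-z]+)=([^&]*)(&|$)/i -> $1__$2
renamed=$(printf '%s\n' "$query" | sed -E 's/([a-zA-Z]+)=([^&]*)(&|$)/\1__\2/g')

# Approximates: /card[0-9]/i -> card
replaced=$(printf '%s\n' "$text" | sed -E 's/[cC][aA][rR][dD][0-9]/card/g')

printf '%s\n' "$renamed"   # page__2sort__asc
printf '%s\n' "$replaced"  # card next to card
```

When such a rule is passed on the command line, wrapping the whole value in single quotes, for example `--markdown-replace-query-string='/([a-z]+)=([^&]*)(&|$)/i -> $1__$2'`, keeps the shell from interpreting `$1`, `&`, `|`, and `>`. The `--option=value` form used here is an assumption.
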
💡Further development ideas
If you have ideas on how to improve the Website to Markdown Converter, don’t be afraid to send a feature request (for the desktop application or the command-line interface) with a suggestion for improvement. We are happy to consider and implement it if it would benefit more users.