Website to Markdown Converter

The SiteOne Crawler can export an entire website, including all subpages, to browsable markdown. This is particularly useful for feeding website content (such as documentation) into AI tools, which often handle markdown more effectively than raw HTML.
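
As a rough illustration of how an export run might be assembled, the Python sketch below builds an argument list from the options in the parameter table on this page. The executable name `./crawler`, the `--url` option, and the `--option=value` form are assumptions that do not come from this page, and the URL and paths are placeholders. The helper `build_markdown_export_command` is hypothetical.

```python
import shlex


def build_markdown_export_command(url, export_dir, single_file=None, exclude_selectors=()):
    """Assemble an argument list for a markdown export run (illustrative only)."""
    # Assumption: the executable is ./crawler and the start URL is passed as --url.
    args = ["./crawler", f"--url={url}", f"--markdown-export-dir={export_dir}"]
    if single_file is not None:
        # The single-file option requires --markdown-export-dir, which is always set here.
        args.append(f"--markdown-export-single-file={single_file}")
    for selector in exclude_selectors:
        # The option can be specified multiple times, once per CSS selector.
        args.append(f"--markdown-exclude-selector={selector}")
    return args


command = build_markdown_export_command(
    "https://example.com/",          # placeholder URL
    "./export/markdown",             # placeholder output directory
    single_file="./export/site.md",  # placeholder combined file
    exclude_selectors=["header", "#footer"],
)
# Print a shell-quoted command line that could be pasted into a terminal.
print(shlex.join(command))
```

Building the arguments as a list keeps each option separate, so it can also be passed directly to `subprocess.run` without shell quoting concerns.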

  • Exports the entire website with all subpages to browsable markdown.
  • Optionally includes images and other files (PDF, etc.).
  • Allows removing unwanted elements from the exported markdown using CSS selectors.
  • Can move content before the main H1 heading to the end of the markdown.
  • Implements code block detection and syntax highlighting.
  • Converts HTML tables to markdown tables.
  • Can combine all exported markdown files into a single large markdown file.
  • Includes smart removal of duplicate website headers and footers in the combined single markdown file.
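
To make the "move content before the main H1 heading" feature more concrete, here is a minimal Python sketch of the idea applied to a markdown string. It is not the crawler's implementation. It assumes the main H1 is the first line starting with `# `, and the function name `move_content_before_h1_to_end` is hypothetical.

```python
def move_content_before_h1_to_end(markdown: str) -> str:
    """Move everything above the first H1 line to the end of the document."""
    lines = markdown.splitlines()
    # Assumption: the main H1 is the first line that starts with "# " (ATX style).
    h1_index = next((i for i, line in enumerate(lines) if line.startswith("# ")), None)
    if h1_index is None or h1_index == 0:
        # No H1 found, or nothing precedes it: leave the text unchanged.
        return markdown
    before = lines[:h1_index]
    rest = lines[h1_index:]
    # Drop trailing blank lines from the moved part to keep the result tidy.
    while before and not before[-1].strip():
        before.pop()
    return "\n".join(rest + [""] + before)


page = "[Home](/) | [Docs](/docs)\n\n# Title\n\nBody text."
print(move_content_before_h1_to_end(page))
```

A real converter would also need to ignore `# ` lines inside fenced code blocks, which this sketch does not attempt.
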
| Parameter | Description | Default |
|---|---|---|
| `--markdown-export-dir` | Path to the directory where the markdown version of the website is saved. The directory is created if it doesn’t exist. | |
| `--markdown-export-single-file` | Path to a file where all exported markdown files are combined into one document. Requires `--markdown-export-dir` to be set. Ideal for AI tools that need to process the entire website content in one go. | |
| `--markdown-move-content-before-h1-to-end` | Move all content before the main H1 heading (typically the header with the menu) to the end of the markdown. | |
| `--markdown-disable-images` | Do not export or show images in markdown files. Images are enabled by default. | |
| `--markdown-disable-files` | Do not export or link files other than HTML/CSS/JS/fonts/images, e.g. PDF, ZIP, etc. These files are enabled by default. | |
| `--markdown-remove-links-and-images-from-single-file` | Remove links and images from the combined single markdown file. Useful for AI tools that don’t need these elements. Requires `--markdown-export-single-file` to be set. | |
| `--markdown-exclude-selector` | Exclude page content (DOM elements) from the markdown export using CSS selectors such as `header`, `.header`, `#header`, etc. Can be specified multiple times. | |
| `--markdown-replace-content` | Replace text content using the simple format `foo -> bar` or a regexp in PREG format, e.g. `/card[0-9]/i -> card`. | |
| `--markdown-replace-query-string` | Instead of replacing the query string in the filename with a short hash, replace only some of its characters. Use the simple format `foo -> bar` or a regexp in PREG format, e.g. `/([a-z]+)=([^&]*)(&\|$)/i -> $1__$2`. | |
| `--markdown-export-store-only-url-regex` | For debugging: when set, activates debug mode and stores only URLs that match one of these PCRE regexes. Can be specified multiple times. | |
| `--markdown-ignore-store-file-error` | Ignore any errors when storing files. The export process will continue. | |
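
The two rule formats accepted by the replace options (`foo -> bar` and `/regex/flags -> replacement`) can be illustrated with a small Python sketch. This only mimics the documented syntax and is not the crawler's code. The function `apply_rule` is hypothetical, only the `i` flag is handled, and PREG-style `$1` references are translated to Python's `\g<1>` form.

```python
import re


def apply_rule(rule: str, text: str) -> str:
    """Apply one 'pattern -> replacement' rule to text (illustrative only)."""
    pattern, _, replacement = rule.partition(" -> ")
    match = re.fullmatch(r"/(.*)/([a-z]*)", pattern)
    if match is None:
        # Simple format: plain substring replacement.
        return text.replace(pattern, replacement)
    body, flags = match.groups()
    # Assumption: only the case-insensitive "i" flag is supported in this sketch.
    regex = re.compile(body, re.IGNORECASE if "i" in flags else 0)
    # Keep literal backslashes, then turn $1, $2, ... into Python group references.
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement.replace("\\", "\\\\"))
    return regex.sub(py_replacement, text)


print(apply_rule("foo -> bar", "foo.foo"))                    # bar.bar
print(apply_rule("/card[0-9]/i -> card", "Card7 and card2"))  # card and card
print(apply_rule("/([a-z]+)=([^&]*)(&|$)/i -> $1__$2", "page=2&sort=asc"))
```

Note that the query-string pattern from the table consumes the `&` separator, so adjacent key/value pairs end up directly next to each other in the result.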

If you have ideas for improving the Website to Markdown Converter, don’t hesitate to send a feature request (for the desktop application or for the command-line interface) with your suggestion. We are happy to consider and implement it if it will benefit more users.