Offline Website Generator (clone, mirror)

Features

  • will help you export the entire website to an offline form, where you can browse the site through local HTML files (without an HTTP server), including all documents, images, styles, scripts, fonts, etc.
  • you can limit which assets you want to download and export (see the --disable-* directives); for some types of websites the best result comes with the --disable-javascript option (see the export example after this list).
  • with --allowed-domain-for-external-files you can specify the external domains from which assets (JS, CSS, fonts, images, documents) may be downloaded, including the * option for all domains.
  • with --allowed-domain-for-crawling you can specify which other domains should be included in the crawl if any links point to them. You can enable e.g. mysite.* to export all language versions that differ only in TLD, or *.mysite.tld to export all subdomains.
  • you can try --disable-styles and --disable-fonts to see how well your site handles accessibility and semantics.
  • you can export your website to a static form and host it on GitHub Pages, Netlify, Vercel, etc. as a static backup for your disaster recovery plan or for archival/legal needs (see the publishing sketch after this list).
  • works great with older conventional websites as well as modern ones built on frameworks like Next.js, Nuxt.js, SvelteKit, Astro, Gatsby, etc. When a JS framework is detected, the export also performs some framework-specific code modifications for optimal results. For example, most frameworks cannot handle a relatively located project and link assets from the root /, which does not work in file:// mode.
  • try it on your website, and you will be very pleasantly surprised :-)
  • roadmap: we are also planning an export mode compatible with Nginx that will preserve all original URLs of your website and allow you to host it on your own infrastructure.
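
A minimal sketch of how these options combine on the command line, assuming the `./crawler` launcher from the release archive; the `--url` and `--offline-export-dir` parameter names reflect the documented CLI, but verify them against `--help` for your version:

```bash
# Export a site for offline browsing: skip JavaScript, allow asset
# downloads from any external domain, and crawl all subdomains.
./crawler --url=https://mysite.tld/ \
  --offline-export-dir=tmp/mysite.tld \
  --disable-javascript \
  --allowed-domain-for-external-files='*' \
  --allowed-domain-for-crawling='*.mysite.tld'
```

Quoting the wildcard values keeps the shell from expanding them before the crawler sees them.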

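And a hedged sketch of publishing the exported directory to GitHub Pages (repository, branch, and paths are illustrative; Netlify and Vercel accept the same static output):

```bash
# Publish the exported static site to a gh-pages branch
# (illustrative names; adjust the remote and paths to your setup).
cd tmp/mysite.tld
git init && git add -A && git commit -m "Static export"
git branch -M gh-pages
git remote add origin git@github.com:user/mysite-static.git
git push -f origin gh-pages
```
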
💡 Further development ideas

In the future we would like to extend support for specific JS frameworks. For conventional websites, where only the backend generates HTML, the export works reliably. Some modern JS frameworks, even those with SSR (server-side rendering), still modify the HTML in various ways after the page is displayed, e.g. by replacing links in it.

SiteOne Crawler can handle a lot of these situations today (e.g. by substituting paths in JS code or generating URL prefixes for dynamically composed JS chunk paths), but it's not perfect.
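
To make "path substitution" concrete, here is a purely hypothetical illustration of the kind of rewrite involved; the real substitutions are framework-specific and applied internally by the crawler:

```bash
# Hypothetical example: root-absolute asset references do not resolve
# under file://, so they are rewritten to relative ones. For a page one
# directory deep, the prefix would be ../ instead of ./, and so on.
sed -i 's|src="/_next/|src="./_next/|g' index.html
```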

The ideal would be to render these pages with a headless browser, but that would make the application stack much more complex and the whole process much slower: processing one HTML document now takes a millisecond; with a headless browser it would take seconds.

Maybe it will make sense for us in the future… but we are not sure yet. If you have any ideas or suggestions, please let us know through a GitHub issue.