Simulate a tablet and crawl only the first 100 URLs
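A minimal sketch, assuming the crawler binary is invoked as ./crawler and https://mydomain.tld/ stands in for your site; the --device and --max-visited-urls option names are assumptions here:

```bash
./crawler --url=https://mydomain.tld/ \
  --device=tablet \
  --max-visited-urls=100
```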
Internal password-protected web behind the proxy
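A sketch under the same assumptions, with placeholder values for the intranet URL, proxy address, and credentials; the --proxy and --http-auth option names are assumptions here:

```bash
./crawler --url=https://intranet.mydomain.tld/ \
  --proxy=proxy.mydomain.tld:8080 \
  --http-auth=myuser:mypassword
```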
SEO-oriented analysis and output (ignore assets)
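A sketch with a placeholder URL; --extra-columns (adds Title, Keywords, and Description columns to the output) and --disable-all-assets (skips scripts, styles, images, and other files) are assumed option names:

```bash
./crawler --url=https://mydomain.tld/ \
  --extra-columns='Title,Keywords,Description' \
  --disable-all-assets
```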
Stress test with 10 workers and 100 reqs/sec
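Combining the options explained below into one illustrative command; --workers and --max-reqs-per-sec are assumed names for the concurrency and rate-limit options:

```bash
./crawler --url=https://mydomain.tld/ \
  --workers=10 \
  --max-reqs-per-sec=100 \
  --add-random-query-params \
  --analyzer-filter-regex='/nothing/i'
```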
Option --add-random-query-params appends random query parameters to each request, so responses bypass the cache.
Option --analyzer-filter-regex='/nothing/i' skips all analysis (no analyzer name matches the regex), saving time, resources, and output size.
Analysis and export of a large website (~1 million URLs)
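An illustrative command for a crawl of this size, with a placeholder URL; the --result-storage, --result-storage-compression, and --memory-limit option names are assumptions. Storing results on disk instead of in memory keeps memory usage bounded on very large crawls:

```bash
./crawler --url=https://mydomain.tld/ \
  --max-visited-urls=1000000 \
  --result-storage=file \
  --result-storage-compression \
  --memory-limit=2048M
```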
Generate an offline version of the website
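Putting the options described below together, the full command might look like this (assuming the binary is invoked as ./crawler):

```bash
./crawler --url=https://astro.build/ \
  --offline-export-dir=tmp/astro.build \
  --allowed-domain-for-external-files='*' \
  --allowed-domain-for-crawling='*.astro.build'
```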
Option --offline-export-dir=tmp/astro.build activates offline export mode and saves the website to the ./tmp/astro.build directory.
Option --allowed-domain-for-external-files='*' ensures that all external files referenced by the HTML (JavaScript, stylesheets, fonts, avatar images from GitHub, or any other files hosted on other domains) are also downloaded for offline use.
Option --allowed-domain-for-crawling='*.astro.build' ensures that only URLs from the initial domain astro.build and all of its subdomains are crawled.
Generate sitemaps for the website
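An illustrative command with a placeholder URL; the --sitemap-xml-file and --sitemap-txt-file option names are assumptions inferred from the output paths mentioned below:

```bash
./crawler --url=https://mydomain.tld/ \
  --sitemap-xml-file=tmp/sitemap.xml \
  --sitemap-txt-file=tmp/sitemap.txt
```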
You can find the sitemap files in ./tmp/sitemap.xml and ./tmp/sitemap.txt.