Scraping and Parsing Sitemaps in Bash
A wise man once said that sitemaps are the window into a website’s soul, and I’m not inclined to disagree. Without a sitemap, a website is just a labyrinthian web of links between pages. It’s certainly possible to scrape sites by crawling those links, but things become much easier with a sitemap that lays out a site’s content in clear and simple terms. Sites that provide sitemaps are quite literally asking to be scraped; publishing one is a direct indication that the site operators intend for bots to visit the pages it lists.