Scraping and Parsing Sitemaps in Bash

A wise man once said that sitemaps are the window into a website’s soul, and I’m not inclined to disagree. Without a sitemap, a website is just a labyrinthian web of links between pages. It’s certainly possible to scrape sites by crawling those links, but things become much easier with a sitemap that lays out a site’s content in clear and simple terms. Sites that provide sitemaps are quite literally asking to be scraped: publishing one is a direct indication that the site operators intend for bots to visit the pages it lists.
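
To give a rough idea of what "clear and simple terms" means in practice, here is a minimal sketch that pulls every URL out of a sitemap. It assumes the standard sitemap layout, where each page address sits inside a `<loc>` element. The function name `extract_sitemap_urls` and the inline `sample_sitemap` are made up for illustration, and the matching is regex-based rather than real XML parsing, so it will miss entries that are split across lines or wrapped in CDATA.

```bash
#!/usr/bin/env bash

# Print the text inside every <loc> element of the sitemap XML read from stdin.
# Hypothetical helper: a regex-based sketch, not a real XML parser. It assumes
# each <loc>...</loc> pair sits on a single line and contains no CDATA section.
extract_sitemap_urls() {
  grep -o '<loc>[^<]*</loc>' | sed -e 's/^<loc>//' -e 's/<\/loc>$//'
}

# A small inline sitemap (placeholder URLs) so the sketch runs offline.
sample_sitemap='<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>'

# Prints one URL per line.
printf '%s\n' "$sample_sitemap" | extract_sitemap_urls
```

Against a live site, the same function could be fed from something like `curl -s https://example.com/sitemap.xml | extract_sitemap_urls`, where the URL is a placeholder. Only basic `grep -o` and `sed -e` are used, which should behave the same in the GNU and BSD versions of those tools.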

How to Exit When Errors Occur in Bash Scripts

It’s a common issue that scripts written and tested on GNU/Linux don’t run correctly on macOS, or vice versa, because of differences between the GNU and BSD versions of the core utilities. Error messages can get drowned in the script output, making it far from obvious that something isn’t executing correctly. There are a couple of easy fixes to avoid problems like this, but they rely on some bash features that you may not be familiar with if you don’t do a ton of scripting.
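
The teaser above doesn't name the bash features in question, so take the following as an assumption: two options commonly used for this purpose are `set -e` (errexit) and `set -o pipefail`. The sketch below compares exit statuses with and without them. The helper `status_of` is hypothetical, and each snippet runs in a child `bash` so that a failure doesn't stop the demo itself.

```bash
#!/usr/bin/env bash

# Hypothetical helper: run a snippet in a child bash, discard its output,
# and print the child's exit status.
status_of() {
  local status=0
  bash -c "$1" >/dev/null 2>&1 || status=$?
  echo "$status"
}

# Default behaviour: the failed command is ignored and the script carries on,
# so the overall status comes from the final echo.
echo "no options: $(status_of 'false; echo "still running"')"

# errexit: the script stops at the first command that fails.
echo "set -e:     $(status_of 'set -e; false; echo "still running"')"

# By default a pipeline reports only the status of its last stage.
echo "plain pipe: $(status_of 'false | true')"

# pipefail: the pipeline fails if any stage fails.
echo "pipefail:   $(status_of 'set -o pipefail; false | true')"
```

The two lines without options report status 0 even though a command failed, which is how an error can go unnoticed in a long script. The other two report a non-zero status.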

