One Million robots.txt Files
The idea for this article actually started as a joke. We do a lot of web scraping here at Intoli, and we deal with robots.txt files, overzealous IP bans, and all that jazz on a daily basis. A while back, I was running into some issues with a site whose robots.txt file was completely inconsistent with its banning policies, and I suggested that we should do an article on analyzing robots.
Introduction
Over the years, I’ve collected a handful of videos that I deeply enjoy and that have had a significant impact on me as a developer. These are videos that I love introducing people to, and I’m happy to have the chance to share them with you here. I find them all inspirational in their own ways, and they serve as a continuous reminder for me to keep an open mind and to take creative approaches to problems.
UPDATE: This article is updated regularly to reflect the latest information and versions. If you’re looking for instructions, skip ahead to Setup Instructions.
NOTE: Be sure to check out Running Selenium with Headless Chrome in Ruby if you’re interested in using Selenium in Ruby instead of Python.
Background
It has long been rumored that Google uses a headless variant of Chrome for their web crawls. Over the last two years or so, it started looking more and more like this functionality would eventually make it into the public releases and, as of this week, that has finally happened.