There’s a lot to love about CircleCI. First of all, continuous integration is just awesome in general. You can certainly develop fine software without it, but a good CI configuration can really make your life easier. Beyond that, CircleCI has a generous free tier, provides four free containers per open source project, allows the use of custom Docker images, and is reasonably easy to configure. There’s unfortunately also some stuff not to love about CircleCI.
Have a great idea for an article?
We're always looking for guest contributors or article suggestions. Shoot us an email at firstname.lastname@example.org, we would love to hear yours!
A few months back, I wrote a popular article called Making Chrome Headless Undetectable in response to one called Detecting Chrome Headless by Antione Vastel. The one thing that I was really trying to get across in writing that is that blocking site visitors based on browser fingerprinting is an extremely user-hostile practice. There are simply so many variations in browser configurations that you’re inevitably going to end up blocking non-automated access to your website, and–on top of that–you’re really not accomplishing anything in terms of blocking sophisticated web scrapers.
The easiest way to install the latest Chrome version on RHEL, CentOS, and Amazon Linux versions 6.X and 7.X. # This installs Chrome on any RHEL/CentOS/Amazon Linux variant. curl https://intoli.com/install-google-chrome.sh | bash A Universal Installation Script for Google Chrome on Amazon Linux and CentOS 6 CentOS, Amazon Linux AMI, and Red Hat Enterprise Linux are three closely related GNU/Linux distributions which are all popular choices for server installations. They offer excellent performance and stability, but package availability can often be lacking.
Detecting Headles Chrome A short article titled Detecting Chrome Headless popped up on Hacker News over the weekend and it has since been making the rounds. Most of the discussion on Hacker News was focused around the author’s somewhat dubious assertion that web scraping is a “malicious task” that belongs in the same category as advertising fraud and hacking websites. That’s always a fun debate to get into, but the thing that I really took issue with about the article was that it implicitly promoted the idea of blocking users based on browser fingerprinting.
A Bug on Linux? Why, I never! I’ve been using GNU/Linux for about fifteen years and, I’ve got to admit, it used to be pretty rough around the edges (to put it lightly). A lot can change over fifteen years though; most of the things that were once major problem areas haven’t required a second thought in years. Laptop suspension, WIFI, advanced function keys, sound, and pretty much everything else all typically “just work” these days, and this has been the case for quite a while.
Update: This article is regularly updated in order to accurately reflect improvements in Firefox’s headless browsing capabilities. Note: Check out Running Selenium with Healdess Chrome if you’d rather use Google’s browser. Using Selenium with Headless Firefox (on Windows) Ever since Chrome implemented headless browsing support back in April, the other major browsers started following suit. In particular, Mozilla has since then expanded support for Firefox’s headless mode from Linux to its Windows and macOS builds, and fixed a number of bugs that might have been in the way of early adopters.
UPDATE: This article is updated regularly to reflect the latest information and versions. If you’re looking for instructions then skip ahead to see Setup Instructions. NOTE: Be sure to check out Running Selenium with Headless Chrome in Ruby if you’re interested in using Selenium in Ruby instead of Python. Background It has long been rumored that Google uses a headless variant of Chrome for their web crawls. Over the last two years or so it had started looking more and more like this functionality would eventually make it into the public releases and, as of this week, that has finally happened.
Nightmare is a popular browser automation library specifically designed with ease of use in mind. A typical Nightmare script chains together semantically named user actions like goto and click to perform any given task, resulting in simple and readable code. These actions of course include a few methods for waiting on the page to fully load: you can wait for a selector to become available, for all static resources to load, or simply wait for a fixed amount of time.
The WebExtensions API In 2015, Mozilla announced that they would be deprecating XPCOM and XUL based addons in favor of their new WebExtensions API based on the Google Chrome Extension API. There were some vocal critics of this shift because it meant that some existing add-ons would be discontinued, but this was tremendously positive news for add-on and extension developers. Writing cross-browser extensions had previously been an absolutely miserable experience, and many developers understandably chose to only target Chrome due to its market share and relatively pleasant API.
Running Google Chrome with an extension installed is quite simple because Chrome supports a --load-extension command-line argument for exactly this purpose. This can be specified before launching Chrome with Selenium by creating a ChromeOptions instance and calling add_argument(). from selenium import webdriver from selenium.common.exceptions import NoSuchElementException # Configure the necessary command-line option. options = webdriver.ChromeOptions() options.add_argument('--load-extension=path/to/the/extension') # Initalize the driver with the appropriate options. driver = webdriver.Chrome(chrome_options=options) The above code will setup a Selenium driver for Chrome with the extension located at path/to/extension preinstalled.