Python

Running Selenium with Headless Firefox

Update: This article is regularly updated to accurately reflect improvements in Firefox’s headless browsing capabilities.

Note: Check out Running Selenium with Headless Chrome if you’d rather use Google’s browser.

Using Selenium with Headless Firefox (on Windows)

Ever since Chrome implemented headless browsing support back in April, the other major browsers have been following suit. In particular, Mozilla has since expanded support for Firefox’s headless mode from Linux to its Windows and macOS builds, and fixed a number of bugs that might have stood in the way of early adopters.

Continue reading

Running Selenium with Headless Chrome

UPDATE: This article is updated regularly to reflect the latest information and versions. If you’re looking for instructions, skip ahead to Setup Instructions.

NOTE: Be sure to check out Running Selenium with Headless Chrome in Ruby if you’re interested in using Selenium in Ruby instead of Python.

Background

It has long been rumored that Google uses a headless variant of Chrome for their web crawls. Over the last two years or so it started looking more and more like this functionality would eventually make it into the public releases and, as of this week, that has finally happened.

Continue reading

Why Python's for-else Clause Makes Perfect Sense, but You Still Shouldn't Use It

An interesting (and somewhat obscure) feature of Python is being able to attach an else block to a loop. The basic idea is that the code in the else block runs only if the loop completes without encountering a break statement. Here’s a trivial example in the form of a password guessing game:

    for i in range(3):
        password = input('Enter password: ')
        if password == 'secret':
            print('You guessed the password!
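The loop-else behavior described in this excerpt can be sketched without interactive input. The following is a minimal illustration, not the article's own code: the function name `check_guesses`, its parameters, and the messages are hypothetical, and the guesses are passed in as a list instead of being read with input().

```python
def check_guesses(guesses, secret="secret", max_attempts=3):
    """Return a message saying whether one of the first few guesses matched.

    Hypothetical helper for illustration; names and messages are assumptions.
    """
    # zip() stops after max_attempts guesses, or earlier if guesses runs out.
    for attempt, guess in zip(range(max_attempts), guesses):
        if guess == secret:
            message = f"Guessed the password on attempt {attempt + 1}."
            break  # Leaving via break skips the else block below.
    else:
        # Runs only when the loop finished without hitting break,
        # including the case where there were no guesses at all.
        message = "No correct guess; access denied."
    return message


print(check_guesses(["hunter2", "secret"]))
print(check_guesses(["a", "b", "c"]))
```

The else block is tied to the loop, not to the if statement inside it, which is the part that tends to surprise readers.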

Continue reading

Using Firefox WebExtensions with Selenium

The WebExtensions API

In 2015, Mozilla announced that they would be deprecating XPCOM- and XUL-based add-ons in favor of their new WebExtensions API, which is based on the Google Chrome Extension API. There were some vocal critics of this shift because it meant that some existing add-ons would be discontinued, but this was tremendously positive news for add-on and extension developers. Writing cross-browser extensions had previously been an absolutely miserable experience, and many developers understandably chose to only target Chrome due to its market share and relatively pleasant API.

Continue reading

Using Google Chrome Extensions with Selenium

Running Google Chrome with an extension installed is quite simple because Chrome supports a --load-extension command-line argument for exactly this purpose. This can be specified before launching Chrome with Selenium by creating a ChromeOptions instance and calling add_argument().

    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException

    # Configure the necessary command-line option.
    options = webdriver.ChromeOptions()
    options.add_argument('--load-extension=path/to/the/extension')

    # Initialize the driver with the appropriate options.
    driver = webdriver.Chrome(options=options)

The above code will set up a Selenium driver for Chrome with the extension located at path/to/the/extension preinstalled.

Continue reading

How to Clear the Firefox Browser Cache With Selenium WebDriver/geckodriver

If you use Selenium for automated testing or web scraping, you may have discovered that there is no built-in utility for clearing browser resources like cookies, cached scripts, and objects in local storage. This is not particularly surprising given that the WebDriver specification that Selenium uses behind the scenes has no provision for clearing the cache. However, lingering cached resources can cause your tests to pass when they shouldn’t, prevent your scrapers from quickly starting clean sessions on demand, and cause all sorts of undesirable behavior besides.

Continue reading

How to Run a Keras Model in the Browser with Keras.js

This article explains how to export a pre-trained Keras model written in Python and use it in the browser with Keras.js. The main difficulty lies in choosing compatible versions of the packages involved and preparing the data, so I’ve prepared a fully worked-out example that goes from training the model to performing a prediction in the browser. You can find the working end result in Intoli’s article materials repository, but do read on if you’d like just the highlights.

Continue reading

Resizing Matplotlib Legend Markers

How to Resize Matplotlib Legend Markers

I frequently find myself plotting clusters of points in Matplotlib with relatively small marker sizes. This is a useful way to visualize the data, but the plot’s legend will use the same marker sizes by default and it can be quite difficult to discern the color of a single point in isolation. Let’s plot a few random clusters of points to see what this problem looks like in practice.

Continue reading

How to Clear the Chrome Browser Cache With Selenium WebDriver/ChromeDriver

Sometimes during the course of testing or web scraping with Google Chrome, you might want to clear the browser cache and cookies with Selenium. You can of course call driver.close() on your current ChromeDriver instance and then provision a new one. The fresh instance of Chrome will start with a clean browser history, cookies, and cache. There are, however, times when this method loses other state that you may want to preserve.

Continue reading