How To

How to Create a Public Slack Community with Open Invites

We recently created a public Slack community dedicated to web scraping in order to provide a general forum for people to discuss topics related to browser automation, headless browsers, scraping frameworks, data pipelining, or anything else along those lines. We wanted it to be open to anyone who wanted to join, but Slack unfortunately doesn’t offer open-access communities or channels. If you want to make your Slack community open to anybody, then your options are to either send invitations to anyone who expresses interest, or to generate shared invite URLs that expire after four weeks.

Continue reading

Implementing a Custom Waiting Action in Nightmare JS

Nightmare is a popular browser automation library specifically designed with ease of use in mind. A typical Nightmare script chains together semantically named user actions like goto and click to perform any given task, resulting in simple and readable code. These actions of course include a few methods for waiting on the page to fully load: you can wait for a selector to become available, for all static resources to load, or simply wait for a fixed amount of time.

Continue reading

Using Firefox WebExtensions with Selenium

The WebExtensions API

In 2015, Mozilla announced that they would be deprecating XPCOM- and XUL-based add-ons in favor of their new WebExtensions API, which is based on the Google Chrome Extension API. There were some vocal critics of this shift because it meant that some existing add-ons would be discontinued, but this was tremendously positive news for add-on and extension developers. Writing cross-browser extensions had previously been an absolutely miserable experience, and many developers understandably chose to only target Chrome due to its market share and relatively pleasant API.

Continue reading

Using Google Chrome Extensions with Selenium

Running Google Chrome with an extension installed is quite simple because Chrome supports a --load-extension command-line argument for exactly this purpose. This can be specified before launching Chrome with Selenium by creating a ChromeOptions instance and calling add_argument().

    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException

    # Configure the necessary command-line option.
    options = webdriver.ChromeOptions()
    options.add_argument('--load-extension=path/to/the/extension')

    # Initialize the driver with the appropriate options.
    # (Older Selenium releases spelled this keyword chrome_options.)
    driver = webdriver.Chrome(options=options)

The above code will set up a Selenium driver for Chrome with the extension located at path/to/the/extension preinstalled.
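As a small companion to the snippet above, here is a minimal sketch of a helper that builds the --load-extension argument from one or more directories. The function name load_extension_argument is hypothetical and not part of Selenium. The sketch assumes the extensions are unpacked directories and relies on Chrome accepting a comma-separated list of paths for this switch.

```python
import os


def load_extension_argument(*extension_dirs):
    """Build a --load-extension argument for unpacked extension directories.

    Hypothetical helper for illustration; it is not part of Selenium.
    """
    if not extension_dirs:
        raise ValueError('at least one extension directory is required')
    # Resolve each path so the argument does not depend on the working directory.
    absolute_dirs = [os.path.abspath(path) for path in extension_dirs]
    # Assumption: Chrome takes multiple extension paths as a comma-separated list.
    return '--load-extension=' + ','.join(absolute_dirs)


if __name__ == '__main__':
    # Prints the argument string that would be passed to options.add_argument().
    print(load_extension_argument('path/to/the/extension'))
```

The returned string can be passed straight to options.add_argument(), which keeps the path handling in one place if a script needs to load several extensions.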

Continue reading

Running Selenium with Headless Chrome in Ruby

NOTE: Be sure to check out Running Selenium with Headless Chrome if you’re interested in using Selenium in Python instead of Ruby.

Since Google added support for running Chrome and Chromium in headless mode as of version 59, headless Chrome has become a popular choice for both testing and web scraping. There are a few Chrome-specific automation solutions out there, such as Puppeteer and Chrome Remote Interface, but Selenium remains a popular choice due to its uniform API across web browsers and its support for multiple programming languages.

Continue reading

Using Webpack to Render Markdown in React Apps

This article is a tutorial explaining how to set up your Webpack configuration for rendering and displaying Markdown documents in React components. Something like this could come in handy if you’re building a home-made static blog engine, or if you’re hoping to easily include some good-looking documentation in a frontend application. Since I tend to write a lot of code blocks in my Markdown documents, a good chunk of the tutorial will be focused on making them look good.

Continue reading

How to Clear the Firefox Browser Cache With Selenium WebDriver/geckodriver

If you use Selenium for automated testing or web scraping, you may have discovered that there is no built-in utility for clearing browser resources like cookies, cached scripts, and objects in local storage. This is not particularly surprising given that the WebDriver specification that Selenium uses behind the scenes has no provision for clearing the cache. However, lingering cached resources can cause your tests to pass when they shouldn’t, prevent your scrapers from quickly starting clean sessions on demand, and cause all sorts of undesirable behavior besides.

Continue reading

How to Run a Keras Model in the Browser with Keras.js

This article explains how to export a pre-trained Keras model written in Python and use it in the browser with Keras.js. The main difficulty lies in choosing compatible versions of the packages involved and preparing the data, so I’ve prepared a fully worked-out example that goes from training the model to performing a prediction in the browser. You can find the working end result in Intoli’s article materials repository, but do read on if you’d like just the highlights.

Continue reading

How to Exit When Errors Occur in Bash Scripts

It’s a common issue that scripts written and tested on GNU/Linux don’t run correctly on macOS, or vice versa, because of differences between the GNU and BSD versions of the core utilities. Error messages can get drowned in the script output, making it far from obvious that something isn’t executing correctly. There are a couple of easy fixes to avoid problems like this, but they rely on some bash features that you may not be familiar with if you don’t do a ton of scripting.

Continue reading

Resizing Matplotlib Legend Markers

How to Resize Matplotlib Legend Markers

I frequently find myself plotting clusters of points in Matplotlib with relatively small marker sizes. This is a useful way to visualize the data, but the plot’s legend will use the same marker sizes by default and it can be quite difficult to discern the color of a single point in isolation. Let’s plot a few random clusters of points to see what this problem looks like in practice.

Continue reading