WebExtensions are a frequently underappreciated tool for the purposes of web scraping and browser automation. They provide an easy way to access an extremely powerful API that’s cross browser compatible out of the box, and that API provides functionality that extends far beyond that of more specialized automation APIs like the Chrome DevTools Protocol or Firefox’s Marionnette. For example, the WebExtensions API provides a mechanism for containerizing individual tabs–Selenium and Puppeteer can’t do that!
Intoli Smart Proxies
Want to use the smartest web scraping proxies available?
Get started now and find out why Intoli is the best in the business!
Browser automation frameworks–like Puppeteer, Selenium, Marionette, and Nightmare.js–strive to provide rich APIs for configuring and interacting with web browsers. These generally work quite well, but you’re inevitably going to end up running into API limitations if you do a lot of testing or web scraping. You might find yourself wanting to conceal the fact that you’re using a headless browser, extract image resources from a web page, set the seed for Math.
Update: This article is regularly updated in order to accurately reflect improvements in Firefox’s headless browsing capabilities. Note: Check out Running Selenium with Healdess Chrome if you’d rather use Google’s browser. Using Selenium with Headless Firefox (on Windows) Ever since Chrome implemented headless browsing support back in April, the other major browsers started following suit. In particular, Mozilla has since then expanded support for Firefox’s headless mode from Linux to its Windows and macOS builds, and fixed a number of bugs that might have been in the way of early adopters.
The WebExtensions API In 2015, Mozilla announced that they would be deprecating XPCOM and XUL based addons in favor of their new WebExtensions API based on the Google Chrome Extension API. There were some vocal critics of this shift because it meant that some existing add-ons would be discontinued, but this was tremendously positive news for add-on and extension developers. Writing cross-browser extensions had previously been an absolutely miserable experience, and many developers understandably chose to only target Chrome due to its market share and relatively pleasant API.
If you use Selenium for automated testing or web scraping, you may have discovered that there is no built-in utility for clearing browser resources like cookies, cached scripts, and objects in local storage. This is not particularly surprising given that the WebDriver specification that Selenium uses behind the scenes has no provision for clearing the cache. However, lingering cached resources can cause your tests to pass when they shouldn’t, prevent your scrapers from quickly starting clean sessions on demand, and cause all sorts of undesirable behavior besides.