Running Selenium with Headless Firefox

By Andre Perunicic | June 22, 2017

Using Selenium with Headless Firefox (on Windows)

Selenium uses the WebDriver protocol to remotely control browsers. Chrome, Firefox, Safari, and other major browsers already work with this API, and support for headless browsing has been improving ever since Google implemented it in Chrome back in April. As of today, Firefox has expanded its headless mode from Linux to its nightly Windows builds. macOS support is lagging behind a bit, but should be coming soon too.

This is obviously pretty cool from the automated testing and web scraping perspectives, so in this article I will describe how to connect Selenium WebDriver to Firefox’s new headless mode. I will primarily explain things for users of Windows, but you should be able to follow along on other operating systems with some minor modifications.

Setup

Let’s get started by installing all the requirements. First, download and install Firefox Nightly from Mozilla’s website. You will also need geckodriver, the layer used for connecting Selenium and Firefox, which has download links included on its GitHub releases page. Once downloaded, extract the package and place it somewhere in your Path. For example, if you place geckodriver.exe into C:\bin\ you can ensure it is in your user’s Path by running

[Environment]::SetEnvironmentVariable("Path", "$env:Path;C:\bin\", "User")

from PowerShell. While you’re at it, make sure that Python is also in your Path:

[Environment]::SetEnvironmentVariable("Path", "$env:Path;C:\Python27\;C:\Python27\Scripts\", "User")

Alternatively, you can perform these steps through the GUI by searching for “Path” from the start menu and navigating through the “Edit environment variables for your account” settings panel.
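If you want to double check the result, the following is a minimal sketch that looks up the executables on the Path from Python. It assumes Python 3.3 or newer (for shutil.which), and the helper name find_missing_tools and the list of tool names are my own choices for illustration. Changes to the user Path only show up in newly started processes, so run it from a fresh prompt.

```python
import shutil

# Executables this tutorial expects to find on the Path.
REQUIRED_TOOLS = ["geckodriver", "python", "pip"]


def find_missing_tools(tools):
    """Return the subset of `tools` that cannot be located on the Path."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not on Path: " + ", ".join(missing))
    else:
        print("All required tools are on Path.")
```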

The required binaries should now be visible, so start a command prompt with cmd and install virtualenv:

pip install virtualenv

Then, create a new project and install selenium:

mkdir selenium-firefox
cd selenium-firefox

virtualenv env
env\Scripts\activate
pip install selenium
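To confirm that the package landed in the active virtualenv rather than somewhere else, a quick check along these lines may help. This is only a sketch: is_installed is a hypothetical helper name, and it assumes a Python 3 interpreter where importlib.util is available.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the top-level package can be imported by this interpreter."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    print("selenium installed:", is_installed("selenium"))
```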

The final step before actually using selenium for driving headless Firefox is to start geckodriver in the background. Open a new cmd and run

geckodriver
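A simple way to see whether geckodriver actually came up is to try a TCP connection to it. The sketch below assumes geckodriver was started without extra flags, in which case it listens on 127.0.0.1 port 4444; the function name is_listening is hypothetical, and you would pass a different port if you started geckodriver with one.

```python
import socket


def is_listening(host="127.0.0.1", port=4444, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    print("geckodriver reachable:", is_listening())
```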

Connecting Selenium to Headless Firefox

Before we perform any tests, let’s make sure we can connect to headless Firefox in the first place. Create a new script called test-intoli.py. The first task is to instruct selenium to use the correct Firefox binary in headless mode. Since we are using the nightly build, that can be done as follows:

import sys

from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

# Point Selenium at the Nightly build and send Firefox's log output to stdout.
binary = FirefoxBinary('C:\\Program Files\\Nightly\\firefox.exe', log_file=sys.stdout)
driver = webdriver.Firefox(firefox_binary=binary)

To make sure this works, let’s visit the “Email Spy: A New Open Source Browser Extension for Lead Generation” article on this website and pull out the text from its heading:

# Visit a website.
driver.get("https://intoli.com/blog/email-spy/")

# Grab the heading element.
heading_element = driver.find_element_by_xpath('//*[@id="heading-breadcrumbs"]')

This gives us the element, so let’s read its textContent property and strip the surrounding whitespace to get the text:

if heading_element:
    print(heading_element.get_property('textContent').strip())
else:
    print("Heading element not found!")

Before running the script, add a line that closes the connection:

driver.close()

Finally, test out the connection by running the script. Typically we would run Firefox in headless mode through something like

binary.add_command_line_options('-headless')

but that still seems to have no effect. Instead, we need to set the MOZ_HEADLESS environment variable before executing the Python script.

set MOZ_HEADLESS=1
python test-intoli.py

After a while we’ll see that the script ran successfully:

Email Spy: A new open source browser extension for lead generation
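If you would rather not depend on the shell, the variable can probably be set from inside the script instead, since child processes inherit the environment of the Python process. This is a sketch under that assumption: enable_headless is a hypothetical helper, and it has to run before FirefoxBinary and webdriver.Firefox are created so that the browser process sees the variable.

```python
import os


def enable_headless(environ=None):
    """Set MOZ_HEADLESS=1 in the given mapping (os.environ by default)."""
    if environ is None:
        environ = os.environ
    environ["MOZ_HEADLESS"] = "1"
    return environ


if __name__ == "__main__":
    enable_headless()
    # Create FirefoxBinary and webdriver.Firefox only after this point.
    print("MOZ_HEADLESS =", os.environ["MOZ_HEADLESS"])
```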

One curiosity is that omitting the log_file parameter in FirefoxBinary can get Firefox stuck! If you have floating Firefox processes, you can kill them all easily with

taskkill /im firefox.exe /f
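Leftover processes often come from scripts that crash before they reach the cleanup line. One way to reduce that risk is to wrap the driver in a context manager that always shuts it down. The sketch below is an assumption-laden illustration rather than part of the original script: managed_driver is a hypothetical name, and it calls the driver's quit() method, which ends the whole WebDriver session.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(create_driver):
    """Create a driver, hand it to the caller, and always quit it afterwards.

    `create_driver` is any zero-argument callable that returns a driver, e.g.
    lambda: webdriver.Firefox(firefox_binary=binary)
    """
    driver = create_driver()
    try:
        yield driver
    finally:
        # Runs even if the body of the with-block raised an exception.
        driver.quit()
```

It would be used as "with managed_driver(...) as driver:" around the calls to driver.get and friends.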

Driving a Unit Test with Selenium and Headless Firefox

Let’s round out the tutorial by creating a non-trivial unit test for this very site. In particular, we will test the mailing list subscription box that is shown to first-time readers of our blog.

Mailing list signup

We try to be respectful towards our readers by only displaying this advertisement once. That is, if you sign up for the mailing list or dismiss the invitation box, you shouldn’t see it again on any other post. To make sure that all of this works as expected, our test will execute the following steps:

  1. Scroll far enough down a blog post for the subscription box to show up and try to grab it from the page.
  2. Visit a different article and make sure that the ad is not displayed at all this time.

We’ll be making use of the standard library’s unittest module to actually implement these steps. Start by creating a placeholder unittest.TestCase:

import sys
import unittest

from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

class MailingListTest(unittest.TestCase):
    def setUp(self):
        binary = FirefoxBinary('C:\\Program Files\\Nightly\\firefox.exe', log_file=sys.stdout)
        self.driver = webdriver.Firefox(firefox_binary=binary)

    def test_two_visits(self):
        self.fail("There is nothing here!")

    def tearDown(self):
        self.driver.close()

if __name__ == '__main__':
    unittest.main()

You can run this test with the following command at any time, thanks to the last two lines in the file.

python test-intoli.py

The setUp and tearDown methods are self-explanatory, and deal with managing the connection to the browser. The meat should be within the test_two_visits method so let’s build it up. All the code below should live within that method. Start by giving self.driver a convenient local reference.

driver = self.driver

Then, clear cookies to ensure a clean start and then head over to the Email Spy post

driver.delete_all_cookies()
driver.get("https://intoli.com/blog/email-spy/")

and scroll down 80% of body height using JavaScript:

driver.execute_script("window.scrollTo(0, document.body.scrollHeight*0.8);")

At this point we look for our element, retrying for at most 10 seconds. This is done by instructing the driver to “implicitly wait.” If the element is not found within 10 seconds, the driver raises an exception and the test fails.

driver.implicitly_wait(10)
try:
    driver.find_element_by_id("PopupSignupForm_0")
except:
    self.fail("Could not find element for 10s the first time. :(")
else:
    print("Found element the first time! :)")
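Conceptually, the wait behaves like the retry loop sketched below. This is only an illustration of the idea and not Selenium's implementation, since the real polling happens on the browser side. The name poll_until_found and the interval parameter are hypothetical, and time.monotonic assumes Python 3.3 or newer.

```python
import time


def poll_until_found(find, timeout=10.0, interval=0.5):
    """Call `find` repeatedly until it stops raising or `timeout` seconds pass.

    `find` is any zero-argument callable that raises when the element is
    missing, e.g. lambda: driver.find_element_by_id("PopupSignupForm_0").
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return find()
        except Exception:
            if time.monotonic() >= deadline:
                raise  # Give up and surface the last lookup error.
            time.sleep(interval)
```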

Running the test at this point should show that we grabbed the element successfully. Next we visit a different post (you can actually visit the same one again) and only fail when there is no exception from find_element_by_id:

driver.get("https://intoli.com/blog/running-selenium-with-headless-chrome/")
driver.execute_script("window.scrollTo(0, document.body.scrollHeight*0.8);")
try:
    driver.find_element_by_id("PopupSignupForm_0")
except:
    print("Did *not* find element the second time! :)")
else:
    self.fail("Found element the second time! :(")
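A bare except also swallows unrelated problems, such as a lost connection to the browser, and reports them as success here. A possible alternative is to ask for all matching elements, which yields an empty list instead of raising when nothing matches. The helper below is a hedged sketch: element_is_absent is a hypothetical name, and it assumes the driver exposes find_elements(by, value) as Selenium's WebDriver does. The lookup is still subject to the implicit wait, so it does not make the test faster.

```python
def element_is_absent(driver, element_id):
    """Return True if no element with the given id is currently in the page."""
    # "id" is the locator strategy string that Selenium exposes as By.ID.
    matches = driver.find_elements("id", element_id)
    return len(matches) == 0
```

Inside the test method this could replace the try/except with self.assertTrue(element_is_absent(driver, "PopupSignupForm_0")).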

Running the test with python test-intoli.py finally produces the desired result:

> python test-intoli.py
INFO:MailingListTest.test_two_visits:Found element the first time! :)
INFO:MailingListTest.test_two_visits:Did *not* find element the second time! :)
.
----------------------------------------------------------------------
Ran 1 test in 26.063s

OK

Check out the complete example script.

This short guide on getting started with Selenium and headless Firefox just scratches the surface of what’s possible. Stay tuned for other cool content from our blog, and if you need us to do advanced website testing or headless scraping, don’t hesitate to get in touch.