How to Clear the Firefox Browser Cache With Selenium WebDriver/geckodriver

By Andre Perunicic | January 27, 2018

If you use Selenium for automated testing or web scraping, you may have discovered that there is no built-in utility for clearing browser resources like cookies, cached scripts, and objects in local storage. This is not particularly surprising given that the WebDriver specification that Selenium uses behind the scenes has no provision for clearing the cache. However, lingering cached resources can cause your tests to pass when they shouldn’t, prevent your scrapers from quickly starting clean sessions on demand, and cause all sorts of undesirable behavior besides. Fortunately, there’s still a way out! In this article I’ll describe how to clear the Firefox browser cache with Selenium. The code will be written in Python, but you should be able to adapt it to other languages without much difficulty.

If all you’re interested in is the end result, take a look at the finished utility in Intoli’s article code repository. The rest of the article will describe what this utility actually does. The technique is quite similar to the one used to clear the Chrome browser cache, also published on our blog, so head on over there if you prefer using Chrome with Selenium.

Clearing the Cache

Update: These instructions have been updated to work with Firefox 61, released on June 26, 2018. The cache clearing process is quite similar for older versions of Firefox, and the slightly different cache clearing script written for Firefox 57 is still available in our article materials repository.

We’ll clear the cache by emulating how a human would accomplish the task: by visiting Firefox’s preferences page and going through the appropriate UI interactions. The options we care about are in the Browser Privacy section of the Preferences page, which you can access by visiting about:preferences#privacy in Firefox. Let’s first review the steps that the cache clearing script will need to perform, and then describe how to actually go about implementing them with Selenium.

Our script first needs to click the “Clear Data…” button which is highlighted in red in the screenshot below.

Preferences Page on Firefox 61

This will open an in-page dialog listing the types of data to be cleared. Since both “Cookies and Site Data” and “Cached Web Content” are preselected in this dialog, all our script has to do here is click the “Clear” button.

Clear Data Dialog

Clicking the “Clear” button will launch an alert, which the script will have to accept in order to successfully clear the cache, cookies, and site data.

Clear Data Alert

Writing the Cache Clearing Script

Now that we know what needs to be done, let’s examine the page in detail and figure out how to execute each step. The Preferences page is coded in Mozilla’s XML-based XUL interface-building language. Fortunately, XUL is similar enough to HTML that both Selenium and JavaScript can interact with elements on the page. I’ll first show you the completed script, and then talk through it piece by piece:

from selenium.webdriver.common.alert import Alert
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


dialog_selector = '#dialogOverlay-0 > groupbox:nth-child(1) > browser:nth-child(2)'

accept_dialog_script = (
    f"const browser = document.querySelector('{dialog_selector}');" +
    "browser.contentDocument.documentElement.querySelector('#clearButton').click();"
)


def get_clear_site_data_button(driver):
    return driver.find_element_by_css_selector('#clearSiteDataButton')


def get_clear_site_data_dialog(driver):
    return driver.find_element_by_css_selector(dialog_selector)


def get_clear_site_data_confirmation_button(driver):
    return driver.find_element_by_css_selector('#clearButton')


def clear_firefox_cache(driver, timeout=10):
    driver.get('about:preferences#privacy')
    wait = WebDriverWait(driver, timeout)

    # Click the "Clear Data..." button under "Cookies and Site Data".
    wait.until(get_clear_site_data_button)
    get_clear_site_data_button(driver).click()

    # Accept the "Clear Data" dialog by clicking on the "Clear" button.
    wait.until(get_clear_site_data_dialog)
    driver.execute_script(accept_dialog_script)

    # Accept the confirmation alert.
    wait.until(EC.alert_is_present())
    alert = Alert(driver)
    alert.accept()

First up, let’s talk about the “Clear Data” button that we need to start with. While right-clicks are not available and we cannot examine the target buttons directly, we can pull up Firefox’s developer tools and locate the elements we need to interact with in the Inspector tab. Searching for “Clear Data” in the Inspector reveals that the first button we need to click is identified by id #clearSiteDataButton.

Clear Cache Button Source

Clicking a button in Selenium is pretty easy. In the script above, I use the get_clear_site_data_button() function to find the button on the page, wait for it to become available with wait.until(), and then click it with .click(). Using a dedicated function like this is convenient because it allows us to ensure that the target element is available before attempting to interact with it.
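
To make the waiting behavior concrete, here is a simplified model of what wait.until() does with a function like get_clear_site_data_button(): it calls the function with the driver, ignores "element not found" errors, and retries until the function returns something truthy or the timeout expires. This is only a sketch and not Selenium’s actual implementation; poll_until, ElementNotFound, and FakeDriver are hypothetical names used for illustration.

```python
import time


class ElementNotFound(Exception):
    """Hypothetical stand-in for Selenium's NoSuchElementException."""


def poll_until(driver, locate, timeout=10, poll_interval=0.5):
    """Call locate(driver) until it returns something truthy or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = locate(driver)
            if result:
                return result
        except ElementNotFound:
            pass  # The element is not on the page yet, so try again.
        if time.monotonic() >= deadline:
            raise TimeoutError(f'gave up after {timeout} seconds')
        time.sleep(poll_interval)


class FakeDriver:
    """Hypothetical driver whose button only appears on the third lookup."""

    def __init__(self):
        self.lookups = 0

    def find_button(self):
        self.lookups += 1
        if self.lookups < 3:
            raise ElementNotFound('#clearSiteDataButton')
        return 'button'


driver = FakeDriver()
button = poll_until(driver, FakeDriver.find_button, timeout=2, poll_interval=0.01)
```

Because the real wait.until() also returns whatever the function returned, the wait and the click could be combined as wait.until(get_clear_site_data_button).click() if you prefer.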

Next, we want to accept the resulting modal dialog. Click the “Clear Data” button to open the dialog, and then jump into the Inspector once again (I searched for “Clearing all cookies” from the dialog’s description to find it). You’ll see that the button has #clearButton as its id. However, the button clicking code I just described won’t work in this case. You can also verify this by evaluating

document.querySelector('#clearButton')

in the development tools’ console, and seeing that it results in null.

This happens because the modal dialog is nested inside an iframe-like XUL browser element. The fact that it’s not actually an iframe seems to prevent Selenium from interacting with it properly, but fortunately we can get to the button pretty easily from JavaScript. Although we could stuff more of the script into JavaScript, separating the steps lets us use waits consistently, and can help diagnose at which point the script fails if the interface changes in the future.

To write the script, right click on the browser element in the Inspector and get its selector from there:

Getting browser’s Selector

Then, navigate to the nested document (similarly to how it’d be done for an iframe), and click the button by evaluating the following bit of JavaScript with driver.execute_script().

const browser = document.querySelector(
  '#dialogOverlay-0 > groupbox:nth-child(1) > browser:nth-child(2)'
);
browser.contentDocument.documentElement.querySelector('#clearButton').click();
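
If the selector changes in a future Firefox release, it can be convenient to generate this JavaScript from Python rather than edit the string by hand. The following is a minimal sketch under that assumption; build_click_script is a hypothetical helper that is not part of the finished utility, and it uses json.dumps so that quotes inside a selector are escaped correctly.

```python
import json


def build_click_script(browser_selector, button_selector='#clearButton'):
    """Return JavaScript that clicks a button inside a nested XUL browser element."""
    # json.dumps produces a double-quoted, escaped JavaScript string literal.
    browser_literal = json.dumps(browser_selector)
    button_literal = json.dumps(button_selector)
    return (
        f'const browser = document.querySelector({browser_literal});'
        f'browser.contentDocument.documentElement.querySelector({button_literal}).click();'
    )


dialog_selector = '#dialogOverlay-0 > groupbox:nth-child(1) > browser:nth-child(2)'
accept_dialog_script = build_click_script(dialog_selector)
```

The resulting string can be passed to driver.execute_script() exactly like the hand-written accept_dialog_script in the completed script.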

Finally, we accept the alert using the standard Selenium utilities for the task.

Test-Driving the Utility

Let’s verify that this works as expected by visiting a site which uses a liberal amount of caching, and then clearing the cache via the method above. The script in this section visits overstock.com, goes to Firefox’s Browser Privacy page and lingers there for 5 seconds, then clears the cache and waits another 5 seconds before quitting.

Installing Selenium and geckodriver

To run this script, you need to have Selenium v3.14.0 or above installed, as well as the appropriate version of geckodriver. You can install geckodriver by downloading the binaries from its releases page or, if you’re on Linux, by using your distribution’s package manager.

To install Selenium, you can just use pip. I like to work in a virtualenv, which you can create and activate with

mkdir clear-firefox-cache
cd clear-firefox-cache
virtualenv env
. env/bin/activate

Install Selenium with pip install selenium in the virtualenv. If you’re working globally and already have Selenium installed, you can upgrade it with pip install -U --user selenium.
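
To confirm that the installed Selenium meets the v3.14.0 requirement, you could run a quick check like the one below. It is only a sketch: version_tuple and is_recent_enough are hypothetical helpers, and the comparison looks only at the numeric parts of the version string.

```python
MINIMUM_SELENIUM = '3.14.0'


def version_tuple(version):
    """Turn a version string such as '3.14.0' into a comparable tuple (3, 14, 0)."""
    return tuple(int(part) for part in version.split('.')[:3] if part.isdigit())


def is_recent_enough(installed, minimum=MINIMUM_SELENIUM):
    """Return True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    import selenium
    print(selenium.__version__, is_recent_enough(selenium.__version__))
except ImportError:
    print('Selenium is not installed in this environment')
```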

First, place the completed script from the last section into a file, say clear_cache.py. Then, create a script named evaluate-clear-cache.py with the following contents.

from time import sleep
from selenium import webdriver
from clear_cache import clear_firefox_cache

# Start a Firefox driver (make sure that geckodriver is on your PATH first)
driver = webdriver.Firefox()

# Visit a website that places data in local storage.
driver.get('https://overstock.com')

# Stay on the Privacy page for a few seconds to verify that the cache is not empty.
driver.get('about:preferences#privacy')
sleep(5)

# Clear the cache and hang around some more before quitting.
clear_firefox_cache(driver)
sleep(5)

driver.quit()

You can then run the script with

python evaluate-clear-cache.py

A Selenium-driven Firefox window will pop up, and once the script has cleared the cache you should be able to see in the interface that no cached resources remain.

Turning Off the Cache Completely

As a side note, it’s possible to completely turn off caching from the get-go by providing Selenium with a customized Firefox profile. You can get the list of relevant preferences by browsing about:config in the version of Firefox you intend to use. Then, you can manually construct a profile with the desired customizations.

from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
from selenium import webdriver

profile = FirefoxProfile()
profile.set_preference('browser.cache.disk.enable', False)
profile.set_preference('browser.cache.memory.enable', False)
profile.set_preference('browser.cache.offline.enable', False)
profile.set_preference('network.cookie.cookieBehavior', 2)

driver = webdriver.Firefox(firefox_profile=profile)

Using 2 for the value of network.cookie.cookieBehavior is equivalent to never accepting cookies.
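
If you maintain several such customizations, one option is to keep the preferences in a dictionary and apply them in a loop. The sketch below assumes only that the profile object exposes set_preference(), as FirefoxProfile does; apply_preferences and RecordingProfile are hypothetical names, and RecordingProfile is a stand-in used so the example runs without a browser.

```python
# The same preferences as above, gathered in one place.
CACHE_FREE_PREFERENCES = {
    'browser.cache.disk.enable': False,
    'browser.cache.memory.enable': False,
    'browser.cache.offline.enable': False,
    'network.cookie.cookieBehavior': 2,  # Never accept cookies.
}


def apply_preferences(profile, preferences):
    """Set every preference on the profile and return the profile."""
    for name, value in preferences.items():
        profile.set_preference(name, value)
    return profile


class RecordingProfile:
    """Hypothetical stand-in for FirefoxProfile that records what was set."""

    def __init__(self):
        self.preferences = {}

    def set_preference(self, name, value):
        self.preferences[name] = value


profile = apply_preferences(RecordingProfile(), CACHE_FREE_PREFERENCES)
```

With Selenium installed, the same call should work on a real profile, as in apply_preferences(FirefoxProfile(), CACHE_FREE_PREFERENCES).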

Conclusion

This post described how to clear the Firefox cache and cookies with Selenium. On the way, we learned how to accomplish a few other tasks: click buttons, accept alerts, and deal with XUL’s browser elements when Selenium’s driver.switch_to.frame() won’t do the trick. Here at Intoli, we specialize in browser automation and web scraping, and write about these topics on our blog. If you like our posts, consider subscribing to our mailing list.
