Navigator

The window.navigator property is one of the most widely used sources of data for browser fingerprinting. It implements the Navigator interface and often exposes over 40 different properties, which convey detailed information about the host's operating system and hardware, the browser type and version, various plugins and settings, the browser's capabilities, and, if the user grants permission, even the physical location of the machine running the browser. All of these can be gathered as part of the fingerprinting process, and even minor inconsistencies between them can trigger bot-detection systems.

In this section, we'll explore some of the most commonly collected pieces of information from window.navigator. There are too many window.navigator properties to cover in detail here, so we'll focus on examples of how inconsistent navigator properties are detected using simplified code from the popular MIT-licensed fingerprintjs2 browser fingerprinting library. This isn't the most sophisticated library out there by a long shot, but it's actually widely used in production.

This particular library was chosen because the simplicity of its techniques makes it a good introduction to why consistent window.navigator fingerprints matter. More advanced bot-detection systems consider the entire browser fingerprint at once and trigger on any statistical anomaly. Their fingerprinting scripts are also typically closed-source and heavily obfuscated when you encounter them in the wild.

Automation

The navigator.webdriver property is less about inconsistent fingerprints and more of a dead giveaway that a browser is being automated. The property gets its name from the WebDriver protocol behind Selenium, but it is set to true by default whether you're using Selenium, Puppeteer, or most other browser automation frameworks. It is the most direct indication that a browser is being controlled by a bot, and it's checked by virtually all bot-mitigation and fingerprinting libraries.
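Checking the flag takes only a line or two of JavaScript. The sketch below is a hypothetical helper (the name isAutomated and its nav parameter are illustrative, not taken from any library); it accepts a navigator-like object so that it can also be tried outside a browser, and falls back to the real navigator when called without arguments.

```javascript
// Hypothetical helper, not from any library: reports whether a
// navigator-like object advertises automation via navigator.webdriver.
function isAutomated(nav = globalThis.navigator) {
  // Older browsers may not define the property at all, so anything other
  // than an explicit `true` is treated as "not automated".
  return Boolean(nav) && nav.webdriver === true;
}

// In a page, isAutomated() inspects the real navigator object;
// elsewhere, pass a stand-in object instead.
console.log(isAutomated({ webdriver: true })); // → true
```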

Browser

There are a variety of navigator properties that make details about the browser version accessible to any web page running JavaScript. The best known is probably navigator.userAgent, which normally holds the same value as the User-Agent header that the browser sends with each request to identify itself. Here's an example of what a common user agent string might look like.

Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0

This instantly reveals that the browser claims to be Firefox 63 running on 64-bit Windows 10. Changing the user agent is typically one of the first things people do when web scraping, but it will be obvious that you've done so if you don't also update the other navigator properties that are expected to be consistent with it.
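To make that inference concrete, here's a minimal sketch of pulling those details out of a user agent string with regular expressions. The parseUserAgent() helper is hypothetical and handles only the common desktop patterns discussed in this section; real user agent parsing has many more edge cases.

```javascript
// Hypothetical sketch: extract browser, version, OS, and bitness from a
// user agent string. Only common desktop patterns are handled.
function parseUserAgent(ua) {
  const firefox = ua.match(/Firefox\/([\d.]+)/);
  const chrome = ua.match(/Chrome\/([\d.]+)/);
  const windows = ua.match(/Windows NT ([\d.]+)/);

  let browser = 'Other';
  let version = null;
  if (firefox) {
    browser = 'Firefox';
    version = firefox[1];
  } else if (chrome) {
    browser = 'Chrome';
    version = chrome[1];
  }

  let os = 'Other';
  if (windows) {
    os = `Windows NT ${windows[1]}`;
  } else if (ua.includes('Mac OS X')) {
    os = 'macOS';
  } else if (ua.includes('Linux')) {
    os = 'Linux';
  }

  // Win64/x64, x86_64, and WOW64 (a 32-bit browser on 64-bit Windows)
  // all point to a 64-bit operating system.
  const is64Bit = /Win64|x64|x86_64|WOW64/.test(ua);

  return { browser, version, os, is64Bit };
}

const sampleUserAgent =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0';
console.log(parseUserAgent(sampleUserAgent));
// → { browser: 'Firefox', version: '63.0', os: 'Windows NT 10.0', is64Bit: true }
```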

Say that you were using the above user agent to present yourself as Firefox while actually running headless Chrome. Here are a couple of the main navigator properties that would be inconsistent with the user agent.

  • navigator.productSub - Historically indicated the browser's build number, but is now a fixed value: Chrome would incorrectly return 20030107 instead of Firefox's 20100101.
  • navigator.vendor - Indicates the company that produces the browser: Chrome would incorrectly return Google Inc. instead of the empty string that Firefox returns.
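One way to picture this kind of check is as a small table of the values each browser is expected to report. The sketch below is a hypothetical helper, not fingerprintjs2 code, and it only knows about the two properties and two browsers listed above.

```javascript
// Values that genuine Chrome and Firefox builds report for these properties.
const EXPECTED_NAVIGATOR_VALUES = {
  Chrome: { productSub: '20030107', vendor: 'Google Inc.' },
  Firefox: { productSub: '20100101', vendor: '' },
};

// Hypothetical helper: list the properties of a navigator-like object that
// contradict the browser claimed by the user agent.
function findMismatches(claimedBrowser, nav = globalThis.navigator) {
  const expected = EXPECTED_NAVIGATOR_VALUES[claimedBrowser];
  if (!expected) {
    return []; // This sketch has no reference values for other browsers.
  }
  return Object.keys(expected).filter((key) => nav[key] !== expected[key]);
}

// Headless Chrome that only had its user agent changed to Firefox:
const headlessChrome = { productSub: '20030107', vendor: 'Google Inc.' };
console.log(findMismatches('Firefox', headlessChrome)); // → [ 'productSub', 'vendor' ]
```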

There's a method in fingerprintjs2 called getHasLiedBrowser() which checks navigator.userAgent and navigator.productSub for consistency. A stripped-down version of the code might look something like this.

var getHasLiedBrowser = function () {
  var userAgent = navigator.userAgent.toLowerCase();
  var productSub = navigator.productSub;

  // Work out which browser the user agent claims to be.
  var browser;
  if (userAgent.indexOf('firefox') >= 0) {
    browser = 'Firefox';
  } else if (userAgent.indexOf('chrome') >= 0) {
    browser = 'Chrome';
  } else {
    browser = 'Other';
  }

  // A genuine Chrome always reports 20030107.
  if (browser === 'Chrome' && productSub !== '20030107') {
    return true;
  }

  // A genuine Firefox reports 20100101, so 20030107 means the UA was changed.
  if (browser === 'Firefox' && productSub === '20030107') {
    return true;
  }

  return false;
};

This first checks the user agent to determine whether the browser claims to be Chrome, Firefox, or something else. It then concludes that the browser is lying about its user agent if it claims to be either 1) Chrome with a navigator.productSub value other than 20030107, or 2) Firefox with a navigator.productSub value equal to 20030107. The full getHasLiedBrowser() method checks things in much more detail, but even this simplified version shows how something as simple as changing the user agent can make you more likely to be detected as a bot if you don't also have a thorough set of browser overrides in place.

Languages

There are two closely related navigator properties which indicate information about a user's language preferences. These are navigator.language which indicates a user's preferred language, and navigator.languages which indicates an array of languages sorted by decreasing user preference. The first entry in navigator.languages should match the value of navigator.language, and the getHasLiedLanguages() method of fingerprintjs2 checks exactly that.

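var getHasLiedLanguages = function () {
  // navigator.languages isn't available in every browser (older versions of
  // Internet Explorer lack it), so there is nothing to compare in that case.
  if (!navigator.languages || navigator.languages.length === 0) {
    return false;
  }
  // We check if navigator.language is equal to the first language of navigator.languages
  var firstLanguage = navigator.languages[0];
  if (firstLanguage !== navigator.language) {
    return true;
  }
  return false;
};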

This code simply compares the first entry in the navigator.languages array to the value of navigator.language, and concludes that the browser is "lying" if they don't match. For example, a navigator.languages array of ['en-US', 'zh-CN'] would be considered consistent with a navigator.language value of 'en-US', but not with one of 'zh-CN'.
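Because these values are checked against each other, it's simplest to derive them all from one ordered list of locales. The buildLanguageProfile() helper below is a hypothetical sketch. It also builds an Accept-Language header, assuming the q-value scheme Chrome typically uses (an implicit 1.0 for the first entry, then 0.9, 0.8, and so on), which you should verify against the browser you're imitating.

```javascript
// Hypothetical helper: derive navigator.language, navigator.languages, and a
// matching Accept-Language header from one ordered list of locales.
function buildLanguageProfile(locales) {
  if (locales.length === 0) {
    throw new Error('at least one locale is required');
  }
  // Assumed weighting: no q-value on the first entry, then 0.9, 0.8, ...
  // with a floor of 0.1.
  const acceptLanguage = locales
    .map((locale, i) => {
      if (i === 0) return locale;
      const q = Math.max(0.1, 1 - i / 10);
      return `${locale};q=${q.toFixed(1)}`;
    })
    .join(',');
  return {
    language: locales[0], // always equal to languages[0]
    languages: [...locales],
    acceptLanguage,
  };
}

console.log(buildLanguageProfile(['en-US', 'en', 'zh-CN']).acceptLanguage);
// → en-US,en;q=0.9,zh-CN;q=0.8
```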

It might not be immediately obvious why you would want to change the language preferences at all, but it's often important to specify a language that's consistent with both the content you're scraping and the proxy IP addresses you're using. Several bot-mitigation services even use the navigator.geolocation API (which requires the user's permission) to find the location of a device, and then correlate it with both the language settings and the IP address. If any of these seem out of the ordinary, the request may be treated as suspicious and potentially blocked.
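A crude version of that correlation might check whether any of the preferred languages carries a region subtag matching the country an IP address maps to. This is a hypothetical heuristic for illustration only: the ipCountry argument is assumed to be an ISO 3166-1 alpha-2 code from some IP geolocation lookup, which isn't shown, and real services weigh many signals statistically rather than applying a single rule like this.

```javascript
// Hypothetical heuristic: do any of the preferred languages carry a region
// subtag matching the IP address's country? ipCountry is assumed to be an
// ISO 3166-1 alpha-2 code (e.g. "CN") from a lookup that isn't shown here.
function languagesMatchCountry(languages, ipCountry) {
  const country = ipCountry.toUpperCase();
  return languages.some((tag) => {
    // Skip the primary language subtag ("zh" in "zh-CN") and treat any later
    // two-letter subtag as the region. Tags without one, like "de", never match.
    const subtags = tag.split('-').slice(1);
    return subtags.some((subtag) => subtag.length === 2 && subtag.toUpperCase() === country);
  });
}

console.log(languagesMatchCountry(['en-US', 'zh-CN'], 'CN')); // → true
console.log(languagesMatchCountry(['de'], 'DE')); // → false
```

Under this rule, ['en-US', 'zh-CN'] looks plausible for an IP address in China, while a bare ['de'] matches no country at all, which is one reason a single rule like this would be far too blunt on its own.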

Operating System

Web scraping infrastructure most commonly runs on Linux, but the vast majority of desktop users run either Windows or macOS. In practice, it's important to use browser fingerprints from all three operating systems in order to create a realistic mix of traffic. Changing navigator.userAgent is the first step in this direction, because the operating system can be inferred from it, but it's also important that the other navigator properties paint a consistent picture.

After navigator.userAgent, the navigator property most relevant to a browser's host operating system is navigator.platform. This property returns values like Linux x86_64, MacIntel, and Win32, which can reveal information about both the operating system and the CPU. In Firefox, similar information can be retrieved from the non-standard navigator.oscpu property, which encodes the operating system and CPU in slightly differently formatted strings like Windows NT 10.0; Win64; x64, Linux x86_64, and Intel Mac OS X 10.9.
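The basic idea behind an OS consistency check can be sketched without any library: reduce each of these properties to a coarse OS family and flag the browser if they disagree. The osFamily() and osPropertiesDisagree() helpers below are hypothetical and are not how fingerprintjs2 implements its own check.

```javascript
// Hypothetical consistency check (not the fingerprintjs2 implementation):
// reduce each OS-related string to a coarse family and flag disagreements.
function osFamily(value) {
  if (typeof value !== 'string' || value === '') {
    return null; // e.g. navigator.oscpu is undefined outside Firefox
  }
  if (/Win/.test(value)) return 'Windows';
  if (/Mac/.test(value)) return 'macOS';
  if (/Linux|X11/.test(value)) return 'Linux';
  return 'Other';
}

function osPropertiesDisagree(nav = globalThis.navigator) {
  const families = [nav.userAgent, nav.platform, nav.oscpu]
    .map(osFamily)
    .filter((family) => family !== null);
  // More than one distinct family means the properties tell different stories.
  return new Set(families).size > 1;
}

// Headless Chrome on Linux with only its user agent changed to Windows Firefox:
console.log(osPropertiesDisagree({
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0',
  platform: 'Linux x86_64',
})); // → true
```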

The fingerprintjs2 library does have a getHasLiedOs() function analogous to the ones we've looked at above, but its implementation is long and difficult to simplify. Instead, let's look at just some of the properties we would need to modify in order to create a realistic Windows fingerprint.

First, we need to change navigator.userAgent, because it reveals the browser, the browser version, the operating system, and whether the operating system is 64-bit or 32-bit. Here's the same user agent we looked at earlier, which corresponds to Firefox 63.0 on a 64-bit version of Windows 10.

Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0

We saw earlier that appearing as a Firefox build requires navigator.productSub to be 20100101 and navigator.vendor to be an empty string. To match the Windows and CPU information, we'll also need to set navigator.oscpu to Windows NT 10.0; Win64; x64 and navigator.platform to Win32. Note that Win32 isn't a typo: it's the navigator.platform value that Firefox reports even on 64-bit Windows.
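Putting those values together, here's a sketch of how such overrides might be applied. The profile and the applyNavigatorOverrides() helper are hypothetical; the helper redefines each property as a getter on a prototype object, which in a browser would be Navigator.prototype. It covers only the properties discussed in this section and makes no attempt to hide its own traces (the replaced getters, for example, no longer look like native code), so treat it as an illustration rather than a working evasion setup.

```javascript
// Hypothetical override profile for Firefox 63 on 64-bit Windows 10,
// using the values discussed above.
const firefoxWindowsProfile = {
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0',
  productSub: '20100101',
  vendor: '',
  oscpu: 'Windows NT 10.0; Win64; x64',
  platform: 'Win32',
};

// Redefine each property as a getter on the given prototype, so every object
// inheriting from it reports the new value. Browsers define these attributes
// as configurable and enumerable, so the same flags are used here.
function applyNavigatorOverrides(proto, overrides) {
  for (const [name, value] of Object.entries(overrides)) {
    Object.defineProperty(proto, name, {
      get: () => value,
      configurable: true,
      enumerable: true,
    });
  }
}
```

In an automated browser, the overrides have to run before any page script does; with Puppeteer, for example, that means passing the function and profile to page.evaluateOnNewDocument(). The User-Agent request header has to be changed too, for instance with Puppeteer's page.setUserAgent().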