Core Concepts

If you're new to the Intoli Smart Proxy service, or to proxies in general, there are a few ideas worth familiarizing yourself with before you dig into the rest of the documentation. We recommend at least skimming this material so that you understand the basic concepts behind residential proxies, browser rendering, and scraping projects before moving on to the service's features and integrations.

Residential Proxies

A proxy is a service that allows you to make HTTP requests through another machine, so that the requests appear to come from a different IP address. Proxies are widely used for web scraping, ad verification, SEO monitoring, and price monitoring because they provide an easy way to create realistic traffic that is difficult to identify as automated. Without a proxy, every request sent from a single server can be traced back to that machine's IP address, which makes the traffic very easy to detect and block.
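
As a rough illustration of what "making requests through another machine" looks like in code, here's how a standard Python HTTP client can be pointed at a proxy using only the standard library's urllib.request.ProxyHandler. This is a minimal sketch: the host, port, username, and password are hypothetical placeholders rather than real Smart Proxy values, and it assumes the proxy accepts HTTP basic authentication embedded in the proxy URL.

```python
import urllib.parse
import urllib.request

# Hypothetical placeholders: replace with your proxy's real host and port.
# These are NOT actual Smart Proxy values.
PROXY_HOST = "proxy.example.com"
PROXY_PORT = 8080


def build_proxy_url(username: str, password: str, host: str, port: int) -> str:
    """Return a proxy URL with percent-encoded basic-auth credentials."""
    user = urllib.parse.quote(username, safe="")
    pw = urllib.parse.quote(password, safe="")
    return f"http://{user}:{pw}@{host}:{port}"


def make_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that routes both HTTP and HTTPS requests via the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_status(
    opener: urllib.request.OpenerDirector, url: str, timeout: float = 30.0
) -> int:
    """Fetch a URL through the opener and return the HTTP status code.

    Note that urllib raises HTTPError for 4xx/5xx responses instead of
    returning them.
    """
    with opener.open(url, timeout=timeout) as response:
        return response.status
```

With real credentials, calling fetch_status(make_proxied_opener(build_proxy_url("user", "pass", PROXY_HOST, PROXY_PORT)), "https://example.com") would send the request through the proxy, so the target site sees the proxy's IP address instead of yours.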

There are two types of proxies that are commonly used to avoid blocking and create realistic traffic patterns:

  • Datacenter Proxies - These proxies use IP addresses associated with servers in a datacenter, such as the IPs of servers from hosting providers like DigitalOcean or Amazon Web Services. The primary advantage of datacenter IP addresses is that they're inexpensive, but they're also very easy to detect and are commonly blocked. Each IP address is associated with the company that owns its IP block, and datacenter proxies often consist of contiguous IP subnets that are easy to identify. Datacenter IPs are appropriate for some use cases, but they run a high risk of being blacklisted and blocked by websites.
  • Residential Proxies - These proxies provide IP addresses from residential Internet Service Providers (ISPs). In residential proxy networks, real users allow their IP addresses to be used for web scraping, so the IP addresses are indistinguishable from those of ordinary users. Residential proxies tend to be more expensive because those users need to be compensated, but they're also the most effective type of proxy and the hardest to detect or block.

The Intoli Smart Proxy uses only the highest quality residential IP addresses in order to prevent blocking and cloaking.
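
To see why contiguous subnets make datacenter traffic easy to spot, consider the kind of simple heuristic a website could apply: group incoming requests by /24 network and flag networks that send an unusual volume of traffic. The sketch below is purely illustrative; the /24 prefix and the threshold are arbitrary assumptions and not a description of any particular service's detection logic.

```python
import ipaddress
from collections import Counter


def requests_per_subnet(ips: list[str], prefix: int = 24) -> Counter:
    """Count requests per IPv4 network of the given prefix length."""
    counts: Counter = Counter()
    for ip in ips:
        # strict=False derives the enclosing network from a host address.
        network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        counts[network] += 1
    return counts


def flag_busy_subnets(ips: list[str], threshold: int, prefix: int = 24) -> list:
    """Return the networks whose request count meets or exceeds the threshold."""
    return [
        network
        for network, count in requests_per_subnet(ips, prefix).items()
        if count >= threshold
    ]
```

Ten requests spread across 203.0.113.1 through 203.0.113.10 would all be counted against the single network 203.0.113.0/24, so that network gets flagged even though no individual address looks busy. Residential IPs scattered across many ISPs don't cluster this way.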

Browser Rendering

As bot mitigation services like Distil, Incapsula, and PerimeterX have become more common and sophisticated, it has become increasingly important to render web pages in full browsers. These services perform browser fingerprinting with JavaScript that runs in the page, and only load the true page content after the fingerprint has been analyzed on their backend. Making requests from a browser whose fingerprint is internally inconsistent, or making repeated requests with the same browser fingerprint, will result in instant blocking.

Frameworks like Puppeteer and Selenium can be used to launch and control browsers, but they leave automation signatures that are easy to detect. For example, the window.navigator.webdriver property returns true when either of these libraries is used. Bot mitigation services not only check this property, but also perform hundreds of other fingerprinting tests ranging from font detection to canvas fingerprinting. Configuring browsers with realistic fingerprint overrides is very difficult, and even a small mistake or omission can result in blocking.

Intoli's Smart Proxy service makes it easy to render requests in full browsers that are preconfigured with realistic browser fingerprints to avoid detection. You can simply enable the browser rendering option for one of your scraping projects, and all requests will automatically be rendered in a real browser with a randomized fingerprint before a response is returned to you. This allows you to use any standard HTTP library or scraping framework while taking full advantage of Intoli's advanced browser fingerprint overrides.

The browser rendering option is also compatible with sessions, which let you make a sequence of requests that all use the same browser fingerprint and that persist cookies and local storage between requests. Browser rendering sessions give you all the benefits of using an automation framework like Puppeteer without needing to manage your own infrastructure or develop your own browser fingerprint overrides.

Only enable browser rendering if you aren't also running your own copy of Puppeteer or Selenium locally. Otherwise, every JavaScript, CSS, and image file requested by each page you visit will be rendered in its own remote browser instance! This will make pages load much more slowly than they normally would, and can also result in unnecessary data usage.

Projects

The Intoli Smart Proxy service allows you to configure an unlimited number of projects within your account. Projects are simply an abstraction that helps you organize and maintain different settings and access credentials for your proxy account. Each project can be given its own name, and has its own access credentials and settings.

Projects are a great way to quickly switch between configurations in your code, manage credentials and data limits, and monitor analytics separately for each of your scraping projects. You'll find more information about configuring individual projects in the Configuration Overview.
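
One way to switch between projects in your code is to keep each project's credentials in its own environment variables and select them by project name. This is a hypothetical convention sketched in Python: the SMART_PROXY_<NAME>_PUBLIC_KEY and SMART_PROXY_<NAME>_SECRET variable names, and the secret field itself, are assumptions rather than anything the service defines.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectCredentials:
    """Credentials for one project (field names are illustrative)."""

    public_key: str
    secret: str


def load_project(name: str, env: Mapping[str, str] = os.environ) -> ProjectCredentials:
    """Look up a project's credentials from environment variables.

    Hypothetical naming scheme: SMART_PROXY_<NAME>_PUBLIC_KEY and
    SMART_PROXY_<NAME>_SECRET, where <NAME> is upper-cased and hyphens
    become underscores (so "price-monitoring" -> "PRICE_MONITORING").
    """
    prefix = "SMART_PROXY_" + name.upper().replace("-", "_") + "_"
    try:
        return ProjectCredentials(
            public_key=env[prefix + "PUBLIC_KEY"],
            secret=env[prefix + "SECRET"],
        )
    except KeyError as missing:
        raise KeyError(
            f"no credentials configured for project {name!r}: {missing}"
        ) from None
```

Switching your code to a different project then only requires changing the name you pass to load_project().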

Sessions

While web scraping, it's often important to maintain persistent sessions that visit a number of pages in sequence. Many sites require cookies to keep users logged in or to track the state of a web application between pages. Cookies are also heavily used by bot mitigation services to record the browser fingerprint information that they use for blocking. If the cookies, IP address, and browser fingerprint aren't all consistent with one another, your requests become much easier to block.

The Intoli Smart Proxy service offers support for persistent sessions that mitigate all of these problems. You can append a hyphen followed by an arbitrary session identifier to your public key in any request, and then subsequent requests with the same session identifier will use the same IP address. For example, you could add session10 as a session identifier to a public key of <publicKey> to create a combined public key of <publicKey>-session10.
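
Here's a minimal sketch of building the combined key in Python and attaching it to a proxied opener. It assumes, hypothetically, that the combined public key is sent as the proxy username and a secret credential as the password; the proxy host and port are placeholders, so substitute the values from your own project.

```python
import secrets
import urllib.parse
import urllib.request

# Hypothetical placeholders: use the proxy host and port from your project.
PROXY_HOST = "proxy.example.com"
PROXY_PORT = 8080


def session_key(public_key: str, session_id: str) -> str:
    """Combine a public key and a session identifier with a hyphen."""
    return f"{public_key}-{session_id}"


def new_session_id() -> str:
    """Return a random identifier such as 'session1a2b3c4d'."""
    return "session" + secrets.token_hex(4)


def session_opener(
    public_key: str,
    secret: str,
    session_id: str,
    host: str = PROXY_HOST,
    port: int = PROXY_PORT,
) -> urllib.request.OpenerDirector:
    """Build an opener whose proxy credentials carry the session identifier.

    Assumption: the combined key is the proxy username and `secret` is the
    password. Both are percent-encoded so special characters survive.
    """
    user = urllib.parse.quote(session_key(public_key, session_id), safe="")
    password = urllib.parse.quote(secret, safe="")
    proxy_url = f"http://{user}:{password}@{host}:{port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)
```

Reusing the opener returned by session_opener() for every request in a sequence keeps those requests on the same session identifier, and therefore on the same IP address. Calling new_session_id() gives you a fresh identifier whenever you want to start a new session.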

Using sessions in conjunction with browser rendering makes them even more powerful. Browser rendering sessions allow you to reuse cookies, local storage, IP addresses, and full browser fingerprints between requests. All you need to do is enable browser rendering in your project and use a session identifier when you authenticate with the proxy; everything else is handled automatically!