Configuration Overview

This page provides a detailed explanation of the various configuration options found on each project's Settings tab. Understanding project settings is a great way to learn about the capabilities of the Intoli Smart Proxy service in detail. For a higher-level overview of projects, see the Intoli Smart Proxy Service overview page.

General

Settings in this section allow you to configure your project's access credentials, usage restrictions, and retry rules. They apply to all requests issued through the project, whether or not Browser Rendering is enabled. To persist any change to your project's settings, press the "Save" button at the bottom of the settings form.

Project Name

project-name.png

Your project's name is a descriptive label that helps you tell your projects apart. You can set this value to whatever you want, but including the name of the site you're scraping will make it easier to monitor your data usage.

If you're scraping multiple sites per project, then naming projects after important configuration parameters can help you more easily select the appropriate project to use. For example, naming a project with browser rendering enabled "Browser Rendering Project" would naturally suggest that you should use it with sites which only work when accessed in a full browser environment.

Public Key

public-key.png

Your project's public key uniquely identifies your project to the proxy server. Intoli Smart Proxy supports Basic HTTP authentication, and the public key is used in the username when authenticating your request.

If you do not wish to use sessions—in which case every one of your requests will be assigned a new IP address—then the public key forms the entirety of your proxy URL's username:

http://<public key>:<private key>@proxy.intoli.com

Sessions allow you to retain the same IP address for up to 10 minutes. If you do wish to use sessions, then your project's proxy URL should include both the public key and a random session identifier separated by a - character in the proxy URL's username:

http://<public key>-<session identifier>:<private key>@proxy.intoli.com
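As a minimal sketch, the two URL forms above could be assembled in Python as follows. The helper names build_proxy_url and new_session_id are hypothetical, and the use of a random hex string as the session identifier is an assumption; the documentation only says that the identifier should be random.

```python
import secrets
from urllib.parse import quote

PROXY_HOST = "proxy.intoli.com"  # host taken from the URL forms above


def new_session_id():
    """Generate a random session identifier (assumed format: 16 hex characters)."""
    return secrets.token_hex(8)


def build_proxy_url(public_key, private_key, session_id=None):
    """Build a proxy URL, optionally pinning requests to a session."""
    username = public_key
    if session_id is not None:
        # The public key and session identifier are joined by a "-" character.
        username = f"{public_key}-{session_id}"
    # Percent-encode the credentials so special characters cannot break the URL.
    return f"http://{quote(username, safe='')}:{quote(private_key, safe='')}@{PROXY_HOST}"
```

Calling build_proxy_url(public_key, private_key) yields the sessionless form, while passing session_id=new_session_id() and reusing the resulting URL keeps subsequent requests in the same session.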

The public key is set automatically when a project is created and it can't be changed afterwards. To copy the public key into your clipboard, press the copy button next to the input.

Private Key

private-key.png

Your project's private key serves as the password when using Basic HTTP authentication to connect to the proxy. You should store your private key securely, and regenerate it if it's made public. To do so, press the refresh icon next to the private key's input box, and then save the project settings. To copy the private key into your clipboard, press the copy button next to the refresh button.

Require Private Key

require-private-key.png

The Require Private Key setting controls whether a password is required in your project's proxy URL. Disabling it allows you to omit the password and rely exclusively on IP whitelisting to control access to the proxy. If this setting is disabled, you must include at least one IP in the Whitelisted IPs input field. Without a password, your proxy URL takes the following form:

http://<public key>@proxy.intoli.com

Whitelisted IPs

whitelisted-ips.png

This setting allows you to specify the IPs or CIDR blocks which are allowed to connect to the proxy. If any entries are present, all requests must be made from one of the whitelisted IPs. You must add at least one whitelisted IP to this input box if the Require Private Key setting is disabled.

To add a new entry, type into the input box and press enter. To remove an entry, press the "x" icon next to it.
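To illustrate how a mix of single IPs and CIDR blocks can be matched against a client address, here is a minimal sketch using Python's standard ipaddress module. The function name is_whitelisted is hypothetical, and this is only an illustration of the rule described above, not Intoli's actual implementation.

```python
import ipaddress


def is_whitelisted(client_ip, whitelist):
    """Return True if client_ip is allowed by the whitelist entries.

    Each entry may be a single IP ("192.0.2.10") or a CIDR block
    ("203.0.113.0/24"). An empty whitelist places no restriction on the
    client IP, mirroring the "if any entries are present" rule above.
    """
    if not whitelist:
        return True
    address = ipaddress.ip_address(client_ip)
    for entry in whitelist:
        # strict=False tolerates entries whose host bits are set,
        # e.g. "203.0.113.5/24"; a bare IP becomes a single-address network.
        network = ipaddress.ip_network(entry, strict=False)
        if address in network:
            return True
    return False
```

A check like this can be handy for confirming that the machine you plan to scrape from falls inside the blocks you entered before you disable the private key requirement.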

Data Limit

data-limit.png

This setting specifies the amount of data that can be transferred using the project's credentials during the current billing cycle. It comes in handy if you wish to prevent unexpectedly high usage within a scraping project.

Note that the account limit will take precedence when determining whether a request will be fulfilled. Press the "Clear" button next to the input to remove data usage restrictions from the project.

Retry Limit

retry-limit.png

This setting determines the number of times failing requests are retried before giving up. A request "fails" if we detect blocking or it returns an error status code. Press the "Clear" button next to the input field to disable request retries altogether.
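The retry behavior described above can be pictured with the following sketch. It is an illustration only: the function fetch_with_retries and the send callable are hypothetical, and treating status codes of 400 and above as errors is an assumption, since the exact set of error codes is not specified here.

```python
def fetch_with_retries(send, retry_limit):
    """Call send() until it succeeds or the retry limit is exhausted.

    send is a callable returning (status_code, blocked). A retry_limit of 0
    models a cleared Retry Limit field: the request is attempted once and
    never retried.
    """
    attempts = 0
    while True:
        status, blocked = send()
        attempts += 1
        # Assumption: any 4xx/5xx status counts as an error status code.
        failed = blocked or status >= 400
        retries_used = attempts - 1
        if not failed or retries_used >= retry_limit:
            return status, attempts
```

With a retry limit of 2, for example, a request is attempted at most three times: the initial attempt plus two retries.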

Ignore TLS Errors

ignore-tls-errors.png

This setting allows the proxy to make requests to servers with invalid certificates. Normally, when you make a secure request to a site whose certificate is expired or otherwise invalid, the browser or networking library that you are using will issue an error. Some browsers display a prompt asking you whether to ignore such errors when they are encountered, and you can think of this setting as the equivalent of that browser prompt. When enabled, this setting will cause the proxy to return responses from domains with invalid certificates, instead of returning errors.

Note that this setting does not change how locally installed certificates are used when you make HTTPS requests. In order to support intelligent routing, request retries, and browser rendering, Intoli Smart Proxy needs to issue its own signed certificates when routing HTTPS requests. This requires you to either tell your request library to ignore certificate errors, or to specify that the Intoli Root CA Certificate should be trusted. Please refer to the Tools section of the documentation for instructions on how to accomplish this with different tools in various programming languages.
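As a rough sketch of the two client-side options just mentioned, the helper below builds keyword arguments in the shape accepted by the third-party requests library, whose proxies and verify parameters are used here. The helper name proxy_request_options and the certificate path are hypothetical; consult the Tools section for the recommended setup for your library.

```python
def proxy_request_options(proxy_url, ca_bundle_path=None):
    """Build keyword arguments for requests.get() when routing through the proxy.

    If ca_bundle_path points at a saved copy of the Intoli Root CA
    Certificate (hypothetical path), that certificate is trusted.
    Otherwise certificate verification is disabled entirely, which is the
    "ignore certificate errors" option.
    """
    verify = ca_bundle_path if ca_bundle_path is not None else False
    return {
        # Route both plain HTTP and HTTPS traffic through the proxy.
        "proxies": {"http": proxy_url, "https": proxy_url},
        "verify": verify,
    }
```

With requests installed, the result could be passed along as requests.get("https://example.com", **options). Trusting the root certificate is generally preferable to disabling verification, since the latter silences all certificate checks on the client side.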

Browser Rendering

Enabled

enabled.png

This setting causes your requests to be issued from headless browsers configured specifically for web scraping and provisioned on Intoli's infrastructure. Each request issued through a project with browser rendering enabled will automatically initiate a session and be assigned a random browser fingerprint. Sessions are automatically assigned to your initial requests because modern web pages typically issue a large number of secondary requests for ancillary resources such as scripts, stylesheets, and images. By using a session, we are ensuring that all of those resources are loaded from the same IP.

You can indicate a session manually by appending -<session identifier> to the username portion of your proxy URL, as described in the documentation for the Public Key setting. In this case, your IP, browser fingerprint, and browser data like cookies and local storage will all be stored on Intoli's infrastructure, and then reused on subsequent requests arriving with the same session identifier.

Note that if you're using an automation library like Puppeteer or Selenium, then you will want to disable browser rendering and always use a manually created session. With Browser Rendering enabled, each of the ancillary requests issued from a webpage would first be processed in a browser on Intoli's infrastructure, then passed on to your own computer. This would cause your scraper to be unnecessarily slow, to load the page you're trying to access incorrectly, or to fail in other unpredictable ways.

Device Type

device-type.png

This setting allows you to customize the type of browser fingerprint that is loaded into the remote browsers processing your requests.

Wait Condition

wait-condition.png

This setting determines the point at which to return the response from the page that is processing your request. Note that each request will automatically time out if the page has not completed rendering in 30 seconds. The options are as follows.

  • DOMContentLoaded: wait until the DOMContentLoaded event is fired, which happens when the initial HTML document has been loaded and parsed, without waiting for ancillary resources like images and stylesheets to finish loading.
  • Load: wait until the load event is fired, which happens when the page and the ancillary resources it loads have finished loading.
  • Network Idle: wait until no requests are transferred via the browser for at least half a second.
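To make the Network Idle condition and the 30 second timeout more concrete, here is a small sketch that computes when a page would be considered idle from a list of request timestamps. The function network_idle_time is hypothetical, timestamps are measured in seconds from the start of navigation, and the logic is an illustration of the definition above rather than the service's actual implementation.

```python
IDLE_WINDOW = 0.5  # seconds without request activity, per the description above
TIMEOUT = 30.0     # rendering timeout in seconds, per the description above


def network_idle_time(activity_times):
    """Return the time at which the page becomes network idle, or None on timeout.

    activity_times holds the moments at which request activity was observed.
    The start of navigation (time 0) is treated as the first activity.
    """
    last_activity = 0.0
    for timestamp in sorted(activity_times):
        if timestamp - last_activity >= IDLE_WINDOW:
            # A quiet gap of at least half a second has already occurred.
            break
        last_activity = timestamp
    idle_at = last_activity + IDLE_WINDOW
    return idle_at if idle_at <= TIMEOUT else None
```

A page that keeps polling more often than twice per second would never go idle under this definition and would instead hit the timeout, which is one reason to prefer DOMContentLoaded or Load for such pages.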

Blocked Resources

blocked-resources.png

With this setting you can specify which requests to block while loading a page in the remotely running browser. Enabling any of these options will reduce data usage and speed up page load times, but increase your chances of being detected. The resources you can block are the following.

  • Media: images, videos, audio, and other types of media files.
  • Styles: CSS stylesheets and fonts.
  • Scripts: JavaScript scripts. Checking this option may prevent you from scraping pages which are rendered using in-browser rendering libraries like React, Angular, or Vue.
  • XHR: any AJAX requests issued from the page.
  • WebSockets: any WebSocket traffic on the page.
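The categories above can be thought of as groups of browser resource types. The sketch below shows one way such a filter might be expressed. The resource type strings follow the names Puppeteer reports for intercepted requests, and both the mapping and the function should_block are assumptions made for illustration, not a description of how the service classifies requests.

```python
# Assumed mapping from the settings' categories to browser resource types.
BLOCKABLE_CATEGORIES = {
    "Media": {"image", "media"},
    "Styles": {"stylesheet", "font"},
    "Scripts": {"script"},
    "XHR": {"xhr", "fetch"},
    "WebSockets": {"websocket"},
}


def should_block(resource_type, blocked_categories):
    """Return True if a request of resource_type falls in a blocked category."""
    for category in blocked_categories:
        if resource_type in BLOCKABLE_CATEGORIES.get(category, set()):
            return True
    return False
```

Under this mapping, the main document request is never blocked, so the page's HTML is always fetched regardless of which options are checked.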