Intoli Blog

The tech videos that have most impacted me as a developer

Over the years, I’ve collected a handful of videos that I deeply enjoy and that have had a significant impact on me as a developer. These are videos that I love introducing people to and I’m happy to have the chance to share them with you here. I find them all inspirational in their own ways and they serve as a continuous reminder for me to keep an open mind and to take creative approaches to problems.

Continue reading

Predicting Hacker News article success with neural networks and TensorFlow

Enter a potential title for a Hacker News submission into the tool to see how likely it is to succeed or to be flagged dead. Once you play around a bit, you can read on to learn exactly how these predictions are made. Submitting an article to Hacker News can be a little stressful if you’ve invested a lot of time in writing it. An article’s success really hinges upon getting the initial four or five votes that will push it onto the front page where it can reach a broader audience.
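The article explains the real model, but as a rough illustration of the general approach, here is a minimal sketch of a binary title classifier built with TensorFlow’s Keras API. The tiny dataset, vocabulary size, and layer sizes below are placeholders for illustration, not anything taken from the article.

    import tensorflow as tf

    # Hypothetical training data: titles labeled 1 for success and 0 for flagged/ignored.
    titles = ['Show HN: A tool for parsing sitemaps', 'My weekend side project']
    labels = [1, 0]

    # Convert raw title strings into padded sequences of word indices.
    vectorizer = tf.keras.layers.TextVectorization(max_tokens=10000, output_sequence_length=20)
    vectorizer.adapt(titles)

    model = tf.keras.Sequential([
        vectorizer,
        tf.keras.layers.Embedding(input_dim=10000, output_dim=32),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(16, activation='relu'),
        # A sigmoid output maps each title to a success probability between 0 and 1.
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

    model.fit(tf.constant(titles), tf.constant(labels), epochs=5)
    print(model.predict(tf.constant(['Predicting Hacker News success with TensorFlow'])))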

Continue reading

Email Spy: A new open source browser extension for lead generation

Lead generation is a top priority for most successful companies and helping businesses find potential clients is a big part of what we do here at Intoli. Today, we’re pleased to announce a new open source marketing tool that makes it possible to find contact emails for any web domain with a single click. It’s called Email Spy and you can get the source on GitHub or install it directly as a Chrome extension or a Firefox addon.

Continue reading

Running Selenium with Headless Chrome

UPDATE: This article is updated regularly to reflect the latest information and versions. If you’re looking for instructions, then skip ahead to see Setup Instructions. NOTE: Be sure to check out Running Selenium with Headless Chrome in Ruby if you’re interested in using Selenium in Ruby instead of Python. It has long been rumored that Google uses a headless variant of Chrome for their web crawls. Over the last two years or so, it had started looking more and more like this functionality would eventually make it into the public releases and, as of this week, that has finally happened.
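The full setup is covered in the article, but the core idea is compact enough to sketch here. The snippet below is a minimal Python example of driving headless Chrome through Selenium; the target URL is a placeholder, and the exact flags required can vary with your Chrome and Selenium versions.

    from selenium import webdriver

    # Ask Chrome to run without a visible window.
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    options.add_argument('--window-size=1280,800')

    # Newer Selenium releases use options= instead of chrome_options=.
    driver = webdriver.Chrome(chrome_options=options)
    try:
        driver.get('https://intoli.com/blog/')  # Placeholder URL.
        print(driver.title)
    finally:
        driver.quit()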

Continue reading

Scraping and Parsing Sitemaps in Bash

A wise man once said that sitemaps are the window into a website’s soul, and I’m not inclined to disagree. Without a sitemap, a website is just a labyrinthian web of links between pages. It’s certainly possible to scrape sites by crawling those links, but things become much easier with a sitemap that lays out a site’s content in clear and simple terms. Sites which provide sitemaps are quite literally asking to be scraped; it’s a direct indication that the site operators intend for bots to visit the pages listed in the sitemaps.
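The article does this in Bash, but the idea is easy to see in a few lines of Python as well. The sketch below fetches a sitemap and prints the page URLs from its <loc> elements; the sitemap URL is a placeholder, and sitemap index files would need an extra level of recursion.

    import urllib.request
    import xml.etree.ElementTree as ET

    SITEMAP_URL = 'https://example.com/sitemap.xml'  # Placeholder.
    NAMESPACE = '{http://www.sitemaps.org/schemas/sitemap/0.9}'

    with urllib.request.urlopen(SITEMAP_URL) as response:
        root = ET.fromstring(response.read())

    # Print each page URL listed in the sitemap's <loc> elements.
    for loc in root.iter(NAMESPACE + 'loc'):
        print(loc.text.strip())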

Continue reading

A New Dark Pattern: Tricking Browsers into Making Repeated Notification Requests

I recently stumbled upon an advertisement that was so obnoxious that I felt instantly compelled to share it. The site requested my permission to send notifications using the browser’s Notification API. When I refused, it simply wouldn’t take no for an answer. As they say, a picture is worth a thousand words. A website typically only gets one chance to request notification permissions; if the request is blocked, then that’s it. This malicious site worked around this limitation by responding to failed notification requests with immediate redirects to a different subdomain that served up the same content.

Continue reading

Using Puppeteer to Scrape Websites with Infinite Scrolling

Infinite scrolling has become a ubiquitous design pattern on the web. Social media sites like Facebook, Twitter, and Instagram all feature infinitely scrolling feeds to keep users engaged with an essentially unbounded amount of content. Here’s what that looks like on Instagram, for example. This mechanism is typically implemented by using JavaScript to detect when the user has scrolled far enough down the existing feed, and then querying an underlying API endpoint for the next batch of data that gets processed and dynamically injected into the page.
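The article implements this with Puppeteer in JavaScript; purely as an illustration of the same scroll-and-wait pattern, here is a rough Python/Selenium sketch. The URL, the fixed delay, and the iteration cap are arbitrary placeholders.

    import time
    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get('https://example.com/infinite-feed')  # Placeholder URL.

    previous_height = driver.execute_script('return document.body.scrollHeight')
    for _ in range(10):
        # Scroll to the bottom so the page's JavaScript fetches the next batch of content.
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight)')
        time.sleep(2)  # Crude wait for the new items to be injected into the page.
        new_height = driver.execute_script('return document.body.scrollHeight')
        if new_height == previous_height:
            break  # Nothing new was loaded, so the feed is exhausted (or stalled).
        previous_height = new_height

    driver.quit()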

Continue reading

Implementing a Custom Waiting Action in Nightmare JS

Nightmare is a popular browser automation library specifically designed with ease of use in mind. A typical Nightmare script chains together semantically named user actions like goto and click to perform any given task, resulting in simple and readable code. These actions of course include a few methods for waiting on the page to fully load: you can wait for a selector to become available, for all static resources to load, or simply wait for a fixed amount of time.
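The article’s custom waiting action is written for Nightmare itself, in JavaScript. As a loose analogy only, the same “wait until an arbitrary condition holds” idea looks like this with a custom callable passed to Selenium’s WebDriverWait in Python; the jQuery-idle condition is a made-up example.

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    def jquery_is_idle(driver):
        # Hypothetical condition: the page reports no in-flight jQuery requests.
        return driver.execute_script('return window.jQuery && jQuery.active === 0')

    driver = webdriver.Chrome()
    driver.get('https://example.com')  # Placeholder URL.

    # WebDriverWait polls the callable until it returns a truthy value or times out.
    WebDriverWait(driver, timeout=10).until(jquery_is_idle)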

Continue reading

Using Firefox WebExtensions with Selenium

In 2015, Mozilla announced that they would be deprecating XPCOM- and XUL-based add-ons in favor of their new WebExtensions API, which is based on the Google Chrome Extension API. There were some vocal critics of this shift because it meant that some existing add-ons would be discontinued, but this was tremendously positive news for add-on and extension developers. Writing cross-browser extensions had previously been an absolutely miserable experience, and many developers understandably chose to only target Chrome due to its market share and relatively pleasant API.

Continue reading

Using Google Chrome Extensions with Selenium

Running Google Chrome with an extension installed is quite simple because Chrome supports a --load-extension command-line argument for exactly this purpose. This can be specified before launching Chrome with Selenium by creating a ChromeOptions instance and calling add_argument().

    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException

    # Configure the necessary command-line option.
    options = webdriver.ChromeOptions()
    options.add_argument('--load-extension=path/to/the/extension')

    # Initialize the driver with the appropriate options.
    driver = webdriver.Chrome(chrome_options=options)

The above code will set up a Selenium driver for Chrome with the extension located at path/to/the/extension preinstalled.

Continue reading