A few months back, I wrote a popular article called Making Chrome Headless Undetectable in response to one called Detecting Chrome Headless by Antoine Vastel. The main point I was trying to get across in that article is that blocking site visitors based on browser fingerprinting is an extremely user-hostile practice. There are so many variations in browser configurations that you’re inevitably going to end up blocking non-automated access to your website and, on top of that, you’re not accomplishing anything in terms of blocking sophisticated web scrapers.
Browser automation frameworks–like Puppeteer, Selenium, Marionette, and Nightmare.js–strive to provide rich APIs for configuring and interacting with web browsers. These generally work quite well, but you’re inevitably going to end up running into API limitations if you do a lot of testing or web scraping. You might find yourself wanting to conceal the fact that you’re using a headless browser, extract image resources from a web page, set the seed for Math.
In this post, I will highlight a few ways to save images while scraping the web through a headless browser. The simplest solution would be to extract the image URLs from the headless browser and then download them separately, but what if that’s not possible? Perhaps the images you need are generated dynamically or you’re visiting a website which only serves images to logged-in users. Maybe you just don’t want to put unnecessary strain on their servers by requesting the image multiple times.
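To make the "simplest solution" concrete, here is a minimal Python sketch of the download-them-separately approach, using only the standard library. It is an illustration under stated assumptions rather than the method used in the rest of this post: the function names `resolve_image_urls` and `download_images` are hypothetical, and the `src` values are assumed to have already been pulled out of the page by whichever automation framework you are using.

```python
import os
from urllib.parse import urljoin, urlparse
from urllib.request import urlretrieve


def resolve_image_urls(page_url, srcs):
    """Turn raw <img src> values into unique, absolute http(s) URLs.

    Hypothetical helper: `srcs` is assumed to be a list of strings that
    the headless browser has already extracted from the page.
    """
    seen = set()
    urls = []
    for src in srcs:
        if not src:
            continue
        absolute = urljoin(page_url, src.strip())
        # data: and blob: URLs only exist inside the browser session, so a
        # separate download step cannot fetch them.
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls


def download_images(urls, directory):
    """Fetch each URL with a plain HTTP request and return the saved paths.

    Note that this request carries none of the browser's cookies or
    headers, and it hits the server a second time for every image.
    """
    os.makedirs(directory, exist_ok=True)
    paths = []
    for index, url in enumerate(urls):
        name = os.path.basename(urlparse(url).path) or "image"
        path = os.path.join(directory, f"{index}-{name}")
        urlretrieve(url, path)
        paths.append(path)
    return paths
```

The sketch also shows where this approach breaks down. Inline or dynamically generated images are silently skipped because there is nothing to request, images behind a login will likely fail because the download does not share the browser's session, and every image that does succeed has been requested from the server twice.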