What’s so dangerous about pickles? Those pickles are very dangerous pickles. I literally can’t begin to tell you how really dangerous they are. You have to trust me on that. It’s important, Ok? – “Explosive Disorder” by Pan Telare Before we get elbow deep in opcodes here, let’s cover a little background. The Python standard library has a module called pickle that is used for serializing and deserializing objects.
Have a great idea for an article?
We're always looking for guest contributors or article suggestions. Shoot us an email at firstname.lastname@example.org, we would love to hear yours!
If you work with data in Python, chances are that you’ve heard of the pandas data manipulation library. You can think of pandas as a way to programmatically interact with spreadsheets. It works well with huge datasets, unlike its desktop counterparts like Google Sheets and Microsoft Excel, and implements a number of common database operations like merging, pivoting, and grouping. Moreover, being backed by numpy and efficient algorithm implementations makes it fast and easily integrated with other tools in the vast Python data science landscape.
One Million robots.txt Files The idea for this article actually started as a joke. We do a lot of web scraping here at Intoli and we deal with robots.txt files, overzealous ip bans, and all that jazz on a daily basis. A while back, I was running into some issues with a site that had a robots.txt file which was completely inconsistent with their banning policies, and I suggested that we should do an article on analyzing robots.
There’s a First Time for Everything Like some 75 million other Americans, I am playing fantasy football this year. Unlike most of them, I know virtually nothing about football. I would estimate that I’ve watched somewhere around five games total in my life, most of them Super Bowls. I don’t know the rules beyond the very basics and I can’t name a single NFL player off the top of my head.
This article was originally published as a guest post on ScrapingHub’s blog. ScrapingHub is the company that wrote Scrapy, which this article is about, so read on to see why they liked it! Introduction The Steam game store is home to more than ten thousand games and just shy of four million user-submitted reviews. While all kinds of Steam data are available either through official APIs or other bulk-downloadable data dumps, I could not find a way to download the full review dataset.
Detecting Headles Chrome A short article titled Detecting Chrome Headless popped up on Hacker News over the weekend and it has since been making the rounds. Most of the discussion on Hacker News was focused around the author’s somewhat dubious assertion that web scraping is a “malicious task” that belongs in the same category as advertising fraud and hacking websites. That’s always a fun debate to get into, but the thing that I really took issue with about the article was that it implicitly promoted the idea of blocking users based on browser fingerprinting.
Update: This article is regularly updated in order to accurately reflect improvements in Firefox’s headless browsing capabilities. Note: Check out Running Selenium with Healdess Chrome if you’d rather use Google’s browser. Using Selenium with Headless Firefox (on Windows) Ever since Chrome implemented headless browsing support back in April, the other major browsers started following suit. In particular, Mozilla has since then expanded support for Firefox’s headless mode from Linux to its Windows and macOS builds, and fixed a number of bugs that might have been in the way of early adopters.
UPDATE: This article is updated regularly to reflect the latest information and versions. If you’re looking for instructions then skip ahead to see Setup Instructions. NOTE: Be sure to check out Running Selenium with Headless Chrome in Ruby if you’re interested in using Selenium in Ruby instead of Python. Background It has long been rumored that Google uses a headless variant of Chrome for their web crawls. Over the last two years or so it had started looking more and more like this functionality would eventually make it into the public releases and, as of this week, that has finally happened.
How to Resize Matplotlib Legend Markers I frequently find myself plotting clusters of points in Matplotlib with relatively small marker sizes. This is a useful way to visualize the data, but the plot’s legend will use the same marker sizes by default and it can be quite difficult to discern the color of a single point in isolation. Let’s plot a few random clusters of points to see what this problem looks like in practice.