Blogs

Scraping User-Submitted Reviews from the Steam Store

This article was originally published as a guest post on ScrapingHub’s blog. ScrapingHub is the company that wrote Scrapy, which this article is about, so read on to see why they liked it!

Introduction

The Steam game store is home to more than ten thousand games and just shy of four million user-submitted reviews. While all kinds of Steam data are available either through official APIs or other bulk-downloadable data dumps, I could not find a way to download the full review dataset.

Continue reading

Making Chrome Headless Undetectable

Detecting Headless Chrome

A short article titled Detecting Chrome Headless popped up on Hacker News over the weekend and it has since been making the rounds. Most of the discussion on Hacker News was focused around the author’s somewhat dubious assertion that web scraping is a “malicious task” that belongs in the same category as advertising fraud and hacking websites. That’s always a fun debate to get into, but the thing that I really took issue with about the article was that it implicitly promoted the idea of blocking users based on browser fingerprinting.

Continue reading

Markov's and Chebyshev's Inequalities Explained

Confidence Values

If you’ve ever learned any basic statistics or probability then you’ve probably encountered the 68-95-99.7 rule at some point. This rule is simply the statement that, for a normally distributed variable, roughly 68% of values will fall within one standard deviation of the mean, 95% of values within two standard deviations, and 99.7% within three standard deviations. These confidence values are quite useful to memorize because values that are computed from data are often approximately normally distributed due to the central limit theorem.
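As a quick sanity check on those three numbers, here is a minimal sketch that computes them from the normal distribution's error function. The helper name `within_k_sigma` is made up for illustration; the only thing it relies on is the standard identity that a normal variable lands within k standard deviations of its mean with probability erf(k / sqrt(2)).

```python
import math


def within_k_sigma(k: float) -> float:
    """Probability that a normal variable falls within k standard deviations of its mean."""
    # For X ~ N(mu, sigma^2): P(|X - mu| < k * sigma) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


# Print the coverage for one, two, and three standard deviations.
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within_k_sigma(k):.2%}")
```

The printed percentages should round to the familiar 68%, 95%, and 99.7%.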

Continue reading

Patching a Linux Kernel Module

A Bug on Linux? Why, I never!

I’ve been using GNU/Linux for about fifteen years and, I’ve got to admit, it used to be pretty rough around the edges (to put it lightly). A lot can change over fifteen years though; most of the things that were once major problem areas haven’t required a second thought in years. Laptop suspension, Wi-Fi, advanced function keys, sound, and pretty much everything else all typically “just work” these days, and this has been the case for quite a while.

Continue reading

Understanding Neural Network Weight Initialization

Choosing Weights: Small Changes, Big Differences

There are a number of important, and sometimes subtle, choices that need to be made when building and training a neural network. You have to decide which loss function to use, how many layers to have, what stride and kernel size to use for each convolution layer, which optimization algorithm is best suited for the network, etc. With so many things that need to be decided, the choice of initial weights may, at first glance, seem like just another relatively minor pre-training detail, but weight initialization can actually have a profound impact on both the convergence rate and final quality of a network.

Continue reading

Intoli Joins the NVIDIA Inception Program

Intoli is Joining the NVIDIA Family

We’re very pleased to announce today that Intoli will officially be joining the NVIDIA Inception Program for exceptional technology startups who are revolutionizing their industries with advances in artificial intelligence (AI) and data science. NVIDIA has been instrumental in the resurgence of neural networks in machine learning over the last several years. The rise of GPU-accelerated neural network training has allowed for major advances in the field of deep learning and NVIDIA’s GPU lines, Deep Learning SDK, and investment in AI startups have all undoubtedly played an immense role in that.

Continue reading

Running Selenium with Headless Firefox

Using Selenium with Headless Firefox (on Windows)

Selenium uses the WebDriver protocol to remotely control browsers. Chrome, Firefox, Safari, and other major browsers already work with this API, and support for headless browsing has been improving ever since Google implemented it in Chrome back in April. As of today, Firefox expanded its headless mode from Linux to its nightly Windows builds. macOS support is lagging behind a bit, but should be coming pretty soon too.

Continue reading

Finding Pareto Optimal Blogs on Hacker News

Introduction

I’ve been doing a lot of technical writing recently and, with that experience, I’ve grown to more deeply appreciate the writing of others. It’s easy to take the effort behind an article for granted when you’ve grown accustomed to there being new high-quality content posted every day on Hacker News and Twitter. The truth is that a really good article can take days or more to put together and it isn’t easy to write even one article that really takes off, let alone a steady stream of them.

Continue reading

Why I still don't use Yarn

But Isn’t Yarn the Best Node Package Manager?

If you’re only comparing it to npm, then the answer is unequivocally yes. Yarn is generally much faster than npm and gives you deterministic builds by default, built-in integrity checking, license management tools, and a host of other goodies. Despite all of that, I still usually don’t use yarn. I avoid yarn for one simple reason: disk space usage. I feel like a bit of a curmudgeon here, but I find it a little absurd that it can easily take 100 MB, or more, to store a project consisting of a couple hundred lines of JavaScript if you want to use modern tooling (e.

Continue reading

The tech videos that have most impacted me as a developer

Introduction

Over the years, I’ve collected a handful of videos that I deeply enjoy and that have had a significant impact on me as a developer. These are videos that I love introducing people to and I’m happy to have the chance to share them with you here. I find them all inspirational in their own ways and they serve as a continuous reminder for me to keep an open mind and to take creative approaches to problems.

Continue reading