Intoli Remote Browser Tour

Scraping Hacker News

Let's look at one last example that touches on how you could use Remote Browser for web scraping. We'll visit Hacker News and scrape the title, URL, and score of each story. There's an implementation of this in the code editor to the right. You'll notice that all of the actual scraping code is plain JavaScript that runs in the page context and uses the standard DOM API. We don't need a custom selector API, we work with the actual HTML elements rather than remote object handles, and our interaction with the Remote Browser API itself is fairly minimal.
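To make the "plain JavaScript" point concrete, here is a minimal sketch of the same extraction logic written as a standalone function. The name `extractStories` and its `root` parameter are hypothetical and not part of Remote Browser; the class names (`athing`, `storylink`, `score`) are the ones used in the tour's example, and the assumption that the score text starts with a number (for instance "42 points") is ours.

```javascript
// Hypothetical helper, not part of Remote Browser: the extraction logic as a
// standalone function. `root` is anything that exposes getElementsByClassName;
// inside a real page that would be `document`.
function extractStories(root) {
  return Array.from(root.getElementsByClassName('athing')).map((tr, index) => {
    const storyLink = tr.getElementsByClassName('storylink')[0];
    // As in the tour's example, the score is looked up in the row that
    // follows the story row. Guard against a missing or non-element sibling.
    const nextRow = tr.nextSibling;
    const scoreElement = nextRow && nextRow.getElementsByClassName
      ? nextRow.getElementsByClassName('score')[0]
      : undefined;
    return {
      position: index + 1,
      title: storyLink.innerHTML,
      url: storyLink.href,
      // Assumption: the score text begins with a number, e.g. "42 points".
      // Rows without a score element get null.
      score: scoreElement ? parseInt(scoreElement.innerHTML, 10) : null,
    };
  });
}
```

Because the function only relies on `getElementsByClassName`, `nextSibling`, `innerHTML`, and `href`, it can be exercised with a real `document` in a browser or with simple stand-in objects in a test.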

Remote Browser is still in development, but we hope that you've enjoyed this brief introduction. We have big plans for expanding Remote Browser and for building higher-level libraries on top of it. We're really looking forward to these, and we hope that you are too! Be sure to sign up for our monthly newsletter to hear about open source releases and other new content from Intoli!

import Browser from 'remote-browser';
// Launch a remote browser instance.
const browser = new Browser();
await browser.launch();
// Navigate to Hacker News.
const tab = await browser.tabs.update({ url: 'https://news.ycombinator.com' });
// Extract all of the stories on the front page.
const stories = await browser[tab.id](() => (
  Array.from(document.getElementsByClassName('athing'))
    .map((tr, position) => {
      const storyLink = tr.getElementsByClassName('storylink')[0];
      const score = tr.nextSibling.getElementsByClassName('score')[0];
      return {
        position: position + 1,
        title: storyLink.innerHTML,
        url: storyLink.href,
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX