How to scrape with Node.js
Hi everyone! Have you ever tried to create an app but couldn't find data to fill its content? Fortunately, there is often a site that already has that data. So, once we have found a site with our data, we can use a magic method: scraping.
What is Scraping?
The concept of scraping is to load and extract data from a third-party source, for example a web page. We can then convert, adapt, and use that data somewhere else.
How do we get this data from a web page?
There are multiple ways to perform web scraping. We will use the following tools:
- Node.js
- Cheerio (the scraper package)
- Axios (to fetch our page)
Initialize the project
To begin our project, we will create a new folder and run:
yarn init
Then, install the required packages:
yarn add cheerio axios
Let’s code our scraper
In this article, we will scrape a page about the CAC40.
On that page, we have a history of CAC40 quotes. We need to find the HTML elements that we want to scrape; we will use Google Chrome’s “Inspect Element” for that.
Import our packages
You can create a .js file and begin to code!
const cheerio = require('cheerio');
const axios = require('axios');
Get page data
Now, we need to get the page. Axios will do the job:
const siteUrl = 'https://www.boursorama.com/bourse/indices/cours/historique/1rPCAC';
const fetchData = async () => {
  const result = await axios.get(siteUrl);
  return cheerio.load(result.data);
};
Write the scrape function
Axios gave us the page’s HTML; Cheerio will now help us extract data from it. First we read the table header to get the key names for our future objects, then we walk the table rows to fill those objects.
We can do many different things with Cheerio; check its documentation for the full API.
// Scraping method
const scrape = async () => {
  const $ = await fetchData();

  // Get the column names (note: .each is synchronous, no await needed)
  const keys = [];
  $('*[data-period-history-view] .c-table > thead > tr').each((index, element) => {
    $('th', element).each((idx, el) => {
      keys.push($(el).text().replace(/\s/g, ''));
    });
  });

  // Get each row of the table
  const data = [];
  $('*[data-period-history-view] .c-table > tbody > tr').each((index, element) => {
    const object = {};
    $('td', element).each((idx, el) => {
      object[keys[idx]] = $(el).text().replace(/\s/g, '');
    });
    data.push(object);
  });

  return data;
};

// Run the scraper and log the result
scrape().then((data) => console.log(data));
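A note on the `.replace(/\s/g, '')` calls above: the quotes table formats numbers with spaces (often non-breaking spaces) as thousands separators, and in JavaScript `\s` matches those too. A quick illustration:

```javascript
// "7 596,91" where the separator is a non-breaking space (U+00A0);
// \s matches non-breaking spaces as well as regular ones.
const raw = '7\u00a0596,91';
const cleaned = raw.replace(/\s/g, '');
console.log(cleaned); // → '7596,91'
```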
Run the code
node app.js
The output is the result of the scraping: an array of objects, one per table row. You can save it in a JSON file, in a database, etc.
Wrap Up
In this article we have seen how to scrape a web page. It is a basic example, but it can be very useful in many cases.
Here is the code:
GitLab Repo: dmg.link/blog-scraping-repo.
You can find my other articles and follow me here. Thanks for reading, I hope you learned something new today 🚀