How to scrape with Node.js

Yaard Studio
3 min read · Mar 22, 2020


Unsplash: Kelly Sikkema

Hi everyone! Have you ever tried to create an app but couldn’t find data to fill the content? Fortunately, there is often a site with this data. So, now that we have a site with our data, we can use a magic method: scraping.

What is Scraping?

The concept of scraping is to load and extract data from a third-party source, for example a webpage. Then we can convert, adapt, and use the data somewhere else.

How to get this data from a webpage?

There are multiple ways to do web scraping. We will use these tools:

  • NodeJS
  • Cheerio (scraper package)
  • Axios (To get our page)

Initialize the project

To begin our project, we will create a new folder and run:

yarn init

Then, install the required packages:

yarn add cheerio axios

Let’s code our scraper

In this article, we will scrape a page about the CAC40.

On that page, we have a history of CAC40 stock prices. We need to find the HTML elements that we want to scrape. We will use Google Chrome’s “Inspect Element”.

Get the elements

Import our packages

You can create an app.js file and begin to code!

const cheerio = require('cheerio');
const axios = require('axios');

Get page data

Now, we need to get the page. Axios will do the job:

const siteUrl = 'https://www.boursorama.com/bourse/indices/cours/historique/1rPCAC';

const fetchData = async () => {
  const result = await axios.get(siteUrl);
  return cheerio.load(result.data);
};

Write the scrape function

Axios gave us the page, and Cheerio will help us extract data from it. First, we get the column headers to use as key names for our future objects, then the rows to fill the objects.

We can do a lot of different things with Cheerio; see its documentation for details.

// Scraping method
const scrape = async () => {
  const $ = await fetchData();

  // Get the column names (Cheerio's .each() is synchronous, no await needed)
  const keys = [];
  $('*[data-period-history-view] .c-table > thead > tr').each((index, element) => {
    $('th', element).each((idx, el) => {
      keys.push($(el).text().replace(/\s/g, ''));
    });
  });

  // Get each row of the table
  const data = [];
  $('*[data-period-history-view] .c-table > tbody > tr').each((index, element) => {
    const object = {};

    $('td', element).each((idx, el) => {
      object[keys[idx]] = $(el).text().replace(/\s/g, '');
    });

    data.push(object);
  });

  return data;
};

Run the code

So far the scrape function is only defined; call it at the end of the file, for example with `scrape().then((data) => console.log(data));`, then run:

node app.js

Running the script prints the scraped data in the console. You can save it in a JSON file, in a database, etc.

Wrap Up

In this article, we have seen how to scrape a webpage. It’s basic, but it can be very useful in some cases.

Here is the code:

GitLab Repo: dmg.link/blog-scraping-repo.

You can find my other articles and follow me here. Thanks for reading, I hope you learned something new today 🚀
