How to scrape with Node.js
Hi everyone! Have you ever tried to create an app but couldn't find data to fill its content? Fortunately, there is often a site that already has that data. So, once we have found a site with our data, we can use a magic method: scraping.
What is Scraping?
The concept of scraping is to load and extract data from a third-party source, for example a web page. We can then convert, adapt, and use that data somewhere else.
How do we get this data from a web page?
There are multiple ways to perform web scraping. We will use the following tools:
- Node.js
- Cheerio (the scraper package)
- Axios (to fetch our page)
Initialize the project
To begin our project, we will create a new folder and run:
yarn init
Then, install the required packages:
yarn add cheerio axios
Let’s code our scraper
In this article, we will scrape a page about the CAC40.
On that page, we have a history of CAC40 quotes. We need to find the HTML elements that we want to scrape; we will use Google Chrome’s “Inspect Element” for that.
Import our packages
You can create a .js file and begin to code!
const cheerio = require('cheerio');
const axios = require('axios');
Get page data
Now, we need to get the page. Axios will do the job:
const siteUrl = 'https://www.boursorama.com/bourse/indices/cours/historique/1rPCAC';
const fetchData = async () => {
  const result = await axios.get(siteUrl);
  return cheerio.load(result.data);
};
Write the scrape function
Axios gave us the page’s HTML; Cheerio will now help us extract data from it. First we read the table header to get the key names for our future objects, then we walk the table rows to fill those objects.
We can do many different things with Cheerio; check its documentation for the full API.
// Scraping method
const scrape = async () => {
  const $ = await fetchData();

  // Get the column names (note: .each is synchronous, no await needed)
  const keys = [];
  $('*[data-period-history-view] .c-table > thead > tr').each((index, element) => {
    $('th', element).each((idx, el) => {
      keys.push($(el).text().replace(/\s/g, ''));
    });
  });

  // Get each row of the table
  const data = [];
  $('*[data-period-history-view] .c-table > tbody > tr').each((index, element) => {
    const object = {};
    $('td', element).each((idx, el) => {
      object[keys[idx]] = $(el).text().replace(/\s/g, '');
    });
    data.push(object);
  });

  return data;
};

// Run the scraper and log the result
scrape().then((data) => console.log(data));
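A note on the `.replace(/\s/g, '')` calls above: the quotes table formats numbers with spaces (often non-breaking spaces) as thousands separators, and in JavaScript `\s` matches those too. A quick illustration:

```javascript
// "7 596,91" where the separator is a non-breaking space (U+00A0);
// \s matches non-breaking spaces as well as regular ones.
const raw = '7\u00a0596,91';
const cleaned = raw.replace(/\s/g, '');
console.log(cleaned); // → '7596,91'
```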
Run the code
node app.js
The output is the result of the scraping: an array of objects, one per table row. You can save it in a JSON file, in a database, etc.
Wrap Up
In this article we have seen how to scrape a web page. It is a basic example, but it can be very useful in many cases.
Here is the code:
GitLab Repo: dmg.link/blog-scraping-repo.
You can find my other articles and follow me here. Thanks for reading, I hope you learned something new today 🚀