Introduction to Cheerio
Cheerio is a fast, flexible, and lean implementation of core jQuery designed specifically for the server. It parses HTML into a document that you can query with jQuery-style selectors, which makes it a common choice for web scraping and HTML parsing in Node.js. Cheerio is not a browser: it does not run scripts or load external resources, so it only sees the markup you give it.
Installation
npm install cheerio
Basic Usage
Here’s how you can use Cheerio to load an HTML document and manipulate its elements:
const cheerio = require('cheerio');
const html = '<ul><li>Apple</li><li>Orange</li><li>Banana</li></ul>';
const $ = cheerio.load(html);
// Print the text of each list item.
$('li').each(function(i, elem) {
  console.log($(this).text());
});
APIs and Methods
1. load
Parses an HTML string and returns a query function, conventionally named $, that is bound to the parsed document.
const cheerio = require('cheerio');
const $ = cheerio.load('...');
2. html
Get the inner HTML of the first matched element, or set the inner HTML of every matched element:
$('ul').html();
$('ul').html('<li>Pineapple</li>');
3. text
Get the combined text contents of each element in the set of matched elements, including their descendants, or set the text content of every matched element:
$('ul').text();
$('ul').text('Grapes');
4. find
Get the descendants of each element in the current set of matched elements, filtered by a selector:
$('ul').find('li');
5. attr
Get the value of an attribute for the first element in the set of matched elements or set one or more attributes for every matched element:
$('a').attr('href');
$('a').attr('href', 'https://example.com');
Application Example
An example of a simple web scraper using Cheerio:
const axios = require('axios');
const cheerio = require('cheerio');
async function scrapeWebsite(url) {
  try {
    // Fetch the page; axios exposes the response body as `data`.
    const { data } = await axios.get(url);
    const $ = cheerio.load(data);
    const scrapedData = [];
    // Collect the heading and body text of every <article> element.
    $('article').each((i, element) => {
      const title = $(element).find('h1').text();
      const content = $(element).find('.content').text();
      scrapedData.push({ title, content });
    });
    console.log(scrapedData);
  } catch (error) {
    console.error('Error scraping website:', error);
  }
}
scrapeWebsite('https://example.com');