Enhance Your Web Scraping with Cheerio: The Lightweight and Efficient Library for Node.js


Cheerio – The Lightweight and Efficient Library for HTML Parsing and Web Scraping

Introduction to Cheerio

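Cheerio is a fast, flexible, and lean implementation of core jQuery designed specifically for the server. It parses HTML in a Node.js environment and exposes a jQuery-style API for traversing and manipulating the result, which makes it a common choice for web scraping and data extraction. Cheerio does not render pages or run client-side JavaScript, so it only sees the markup the server returns.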

Installation

    npm install cheerio

Basic Usage

Here’s how you can use Cheerio to load an HTML document and read the text of its elements:


    const cheerio = require('cheerio');
    const html = `
      <ul>
        <li>Apple</li>
        <li>Orange</li>
        <li>Banana</li>
      </ul>`;
    const $ = cheerio.load(html);
    // Print the text of each <li> element
    $('li').each((i, elem) => {
      console.log($(elem).text());
    });

APIs and Methods

1. load

Parses an HTML string and returns a $ function bound to the parsed document, which you then call with CSS selectors much as you would use jQuery's $:


    const cheerio = require('cheerio');
    const $ = cheerio.load('...');
  

2. html

Get the inner HTML of the first selected element, or set the inner HTML of every selected element:


    $('ul').html();                     // getter: inner HTML of the first <ul>
    $('ul').html('<li>Pineapple</li>'); // setter: replaces the children of every <ul>

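3. text

Get the combined text of the matched elements and their descendants, or set the text content of every matched element:

    $('ul').text();         // getter: concatenated text of every <ul>
    $('ul').text('Grapes'); // setter: replaces each <ul>'s children with a text node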
      

4. find

Get the descendants of each element in the current selection, filtered by a selector:

    $('ul').find('li'); // every <li> nested at any depth inside each <ul>
      

5. attr

Get the value of an attribute for the first element in the set of matched elements, or set one or more attributes on every matched element:

    $('a').attr('href');                        // getter: reads the first <a> only
    $('a').attr('href', 'https://example.com'); // setter: writes to every <a>
      

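Application Example

A simple web scraper that downloads a page with axios (install it with npm install axios) and uses Cheerio to extract the heading and body text of each article. The selectors assume the page wraps each post in an <article> element containing an <h1> and a .content element; adjust them to match the site you are scraping:

    const axios = require('axios');
    const cheerio = require('cheerio');

    async function scrapeWebsite(url) {
      try {
        // axios exposes the response body as `data`
        const { data } = await axios.get(url);
        const $ = cheerio.load(data);
        const scrapedData = [];

        // Collect the heading and body text of every <article>
        $('article').each((i, element) => {
          const title = $(element).find('h1').text().trim();
          const content = $(element).find('.content').text().trim();
          scrapedData.push({ title, content });
        });

        console.log(scrapedData);
        return scrapedData;
      } catch (error) {
        console.error('Error scraping website:', error);
      }
    }

    scrapeWebsite('https://example.com');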
      
