Comprehensive Guide to Cheerio for Scraping and Parsing HTML – Enhance Your Web Development Skills

Introduction to Cheerio

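Cheerio is a fast, flexible, and lean implementation of core jQuery designed specifically for the server. It parses HTML and XML into a DOM-like tree that you can query and manipulate with jQuery-style methods, which makes it well suited to extracting data from web pages. Cheerio does not fetch pages, render them, or run their JavaScript, so it is usually paired with an HTTP client.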

Setting Up Cheerio

First, install Cheerio using npm:

 
  npm install cheerio
 

Loading and Parsing HTML

Load HTML and manipulate it as you would with jQuery:

 
  const cheerio = require('cheerio');
  const html = `<ul id="fruits"><li class="apple">Apple</li><li class="orange">Orange</li><li class="pear">Pear</li></ul>`;
  const $ = cheerio.load(html);
  console.log($('ul').attr('id'));  // 'fruits'
  console.log($('.apple').text());  // 'Apple'
 

Cheerio API Examples

Selecting Elements

Use familiar jQuery selectors:

 
  console.log($('#fruits').find('li').length);  // 3
  console.log($('li[class=orange]').html());    // 'Orange'
 

Manipulating DOM

Modify HTML content with ease:

 
  console.log($('.pear').text());  // 'Pear'
  $('.pear').text('Grape');
  console.log($('.pear').text());  // 'Grape'
 

Attributes and Properties

Read and update attributes with attr():

 
  console.log($('ul').attr('id'));        // 'fruits'
  $('ul').attr('id', 'newID');
  console.log($('ul').attr('id'));        // 'newID'
 

Traversal

Move around the DOM tree with powerful traversal methods:

 
  console.log($('.apple').next().text()); // 'Orange'
  console.log($('.pear').prev().text());  // 'Orange'
 

Removing Elements

Remove elements from the DOM:

 
  $('.apple').remove();
  // $('ul') still matches even after the id change made earlier
  console.log($('ul').html()); // Only the orange and pear <li> elements remain
 

Sample Application

Here is a small scraper that combines the Cheerio APIs above with axios as the HTTP client (install it with npm install axios):

 
  const axios = require('axios');
  const cheerio = require('cheerio');

  axios.get('https://example.com')
    .then(response => {
      const $ = cheerio.load(response.data);

      // Extract the title of the page
      const title = $('title').text();
      console.log('Page title:', title);

      // Get all links with their text
      $('a').each((index, element) => {
        const text = $(element).text();
        const href = $(element).attr('href');
        console.log(text, href);
      });
    })
    .catch(error => {
      console.error('Error fetching the page:', error);
    });
 

Because Cheerio reuses jQuery's selector, traversal, and manipulation API, anyone familiar with jQuery can parse and transform HTML on the server with very little new syntax.
