Introduction to Puppeteer Cluster

Puppeteer Cluster is a library built on Puppeteer, the popular headless browser automation tool. It runs many Puppeteer jobs in parallel across a pool of workers. Depending on the chosen concurrency model, a worker is a page, an incognito browser context, or a separate browser instance. This makes it useful for web scraping, automated testing, and crawling, where throughput and scalability matter.

Getting Started

Puppeteer Cluster requires Node.js. It does not bundle Puppeteer itself, so install both packages:

    npm install puppeteer puppeteer-cluster

The following script launches a cluster of up to five workers, queues two URLs, and prints each page's title:

  
    const { Cluster } = require('puppeteer-cluster');

    (async () => {
      const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        maxConcurrency: 5,
      });

      await cluster.task(async ({ page, data: url }) => {
        await page.goto(url);
        const title = await page.title();
        console.log(`Title of ${url} is ${title}`);
      });

      await cluster.queue('http://www.google.com');
      await cluster.queue('http://www.github.com');

      await cluster.idle();
      await cluster.close();
    })();

Useful APIs

Cluster.launch

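Launches a new cluster with the given options and returns a promise that resolves to the cluster. The concurrency option selects the worker model (Cluster.CONCURRENCY_PAGE, Cluster.CONCURRENCY_CONTEXT, or Cluster.CONCURRENCY_BROWSER). maxConcurrency caps the number of parallel workers, and puppeteerOptions is passed through to Puppeteer when browsers are launched.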

      
        const cluster = await Cluster.launch({
          concurrency: Cluster.CONCURRENCY_CONTEXT,
          maxConcurrency: 10,
          puppeteerOptions: {
            headless: true,
          },
        });

Cluster.task

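Sets the default task function, which runs once for every queued job. It receives an object containing page (the Puppeteer page for this job), data (the value passed to queue), and worker (information about the worker, such as its id).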

      
        await cluster.task(async ({ page, data: url }) => {
          await page.goto(url);
          const bodyHandle = await page.$('body');
          const html = await page.evaluate(body => body.innerHTML, bodyHandle);
          console.log(html);
          await bodyHandle.dispose();
        });

Cluster.queue

Adds a job to the cluster’s queue. The value passed in is delivered to the task as data.

      
        await cluster.queue('http://www.example.com');
        await cluster.queue('http://www.wikipedia.org');

Cluster.idle

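Returns a promise that resolves once the queue is empty and no job is still running. This includes jobs that were queued while the cluster was already working.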

      
        await cluster.idle();

Cluster.close

Closes the cluster and all the browser instances it manages. Call it after idle() so that queued jobs are not cut off.

      
        await cluster.close();

Example Application

The following example fetches the title and the first paragraph of several web pages, processing at most three pages at a time:

  
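    const { Cluster } = require('puppeteer-cluster');

    (async () => {
      const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        maxConcurrency: 3,
      });

      // Log failed jobs (navigation errors, timeouts) along with the URL that failed.
      cluster.on('taskerror', (err, url) => {
        console.error(`Error processing ${url}: ${err.message}`);
      });

      await cluster.task(async ({ page, data: url }) => {
        await page.goto(url);
        const title = await page.title();
        // $$eval returns an empty array when the page has no <p>; $eval would throw.
        const paragraphs = await page.$$eval('p', (els) => els.map((el) => el.innerText));
        const firstParagraph = paragraphs.length > 0 ? paragraphs[0] : '(none)';
        // Prefix each line with the URL, because concurrent jobs interleave their output.
        console.log(`${url} | Title: ${title}`);
        console.log(`${url} | First paragraph: ${firstParagraph}`);
      });

      const urls = [
        'https://www.google.com',
        'https://www.github.com',
        'https://www.wikipedia.org',
      ];

      for (const url of urls) {
        await cluster.queue(url);
      }

      await cluster.idle();
      await cluster.close();
    })();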

This script processes the URLs concurrently, with at most three jobs in flight (maxConcurrency: 3), and logs each page's title and first paragraph. Jobs finish in no fixed order, so the order of the output can vary between runs.
