Introduction to Puppeteer Cluster
Puppeteer Cluster (the puppeteer-cluster package on npm) is a Node.js library that runs a pool of Puppeteer workers in parallel. You queue jobs, typically URLs, and the cluster distributes them across a bounded number of workers, which makes scraping and automation runs faster than processing pages one at a time.
Getting Started
const { Cluster } = require('puppeteer-cluster');

(async () => {
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_PAGE,
    maxConcurrency: 2,
  });

  await cluster.task(async ({ page, data: url }) => {
    await page.goto(url);
    const title = await page.title();
    console.log(`Title of ${url} is ${title}`);
  });

  cluster.queue('http://www.wikipedia.org/');
  cluster.queue('http://www.google.com/');

  await cluster.idle();
  await cluster.close();
})();
Useful APIs
Cluster.launch
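Creates and starts a new cluster. The concurrency option selects how workers are isolated from each other (Cluster.CONCURRENCY_PAGE gives each job its own page in a shared browser), and maxConcurrency caps how many workers run at once.
const cluster = await Cluster.launch({
  concurrency: Cluster.CONCURRENCY_PAGE,
  maxConcurrency: 2,
});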
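To make the effect of maxConcurrency concrete, here is a minimal sketch in plain Node.js of what a concurrency cap means: at most `limit` jobs are in flight at once, and the next job starts as soon as a slot frees up. This is not how puppeteer-cluster is implemented; `runWithLimit` and its parameters are hypothetical names used only for illustration.

```javascript
// Illustration only: a tiny promise pool, NOT puppeteer-cluster's internals.
// `runWithLimit` is a hypothetical helper name.
async function runWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each "lane" keeps pulling items until none are left.
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  // Start at most `limit` lanes; they all share the `next` counter.
  const lanes = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    lanes.push(lane());
  }
  await Promise.all(lanes);
  return results; // same order as `items`
}
```

With a limit of 2, a third job does not start until one of the first two has finished, which is the behaviour the maxConcurrency: 2 setting asks the cluster for.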
cluster.task
Defines the task function that runs for each job in the queue. It is called on the cluster instance and receives an object containing the page to work with and the queued data.
await cluster.task(async ({ page, data: url }) => {
  await page.goto(url);
  const title = await page.title();
  console.log(`Title of ${url} is ${title}`);
});
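The task shown here only logs each title. A common variation is to collect the results into an array that outlives the cluster. The following is a sketch under assumptions, not the library's prescribed pattern: `collectTitles` is a hypothetical helper, and the Cluster class is passed in as a parameter so the function can also be exercised with a stand-in object.

```javascript
// Hypothetical helper: collects page titles instead of logging them.
// Real usage: collectTitles(require('puppeteer-cluster').Cluster, urls)
async function collectTitles(Cluster, urls) {
  const results = [];
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_PAGE,
    maxConcurrency: 2,
  });

  // Same task as before, but it stores the title instead of printing it.
  await cluster.task(async ({ page, data: url }) => {
    await page.goto(url);
    results.push({ url, title: await page.title() });
  });

  for (const url of urls) {
    cluster.queue(url);
  }

  await cluster.idle();
  await cluster.close();
  return results; // completion order, not necessarily queue order
}
```

Because jobs run in parallel, entries arrive in whatever order the pages finish loading, so sort the array afterwards if order matters.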
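cluster.queue
Adds a job to the queue. The value you pass is handed to the task function as data; it is usually a URL, but it can be any value the task knows how to handle.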
cluster.queue('http://www.wikipedia.org/');
cluster.queue('http://www.google.com/');
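Since the queued value reaches the task function as `data`, a job can carry more than a bare URL. A minimal sketch, assuming each job is a plain object with hypothetical `url` and `label` fields and that `cluster` has already been created with Cluster.launch; the helper names are illustrative only.

```javascript
// Hypothetical job shape: { url, label }. The field names are illustrative only.
function queueLabelledJobs(cluster, jobs) {
  for (const { url, label } of jobs) {
    cluster.queue({ url, label });
  }
}

// Matching task function: destructure the queued object instead of a bare URL.
async function labelledTask({ page, data: { url, label } }) {
  await page.goto(url);
  const title = await page.title();
  console.log(`[${label}] Title of ${url} is ${title}`);
}
```

In use, register the task once with `await cluster.task(labelledTask)` and then call `queueLabelledJobs(cluster, jobs)` with an array of such objects.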
cluster.idle
Returns a promise that resolves once the queue is empty and all queued jobs have finished.
await cluster.idle();
cluster.close
Closes the cluster and the Puppeteer browser instances it started, freeing their resources.
await cluster.close();
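If anything throws between launching and closing, the browser processes can be left running. One way to guard against that is a try/finally wrapper. This is a sketch under assumptions: `withCluster` is a hypothetical helper, not part of puppeteer-cluster, and the Cluster class is passed in so the wrapper can be tried with a stand-in object.

```javascript
// Hypothetical wrapper: makes sure cluster.close() runs even if `use` throws.
// Real usage: withCluster(require('puppeteer-cluster').Cluster, options, async (cluster) => { ... })
async function withCluster(Cluster, options, use) {
  const cluster = await Cluster.launch(options);
  try {
    await use(cluster);   // define the task and queue jobs here
    await cluster.idle(); // let queued jobs finish before shutting down
  } finally {
    await cluster.close();
  }
}
```

The callback only needs to define the task and queue jobs; waiting for the queue to drain and closing the cluster are handled by the wrapper.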
Complete Example App using Puppeteer Cluster
const { Cluster } = require('puppeteer-cluster');

(async () => {
  // Launch a cluster with at most two pages open at once
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_PAGE,
    maxConcurrency: 2,
  });

  // Task to scrape titles from URLs
  await cluster.task(async ({ page, data: url }) => {
    await page.goto(url);
    const title = await page.title();
    console.log(`Title of ${url} is ${title}`);
  });

  // Adding URLs to the queue
  const urls = [
    'http://www.wikipedia.org/',
    'http://www.google.com/',
    'http://www.github.com/',
    'http://www.stackoverflow.com/',
  ];
  urls.forEach(url => cluster.queue(url));

  // Wait for the cluster to finish
  await cluster.idle();
  await cluster.close();
})();
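The example does not report failures: if page.goto rejects for one URL, nothing is printed for that job. puppeteer-cluster emits a `taskerror` event for jobs whose task function throws, with the error, the queued data, and a flag saying whether the job will be retried. A minimal sketch of recording those failures, where `trackFailures` is a hypothetical helper name:

```javascript
// Hypothetical helper: records jobs that failed for good (no retry pending).
// Call it right after Cluster.launch, before queueing any jobs.
function trackFailures(cluster) {
  const failures = [];
  cluster.on('taskerror', (err, data, willRetry) => {
    if (willRetry) {
      console.warn(`Retrying ${data}: ${err.message}`);
    } else {
      failures.push({ data, message: err.message });
    }
  });
  return failures; // filled in as jobs fail; inspect after cluster.idle()
}
```

After `await cluster.idle()`, the returned array lists every URL that could not be scraped, which can be logged or queued again.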
With Puppeteer Cluster, a bounded pool of workers processes a queue of jobs for you, so a scraping script can work through many URLs in parallel without managing browsers and pages by hand.