Introduction to Puppeteer
Puppeteer is a Node.js library that provides a high-level API for controlling headless Chrome or Chromium over the DevTools Protocol. It can also be configured to drive a full (non-headless) browser. Common uses include web scraping, automated testing, and generating screenshots or PDFs of pages.
This guide will introduce you to Puppeteer and provide several useful API examples with code snippets.
Getting Started with Puppeteer
Install Puppeteer with npm. By default the package also downloads a compatible build of Chrome, so no separate browser install is needed:
npm install puppeteer
Launching a Browser Instance
To launch a browser instance:
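const puppeteer = require('puppeteer');

(async () => {
  // Start a headless browser and open a new tab.
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Load the page and print its title.
  await page.goto('https://example.com');
  console.log(await page.title());
  // Close the browser so the Chrome process exits.
  await browser.close();
})();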
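If any step throws before browser.close() is reached, the browser process keeps running. One way to guard against that is a small wrapper with try/finally. The sketch below assumes a hypothetical helper named withPage (it is not part of Puppeteer) that receives the puppeteer module as an argument:

```javascript
// Hypothetical helper (not a Puppeteer API): runs `task` with a fresh page
// and closes the browser whether or not `task` throws.
async function withPage(puppeteer, task, launchOptions = {}) {
  const browser = await puppeteer.launch(launchOptions);
  try {
    const page = await browser.newPage();
    return await task(page);
  } finally {
    await browser.close();
  }
}

// Example use (assumes `const puppeteer = require('puppeteer')`):
//   const title = await withPage(puppeteer, async (page) => {
//     await page.goto('https://example.com');
//     return page.title();
//   });
```

Passing { headless: false } as the third argument should open a visible browser window, which can help when debugging a script.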
Taking Screenshots
page.screenshot() captures the current viewport and writes it to the file given by path:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // The image format is inferred from the file extension.
  await page.screenshot({ path: 'example.png' });
  await browser.close();
})();
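page.screenshot() accepts more options than path, including type, quality, and fullPage. As a rough sketch, the hypothetical helper below (screenshotOptions is an illustrative name, not a Puppeteer function) builds an options object and only sets quality for JPEG output, since quality does not apply to PNG. It handles only PNG and JPEG:

```javascript
// Hypothetical helper: builds an options object for page.screenshot().
// `path`, `type`, `quality`, and `fullPage` are Puppeteer screenshot options;
// the extension-to-type mapping and the default quality are this sketch's own.
function screenshotOptions(filePath, { fullPage = false, quality = 80 } = {}) {
  const type = /\.jpe?g$/i.test(filePath) ? 'jpeg' : 'png';
  const options = { path: filePath, type, fullPage };
  // `quality` only applies to lossy formats, so omit it for PNG.
  if (type === 'jpeg') {
    options.quality = quality;
  }
  return options;
}

// Example use:
//   await page.screenshot(screenshotOptions('example.jpg', { fullPage: true }));
```

Setting fullPage to true captures the whole scrollable page rather than just the viewport.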
PDF Generation
page.pdf() renders the page as a PDF and writes it to the file given by path:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // Render the loaded page to a PDF file in the working directory.
  await page.pdf({ path: 'example.pdf' });
  await browser.close();
})();
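The PDF output can be tuned with options such as format, printBackground, and margin. The following is a minimal sketch, assuming a hypothetical helper named savePdf and illustrative values (A4 paper, 1 cm margins). Because page.pdf() renders with print CSS by default, the helper can optionally switch to screen styles first:

```javascript
// Hypothetical helper: saves the current page as an A4 PDF.
// The option values are illustrative choices, not Puppeteer defaults.
async function savePdf(page, filePath, { screenStyles = false } = {}) {
  if (screenStyles) {
    // Use the page's screen CSS instead of its print CSS.
    await page.emulateMediaType('screen');
  }
  return page.pdf({
    path: filePath,
    format: 'A4',          // paper size
    printBackground: true, // include CSS background colors and images
    margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' },
  });
}

// Example use, after page.goto(...):
//   await savePdf(page, 'example.pdf', { screenStyles: true });
```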
Intercepting Network Requests
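Request interception lets you inspect each outgoing request and either abort it or let it continue. This example blocks image requests:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Interception must be enabled before the handler can abort or continue requests.
  await page.setRequestInterception(true);
  page.on('request', request => {
    if (request.resourceType() === 'image') {
      // Drop image requests to save bandwidth.
      request.abort();
    } else {
      // Let every other request through unchanged.
      request.continue();
    }
  });
  await page.goto('https://example.com');
  await browser.close();
})();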
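The same pattern extends to several resource types at once. Below is a minimal sketch, assuming a hypothetical handler named handleRequest and an illustrative set of blocked types. Once interception is enabled, every request has to be resolved one way or another, so the handler always ends in abort() or continue():

```javascript
// Resource types to block. The strings are Puppeteer resource types;
// which ones to block is this sketch's assumption.
const BLOCKED_TYPES = new Set(['image', 'media', 'font']);

// Hypothetical handler for page.on('request', ...).
function handleRequest(request) {
  if (BLOCKED_TYPES.has(request.resourceType())) {
    return request.abort();
  }
  return request.continue();
}

// Example use:
//   await page.setRequestInterception(true);
//   page.on('request', handleRequest);
```

Keeping the blocked types in a Set makes the list easy to adjust without touching the handler logic.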
Filling Forms
Type into form fields, submit the form, and wait for the resulting page to load:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com/form');
  await page.type('#name', 'John Doe');
  await page.type('#email', 'johndoe@example.com');
  // Start waiting for the navigation before clicking; otherwise a fast
  // navigation can finish before waitForNavigation() is called.
  await Promise.all([
    page.waitForNavigation(),
    page.click('button[type="submit"]'),
  ]);
  console.log('Form submitted successfully!');
  await browser.close();
})();
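The individual page.type() calls generalize to any number of fields. A minimal sketch, assuming a hypothetical helper named fillAndSubmit that takes a map from CSS selector to value, and assuming that submitting the form triggers a navigation:

```javascript
// Hypothetical helper: types each value into the field matched by its selector,
// then clicks the submit button and waits for the resulting navigation.
async function fillAndSubmit(page, fields, submitSelector) {
  for (const [selector, value] of Object.entries(fields)) {
    // Wait for the field to exist before typing into it.
    await page.waitForSelector(selector);
    await page.type(selector, value);
  }
  // Start waiting before clicking so a fast navigation is not missed.
  await Promise.all([
    page.waitForNavigation(),
    page.click(submitSelector),
  ]);
}

// Example use:
//   await fillAndSubmit(
//     page,
//     { '#name': 'John Doe', '#email': 'johndoe@example.com' },
//     'button[type="submit"]'
//   );
```

If the form submits without navigating (for example through a fetch call), waitForNavigation() would time out, and waiting for a selector on the result would be a better fit.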
Creating an App with Puppeteer
Here is an example of a simple app using Puppeteer APIs:
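const puppeteer = require('puppeteer');
const express = require('express');

const app = express();
const port = 3000;

app.get('/screenshot', async (req, res) => {
  const url = req.query.url;
  if (!url) {
    return res.status(400).send('URL is required');
  }

  let browser;
  try {
    browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto(url);
    const screenshot = await page.screenshot({ encoding: 'binary' });
    res.contentType('image/png');
    // Wrap in a Buffer: some Puppeteer versions return a Uint8Array,
    // which some Express versions would serialize as JSON.
    res.send(Buffer.from(screenshot));
  } catch (err) {
    // Launch or navigation failures produce a 500 instead of an unhandled rejection.
    res.status(500).send('Failed to capture screenshot');
  } finally {
    // Always close the browser, even when navigation fails.
    if (browser) await browser.close();
  }
});

app.listen(port, () => {
  console.log(`App listening at http://localhost:${port}`);
});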
This example sets up an Express server that listens for GET requests at the /screenshot endpoint and takes screenshots of the URLs passed as query parameters.
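Because the endpoint navigates to whatever URL the caller supplies, it is worth validating that value before passing it to page.goto(). A minimal sketch, assuming a hypothetical helper named parseTargetUrl that accepts only http and https URLs:

```javascript
// Hypothetical helper: returns a normalized http(s) URL string, or null if
// the input is missing, malformed, or uses another scheme (file:, javascript:, ...).
function parseTargetUrl(raw) {
  if (typeof raw !== 'string') return null;
  let parsed;
  try {
    parsed = new URL(raw);
  } catch {
    return null; // not a valid absolute URL
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return null;
  }
  return parsed.href;
}

// Example use inside the route handler:
//   const url = parseTargetUrl(req.query.url);
//   if (!url) return res.status(400).send('A valid http(s) URL is required');
```

This check only filters by scheme. It does not stop requests to internal hosts, so a publicly exposed service would likely need further restrictions.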
Conclusion
Puppeteer is a capable tool for web scraping and browser automation. The examples above cover launching a browser, capturing screenshots and PDFs, intercepting network requests, filling out forms, and wrapping these calls in a small Express service.