Comprehensive Guide to Puppeteer APIs for Web Automation and Scraping

Introduction to Puppeteer

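Puppeteer is a Node.js library that provides a high-level API for controlling Chrome or Chromium over the Chrome DevTools Protocol. By default it runs the browser in headless mode, without a visible window. It is commonly used for web scraping, automated testing, capturing screenshots and PDFs, and automating form submission and other page interactions.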

Getting Started

  
    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto('https://example.com');
      await page.screenshot({ path: 'example.png' });

      await browser.close();
    })();
  

Ten Useful APIs

1. Launch a Browser

  
    const browser = await puppeteer.launch({
      headless: true // Set to false to see the browser UI
    });
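Launch options are an ordinary object, so it can help to build them in one place. That lets a script switch between a headless run (for CI or scraping) and a visible, slowed-down run for debugging. The sketch below shows one way to do this. launchOptions and its debug flag are hypothetical names. headless, slowMo and defaultViewport are real puppeteer.launch() options, and the viewport size is an arbitrary example value.

```javascript
// Hypothetical helper: builds an options object for puppeteer.launch().
// headless, slowMo and defaultViewport are real launch options; the debug flag is our own.
function launchOptions({ debug = false } = {}) {
  return {
    // Show a browser window while debugging, run headless otherwise.
    headless: !debug,
    // Slow each Puppeteer operation down by 100 ms so actions can be followed by eye.
    slowMo: debug ? 100 : 0,
    // A fixed viewport keeps screenshots the same size between runs.
    defaultViewport: { width: 1280, height: 800 }
  };
}
```

Usage, assuming a DEBUG environment variable of our own choosing: const browser = await puppeteer.launch(launchOptions({ debug: process.env.DEBUG === '1' }));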
  

2. Create a New Page

  
    const page = await browser.newPage();
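Settings that apply to a whole page are usually made right after browser.newPage(). Here is a minimal sketch using a hypothetical configurePage helper with arbitrary default values. page.setViewport() fixes the viewport size. page.setDefaultTimeout() changes the default time limit, in milliseconds, for waits and navigations on that page.

```javascript
// Hypothetical helper: applies per-page settings right after browser.newPage().
async function configurePage(page, { width = 1280, height = 800, timeoutMs = 15000 } = {}) {
  // Viewport size in CSS pixels; affects page layout and screenshot dimensions.
  await page.setViewport({ width, height });
  // Default time limit (ms) for waits such as waitForSelector() and for navigations.
  page.setDefaultTimeout(timeoutMs);
  return page;
}
```

Usage: const page = await configurePage(await browser.newPage(), { width: 1024, height: 768 });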
  

3. Navigate to a URL

  
    await page.goto('https://example.com');
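page.goto() resolves with the response for the main document. It does not reject when the server answers with an error status such as 404 or 500. It rejects only on failures like network errors or timeouts. It can also resolve with null, for example on a same-document hash navigation. The hypothetical gotoOrThrow helper below turns HTTP errors into exceptions. It also uses the real waitUntil and timeout options of goto().

```javascript
// Hypothetical helper: navigates to a URL and fails fast on HTTP error statuses.
async function gotoOrThrow(page, url, { timeout = 30000 } = {}) {
  const response = await page.goto(url, {
    // Treat navigation as finished once there have been no more than
    // 2 network connections for at least 500 ms.
    waitUntil: 'networkidle2',
    timeout // milliseconds
  });
  // goto() may resolve with null (e.g. a same-document hash navigation).
  if (response && !response.ok()) {
    throw new Error(`Failed to load ${url}: HTTP ${response.status()}`);
  }
  return response;
}
```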
  

4. Taking Screenshots

  
    await page.screenshot({ path: 'example.png' });
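To capture one element rather than the whole viewport, look the element up with page.$() and call screenshot() on the returned ElementHandle. Here is a sketch using a hypothetical screenshotElement helper.

```javascript
// Hypothetical helper: screenshots a single element instead of the viewport.
async function screenshotElement(page, selector, path) {
  // page.$() resolves to an ElementHandle, or null when nothing matches.
  const element = await page.$(selector);
  if (!element) {
    throw new Error(`No element matches ${selector}`);
  }
  try {
    // Captures only the element's bounding box, scrolling it into view if needed.
    await element.screenshot({ path });
  } finally {
    // Release the handle once it is no longer needed.
    await element.dispose();
  }
}
```

Usage: await screenshotElement(page, 'h1', 'heading.png');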
  

5. Generating PDFs

  
    await page.pdf({ path: 'example.pdf', format: 'A4' });
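page.pdf() renders the page with its print stylesheet and, by default, leaves out background colors and images. The sketch below uses a hypothetical savePdf helper, and its 1 cm margins are arbitrary example values. It switches to screen styles with page.emulateMediaType('screen') and turns on the real printBackground option.

```javascript
// Hypothetical helper: renders the page to an A4 PDF using its on-screen styles.
async function savePdf(page, path) {
  // Without this call, pdf() uses the page's print media styles.
  await page.emulateMediaType('screen');
  await page.pdf({
    path,
    format: 'A4',
    // Include background colors and images, which are omitted by default.
    printBackground: true,
    margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' }
  });
}
```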
  

6. Evaluating JavaScript in the Browser Context

  
    const title = await page.evaluate(() => document.title);
    console.log(title);
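The function passed to page.evaluate() is serialized and run inside the page, so it cannot read variables from the surrounding Node.js code. Any values it needs must be passed as extra arguments to evaluate(). Both those arguments and the return value must be serializable. Here is a minimal sketch using a hypothetical countMatches helper.

```javascript
// Hypothetical helper: counts elements matching a CSS selector inside the page.
async function countMatches(page, selector) {
  // The callback runs in the browser, so `selector` is passed as an argument
  // instead of being captured from this Node.js scope.
  return page.evaluate(sel => document.querySelectorAll(sel).length, selector);
}
```

Usage: console.log(await countMatches(page, 'p'));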
  

7. Interacting with Elements

  
    await page.type('#search', 'puppeteer');
    await page.click('#searchButton');
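When a click submits a form or follows a link, the wait for the resulting navigation should start before the click. Otherwise a fast page load can finish before waitForNavigation() is called, and the wait then times out. The usual Puppeteer pattern is to run both inside Promise.all(). The hypothetical submitSearch helper below reuses the placeholder #search and #searchButton selectors from the snippet above. Its 50 ms typing delay is an arbitrary example value.

```javascript
// Hypothetical helper: types a query, then clicks a button that triggers a page load.
// '#search' and '#searchButton' are placeholder selectors.
async function submitSearch(page, query) {
  // delay: pause in milliseconds between keystrokes.
  await page.type('#search', query, { delay: 50 });
  // waitForNavigation() is called before click(), so the load cannot be missed.
  const [response] = await Promise.all([
    page.waitForNavigation({ waitUntil: 'domcontentloaded' }),
    page.click('#searchButton')
  ]);
  return response;
}
```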
  

8. Waiting for Selectors

  
    await page.waitForSelector('#result');
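page.waitForSelector() rejects with a TimeoutError if no matching element appears within the timeout. That limit is 30 seconds unless it has been changed with page.setDefaultTimeout(). Some elements may legitimately be absent, such as a cookie banner. For those, a hypothetical waitForOptional helper can turn the timeout into null. The sketch checks err.name so that it does not need to import Puppeteer. Puppeteer also exports the TimeoutError class for instanceof checks.

```javascript
// Hypothetical helper: waits for an element that may legitimately never appear.
async function waitForOptional(page, selector, timeout = 5000) {
  try {
    // visible: true also requires the element to be displayed, not just present in the DOM.
    return await page.waitForSelector(selector, { visible: true, timeout });
  } catch (err) {
    // Puppeteer rejects with a TimeoutError when nothing matches in time.
    if (err.name === 'TimeoutError') {
      return null;
    }
    throw err; // Any other error (e.g. the page was closed) is a real failure.
  }
}
```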
  

9. Handling Dialogs

  
    page.on('dialog', async dialog => {
      console.log(dialog.message());
      await dialog.dismiss();
    });
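A single dialog listener can respond differently depending on dialog.type(), which returns 'alert', 'confirm', 'prompt' or 'beforeunload'. dialog.accept() optionally takes the text to enter into a prompt. In the sketch below, handleDialog is a hypothetical name and the prompt answer 'puppeteer' is an arbitrary example value.

```javascript
// Hypothetical handler: answers each dialog according to its type.
async function handleDialog(dialog) {
  console.log(`${dialog.type()}: ${dialog.message()}`);
  switch (dialog.type()) {
    case 'prompt':
      // Enter example text into the prompt and confirm it.
      await dialog.accept('puppeteer');
      break;
    case 'confirm':
    case 'beforeunload':
      // Answer "OK" / allow the page to unload.
      await dialog.accept();
      break;
    default:
      // Plain alerts only need to be closed.
      await dialog.dismiss();
  }
}
```

Usage: page.on('dialog', handleDialog);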
  

10. Taking Full-Page Screenshots

  
    await page.screenshot({ path: 'fullpage.png', fullPage: true });
  

App Example Using Puppeteer

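Let’s build a small application that combines several of these APIs. It navigates to a page, submits a search, waits for the results, extracts each result's title and link, and saves a full-page screenshot. The URL and the #search, #searchButton and #result selectors are placeholders: example.com has no search form, so substitute the page and selectors of the site you are automating.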

  
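    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch();
      try {
        const page = await browser.newPage();

        // example.com is a placeholder with no search form: replace the URL and the
        // '#search', '#searchButton' and '#result' selectors with real ones.
        await page.goto('https://example.com');
        await page.type('#search', 'puppeteer');
        await page.click('#searchButton');

        // Wait for the results container, then build one object per result item.
        await page.waitForSelector('#result');
        const results = await page.$$eval('#result .item', items =>
          items.map(item => ({
            // Optional chaining avoids a TypeError when an item has no title or link.
            title: item.querySelector('.title')?.innerText.trim() ?? '',
            link: item.querySelector('a')?.href ?? null
          }))
        );

        console.log(results);
        await page.screenshot({ path: 'example_search.png', fullPage: true });
      } finally {
        // Close the browser even if one of the steps above throws.
        await browser.close();
      }
    })();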
  
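Puppeteer sends the callback given to page.$$eval() or page.evaluate() to the browser as source text. As long as that callback uses nothing from the surrounding Node.js scope, it can therefore be written as a standalone function and unit-tested in Node against plain stand-in objects. The sketch below uses a hypothetical extractItems function. The .title and a selectors come from the example above.

```javascript
// Hypothetical extraction callback for page.$$eval('#result .item', extractItems).
// It must not reference outer variables, because only its source text reaches the page.
function extractItems(items) {
  return items.map(item => {
    const titleEl = item.querySelector('.title');
    const linkEl = item.querySelector('a');
    return {
      // Fall back to safe defaults when an item lacks a title or a link.
      title: titleEl ? titleEl.innerText.trim() : '',
      link: linkEl ? linkEl.href : null
    };
  });
}
```

Usage: const results = await page.$$eval('#result .item', extractItems);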

By using Puppeteer, you can automate complex workflows on the web, making it a powerful tool for developers and testers alike.

