Pyppeteer: The Ultimate Guide to Web Automation and Testing
Pyppeteer is an unofficial Python port of Puppeteer, the popular Node.js library that provides a high-level API for controlling headless Chrome or Chromium over the DevTools Protocol. It is well suited to web automation, scraping, and testing. This guide introduces several useful Pyppeteer APIs with code snippets and ends with a practical example app.
Installing Pyppeteer
To get started, install Pyppeteer with pip. The first time you launch a browser, Pyppeteer downloads a compatible Chromium build if one is not already present.
pip install pyppeteer
Launching a Browser
The first step to using Pyppeteer is to launch the browser. Here is how you can do it:
import asyncio
from pyppeteer import launch

async def main():
    # Start a headless Chromium instance, then shut it down.
    browser = await launch()
    await browser.close()

asyncio.run(main())
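launch() also accepts keyword options. The following is a minimal sketch, assuming you want a visible browser window while debugging. build_launch_options is a hypothetical helper, not part of Pyppeteer; headless and slowMo are launch options.

```python
import asyncio


def build_launch_options(debug: bool = False) -> dict:
    """Return keyword options for pyppeteer's launch() (hypothetical helper)."""
    options = {"headless": not debug}
    if debug:
        # slowMo delays each pyppeteer operation by this many milliseconds,
        # which makes a visible run easier to follow.
        options["slowMo"] = 100
    return options


async def main() -> None:
    # Imported here so the helper above works without pyppeteer installed.
    from pyppeteer import launch

    browser = await launch(**build_launch_options(debug=True))
    await browser.close()

# To run the sketch: asyncio.run(main())
```

Keeping the options in one function makes it easy to switch between a headless run and a visible debugging run.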
Opening a New Page
Once the browser is up and running, you can open a new page:
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    # Open a new tab and navigate to the target URL.
    page = await browser.newPage()
    await page.goto('https://example.com')
    await browser.close()

asyncio.run(main())
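page.goto() takes navigation options and returns the main response. The sketch below assumes a browser that has already been launched. fetch_status is a hypothetical helper name, and the 15-second timeout is an arbitrary choice.

```python
from typing import Optional


async def fetch_status(browser, url: str) -> Optional[int]:
    """Open url in a new page and return its HTTP status (hypothetical helper)."""
    page = await browser.newPage()
    try:
        # waitUntil selects the event that ends navigation;
        # timeout is given in milliseconds.
        response = await page.goto(
            url, {"waitUntil": "domcontentloaded", "timeout": 15000}
        )
        # goto() can return None, for example when only the URL hash changes.
        return response.status if response is not None else None
    finally:
        # Close the page even if navigation fails.
        await page.close()
```

The try/finally block ensures the page is closed whether or not navigation succeeds.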
Taking a Screenshot
Need to capture a screenshot of a page? It’s simple with Pyppeteer:
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://example.com')
    # Save a PNG of the current viewport to example.png.
    await page.screenshot({'path': 'example.png'})
    await browser.close()

asyncio.run(main())
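By default the screenshot covers only the visible viewport. Here is a minimal sketch for capturing the whole page at a fixed viewport size. capture_full_page is a hypothetical helper, and the 1280x800 default is an assumption.

```python
async def capture_full_page(
    page, path: str, width: int = 1280, height: int = 800
) -> None:
    """Save a full-page screenshot at a fixed viewport size (hypothetical helper)."""
    # Fix the viewport first so the layout is reproducible between runs.
    await page.setViewport({"width": width, "height": height})
    # fullPage=True captures the whole scrollable page, not just the viewport.
    await page.screenshot({"path": path, "fullPage": True})
```

Call it after page.goto(), for example: await capture_full_page(page, 'example.png').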
Generating a PDF
Pyppeteer can also be used to generate PDFs:
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://example.com')
    # Render the page to example.pdf (requires headless mode).
    await page.pdf({'path': 'example.pdf'})
    await browser.close()

asyncio.run(main())
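page.pdf() renders the page with print CSS by default, so the output can look different from the browser view. The sketch below assumes you want the on-screen appearance on A4 paper. save_pdf is a hypothetical helper.

```python
async def save_pdf(page, path: str) -> None:
    """Write the page to a PDF that matches its on-screen styling (hypothetical helper)."""
    # Switch from print CSS to screen CSS before rendering.
    await page.emulateMedia("screen")
    # printBackground keeps background colors and images in the output.
    await page.pdf({"path": path, "format": "A4", "printBackground": True})
```

Call it after page.goto(), for example: await save_pdf(page, 'example.pdf').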
Filling Out Forms
Here’s how to fill out and submit forms with Pyppeteer:
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://example.com/form')
    # Type into each field, then click the submit button.
    await page.type('#name', 'John Doe')
    await page.type('#email', 'john.doe@example.com')
    await page.click('#submit')
    await browser.close()

asyncio.run(main())
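If submitting the form loads a new page, closing the browser right after the click can cut the request short. Here is a minimal sketch, assuming the submit button triggers a full navigation. submit_and_wait is a hypothetical helper.

```python
import asyncio


async def submit_and_wait(page, submit_selector: str) -> None:
    """Click a submit button and wait for the resulting navigation (hypothetical helper)."""
    # Start waiting before the click so the navigation event is not missed.
    await asyncio.gather(
        page.waitForNavigation(),
        page.click(submit_selector),
    )
```

Use it in place of the bare click, for example: await submit_and_wait(page, '#submit'). If the form submits without a page load, waitForNavigation() will time out, so this pattern only fits forms that navigate.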
Waiting for Selectors
Sometimes, you need to wait for specific elements before performing actions:
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://example.com')
    # Block until the element exists and is visible, then click it.
    await page.waitForSelector('#element', {'visible': True})
    await page.click('#element')
    await browser.close()

asyncio.run(main())
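waitForSelector() raises a timeout error when the element never appears. The sketch below treats a missing element as a normal outcome and not a crash. click_if_present and the WaitTimeout alias are hypothetical names, and the fallback to asyncio.TimeoutError is an assumption that only exists so the sketch can be read and exercised without Pyppeteer installed.

```python
import asyncio

try:
    from pyppeteer.errors import TimeoutError as WaitTimeout
except ImportError:
    # Assumption: stand-in used only when pyppeteer is not installed.
    WaitTimeout = asyncio.TimeoutError


async def click_if_present(page, selector: str, timeout_ms: int = 5000) -> bool:
    """Click selector if it becomes visible in time; return whether it was clicked."""
    try:
        # timeout is given in milliseconds.
        await page.waitForSelector(selector, {"visible": True, "timeout": timeout_ms})
    except WaitTimeout:
        return False
    await page.click(selector)
    return True
```

Returning a boolean lets the caller decide whether a missing element is an error or simply an optional step.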
Building a Web Scraper App
Now, let’s create a simple web scraper that extracts titles from a webpage:
import asyncio
from pyppeteer import launch

async def scrape_titles(url):
    browser = await launch()
    page = await browser.newPage()
    await page.goto(url)
    # Run JavaScript in the page to collect the text of every heading.
    titles = await page.evaluate('''
        () => {
            const elements = document.querySelectorAll('h1, h2, h3');
            return Array.from(elements).map(element => element.textContent);
        }
    ''')
    await browser.close()
    return titles

async def main():
    url = 'https://example.com'
    titles = await scrape_titles(url)
    print(titles)

asyncio.run(main())
This simple scraper launches a browser, navigates to a given URL, extracts the text content of every <h1>, <h2>, and <h3> element, and prints the results.
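textContent often includes line breaks and indentation from the page markup. Here is a minimal sketch of a post-processing step for the scraped list. clean_titles is a hypothetical helper, not part of Pyppeteer.

```python
from typing import List


def clean_titles(raw_titles: List[str]) -> List[str]:
    """Collapse whitespace in each title and drop empty entries (hypothetical helper)."""
    cleaned = []
    for title in raw_titles:
        # split() with no arguments splits on any run of whitespace.
        text = " ".join(title.split())
        if text:
            cleaned.append(text)
    return cleaned
```

Apply it to the scraper's return value, for example: print(clean_titles(titles)).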
Pyppeteer covers a wide range of web automation and testing tasks, from scraping data to testing end-to-end flows and automating complex interactions.