Introduction to iCrawler
iCrawler is a small Python framework for web crawling, best known for its built-in image crawlers that download search results from engines such as Google, Bing, and Baidu. This article walks through its API with code snippets and a small example app to illustrate practical usage.
Getting Started with iCrawler
To begin using iCrawler, you’ll need to install it via pip:
pip install icrawler
Basic Usage
The basic structure of an iCrawler script involves creating a crawler instance, specifying search criteria, and starting the crawl. The following code demonstrates a simple example using the GoogleImageCrawler:
from icrawler.builtin import GoogleImageCrawler
google_crawler = GoogleImageCrawler(storage={'root_dir': 'images'})
google_crawler.crawl(keyword='puppies', max_num=10)
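The same pattern extends to the other built-in crawlers. The following is a minimal sketch, not a definitive recipe: it runs one keyword through the Google, Bing, and Baidu image crawlers and gives each source its own sub-directory so their numbered files do not collide. The names SOURCES, storage_for, and crawl_all are hypothetical helpers introduced for illustration; only the crawler classes and the storage and crawl arguments come from iCrawler.

```python
from pathlib import Path

# Built-in crawler class names, keyed by a short label chosen for this sketch.
SOURCES = {
    "google": "GoogleImageCrawler",
    "bing": "BingImageCrawler",
    "baidu": "BaiduImageCrawler",
}


def storage_for(source, base_dir="images"):
    """Return a storage config with one sub-directory per source."""
    return {"root_dir": str(Path(base_dir) / source)}


def crawl_all(keyword, max_num=10, base_dir="images"):
    """Run the same keyword through each built-in crawler in turn."""
    # Imported lazily so the helpers above work without icrawler installed.
    from icrawler import builtin

    for source, class_name in SOURCES.items():
        crawler_cls = getattr(builtin, class_name)
        crawler = crawler_cls(storage=storage_for(source, base_dir))
        crawler.crawl(keyword=keyword, max_num=max_num)
```

Calling crawl_all('puppies', max_num=5) would then fill images/google, images/bing, and images/baidu. Search engines change their result pages from time to time, so any one source may return fewer images than requested.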
Advanced Usage
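iCrawler also supports size filters, automatic file numbering, and custom handling of each download. The crawl method does not take a callback argument; to react to each download, subclass ImageDownloader and override its process_meta hook, which runs after every download attempt. The example below assumes the task dictionary carries the file_url, success, and filename fields that iCrawler sets at the time of writing:
from icrawler import ImageDownloader
from icrawler.builtin import GoogleImageCrawler

class ReportingDownloader(ImageDownloader):
    # Called after each download attempt; 'success' is False when the
    # image was skipped, filtered out by size, or failed to download.
    def process_meta(self, task):
        if task.get('success'):
            print(f"Image downloaded from {task['file_url']} as {task['filename']}")
        else:
            print(f"Image not saved: {task['file_url']}")

google_crawler = GoogleImageCrawler(
    downloader_cls=ReportingDownloader,
    storage={'root_dir': 'images'}
)
google_crawler.crawl(
    keyword='kittens',
    max_num=10,
    min_size=(200, 200),    # skip images smaller than 200x200 pixels
    max_size=None,          # no upper size limit
    file_idx_offset='auto'  # continue numbering after existing files
)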
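Because size filters and failed requests can leave fewer files on disk than max_num, it can help to check the storage directory once the crawl finishes. The sketch below uses only the standard library; summarize_downloads and IMAGE_SUFFIXES are hypothetical names for this example, and the suffix list is an assumption about which file types count as images.

```python
from pathlib import Path

# File extensions treated as images in this sketch (an assumption).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}


def summarize_downloads(root_dir):
    """Count image files directly under root_dir and total their size."""
    root = Path(root_dir)
    if not root.is_dir():
        # Nothing was downloaded, or the directory was never created.
        return {"count": 0, "bytes": 0}
    files = [
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    ]
    return {"count": len(files), "bytes": sum(p.stat().st_size for p in files)}
```

After the crawl above, summarize_downloads('images') reports how many images were actually saved, which can then be compared against the requested max_num.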
Integrating APIs in an App
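A crawler can also be driven from a web app. Here is a simple Flask example that lets users enter a search term and retrieves images using GoogleImageCrawler. It assumes that index.html and results.html templates exist in the app's templates directory. Note that the crawl call blocks until the downloads finish, so the response is not returned until then: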
from flask import Flask, request, render_template
from icrawler.builtin import GoogleImageCrawler

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'POST':
        keyword = request.form['keyword']
        google_crawler = GoogleImageCrawler(storage={'root_dir': 'static/images'})
        google_crawler.crawl(keyword=keyword, max_num=5)
        return render_template('results.html', keyword=keyword)
    return render_template('index.html')

if __name__ == '__main__':
    app.run(debug=True)
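The app above writes every search into the same static/images directory and gives the results template nothing to display except the keyword. One possible refinement, sketched here under stated assumptions, is to derive a safe per-keyword directory name from the user input and to list the saved files as static URLs for the template. The helpers keyword_slug and image_urls are hypothetical, as is the /static/images/<slug>/ URL layout.

```python
import re
from pathlib import Path

# File extensions treated as images in this sketch (an assumption).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif"}


def keyword_slug(keyword, max_len=50):
    """Reduce free-form user input to a safe directory name."""
    # Anything other than lowercase letters and digits becomes a hyphen,
    # so path separators and dots cannot escape the images directory.
    slug = re.sub(r"[^a-z0-9]+", "-", keyword.lower()).strip("-")
    return slug[:max_len] or "untitled"


def image_urls(static_root, keyword):
    """List static URLs for the images saved for one keyword."""
    folder = Path(static_root) / "images" / keyword_slug(keyword)
    if not folder.is_dir():
        return []
    return [
        f"/static/images/{folder.name}/{p.name}"
        for p in sorted(folder.iterdir())
        if p.suffix.lower() in IMAGE_SUFFIXES
    ]
```

In the route, the crawler's root_dir would become 'static/images/' + keyword_slug(keyword), and image_urls('static', keyword) could be passed to render_template so that results.html can loop over the URLs.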
Conclusion
iCrawler handles common image-crawling tasks in Python with very little code: a built-in crawler, a keyword, and a storage directory are enough to get started, and the downloader can be subclassed when more control is needed. Use the code snippets and app example in this guide as a starting point for your own data projects.