Scrapy
Scrapy is a robust, full-featured web scraping framework that allows you to:
- explore a whole website from a single URL (crawling)
- rate-limit the exploration to avoid getting banned
- generate data exports in CSV, JSON, and XML
- store the data in S3, databases, etc.
- handle cookies and sessions
- use HTTP features like compression, authentication, and caching
- spoof the user agent
- respect robots.txt
- restrict the crawl depth
- and more
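Several of the features above are switched on through a Scrapy project's settings file. The fragment below is a minimal sketch of what that might look like. The setting names are real Scrapy settings, but every value, the output file names, and the user-agent string are assumptions chosen for illustration, not recommendations.

```python
# settings.py of a hypothetical Scrapy project (illustrative values only)

# Honor the rules published in each site's robots.txt.
ROBOTSTXT_OBEY = True

# Rate-limit the crawl: wait between requests and let AutoThrottle
# adapt the delay to how quickly the server responds.
DOWNLOAD_DELAY = 1.0
AUTOTHROTTLE_ENABLED = True
CONCURRENT_REQUESTS_PER_DOMAIN = 2

# Stop following links more than three hops away from the start URLs.
DEPTH_LIMIT = 3

# Keep cookies between requests so sessions survive across pages.
COOKIES_ENABLED = True

# Cache HTTP responses on disk to avoid re-downloading during development.
HTTPCACHE_ENABLED = True

# Identify the crawler with a custom user agent (placeholder string).
USER_AGENT = "example-crawler/0.1 (+https://example.com/contact)"

# Export scraped items to JSON and CSV feeds (placeholder file names).
FEEDS = {
    "items.json": {"format": "json"},
    "items.csv": {"format": "csv"},
}
```

A spider in the same project would pick these values up automatically. Individual spiders can override them if one site needs gentler or stricter limits.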
However, Scrapy can be a bit hard to use, especially for beginners. If you want to learn it, check out our Scrapy tutorial.
If you only need to scrape a few simple webpages, we suggest you use a standard Python HTTP client and BeautifulSoup.
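As a rough sketch of that simpler approach, the example below pairs the standard library's `urllib.request` with BeautifulSoup (the `bs4` package, installed separately). The function names `fetch_html` and `extract_title_and_links`, the user-agent string, and the choice of what to extract are assumptions made for illustration.

```python
from typing import Optional
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its body as text."""
    # Placeholder user agent; some sites reject the default urllib one.
    request = Request(url, headers={"User-Agent": "simple-scraper-example/0.1"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_title_and_links(html: str) -> tuple[Optional[str], list[str]]:
    """Return the page title (if any) and every href found in <a> tags."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else None
    # href=True skips anchors that have no href attribute.
    links = [a["href"] for a in soup.find_all("a", href=True)]
    return title, links
```

Calling `extract_title_and_links(fetch_html("https://example.com"))` would fetch one page and parse it. Unlike Scrapy, nothing here follows links, throttles requests, or checks robots.txt, so this fits best when you only need a handful of known URLs.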