Web scraping is an essential skill for data analysts and developers who want to extract data from websites. However, finding reliable sources to learn and discuss web scraping techniques can be challenging. Fortunately, several subreddits on Reddit are dedicated to web scraping, data analysis, and programming-related discussion.
In this article, we'll explore the 11 best subreddits for web scraping and share why each of these subreddits might be useful for you on your web scraping journey.
Here is the list of subreddits we will be focusing on:
- r/WebScraping
- r/Scrapy
- r/Selenium
- r/Puppeteer
- r/ProxiesTalk
- r/ProxyLists
- r/ShoeBots
- r/dataisbeautiful
- r/learnprogramming
- r/datanalysis
- r/dataengineering
r/WebScraping
Statistics:
- Total subscribers: 12,329
- Posts per day: 4
- Comments per day: 7
- Growth in subscribers (since last year): 4,812
The first one on our list is r/WebScraping . This subreddit is very useful for keeping up to date with the latest news in the world of web scraping. If you ever have any trouble with scraping a website, this should be the first place you look for help. The community is generally very responsive and supportive, and members are often willing to offer guidance or share their solutions. You will find novel ideas shared here as well. For example, this GPT-powered web scraper!
r/Scrapy
Statistics:
- Total subscribers: 5,529
- Posts per day: 1
- Comments per day: 2
- Growth in subscribers (since last year): 733
Scrapy is a very powerful, robust, and mature Python web scraping framework used by companies of all sizes. The /r/scrapy subreddit is dedicated to all things Scrapy. It is the official community hub of Scrapy and so you will often find Scrapy experts hanging out there.
If you encounter an issue or have questions related to Scrapy, this subreddit is an excellent resource to seek help and guidance.
r/Selenium
Statistics:
- Total subscribers: 10,955
- Posts per day: 3
- Comments per day: 5
- Growth in subscribers (since last year): 1,777
No matter where you are on your web scraping journey, chances are that you will have heard of Selenium at one point. It is a framework for browser automation and is an indispensable tool for scraping experts. The /r/selenium subreddit will help you learn how to effectively use this powerful tool. You can ask all sorts of questions regarding Selenium. For example, a recent question was asking about what else prevents finding elements besides IFrames in Selenium.
r/Puppeteer
Statistics:
- Total subscribers: 430
- Posts per day: 1
- Comments per day: 3
- Growth in subscribers (since last year): 138
Puppeteer is a popular browser automation library maintained by Google. It is an alternative to Selenium and Playwright . The /r/puppeteer subreddit is not very active but you can find solutions to Puppeteer-related issues and Puppeteer-specific tips in old posts. You might be able to get help by making a new post but we wouldn't suggest you count on it much as as only approved members can make posts.
r/ProxiesTalk
Statistics:
- Total subscribers: 195
- Posts per day: 1
- Comments per day: 1
- Growth in subscribers (since last year): 104
Good reliable proxies are always a necessity when doing any kind of serious web scraping. You don't always have to rely on paid proxies. You can use subreddits like /r/ProxiesTalk to ask proxy-related questions and potentially source some cheap, low-cost proxies.
r/ProxyLists
Statistics:
- Total subscribers: 468
- Posts per day: 4
- Comments per day: 2
- Growth in subscribers (since last year): 248
/r/ProxyLists subreddit is similar to /r/ProxiesTalk . It is dedicated to sharing free proxies. However, you might also find posts by companies pushing their paid proxy lists. If you have time and can sift through the noise, you can score some cheap (free) and reliable proxies in this subreddit.
r/ShoeBots
Statistics:
- Total subscribers: 40,981
- Posts per day: 3
- Comments per day: 7
- Growth in subscribers (since last year): 2,694
Shoe scalpers very often use bots for automated purchases on shoe websites like Nike and Adidas . These websites often have the most stringent anti-bot measures to thwart scalping and automated buying efforts. The /r/ShoeBots subreddit is a superb resource for exploring the scalping and shoebots side of web scraping. The posts in this subreddit might give you some clever ideas to bypass some of the strongest anti-bot measures in the industry. We have also covered some anti-bot evasion techniques in the past.
r/dataisbeautiful
Statistics:
- Total subscribers: 19,368,352
- Posts per day: 39
- Comments per day: 2292
- Growth in subscribers (since last year): 2,374,536
If you are interested in web scraping from the perspective of data analysis, the /r/dataisbeautiful subreddit will give you a ton of fun visualization ideas. It is an extremely active community with over 39 posts made every day and quite a few of them regularly make it to Reddit's front page. You also don't need to be interested in data analysis to be mesmerized by the engaging visualizations shared by the /r/dataisbeautiful community.
r/learnprogramming
Statistics:
- Total subscribers: 3,731,058
- Posts per day: 131
- Comments per day: 434
- Growth in subscribers (since last year): 1,343,900
Any serious web scraping requires getting your hands dirty with some coding. Oftentimes your scraping questions can more appropriately be defined as coding/programming questions. For such questions, /r/learnprogramming is the best place to seek help. It is a very welcoming community where beginners are encouraged to ask all kinds of programming-related questions. With the huge amount of active members, most posts get a response very quickly.
r/dataanalysis
Statistics:
- Total subscribers: 47,494
- Posts per day: 26
- Comments per day: 43
- Growth in subscribers (since last year): 26,175
/r/dataanalysis is similar to /r/dataisbeautiful . However, this subreddit is targeted more toward asking questions, sourcing feedback, and sharing tips and tricks regarding data analysis. You can ask data analysis-related questions in this subreddit and once your analysis is complete, you can share a visualization in the /r/dataisbeautiful subreddit.
r/dataengineering
Statistics:
- Total subscribers: 91,858
- Posts per day: 40
- Comments per day: 173
- Growth in subscribers (since last year): 39,381
Gathering data is just one part of web scraping; storing that data is a whole other beast. If you have a ton of data and need help, guidance, or suggestions related to different data storage options and data pipelines, the /r/dataengineering subreddit is your best friend. Ask whichever question you might have related to data engineering and you will get a response pretty quickly. Quite a few veterans with data engineering experience hang out here.
Recap
Subreddit | Total Subscribers | Posts per day | Comments per day | Growth in subscribers (since last year) |
---|---|---|---|---|
r/WebScraping | 12,329 | 4 | 7 | 4,812 |
r/Scrapy | 5,529 | 1 | 2 | 733 |
r/Selenium | 10,955 | 3 | 5 | 1,777 |
r/Puppeteer | 430 | 1 | 3 | 138 |
r/ProxiesTalk | 195 | 1 | 1 | 104 |
r/ProxyLists | 468 | 4 | 2 | 248 |
r/ShoeBots | 40,981 | 3 | 7 | 2,694 |
r/dataisbeautiful | 368,352 | 39 | 2292 | 2,374,536 |
r/learnprogramming | 731,058 | 131 | 434 | 1,343,900 |
r/dataanalysis | 47,494 | 26 | 43 | 26,175 |
r/dataengineering | 91,858 | 40 | 173 | 39,381 |
Conclusion
This was a short roundup of some of the best subreddits we found related to web scraping. We hope that you use these resources to improve your knowledge and take your web scraping expertise to the next level. If you have any scraping-related questions that are not covered by any of these subreddits, you can browse through our blog which is choke-full of informative posts.
Pierre is a data engineer who worked in several high-growth startups before co-founding ScrapingBee. He is an expert in data processing and web scraping.