4chan Scraper


I wrote a simple Python script to scrape the posts, replies, and files from the front page of a 4chan imageboard.

GitHub link

The script uses 4chan's read-only API (https://github.com/4chan/4chan-API) to scrape the front page of a given imageboard. In addition to saving every image posted to the board, it generates CSV files that record which threads were on the front page at a given time. For each thread, it creates a folder for the thread's images and a separate CSV file that records every reply in the thread.
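To give a rough idea of how this kind of scrape can work, here is a minimal sketch. It is not the code from the repository. The endpoint patterns and the post fields (no, time, name, com, tim, ext) come from the public 4chan API documentation linked above; every function name, the CSV column layout, and the one-second pause between requests are my own assumptions for illustration.

```python
import csv
import io
import json
import time
import urllib.request

# Hosts documented in the 4chan API README.
API_ROOT = "https://a.4cdn.org"
IMAGE_ROOT = "https://i.4cdn.org"

# Hypothetical column layout for the per-thread CSV.
FIELDS = ["no", "time", "name", "comment", "image_url"]


def front_page_url(board):
    """URL of the first index page of a board (pages start at 1)."""
    return f"{API_ROOT}/{board}/1.json"


def thread_url(board, thread_no):
    """URL of the full JSON for a single thread."""
    return f"{API_ROOT}/{board}/thread/{thread_no}.json"


def image_url(board, post):
    """Return the image URL for a post, or None if it has no attachment."""
    if "tim" not in post or "ext" not in post:
        return None
    return f"{IMAGE_ROOT}/{board}/{post['tim']}{post['ext']}"


def thread_rows(board, thread):
    """Flatten a thread's JSON into one dict per post (OP first)."""
    rows = []
    for post in thread.get("posts", []):
        rows.append({
            "no": post["no"],
            "time": post.get("time", ""),
            "name": post.get("name", ""),
            "comment": post.get("com", ""),  # raw HTML as served by the API
            "image_url": image_url(board, post) or "",
        })
    return rows


def rows_to_csv(rows):
    """Serialize post rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def fetch_json(url):
    """Download and parse one JSON document."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def scrape_front_page(board, delay=1.0):
    """Return {thread number: CSV text} for every thread on the front page."""
    page = fetch_json(front_page_url(board))
    result = {}
    for preview in page.get("threads", []):
        thread_no = preview["posts"][0]["no"]  # the OP's number identifies the thread
        time.sleep(delay)  # assumed pause to stay polite to the API
        full_thread = fetch_json(thread_url(board, thread_no))
        result[thread_no] = rows_to_csv(thread_rows(board, full_thread))
    return result
```

Calling scrape_front_page("g") would return one CSV string per front-page thread. Writing those strings to disk and downloading each image_url into a per-thread folder is left out to keep the sketch short.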

I'm doing research on anonymous online communities, the ways they communicate with one another, and how they're able to influence real events in the physical world. Rather than manually browsing and downloading content from 4chan, I built this script to automatically scrape the most recent content from a given board.

Ben Pettis