Ben Pettis

Reddit Blackout

June 12, 2023

[Image: Modified version of the Reddit logo - a white cartoon alien stands with a red blindfold over its eyes. The word 'reddit' appears next to the alien.]

Starting June 12, 2023, many Reddit communities (subreddits) will be "going dark" - switching to private mode - in protest of Reddit's plans to change its API access policies and fee structure. Supporters of the protest criticize the planned changes as prohibitively expensive for third-party apps. Beyond third-party apps, there is significant concern that the API changes are a move by the platform to increase monetization, degrade the user experience, and eventually kill off other custom features such as the old.reddit.com interface, the Reddit Enhancement Suite browser extension, and more. There are also concerns that the API changes will impede the ability of subreddit moderators (who are all unpaid users) to access the tools they rely on to keep their communities on-topic and free of spam.

I am a Reddit user, but I am also an internet researcher. I'm working on a dissertation about how people are constructed as "Users" by platforms, and how this includes and excludes certain groups from assumptions of who the internet is "for." In one section of my dissertation, I am examining moderators and volunteer fact-checkers—users who are in an "elevated" role over other users, but still not employees of the platform. How do they understand their own role? How do they make sense of the power they have over other users? And how do they influence what a platform "is" - even beyond what its corporate owners imagine? So when I learned about the Reddit Blackout, I knew it would be a great opportunity to see these questions I'm thinking about play out in real time.

To that end, I have come up with some ways to preserve content related to the blackout. These scripts will pull the list of participating subreddits that has been collated in the /r/ModCoord subreddit. Then, using that list, another script looks in those subreddits for stickied announcement posts - i.e., the posts in which a subreddit's moderators explain their decision to their community.

Python Scripts

These scripts (along with some basic documentation) can be found at: https://github.com/bpettis/reddit-blackout-announcements

The first script, list-subreddits.py, looks at three Reddit posts and grabs the list of participating subreddits.
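The repo has the real thing, but here's a minimal sketch of the approach, assuming Reddit's public JSON endpoints; the post URLs below are placeholders, not the actual /r/ModCoord post IDs:

import re
import requests

# Placeholder list - the real script targets three specific /r/ModCoord posts
# (see the GitHub repo for the actual URLs)
POST_URLS = [
    # "https://www.reddit.com/r/ModCoord/comments/<post-id>/.json",
]

def get_subreddits(post_url):
    """Fetch a post's selftext via Reddit's public JSON endpoint and pull out r/ names."""
    resp = requests.get(post_url, headers={"User-Agent": "blackout-archiver"})
    resp.raise_for_status()
    selftext = resp.json()[0]["data"]["children"][0]["data"]["selftext"]
    return re.findall(r"\br/([A-Za-z0-9_]+)", selftext)

if __name__ == "__main__":
    subreddits = set()
    for url in POST_URLS:
        subreddits.update(get_subreddits(url))
    for name in sorted(subreddits, key=str.lower):
        print(name)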

The next, get-stickies.py, is a slightly modified version of a script that I had previously written to preserve other subreddit info: https://github.com/bpettis/reddit-scrape_mods-rules. It creates a CSV file for each of the listed subreddits, where each row represents a stickied post. There currently isn't any logic to detect which post is the one announcing the blackout; I'm just saving all of them.
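To give a sense of what that script does, here's a rough sketch using PRAW; the credentials, filenames, and column choices are placeholders rather than the repo's exact code:

import csv
import praw

# Placeholder credentials - register a script app at reddit.com/prefs/apps
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="blackout-archiver",
)

def save_stickies(subreddit_name):
    """Write each of a subreddit's stickied posts as a row in <subreddit>.csv."""
    rows = []
    for number in (1, 2):  # a subreddit can have up to two stickied posts
        try:
            post = reddit.subreddit(subreddit_name).sticky(number=number)
            rows.append([post.id, post.title, str(post.author), post.created_utc, post.selftext])
        except Exception:
            break  # no sticky in this slot, or the subreddit has already gone private
    with open(f"{subreddit_name}.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "title", "author", "created_utc", "selftext"])
        writer.writerows(rows)

if __name__ == "__main__":
    save_stickies("ModCoord")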


Data Access

If you're just interested in taking a look at the content that I scraped, I will be making that available in a public Google Cloud Storage bucket: https://storage.googleapis.com/reddit-blackout-announcements

The above URL will give you an XML file listing all contents of the storage bucket. Each <Contents> element represents a file:

<Contents>
    <Key>2023-06-11/stickies/funny.csv</Key>
    <Generation>1686507430668697</Generation>
    <MetaGeneration>1</MetaGeneration>
    <LastModified>2023-06-11T18:17:10.671Z</LastModified>
    <ETag>"ceab42346825f636b26ec470b417aa8d"</ETag>
    <Size>2620</Size>
</Contents>

You can use the <Key> to build the URL like this: https://storage.googleapis.com/reddit-blackout-announcements/ + <Key>

For example: https://storage.googleapis.com/reddit-blackout-announcements/2023-06-11/stickies/funny.csv
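If you'd rather script that step, here's one way to walk the XML listing and print a download URL for each file (a short sketch using requests and the standard library):

import requests
import xml.etree.ElementTree as ET

BUCKET_URL = "https://storage.googleapis.com/reddit-blackout-announcements"

resp = requests.get(BUCKET_URL)
resp.raise_for_status()
root = ET.fromstring(resp.content)

# The listing is namespaced, so recover the namespace prefix from the root tag
ns = root.tag.split("}")[0] + "}" if root.tag.startswith("{") else ""

# Each <Contents> element holds one file's <Key>; join it to the bucket URL
for contents in root.iter(f"{ns}Contents"):
    key = contents.find(f"{ns}Key").text
    print(f"{BUCKET_URL}/{key}")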

Download Data by Subreddit

It can be a bit cumbersome to manually look through the XML summary and find the specific file that you want. Here's a list of all the subreddits that I have a CSV file for.

I just used a slightly modified version of Google's example script to create a list of the files:

from google.cloud import storage

def list_blobs(bucket_name):
    """Lists all the blobs in the bucket and prints a public URL for each."""
    storage_client = storage.Client()

    # Note: Client.list_blobs requires at least package version 1.17.0.
    # The call returns a response only when the iterator is consumed.
    prefix = "https://storage.googleapis.com/reddit-blackout-announcements/"
    for blob in storage_client.list_blobs(bucket_name):
        print(prefix + blob.name)

if __name__ == '__main__':
    list_blobs("reddit-blackout-announcements")

I directed the output of the script into a file, and then put those URLs here. I probably should have written another script to do that part for me (something like the sketch below), but I was too lazy to be lazy and did some manual copying and pasting. The files are grouped by first letter, but not necessarily alphabetized within each group - sorry about that. The order within each letter roughly corresponds to the order the files were downloaded.
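For what it's worth, that script might have looked something like this - a sketch assuming a hypothetical urls.txt with one object URL per line, as produced by the listing script above:

from collections import defaultdict

groups = defaultdict(list)
with open("urls.txt") as f:
    for line in f:
        url = line.strip()
        if not url:
            continue
        filename = url.rsplit("/", 1)[-1]  # e.g. funny.csv
        first = filename[0].upper()
        # Lump anything that doesn't start with a letter under "#123"
        key = first if first.isalpha() else "#123"
        groups[key].append(url)

for letter in sorted(groups):
    print(letter)
    for url in groups[letter]:
        print(" ", url)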

I have not filtered this list in any way. There are porn subreddits and other NSFW communities represented here as well. While each CSV file only contains the text of each subreddit's stickied posts, there may be content that is not appropriate for everyone.

There are 1658 subreddits represented here.

Click Here to download the entire list (.txt)

[Per-letter lists of subreddit CSV links (#123, A-Z) omitted here; use the full-list download above.]