Other Projects<!-- --> | <!-- -->Ben Pettis
Ben Pettis

Other Projects

HTML Search and Record

other

Posted: May 20, 2022

A cartoon image of an open file folder with the Google reCAPTCHA logo appearing out of it

This is a Chrome extension that detects the presence of reCAPTCHAs on a Web page and invites the user to record and preserve their interaction. By detecting specified HTML elements within a Web page, the extension enables researchers to preserve users’ interactions with an interface without needing to continuously (and invasively) record their browsing. The extension aims to balance the priorities of Web preservation with user privacy and autonomy. This represents a new approach to Web preservation that may be useful to other digital humanities projects by attending to ephemeral user interactions that other preservation tools are not as well-suited for.

Read More...

Adobe Acrobat PDF Fixer

other

Posted: January 1, 2022

A screenshot of the Adobe Acrobat interface. It is in dark mode, so the windows are a dark grey and there is a single page of text in the center

Do you have a professor who provides PDFs that look like they’ve been run through a meat grinder? Are you a professor who provides PDFs that contain multiple scanned pages on a single page? We’ve all been there – someone has generously provided us with a PDF copy of a book chapter or article. But there’s one problem – none of the text is actually searchable. And each page of the PDF actually contains two pages of scanned text. PDFs in this format are not accessible to people using screen readers, and keeping multiple scanned pages on each PDF page can make navigating the document inconvenient and cumbersome. Adobe Acrobat can make those PDFs more usable, but manually fixing your scans each time is tedious. This Acrobat Action will semi-automatically process a scanned PDF to separate each scanned page onto its own PDF page and run the entire document through OCR to create searchable and selectable text.

Read More...

The Mall

other

Posted: January 1, 2020

A brown background with several white rectangles with small drawings on them scattered throughout. The white rectangles are connected to each other with small orange lines, like a web

For some reason, when I was in elementary school I became obsessed with creating and drawing imaginary storefronts and the various items that each of them sold. The drawing skills are sub-par at best, and the actual spatial organization of the mall lacks any logic or structure whatsoever. But it was the third grade. Sue me. Years later, while visiting my parents and cleaning out some of my old stuff I rediscovered a folder full of these drawings. Now armed with the technical know-how (aka the confidence to Google and poke around with basic JavaScript), I set out to scan these old drawings and finally connect all these stores with one another like younger-me had always envisioned.

Read More...

4chan Scraper

other

Posted: January 1, 2019

A screenshot of the 4chan /pol/ politically incorrect imageboard.

This simple Python script uses 4chan's read-only APIs to scrape the information from the front page of a given imageboard. In addition to saving every image posted to the board, the script will also generate multiple CSV files that record which threads were on the front page at a given time. A folder is generated for each thread's images, as well as an individual CSV file that records each reply in the thread as well. I have done some research on anonymous online communities, the ways they communicate with one another, and how they're able to influence real events in the physical world. Rather than manually browsing and downloading content from 4chan imageboards, I built this script to automatically scrape the most recent content from a given 4chan imageboard.

Read More...