HTML Search and Record
May 20, 2022
Neils Brügger has described five “web strata” or scopes to use the Web as an object of analysis.1 These strata are:
- the web element
- the web page
- the website
- the web sphere
- the web as a whole
Tools such as the Internet Archive’s WayBack machine or other web scrapers are good at targeting the web page or the website, but preserving individual web elements along with how users encounter them in their original contexts is more challenging. I have developed this Chrome extension to be a versatile tool to detect and record any arbitrary HTML element.
In my own work, I am using this for the study of Google’s reCAPTCHA. This test to tell Computers and Humans Apart is commonly found on online forms and throughout the Web and is used to minimize automated spam submissions and other supposedly non-genuine interactions.2 Google’s reCAPTCHA is a multi-faceted site of production where users perform labor by providing valuable AI/ML training data. Following Maurizio Lazzarato’s definition of immaterial labor, I view reCAPTCHAs as an interface element that contributes to the production of the informational and cultural content of a given website.
One approach to preserving reCAPTCHAs might see them as an individual fragment of a web page, and attempt to isolate it from its context. Though I am interested in the reCAPTCHA element specifically, I also want to preserve the entire user experience of solving the reCAPTCHA. Accordingly, I follow Niels Brügger’s proposed method of the screen movie—recording the screen as a user navigates a website to include the context of the reCAPTCHA along with how long it takes for the user to solve it.
The extension works by monitoring the current DOM for a specified HTML element with a specific attribute, and counting the number of appearances. In the case of reCAPTCHA, I search for any <iframe>
element with the attribute title="recaptcha"
to detect any reCAPTCHA element. The user can choose to receive a browser alert in these cases, which prompts them to being a screen recording that is saved locally to their computer. As my project develops, I want to develop a method for users to be able to easily preview these recordings and decide whether to share them with a researcher. This method preserves individual autonomy and control over privacy while also facilitating researchers' ability to preserve user interactions with specific Web elements.
I've released a version of the extension (along with its source code) under an MIT License so that other researchers may be able to use this approach in their own work. This version of the extension only saves videos locally, but retains the ability to specify any arbitrary HTML element and attribute. The extension can serve as a model for other researchers to use in their studies of online spaces and Web histories. It can be easily modified to detect and record interactions with any kind of Web content. Because the extension functions by detecting a specified HTML element, this can easily be customized to search for and record any arbitrary string of HTML code that a researcher is interested in.