Exploring web archives as data: An introduction to data analysis and instruction

National Gallery of Art Library | July 25, 2023

Photo of the National Gallery of Art's East Building in 2018

The Internet Archive and the New York Art Resources Consortium (NYARC) invite applications for Exploring web archives as data: An introduction to data analysis and instruction. The daylong workshop will be hosted by the National Gallery of Art Library in Washington, DC, on July 25, 2023, preceding ARCHIVES*RECORDS 2023.


Every day, significant cultural production occurs globally across the web. As a medium that can be both the repository for records and a form of record itself, the web presents challenges of scale and complexity for those who seek to preserve modern history and integrate it into research and teaching.

Participants in this workshop will learn web archive research use cases, how to create archival research collections from the web, and how to analyze web archive collection data computationally. They will gain hands-on experience using the Internet Archive’s software services Archive-It and ARCH, and widely supported tools for cultural data exploration and visualization. Workshop materials will include web archive data collected by the Collaborative ART Archive (CARTA).

Support for participation:

This full-day workshop, led by Internet Archive staff, is free, but registration is limited. Thanks to generous support from the Institute of Museum and Library Services, travel stipends (up to $1,000 each) are available to offset participants’ air and/or ground transportation, parking, two nights’ lodging, and food costs incurred to join us at NGA.

Call for applications

The application period has closed.

Event Details:


National Gallery of Art Library
4th St NW & Constitution Ave. NW
Washington, DC 20001


Date: Tuesday, July 25, 2023

Time (approx.): 9:30am – 4:30pm

Preliminary agenda:

We will keep updating this as our program takes shape…

9:30am – Registration/Coffee

10:00am – Welcome, review agenda, introductions

10:30am – Web archives as primary sources

11:00am – Web archiving workshop

12:00pm – Lunch (provided)

1:00pm – Introduction to ARCH and web archive collection data

1:30pm – Data analysis workshop

2:30pm – Break

3:00pm – Sharing workshop outputs, insights, etc.

3:30pm – Brainstorm/Discuss future research and instruction needs

4:30pm – Wrap up

5:00pm – Building closes

Hotels / Accommodation:

The Washington Hilton Hotel is the conference hotel for ARCHIVES*RECORDS, and attendees of the conference can book a room at a discounted rate.

Many thanks to the Archives Unleashed team, whose comprehensive web archive datathons provided inspiration for this event.