Starting small but dreaming big: A beginner’s journey to web archiving

September 1st, 2021

by Yoo Young Lee, Head, Information Technology, University of Ottawa Library

This blog post was adapted and modified from my presentation at the IIPC Web Archiving Conference on June 16, 2021.

My journey to web archiving started suddenly, with an invitation to preserve the web content of the National Aboriginal Health Organization (NAHO) in 2017, when I had just joined the University of Ottawa Library as a Web and Digital Initiatives librarian. I had no prior web archiving experience, but the official NAHO website was scheduled to come down by the end of that year and needed a long-term home.

Fortunately, I had all of the necessary tools available to me: an Archive-It subscription from the Ontario Council of University Libraries (OCUL) via Scholars Portal, and a list of web content to be archived, including but not limited to the NAHO website. I still needed to figure out how to use the Archive-It tools, review crawl reports, conduct quality assurance (QA), and add descriptive metadata, so I focused on the steps that would ensure the proper material was archived before it was too late.

Screenshot from the NAHO web archive


My go-to resource as a beginner who’d never used these web archiving tools before was the Archive-It Help Center. I followed its extensive curriculum, attended live technology training webinars, and submitted lots of requests when I wasn’t sure about my next best step.

With the first successful web archiving project completed at the library, our Web Archiving Working Group formed in 2018 to develop a web archiving policy, adapted from the Website Collection Policy (PDF) by our peers at the University of Victoria Libraries, to outline the selection and scope of such an activity at a small scale. Due to limited resources, the library’s main approach is to follow the lead of website owners who need the service, not to seek them out.

In order to advance my knowledge further and learn more about selection and appraisal from my colleagues, I joined the Canadian Web Archiving Coalition (CWAC), participated in its communities of practice, and attended the 2019 Archives Unleashed Datathon in Washington, DC. From these meetings, I learned how others organize web archiving and how an institution like ours could document hyper-local yet critical events even when there is no dedicated staff to initiate a project.

In Canada, Library and Archives Canada (LAC) leads web archiving projects at a broad scale, to document important events in Canadian history as they unfold. While we can’t contribute at that scale and we don’t want to duplicate efforts, the University of Ottawa Library can record historical moments in our local context which might be missed by more automatic and systematic harvesting.

For example, Marina Bokovay (Head, Archives and Special Collections) and I came up with a list of archivable websites, tweets, news articles, and uOttawa’s official response to a racial discrimination incident on campus in June 2019. This incident outraged the community and inspired conversations on social media with the hashtag #BlackOnCampus. uOttawa announced a wide-ranging approach to combat racism and discrimination and promote diversity, inclusion, and equity, and it is important for us to document progress across updates on several websites. Using Archive-It and Twarc, I was able to document changes in how and when the University addresses these issues.

Screenshot of one #uOttawa #BlackOnCampus seed’s capture calendar


My concern at that time was to preserve conversations and the community’s responses, so I focused on collecting as much as possible, even if I could not be as selective as usual in our archival practice. The uOttawa library also has a mandate to collect English and French materials, which doubled my work. Most tools and platforms are optimized for English, so it was often difficult for me to distinguish one language from the other automatically while collecting.

However, this broad and immediate collecting strategy could also cause further harm to the same marginalized individuals and communities on campus, so I decided to restrict access to this new web archive collection until I can review and establish more complete processes for archiving live events. The Documenting the Now White Paper (PDF) has been a great resource to me as a beginner, to consider the various perspectives and ethical considerations of social media especially.

With a small team responsible for web archiving, the library has still been able to archive several websites and live events, including COVID-19. Our strategy is to capture local topics at uOttawa that could be missed by national efforts. In the meantime, I work with faculty to archive their online scholarly work that is at risk of disappearing due to server decommissioning or migration.

uOttawa’s web archiving effort is small compared to other leading institutions’, but we learn from their systematic practices and incorporate them. What I’ve learned from this journey is that web archiving doesn’t need to be universal or perfectly complete to preserve “all” of the right information. It can start small and dream big, and I hope eventually to add to the knowledge base of a community of web archivists!