Teaching Digital Curation with Archive-It

March 9th, 2016

by Karl-Rainer Blumenthal

Many of our university-based partners take advantage of their Archive-It accounts by using them as teaching tools in their courses on digital curation, digital preservation, and related topics across the spectrum of library, archival, and computer science disciplines. These educational partnerships come in all shapes and sizes, from the semester-long course to the short exploratory assignment. To learn more about them (and start devising your own!), be sure to check out our new Help Center’s page: Educational Partnerships.

We recently talked with San Jose State University School of Information lecturer Alyce Scott and her student Jim Broutzos about the educational uses of Archive-It. Scott’s recent survey course on digital curation themes and practices introduced graduate students to web archiving by way of a group assignment to build and describe a collection with Archive-It, which Broutzos and his team used to create this notable collection: Just KITTEN Around! We asked each of them what web archiving, and Archive-It in particular, can teach the next generation of digital curators.



Following the Tracks, as archived by San Jose State University students for their Freighthopping in the United States collection.


Karl-Rainer Blumenthal: Alyce, your seminar course, “Tools, Services, and Methodologies for Digital Curation,” includes a comprehensive module on web archiving. What core competencies of digital curation does web archiving help you to build among your students?

Alyce Scott: The official core competencies that the course, and the web archiving section in particular, address are the basic concepts and principles related to the selection, evaluation, organization, and preservation of physical and digital information items. What this means, in practical terms, is that my students (through the reading, lectures, and the web archiving assignment) gain an understanding of four main things:

  1. Project planning and management
  2. Defining a collection
  3. Creating useful metadata
  4. Basic knowledge of US copyright laws (I hope to expand this to include some international copyright as well)

KB: What products do your students deliver in order to demonstrate their understanding of web archiving?

AS: I organize the students in small groups, and they are required to design, execute, and critique a web crawl on a topic of their choice (generally humanities-based) using Archive-It to harvest and preserve the collection. I ask them to remember that they are archiving for future generations as well as users in the present. This specifically impacts how they shape the collection description. The group scopes the collection; troubleshoots media file format issues; creates metadata; deals with those pesky robots.txt files and copyright issues; and learns about the architecture of the web.

I encourage the students to seek out topics/sites that have not previously been archived, but that is not a strict requirement. My students have been quite resourceful in this respect though – and they have made good use of the Wayback Machine! The main thing is that they curate a collection of sites that have some common thread or subject.

One of the key aspects here is the determination of copyright status. Although I believe that their web archiving can be considered fair use, I do require them to look at the targeted websites for any copyright notices and/or notices about what users are allowed to do (e.g. archive for personal use; archive for public use in libraries; archive to publish on the web). I also ask them to contact the site owner or webmaster to obtain permission to archive the site. The interesting thing about this is the lack of knowledge and/or understanding of the importance of web archiving among website owners. My students are required, as you may have noticed in their collection metadata, to create a rights management statement.

Finally, the group must submit a document that contains the following information (they also deliver it as a formal presentation):

  • Description of and rationale for the web archive collection. What is the theme or topic of the collection, and how did they arrive there?
  • A list of the seeds that make up the collection
  • A discussion of how the collection was scoped or filtered; any scoping adjustments; filters created to define the types of files to capture
  • What was chosen to capture for each site or seed: the entire site, one or more directories, or one or more subdomains? How were these decisions made?
  • What type of content was archived in the course of the crawls? What major rendering problems were encountered, and what troubleshooting was performed? Any other technical issues (e.g., crawl traps, robots.txt files, etc.)?
  • What are some of the major takeaways from this project? What did you learn, and what surprised you?

KB: Did any aspect of your students’ approach to this topic surprise you, or teach you anything new about them?

AS: I learned that, even under duress, graduate students can still maintain a sense of humor! Seriously though, I was (and still am) constantly curious about their thought processes during the topic selection. They generally seem to tend toward more serious collections – Black Lives Matter, Ebola Outbreak, Wind Energy Resources of the Eastern United States. The collections I find most interesting are centered on the more unusual and obscure topics: Codicology, Freighthopping in the United States, and Just KITTEN Around!

KB: Jim, how did you choose your collecting theme for this assignment?

Jim Broutzos: As it is with any group project of this type, my teammates and I spent a great deal of time brainstorming possible topics for archiving. Through these discussions, we discovered that we are all “crazy cat people,” and that when procrastinating, we often turn to internet cats for diversion. The dominance of felines on the internet (especially social media) cannot be overstated, and we decided that was enough to make the phenomenon an ideal candidate for archiving. As we conducted preliminary research, we realized that cat humor was nothing new, and that the appeal far transcends the internet. Humans have been dressing up cats and using them to represent our emotions for centuries, as in Harry Pointer’s cartes de visite in the 1870s and, more famously, Harry Whittier Frees’s postcards of the early 20th century. It was then that my group realized the cultural importance of archiving these cat sites. It is clear that cat humor will never go out of style. My group was confident that cats wearing dresses on the internet would be just as culturally relevant in the future as the works of Harry Pointer and Harry Whittier Frees are today. We think that is worth preserving.



Dress a Cat, an interactive site, as archived by Jim and teammates for the Just KITTEN Around! collection.


Harry Whittier Frees, “Rosie Bufkins Gave Jennie an Airing.” Illustration from The Little Folks of Animal Land, 1915.


KB: What aspects of your experience might impact your approach to curating digital resources more generally in the future?

JB: This assignment underscored the challenges of creating metadata that effectively describe websites. Namely, how do you choose terms that accurately describe a website featuring cats bouncing across a computer screen? How will these terms denote meaning to users in the future? One of the biggest challenges relating to metadata for this assignment came from the fact that many of the sites deal with humor, which of course is very subjective. This challenged us to think outside of the box when creating our metadata, and highlighted the fact that describing digital resources presents many challenges that are not so prevalent when dealing with materials in the physical world.

Obtaining permissions from copyright holders is time-consuming, and sometimes it is like pulling teeth. This assignment taught us to conduct copyright research, and to seek copyright permissions, as early in the game as possible. While many of the website owners were accommodating, there were still a few who were not necessarily willing participants in the beginning, and it took a lot of back-and-forth to finally get them to agree.

Choosing the theme of the collection was another important takeaway. Why is this particular collection worthy of preservation over others? What lasting value will it have in the future? These were questions we had to ask ourselves throughout the planning process – especially in light of the fact that we chose to archive a collection of funny cat websites.