February Collection Spotlight: Gold Medal Edition

February 25th, 2014

The Library of Congress’s Signal blog posted about the International Internet Preservation Consortium (IIPC) Sochi Olympic collection, a collaboration among IIPC member institutions, the Internet Archive, and Archive-It.

It is a great read if you are curious about how an international collaborative web archiving project is organized and how websites are selected for archiving.

For this current 2014 Olympics project, the various IIPC member institutions are all recommending their own list of websites to be included. For example, the Library of Congress has recommended 131 web sites. As described by Michael Neubert, Supervisory Digital Projects Specialist here at the Library: “The selection of most sites for such collections is mechanical, in that we know we want sites for the various US teams – each team sport has its own site, for example, then along with that site there will be various social media sites/channels. In order to optimize the crawls, we nominate the social media separately. In addition to the team sites, we also chose a limited number of news media sites where the coverage of the Olympics seemed segregated from the rest of the site.”

After the URLs are selected for archiving, Internet Archive crawl engineer Adam Miller captures the sites on a weekly basis, and Archive-It partner specialist Sylvie Rollason-Cass oversees the process in the Archive-It web application, including scoping, crawl management, quality assurance, and metadata upload. While still a work in progress, the collection is publicly available on Archive-It.org.

Read more on the Signal.