The Next Big Thing: Archiving Tumblr in the shadow of GeoCities preservation

May 21st, 2013

Yahoo recently made headlines with its 1.2 billion dollar acquisition of the social blogging site Tumblr.com. With over 108 million blogs, Tumblr is a social networking and media sharing phenomenon. While many of the most popular Tumblr sites are related to pop culture, topics covered in the network include everything from political activism to creative writing, and notable institutions and leaders manage their own accounts (see The Smithsonian and The White House). In addition, many publishing and news organizations, such as college newspapers, are making the move to Tumblr-powered blogs.

A number of Archive-It partners are archiving Tumblr sites, often capturing their own institution’s presence on the network as well as other relevant content. You can search for Tumblr across Archive-It public collections and see the collecting organizations on the left of the search results. This content is preserved in perpetuity and is publicly accessible to patrons in the form in which it originally existed on the live web. That is an especially important consideration for Tumblr, where the individual blog designs and the presentation of text, images, audio, and video can be as important as the text content itself.

Preserving Tumblr content is especially critical when considering the history of GeoCities.

In 1999, Yahoo notoriously purchased the popular website hosting service GeoCities for $3.57 billion in stock at the height of the dot-com bubble. After the purchase, a series of terms-of-use and pay-for-use changes of a kind that is not uncommon today (think of the Facebook/Instagram incidents earlier this year) prompted a dramatic decline in use, as users flocked to more modern web hosting and social networking sites. In 2009, Yahoo notified users that it was shutting down GeoCities, once the third most popular website on the web.

GeoCities homepage in 1998, a year before the Yahoo acquisition. http://wayback.archive.org/web/19980703095403/http://www1.geocities.com/

Through the heroic effort of the volunteer-based Archive Team, the Internet Archive, and countless other volunteer and collaborative groups, GeoCities sites that were live in 2009 were archived before Yahoo permanently shut the service down. While the archiving process was ad hoc and incomplete, it was impressive considering the scope of the content archived and the time frame available. In addition, it was a non-profit effort made up mainly of passionate internet users and archivists who correctly assumed that this was important cultural material that needed to be preserved and should not simply disappear when Yahoo decided it would no longer host the content.

The change of hands GeoCities went through in the ’90s is an important reminder that the networks and publishing platforms that seem so prominent and important to our online experience today may eventually fade and could be lost forever. In addition, every new phenomenon online presents interesting challenges for web archivists. Consider, for example, that GeoCities sites were made in basic HTML and rarely exceeded 15MB, with little to no dynamic content. A Tumblr site, on the other hand, could contain gigabytes of content, including images, GIFs, and embedded audio and video. Moreover, the social networking aspect of Tumblr means that while users and organizations have their own blogs, the content they post is often part of a complicated network of sharing, reposting, and remixing.

A Tumblr created by FEN, an Arab art and culture magazine. Archived by the Arab National Museum using Archive-It. http://archive-it.org/organizations/512

With over 250 partner organizations using Archive-It, we hope to see more archiving activity around Tumblr content now, while it is at its peak, before the next big social networking site comes along and we potentially see yet another shutdown notice for users who entrust their content to third parties.