Only 41% of Occupy Movement URLs accessible on live web

July 29th, 2014


The Occupy Leeds homepage, archived at


On July 22nd, Partner Specialists Maria La Calle and Scott Reed presented a poster at the annual Digital Preservation conference in Washington, DC. The poster focused on an analysis of the Occupy Web Archive, a collection started in December 2011 by the Archive-It team and the larger web archiving community to collaboratively identify and capture websites related to the quickly mobilizing Occupy Movement.

Ultimately, 933 seed URLs were selected as part of the collection, including city-specific Occupy websites, social media sites, and news articles from alternative and traditional media outlets. The collection is available to the public here.

More than two years later, we wanted to understand just how ephemeral this web content was. If 933 websites were archived, how much of this content is live on the web today? Is there a significant difference between the lifespan of news media versus social media pages? Are Occupy movement websites, often the primary announcement hub for activists, still accessible at their original locations on the web?

Each URL was visited on the live web in April 2014 to determine if the site was serving a 404 error. Using a human to check each URL, rather than an automated process, allowed for closer analysis of the live content to determine if it was still on topic. Many of these domains had fallen victim to “cyber squatting”, the practice of purchasing domains after their original owners have let them expire and filling them with what is often automatically generated ad content.

For example, Occupy Leeds became Occupy “Leads”:

occupy blog leads


The findings of the research are as follows:

Of 582 movement websites archived, only 41% are still live on the web.

Of 203 social media URLs archived, 85% are still live on the web.

Of 163 news articles archived, 90% are still live on the web.

The limited lifespan of the Occupy Movement websites illustrates the urgency of capturing sites like these while they’re active. Our research showed that most content related to the movement on major news and social networking sites was still accessible on the live web (90% and 85%, respectively). However, over half (59%) of all movement websites were either serving 404s or had been taken over by cyber squatters. While we in the web archiving community understand how fragile the lifespan of a website can be, this project further illustrates the ephemerality of political and activist content on the web.

The complete poster is accessible here.