Introducing Archive-It 4.9 and Umbra

March 13th, 2014

Today Archive-It released version 4.9, which includes an important first step in implementing Umbra, an additional capture mechanism that works alongside Heritrix to help archive dynamic web content. Heritrix has improved greatly over the years and continues to be the de facto solution for web archiving, including sites with JavaScript. However, the rapidly changing web, which includes social media and other dynamic content, requires new solutions.

Starting today, Archive-It partners will benefit from the integration of Umbra into the web application. This is entirely an “under the hood” improvement and will not change the user experience in Archive-It.

What is Umbra?

Umbra works in conjunction with the Heritrix crawler to improve the capture of dynamic web content. Such content is most commonly seen on social networking sites like Facebook that rely on “client-side scripting”, which can be archive-unfriendly. While Heritrix is a web crawler, Umbra works much like a graphical browser and can mimic certain user behaviors, such as scrolling through a page or loading earlier content on a Facebook timeline.

Today, we are happy to announce that Archive-It partners can use this technology to archive videos posted on In addition, partners can use Umbra to capture content from Facebook and Flickr seeds.

What’s Next:

Together, Heritrix and Umbra are an excellent team for archiving content from a rapidly changing web. We expect these tools to continue to improve and be built upon as we develop our next release, Archive-It 5.0, a major overhaul of the web application and user experience.