Let’s Talk About Web Archiving: NEA Web Archiving Birds of a Feather Recap

April 17th, 2013

By Tessa Fallon

Tessa Fallon (Outreach and Education Consultant) and Lori Donovan (Partner Specialist) will be coordinating “Birds of a Feather” sessions at 2013 conferences. The first BOAF was at the New England Archivists (NEA) spring conference on March 22. The next BOAF will take place at the Midwest Archives Conference (MAC) in Indianapolis on Friday, April 19.


Image credit: Flickr user See-ming Lee

Why Birds of a Feather (BOAF)? There are a number of international conferences and standalone sessions at national conferences that relate to web archiving, but we thought it would be great to give people a chance to ask questions in a more informal setting. BOAF sessions give aspiring web archivists a chance to ask questions of more experienced web archivists, and they give colleagues an opportunity to speak with each other about ongoing projects and initiatives.

We held our inaugural Birds of a Feather session at the New England Archivists conference in March 2013, and I was pleasantly surprised to find that 82 people had registered. As a result, we had to deviate a bit from our planned roundtable discussion format, but there were still a lot of great questions and discussion. In addition to the questions that arose in the session, we also gave folks an opportunity to ask questions via a pre-conference survey. From that survey arose the “Top Ten Web Archiving Questions,” many of which were addressed during an opening presentation by Lori:

1. What is web archiving?

2. What is being archived?

3. How is it being archived?

4. Who is archiving websites?

5. How are web archives used?

6. How does our institution get started?

7. What skills do you need for web archiving?

8. What standards are used in web archiving?

9. What about social media?

10. How do I learn more about web archiving?

The floor was then opened to general questions.  Some folks in the audience had web archiving experience but many did not, and it was helpful to have a range of web archiving knowledge among participants.  I was also pleased to see a number of students in the audience who are currently studying web archiving in their respective programs.

Here are just a couple of the many questions (and answers) discussed during the session:

Question: How might it be possible to avoid duplicating efforts in web archiving (for example, different institutions collecting the same materials)?

Answer: Direct collaboration between collecting organizations can certainly reduce duplication of effort. In addition, it can be important for web archivists to stay “in the know” about publicly available directories of web archives. For example, the Archive-It public site allows for browsing across all organizations. Other tools are being developed within the web archiving field, including websites and even browser-based tools that check which URLs are part of the most common web archives (see the Memento project tools). Archive-It partners also have the option of publishing their metadata through the Open Archives Initiative feed, allowing their collections to be discovered via services like WorldCat.

Ultimately, duplication might not have such a negative connotation in web archiving, as no one organization can capture all versions of a continually updated website. In addition, a website may have been archived, but what is to say that it was archived completely or in the same context? Other considerations for a collecting organization deciding whether to archive a website that may already exist within another collection include access to the archived files, curation within its own collections, and integration with its own records management system or mandate to collect.
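For readers curious what a Memento-style check might look like in practice, here is a minimal sketch in Python. It is not taken from any Memento project tool. It assumes you have already retrieved a TimeMap (the list of archived copies that a Memento-compliant archive publishes for a URL) as link-format text, and it simply pulls out the archived copies and their capture dates. The function name parse_timemap, the sample text, and the example.org and archive.example.net addresses are all made up for illustration.

```python
import re
from email.utils import parsedate_to_datetime

# One link-format entry looks like: <URI>;rel="...";datetime="..."
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*"[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')

# Hypothetical TimeMap text; the hosts and captures are invented.
SAMPLE_TIMEMAP = '''<http://example.org/>;rel="original",
<http://archive.example.net/timegate/http://example.org/>;rel="timegate",
<http://archive.example.net/web/20130322120000/http://example.org/>;rel="memento";datetime="Fri, 22 Mar 2013 12:00:00 GMT",
<http://archive.example.net/web/20120101000000/http://example.org/>;rel="first memento";datetime="Sun, 01 Jan 2012 00:00:00 GMT"
'''


def parse_timemap(text):
    """Return (capture datetime, archived URI) pairs, oldest first."""
    mementos = []
    for uri, raw_params in LINK_RE.findall(text):
        params = dict(PARAM_RE.findall(raw_params))
        # rel can hold several space-separated values, e.g. "first memento".
        rels = params.get("rel", "").split()
        if "memento" in rels and "datetime" in params:
            captured = parsedate_to_datetime(params["datetime"])
            mementos.append((captured, uri))
    return sorted(mementos)


if __name__ == "__main__":
    for captured, uri in parse_timemap(SAMPLE_TIMEMAP):
        print(captured.date(), uri)
```

An empty result would suggest that the archive in question holds no copies of the URL, which is one quick way to see whether a site is already being collected elsewhere before adding it to your own crawl.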

Question: What about private information on the web? Will that be archived?

Answer: The notion that information not publicly available will inadvertently end up in web archives is a common misconception. It is important to understand that web crawlers used by web archives will only collect publicly available material and do not hack accounts or websites. Archive-It’s 4.8 release will allow for crawling of password-protected material by supplying the crawler with a valid ID and password, but many crawlers do not have that capacity.  For Archive-It, no content behind a login screen will be archived without the correct credentials supplied by the user, and the same public access restrictions available to Archive-It partners remain, so institutions can decide for themselves what content to share with their patrons.


Check the blog soon for more questions and answers from the Web Archiving BOAF sessions! If you will be attending the Midwest Archives Conference this April, be sure to stop by our MAC Web Archiving Birds of a Feather session on April 19th at 12:20 p.m.!

Resources and links discussed at the NEA BOAF presentation:

   •    IIPC www.netpreserve.org

   •    Internet Archive http://archive.org

   •    LC Digital Preservation Blog “The Signal” http://blogs.loc.gov/digitalpreservation

   •    SAA Web Archiving Roundtable http://webarchivingrt.wordpress.com

   •    Twitter hashtag: #webarchiving

   •    UK Web Archive blog http://britishlibrary.typepad.co.uk/webarchive

   •    Nicholas Taylor, “Tool Academy: Web Archiving” presentation (slides) 2012

   •    Archive-It, Web Archiving Lifecycle Model (pdf) 2013