Expand the reach of Archive-It collections with these access integrations

September 6th, 2018

By Jillian Lohndorf, Web Archivist for Archive-It


Archive-It partners can enable direct access to their captures, collection descriptions, and other related metadata beyond the default portal of archive-it.org, customizing where and how patrons find and explore their web archives. Have you thought about integrating your Archive-It web collections into your own front-end access systems? Not sure what your options are, or where to start? Wondering how much technical know-how is needed? There are lots of options! Partners have, for instance, designed custom portals that integrate archived web content right into their own websites by using the tools described below. For more information and instructions for using each of these yourself, see the Archive-It Help Center entry: Archive-It Access Integrations.


Full-text search

It’s possible to integrate Archive-It’s full-text search capabilities into your own website or catalog by using the OpenSearch API. This can be used to create a standalone search box, as Princeton Theological Seminary has done, to search and display results from their web archives on a page of their own website.


OpenSearch portal designed by Princeton Theological Seminary

Custom web archive search portal powered by OpenSearch


OpenSearch can furthermore be used to pull data into an existing catalog of other items or item types. The New York Art Resources Consortium (NYARC), for instance, uses the OpenSearch API to integrate full-text search results from Archive-It into NYARC Discovery, their custom-built federated search layer. This means that results from their Archive-It web archive collections appear alongside physical and other digital results.  


Web archives appearing among full-text search results in NYARC Discovery

Web archival captures appear among the search results in NYARC Discovery
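
As a rough illustration of the two OpenSearch uses described above, the sketch below builds a search request URL and turns an RSS-style response into a list of titles and links that a catalog or search box could display. The endpoint URL is a placeholder, the parameter names `q` (query) and `i` (collection ID) are assumptions to check against the Help Center documentation, and the sample response is invented for the example rather than copied from a real Archive-It result.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical placeholder endpoint; substitute the one given in the Help Center.
SEARCH_ENDPOINT = "https://example.org/opensearch"


def build_search_url(query, collection_id, endpoint=SEARCH_ENDPOINT):
    """Build a full-text search URL limited to one collection.

    The parameter names "q" and "i" are assumptions for this sketch.
    """
    params = {"q": query, "i": collection_id}
    return endpoint + "?" + urlencode(params)


def parse_results(rss_text):
    """Extract title/link pairs from an RSS-style OpenSearch response."""
    root = ET.fromstring(rss_text)
    results = []
    for item in root.iter("item"):
        results.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return results


# Invented sample response, only to show the shape the parser expects.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample results</title>
    <item>
      <title>Example captured page</title>
      <link>https://example.org/wayback/20180101000000/http://example.com/</link>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    print(build_search_url("climate change", 1234))
    for hit in parse_results(SAMPLE_RSS):
        print(hit["title"], hit["link"])
```

In a federated search layer like the one NYARC describes, the list returned by `parse_results` would be merged with results from other sources before display.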



Describing web archives at the collection level is a key way that many partners inform users about their organizations and collecting policies, or point them to other relevant content, so having those descriptions appear across as many access points as possible can be a strong advantage. One useful tool that Archive-It partners can use to this end is the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). By enabling OAI-PMH (just the click of a button), partners expose their collection descriptions to any service that harvests metadata from the feed. Archive-It’s OAI-PMH feed is automatically harvested by OCLC’s WorldCat Gateway service, so using it means that web collections will show up in searches on WorldCat.


Example of a web archive collection harvested by OAI-PMH in WorldCat
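
For partners curious about what a harvester does with an OAI-PMH feed, here is a minimal sketch. It builds a standard `ListRecords` request for Dublin Core (`oai_dc`) metadata and reads the title and description out of each returned record. The verb, the `metadataPrefix` argument, and the XML namespaces come from the OAI-PMH 2.0 specification. The base URL is left to the caller because the address of a given feed is not stated here, and the sample response is invented for the example.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request; base_url is supplied by the caller."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def extract_descriptions(xml_text):
    """Return identifier, title, and description for each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": record.findtext(".//dc:title", default="", namespaces=NS),
            "description": record.findtext(
                ".//dc:description", default="", namespaces=NS),
        })
    return records


# Invented sample response, only to show the shape the parser expects.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:collection-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample Web Collection</dc:title>
          <dc:description>Websites documenting a sample topic.</dc:description>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://example.org/oai"))
    for rec in extract_descriptions(SAMPLE_RESPONSE):
        print(rec["identifier"], "|", rec["title"], "|", rec["description"])
```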


Collection-specific capture index

The capture index (CDX) guides end users of web archives to the specific captures, much like the index at the back of a book guides them to the information on a given page. This, for instance, is how Wayback calendar pages are populated with access points to specific dates, but partners can customize and enhance these connections to make even more kinds of browsable interfaces. Princeton Theological Seminary, for example, pulls results from querying the CDX/C API into a faceted display of capture times for seeds in their collections.


Browse-by-date interface designed by Princeton Theological Seminary using the CDX/C API

Faceted browsing of capture dates pulled from the Archive-It CDX/C API
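
To make the faceted-display idea a little more concrete, the sketch below groups capture timestamps from CDX-style lines by year and month, which is the kind of structure a browse-by-date interface could be built on. It assumes space-separated lines whose first three fields are the URL key, a 14-digit timestamp (YYYYMMDDhhmmss), and the original URL. The field order returned by a particular API may differ, and the sample lines are invented for the example.

```python
from collections import defaultdict
from datetime import datetime

# Invented sample lines; the field layout is an assumption for this sketch.
SAMPLE_CDX = """\
com,example)/ 20170315120000 http://example.com/ text/html 200 AAAA - - 1234 0 sample.warc.gz
com,example)/ 20170320093000 http://example.com/ text/html 200 BBBB - - 1240 1234 sample.warc.gz
com,example)/ 20180106081500 http://example.com/ text/html 200 CCCC - - 1300 2474 sample.warc.gz
"""


def parse_cdx_line(line):
    """Read the URL and capture time from one CDX-style line."""
    _urlkey, timestamp, original = line.split()[:3]
    return {
        "url": original,
        "captured": datetime.strptime(timestamp, "%Y%m%d%H%M%S"),
    }


def facet_by_year_month(cdx_text):
    """Group capture times as {year: {month: [datetime, ...]}}."""
    facets = defaultdict(lambda: defaultdict(list))
    for line in cdx_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        when = parse_cdx_line(line)["captured"]
        facets[when.year][when.month].append(when)
    # Convert to plain dicts so the result is easy to serialize or inspect.
    return {year: dict(months) for year, months in facets.items()}


if __name__ == "__main__":
    for year, months in sorted(facet_by_year_month(SAMPLE_CDX).items()):
        for month, captures in sorted(months.items()):
            print(year, month, len(captures), "capture(s)")
```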


Make it what you want!

Looking forward, Internet Archive staff are developing an Archive-It partner data API to further extend options for sharing descriptive and technical/provenance metadata about crawls and captures between systems. The University at Albany, SUNY, for instance, has piloted requesting crawl information, such as crawl start and end times, duration, and any scoping rules that were in place for a crawl, to automatically populate the records in their public access point for web archives. Are there more data points that you would include in yours? Let Archive-It’s web archivists know if you would like to beta test this new access integration.


Description of a web archive by University at Albany SUNY using the partner data API

Partner data API calls enrich the “acquisition information” fields of SUNY Albany’s web archive descriptions.


Doing something neat with your Archive-It data, but don’t see yourself on this list? Please let Archive-It’s web archivists and your peers know by commenting here!