Setting Up a Repository for Harvest
- Instructions for new hubs of the Mountain West Digital Library
The Mountain West Digital Library welcomes new hosting hubs to the network of digital collections about our region. These guidelines will help get you started. Please contact Sandra McIntyre, program director, at firstname.lastname@example.org, for more information or to offer your repository for harvest.
Offering Your Repository for Harvest
What we need to know:
- The baseURL of the repository's OAI provider, including the port, if specified
- The metadata format of the records, either "qdc" for Qualified Dublin Core (preferred) or "oai_dc".
- The setSpec and setName of each collection you wish to have harvested, along with the collection partner that manages each set (if you are hosting collections for other collection partners besides your own institution)
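One way to gather the setSpec and setName values is to read them from your repository's own "ListSets" response. The following is a minimal sketch in Python, assuming a standard OAI-PMH 2.0 response; the function name list_sets and the sample XML are hypothetical illustrations and are not part of any MWDL tooling.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_sets(list_sets_xml):
    """Return (setSpec, setName) pairs found in a ListSets response body."""
    root = ET.fromstring(list_sets_xml)
    pairs = []
    for set_el in root.findall(".//oai:set", OAI_NS):
        spec = set_el.findtext("oai:setSpec", default="", namespaces=OAI_NS)
        name = set_el.findtext("oai:setName", default="", namespaces=OAI_NS)
        pairs.append((spec.strip(), name.strip()))
    return pairs


# Hypothetical sample response, trimmed to the elements used here.
SAMPLE_LIST_SETS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>ead</setSpec><setName>Finding Aids</setName></set>
    <set><setSpec>photos</setSpec><setName>Historic Photographs</setName></set>
  </ListSets>
</OAI-PMH>"""

for spec, name in list_sets(SAMPLE_LIST_SETS):
    print(spec, "-", name)
```

In practice the XML would come from a request to your baseURL with "verb=ListSets"; the sample string stands in for that response so the sketch runs without network access.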
How we use OAI to harvest:
- Our harvesting system in Ex Libris Primo sends a standard "Identify" request first to verify that the OAI repository is functioning.
- Then it sends a "ListRecords" request with "from" and "until" parameters to obtain the first batch of metadata records from the repository.
- Additional "ListRecords" requests, with the appropriate "resumptionToken" parameters, are sent as needed to get the full listing of records.
- We run a number of normalization routines on the harvested records to transform the Dublin Core metadata into Primo normalized XML.
- Normally, we do a full initial harvest only once and then do incremental harvests weekly, using the "from" and "until" parameters on the weekly "ListRecords" request.
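The request sequence described above can be illustrated with a short Python sketch. It is not the Primo harvester itself, only an approximation of the requests a harvester sends, assuming a standard OAI-PMH 2.0 provider. The base URL, the dates, the token value, and the function names (identify_url, list_records_url, next_token) are all hypothetical.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def identify_url(base_url):
    """URL for the initial Identify request."""
    return base_url + "?" + urlencode({"verb": "Identify"})


def list_records_url(base_url, metadata_prefix="qdc", set_spec=None,
                     from_date=None, until_date=None, resumption_token=None):
    """URL for a ListRecords request, first batch or follow-up batch."""
    if resumption_token:
        # resumptionToken is an exclusive argument in OAI-PMH:
        # it is sent with the verb and nothing else.
        args = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        args = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            args["set"] = set_spec
        if from_date:
            args["from"] = from_date
        if until_date:
            args["until"] = until_date
    return base_url + "?" + urlencode(args)


def next_token(response_xml):
    """Return the resumptionToken of a response, or None on the last batch."""
    root = ET.fromstring(response_xml)
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    return (token or "").strip() or None


# Hypothetical values for illustration only.
BASE = "http://repository.example.org/oai"
PARTIAL_BATCH = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords><resumptionToken>abc123</resumptionToken></ListRecords>
</OAI-PMH>"""
LAST_BATCH = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords><resumptionToken/></ListRecords>
</OAI-PMH>"""

print(identify_url(BASE))
print(list_records_url(BASE, from_date="2012-01-01", until_date="2012-01-08"))
print(list_records_url(BASE, resumption_token=next_token(PARTIAL_BATCH)))
```

Pasting URLs of this shape into a browser, with your own baseURL substituted, is a quick way to see what a harvester would receive from your repository.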
IP Addresses and Systems Administration:
- If your systems administrators keep whitelists for access to your OAI provider, the address they will need to add so that we can harvest your collections is 22.214.171.124
For information about checking your repository's OAI stream, see Open Archives Initiative (OAI) Queries.
Providing Metadata using the Open Archives Initiative Protocol
MWDL can harvest any OAI-compliant stream. The requirements for the OAI protocol for metadata harvesting are spelled out at http://openarchives.org.
A few notes:
- We recommend that you use a digital assets management system, such as CONTENTdm or DSpace, that includes built-in OAI metadata provision that is easy to configure.
- In CONTENTdm, on the server configuration tab, ensure that the "Enable OAI" setting is set to "Yes." Ensure that the "Enable compound object pages" setting is set to "No." We want to harvest your metadata at the object level only.
- Our default metadata format for harvest is "qdc" (Qualified Dublin Core). We can accept "oai_dc" if need be, but it sacrifices much of the detail in the metadata.
- If you have a repository that does not have built-in OAI metadata provision, please implement one of the many open-source or low-cost OAI provider tools. We strongly advise against creating your own OAI provider module. The OAI protocol seems simple and straightforward, but it has multiple functions that must be implemented precisely, and it is more complicated and time-consuming to program and test than it initially appears. Please understand that we do not have time to assist in developing or testing "home-grown" OAI providers.
- Take advantage of the OAI "sets" implementation to separate the different collections. MWDL can harvest separate sets. Or we can harvest all records and tag only certain sets for display.
- We recommend that you implement deleted record status. This is not required of OAI repositories, but without it we have no way to remove from our harvester the records that you delete locally, except by a full delete-and-reload of your entire repository, which we prefer to do as rarely as possible.
- While any system for assigning unique identifiers is acceptable to OAI, we recommend you generate a meaningful OAI identifier that is related to the setSpec and item number in your digital assets management repository. An example of such an identifier is "oai:images.archives.gov:ead/6", where "images.archives.gov" is the domain, "ead" is the set, and "6" is the item number. This makes it easy for our harvester to create links to the items in your repository without having to resort to the <dc:identifier> field.
- If you must develop your own OAI provider, run it through the OAI Repository Explorer at http://re.cs.uct.ac.za to test for compliance with the OAI protocol.
- There is an oai-implementers mailing list that is a great place to ask questions.
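Two of the notes above, meaningful identifiers and deleted record status, can be illustrated with a short Python sketch. It assumes identifiers of the form "oai:domain:set/item" as in the example given, and a standard OAI-PMH 2.0 response in which a deleted record carries status="deleted" on its header. The function names and the sample XML are hypothetical and do not describe how the MWDL harvester is implemented.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_oai_identifier(identifier):
    """Split 'oai:<domain>:<set>/<item>' into (domain, set, item)."""
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai":
        raise ValueError("not an OAI identifier: " + identifier)
    domain, local = parts[1], parts[2]
    # The item number follows the last slash; the set is everything before it.
    set_spec, _, item = local.rpartition("/")
    if not set_spec or not item:
        raise ValueError("expected <set>/<item> in: " + identifier)
    return domain, set_spec, item


def deleted_identifiers(list_records_xml):
    """Return identifiers of records whose header is marked as deleted."""
    root = ET.fromstring(list_records_xml)
    deleted = []
    for header in root.findall(".//oai:header", OAI_NS):
        if header.get("status") == "deleted":
            ident = header.findtext("oai:identifier", default="", namespaces=OAI_NS)
            deleted.append(ident.strip())
    return deleted


# Hypothetical sample response, trimmed to record headers only.
SAMPLE_RECORDS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header status="deleted">
        <identifier>oai:images.archives.gov:ead/6</identifier>
        <datestamp>2012-01-05</datestamp>
      </header>
    </record>
    <record>
      <header>
        <identifier>oai:images.archives.gov:ead/7</identifier>
        <datestamp>2012-01-06</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(parse_oai_identifier("oai:images.archives.gov:ead/6"))
print(deleted_identifiers(SAMPLE_RECORDS))
```

With identifiers structured this way, the set and item number can be recovered from the identifier alone, and a harvester can drop a deleted record during an incremental harvest rather than through a full reload.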
MWDL includes digital collections for search on our search portal at http://mwdl.org only by explicit permission of the repository managers. To request that metadata for your digital collection(s) be added to or deleted from the MWDL database, please contact Sandra McIntyre, program director, at email@example.com.