General Guidelines for Digital Metadata
- Instructions for collection managers
- Accompanies the MWDL Dublin Core Application Profile, Version 2.0, July 11, 2011 (pdf)
The Guidelines below were developed by the Metadata Task Force of the Utah Academic Library Consortium Digitization Committee in 2010-2011 as part of the revision of the MWDL Dublin Core Application Profile. The Guidelines represent the current understanding of the Task Force and are subject to change. While the Guidelines were developed to enhance the interoperability of digital collections within the Mountain West Digital Library, many of them are relevant to other OAI-harvested environments, such as Scientific Commons and OAIster, as well.
To reference this document, please point to "the General Guidelines for Digital Metadata posted on the Mountain West Digital Library website at http://mwdl.org."
Please notify the UALC Digitization Committee Metadata Task Force or the MWDL Program Director if you have corrections or additions to these Guidelines.
Mapping to Dublin Core
- Fields that are to be shared via OAI for harvesting should be mapped to Qualified Dublin Core (QDC) or simplified Dublin Core (DC). Mountain West Digital Library harvests QDC from servers where QDC is provided, and DC from other servers. Note: All CONTENTdm servers provide both QDC and DC by default.
- Local fields that you do not wish to share for harvesting should be mapped to "None". In CONTENTdm, fields that are set to be hidden from display are also unavailable for harvesting.
- Multiple local fields may be mapped to the same QDC or DC field. These fields will be shared via OAI as distinct fields with the same DC/QDC tag. Keep in mind that the harvester may concatenate these distinct fields into one field in the harvested environment. Therefore, to avoid the values of those fields being run together illegibly, place a semicolon at the end of each entry.
- At times you may wish to refrain from mapping more than one field to the same QDC or DC field. For example, if you are using both a Title and a Filing Title, both of which are mapped to "Title" (dcterms:title), then they will both appear in the harvested "Title" field. This could be confusing to users. Decide on one of them to be mapped to "Title" (dcterms:title) and map the other to "Alternative" (dcterms:alternative) or to "None".
- You can view exactly what the MWDL harvester and other Open Archives Initiative (OAI) harvesters can retrieve from your digital assets management system by requesting the OAI stream via queries in a Web browser. Instructions for doing this are on the MWDL website page on Open Archives Initiative (OAI) Queries.
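As a rough illustration of such a query, the sketch below builds OAI-PMH request URLs that can be pasted into a Web browser. The verbs and the "oai_dc" metadata prefix come from the OAI-PMH protocol itself; the base URL and the set name "photos" are hypothetical placeholders, so substitute the OAI address and collection alias of your own server.

```python
from urllib.parse import urlencode

# Hypothetical base URL: replace with the OAI address of your own server.
BASE_URL = "https://example.org/oai/oai.php"


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb, **arguments}
    return f"{base_url}?{urlencode(params)}"


# Which metadata formats does the server offer (for example DC or QDC)?
print(oai_request_url(BASE_URL, "ListMetadataFormats"))

# All simple Dublin Core records in one collection ("photos" is a made-up set name).
print(oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc", set="photos"))
```

The response is XML, so what you see in the browser is exactly the text a harvester receives.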
- Locally searchable:
In CONTENTdm, published collections whose metadata is not restricted are searchable. Within those collections, a field whose "Searchable" property is set to "Yes" is searchable within the local CONTENTdm environment. If a user searches within this field's collection only, the search will use the local field names. If a user searches across more than one collection, the search will use the Dublin Core-mapped or Qualified Dublin Core-mapped field names.
- Centrally searchable:
Only fields that meet both of these conditions are shared for harvesting: (a) the field is mapped to Dublin Core or Qualified Dublin Core, and (b) in CONTENTdm and perhaps other systems, the field is not hidden. (In CONTENTdm, hidden fields are not shared via OAI.) Therefore only those fields will be searchable in a central harvested environment such as Mountain West Digital Library.
Note: These two searchability characteristics are independent. Therefore, a field whose "Searchable" property is set to "No" could still be shared and searchable in the harvested environment by virtue of being mapped to DC or QDC. Also, a field whose "Searchable" property is set to "Yes" might not be searchable in the harvested environment by virtue of not being mapped to DC or QDC, or, in the case of CONTENTdm and perhaps other systems, by virtue of being hidden and therefore not shared via OAI.
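The independence of the two characteristics can be sketched in a few lines. This is only a model of the rules described above, not CONTENTdm code: the Field class, its attribute names, and the sample fields are all made up for illustration, and the local rule assumes a published collection whose metadata is not restricted.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str          # local field name
    dc_mapping: str    # DC/QDC element, or "None" if unmapped
    searchable: bool   # the "Searchable" property
    hidden: bool       # the "Hidden" property


def locally_searchable(field):
    """Local rule: depends only on the "Searchable" property."""
    return field.searchable


def shared_via_oai(field):
    """Central rule: mapped to DC/QDC and not hidden."""
    return field.dc_mapping != "None" and not field.hidden


fields = [
    Field("Title", "Title", searchable=True, hidden=False),
    Field("Notes", "Description", searchable=False, hidden=False),
    Field("Donor", "Contributor", searchable=True, hidden=True),
    Field("Digitized Date", "None", searchable=False, hidden=False),
]

for f in fields:
    print(f"{f.name}: local={locally_searchable(f)}, central={shared_via_oai(f)}")
```

In this sample, "Notes" is centrally searchable but not locally searchable, while "Donor" is the reverse.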
Placeholder Data in Required Fields
Information needed for a required field may not yet be known, or not yet entered, when a collection is first uploaded or even published. In such a case, enter a placeholder, which both fulfills the entry requirement and lets you find the records later for follow-up. The recommended placeholder is the word "Pending". Example:
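To illustrate the follow-up step, here is a minimal sketch that scans exported records for the "Pending" placeholder. The record layout, the "id" key, and the list of required fields are assumptions made for this example; adjust them to match your own export and the required fields of the Application Profile.

```python
PLACEHOLDER = "Pending"

# Hypothetical list of required fields, for illustration only.
REQUIRED_FIELDS = ["Title", "Date", "Rights"]


def pending_fields(record, required_fields=REQUIRED_FIELDS):
    """Return the required fields whose value is still the placeholder."""
    pending = []
    for name in required_fields:
        # Ignore surrounding spaces and a trailing semicolon before comparing.
        value = record.get(name, "").strip().rstrip(";").strip()
        if value.lower() == PLACEHOLDER.lower():
            pending.append(name)
    return pending


# Made-up sample records standing in for an exported collection.
records = [
    {"id": "rec1", "Title": "Main Street parade", "Date": "1907", "Rights": "Pending"},
    {"id": "rec2", "Title": "Pending;", "Date": "Pending", "Rights": "Material in the public domain."},
    {"id": "rec3", "Title": "Depot", "Date": "1912", "Rights": "Material in the public domain."},
]

for record in records:
    fields = pending_fields(record)
    if fields:
        print(record["id"], "needs follow-up in:", ", ".join(fields))
```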
Local field name vs. DC mapping
Each collection can have its own local field names. The "labels" indicated in the MWDL Dublin Core Application Profile are only suggestions, and you are free to name your fields as you wish. However, you must map the fields correctly, because only the mapping matters when a collection is harvested. If a field is not mapped, it will not appear in the MWDL record. If it is mapped to the wrong Dublin Core element, the metadata will appear in the wrong MWDL field. Example:
The entity primarily responsible for making the resource has to be mapped to "Creator" (dc:creator). But the local name can be what you want: "Creator", "Artist", "Author", "Photographer", etc. If relevant to your collection, you may create several fields mapped to "Creator".
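The sketch below models how local names drop away at harvest time. The field names, the sample record, and the way the values are joined are all assumptions for illustration; a real harvester may combine same-mapped fields differently, which is why a semicolon at the end of each entry is useful.

```python
from collections import defaultdict

# Hypothetical mapping of local field names to Dublin Core elements.
FIELD_MAP = {
    "Photographer": "creator",
    "Artist": "creator",
    "Caption": "title",
    "Shelf Location": "None",  # local only, not shared
}


def harvested_view(record, field_map):
    """Group values by DC element, dropping unmapped and empty fields."""
    grouped = defaultdict(list)
    for local_name, value in record.items():
        element = field_map.get(local_name, "None")
        if element == "None" or not value:
            continue
        grouped[element].append(value)
    # Assumption: the harvester runs same-mapped values together with a space.
    return {element: " ".join(values) for element, values in grouped.items()}


record = {
    "Photographer": "Smith, Jane;",
    "Artist": "Doe, John;",
    "Caption": "Main Street parade;",
    "Shelf Location": "Box 12",
}

print(harvested_view(record, FIELD_MAP))
```

Both "Photographer" and "Artist" end up in the single harvested creator field, and "Shelf Location" does not appear at all.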
The value of the required field Identifier is the URI of the resource. This field is automatically created and mapped in CONTENTdm. You do not have to create this field and enter a value.
If you create additional Identifier fields in your collection, map them to "None", not to "Identifier". Only the automatically generated "reference URL" from CONTENTdm is allowed to be mapped to "Identifier".
When setting up the fields for your collection and starting to enter values, remember to treat Date fields differently. Here are some tips about configuring field properties and formatting dates.
Date Fields Setup
You can establish several different kinds of dates, if you like. The metadata standard requires you to enter the Date (original date). This field must not be hidden: in CONTENTdm and possibly other systems, hidden fields are not shared via OAI and therefore cannot be harvested. We also suggest that you set the Date field to be searchable.
- Date (original date): Set this required field to have the data type of "Date" and to be searchable. Go to "fields" on the "collections" tab in the CONTENTdm Administration interface, click "edit" next to the Date field and set its properties:
- Field name: Date (or "Date.Original" if you prefer)
- Dublin Core mapping: Date
- Data type: Date
Note: The data type lets CONTENTdm know what sort of data to expect: text, date, or full text search. In CONTENTdm, this data type will constrain the format of your entry of metadata to one of the date formats that CONTENTdm allows.
- Searchable: Yes
- Hidden: No
Note: This date must not be hidden. In CONTENTdm and possibly other systems, hidden fields are not shared via OAI.
- Digitized Date (Date.Digital): You may wish to record the date that a resource was digitized, for local reference. Do not map this field. Only one field should be mapped to Date.
- Field name: Digitized Date (or "Date.Digital" if you prefer)
- Dublin Core mapping: None (to prevent it being harvested and creating confusion downstream)
- Data type: Date
- Searchable: No
- Hidden: No (or Yes if you prefer)
Unsure how to format dates? You can look at the CONTENTdm Help page on "Entering Dates". However, this is not quite a complete list. Here is a modified list that we think is more accurate for CONTENTdm 4.3 and above:
- Acceptable formats for import:
- When using the Media Editor, the Project Spreadsheet, or the Template Creator:
- dd-month yyyy
- When using a tab-delimited file for multiple-file imports:
- Stored formats: Dates are automatically converted in CONTENTdm to one of these storage formats when the record is imported or when the Media Editor is saved. This is also how the dates are shared via OAI (regardless of the display format chosen in CONTENTdm; see Display formats below). These formats are all compliant with the international standard for dates, ISO 8601.
- yyyy; yyyy; yyyy; yyyy [date ranges are converted to semicolon-separated list of single years]
- Display formats within CONTENTdm viewers: How dates are displayed in the Web templates can be configured. This does not affect the formats under which they are stored.
- Non-standard dates: In CONTENTdm, setting the data type of the Date field to "Date" constrains the entry of metadata to one of the date formats above, all of which require a four-digit year. None of these formats allows entry of Before Common Era (BCE)/BC dates, Common Era (CE)/AD dates before 1000, or dates in other calendar systems. Label such non-standard dates appropriately, and set the field's data type to "Text" in order to allow non-date-formatted entry. Examples:
- BCE Date: 48 BCE;
- BCE Date: 1000-800 BCE;
- Date: 915 CE;
- Date: 404-415 AD;
- Hebrew Date: 5750;
- Islamic Date: Hijri 1350;
- Julian Date: 1849 AD;
If a collection contains both standard and non-standard dates, we recommend setting up two fields, both mapped to Dublin Core date. One can have the data type "Date" while the other remains "Text" to accommodate forms such as those above. Within the local CONTENTdm environment this limits searching within dates, but it allows all forms of date information to display locally and in OAI-harvested records.
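The following sketch shows one way to sort date values between a date-typed field and a text field before import, and to preview how a year range is stored as a semicolon-separated list of single years. It is not CONTENTdm code: the patterns below (yyyy, yyyy-mm, yyyy-mm-dd, and yyyy-yyyy for ranges) are assumptions standing in for the accepted formats, and the function names are made up.

```python
import re

# Assumed stand-ins for date-typed entries: yyyy, yyyy-mm, yyyy-mm-dd.
STANDARD_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")
# Assumed input form for a range of years: yyyy-yyyy.
YEAR_RANGE = re.compile(r"^(\d{4})-(\d{4})$")


def expand_year_range(value):
    """Turn 'yyyy-yyyy' into a semicolon-separated list of single years."""
    match = YEAR_RANGE.match(value.strip())
    if not match:
        raise ValueError(f"not a year range: {value!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if end < start:
        raise ValueError(f"range ends before it starts: {value!r}")
    return "; ".join(str(year) for year in range(start, end + 1))


def choose_date_field(value):
    """Return 'date' for values that fit the assumed formats, else 'text'."""
    cleaned = value.strip().rstrip(";").strip()
    if STANDARD_DATE.match(cleaned) or YEAR_RANGE.match(cleaned):
        return "date"
    # BCE dates, years before 1000, and labeled calendar dates land here.
    # A bare four-digit year from another calendar cannot be detected
    # automatically and still needs a human decision.
    return "text"


print(expand_year_range("1907-1910"))
for sample in ["1907-05-12", "48 BCE;", "915 CE;", "Hijri 1350;"]:
    print(sample, "->", choose_date_field(sample))
```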
The Rights field in MWDL metadata records may contain information regarding copyright ownership or physical ownership. Physical ownership of an item does not necessarily confer copyright ownership. Nor does making a digital version of a work by itself merit copyright protection: according to the Bridgeman decision, http://www.law.cornell.edu/copyright/cases/36_FSupp2d_191.htm, such a reproduction lacks sufficient original creativity, which is one of the tests for copyright protection. For an explanation of the difference between copyright and physical ownership, see the following succinct overview: http://www.library.yale.edu/special_collections/copyright.html.
In formulating copyright statements, refer to your institution's copyright page. Or, if your institution does not have one, see Marriott Library's copyright resource page at http://tinyurl.com/5dy84f for more information and a list of tools to determine copyright status, etc.
Do you have rights to the material you are adding to your digital collection?
Copyright protects the creators of original literary, dramatic, musical, artistic, and certain other intellectual works (Title 17, U.S. Code). The protection extends to both published and unpublished material. "Section 106 of the 1976 Copyright Act generally gives the owner of copyright the exclusive right to do and to authorize others to do the following" (Copyright Basics, US Copyright Office):
- To reproduce the work
- To prepare derivative works
- To distribute copies of the work
- To perform the work publicly
- To display the work publicly
Use these questions as an initial guide:
Are you the original creator/author?
If yes, then you are the rights holder.
Did someone else create the work?
If so, then you are most likely not the rights holder.
Did someone assign rights to you through a written assignment?
If yes, then you are the rights holder.
If someone else created the work and did not assign rights to you, you will need to determine who the rights holder is. Determining a work's copyright status requires a bit of investigation, but there are many tools to assist with this.
Step 1: Research
Research U.S. Copyright Office registration records, http://cocatalog.loc.gov/cgi-bin/Pwebrecon.cgi?DB=local&PAGE=First. The catalog contains records from 1978 to the present. For works older than 1978, use Stanford's Copyright Renewal Database, http://collections.stanford.edu/copyrightrenewals/bin/page?forward=home. Search by author, creator, publisher, or title.
Step 2: Ask (if needed)
If you find a record, it usually means that there is a rights holder. That entity (not necessarily the library) should be listed as the copyright holder, and you may need to consider getting permission to digitize. For more information, see "The Basics of Getting Permission" at http://fairuse.stanford.edu/Copyright_and_Fair_Use_Overview/chapter1/1-b.html.
Step 3: Use Public Domain Slider
If there's not a record, check the Public Domain slider, http://librarycopyright.net/digitalslider/, to determine if it fits the criteria for public domain.
Once you've done some investigation and have an informed idea of the work's copyright status, consider using these sample copyright statements. The sample statements below also include wording in the case of unknown copyright status. In the sample wordings below, replace underlined text with applicable local information.
- For copyrighted works with all rights reserved:
© Personal/Corporate name, year, email/web address (if available). Transmission or reproduction of materials protected by copyright beyond that allowed by fair use requires the written permission of the copyright owners. Works not in the public domain cannot be commercially exploited without permission of the copyright owner. Responsibility for any use rests exclusively with the user.
- For copyrighted works with some permission built-in (Creative Commons):
© Personal/Corporate name, year, email/web address (if available). Use of this file is allowed in accordance with the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License; http://creativecommons.org/licenses/by-nc-nd/3.0/us/
- For public domain works:
Material in the public domain. No restrictions on use. If you wish to purchase print copies or a high-resolution version of the image, see [local site].
- For works where copyright status is unknown:
Copyright status unknown. Some material in these collections may be protected by the U.S. Copyright Law (Title 17, U.S.C.). In addition, the reproduction and/or commercial use of some materials may be restricted by gift or purchase agreements, donor restrictions, privacy and publicity rights, licensing agreements, and/or trademark rights. Distribution or reproduction of materials protected by copyright beyond that allowed by fair use requires the written permission of the copyright owners. To the extent that restrictions other than copyright apply, permission for distribution or reproduction from the applicable rights holder is also required. Responsibility for obtaining permissions and for any use rests exclusively with the user.
Let's say a digital collection contains a digital copy of an original photograph taken in 1907. The photograph is likely in the public domain (check the Public Domain slider). In this case the digital reproduction of the original is not eligible for copyright protection because it lacks sufficient creativity/originality. The Rights field indicates that the photograph is in the public domain with a statement like "Material in the public domain. No restrictions on use." However, the library that digitized the photograph offers prints of it for a fee, so the Rights statement explains that users can order copies of the digital image for a fee and provides a link to an order form and pricing information. The resulting statement looks like this:
Material in the public domain. No restrictions on use. To purchase print copies or a high-resolution version of the image, see [URL for webpage describing how to order].
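If many records need the same statement, the local information can be filled in programmatically before import. The sketch below does this for the public domain wording only; the function name and the example URL are hypothetical, and the wording itself is taken from the sample statement above.

```python
PUBLIC_DOMAIN = "Material in the public domain. No restrictions on use."
ORDER_NOTE = (
    "To purchase print copies or a high-resolution version of the image, "
    "see {order_url}."
)


def public_domain_statement(order_url=None):
    """Build a Rights value for a public domain work.

    order_url is optional local information: the page that explains
    how to order copies, if the institution offers them.
    """
    if order_url:
        return f"{PUBLIC_DOMAIN} {ORDER_NOTE.format(order_url=order_url)}"
    return PUBLIC_DOMAIN


# The URL below is a made-up placeholder for a local order page.
print(public_domain_statement("https://example.org/order-prints"))
```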
No HTML tags within metadata
Metadata should be kept free of tags and formatting codes as much as possible since it is shared as text via OAI with MWDL and other harvesters like Scientific Commons. Because it is not predictable how metadata will be used, crosswalked, or formatted at the harvesting end, it is advisable to keep it "clean" of any tags.
- Do not use HTML tags within the values of your metadata fields. For example, do not use "<br>" or "<br />" within metadata fields to force a line break. Do not use "<em>", "<i>", "<strong>", "<b>", or other formatting tags within metadata fields. Even where CONTENTdm is configured to render these tags (as OCLC has configured it to render "<br>" or "<br />" by default), they will be included in the OAI stream and therefore shared with central harvesters. This leads to ineffective and ugly metadata in the harvested environment.
- CONTENTdm can be configured to recognize hard carriage returns, without using HTML tags, if you like. Nathan Pugh posted instructions to the CONTENTDM-L list. See his posting at http://listserv.oclc.org/scripts/wa.exe?A2=ind0804A&L=CONTENTDM-L&P=R2290. Ask your CONTENTdm system administrator to make this configuration change if you want to include line breaks in your metadata display within CONTENTdm. Keep in mind that these line breaks will not be included in the OAI stream and therefore will not appear in any centrally harvested environment.
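To check existing metadata for stray tags, a simple pattern match is usually enough. The sketch below is a heuristic rather than a full HTML parser, and the function names are made up: it reports anything that looks like a tag and produces a cleaned value in which line-break tags become spaces and other tags are removed. Review the results by hand, since text such as "a<b and c>d" would be flagged as well.

```python
import re

# Heuristic: "<" or "</" followed by a tag name, up to the next ">".
TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9]*\b[^<>]*>")
LINE_BREAK = re.compile(r"<br\s*/?>", re.IGNORECASE)


def find_tags(value):
    """Return every tag-like string found in a metadata value."""
    return TAG.findall(value)


def strip_tags(value):
    """Replace line-break tags with a space and remove all other tags."""
    text = LINE_BREAK.sub(" ", value)
    text = TAG.sub("", text)
    # Collapse any runs of whitespace left behind.
    return re.sub(r"\s+", " ", text).strip()


value = "First line<br />Second <em>line</em>"
print(find_tags(value))
print(strip_tags(value))
```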
Suggested subsets of certain controlled vocabularies
Suggested values for the format and masterFormat elements (downloadable text file)
This list of common combinations may serve as an initial list within your digital assets management system (copy and paste this downloadable text file).
* MIME types not yet registered in the list of Internet Media Types maintained by the Internet Assigned Numbers Authority (IANA) at http://www.iana.org/assignments/media-types.
Suggested values for the genre element
The genre element may use any terms from established genre/form vocabularies, including the following recommended vocabularies:
- Art and Architecture Thesaurus (AAT) at http://www.getty.edu/research/tools/vocabularies/aat
- Thesaurus for Graphic Materials II: Genre and Physical Characteristics Terms (TGM II) at http://www.loc.gov/rr/print/tgm2/
- Library of Congress Genre/Form Terms for Library and Archival Materials (LCGFT), a list of terms in progress. See http://www.loc.gov/catdir/cpso/genre_form_faq.pdf and http://www.loc.gov/catdir/cpso/genreformthesaurus.html.
The values contained in the downloadable text files below are selected subsets of each of the full vocabularies. One or more of these lists of common genre types may serve as an initial list within your digital assets management system. You can copy and paste the chosen list(s) of terms.
All three subsets are contained in a combined file, available as a PDF document as well as in its original Excel format: