The Basics of findability and interoperability of digital collections
The Basics define "findability" as ‘a condition in which digital objects and their metadata are available, uniquely identified and reusable by humans and machines’. By applying the Basics of findability to your digital collections, in which substantial investments have usually already been made, you make them accessible to a wider and more diverse group of users.
Implementing the guidelines of the Basics ensures that search engines can find your digital objects, that persistent references can be made to them, and that they can be used on platforms other than your own website (think of initiatives such as Europeana and other regional, thematic or national portals). Finally, the Basics are a first step towards publishing your collections as linked (open) data.
Four key concepts
The Basics of findability are divided into four key concepts:
- Identification of data: the data have a unique and, preferably, sustainable identifier.
- Accessibility of data: the data are accessible through the Internet.
- Search engine readability of data: the data are presented in such a way that search engines can index them.
- Reuse of data: the data can (easily) be harvested and technically reused by third parties (for instance through linked data) and linked to other datasets.
1. Identification of data
The first step is to create a unique identifier for the digital object, for the original analogue object (if available) and for its metadata. This process consists of three phases:
- In the first phase, you assign a unique identifier to every digital object. This means giving every digital object a number or code that is unique within the institution.
- In the second phase, you create a URI (Uniform Resource Identifier) for each digital object: a unique reference to that object. On the Internet, URIs often take the form of a URL (Uniform Resource Locator: a reference to a location on a server), a URN (Uniform Resource Name: a name reference that is not tied to a physical location on a server) or a combination of both. A URI is the starting point for publishing your data as linked (open) data.
- In the third phase, you make sure your URIs are persistent. The URI exists as an independent reference and remains valid even when the actual location of the digital source changes. The URI and the current file location are linked through a table; looking up a URI in this table and forwarding the user to the current location is called ‘resolving’ or ‘redirecting’. A number of methods and standards for persistent identification are currently available.
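The three phases can be illustrated with a small Python sketch. It assumes a hypothetical base URI (`https://collection.example.org/id/`) and keeps the resolver table in memory; a real deployment would rely on an established system such as Handle or PURL rather than a home-made table. The point it shows is that moving a file changes only the table entry, never the URI.

```python
from __future__ import annotations


class PersistentResolver:
    """Hypothetical resolver: a table from persistent URIs to current file locations."""

    def __init__(self, base_uri: str) -> None:
        # Assumed base URI under the institution's control (hypothetical).
        self.base_uri = base_uri.rstrip("/") + "/"
        self._table: dict[str, str] = {}

    def mint(self, local_id: str, location: str) -> str:
        """Phases 1 and 2: turn an institution-unique ID into a URI and register it."""
        uri = self.base_uri + local_id
        if uri in self._table:
            raise ValueError(f"identifier already in use: {local_id}")
        self._table[uri] = location
        return uri

    def move(self, uri: str, new_location: str) -> None:
        """Phase 3: the file moves, the URI stays the same; only the table entry changes."""
        if uri not in self._table:
            raise KeyError(uri)
        self._table[uri] = new_location

    def resolve(self, uri: str) -> tuple[int, str | None]:
        """Return an HTTP status code and the location a resolver would redirect to."""
        location = self._table.get(uri)
        if location is None:
            return 404, None
        return 302, location  # 302 Found: redirect to wherever the file lives now


if __name__ == "__main__":
    resolver = PersistentResolver("https://collection.example.org/id/")
    uri = resolver.mint("obj-000123", "https://files.example.org/2016/obj-000123.tif")
    resolver.move(uri, "https://storage.example.org/archive/obj-000123.tif")
    print(uri, resolver.resolve(uri))
```

Which redirect status a resolver returns is a design choice; 302 is used here only for illustration.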
2. Accessibility of data
To make the data accessible on the Internet, you must be able to deliver them over the HTTP protocol (or its secure variant, HTTPS). You can also use the FTP protocol, especially to enable downloads of larger datasets.
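As an illustration of delivering data over HTTP, the sketch below uses only Python's standard library to serve a metadata record at a hypothetical path `/object/<id>`. The record content and path layout are assumptions; in practice HTTPS would typically be handled by a web server or proxy in front of an application like this, and FTP is not shown.

```python
from __future__ import annotations

import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory records; in practice these come from the collection database.
RECORDS: dict[str, str] = {
    "obj-000123": "<record><title>Example object</title></record>",
}


class ObjectHandler(BaseHTTPRequestHandler):
    """Serves /object/<id> as XML over plain HTTP."""

    def do_GET(self) -> None:
        prefix = "/object/"
        local_id = self.path[len(prefix):] if self.path.startswith(prefix) else None
        body = RECORDS.get(local_id) if local_id else None
        if body is None:
            self.send_error(404, "Unknown object")
            return
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/xml; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the demo output quiet


def start_server() -> ThreadingHTTPServer:
    # Port 0 lets the operating system pick a free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), ObjectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    conn.request("GET", "/object/obj-000123")
    resp = conn.getresponse()
    print(resp.status, resp.read().decode("utf-8"))
    conn.close()
    server.shutdown()
```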
3. Search engine readability of data
A few extra steps are needed to make the digital collections readable for search engines, meaning not only the website but also its underlying databases. As a minimum requirement, there should be a landing page for every digital object, so that search engines can find all the available digital objects. This can be either a static or a dynamic page. The Basics advise using one of the options below, or a combination of both:
- Using Sitemaps: the Sitemaps protocol produces a list of URLs, with metadata, that search engines use to index the website and the underlying database.
- Using hypertext links: URLs that can be reached through hyperlinks can be crawled (‘spidering’) and indexed by a search engine. You can, for instance, divide the collection into sections, subsections and so forth to enable browsing.
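A minimal sketch of both options, assuming a hypothetical landing-page base URL and a simple `DigitalObject` record type: it generates one HTML landing page per object, a browse page with hyperlinks to every landing page, and a sitemap in the Sitemaps protocol format (`urlset`, `url`, `loc` and `lastmod` in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace).

```python
from __future__ import annotations

import html
import xml.etree.ElementTree as ET
from dataclasses import dataclass

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
BASE_URL = "https://collection.example.org/object/"  # hypothetical landing-page base URL


@dataclass
class DigitalObject:
    local_id: str
    title: str
    last_modified: str  # W3C date format, e.g. "2016-01-05"


def landing_page(obj: DigitalObject) -> str:
    """A static HTML landing page for one digital object."""
    title = html.escape(obj.title)
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{title}</title></head>\n"
        f"<body><h1>{title}</h1><p>Identifier: {html.escape(obj.local_id)}</p></body></html>\n"
    )


def browse_page(objects: list[DigitalObject]) -> str:
    """A browse page with a hyperlink to every landing page, so crawlers can follow them."""
    items = "".join(
        f'<li><a href="{html.escape(BASE_URL + o.local_id)}">{html.escape(o.title)}</a></li>'
        for o in objects
    )
    return f"<!DOCTYPE html>\n<html><body><ul>{items}</ul></body></html>\n"


def sitemap(objects: list[DigitalObject]) -> bytes:
    """A Sitemaps-protocol urlset with one url entry (loc + lastmod) per landing page."""
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for o in objects:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = BASE_URL + o.local_id
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = o.last_modified
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    objects = [DigitalObject("obj-000123", "Example map & legend", "2016-01-05")]
    print(landing_page(objects[0]))
    print(browse_page(objects))
    print(sitemap(objects).decode("utf-8"))
```

Larger collections would need several sitemap files tied together by a sitemap index file, since the protocol limits a single sitemap to 50,000 URLs.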
4. Reuse of data
The Basics contain two requirements to enable technical reuse of data:
- Publish your data in a structured and open format to make it interoperable. An obvious choice is (meaningful) XML, preferably validated against an XSD or DTD file.
- Use the Dublin Core Metadata Element Set for the interoperability of metadata, and use at least its basic set of 15 elements (Simple Dublin Core), which provides common ground for all cultural heritage institutions.
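The sketch below builds a Simple Dublin Core record in the `oai_dc` XML format that OAI-PMH uses, and rejects element names that are not among the 15 Dublin Core elements. The field values are hypothetical. Python's standard library cannot validate against an XSD, so schema validation would need an additional tool.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# The 15 elements of the Dublin Core Metadata Element Set ("Simple Dublin Core").
DC_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def simple_dc_record(fields: dict[str, list[str]]) -> bytes:
    """Serialize fields as an oai_dc record; every element is optional and repeatable."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Simple Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    record = simple_dc_record({
        "title": ["Example object"],  # hypothetical values
        "identifier": ["https://collection.example.org/id/obj-000123"],
        "type": ["Image"],
    })
    print(record.decode("utf-8"))
```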
In addition to these requirements, the Basics also recommend:
- OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting), which enables automatic retrieval of metadata from your information systems and makes it possible to harvest automatically from many sources. Together with Dublin Core, OAI-PMH is an important minimum requirement for making your data available to aggregators or portals such as Europeana.
- Alternative, more specialized metadata exchange standards, such as EAD (for archival collections), LIDO (for museum collections), and MODS or MARCXML (for library collections), for the exchange of a richer dataset.
- RESTful APIs as another way of sharing data. A RESTful API is a web service built on the REST architectural style; through it, the dataset (or parts of it) can be reused in a different context.
- RDF (Resource Description Framework), to publish your data as RDF triples. This is one of the key ingredients of linked data.
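To show what OAI-PMH harvesting looks like from the aggregator's side, here is a minimal harvester sketch. It issues `ListRecords` requests and follows `resumptionToken`s until the repository signals the end of the list. The repository base URL is hypothetical, the `fetch` parameter lets you plug in any HTTP client (or canned test data), and error handling is reduced to the bare minimum.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def fetch_url(url: str) -> bytes:
    """Default fetcher: a plain HTTP GET."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def list_records(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    fetch: Callable[[str], bytes] = fetch_url,
) -> Iterator[ET.Element]:
    """Yield every record from a ListRecords request, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        error = root.find(f"{{{OAI_NS}}}error")
        if error is not None:
            if error.get("code") == "noRecordsMatch":  # an empty result, not a failure
                return
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        list_el = root.find(f"{{{OAI_NS}}}ListRecords")
        if list_el is None:
            raise RuntimeError("response contains no ListRecords element")
        yield from list_el.findall(f"{{{OAI_NS}}}record")
        token = list_el.find(f"{{{OAI_NS}}}resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # a missing or empty token marks the last batch
        # resumptionToken is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


if __name__ == "__main__":
    # Offline demo with a canned one-batch response instead of a real repository.
    canned = (
        b'<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        b"<record><header><identifier>oai:example.org:obj-000123</identifier></header></record>"
        b"</ListRecords></OAI-PMH>"
    )
    for record in list_records("https://repository.example.org/oai", fetch=lambda url: canned):
        print(record.find(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier").text)
```

Against a real repository, you would call `list_records(base_url)` with that repository's OAI-PMH base URL and use the default fetcher.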
Licenses for reuse of data
The reuse of data also has a legal component, but the Basics of findability focus only on the technical side of reuse. Questions such as “To what extent do I want to open up my collections for reuse?” will be addressed in the forthcoming Basics for copyright.
The following guidelines are the minimum standards:
- Identification of data: Use a persistent URI by implementing a method such as Handle, PURL or OpenURL.
- Accessibility of data: Provide your data through HTTP(S) or FTP.
- Search engine readability: Create an HTML landing page for each digital object. We recommend the Sitemaps protocol and/or hypertext links to all digital objects that have a URL.
- Reuse of data: Present the data in a structured and open format (meaningful XML, validated against an XML Schema Definition (XSD), and Simple Dublin Core). We also recommend using RESTful APIs, OAI-PMH, JSON and RDF.
Liability and contribution to the Basics
This text is a revised version of the Basics. The first version was written in 2007 and was reviewed in 2013 during a meeting with Dutch experts working in the field of digital heritage. Professionals are invited to comment on this text and to share their experience with the Basics through www.den.nl/debasis or by emailing us at email@example.com.
Last modified: 05-01-2016