Digital Object Identifiers (DOIs) are unique character strings used to identify an object such as an electronic document. Historically they have been associated with scholarly materials such as journal articles and books, but they are increasingly used for other published works, including research datasets. The International Federation of Digital Seismograph Networks (FDSN) is planning to publish DOIs for member seismic networks in 2014. The aim is to ensure that seismic network operators under the FDSN umbrella receive full credit and citation when scientists use their network data in published articles.
The main advantage of DOIs is that they make both an object and citations to the object findable.
Suppose that Jane Foo and Jack Bar write a paper titled Using A/B Testing on a Seismic Data Set. Other authors cite the paper in a variety of formats, some with minor inaccuracies:
- J Foo and J Bar, Using A/B Testing on a Seismic Dataset
- Jane Foo & Jack Bar, Using A/B Testing on Seismic Data Sets
It may be difficult for someone reading the citation to find the original paper — the title isn’t an exact match, and the words may be common enough to return a large number of unrelated results.
At the same time, the original authors may not be able to find all the citations to their paper — again, there’s no obvious search term that will return all citations.
Enter the DOI. The authors register a DOI for their article (usually through the journal publishing it) and link the DOI to the article’s URL. Suppose the article is assigned the DOI 10.1111/T42H12. Citations to the paper may still vary in format and accuracy, but they now include the DOI:
- J Foo and J Bar, Using A/B Testing on a Seismic Dataset, doi:10.1111/T42H12
- Jane Foo & Jack Bar, Using A/B Testing on Seismic Data Sets, doi:10.1111/T42H12
Now readers can plug the DOI into a resolution service (e.g. http://dx.doi.org/10.1111/T42H12), which looks up the registered DOI and redirects to its linked URL, delivering readers directly to the article itself.
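As a rough illustration of that step, the following Python sketch turns a DOI as it might appear in a citation into a resolver URL. The function name doi_to_url is made up for this example, and the only resolver assumed is the dx.doi.org address mentioned above. The sketch builds the URL only and does not contact the resolver.

```python
from urllib.parse import quote

# Resolver address taken from the example in the text.
RESOLVER_BASE = "http://dx.doi.org/"


def doi_to_url(doi: str) -> str:
    """Build a resolver URL from a DOI, with or without a 'doi:' prefix.

    Hypothetical helper for illustration; not part of any DOI library.
    """
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return RESOLVER_BASE + quote(doi, safe="/")


print(doi_to_url("doi:10.1111/T42H12"))
# prints: http://dx.doi.org/10.1111/T42H12
```

Following the resulting URL in a browser or HTTP client is what triggers the redirect to the article's registered location.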
At the same time, the authors can search a publication database (or a standard search engine) for “10.1111/T42H12” and be fairly certain that this will return only citations to their paper.
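To show why the DOI makes a reliable search term, here is a minimal Python sketch that pulls DOIs out of the two differently formatted citations above and confirms that both point at the same identifier. The regular expression is a deliberately loose assumption made for this example (the prefix "10.", a numeric registrant code, a slash, then a suffix) and is not an official DOI grammar.

```python
import re
from typing import List

# Loose illustrative pattern: "10.", 4-9 digits, "/", then a non-space suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def extract_dois(text: str) -> List[str]:
    """Return every DOI-like string found in a piece of citation text."""
    # Drop trailing punctuation that belongs to the sentence, not the DOI.
    return [match.rstrip(".,;)") for match in DOI_PATTERN.findall(text)]


citations = [
    "J Foo and J Bar, Using A/B Testing on a Seismic Dataset, doi:10.1111/T42H12",
    "Jane Foo & Jack Bar, Using A/B Testing on Seismic Data Sets, doi:10.1111/T42H12",
]

# Both citations differ in author and title formatting but share one DOI.
found = {doi for citation in citations for doi in extract_dois(citation)}
print(found)
# prints: {'10.1111/T42H12'}
```

The author names and titles differ between the two strings, yet the set of extracted identifiers has a single member, which is what lets a search on the DOI gather every citation.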
Why not use the URL itself?
In theory, the article’s URL itself would work just as well. Suppose the article is published at http://journal.com/issue/21/article/foo-bar-ab-testing.html. Printing this URL in a citation would also allow the reader to follow it to the article itself, and give the authors something to search for. The drawback of URLs is “link rot” — over time, the journal may change its name, or the structure of its article URLs. At that point, the link will no longer work.
The URL linked from a DOI can be modified, so when the URL structure changes the DOI record can be updated. The DOI itself remains unchanged, though, so existing citations continue to work.
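The indirection described here can be sketched as a toy in-memory registry in Python. The DoiRegistry class and the second URL (new-journal.example) are hypothetical and exist only for this illustration. A real DOI registry is a managed service, not a dictionary.

```python
class DoiRegistry:
    """Toy stand-in for a DOI registry: maps each DOI to one target URL."""

    def __init__(self):
        self._targets = {}

    def register(self, doi, url):
        # A DOI can be registered only once, which keeps it unique.
        if doi in self._targets:
            raise ValueError("DOI already registered: " + doi)
        self._targets[doi] = url

    def update(self, doi, url):
        # Only the target changes; the DOI itself stays the same.
        if doi not in self._targets:
            raise KeyError(doi)
        self._targets[doi] = url

    def resolve(self, doi):
        return self._targets[doi]


registry = DoiRegistry()
doi = "10.1111/T42H12"
registry.register(doi, "http://journal.com/issue/21/article/foo-bar-ab-testing.html")

# The journal restructures its site (hypothetical new address).
registry.update(doi, "http://new-journal.example/articles/21/foo-bar-ab-testing")

# A citation printed years earlier still carries the same DOI and still resolves.
print(registry.resolve(doi))
# prints: http://new-journal.example/articles/21/foo-bar-ab-testing
```

Citations store only the key, so updating the stored target repairs every existing citation at once.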
As shown above, the main features of DOIs are:
- Uniqueness: Because DOIs are registered with a central database, they are guaranteed to be unique.
- Linkability: A DOI maps to a target URL, so it can be “followed” to a data source.
- Permanence: A DOI remains valid over time (decades, at least).
- Indexed Metadata: A DOI is logged in a global registry using a standard metadata format, allowing identifiers across many sources to be searched and tracked using common tools.
DOIs at the FDSN Network level
The current goal is to have a DOI for each network (both permanent and temporary) within FDSN. To facilitate this, we plan to provide:
- An FDSN-managed DOI registry for networks that choose to use it,
- Recommendations for DOI structure and metadata, for networks that choose to manage their own DOIs, and
- Support for including DOIs (and other identifiers) in FDSN-managed standards such as StationXML.
The DOI system was designed for the fairly simple case of articles published in periodical journals. Adapting it to the more unstructured and heterogeneous world of data presents a number of challenges, including:
- The mismatch between data attribution (tracking the source of data) and data replication (obtaining the data that was actually used in a piece of work)
- The complication of determining what constitutes a discrete “object” for a given data type
- The distinction between publication and ownership as it applies to data centers
See our DOI Landing Page for more information and the current status of our DOI implementation.
Clark, A., Ahern, T.K., Trabant, C.M., Newman, R.L., Benson, R. (2013) Practical Challenges in Designing Digital Object Identifiers for Data, AGU Fall Meeting, IN13B-1563