Data Services Newsletter

Volume 21 : No 3 : Winter 2019

Evolving standards and their impact on researchers

A number of standards-related activities are underway in the International Federation of Digital Seismograph Networks (FDSN). These changes are motivated by a desire to federate data access across data centers, improve delivery through web technologies, enhance data identification, and increase flexibility.

We highlight the following activities and estimates of their impact on users of the IRIS DMC and other FDSN data centers:

FDSN data center registry

The FDSN now offers a central data center registry. This registry serves as a foundation for tools to discover and access data across data centers, including the ability for a center to declare a priority for data sets they offer. This represents a key component for tools and systems that will support data discovery and access across FDSN data centers.

Impact on users – The registry will allow tools, and therefore users, to discover and access data at multiple data centers.

Researchers may use this system to look up centers, their high-level details (contact email, website, etc.), and the services they offer, but otherwise there is no direct impact on data users.

In 2020 the DMC will transition the IRIS Federator and irisws-fedcatalog service to use the new registry in place of its own internal list of data centers. This will allow our Federator system to automatically pick up newly added data centers and to route data requests based on declared priority.

Next generation miniSEED version 3 in development

In 2016 discussions were initiated to develop a next generation miniSEED time series data format. At this point a draft specification is complete and a number of key software components have been updated or created (validator, converter and general library) at a pre-release level. The specification will be submitted to the FDSN in 2020 for evaluation and adoption.

In addition to supporting new source identifiers, miniSEED 3 will offer many advantages for data producers. It also offers some advantages for data users, such as:

  • An embedded cyclic redundancy check (CRC), to determine if data have been corrupted
  • An embedded publication version
  • Defined “mass position off scale” flag
  • Defined “recenter” headers
  • Defined “provenance URI” headers and flexible user-defined headers, for advanced usage

Impact on users – The DMC intends to make the introduction of a new miniSEED format as easy as possible for data users by enhancing IRIS software to read all versions of miniSEED agnostically. In many cases, a user would only need to update to newer releases of software they are using. This strategy is based on the recognition that a large scale transition to the new format will not happen quickly and multiple formats will exist in parallel for the foreseeable future.

Potentially the most impactful change for researchers is that the new format will utilize single string source identifiers and allow longer individual codes. It is very likely that these identifiers will be a simple combination of the traditional network, station, location, and channel codes and can usually be mapped back and forth. But the new identifier will allow for codes that are longer than the limits of the SEED 2.4 standard.

New FDSN source identifiers in development

The FDSN has approved the development of a single string, URI-like identifier that uniquely identifies a data source, as an alternative to the traditional SEED network, station, location, and channel codes. Such an identifier is a requirement of the next generation time series format and an appropriate attribute has been reserved in StationXML 1.1 as well.

Impact on users – User-created software may require changes to translate source identifiers to individual codes and/or adapt for direct use of the source identifier.

When larger individual codes are allocated or prescribed by the FDSN, software, database schemas, etc. that are hard-coded to SEED limits will need to be adapted. Also note that use of codes expanded beyond SEED limits requires the use of miniSEED 3 and StationXML formats.

Note: we anticipate the usage of larger codes will happen piecemeal and relatively slowly. The existing codes are fully supported in this scheme and will likely be in common use into the far future.

A scheme for such an identifier has been proposed and will soon enter an evaluation phase for final definition and adoption. The primary goals of this scheme are to allow trivial mapping between the source identifier and SEED network, station, location, and channel codes (whenever possible), in addition to allowing for expansion of all codes.

What is described below is the source identifier scheme proposed to the FDSN in mid-2019. While the general scheme was well received and is expected to remain, some of the details may change before final adoption.

The proposed scheme uses this URI-like pattern, which follows URN conventions:

XFDSN:Network_Station_Location_Band_Source_Position

Here “XFDSN” is a namespace, providing scope and self-identification for this scheme. The Network, Station, and Location values have the same meaning as in SEED, and Channel is separated into its constituent codes of Band, Source, and Position (aka Band, Instrument, and Orientation in SEED).

For example, for network IU, station COLA, location 00, and channel BHZ the source identifier would be:

XFDSN:IU_COLA_00_B_H_Z
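
The mapping in the example above can be sketched in a few lines of Python. This is a minimal illustration of the scheme as proposed in mid-2019, not a reference implementation: the function names and the SeedCodes type are our own, and the "XFDSN" namespace and other details may change before final adoption.

```python
from typing import NamedTuple

# Namespace from the mid-2019 proposal; it may change before adoption.
NAMESPACE = "XFDSN"


class SeedCodes(NamedTuple):
    """Traditional SEED-style codes (hypothetical helper type)."""
    network: str
    station: str
    location: str
    channel: str


def to_source_id(network: str, station: str, location: str, channel: str) -> str:
    """Build a source identifier from traditional SEED codes."""
    if len(channel) != 3:
        raise ValueError("a SEED channel code must have exactly 3 characters")
    # A SEED channel is Band + Source + Position, one character each.
    band, source, position = channel
    return f"{NAMESPACE}:{network}_{station}_{location}_{band}_{source}_{position}"


def from_source_id(source_id: str) -> SeedCodes:
    """Split a source identifier back into individual codes.

    The joined channel is a valid SEED channel only when Band, Source,
    and Position are each a single character.
    """
    namespace, _, body = source_id.partition(":")
    if namespace != NAMESPACE:
        raise ValueError(f"unexpected namespace in {source_id!r}")
    parts = body.split("_")
    if len(parts) != 6:
        raise ValueError(f"expected 6 underscore-separated codes in {source_id!r}")
    network, station, location, band, source, position = parts
    return SeedCodes(network, station, location, band + source + position)
```

An empty location code simply yields two adjacent underscores, so the round trip still works for stations without a location code.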

Retention and potential extension of codes

The proposed scheme specifies a maximum length of 8 characters for the Network, Station, and Location codes. The Band, Source, and Position codes are prescribed and are currently only a single character each, but multi-character codes could be defined. Furthermore, the scheme allows the Station and Location codes to contain ‘-’ (dash) characters.

In this scheme all existing SEED codes are supported, and, as long as no expanded codes are used, can be mapped back to individual SEED codes.
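
User software can test whether a given identifier still fits within SEED before mapping it back. The sketch below is one way to do that under stated assumptions: the 8-character maximum and single-character Band, Source, and Position codes come from the proposal described here, the SEED 2.4 limits are 2 characters for network, 5 for station, and 2 for location, and we assume that a dash in a Station or Location code counts as an expanded code. The function names are hypothetical.

```python
# Limits from the mid-2019 proposal and from SEED 2.4.
PROPOSED_MAX_LEN = 8
SEED_MAX_LEN = {"network": 2, "station": 5, "location": 2}
CHANNEL_PARTS = ("band", "source", "position")


def split_source_id(source_id: str) -> dict:
    """Split 'XFDSN:NET_STA_LOC_B_S_P' into a dict of named codes."""
    namespace, sep, body = source_id.partition(":")
    parts = body.split("_")
    if namespace != "XFDSN" or not sep or len(parts) != 6:
        raise ValueError(f"not a recognized source identifier: {source_id!r}")
    keys = ("network", "station", "location") + CHANNEL_PARTS
    return dict(zip(keys, parts))


def within_proposed_limits(source_id: str) -> bool:
    """Check the code lengths described for the proposed scheme."""
    codes = split_source_id(source_id)
    nsl_ok = all(len(codes[key]) <= PROPOSED_MAX_LEN for key in SEED_MAX_LEN)
    channel_ok = all(len(codes[key]) == 1 for key in CHANNEL_PARTS)
    return nsl_ok and channel_ok


def maps_to_seed(source_id: str) -> bool:
    """True when no expanded codes are used, so SEED codes can be recovered."""
    codes = split_source_id(source_id)
    nsl_ok = all(len(codes[key]) <= limit for key, limit in SEED_MAX_LEN.items())
    # Assumption: dashes are treated as an expansion beyond SEED.
    no_dash = "-" not in codes["station"] + codes["location"]
    channel_ok = all(len(codes[key]) == 1 for key in CHANNEL_PARTS)
    return nsl_ok and no_dash and channel_ok
```

A check like this lets software keep its existing SEED-based code path for the common case and handle expanded codes separately.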

Advantages

This single string source identifier and expansion of codes offer a number of advantages:

  • A simple, consistent way to uniquely identify a particular data source of time series.
  • The format no longer dictates the maximum length of each code. Instead maximum code length is defined by rules, which can be changed if needed.
  • Larger Network codes mean that re-use of codes for temporary deployments would not be necessary; every deployment can have a permanent, globally unique code. A suggested convention for temporary deployments is to include the start year of the experiment in the code, e.g. network SEIS2018.
  • Delimited Channel and larger Source codes mean more codes for instruments and sensors can be allocated (the current SEED scheme of single capital letters is already fully allocated). This would allow for clear identification of synthetic data, derived data, and as-yet unknown future data sources.
  • Delimited Channel and larger codes provide vastly more flexibility to create common conventions for state of health and operational measurements.
  • Larger Station and Location codes, together with the allowance for dashes, provide vastly more flexibility for sensible handling of very “large N” deployments with 100,000+ sensors.
  • Provide an FDSN-standardized, single-value identifier that is readily usable in general formats and systems. For example, for use in multi-domain data brokers, where seismic data may be mixed with many other types of data.

New web service standard for data availability

The FDSN has adopted the fdsnws-availability specification for a web service that returns data availability. This interface will provide a consistent method to determine data availability across centers. The specification is relatively new and is therefore not broadly implemented yet, but is anticipated to be a common service at FDSN centers in the future.

Read about the DMC’s implementation of fdsnws-availability in a dedicated article.

Impact on users – Researchers who used our now-deprecated availability service directly will need to switch to the DMC’s fdsnws-availability implementation.

When offered by data centers, this service will support more advanced, robust data selection and tools that work across centers.
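
For those switching over, a request to an fdsnws-availability service is an ordinary HTTP GET with query parameters. The sketch below only builds the request URL and does not contact any server. The base URL is our assumption of where the DMC’s implementation is hosted, and the parameter names follow our reading of the fdsnws-availability specification; please check both against the service documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed location of the DMC service; other centers will use their own host.
BASE_URL = "https://service.iris.edu/fdsnws/availability/1"


def availability_url(network, station, location, channel, start, end,
                     method="query", base_url=BASE_URL):
    """Build an fdsnws-availability request URL (no request is sent).

    method: "query" for individual time spans, "extent" for the overall
    earliest-to-latest range of each channel.
    """
    if method not in ("query", "extent"):
        raise ValueError("method must be 'query' or 'extent'")
    params = {
        "network": network,
        "station": station,
        "location": location,
        "channel": channel,
        "starttime": start,
        "endtime": end,
        "format": "text",
    }
    # urlencode percent-escapes reserved characters such as ':' in times.
    return f"{base_url}/{method}?{urlencode(params)}"


url = availability_url("IU", "COLA", "00", "BHZ",
                       "2019-01-01T00:00:00", "2019-02-01T00:00:00")
print(url)
```

Because the interface is the same at every center that implements it, only the base URL should need to change when querying a different FDSN data center.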

StationXML specification updated to version 1.1

The DMC has been delivering metadata in StationXML 1.0 since 2013. Minor evolutionary changes to the specification were adopted in 2019 and version 1.1 was released.

Read about the DMC’s migration to StationXML 1.1 in a dedicated article.

Impact on users – No significant impact is expected for data users. Current metadata access tools should work as before, with some perhaps showing a warning about versioning until they are updated.

Changes that are most likely to impact users longer term:

  • A DOI (or other persistent identifier) may be included in the metadata, most importantly for network data citation.
  • A source identifier (“sourceID”) may be included in the metadata. This is an alternative to the traditional network, station, location, and channel codes.
  • A “WaterLevel” may be included in the metadata. This is useful for ocean bottom seismometer (OBS) operators to record the elevation of the water surface.
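
To give a sense of how user software might pick up these optional fields, here is a minimal sketch using Python’s standard library. The XML fragment is a made-up, incomplete document: the network, station, DOI, and water level values are placeholders, and the element and attribute names (Identifier, WaterLevel, sourceID) follow our reading of the 1.1 schema, so please verify them against the schema itself.

```python
import xml.etree.ElementTree as ET

NS = {"sx": "http://www.fdsn.org/xml/station/1"}

# Hypothetical, non-schema-complete fragment; all values are placeholders.
SAMPLE = """
<FDSNStationXML xmlns="http://www.fdsn.org/xml/station/1" schemaVersion="1.1">
  <Network code="XX">
    <Identifier type="DOI">10.0000/EXAMPLE</Identifier>
    <Station code="OBS01">
      <WaterLevel>0.0</WaterLevel>
      <Channel code="BHZ" locationCode="00"
               sourceID="XFDSN:XX_OBS01_00_B_H_Z"/>
    </Station>
  </Network>
</FDSNStationXML>
"""


def optional_fields(xml_text):
    """Return the optional 1.1 fields if present, otherwise None for each."""
    root = ET.fromstring(xml_text.strip())
    doi = root.find(".//sx:Identifier[@type='DOI']", NS)
    water = root.find(".//sx:Station/sx:WaterLevel", NS)
    channel = root.find(".//sx:Channel", NS)
    return {
        "doi": doi.text if doi is not None else None,
        "water_level": float(water.text) if water is not None else None,
        "source_id": channel.get("sourceID") if channel is not None else None,
    }


fields = optional_fields(SAMPLE)
print(fields)
```

All three fields are optional, so software should treat their absence as normal, as the None defaults above do.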

All changes from StationXML 1.0 to 1.1 are listed here:
https://www.fdsn.org/xml/station/fdsn-station-changes-1.0-to-1.1.txt

by Chad Trabant (IRIS DMC)
