Evolving standards and their impact on researchers
A number of standards-related activities are underway in the International Federation of Digital Seismograph Networks (FDSN). These changes are motivated by a desire to federate data access across data centers, improve delivery through web technologies, enhance data identification, and increase flexibility.
We highlight the following activities and estimates of their impact on users of the IRIS DMC and other FDSN data centers:
- FDSN data center registry
- Next generation miniSEED version 3 in development
- New FDSN source identifiers in development
- New web service standard for data availability
- StationXML specification updated to version 1.1
FDSN data center registry
The FDSN now offers a central data center registry. This registry serves as a foundation for tools to discover and access data across data centers, including the ability for a center to declare a priority for data sets they offer. This represents a key component for tools and systems that will support data discovery and access across FDSN data centers.
Researchers may use this system to look up centers, their high level details (contact email, website, etc.) and services they offer, but otherwise there is no direct impact on data users.
In 2020 the DMC will transition the IRIS Federator and irisws-fedcatalog service to use the new registry in place of its own internal list of data centers. This will allow our Federator system to automatically pick up newly added data centers and to route data requests based on declared priority.
Next generation miniSEED version 3 in development
In 2016 discussions were initiated to develop a next generation miniSEED time series data format. At this point a draft specification is complete and a number of key software components have been updated or created (validator, converter and general library) at a pre-release level. The specification will be submitted to the FDSN in 2020 for evaluation and adoption.
In addition to supporting new source identifiers, miniSEED 3 will offer many advantages for data producers. It also offers some advantages for data users, such as:
- An embedded cyclic redundancy check (CRC), to determine if data have been corrupted
- An embedded publication version
- Defined “mass position off scale” flag
- Defined “recenter” headers
- Defined “provenance URI” headers and flexible user-defined headers, for advanced usage
Potentially the most impactful change for researchers is that the new format will utilize single string source identifiers and allow longer individual codes. It is very likely that these identifiers will be a simple combination of the traditional network, station, location, and channel codes and can usually be mapped back and forth. But the new identifier will allow for codes that are longer than the limits of the SEED 2.4 standard.
New FDSN source identifiers in development
The FDSN has approved the development of a single string, URI-like identifier that uniquely identifies a data source, as an alternative to the traditional SEED network, station, location, and channel codes. Such an identifier is a requirement of the next generation time series format and an appropriate attribute has been reserved in StationXML 1.1 as well.
When larger individual codes are allocated or prescribed by the FDSN, software, database schemas, etc. that are hard-coded to SEED limits will need to be adapted. Also note that use of codes expanded beyond SEED limits requires the use of miniSEED 3 and StationXML formats.
Note: we anticipate the usage of larger codes will happen piecemeal and relatively slowly. The existing codes are fully supported in this scheme and will likely be in common use into the far future.
A scheme for such an identifier has been proposed and will soon enter an evaluation phase for final definition and adoption. The primary goals of this scheme are to allow trivial mapping between the source identifier and SEED network, station, location, and channel codes (whenever possible), in addition to allowing for expansion of all codes.
The proposed scheme uses the following URI-like pattern, which adheres to URN conventions:
XFDSN:Network_Station_Location_Band_Source_Position
Here “XFDSN” is a namespace that provides scope and self-identification for this scheme. The Network, Station, and Location values have the same meaning as in SEED, and Channel is separated into its constituent codes of Band, Source, and Position (aka Band, Instrument, and Orientation in SEED).
For example, for network IU, station COLA, location 00, and channel BHZ the source identifier would be:
XFDSN:IU_COLA_00_B_H_Z
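As a minimal sketch of how this mapping could be done in software, the Python function below builds an identifier in the proposed pattern from traditional SEED codes. The function name is hypothetical, and the handling of blank location codes (emitted as an empty field) is an assumption, since the scheme is still a proposal and may change before adoption.

```python
def seed_to_source_id(network: str, station: str, location: str, channel: str) -> str:
    """Build an identifier in the proposed XFDSN pattern from SEED codes."""
    if len(channel) != 3:
        raise ValueError("expected a 3-character SEED channel code")
    # Split the SEED channel into its Band, Source, and Position codes.
    band, source, position = channel
    # Assumption: a blank or space-padded SEED location becomes an empty field.
    location = location.strip()
    return f"XFDSN:{network}_{station}_{location}_{band}_{source}_{position}"


print(seed_to_source_id("IU", "COLA", "00", "BHZ"))  # XFDSN:IU_COLA_00_B_H_Z
```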
Retention and potential extension of codes
The proposed scheme specifies a maximum length of 8 characters for each of the Network, Station, and Location codes. The Band, Source, and Position codes are prescribed and are currently only a single character each, but multi-character codes could be defined. Furthermore, the scheme allows the Station and Location codes to contain ‘-’ (dash) characters.
In this scheme all existing SEED codes are supported, and, as long as no expanded codes are used, can be mapped back to individual SEED codes.
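The reverse mapping can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the function name is hypothetical, the SEED 2.4 length limits used are 2 characters for network, 5 for station, and 2 for location, and only code lengths are checked (character-set rules, such as dashes, are not).

```python
# SEED 2.4 maximum lengths for the codes that the proposed scheme may expand.
SEED_LIMITS = {"network": 2, "station": 5, "location": 2}


def source_id_to_seed(source_id: str):
    """Return (network, station, location, channel), or None when any code
    is too long to be represented as a SEED code."""
    namespace, _, body = source_id.partition(":")
    if namespace != "XFDSN":
        raise ValueError(f"unexpected namespace: {namespace!r}")
    parts = body.split("_")
    if len(parts) != 6:
        raise ValueError("expected 6 underscore-separated codes")
    network, station, location, band, source, position = parts
    codes = {"network": network, "station": station, "location": location}
    fits = all(len(codes[name]) <= limit for name, limit in SEED_LIMITS.items())
    # A SEED channel needs exactly one character for each of its three parts.
    fits = fits and all(len(code) == 1 for code in (band, source, position))
    if not fits:
        return None
    return network, station, location, band + source + position


print(source_id_to_seed("XFDSN:IU_COLA_00_B_H_Z"))        # ('IU', 'COLA', '00', 'BHZ')
print(source_id_to_seed("XFDSN:SEIS2018_COLA_00_B_H_Z"))  # None
```

Returning None for expanded codes, rather than truncating them, avoids silently producing SEED codes that would collide with other data sources.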
Advantages
This single string source identifier and expansion of codes offer a number of advantages:
- A simple, consistent way to uniquely identify a particular data source of time series.
- The format no longer dictates the maximum length of each code. Instead maximum code length is defined by rules, which can be changed if needed.
- Larger Network codes mean that re-use of codes for temporary deployments would not be necessary; every deployment can have a permanent, globally unique code. A suggested convention for temporary deployments is to include the start year of the experiment in the code, e.g. network SEIS2018.
- Delimited Channel and larger Source codes mean more codes for instruments and sensors can be allocated (the current SEED scheme of capital letters is already allocated). This would allow for clear identification of synthetic data, derived data, and as-yet unknown future data sources.
- Delimited Channel and larger codes provide vastly more flexibility to create common conventions for state of health and operational measurements.
- Larger Station and Location codes, together with allowing dashes, provide vastly more flexibility for sensible handling of very “large N” deployments with 100,000+ sensors.
- Provide an FDSN-standardized, single-value identifier that is readily usable in general formats and systems. For example, for use in multi-domain data brokers, where seismic data may be mixed with many other types of data.
New web service standard for data availability
The FDSN has adopted the fdsnws-availability specification for a web service that returns data availability. This interface will provide a consistent method to determine data availability across centers. The specification is relatively new and is therefore not broadly implemented yet, but is anticipated to be a common service at FDSN centers in the future.
Read about the DMC’s implementation of fdsnws-availability in a dedicated article.
When offered by data centers, this service will support more advanced, robust data selection and tools that work across centers.
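As a rough illustration of what a request to such a service might look like, the sketch below only constructs a query URL and does not contact any server. The base URL, the helper name, and the choice of the short parameter names (net, sta, loc, cha, starttime, endtime, format) are assumptions here; consult the fdsnws-availability specification and each data center's documentation for the parameters actually supported.

```python
from urllib.parse import urlencode


def availability_url(base, net, sta, loc, cha, start, end, method="query"):
    """Build an fdsnws-availability request URL (hypothetical helper)."""
    params = {
        "net": net,
        "sta": sta,
        "loc": loc,
        "cha": cha,
        "starttime": start,
        "endtime": end,
        "format": "text",
    }
    # urlencode percent-encodes reserved characters such as ":" in the times.
    return f"{base.rstrip('/')}/fdsnws/availability/1/{method}?{urlencode(params)}"


# Assumed base URL for illustration; substitute the data center of interest.
url = availability_url(
    "https://service.iris.edu", "IU", "COLA", "00", "BHZ",
    "2020-01-01T00:00:00", "2020-01-02T00:00:00",
)
print(url)
```

Because the interface is standardized, the same helper could be pointed at any center offering the service by changing only the base URL.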
StationXML specification updated to version 1.1
The DMC has been delivering metadata in StationXML 1.0 since 2013. Minor evolutionary changes to the specification were adopted in 2019 and version 1.1 was released.
Read about the DMC’s migration to StationXML 1.1 in a dedicated article.
Changes that are most likely to impact users in the longer term:
- A DOI (or other persistent identifier) may be included in the metadata, most importantly for network data citation.
- A source identifier (“sourceID”) may be included in the metadata. This is an alternative to the traditional network, station, location, and channel codes.
- A “WaterLevel” may be included in the metadata. This is useful for OBS operators to record the elevation of the water surface.
All changes from StationXML 1.0 to 1.1 are listed here:
https://www.fdsn.org/xml/station/fdsn-station-changes-1.0-to-1.1.txt
by Chad Trabant (IRIS DMC)