Data Flow Through the DMC
Initially the IRIS DMC was viewed by some as the “Data Center for the IRIS GSN”. Early in its history this was close to being true. With the exception of a few PASSCAL data sets in assorted formats and the very valuable contribution of the GDSN network data in SEED format made by the USGS, the IRIS DMC was basically a GSN data center. The purpose of this article is to familiarize the community with the diverse holdings that the DMC now allows you to tap into.
The above figure shows the volume of data flowing into the IRIS DMC from various sources. The central circles characterize the data by five fundamental types:
- IRIS GSN,
- IRIS PASSCAL,
- FDSN data,
- data from US Regional Networks, and
- data from IRIS JSP networks in the former Soviet Union.
The figure shows the amount of data that has reached the DMC from all sources, averaged over the past five years. The numbers on the input side show the combined impact on the mass storage systems at the DMC and as such are doubled to reflect the fact that the DMC stores data in both time-sorted and station-sorted order.
Data come to the DMC from 37 different permanent networks as well as 56 broadband PASSCAL experiments and 81 PASSCAL assembled data sets. With the exception of assembled data sets, all data flow to the DMC in SEED format. The 81 PASSCAL assembled data sets, primarily in SEG-Y format, come from a variety of active source or multi-channel experiments. The SEED data are all available using identical request mechanisms and are completely documented with appropriate header information. The data we receive come from five different data sources:
- GSN data have reached us at an average rate of 856 gigabytes per year from the two GSN DCCs, the Albuquerque Seismological Laboratory and the IRIS/IDA DCC at UCSD. Data from the H2O station have recently started flowing from the University of Hawaii, but this source is only now becoming available.
- Data from the PASSCAL program of IRIS come to us either directly from the PIs or from the PASSCAL Instrument Center in Socorro. An average of 792 gigabytes per year has reached us from PASSCAL over the past five years. The total amount of data reaching the IRIS DMC from the IRIS GSN and the IRIS PASSCAL programs has averaged about 1.6 terabytes per year for the past five years.
- A large variety of regional seismic networks within the United States now contribute their data to the IRIS DMC. These include the USNSN, the ANZA Network, TERRAscope, the Lamont Network, the Pacific Northwest Seismic Network, and the University of Utah networks. We anticipate receiving data from Nevada and Alaska within the next year as well. During the past five years, regional networks contributed data to the IRIS DMC at a rate of 267 gigabytes per year.
- Data from the Federation of Digital Seismographic Networks (FDSN) represent a valuable contribution to the DMC from other countries including Canada, the Czech Republic, France, Germany, Italy, the Netherlands, Japan and Taiwan. FDSN member countries Russia and China contribute their data via the IRIS GSN. On average, FDSN countries have contributed data at a rate of 187 gigabytes per year. FDSN data reach us with a variety of latencies, ranging from 1-2 weeks in the case of Canada (the network that sends data faster than any other network, including the GSN) up to several years for some networks.
- The JSP networks in the former Soviet Union continue to contribute data to the DMC. The total has averaged nearly 170 gigabytes per year and comes from either Lamont or UCSD.
The total amount of data flowing into the IRIS DMC over the past five years has averaged just over 2.2 terabytes per year. During 1999 this rate increased to nearly 4 terabytes per year, so data flow is growing. As mentioned earlier, the 2.2 terabyte figure includes the doubling for the dual sort order within the DMC mass storage system. In terms of raw bytes the figure is 1.1 terabytes/year. Over the same period (1995-1999) the average amount of data sent to researchers from the DMC was 734 gigabytes/year, so the IRIS DMC has been sending out nearly 2/3 as much data as data sources are sending to the DMC. The IRIS DMC is an extremely active archive, with data output a significant fraction of the data input. The total volume of data available in the IRIS DMC is 13 terabytes, as shown in figure 2.
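The arithmetic behind these totals can be checked with a short sketch. The per-source rates and the output rate are the rounded values quoted in this article; the names `INFLOW_GB_PER_YEAR`, `OUTFLOW_GB_PER_YEAR` and `summarize` are hypothetical and exist only for this illustration.

```python
# Average inflow per source in gigabytes/year, as quoted in the article.
# These are the "stored" figures, already doubled for the dual sort order.
INFLOW_GB_PER_YEAR = {
    "GSN": 856,
    "PASSCAL": 792,
    "US regional": 267,
    "FDSN": 187,
    "JSP": 170,
}

# Average volume shipped to researchers per year, 1995-1999, as quoted.
OUTFLOW_GB_PER_YEAR = 734


def summarize(inflow, outflow):
    """Return stored total, raw total, and the output/raw-input ratio."""
    stored = sum(inflow.values())   # includes both sort orders
    raw = stored / 2                # undo the doubling to get raw bytes
    return {
        "stored_gb": stored,
        "raw_gb": raw,
        "out_over_raw": outflow / raw,
    }


summary = summarize(INFLOW_GB_PER_YEAR, OUTFLOW_GB_PER_YEAR)
print(f"stored: {summary['stored_gb']} GB/yr")
print(f"raw:    {summary['raw_gb']:.0f} GB/yr")
print(f"output / raw input: {summary['out_over_raw']:.2f}")
```

With the rounded inputs, the stored total comes to 2272 gigabytes per year and the raw total to half of that, which matches the "just over 2.2 terabytes" and "1.1 terabytes" figures above. The output ratio lands a little under two thirds.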
Although the GSN remains the largest contributor to the IRIS DMC, it represents only 37% of the total; PASSCAL represents 35%, regional networks 12%, FDSN 8%, and JSP 7%. The IRIS DMC should no longer be viewed as the GSN Data Center, as this article well attests. The IRIS DMC provides an infrastructure that collects data from a very large number (93) of different broadband networks or experiments, archives the data in a homogeneous data archive, and distributes data from diverse networks in a manner that hides the differences between the networks from the researcher.
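As a rough cross-check, the percentage shares can be recomputed from the per-source rates quoted earlier in this article. This is only a sketch: the inputs are rounded (the JSP figure is given as "nearly 170"), so the recomputed shares may differ from the quoted ones by up to about one percentage point. The names `RATES_GB_PER_YEAR`, `QUOTED_PERCENT` and `shares` are hypothetical.

```python
# Per-source inflow in gigabytes/year, as quoted in the article (rounded).
RATES_GB_PER_YEAR = {
    "GSN": 856,
    "PASSCAL": 792,
    "US regional": 267,
    "FDSN": 187,
    "JSP": 170,
}

# Shares stated in the article, in percent.
QUOTED_PERCENT = {
    "GSN": 37,
    "PASSCAL": 35,
    "US regional": 12,
    "FDSN": 8,
    "JSP": 7,
}


def shares(rates):
    """Return each source's share of the total inflow, in percent."""
    total = sum(rates.values())
    return {name: 100.0 * gb / total for name, gb in rates.items()}


computed = shares(RATES_GB_PER_YEAR)
for name, pct in computed.items():
    print(f"{name:12s} computed {pct:5.1f}%   quoted {QUOTED_PERCENT[name]}%")
```

The 93 networks or experiments mentioned above is the sum of the 37 permanent networks and the 56 broadband PASSCAL experiments listed earlier.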
by Tim Ahern (IRIS Data Management Center)