Servicing Data Requests at the DMC
A View From the Engine Room
By now, it is probably well known that the IRIS Data Management Center manages data from approximately 60 different network operators. About 10 Gigabytes are ingested into our holdings daily. To date, we have nearly 12 Terabytes stored either in our front-end RAID system, which is online, or in our 50 Terabyte near-line mass store. This near-line system is tape-based and allows us to efficiently recall data from our dual-sort archive, based on how a user requests data. We store all data in either a time-sorted filesystem or a station-sorted filesystem. The advantage of this dual storage is that we can efficiently either recall entire continuous days of data from one station or assemble all data from all networks for a given time into a single SEED volume for distribution.
Many of you reading this article have requested data and are quite familiar with how to format a request. What I would like to point out is that both the way we receive data and the way you can access these data are changing.
The IRIS DMC processes many “customized” requests each day. These requests are considered customized because, through a user-defined request format (BREQ_FAST), the data returned closely match what the user needs. We have a web interface to our database so users can find out exactly what data we have in our holdings. This interface is called SeismiQuery (see the Data Access article in this issue for details). We highly recommend that requesters use this interface before submitting a request, because it makes the request easier to process.
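To make the idea of a customized request concrete, here is a minimal Python sketch that assembles the text of a BREQ_FAST request. It assumes the commonly documented layout: header lines beginning with a period, a closing .END line, then one line per station giving station, network, start time, end time, channel count, and channel names. The function names, the particular header lines chosen, and the sample values are illustrative assumptions only; check the BREQ_FAST documentation for the full set of required header lines.

```python
from datetime import datetime


def breq_fast_line(station, network, start, end, channels):
    """Format one request line: station, network, start, end, channel count, channels."""
    def stamp(t):
        # "YYYY MM DD HH MM SS.T", with tenths of a second fixed at zero in this sketch
        return t.strftime("%Y %m %d %H %M %S") + ".0"

    fields = [station, network, stamp(start), stamp(end), str(len(channels)), *channels]
    return " ".join(fields)


def breq_fast_request(name, inst, email, media, label, lines):
    """Join an (assumed, partial) header block and the request lines into one text body."""
    header = [
        f".NAME {name}",
        f".INST {inst}",
        f".EMAIL {email}",
        f".MEDIA {media}",
        f".LABEL {label}",
        ".END",
    ]
    return "\n".join(header + list(lines)) + "\n"


if __name__ == "__main__":
    # All values below are placeholders, not a real requester or a recommended request.
    line = breq_fast_line(
        "ANMO", "IU",
        datetime(1999, 1, 1, 0, 0, 0),
        datetime(1999, 1, 1, 1, 0, 0),
        ["BHZ", "BHN"],
    )
    print(breq_fast_request("A. Researcher", "Example University",
                            "researcher@example.edu", "FTP", "test_request", [line]))
```

Keeping each request line this narrow (one station, a short time window, only the channels needed) is what makes the returned volume a close fit.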
Currently, the best utility for generating a well-constructed BREQ_FAST request is WEED, an X Window-based application that can be downloaded from ftp.iris.washington.edu/pub/programs (see the WEED manual for details). WEED allows a researcher to explicitly define the parameters used in time-windowing data, and it includes the tau-p software, which calculates predicted travel times and thereby minimizes the pre-event or post-event data returned to the user. It is a major benefit for users to request data subsets that are as small as possible, because we have limitations that prohibit the generation of very large SEED volumes (such as the usual UNIX limitation that files cannot be larger than 2 Gigabytes). If we get very large requests, we may ask the author of the request to resubmit smaller requests, or we may split the request somewhat arbitrarily.
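The two ideas in the paragraph above, windowing around a predicted arrival and keeping each volume under a size limit, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how WEED or the DMC software actually works: the travel time is supplied by the caller rather than computed with tau-p, the pre- and post-arrival margins are arbitrary, and the size estimates and function names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed ceiling for one volume, mirroring the 2 Gigabyte file limit mentioned above.
MAX_VOLUME_BYTES = 2 * 1024**3


def arrival_window(origin, travel_time_s, pre_s=60.0, post_s=300.0):
    """Return (start, end) of a window around the predicted arrival time.

    travel_time_s would come from a travel-time calculator such as tau-p;
    here it is simply passed in by the caller.
    """
    arrival = origin + timedelta(seconds=travel_time_s)
    return arrival - timedelta(seconds=pre_s), arrival + timedelta(seconds=post_s)


def split_by_size(items, limit=MAX_VOLUME_BYTES):
    """Group (label, estimated_bytes) pairs, in order, into batches that fit the limit."""
    batches, current, used = [], [], 0
    for label, size in items:
        if size > limit:
            raise ValueError(f"{label} alone exceeds the volume limit")
        if current and used + size > limit:
            # Adding this item would overflow the batch, so close it and start another.
            batches.append(current)
            current, used = [], 0
        current.append(label)
        used += size
    if current:
        batches.append(current)
    return batches
```

A requester who windows each station this way and submits the resulting batches separately avoids having a large request split arbitrarily on our end.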
We have controls built into our processing routines so that, no matter how many requests we receive at one time, we can process each one efficiently. The system we currently use is, for the most part, first-come, first-served, but we also take the size of the requested data volume into consideration. Smaller requests are processed more quickly and are fully automated. We process over 90% of the customized requests in a fully automated way, from receipt of the request to the e-mail advising the user that the SEED file is ready to download. If the SEED file is very large, we write the volume to the requested tape medium and mail it to the user.
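One simple way to picture "first-come, first-served, with size taken into consideration" is a priority queue keyed on a size class and then on arrival order. The Python sketch below is only an illustration of that idea; the class name, the two size classes, and the 100 Megabyte threshold are assumptions and do not describe the DMC's actual processing routines.

```python
import heapq
from itertools import count


class RequestQueue:
    """Serve small requests first; within a size class, serve in arrival order."""

    def __init__(self, small_limit_bytes=100 * 1024**2):
        self._heap = []
        self._arrival = count()  # monotonically increasing arrival number
        self._small_limit = small_limit_bytes  # hypothetical cutoff for "small"

    def submit(self, request_id, estimated_bytes):
        # Size class 0 (small) sorts ahead of class 1 (large); ties fall back to arrival.
        size_class = 0 if estimated_bytes <= self._small_limit else 1
        heapq.heappush(self._heap, (size_class, next(self._arrival), request_id))

    def next_request(self):
        """Return the id of the next request to process, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

A real scheduler would also need to keep large requests from waiting indefinitely behind a steady stream of small ones; this sketch does not address that.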
There has been a lot of effort over the last year to bring data to the DMC closer to real time. We now have a Frame Relay circuit installed at both of the IRIS Data Collection Centers that contribute the GSN data to the DMC. These data are forwarded to the DMC and are archived automatically. Because data can now be brought into the DMC more efficiently, we will soon be able to generate event-oriented SEED volumes in the FARM holdings that are much closer to real time and can be updated automatically when new data arrive. There will be more on this subject as we progress with the implementation, but users should be aware that 12 different networks currently provide data to the DMC via either ftp or Frame Relay circuits. We believe that the efficiency of the disk transfers should be very helpful in acquiring data for users in a more timely manner.
The Operations group at the DMC consists of 4 employees:
- Rick Benson – Director of Operations
- Anh Ngo – Senior Data Technician (highlighted in this issue)
- Stacy Fournier – Data Technician
- Mary Edmunds – Data Technician
If you have any questions about operations or making a request, please feel free to write to Rick Benson.
by Rick Benson (IRIS Data Management Center)