Data Services Newsletter

Volume 2 : No 2 : June 2000

The IRIS Data Management Center, Seattle

The Location

The IRIS Data Management Center is located at the University of Washington in Seattle. We lease space just off campus, in the building next to the UofW Department of Computing and Communications (C&C). Our large mass storage systems and some of our computational servers are housed in the C&C's office space, and our office is connected to that location by several very high-speed fiber-optic circuits.

The Staff

The DMC is staffed by twelve IRIS employees. We work closely with individuals at the University of Washington Geophysics Program. The UofW provides invaluable expertise in the development and maintenance of the SPYDER® system and in the development of data request tools such as WEED and BREQ_FAST. Additionally, the UofW works closely with us in developing products of general interest to the IRIS community.

Name              Title                              Email (@washington.edu)
Tim Ahern         IRIS DMS Program Manager           tim
Deborah Barnes    Information Services Coordinator   debbie
Leanne Rubin      Office Manager                     leanne
Rick Braman       Systems Administrator              braman

Operations Group
Rick Benson       Director of Operations             rick
Mary Edmunds      Data Control Technician            mary
Stacy Fournier    Data Control Technician            stacy
Anh Ngo           Data Control Technician            anh

Software Group
Rob Casey         Software Engineer                  rob
Chris Laughbon    Software Engineer                  chris
Sue Schoch        Software Engineer                  sue
Sandy Stromme     Software Engineer                  sandy

There is a great deal of interaction among the various people at the DMC. The office is managed by Leanne Rubin, whose job it is to ensure that the business end of the DMC runs smoothly. Many of you who have had questions about data have talked with Debbie Barnes. Debbie's primary job is that of Webmaster, and she oversees the more than 2,000 pages IRIS has on its website. Rick Braman is the UNIX Systems Administrator; his job is to administer the roughly 40 SUN Microsystems computers that range from small desktop systems to large Enterprise-class servers. Rick also manages the various large mass storage systems that form the heart of the DMC.

The operations group is managed by Rick Benson. This group is charged with archiving data from the various networks that send data to the IRIS DMC (see the companion article in this Newsletter). Additionally, this group is responsible for servicing user requests for data. In general, data are archived and requests are serviced on the same day they are received at the DMC. With more than 4 terabytes arriving at the DMC each year, and almost 1 terabyte of data being shipped to users, the operations group is always busy.

The software group consists of four people whose expertise ranges from database management systems to the development and maintenance of the more than 400 applications that run behind the scenes to archive data and service user requests. This group also develops several applications that IRIS distributes to the community, such as rdseed, verseed, WEED, and evalresp. Additionally, major data center systems have been developed and are maintained by this group, including the Portable Data Collection Center (PDCC) system and the NetDC Distributed Data Center System.

The DMC System

The DMC’s central task is to archive waveform data and maintain the various station metadata tables that allow one to translate the bits in the waveforms into true ground motion. To do this the DMC has a very large hardware infrastructure, acquired primarily through funding from the National Science Foundation with augmentation provided by the Keck Foundation, Storage Technology, and SUN Microsystems. Our system is almost entirely SUN-based. Figure 1 shows the main components of the infrastructure at the DMC.

Data are received either on physical tape or, in several cases, electronically, through the Internet or through dedicated Frame Relay circuits to San Diego, AFTAC, or ASL. Data are archived on an Enterprise 4500, and the archiving process normally operates on miniSEED data only. All waveform data pass through processes that encapsulate files so that all data from a given station and day are placed into one file. The various start/stop times of the seismograms are determined, and this information is placed into the appropriate Oracle database tables.

The waveforms themselves are first moved onto a 1.5 terabyte RAID disk system for a few months, or until data completeness is assured. The data are then migrated to a large (50 terabyte) StorageTek tape-based mass storage system for long-term archiving. Data are placed in two sort orders: by time, allowing the DMC to efficiently process requests based upon events, and by station, allowing optimal processing of requests for a few stations over long time periods. For redundancy, a copy of the time-sorted archive also resides in the StorageTek system. A third copy of the time-sorted data is archived on a much smaller (3.5 terabyte) StorageTek Timberwolf mass storage system on DLT media. Tapes from the DLT copy are routinely removed from the mass storage system and sent to UNAVCO; this off-site storage of the entire IRIS DMC archive is done to ensure the integrity of the archive in case of a catastrophe such as an earthquake or fire. The total volume of the four copies of data is more than 26 terabytes (26,000,000,000,000 bytes).
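
To make the station/day encapsulation step concrete, here is a minimal sketch in Python. It is illustrative only, not the DMC's actual code: the record layout, the station names, and the notion of a "timespan" row are assumptions standing in for the real miniSEED parsing and the Oracle tables.

```python
from collections import defaultdict
from datetime import datetime

def encapsulate_by_station_day(records):
    """Group incoming records into one container per station per UTC day, and
    collect the per-seismogram start/stop times that would be written to the
    database's timespan tables. Each record is assumed to be a tuple of
    (network, station, channel, start, end, payload)."""
    files = defaultdict(list)   # (station, date) -> payloads for that file
    timespans = []              # rows destined for the waveform-index tables
    for net, sta, chan, start, end, payload in records:
        files[(sta, start.date())].append(payload)
        timespans.append((net, sta, chan, start, end))
    return files, timespans

# Hypothetical example: two channels from one station on the same day end up
# in a single station/day file, with two timespan rows recorded.
recs = [
    ("IU", "ANMO", "BHZ", datetime(2000, 6, 1, 0, 0), datetime(2000, 6, 1, 1, 0), b"..."),
    ("IU", "ANMO", "BHN", datetime(2000, 6, 1, 0, 0), datetime(2000, 6, 1, 1, 0), b"..."),
]
files, spans = encapsulate_by_station_day(recs)
assert len(files) == 1 and len(spans) == 2
```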

The metadata (i.e., descriptive information) are those data that are, for the most part, contained in the SEED headers. As needed, routines at the DMC process updated SEED header volumes and refresh the corresponding information in the Oracle Database Management System, which runs on a SUN Enterprise 4000 server.
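
As a rough illustration of this kind of "as needed" metadata update, the following sketch uses Python's built-in sqlite3 module as a stand-in for the Oracle system. The table layout, column names, and station values are hypothetical; the point is only the pattern of replacing a channel's row when an updated SEED header volume arrives.

```python
import sqlite3  # stand-in for the DMC's Oracle database in this sketch

# Hypothetical metadata extracted from an updated SEED header volume.
station = {
    "network": "IU", "station": "ANMO", "channel": "BHZ",
    "latitude": 34.946, "longitude": -106.457, "sample_rate": 20.0,
}

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE IF NOT EXISTS channel_meta (
    network TEXT, station TEXT, channel TEXT,
    latitude REAL, longitude REAL, sample_rate REAL,
    PRIMARY KEY (network, station, channel))""")

# "As needed" update: replace the channel's row with the newer header values.
conn.execute("""INSERT OR REPLACE INTO channel_meta
    VALUES (:network, :station, :channel, :latitude, :longitude, :sample_rate)""",
    station)
conn.commit()
```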

The Oracle database has also been populated with earthquake hypocenter information. We routinely process information from the NEIC via finger, the QED, and the Weekly and Monthly Hypocenter files, and we have recently begun to include information from the ISC in the hypocenter portion of the database. SeismiQuery was modified so that event information can be queried, mapped, or output in the WEED event format.
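
The kind of event selection this enables might look like the following sketch. The SQL is hypothetical: the table and column names are assumptions, not the DMC's actual Oracle schema.

```python
# A SeismiQuery-style selection over the hypocenter tables, choosing events
# by magnitude and time window before mapping or WEED-format output.
query = """
SELECT origin_time, latitude, longitude, depth_km, magnitude, catalog_source
FROM   event_hypocenters
WHERE  magnitude >= :min_mag
  AND  origin_time BETWEEN :t_start AND :t_end
ORDER  BY origin_time
"""
params = {"min_mag": 5.5, "t_start": "2000-01-01", "t_end": "2000-06-01"}
```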

Users interact with the DMC primarily through a SUN Enterprise 4000 server. This server acts as the ftp server, the WWW server, and the file server for FARM products, as well as the server for some user request tools such as WILBER and XRETRIEVE. Several years ago the DMC moved to Enterprise-class servers; in this manner we can incrementally increase the compute capacity of DMC machines as needed, simply by adding processors or memory. This strategy has worked very well for us, and we anticipate being able to keep up with the demands of archiving several terabytes of new data per year while delivering nearly 1 terabyte of seismic data to the research community annually.

The DMC has three main data pools from which users can draw data.

  1. The Archive: The 50 terabyte Wolfcreek mass storage system and the 1.5 terabyte RAID system comprise the “archive”. Most of these data are not on-line but are termed “near-line”, since a robotic system can automatically mount tapes and recover files anywhere within the 50 terabyte system. Access to the archive is through a variety of User Access Tools (see the companion article in a previous Newsletter).
  2. The FARM: The FARM is a series of data volumes in SEED format. They are generated for all events larger than Mw 5.7 (or Mw 5.5 if the depth is greater than 100 km); a small sketch of this selection rule follows this list. FARM products contain not only data volumes in SEED format but also a collection of GIF images and summary files that provide useful information related to the specific FARM product. Presently they contain only IRIS GSN data. In the future we plan to generate FARM volumes for all networks for which the DMC holds data, which will greatly expand data availability. We anticipate a very different method for building and managing FARM data volumes, but that will be a topic for a future Electronic Newsletter. We anticipate that nearly one half of our future data shipments will come from the FARM. FARM data volumes are stored on-line in a RAID system attached directly to the main DMC user interface computer.
  3. SPYDER®: SPYDER® is similar to the FARM, except that it contains non-quality-controlled data. In principle, as the FARM products appear, the corresponding data in the SPYDER® products will be made inaccessible. Ultimately, the only data remaining in SPYDER® volumes will be those data that did not make it through the normal archiving process, whether because of a lost tape or for some other reason.
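
The FARM selection rule mentioned in item 2 can be written as a one-line test. This is a sketch of the rule as stated above; treating the magnitude thresholds as inclusive is an assumption.

```python
def qualifies_for_farm(mw, depth_km):
    """FARM selection rule: events of Mw 5.7 or larger, or Mw 5.5 or larger
    when the depth exceeds 100 km (inclusive thresholds assumed)."""
    return mw >= 5.7 or (mw >= 5.5 and depth_km > 100.0)

# A shallow Mw 5.6 event does not qualify; the same magnitude at 120 km does.
assert not qualifies_for_farm(5.6, 10.0)
assert qualifies_for_farm(5.6, 120.0)
```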

The IRIS DMC takes its responsibility to provide data to the research community very seriously. When accessing data from the FARM or SPYDER® data sources, you can generally have your data within a few minutes of making your request. In general, you will receive your data the same day you make the request, even if the nature of your request requires that the data be retrieved from the archive. Only very large requests, or requests that encounter problems in processing, might take more than one day to satisfy.

I hope this brief review of the DMC, its staff, and some of its basic processes was of interest to you. If you have any specific questions, please feel free to contact any of the staff listed in the table above.

by Tim Ahern (IRIS Data Management Center)
