Data Services Newsletter

Volume 4 : No 2 : June 2002

Using Antelope for PASSCAL Field Data Management

The PASSCAL program revolutionized field seismology, but this revolution has presented field investigators with a new burden: making data available to the rest of the community. Ideally this burden should be a minor extension of the principal interest of the investigator: acquiring, examining, processing, reducing, analyzing, and interpreting his or her data. Recent improvements in and extensions to the IRIS-licensed Antelope seismic package finally put this idealization within our grasp.

As part of the agreement among IRIS member institutions, data recorded using instruments and equipment provided by PASSCAL must be archived at the IRIS Data Management Center (DMC). Originally, the investigator could format the data however she or he chose, which resulted in tapes holding various subsets of variously formatted and inconsistently documented data. In 1995, the DMC instituted a requirement that the investigator submit data in the DMC-approved Standard for the Exchange of Earthquake Data (SEED) format. The absence of a reliable means of generating SEED from field SEGY data drove PASSCAL programmers (mainly Paul Freiberg and Sid Hellman) to complete the PASSCAL database system (pdb), which is built atop the open-source Postgres database software. This set of tools, designed to assist field experiments with waveform quality control, data archiving, and event parsing, increased the efficiency of field data management once it was widely introduced in 1996-97.

In parallel with these developments, the IRIS Joint Seismic Program (JSP) developed software for the management and analysis of large volumes of continuous seismograms in the early 1990s. Originally termed Datascope, it built upon the older ASCII CSS data schema and relied upon a custom database engine built by Dan Quinlan and Danny Harvey. Although more mature in its analysis tools and usually more responsive than Postgres-based database queries, Datascope lacked some of the tools needed to move data into the DMC-required SEED format. Datascope has since grown into the Antelope software system, developed privately by Boulder Real Time Technologies (BRTT) and now licensed by IRIS for all full IRIS member institutions. This package provides a convenient way to access information related to seismological field experiments, in addition to supporting several aspects of basic data processing. Because the schema is transparent, the database is easy to extend, which allows the package to be adapted into an appropriate tool for every observational seismology discipline. Already, many IRIS investigators use this package for parsing and analyzing large datasets.

Until recently, using the Antelope package to generate the SEED volumes required for archiving at the IRIS Data Management Center proved difficult. SEED generation became possible when the Antelope Real Time System (ARTS), a set of tools originally designed to support the real-time telemetry of the IRIS Broadband Array, became available for offstream use in version 4.3 and later releases of Antelope. We describe here the steps we have successfully taken to move data from the field to the DMC within Antelope. The advantage has been the ability to access our data in the field with the numerous tools Antelope provides, without having to maintain a parallel processing stream for the data sent to the DMC. A more thorough discussion of the process, along with the necessary tcl scripts and associated files, can be found at http://seisg4.colorado.edu/~wilsonck/SEED.htm.

We assume that the following directories are present in a central directory:

  • dload: directory where initial data dumps will live (refdumps and ref2mseed output)
  • log: will contain the log files created through ref2mseed
  • wf: will contain time and header corrected waveforms ready for SEED generation
  • DBmaster: will contain the master database describing the details of the experiment
  • pf: contains necessary parameter files called by Antelope routines
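The layout above can be created in one pass. The following is a minimal Python sketch, not part of the distributed scripts: the directory names come from the list above, while the function name make_layout and the base path are illustrative assumptions.

```python
from pathlib import Path

# Directory names taken from the list above.
WORK_DIRS = ("dload", "log", "wf", "DBmaster", "pf")


def make_layout(base):
    """Create the five working directories under base and return their paths.

    make_layout is a hypothetical helper; existing directories are left alone.
    """
    base = Path(base)
    created = []
    for name in WORK_DIRS:
        directory = base / name
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created
```

Calling make_layout with the central directory of the experiment is safe to repeat, since directories that already exist are not modified.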

1) Dump the raw data files from the Reftek datalogger using the PASSCAL refdump or disk2dat command, working inside the dload directory.

refdump {device name} {refdump filename}

2) Run ref2mseed on refdump files inside the same directory.

ref2mseed -f {refdump filename}

3) Time correct ref2mseed output. Use the PASSCAL refrate command to generate the pcf file for the log files in the dload directory.

refrate *.log > {output pcf file}

Now run clockcor from inside the dload directory:

clockcor -qm {pcf file name} R???.??/*.m

4) Correct mseed headers and move data to the data directory. The miniseed headers do not correspond to the Datascope database tables, which often use geographic station names and more intuitive channel naming conventions. The station names in the miniseed files correspond to the Reftek DAS that recorded the data. The channel names are three-character codes related to the data stream and component of the recorded waveform (e.g., 1C1, 1C4). These miniseed header values need to be changed to match those specified in the Antelope database. To accomplish this we have written a script named mvdata_remap.tcl (available on the website) that uses a parameter file mapping a Reftek DAS to a geographic station name and a more common channel designation (e.g., SHZ, BHZ), and then changes the header values using PASSCAL’s mseedmod command.

For the script to work properly, the remapwf.pf file must be accurate. This file contains information about DAS/station/channel triplets; an example is included in the downloadable files. Run the script from the dload directory. The output directory will contain a wf directory and a log directory; if they do not exist, the script will create them. The corrected data will be in wf, in year-jday directories, and the log files will be in log.

mvdata_remap.tcl -dir {full_path_name}

After the script has run, make sure that all files have been moved, i.e., there are no files left in R?. directories in the dload directory. Error messages from the script may help to illuminate any problems.
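The remapping idea in step 4 can be illustrated with a small lookup table. This Python sketch is not the mvdata_remap.tcl script and does not read remapwf.pf: the DAS serial number, station name, table contents, and the "YYYY-JJJ" directory format are all invented for illustration.

```python
from datetime import datetime

# Hypothetical remap table: (DAS serial, Reftek channel code) -> (station, channel).
# Real values would come from the experiment's remapwf.pf file.
REMAP = {
    ("7301", "1C1"): ("STA1", "BHZ"),
    ("7301", "1C2"): ("STA1", "BHN"),
    ("7301", "1C3"): ("STA1", "BHE"),
}


def remap_header(das, channel):
    """Return the (station, channel) pair that should replace the header values."""
    try:
        return REMAP[(das, channel)]
    except KeyError:
        raise KeyError(f"no remap entry for DAS {das}, channel {channel}") from None


def year_jday_dir(start_time):
    """Name of a year-jday directory for a waveform start time (assumed format)."""
    return start_time.strftime("%Y-%j")
```

A missing table entry raises an error rather than passing the file through unchanged, which mirrors the point made above that the parameter file must be accurate.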

5) Make the wfdisc. This is done with dbsteimu. Go into the wf directory, and type:

dbsteimu -S */*.m {database name}

The number of input files is limited by the Unix shell, which will not allow an argument list longer than 10240 characters. To avoid this problem, the file list may have to be split into segments and dbsteimu run several times; the resulting wfdiscs can then be concatenated.
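Splitting the file list can be automated. The Python sketch below groups file names into batches that stay under the 10240-character limit quoted above and builds one dbsteimu command per batch. The -S flag and argument order follow the command shown in step 5; the "_partN" database names and both function names are assumptions made for illustration, and the commands are only constructed, not executed.

```python
def batch_by_length(paths, max_chars=10240, overhead=0):
    """Split paths into consecutive batches that fit within max_chars.

    overhead reserves room for the command name, flags, and database name.
    """
    batches, current, used = [], [], overhead
    for path in paths:
        cost = len(path) + 1  # file name plus one separating space
        if overhead + cost > max_chars:
            raise ValueError(f"path too long for any batch: {path}")
        if current and used + cost > max_chars:
            batches.append(current)
            current, used = [], overhead
        current.append(path)
        used += cost
    if current:
        batches.append(current)
    return batches


def dbsteimu_commands(paths, database, max_chars=10240):
    """Build one argument list per batch, each writing to its own database."""
    # Reserve space for "dbsteimu -S ", the database name, and a "_partN" suffix.
    overhead = len("dbsteimu -S ") + len(database) + 16
    batches = batch_by_length(paths, max_chars, overhead)
    return [
        ["dbsteimu", "-S", *batch, f"{database}_part{number}"]
        for number, batch in enumerate(batches, start=1)
    ]
```

Each returned list could be passed to subprocess.run from inside the wf directory, after which the per-batch wfdisc files would be concatenated as described above.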

6) Generate the Master Database. This involves creating station parameter (stapar) files that describe the changes of stations within the network. The files are read and parsed by the ucsdsp2db command from Frank Vernon at UCSD, which then creates the master database. The files must be located in a directory of their own and must follow a naming convention in which the file name contains the time of the network change, YYJJJHHMM.SSS. An example directory of stapar files and a script to run the compile command are provided at the website, along with a Macintosh version of a stapar file editor created by Craig Jones. It is best to keep the master DB in a directory of its own to prevent inadvertent editing or overwriting.

The wfdisc and master DB (now moved to the DBmaster directory) can be linked using a DB descriptor file. The descriptor file for a DB named example with a master DB named example_master would look like this:

rt1.0
./{example}:/rift/c3/NZ/DBmaster/{example_master}
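A descriptor of this form can also be generated programmatically. The Python sketch below assumes the braces in the example are placeholder markers, as elsewhere in this article, so they are dropped from the output; the function names are hypothetical, and the two-line layout simply copies the example above.

```python
from pathlib import Path


def descriptor_text(db, master_path, schema="rt1.0"):
    """Return descriptor contents: schema line, then local DB and master DB paths."""
    return f"{schema}\n./{db}:{master_path}\n"


def write_descriptor(directory, db, master_path, schema="rt1.0"):
    """Write the descriptor as a file named after the DB; return the file path."""
    target = Path(directory) / db
    target.write_text(descriptor_text(db, master_path, schema))
    return target
```

For the example above, descriptor_text("example", "/rift/c3/NZ/DBmaster/example_master") yields the schema line followed by "./example:/rift/c3/NZ/DBmaster/example_master".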

7) Verify the DB. Run dbverify, redirecting the output to a file so that any problems can be easily recognized and fixed.

dbverify {database name} > dbverify.out

8) Back up continuous data and refdumps. This is best done as root using the dumpsked script provided in the downloadable files. Change the dump directory (set dumplist) in the dumpsked script to reflect the location of the data ready for backup, e.g.

set dumplist=( rift:/c3/NZ weka:/data00 )

The script must be run as root! It uses ufsdump to copy files to tape. Use

ufsrestore -i /dev/rmt/0

to restore files. You will be given a prompt from which you can cd through the archived files. Use the add command to mark files or directories for restoration, then use extract to restore them.

9) Make SEED volumes for archiving. Run the mk_seed.tcl script in the /weka/data01/BB/wf directory.

mk_seed.tcl -db {database name}

Make sure there is a brand new DLT tape in the tape drive. Label the finished tape with the jdays contained on the tape, the experiment code, the date the tape was made, the name of the SEED operator, the date shipped, and the tape density. Mail this to PASSCAL.

10) Make dataless SEED. Run mk_dless.tcl in the DBmaster directory. Submit the output to PASSCAL via ftp for quality control.

mk_dless.tcl -db {database name}

At this point, the real advantage of working within the Antelope framework becomes apparent. A first-cut local earthquake origin table can be generated using the detection and location programs provided in the Antelope package. Direct analysis of the continuous data (the original purpose of Datascope) is easily accomplished (e.g., with dbpick). Origin tables can also be made from location information available from the USGS (qed2origin, dbe2origin commands) or other sources. The continuous seismograms can be parsed into event-based waveform segments using Antelope’s trexcerpt routine. We use a procedure in which each waveform is cut so that most of the primary analysis phases are included. The resulting database of large event-based cuts is small enough to be maintained online. When the time comes for final processing, this dataset can be trimmed further to create a new set of waveforms containing only the phase(s) of interest to the particular researcher.

Miscellaneous commands

mseedhdr – to see header info for miniseed file.

mseed_start_stop – start and stop times for miniseed and normal seed files.

qedit – edit seed files directly (e.g., disp to see header info).

mseedmod – change specific header values for miniseed.

get_DAS_stn.tcl -DAS #### -date yyyy:jjj:hh:mm:ss.s – For the time and DAS specified, the script will return the geographic station name.
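The lookup performed by get_DAS_stn.tcl can be pictured as a search through a deployment history. The Python sketch below is not that script: the deployment table, DAS serial number, and station names are invented, and only the yyyy:jjj:hh:mm:ss.s time format is taken from the command description above.

```python
from datetime import datetime

# Matches the yyyy:jjj:hh:mm:ss.s form used by the -date option above.
TIME_FORMAT = "%Y:%j:%H:%M:%S.%f"

# Hypothetical deployment history: (DAS serial, start, end, station name).
DEPLOYMENTS = [
    ("7301", "2001:300:00:00:00.0", "2002:100:00:00:00.0", "STA1"),
    ("7301", "2002:100:00:00:00.0", "2002:200:00:00:00.0", "STA2"),
]


def parse_time(text):
    """Parse a year:julian-day:hour:minute:second string into a datetime."""
    return datetime.strptime(text, TIME_FORMAT)


def station_for(das, when):
    """Return the station where the DAS was deployed at the given time, or None."""
    moment = parse_time(when)
    for serial, start, end, station in DEPLOYMENTS:
        # Start times are inclusive and end times exclusive, so a swap-over
        # instant belongs to the later deployment.
        if serial == das and parse_time(start) <= moment < parse_time(end):
            return station
    return None
```

In practice the deployment history would be read from the master database rather than hard-coded.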

Figure 1: Schematic workflow figure for Antelope processes

by Charles Wilson (Lamont-Doherty Earth Observatory) and Craig Jones (University of Colorado, Boulder)
