Tutorials: Accessing PH5 Archive with FetchData


The following tutorial is intended to briefly describe the current state of the IRIS repository of PH5 formatted data and highlight some data request tools developed by IRIS Data Services to access the dataset.


Background on PH5 Data Format

The PH5 data format was developed by the PASSCAL Instrument Center (PIC) as an implementation of the hierarchical data format (HDF5) and is the preferred format for active-source seismological data sets. Distinct from the SEED volumes that are typically stored in the IRIS Data Management Center (DMC) archive, the PH5 data format provides several advantages(1):

  • Portability and extensibility, which ensure broad access through a variety of programming interfaces and platforms,
  • Self-description, which gives the user direct access to part of a file without requiring parsing of the entire file, and
  • A well-established base of experienced users and detailed documentation that spans several decades of development and crosses a wide variety of technical disciplines.

Currently, the IRIS DMC repository for PH5 data includes 60 distinct deployments, most of which used high sample rate (>100 Hz) sensors. The archive holds around 12 terabytes of data, and that total is expected to increase substantially in the coming years as support grows for deployments with large numbers of sensors and more PH5 data is ingested.


PH5 Web Services at IRIS

In June 2017, IRIS DMC released three web services designed to serve time series and associated metadata from active-source seismological data archived in PH5 data format. The new services include:

  • ph5ws-station, which serves station and instrument metadata,
  • ph5ws-event, which reports shot location and source characteristics for each array deployment, and
  • ph5ws-dataselect, which retrieves time series data in either miniSEED or SAC output format.

The PH5 web services are functionally similar to their counterpart FDSN services in that data requests are organized by network, station, and channel identifiers as well as start and end times. Usage patterns and parameter notation for the new PH5-based web services were deliberately designed to accept similar inputs in order to maintain consistency and familiarity within the IRIS user base.
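
To illustrate how closely the request pattern mirrors the FDSN services, here is a minimal Python sketch that assembles a ph5ws-dataselect request URL without sending it. The base URL is the one used with FetchData later in this tutorial; the `query` path and the parameter names (`net`, `sta`, `cha`, `starttime`, `endtime`) are assumptions borrowed from the FDSN dataselect convention, and the helper name `build_dataselect_url` is purely illustrative. Please check the service documentation for the exact parameters that ph5ws-dataselect accepts.

```python
from urllib.parse import urlencode

# Base URL as used with FetchData later in this tutorial.
PH5_DATASELECT = "http://service.iris.edu/ph5ws/dataselect/1/"


def build_dataselect_url(net, sta, cha, start, end, base=PH5_DATASELECT):
    """Return an FDSN-style query URL (illustrative helper).

    ASSUMPTION: the 'query' path and the parameter names follow the
    FDSN dataselect convention; they are not taken from this tutorial.
    """
    params = {
        "net": net,
        "sta": sta,
        "cha": cha,
        "starttime": start,
        "endtime": end,
    }
    # safe=":" keeps the colons in the time stamps readable.
    return base.rstrip("/") + "/query?" + urlencode(params, safe=":")


url = build_dataselect_url(
    "ZI", "1002", "DPZ", "2015-06-29T04:45:00", "2015-06-29T04:46:00"
)
print(url)
```

The sketch only builds the string, so it can be inspected or pasted into a browser before any data are requested.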

Other popular tools hosted by IRIS that have been modified to accommodate PH5 data retrieval include GMAP and the Perl utility FetchData, which we will focus on below. The DMC will continue to develop tools for accessing the PH5 archive as more data become available from future experiments.


Using FetchData with PH5

The Perl utility FetchData has been available for several years to facilitate access to the standard SEED-formatted archive of waveform data. The most recent version is available at the following link: https://seiscode.iris.washington.edu/projects/ws-fetch-scripts/files

A basic example using FetchData may look like this, where we are requesting a certain length record section of a single channel from a single station:
$ FetchData -N IU -S ANMO -C BHZ -s 2011-01-01T00:00:00 -e 2011-01-01T01:00:00 -o IU_ANMO_BHZ.mseed -m IU_ANMO_BHZ.metadata

Here we specify the network, station, and channel codes along with the start and end times of the requested record. The -o flag names the data output file, and the -m flag names the file that receives the associated metadata. More details about the standard inputs of FetchData can be viewed using the --help option. Also note that the identifier codes may be wildcarded (*) to accommodate larger requests across multiple stations and channel types.

Changing default service endpoints

Adapting a FetchData request to access the PH5 data archive is a straightforward process, similar to specifying an alternate data center.

Now, instead of using the default FDSN web services, we’ll direct the FetchData script to request data from their PH5 counterparts:

FetchData -N ZI -S 1002 -C DPZ -s 2015-06-29T04:45:00 -e 2015-06-29T04:46:00 -o ZI.1002.DPZ.dat -m ZI.1002.DPZ.meta --timeseriesws http://service.iris.edu/ph5ws/dataselect/1/ --metadataws http://service.iris.edu/ph5ws/station/1/

This request retrieves a record section that is 60 seconds in duration from a single channel at a single station. Notice that it uses the same kind of inputs as before for the identifier codes, the start and end time formats, and the output file names.

To access the PH5 archive, we have simply added two option flags to specify the PH5 services for waveform data (--timeseriesws) and associated metadata (--metadataws).
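
For scripted workflows, the same request can be assembled programmatically. The following Python sketch builds the FetchData argument list using only the option flags shown in this tutorial; the helper name `fetchdata_args` and its `ph5` switch are hypothetical conveniences, not part of FetchData itself.

```python
import shlex

# PH5 service endpoints as given in the FetchData example above.
PH5_TIMESERIES_WS = "http://service.iris.edu/ph5ws/dataselect/1/"
PH5_METADATA_WS = "http://service.iris.edu/ph5ws/station/1/"


def fetchdata_args(net, cha, start, end, out, meta, sta=None, ph5=False):
    """Build a FetchData argument list (hypothetical helper).

    Only flags that appear in this tutorial are used. Leaving sta as
    None omits the -S flag, which requests all stations.
    """
    args = ["FetchData", "-N", net]
    if sta is not None:
        args += ["-S", sta]
    args += ["-C", cha, "-s", start, "-e", end, "-o", out, "-m", meta]
    if ph5:
        # Redirect waveform and metadata requests to the PH5 services.
        args += ["--timeseriesws", PH5_TIMESERIES_WS,
                 "--metadataws", PH5_METADATA_WS]
    return args


args = fetchdata_args(
    "ZI", "DPZ", "2015-06-29T04:45:00", "2015-06-29T04:46:00",
    "ZI.1002.DPZ.dat", "ZI.1002.DPZ.meta", sta="1002", ph5=True,
)
print(shlex.join(args))  # the command line as it would be typed in a shell
```

Assuming FetchData is installed and on your PATH, the list can be passed to `subprocess.run(args, check=True)` to execute the request.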

If you set the verbose option -vv, you can directly confirm which service endpoints are being selected.

Refine data requests based on ShotTime

Now, to better constrain our record requests, we can use the times of the shots made during the experiment. The FetchEvent tool is currently under development to work more seamlessly with our PH5 web services, but for now, we can query the web service directly for the shot metadata. Using the same network ID and station metadata from our previous example:



The output for our query on this experiment is shown in the pipe-delimited table above. For this particular experiment, only 3 shots were made, but others may have many hundreds or even thousands of shots.
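
Pipe-delimited output of this kind is easy to read into a script. The sketch below is a minimal, assumption-laden example: the sample text and its column names (`ShotID`, `ShotTime`) are hypothetical placeholders rather than actual service output, and the helper name `parse_pipe_table` is our own. Adjust the column names to match the header that the service actually returns.

```python
def parse_pipe_table(text):
    """Parse pipe-delimited text whose first non-empty line is a header.

    Returns one dict per data row, keyed by the header names.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    # A leading '#' on the header line, if present, is dropped.
    header = [h.strip() for h in lines[0].lstrip("#").split("|")]
    rows = []
    for ln in lines[1:]:
        values = [v.strip() for v in ln.split("|")]
        rows.append(dict(zip(header, values)))
    return rows


# HYPOTHETICAL sample: real column names and values may differ.
SAMPLE = """\
#ShotID|ShotTime
5013|2015-06-29T06:10:00
"""

shots = {row["ShotID"]: row["ShotTime"] for row in parse_pipe_table(SAMPLE)}
print(shots["5013"])
```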

For any shot of interest, we can now use the “ShotTime” parameter to constrain our data requests. For example, a 10-second data record containing the first arrivals from shot 5013 at station 1002 could be requested as follows:

FetchData -N ZI -S 1002 -C DPZ -s 2015-06-29T06:10:00 -e 2015-06-29T06:10:10 -o ZI.1002.DPZ.dat -m ZI.1002.DPZ.meta --timeseriesws http://service.iris.edu/ph5ws/dataselect/1/ --metadataws http://service.iris.edu/ph5ws/station/1/
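
The start and end times for such a window can be computed from the shot time rather than typed by hand. In this Python sketch we take the start time of the request above as the shot time, which is an assumption made for illustration; the helper name `shot_window` and its `offset_s` argument are hypothetical.

```python
from datetime import datetime, timedelta

# Time format used for the -s and -e options in this tutorial.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def shot_window(shot_time, length_s, offset_s=0):
    """Return (start, end) strings for a window tied to a shot time.

    shot_time: shot time as a string in TIME_FORMAT
    length_s:  window length in whole seconds
    offset_s:  whole seconds to shift the window start relative to the
               shot (negative values start before the shot)
    """
    start = datetime.strptime(shot_time, TIME_FORMAT) + timedelta(seconds=offset_s)
    end = start + timedelta(seconds=length_s)
    return start.strftime(TIME_FORMAT), end.strftime(TIME_FORMAT)


# ASSUMPTION: shot 5013 is taken to occur at the request start time.
start, end = shot_window("2015-06-29T06:10:00", 10)
print(start, end)
```

The two strings can be supplied directly to the -s and -e options of FetchData.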

Or, if interested in records for all stations in the experiment, simply omit the station flag -S or use wildcard expressions as described above.

At this point, we have covered some of the basic differences between PH5 and SEED formatted data, provided an estimate of how much PH5 data is currently stored within IRIS archives, and discussed some client tools and web services that may be used to access the data. From here, researchers may branch off into their own workflows of data ingestion for further processing and analysis. However, as discussed further below, IRIS DS has developed some tools within the open-source PIC/PH5 project on GitHub that we hope can facilitate some basic data visualization needs.


Shot and Receiver Gather Tools

For users who want to visualize the requested data, IRIS DS has prepared a set of Python-based command-line tools that produce common-shot and common-receiver gathers. These tools are called ph5shotgather and ph5receivergather, respectively, and can output the gathers either as an image or in miniSEED or SAC format.

The tools are hosted under the open-source PIC/PH5 project on GitHub, and we recommend that users install the conda utility (https://conda.io/docs/user-guide/install/index.html) to help manage their installation of the PH5 software and Python environments. Detailed installation instructions for the PH5 software are provided with the project, and a full listing of the input parameters for the gather tools can be seen by using the -h option flag with either utility.

Below is an example of the ph5shotgather tool; ph5receivergather uses similar inputs and logic to make images. We’ll use shot 5013 again to observe the shot propagating across the network.

The ph5shotgather command is formed as follows:
ph5shotgather --network=ZI --channel=DPZ --starttime=2015-06-29T01:00:00.0 --endtime=2015-06-29T09:00:00.0 --shotid=5013 --shotline=001 --format=plot

Since we’re interested in viewing the shot across the whole network, the station code has been omitted from this request, but it could be included for more targeted inquiries. In the record section below, the vertical axis represents time since shot 5013 was made, and the horizontal axis represents the distance from the shot to a particular station. Note: If using Mac OS X, you may need to additionally download the Xcode developer toolkit to properly view the images.

Note that the length of the record window is currently hard-coded in the gather tools. The project is still under active development, and we are looking to add more versatility and functionality. Depending on feedback and use cases, the gather clients may be expanded with cosmetic enhancements, such as a reduction velocity or a variable-area wiggle plot, and with more functional features, such as additional user inputs or a phase picker tool to identify coherent arrival energy across stations.



  1. Folk, M., and E. Pourmal (2010). Balancing Performance and Preservation Lessons Learned with HDF5, https://www.hdfgroup.org/pubs/papers/HDFandpreservation_NIST_2010_paper_Folk.pdf
