Data Services Newsletter

Volume 18 : No 3 : Winter 2016

MUSTANG R Metrics Code Available on CRAN

The MUSTANG Data Quality Assurance system has been in production for two years and has generated more than 7 TB of data quality measurements, all readily accessible through web services. At its core, the data collection and calculations performed on seismic data have been carried out by a set of custom code packages written in R, developed cooperatively by IRIS and Mazama Science.

R is a recognized and heavily used statistical processing language that is freely available and open-source. It provides an environment that allows for exploratory data crunching and taps into a wealth of statistical, data preparation, signal processing, and visualization packages that are developed and trusted by experts in the data science community.

CRAN, the Comprehensive R Archive Network, is a public repository that serves as a quality-controlled digital warehouse of nearly ten thousand open-source packages covering many different disciplines and use cases. CRAN packages are easy to discover, access, and install. For these reasons, IRIS Data Services has chosen to publicly release the core utilities used in MUSTANG to this repository.

IRIS has developed and released three packages to CRAN. The first versions were released in the summer of 2015, and we have recently updated all three. The versions currently available are:

  • seismicRoll v1.1.2 – a package with rolling window and averaging functions for time series arrays
  • IRISSeismic v1.3.8 – base classes and functions for seismic data retrieval and analysis
  • IRISMustangMetrics v2.0.2 – contains function kernels for metrics created in MUSTANG

These updated CRAN packages contain the improvements we have been working on over the past year, which have been in production in MUSTANG since July of this year.
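For readers who have not installed from CRAN before, the following is a minimal sketch of how the three packages listed above might be installed from an R session. The mirror URL and the helper name `is_installed` are assumptions chosen for illustration, and the versions you receive will be whatever CRAN currently serves, which may be newer than those listed here.

```r
# Package names as given in the list above.
pkgs <- c("seismicRoll", "IRISSeismic", "IRISMustangMetrics")

# Hypothetical helper: TRUE if the package can be loaded from the local library.
is_installed <- function(p) requireNamespace(p, quietly = TRUE)

# Install only the packages that are not already present.
# The repos value is an assumed CRAN mirror; substitute your preferred one.
missing_pkgs <- pkgs[!vapply(pkgs, is_installed, logical(1))]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# Report the installed version of each package, or note that it is absent.
for (p in pkgs) {
  if (is_installed(p)) {
    message(p, " ", as.character(packageVersion(p)))
  } else {
    message(p, " is not installed")
  }
}
```

Once a package is installed, `library(IRISMustangMetrics)` (or either of the other two names) attaches it for use in the current session.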

One item of note: the current package releases do not include our workflow business logic for data selection and parameterization. That logic is system-specific, somewhat proprietary, and would not function well outside of the production environment. That said, we are reviewing what we can fold into future CRAN releases in a form that is generic and safe for general use.

We see this code sharing as an opportunity to allow public review and validation of our computational methods. It can also serve as a jumping-off point for others in the community, who can use what we have learned in building MUSTANG to carry out their own computational workflows on IRIS-supplied data. We invite the community to review our package offerings on CRAN, especially R users who want to work with seismic data provided via FDSN Web Services.

For more information:

  1. MUSTANG
  2. R Project
  3. CRAN
  4. Tutorial on using MUSTANG Metrics in R
  5. Mazama Science

by Rob Casey and Gillian Sharer (IRIS Data Management Center)