Since it went into production, MUSTANG has faced data volume challenges. The first was the sheer number of networks, with their attendant stations, going back decades: the amount of data MUSTANG had to retrieve and run calculations on was very large (300 TB). Because of this, we adopted a strategy of delivering maximum value early, reducing the list of networks we tackled first and limiting our channel selection to broadband channels. In SEED, broadband channels are designated BH and HH and are typically sampled at 20 to 100 samples per second (SPS). By tackling these most common sample rates first, we greatly improved our prospects of delivering a meaningful and useful population of metrics to the IRIS community in a timely fashion.
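The broadband selection rule described above can be sketched as a filter on SEED channel codes. This is a minimal illustration, not MUSTANG's actual code: it assumes standard three-character SEED channel codes (band, instrument, orientation), and the names `BROADBAND_BANDS`, `is_broadband_seismic`, and `select_broadband` are hypothetical. The sample-rate ranges are the SEED band-code definitions for B (Broad Band) and H (High Broad Band).

```python
# Minimal sketch of a broadband channel filter (hypothetical names,
# not MUSTANG's implementation). Assumes standard three-character SEED
# channel codes: band code, instrument code, orientation code (e.g. "BHZ").
from typing import List, Optional

# SEED band codes treated as broadband here, with the nominal
# sample-rate range (SPS) for each: lower bound inclusive, upper exclusive.
BROADBAND_BANDS = {
    "B": (10.0, 80.0),   # Broad Band
    "H": (80.0, 250.0),  # High Broad Band
}


def is_broadband_seismic(channel: str, sample_rate: Optional[float] = None) -> bool:
    """Return True for BH? and HH? channels.

    If sample_rate is given, also require it to fall in the SEED range
    for the channel's band code.
    """
    if len(channel) != 3:
        return False
    band, instrument = channel[0], channel[1]
    # Instrument code "H" = high-gain seismometer; "N" (accelerometer) is excluded.
    if band not in BROADBAND_BANDS or instrument != "H":
        return False
    if sample_rate is None:
        return True
    low, high = BROADBAND_BANDS[band]
    return low <= sample_rate < high


def select_broadband(channels: List[str]) -> List[str]:
    """Keep only the channels eligible for the first processing pass."""
    return [ch for ch in channels if is_broadband_seismic(ch)]


if __name__ == "__main__":
    inventory = ["BHZ", "HHN", "LHZ", "EHZ", "HNE", "BHE", "VHZ"]
    print(select_broadband(inventory))  # ['BHZ', 'HHN', 'BHE']
```

Matching on the code prefix alone mirrors the definition of broadband as BH and HH; the optional sample-rate check is an illustrative extra for spotting channels whose code and rate disagree.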
We can now process new data for all networks every day and still have plenty of capacity for bulk jobs that generate or recalculate metrics for selected networks across more than 40 years of the historical archive. MUSTANG makes this manageable largely through distributed processing and careful tuning of the storage system, since we routinely process 1 to 2 TB of waveform data per day.
As of the beginning of this year, we are confident that we have completed the original mission of compiling broadband metrics for all active networks, with a few small exceptions involving problematic data. We have now turned our attention to additional channel types, beginning with noise calculations for LH (long-period, 1 SPS) channels alongside the normal broadband channel processing.
As a result of our current productivity, the size of our storage system has become the next issue to address. With our Postgres database now larger than 7 TB, we have begun to stretch our original allotment of filesystem space, which must reside on a high-performance RAID system to handle the millions of write transactions performed each day. So even though we have had the CPU capacity to complete about 200 network-days of work per hour, we realized that we needed to stay focused on broadband channels until we could accommodate the growth in database volume that adding channels with other sample rates would bring.
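For scale, the hourly throughput quoted above works out as follows over a full day. This assumes the rate is sustained around the clock, which the text does not state; it gives only the hourly figure.

```latex
200\ \frac{\text{network-days}}{\text{hour}} \times 24\ \frac{\text{hours}}{\text{day}} \approx 4{,}800\ \frac{\text{network-days}}{\text{day}}
```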
A rough estimate of the channel population in the IRIS data archives suggested that processing all of the remaining seismic channels would increase the size of the database by about 50%. Because this would take us to the limit of our available storage, we have invested in new storage that triples that capacity. This expansion should (we hope) provide years of additional growth, not only for additional channels but also for the potentially greater number of sensors provided by next-generation seismometers and the growing number of metrics that MUSTANG will provide. As a result, we are examining an ordered plan to phase in support for new channels such as:
LH - Long Period (1 SPS) -- underway
VH - Very Long Period (0.1 SPS)
EH - Extremely Short Period (100-200 SPS)
SH - Short Period (10-80 SPS)
LN, BN, HN - Accelerometer/Strong Motion
…and many others…
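Taking the 7 TB database size as a baseline, the roughly 50% growth estimate above implies a database of at least about 10.5 TB once all remaining seismic channels are processed. The baseline is a lower bound, since the database had already exceeded 7 TB, and the original storage allotment is not stated, so it appears below only as a symbol, C_orig.

```latex
\begin{aligned}
V_{\text{all channels}} &\gtrsim 1.5 \times 7\ \text{TB} = 10.5\ \text{TB} \\
C_{\text{new}} &= 3\, C_{\text{orig}}
\end{aligned}
```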
Although we do not have an exact timetable for implementation, the fact that our new storage systems are being readied for use makes us optimistic about expanding channel support in 2017.
by Rob Casey (IRIS Data Management Center), Gillian Sharer (IRIS Data Management Center), Mary Templeton (IRIS Data Management Center), Bruce Weertman (IRIS Data Management Center) and Laura Keyson (IRIS Data Management Center)