NetDC Protocol Manual


Important Note: NetDC has been deprecated at the DMC and was shut down on October 1, 2013. Any email to netdc@fdsn.org now redirects to NCEDC’s installation of NetDC.

0.0 INTRODUCTION

In recent years, we have seen the Internet flourish as a medium for the exchange of ideas and data. This has helped scientific research reach new heights of cooperation and discovery as institutions have begun to find critical data for their studies readily available in a form they can quickly put to use. Advances in the seismic research arena have been no exception.

There are currently many seismological data centers scattered about the globe providing services that make seismic data available to researchers. Each center tends to focus on a subset of the worldwide collection of data. With so many data centers, a scientist looking for data has to know whom to contact and how to request the data, and request procedures vary widely from data center to data center. This paradoxically leads to a situation where there is a massive amount of data within reach that few know how to find and extract.

One solution to this problem is to unify these various data centers so that they appear as one large data center. By providing a single, common request interface, the user is able to submit a data request without needing to know the specifics of how the request will be fulfilled. The worldwide network of data centers then becomes an integrated archive of data holdings, virtually eliminating the need for each data center to “mirror” other sites just to make the data readily available to its customer base.

This solution takes the form of Networked Data Centers: a set of utilities and protocols that interconnect data centers and provide a means of exchanging and merging data for user collection. The system is entirely automated in its networking procedures and provides an avenue for data centers to increase their customer base and data production volume.

This technical manual outlines the concepts of Networked Data Centers and provides specifics for installation, maintenance, and use of this software system.

1.0 OVERALL CONCEPT

The familiar model of the data request path is shown in the diagram below. When a request arrives at a data center, the system knows that it is a user on the other end sending the request. When the data is prepared, it is sent back directly to the user and the request is satisfied.

Fig. 1.1
Figure 1.1: The typical data request pattern is a user requesting data from a data center.

The next figure illustrates the same concept, except that now the user is replaced by another data center. For all intents and purposes, the method of processing the request is much the same as before. The requesting data center (the client) will receive data from the processing data center (the server).

Fig. 1.2
Figure 1.2: An alternative request pattern is one data center requesting from another.

Extending this example, we can make data from one data center available through another data center by simply propagating the user request from one data center to the next and having the data return along the same route.

Fig. 1.3
Figure 1.3: The user wanting data from Data Center B can make the request through Data Center A.

Following this, we can expand the networked request model by having a number of data centers be possible destinations for a user’s request.

Fig. 1.4
Figure 1.4: The networked request path represented in a typical star configuration.

By looking at Fig. 1.4, it is apparent that a user interested in data will contact only one site when making a request. The site that receives the request has the responsibility of routing appropriate portions of the request to other data centers. In addition, this site is the central receiving point for completed data before it is sent to the user. To distinguish this data center’s role from others, we call it the hub site for the request. At the same time that a data center is serving as a hub site for one request, it could also be servicing a request routed to it from another site. When it serves the latter role, it is termed a delegate site for that request.

The general flow of the NetDC protocol is this:

  • The user submits a request for data to a data center. This data center becomes the hub site.
  • The hub site then scans each line of the request and decides whether to route any of the lines to delegate sites.
  • Individual request forms are sent to the selected delegate sites with information on how to return the data back to the hub site once processing is completed.
  • The hub site processes its local request and collects data from the delegate sites as they come in.
  • When data collection is completed, the user is notified of the data shipment and the data is made available to the user through FTP or email.

NetDC requests are always emailed to participating sites through the account name “netdc”. All NetDC processing functions at that site also operate under that username. Since each data center has its own unique domain address, it is easy to distinguish NetDC sites, even though they share the same username. An example email address to which a user would submit a request would be:

netdc@somehost.org

From there, the request is processed and within a reasonable period of time the user can expect a data return that is either packaged at the hub site or received in separate pieces from each of the delegate sites. This relates to the merging feature of NetDC, which will be discussed further in Chapter 9.0.

2.0 NETDC REQUEST FORMAT

A NetDC request has a similar layout to the familiar BREQ_FAST request format, which has already been in use at various data centers for several years [1].

Below is the general layout of the NetDC request. Some of the fields are mandatory (as noted below). Take note that fields may be separated by spaces, tabs, or both (collectively known as white space).

.NETDC_REQUEST 	 
.NAME    <name of user requesting data>
.INST    <name of institution>
.MAIL    <return mailing address>
.EMAIL    <return email address>
.PHONE    <phone number>
.FAX    <fax number>
.LABEL    <user-assigned label for request>
.MEDIA    <primary media selection>
.ALTERNATE MEDIA    <alternate media selection>
.FORMAT_WAVEFORM    <what format to receive waveform traces in>
.FORMAT_RESPONSE    <what format to receive response information in>
.MERGE_DATA    <YES or NO and waiting time>
.DISPOSITION    <instructions for FTP data transfer to user>
.END 	 
<data request line #1> 	 
<data request line #2> 	 
<data request line #3> 	 
… 	 
<data request line #N> 	 

Let us take a closer look at each of the header entries:

.NETDC_REQUEST (mandatory)

This is necessary to identify the mail document as a request intended for NetDC. This must always be the first line of the request.

.NAME (mandatory)

This indicates the name of the user. This is needed to identify the request and allows for the grouping of multiple requests by the same user.

.INST (mandatory)

This lists the institution that the user belongs to. This can be a company name or educational institution. This assists in establishing contact with the user should it be necessary for servicing the request.

.MAIL

This indicates the postal address of the institution, should it be necessary to send physical media containing the requested data product.

.EMAIL (mandatory)

This is the email address of the requesting user. This entry is mandatory since the majority of user contact will be through email.

.PHONE

This lists the user’s contact phone number.

.FAX

This indicates the fax machine number that the user has access to.

.LABEL

This is a user-assigned label for the request, which will appear in data files shipped to the user. If a label is not specified, a default value will be assigned.

.MEDIA

This specifies the preferred media of delivery of the data. Normally the media type will be predetermined by the type and size of the data being shipped. Options are currently FTP, EMAIL, EXABYTE TAPE, DAT TAPE, DLT TAPE, and possibly others.

.ALTERNATE MEDIA

This specifies a backup media option should the first not be available.

.FORMAT_WAVEFORM

This indicates the format for the waveform data to be shipped. Initially, the only option will be SEED. SEED will also be the default format if this line is not provided.

.FORMAT_WAVEFORM SEED

.FORMAT_RESPONSE

This indicates the format for response information when it is requested. The current default is SEED ASCII, which is a specific text output format for displaying response data known as RESP format. RESP files can easily be produced by a program called “rdseed” (see the rdseed manual at http://www.iris.edu/dms/nodes/dmc/manuals/rdseed/).

.FORMAT_RESPONSE SEED_ASCII

.MERGE_DATA

This requires a YES or NO entry, specifying whether the data products should be combined at the hub data center before shipment to the user or whether each data center should send its shipment to the user directly. For a YES entry, a number should also be entered specifying the maximum wait time in days. After that time has passed, the hub data center will ship what it is able to provide and any late shipments will be redirected to the user. Here is an example of how the user would specify product merging with a two-day time window.

.MERGE_DATA YES 2

.DISPOSITION

This is an optional field for specifying how to transfer data through FTP to the user. It will be followed by one of two directives: PUSH or PULL. The PUSH case directs the data center performing the shipment to open an FTP dialogue with the user’s host machine and put the data on the user’s machine. The PUSH directive is followed by the host name and the anonymous FTP directory into which the data is placed:

.DISPOSITION PUSH myhost.seismology.edu /pub/dropoff

The PULL directive specifies that the user will get the data through FTP manually once notified that it is available. There is no need to specify a host name or directory here:

.DISPOSITION PULL

.END

This is a mandatory entry that signals the end of the request header. What follows the .END tag is one or more data request lines, which list specifically what data the user wants to receive. There is no set upper limit on the number of data request lines a user can enter, but each line must be a separate record with white space separating each of the fields.

Data Request Lines

The data request lines come in three flavors: .DATA, .RESP, and .INV. .DATA lines request waveform data, .RESP lines ask for response information, and .INV lines query a site for an inventory of data holdings. All of these request types follow the same general format, even though the response to each will differ.

The format of a data request line is laid out in a logical order, reflecting the one-to-many relationship between successive fields. Each field in the request line is a text string, with the first field carrying a leading period. The UNIX-style wildcards ? and * can be used in many of the field strings; they match any single character and any run of characters, respectively. The field layout of a data request line is as follows:

.<DATA_TYPE> <DATA_CENTER> <NETWORK> <STATION> <LOCATION> <CHANNELS> <START_TIME> <END_TIME>

To elaborate:

DATA_TYPE

This specifies what data is desired. This field must always have a leading period, making the possible choices .DATA, .RESP, and .INV. Note that more data types may become available in the future.

DATA_CENTER

This is a unique string identifier representing a specific data center in the group of networked data centers. The proper data center code name must be used here in order to match to the proper data center. Except for inventory requests, this field will generally be wildcarded with a single *. However, if the user insists that data comes from a specific data center, then putting an identifier in this field would force the request line to be sent to that site.

NETWORK

This is the FDSN network code for the data requested, consisting of one or two characters. This field may be wildcarded. Network code examples can be found at http://www.iris.edu/dms/nodes/dmc/services/network_codes/.

STATION

This is a station name up to five characters in length. This name refers to a geographic location, so occasionally another network will have the same station name for their instrument placed nearby. This is equivalent to the station identifier in SEED format.

LOCATION

This is a field that allows users to request data from specific data streams on the instrument at the specified network and station. This is in the form of a one or two character string, referring to the location identifier in SEED format. This field may be wildcarded.

CHANNELS

This is a string describing the channels to be retrieved. Channel names are up to three characters in length and follow SEED channel-naming conventions. The number of channel names can vary from one to any number of space-separated elements. Each channel entry may be wildcarded. When two or more channels are specified, they need to be enclosed in double-quotes. An example channel string would be:

"BHE LH? E*"

START_TIME

This is a six-field set of numbers specifying the time and date for the beginning of the time window desired. The format is:

"YYYY MM DD hh mm ss.ffff"

where

YYYY = year (0000-9999)
MM = month (01-12)
DD = day of month (01-31)
hh = hour (00-23)
mm = minute (00-59)
ss = second (00-59)
ffff = fraction of second (0000-9999 ten-thousandths)

Take note that the decimal point and fraction may be dropped from the ss.ffff field when the fraction of a second is zero. Since this is a space-separated set of fields, the time string must be contained within double-quotes. Wildcards are not allowed. An example start time could be:

"1995 06 22 04 00 23.4522"

END_TIME

This has the same format as START_TIME, and pertains to the end of the time window for the data desired. Armed with this information, the following example of a DATA request line makes sense:

.DATA * AA ORCA * "BHE LH? E*" "1995 06 22 04 00 23.4522" "1995 06 22 05 30 00"

This asks for data from station ORCA of network AA from June 22nd 1995, 0400 hours and 23.4522 seconds to 0530 hours. The channels returned will be BHE, all LH orientations, and any extremely short period (E) channels. If the user instead wanted response information, the line would read:

.RESP * AA ORCA * "BHE LH? E*" "1995 06 22 04 00 23.4522" "1995 06 22 05 30 00"

More information on these types of requests will be provided in later chapters.
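
Because the CHANNELS, START_TIME, and END_TIME fields may themselves contain spaces inside double-quotes, a request line cannot simply be split on white space. The sketch below shows one possible way to tokenize such a line, treating a double-quoted run as a single field; it is illustrative only and not part of the NetDC distribution.

#include <ctype.h>
#include <stdio.h>

/* Split a data request line into fields, treating a double-quoted run
   as one field (with the quotes removed).  The line is modified in
   place; returns the number of fields found. */
static int split_request_line(char *line, char *fields[], int max)
{
    int n = 0;
    char *p = line;

    while (*p != '\0' && n < max) {
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;
        if (*p == '"') {                    /* quoted field, e.g. "BHE LH? E*" */
            fields[n++] = ++p;
            while (*p != '\0' && *p != '"')
                p++;
        } else {                            /* ordinary white-space-delimited field */
            fields[n++] = p;
            while (*p != '\0' && !isspace((unsigned char)*p))
                p++;
        }
        if (*p != '\0')
            *p++ = '\0';                    /* terminate the field just found */
    }
    return n;
}

int main(void)
{
    char line[] = ".DATA * AA ORCA * \"BHE LH? E*\" "
                  "\"1995 06 22 04 00 23.4522\" \"1995 06 22 05 30 00\"";
    char *fields[16];
    int i, n = split_request_line(line, fields, 16);

    for (i = 0; i < n; i++)
        printf("field %d: %s\n", i, fields[i]);
    return 0;
}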

Example

An example of a NetDC request submitted by a user would be:

.NETDC_REQUEST
.NAME Joe Seismologist
.INST University of Quakes
.MAIL 1101 Binary Data Way, Anytown, WA 90909
.EMAIL joe@host.seismolab.edu
.PHONE (999) 555-4567
.FAX (999) 555-4568
.LABEL My_Request
.MEDIA FTP
.ALTERNATE MEDIA EXABYTE 2GB
.FORMAT_WAVEFORM SEED
.FORMAT_RESPONSE SEED_ASCII
.MERGE_DATA YES 3
.DISPOSITION PULL
.END
.RESP * G SSBC * * "1990 03 01 00 00 00" "1990 03 02 00 00 00"
.INV NCEDC * *
.DATA * PS TSKO * M?? "1990 03 01 00 00 00" "1990 03 05 06 02 45.78"
.DATA * CD ZHLP * "B?? S??" "1986 06 16 00 00 00" "1986 06 19 04 00 00"

[1] Those not yet familiar with the BREQ_FAST format are encouraged to examine the BREQ_FAST manual available on-line.

3.0 REQUEST RECEPTION AND DELEGATION

Requests will arrive at a data center through the email account netdc. When the email arrives at that address, a program reads the message and determines whether it is indeed a NetDC request. If so, the message is processed and a copy of the request is kept in a text file.

The request, if coming directly from the user, is assigned a “hub ID”, which is a unique timestamp label just for that request. The hub ID is the identifier that will be used to track the progress of the request as it works its way through the NetDC system. This tag will take the form of:

<DC_NAME>:<MMM_DD,HH:MM:SS>:<PID>

where the DC_NAME represents the name of the hub data center and <MMM_DD,HH:MM:SS> represents the date and time in GMT (Greenwich Mean Time) of the request arrival. <PID> is the UNIX process ID of the code that initially runs when the request is received, and is for all intents and purposes a random number. The hub ID tag will never be modified once set. An example hub ID would read:

.HUB_ID NCEDC:Jul_04,02:00:06:4435

which would indicate that the Northern California Earthquake Data Center (NCEDC) at Berkeley received the original NetDC request at 2:00 a.m. GMT on July 4th.
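
For illustration, a tag of this form could be built from the local data center name, the current GMT time, and the process ID roughly as follows. This is a hypothetical sketch, not the actual NetDC code, and the function name is invented.

#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Build a hub ID of the form <DC_NAME>:<MMM_DD,HH:MM:SS>:<PID>
   using the current GMT time and this process's ID. */
static void make_hub_id(const char *dc_name, char *buf, size_t len)
{
    time_t now = time(NULL);
    struct tm *gmt = gmtime(&now);
    char stamp[32];

    /* e.g. "Jul_04,02:00:06" */
    strftime(stamp, sizeof(stamp), "%b_%d,%H:%M:%S", gmt);
    snprintf(buf, len, "%s:%s:%d", dc_name, stamp, (int)getpid());
}

int main(void)
{
    char hub_id[128];
    make_hub_id("NCEDC", hub_id, sizeof(hub_id));
    printf(".HUB_ID %s\n", hub_id);   /* e.g. .HUB_ID NCEDC:Jul_04,02:00:06:4435 */
    return 0;
}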

The request is examined and critical identifying fields are noted so that a unique directory can be made for processing the request. In this directory, certain processing files will be generated:

  • A copy of the NetDC request. The filename will be of the form netdc_req.<HUB_ID>.
  • Outgoing NetDC request files, which are to be sent to delegate data centers for processing.
  • A file named check.list, used for logging where requests have been delegated and “checking off” shipments as they arrive from the delegate data centers.
  • Text files containing information used in request processing.
  • Any data files that are ready to be shipped to the user.
  • Other files that may be used to flag the status of the request.

If the request lacks needed information, the data center technicians or the requesting user will be notified so that the problem can be remedied. Otherwise, the data request lines are split into two groups: Those to be delegated to other data centers and those to be processed locally.

As each data request line is read, NetDC will look at the network code requested to determine if the line is to be delegated to another data center. A routing table maintained on the local system contains network code references for all networked data centers and acts as a “directory listing” to determine where a given request line is to be sent. Using this information, the request line will either be processed locally or will be written to an outgoing file destined for a primary or secondary provider of that network’s data.

Fig. 3.1
Figure 3.1: Each data request line is routed to a request file destined for the appropriate data center. Destination is determined by matching the DATA_CENTER and NETWORK fields to entries in the routing table.

The typical routing table looks like the example below. The fields are separated by the vertical bar (|) character. Each network code is listed along with its data center name and contact information.

BK|NCEDC|PRIMARY|netdc@quake.geo.berkeley.edu|Northern California Earthquake Data Center|Seismographic Station, Earth Sci. Bldg., Univ. of California, Berkeley, CA 94720|Douglas Neuhauser|(510) 642-0931 |doug@seismo.berkeley.edu|4096|ver3
G|GEOSCOPE|PRIMARY|netdc@geosp6.ipgp.jussieu.fr|Institut de Physique du globe de Paris|Department de Sismologie, Programme GEOSCOPE, 4, Place Jussieu, 75252 Paris cedex 05, France|Genvieve Roult|33 1 44274888|groult@ipgp.fr|8192|9802
GE|GEOFON|PRIMARY|netdc@gfz-potsdam.de|GFZ, Telegrafenberg A6, D-14473, Potsdam, Germany|Winfried Hanka|49-331-288-1213|hanka@gfz-potsdam.de|10240|2A
GE|IRIS_DMC|SECONDARY|netdc@dmc.iris.washington.edu|IRIS Data Management Center|1408 NE 45th St., Ste. 201, Seattle, WA 98105|Tim Ahern|(206) 547-0393 |tim@iris.washington.edu|4096|SEP02
IU|IRIS_DMC|PRIMARY|netdc@dmc.iris.washington.edu|IRIS Data Management Center|1408 NE 45th St., Ste. 201, Seattle, WA 98105|Tim Ahern|(206) 547-0393|tim@iris.washington.edu|4096|1.3
II|IRIS_DMC|PRIMARY|netdc@dmc.iris.washington.edu|IRIS Data Management Center|1408 NE 45th St., Ste. 201, Seattle, WA 98105|Tim Ahern|(206) 547-0393|tim@iris.washington.edu|4096|1.3

The format of these entries can be described as follows:

<NETWORK>|<DC_NAME>|<PRIORITY>|<EMAIL>|<DATA_CENTER_NAME>|<ADDRESS>|<CONTACT>|<PHONE>|<CONTACT_EMAIL>|<PEAK_MERGE_KB>|<VERSION>

When deciding where a given data request line should be routed, NetDC looks at the network code in the request line and compares it to the network codes in the routing table. If there is a match, the line is added to a request file destined for the site that will process that request line. If a specific data center was listed in the DATA_CENTER field, then that center is used as the destination for the request line.

However, it gets more specific than that. You’ll notice that there are sites listed as PRIMARY and one example of a line listed as SECONDARY. These entries indicate the priority that a data center holds for providing that network’s data. In the example above, GEOFON is the primary provider of GE data and will take precedence in processing requests for GE data over secondary sites, such as IRIS_DMC.

For each network there is at most one PRIMARY site. Should several sites be listed as providing data for a given network, the PRIMARY site is always selected when it is listed. If no PRIMARY site is listed for that network, then NetDC scans for the first SECONDARY site in the routing table and uses that site instead. There can potentially be several SECONDARY sites, but there should never be more than one PRIMARY site for a given network.
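
A rough sketch of this lookup logic is shown below, assuming the '|'-separated table format described above. It ignores wildcards and the DATA_CENTER override for brevity, and the function and file names are hypothetical rather than taken from the NetDC source.

#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 11

/* Split one '|'-separated routing table line into its fields.
   Returns the number of fields found.  The line is modified in place. */
static int split_route_line(char *line, char *fields[], int max)
{
    int n = 0;
    char *p = strtok(line, "|\n");
    while (p != NULL && n < max) {
        fields[n++] = p;
        p = strtok(NULL, "|\n");
    }
    return n;
}

/* Scan the routing table for the given network code, preferring a
   PRIMARY entry and falling back to the first SECONDARY entry.
   On success the data center code name is copied into dc_name. */
static int route_network(const char *table_path, const char *network,
                         char *dc_name, size_t len)
{
    FILE *fp = fopen(table_path, "r");
    char line[1024], secondary[64] = "";

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof(line), fp) != NULL) {
        char *fields[MAX_FIELDS];
        if (split_route_line(line, fields, MAX_FIELDS) < 3)
            continue;
        if (strcmp(fields[0], network) != 0)
            continue;
        if (strcmp(fields[2], "PRIMARY") == 0) {
            snprintf(dc_name, len, "%s", fields[1]);
            fclose(fp);
            return 0;                       /* primary always wins */
        }
        if (secondary[0] == '\0')           /* remember the first secondary seen */
            snprintf(secondary, sizeof(secondary), "%s", fields[1]);
    }
    fclose(fp);
    if (secondary[0] != '\0') {
        snprintf(dc_name, len, "%s", secondary);
        return 0;
    }
    return -1;                              /* no provider found */
}

int main(void)
{
    char dc[64];
    /* "netdc.routes" is a hypothetical path to the local routing table. */
    if (route_network("netdc.routes", "GE", dc, sizeof(dc)) == 0)
        printf("GE data routed to %s\n", dc);
    return 0;
}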

The separate request files that are delegated to other sites are created in the user’s request directory and take the form of:

netdc_out.<DC_NAME>

such as:

netdc_out.GEOSCOPE

which would be a request file destined for the GEOSCOPE site.

The exception is request lines that are to be processed locally. Their files take on generic names such as “inventory.request” and “data.request” and contain just the request lines without a header. These files are read directly by the local NetDC process after routing to other sites has completed.

As these requests are sent out to the delegate sites, a checklist file is made, indicating which data centers have been delegated tasks along with the current status of request processing. The file will include email addresses from the routing table and look something like this:

GEOFON	netdc@gfz-potsdam.de	DATA PENDING
GEOFON	netdc@gfz-potsdam.de	RESP COMPLETE
GEOSCOPE	netdc@ipgp.fr	DATA COMPLETE
GEOSCOPE	netdc@ipgp.fr	INV COMPLETE
IRIS_DMC	netdc@iris.washington.edu	DATA PENDING

Items marked as PENDING represent responses still being waited on. Items marked as COMPLETE have returned from processing and are ready to be merged with other data before shipment (assuming the requester asked to merge the data). If the user requested that the data not be merged, or certain conditions arise where merging is not possible, this field will read NOMERGE. As request processing and data merging proceed, these values are updated automatically by NetDC.
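
As an illustration of how this file might be consulted, the hypothetical sketch below scans a check.list in the tab-separated layout shown above and reports whether any entry of a given data type is still PENDING. It is not the actual NetDC code; the function name and file path are inventions.

#include <stdio.h>
#include <string.h>

/* Return 1 if no entry of the given data type in check.list is still
   PENDING (i.e. everything is COMPLETE or NOMERGE), 0 otherwise, and
   -1 if the file cannot be read.  Assumes the layout shown above:
   <DC_NAME> <EMAIL> <TYPE> <STATUS>, separated by white space. */
static int merge_ready(const char *checklist_path, const char *data_type)
{
    FILE *fp = fopen(checklist_path, "r");
    char line[512];
    int ready = 1;

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof(line), fp) != NULL) {
        char dc[64], email[128], type[16], status[16];
        if (sscanf(line, "%63s %127s %15s %15s", dc, email, type, status) != 4)
            continue;
        if (strcmp(type, data_type) == 0 && strcmp(status, "PENDING") == 0) {
            ready = 0;      /* still waiting on a delegate shipment */
            break;
        }
    }
    fclose(fp);
    return ready;
}

int main(void)
{
    /* Path is hypothetical; the real file lives in the user's request directory. */
    if (merge_ready("check.list", "DATA") == 1)
        printf("all DATA products accounted for; ready to merge and ship\n");
    return 0;
}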

4.0 NETDC DATAGRAMS

For the NetDC system to function properly, data centers have to be able to exchange state information and request data between each other in an automated fashion. This can be achieved by establishing a protocol of formatted text that NetDC programs on either end of a connection can read and act on.

This protocol should allow for a message to identify itself as a formatted action item, indicate what data type the message relates to, and specify what kind of action or signal is to be acted on relating to that data type. In addition, it should have the capacity to encapsulate text data should larger volumes of information need to be interchanged.

This requirement is satisfied in the form of NetDC datagrams. This is a simple email message format that NetDC can read and act on to perform a variety of tasks. Sometimes the message triggers an action and sometimes data is exchanged. Datagrams are meant to be strictly software generated. User interfacing through datagrams may be performed under special circumstances but care must be taken to avoid errors when doing so.

A datagram first identifies itself through the subject line of the email message:

Subject: NETDC DATAGRAM

Following this, the body of the message will start with an action tag which triggers what is requested of the NetDC system. The template for the action tag is:

%%ACTION <CLASS>::<METHOD>

where CLASS indicates the type of data the message refers to and METHOD names the action or trigger to be applied to that type of data. The CLASS takes names such as DATA, INV, or RESP. The METHOD takes directives such as RCVOK, LOCALDONE, and RESEND.

Below is a sample datagram that is sent locally to notify the NetDC system that local processing of a DATA request is completed:

Subject: NETDC DATAGRAM
%%ACTION DATA::LOCALDONE
.HUB_ID NCEDC:Feb_25,22:07:30:17501
.NAME Bill_Mantle
.EMAIL bmantle@upthrust.seismolab.edu
.LABEL Sample_01
.DISPOSITION_USER
.DISPOSITION_HUB
.FILENAME /usr/local/processing/netdc/local_data.Feb_25.001
.END

What follows the ACTION line is a list of parameters that relate to the CLASS and METHOD of the message. For a given CLASS and METHOD, the receiving site will expect to see certain parameters in order to have enough information to carry out the desired action effectively. As you can see, the above example is telling NetDC that a particular user, Bill Mantle, has a file waiting for pickup called local_data.Feb_25.001 in a local directory for the request identified by the HUB_ID. NetDC will grab the listed file, place it in Bill Mantle’s request directory, and then mark the data as COMPLETE in the request checklist file. This is merely one example of datagram functionality in NetDC. Other datagrams will be illustrated throughout this manual.
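
For illustration, the action tag itself is simple to parse. The following hypothetical sketch extracts the CLASS and METHOD from a line of the form shown above; it is not taken from the NetDC source, and trailing white space is assumed to have been stripped already.

#include <stdio.h>
#include <string.h>

/* Parse an action tag of the form "%%ACTION <CLASS>::<METHOD>".
   Returns 0 on success, -1 if the line is not a valid action tag. */
static int parse_action(const char *line, char *class_buf, size_t class_len,
                        char *method_buf, size_t method_len)
{
    const char *sep;

    if (strncmp(line, "%%ACTION ", 9) != 0)
        return -1;
    line += 9;
    sep = strstr(line, "::");
    if (sep == NULL)
        return -1;
    snprintf(class_buf, class_len, "%.*s", (int)(sep - line), line);
    snprintf(method_buf, method_len, "%s", sep + 2);
    return 0;
}

int main(void)
{
    char cls[16], method[16];
    if (parse_action("%%ACTION DATA::LOCALDONE", cls, sizeof(cls),
                     method, sizeof(method)) == 0)
        printf("class=%s method=%s\n", cls, method);   /* class=DATA method=LOCALDONE */
    return 0;
}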

For a complete specification of currently implemented datagrams, please consult Appendix A.

5.0 LOCAL REQUEST PROCESSING

After request routing is finished, each data site turns to its own local processing assignment. If a local file such as “response.request” or “data.request” is found in the user’s request directory, then NetDC will call the appropriate functions for handling each type of request. Because response data tend to be processed in a different fashion than waveform data, and likewise for other data types, NetDC maintains separate functions to handle each data type.

The NetDC code organizes a series of function branches to implement this concept. Inventory data is handled by local_inventory() in a source file called “local_inv.c”. Response data is handled by local_response() in a source file called “local_resp.c”. This naming convention is generally adhered to for functions that are called and remain running until data is returned. These functions are referred to as “persistent” and work well for data queries that take little time to process.

Waveform data tends to take more time to be assembled and can face delays of hours or even days before data is returned. This requires a slightly different calling structure where we have two functions to process a data request. One function is used for submission of the data request to the data center’s processing system. The other is used to deliver data back to the NetDC system once the data are ready. For waveform requests, the functions are kept in the source file “local_data.c” and are named “local_data_submit()” and “local_data_receive()” respectively.

These “local” functions are considered fixed parts of the NetDC code, and it should not normally be necessary for anyone installing NetDC to modify them. If changes to the local functions do prove necessary, they should be kept to a minimum, since leaving the code as-is makes future software updates easier to apply.

Let us examine the flow of request processing through these local functions (Fig 5.1). A local function is called from the main routine and the request file for that data type is passed to it. From there, parameters are added to the request before it is forwarded to the interface routine.

Fig. 5.1
Figure 5.1: Local request processing flow

For example, a response request would require that “local_response()” be called with the file “response.request” listed as the file to read. “response.request” would contain just the .RESP lines that the local site is to process. “local_response()” takes those lines from the file and adds a header to the beginning, which provides information about the user requesting the response data. All of this is passed on to the interface routine for processing a response request by opening a pipe to it.

The interface routine is an independent executable, called by the local function, that is specifically designed for the installation site. This code needs to be developed at the time of installation in order to interface with the data center’s unique data processing methodology. The convention is to name the executables in the style of process_inv, process_resp, and process_data. These routines are generally called with no command line arguments; instead they take their parameters through the standard input stream (stdin). The requested data gets back to the calling local function through a file that the interface routine creates. The filename for the returned data is determined by the local function and is passed to the interface routine before processing begins.

Looking at the example of response request processing again, we would find “local_response()” running “process_resp” by opening a pipe and passing request information and the request lines through that pipe stream. “process_resp”, reading the stream lines one at a time, pulls out the necessary request information and then comes to the request lines. For each request line, “process_resp” will invoke the local processing system to get the response data. “process_resp” will be coded to know exactly how to request this data and how to filter the output into a standard format. This return data will go to a filename that was designated by “local_response()”. This way, when “process_resp” exits, “local_response()” will know exactly where to find the output.
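
The sketch below illustrates this calling pattern in C, using popen() to feed a header and the request lines to an interface routine over a pipe. The command name, header parameter values, and file names shown are assumptions for illustration only; they are not the actual NetDC implementation.

#include <stdio.h>

/* Sketch: pipe a request header and the contents of a request file
   into an interface routine such as process_resp, then wait for it
   to exit.  Header values, paths, and names are illustrative only. */
static int run_interface(const char *interface_cmd, const char *request_file,
                         const char *output_file)
{
    FILE *pipe_fp, *req_fp;
    char line[1024];

    pipe_fp = popen(interface_cmd, "w");    /* e.g. "./process_resp" */
    if (pipe_fp == NULL)
        return -1;

    /* Header block: tells the interface routine who the request is for
       and where to write its output. */
    fprintf(pipe_fp, ".NAME Joe_Seismologist\n");
    fprintf(pipe_fp, ".HUB_ID NCEDC:Jul_04,02:00:06:4435\n");
    fprintf(pipe_fp, ".FILENAME %s\n", output_file);
    fprintf(pipe_fp, ".END\n");

    /* Followed by the request lines themselves. */
    req_fp = fopen(request_file, "r");
    if (req_fp != NULL) {
        while (fgets(line, sizeof(line), req_fp) != NULL)
            fputs(line, pipe_fp);
        fclose(req_fp);
    }
    return pclose(pipe_fp);    /* returns the interface routine's exit status */
}

int main(void)
{
    /* Command and file names are hypothetical. */
    return run_interface("./process_resp", "response.request",
                         "local_resp.Feb_25.001");
}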

Upon return to the local function, the output data file is read and the contents are transferred to a specially named file in the user’s request directory. The data product is checked off in the checklist file and if that product type is ready for shipment (i.e. there is no other data to merge), the shipment routine is called and the product is sent to the user.

This description of local request processing sums up the general aspects of how NetDC handles these data requests. Each data type will have some degree of variation in how the data query is carried out. The responsibility lies with the site installer to determine what is required to interface NetDC with the local site’s information base. This should not require a great deal of modification to the NetDC code. Instead, the calling structure has been organized such that the installer really only needs to modify or create the interface routine that accesses and filters the data the user is requesting.

6.0 INVENTORY REQUESTS

Inventory requests (.INV) ask for information regarding the data holdings of a particular data center. The information can generally be extracted from a database but the source of data may also be a flat file index or a filesystem directory tree. Inventory data is provided in a standard ASCII format and can generally be forwarded to the requesting user directly through email.

Local processing begins with the “inventory.request” file being fed to the function local_inventory(), where the data is queried and returned. The request file is laid out as a series of request lines, like the example below:

.INV *
.INV GEOFON *
.INV GEOFON AA B*
.INV GEOFON AA TATO * *
.INV * IU ANMO * * "1995 03 03 02 24 01.3" "1995 03 03 07 00 30.0"
.INV * II KIV * "BHE BHN BHZ" "1996 05 01 00 00 00.0" "1996 05 01 05 00 00.0"

Looking at these sample inventory requests, you’ll notice that some of the request lines have fewer than the total number of fields filled in. Some omit the start and end time, some exclude the channel and location, and still others leave out the station and network. What this illustrates is the inventory request’s unique ability to accept a variable number of field entries, which determines how much information is returned. More will be explained in the output examples below.

process_inv, the inventory interface routine, is called from local_inventory() supplying the following parameters, followed by the contents of the file “inventory.request”:

.NAME	(the name of the requesting user)
.EMAIL	(the email address of the requesting user)
.HUB_ID	(the assigned ID tag of the netdc_request)
.LABEL	(the label of the netdc_request)
.DISPOSITION_USER	(how to deliver output to the user)
.DISPOSITION_HUB	(how to deliver output to the hub data center)
.FILENAME	(the filename where the output will be placed)

This header information allows NetDC to know how to deliver the output once it has been constructed. process_inv must take each request line and form a query to the local information base. For each query, it should extract the results and filter the information into the output format shown below:

<REQUEST LINE 1>
[<LIST CONTENT DESCRIPTION 1>]
[<HEADER LINE 1>]
<line 1>
<line 2>
< .... >
[<LIST CONTENT DESCRIPTION 2>]
[<HEADER LINE 2>]
<line 1>
<line 2>
< .... >
[...]
<REQUEST LINE 2>
[<LIST CONTENT DESCRIPTION 1>]
[<HEADER LINE 1>]
<line 1>
<line 2>
< .... >
...etc...

The REQUEST LINE displays the inventory request line being processed. This helps associate the inventory output with the query that triggered it. The LIST CONTENT DESCRIPTION indicates what kind of list follows. This description can be one of the following:

  • AVAILABLE DATA CENTERS
  • AVAILABLE SEISMIC NETWORKS
  • AVAILABLE STATIONS
  • AVAILABLE LOCATIONS
  • AVAILABLE CHANNELS
  • AVAILABLE WAVEFORM DATA

Following this is the HEADER LINE, which can vary depending on the content listing. This is a standard text line that gives the user a quick reference to what the fields stand for, much like a table header.

Each of the lines afterward contain the records that fit the query criteria. There can be zero lines, one line, or many lines of output, depending on the query and the matching elements.

For each request line there can be many of these “content blocks” consisting of description, header, and data lines. The arrangement of these blocks is recursive in nature, meaning that a single category is exhaustively described down through all of its subcategories before going to the next category. In other words, when printing information on a network, the next block of information will be on the first station of the network and the block after will be on the first location or channel of that station, and so forth. This arrangement helps link subcategories to their parent categories. A given channel is unmistakably part of the previously listed station, which is a part of the previously listed network, etc.

The following are examples showing what output a user could expect to see from various inventory requests. Carefully note the format of each request, since inventory requests can be made with a variable number of fields, depending on how much information is desired.

In this first example, a query for available data centers comes back with a report whose lines are pulled directly out of the local routing table. The query is made by placing a wildcard in the DATA_CENTER field, with no other request fields specified. In the output, the fields are contained within double-quotes and space-separated. The field names are indicated just below the title.

.INV *
[AVAILABLE DATA CENTERS]
[NETCODE DC_NAME PRIORITY EMAIL INST_NAME ADDRESS CONTACT PHONE CONTACT_EMAIL PEAK_MERGE_KB VERSION]
"GE" "GEOFON" "PRIMARY" "netdc@gfz-potsdam.de" "GFZ Potsdam" "GFZ, Telegrafenberg A6, D-14473, Potsdam, Germany" "Winfried Hanka" "49-331-288-1213" "hanka@gfz-potsdam.de" "2048" "1.4"
"G" "GEOSCOPE" "PRIMARY" "netdc@ipgp.jussieu.fr" "Institut de Physique du globe de Paris" "Department de Sismologie, Programme GEOSCOPE, 4, Place Jussieu, 75252 Paris cedex 05, France" "Genvieve Roult" "33 1 44274888" "groult@ipgp.jussieu.fr" "1024" "2.0"
"IU" "IRIS_DMC" "PRIMARY" "netdc@iris.washington.edu" "IRIS Data Management Center" "1408 NE 45th St., Ste. 201, Seattle, WA 98105" "Tim Ahern" "(206) 547-0393" "tim@iris.washington.edu" "500" "1.6"
"BK" "NCEDC" "PRIMARY" "netdc@quake.seismo.berkeley.edu" "Northern California Earthquake Data Center" "Seismographic Station, Earth Sci. Bldg., Univ. of California, Berkeley, CA 94720" "Douglas Neuhauser" "(510) 642-0931" "doug@seismo.berkeley.edu" "400" "3.8"

This next example shows a list of available networks. This occurs when the network field is filled with a wildcard or requested network code. In this case, the database at the PRIMARY site provides the data. Information on the data center from the routing table is not included in such a query.

.INV * IU
[AVAILABLE NETWORKS]
[NET_CODE NETWORK_NAME OPERATORS COMMENTS]
"IU" "IRIS/USGS" "Albuquerque Seismic Laboratory" ""

A list of available stations will show the station name and related information, including the effective time. Note that more than one entry can exist for a given station in the case of having more than one effective time, as illustrated in the example below (values may be hypothetical). It is conventional to represent an open-ended end time with a very large value, such as 2500,365,23:59:59.9999.

.INV GEOSCOPE G *
[AVAILABLE NETWORKS]
[NET_CODE NETWORK_NAME OPERATORS COMMENTS]
"G" "GEOSCOPE" "IPGP" ""
[AVAILABLE STATIONS]
[STATION LATITUDE LONGITUDE ELEVATION DESCRIPTION START_EFF_TIME END_EFF_TIME]
"AGD" "11.529" "42.824" "450.0" "Arta Grotte, Djbouti" "1985,68,00:00:00.0000" "1990,343,00:00:00.0000"
"AGD" "11.514" "42.821" "450.0" "Arta Grotte, Djbouti" "1990,347,00:00:00.0000" "2500,365,23:59:59.9999"
"BNG" "4.435" "18.547" "378.0" "Bangui, Republique Centrafricaine" "1987,345,00:00:00.0000" "2500,365,23:59:59.9999"
"CAY" "4.948" "-52.317" "25.0" "Cayenne, French Guyana" "1985,203,00:00:00.0000" "1991,272,00:00:00.0000"

It is important to note that locations and channels are provided on the same line of output. Location identifiers are closely tied to channel names in establishing uniqueness, so it makes sense to include both on the same output line. A user wishing to see channel information needs to fill in both the location identifier and the channel field in the query, as shown in the example below. Also note the use of two channel strings contained in double-quotes in the query.

.INV GEOSCOPE G * * "MH? LH?"
[AVAILABLE NETWORKS]
[NET_CODE NETWORK_NAME OPERATORS COMMENTS]
"G" "GEOSCOPE" "IPGP" ""
[AVAILABLE STATIONS]
[STATION LATITUDE LONGITUDE ELEVATION DESCRIPTION START_EFF_TIME END_EFF_TIME]
"AGD" "11.529" "42.824" "450.0" "Arta Grotte, Djbouti" "1985,068,00:00:00.0000" "1990,343,00:00:00.0000"
[AVAILABLE CHANNELS]
[LOCATION CHANNEL LATITUDE LONGITUDE ELEVATION DEPTH AZIMUTH DIP SAMPLE_RATE CHANNEL_TYPE INSTRUMENT_TYPE START_EFF_TIME END_EFF_TIME]
" " "MHE" "11.529" "42.824" "450.0" "0" "90" "0" "5" "CG" "Streckeisen STS-1" "1985,068,00:00:00.0000" "1990,343,00:00:00.0000"
" " "MHN" "11.529" "42.824" "450.0" "0" "0" "0" "5" "CG" "Streckeisen STS-1" "1985,068,00:00:00.0000" "1990,343,00:00:00.0000"
" " "MHZ" "11.529" "42.824" "450.0" "0" "0" "-90" "5" "CG" "Streckeisen STS-1" "1985,068,00:00:00.0000" "1990,343,00:00:00.0000"
[AVAILABLE STATIONS]
[STATION LATITUDE LONGITUDE ELEVATION DESCRIPTION START_EFF_TIME END_EFF_TIME]
"BNG" "4.435" "18.547" "378.0" "Bangui, Republique Centrafricaine" "1987,345,00:00:00.0000" "2500,365,23:59:59.9999"
[AVAILABLE CHANNELS]
[LOCATION CHANNEL LATITUDE LONGITUDE ELEVATION DEPTH AZIMUTH DIP SAMPLE_RATE CHANNEL_TYPE INSTRUMENT_TYPE START_EFF_TIME END_EFF_TIME]
" " "LHE" "4.435" "18.547" "378.0" "0" "90" "0" "1" "CG" "Streckeisen STS-1" "1987,345,00:00:00.0000" "2500,365,23:59:59.9999"
" " "LHN" "4.435" "18.547" "378.0" "0" "0" "0" "1" "CG" "Streckeisen STS-1" "1987,345,00:00:00.0000" "2500,365,23:59:59.9999"
" " "LHZ" "4.435" "18.547" "378.0" "0" "0" "-90" "1" "CG" "Streckeisen STS-1" "1987,345,00:00:00.0000" "2500,365,23:59:59.9999"

The list of available waveform data will indicate the start and end times of the available data, the number of samples, and the number of bytes. Entering a start and end time not only returns waveform information; it also restricts the output to the stations and channels whose effective times fall within the time window.

.INV IRIS_DMC CD WMQ * BHZ "1990 01 16 00 00 00" "1990 11 01 00 00 00"
...(network, station, and channel information)...
[AVAILABLE WAVEFORM DATA]
[START_TIME END_TIME NUMBER_SAMPLES NUMBER_BYTES]
"1990,135,16:34:51.2900" "1990,135,16:39:48.8900" "5952" "12288"
"1990,222,21:19:35.8700" "1990,222,21:24:33.4700" "5952" "12288"
"1990,255,15:35:23.0700" "1990,255,15:40:20.6700" "5952" "12288"

The time strings in the above examples are in a standard format of year, Julian day, and time of day down to the ten-thousandth of a second. Precision at least down to the second is recommended. The format can be described in C-code syntax:

"%04d,%03d,%02d:%02d:%07.4f",(int)year,(int)jday,(int)hour,(int)min,(float)sec

Inventory data will typically be emailed back to the user. In the case of merging inventory data from different sites, the data will be concatenated with an identification header at the head of each data center’s output. An identification header is always included with an inventory shipment and looks like this example:

***Inventory Shipment***
                   From: GEOFON
         For request ID: GEOFON:Feb_12,01:27:30:1874
Originally Requested by: Joe_Seismologist (jseis@quake.institute.edu)
                     of: The Quake Institute
          Request Label: netdc_data_gather_1

If an inventory shipment is too large to be mailed to the user (this limit is set by the data center), it is instead diverted to FTP transfer. The user is notified of the shipment through email, but the data is made available through FTP.

7.0 RESPONSE REQUESTS

Response requests (.RESP) return detailed station and channel information to the user in the form of a specially formatted ASCII text file. When a data center is delegated to process .RESP requests, a file named “response.request” is created. The output for a response request is pulled from response files generated by the IRIS utility rdseed or similar utility.

An excerpt of .RESP text output is presented here for illustration:

   B050F03 	Station: 	SAO
   B050F16 	Network: 	BK
   B052F03 	Location: 	??
   B052F04 	Channel: 	BHE
   B052F22 	Start date:    	1996,143,21:30:00
   B052F23 	End date: 	1998,155,21:43:00
   #
   #		+------------------------------------+
   # 		| Response (Poles    and Zeros)      |
   # 		| BK SAO BHE 			     |
   # 		| 05/22/1996 to 06/04/1998   	     |
   # 		+------------------------------------+
   #
   B053F03 	Transfer function type: 	A [Laplace Transform (Rad/sec)]
   B053F04 	Stage sequence number: 		1
   B053F05 	Response in units lookup: 	M/S - Velocity in Meters Per Second
   B053F06 	Response out units lookup: 	V - Volts
   B053F07 	A0 normalization factor: 	986.834
   B053F08 	Normalization frequency: 	0.12
   B053F09 	Number of zeroes:    		2
   B053F14 	Number of poles:    		4
   # 		Complex zeroes:
   # 		i real imag real_error    imag_error
   B053F10-13 	0 0.000000E+00    0.000000E+00 0.000000E+00 0.000000E+00
   B053F10-13 	1 0.000000E+00    0.000000E+00 0.000000E+00 0.000000E+00
   # 		Complex poles:
   # 		i real imag real_error    imag_error
   B053F15-18 	0 -1.234120E-02    1.234150E-02 0.000000E+00 0.000000E+00
   B053F15-18 	1 -1.234120E-02    -1.234150E-02 0.000000E+00 0.000000E+00
   B053F15-18 	2 -1.958780E+01    2.456170E+01 0.000000E+00 0.000000E+00
   B053F15-18 	3 -1.958780E+01    -2.456170E+01 0.000000E+00 0.000000E+00
   #
   # 		+------------------------------------+
   # 		| Channel Sensitivity/Gain    	     |
   # 		| BK SAO BHE 			     |
   # 		| 05/22/1996 to 06/04/1998  	     |
   # 		+------------------------------------+
   #
   B058F03 	Stage sequence number: 		1
   B058F04 	Gain: 				2.255790E+03
   B058F05 	Frequency of gain: 		1.200000E-01 HZ
   B058F06 	Number of calibrations:    	0
   #
   # 		+------------------------------------+
   # 		| Response (Poles    and Zeros)      |
   # 		| BK SAO BHE 			     |
   # 		| 05/22/1996 to 06/04/1998   	     |
   # 		+------------------------------------+
   #
   B054F03 	Transfer function type: 	D
   B054F04 	Stage sequence number: 		2
   B054F05 	Response in units lookup: 	V - Volts
   B054F06 	Response out units lookup: 	COUNTS - Digital Counts
   B054F07 	Number of numerators:    	0
   B054F10 	Number of denominators:    	0
   #
   # 		+------------------------------------+
   # 		| Decimation 			     |
   # 		| BK SAO BHE 			     |
   # 		| 05/22/1996 to 06/04/1998 	     |
   # 		+------------------------------------+
   #
   B057F03 	Stage sequence number: 		2
   B057F04 	Input sample rate: 		5.120000E+03
   B057F05 	Decimation factor:   		1
   B057F06 	Decimation offset:    		0
   B057F07 	Estimated delay (seconds): 	6.152000E-03
   B057F08 	Correction applied (seconds): 	-6.250000E-03
   #
   # 		+------------------------------------+
   # 		| Channel Sensitivity/Gain   	     |
   # 		| BK SAO BHE 			     |
   # 		| 05/22/1996 to 06/04/1998  	     |
   # 		+------------------------------------+
   #
   B058F03 	Stage sequence number: 		2
   B058F04 	Gain: 				4.270340E+05
   B058F05 	Frequency of gain: 		1.200000E-01 HZ
   B058F06 	Number of calibrations:    	0

…(and so on)…

In terms of requesting .RESP data, the user must fill out all of the request fields, including the start and end time. The flexibility found in inventory requests does not carry over to the .RESP request type. By supplying the necessary information, you can expect to receive .RESP output that covers the specified stations and channels that have effective times within the start and end time request fields.

Because the output is ASCII text, many of the response shipments will be sent to the user through email. If the response output is especially large, the user will receive the data through FTP, accompanied by notification of the shipment through email.

8.0 WAVEFORM REQUESTS

Waveform data (.DATA) will represent the largest type of request processing in the NetDC system. The files created will generally be quite large, and processing can take hours or days, depending on the scope of the user request and the speed of each data center’s waveform processing system. Despite all this, waveform data represents the core piece of information for users interested in seismic data, and NetDC has been designed around having such requests fulfilled.

Unlike inventory requests, waveform requests cannot be made with a variable number of fields filled in. All fields must be included in a “.DATA” request line, including the start and end time, in order for the request line to be accepted.

A sample data request line is shown here for illustration:

.DATA * II BFO * BHZ "1998 02 03 04 20 34" "1998 03 04 20 20 00"

A request for waveform data will take considerably longer to process than requests for inventory and response data. If the user has mixed inventory or response request lines with waveform request lines, the inventory and response data will generally be returned sooner than the waveform data. NetDC cannot control or estimate how long processing for a request will take.

The default format for waveform data is binary SEED. Because it is binary data, it is not feasible to email the shipment to the user (not everyone can read mail attachments), so the default form of shipment is “FTP”. Another issue is the merging of data. SEED products produced at different data centers are collected at the hub site and need to be combined into a single output file. However, it is difficult to separate the individual SEED volumes from a single file when they have simply been appended to one another. The solution currently implemented to get around this problem is to ship the waveform data as a UNIX “tar” archive, which consists of the individual SEED volumes produced at each data center.

Fig. 8.1
Figure 8.1: With three separate SEED volumes, it is preferable to package them into a “tar” file rather than append the three volumes together

If any information is needed on the particulars of SEED volumes, please contact the IRIS Data Management Center or consult our web site at http://www.iris.edu/dms/nodes/dmc.

9.0 PRODUCT SHIPMENT

NetDC has the important goal of ensuring that data processed at various sites finds its way back to the user in an automated fashion. In creating an interface where any NetDC site can be contacted to request data, NetDC has the responsibility to ensure that the user gets the requested data regardless of where the original request was directed. This necessitates effective request and product tracking and requires reliable data exchange mechanisms.

Networked Data Centers gives the user a choice between receiving data from individual data centers and receiving the data as a single merged product from the data center originally contacted. The former simply gives permission to the delegated data centers to forward their data products directly to the user. The latter requires automated information flow back to a single data center, as well as request tracking and data holding procedures.

At a given data center processing a request, a directory is created that becomes the home for tracking and storing all data produced. Requests are forwarded from here to other sites and temporary files related to the request are kept here as well. This directory is referred to as the user’s “request directory”. A request directory is created at both the hub site and the delegate sites for a given request.

Fig. 9.1
Figure 9.1: Layout of user request directory tree

As data products are completed, they are returned to the user directory and given a predictable file naming structure for easy organization. The file name pattern is this:

<DATA_TYPE>.<HUB_ID>.<DC_NAME>

where the DATA_TYPE is the kind of data in the file, whether it’s inventory, response, or waveform data. The HUB_ID is the tag that is always present for any of the data products and remains static throughout the life of the networked request. This unique tag allows NetDC to track the progress of a request throughout the network as processing takes place. Every data center uses the same HUB_ID for a given request. The DC_NAME tag at the end lists the data center code name where the data product is produced. This helps to differentiate data produced at one center from that produced at another. Both products will share the same HUB_ID but they will have their own unique DC_NAME. Examples are shown here:

DATA.GEOFON:Mar_09,00:19:14:5975.GEOSCOPE
INV.IRIS_DMC:Apr_04,10:45:56:1123.ORFEUS
RESP.ORFEUS:Jul_23,13:45:32:12922.ORFEUS

Data files with the above-mentioned naming structure will begin to appear in the user’s request directory. As these data files arrive, their arrival is noted in the “check.list” file and the corresponding status entries are updated.

If the user requested that the data be merged, the checklist will have entries with status PENDING. This means that NetDC is waiting for the data product to come back. Once the data has arrived, the status is changed to COMPLETE. Normally, the role of the hub site is to monitor incoming data from delegate sites, marking off arrivals as complete and waiting for all data to come in. When all delegate data products have been collected, the data products are combined together into one file and shipped to the user.

In the cases where the data is not to be merged, the “check.list” file will instead have all entries with NOMERGE status so that all delegate sites know immediately that they are to send their data directly to the user.

If too much data is produced for merging, the hub site has the option of refusing further deliveries; instead, it instructs the delegate site to send the shipment directly to the user and marks the entry in “check.list” as NOMERGE. Products in the “check.list” file that are marked NOMERGE are not waited on when the hub site assembles a merged shipment. Should some products be COMPLETE while others are marked NOMERGE for a given data type, only the ones marked COMPLETE are included in the merged shipment. The others are assumed to be shipped to the user by the delegate site.

When it comes to shipping data products, each data type is treated as a separate entity. Inventory shipments are shipped on a schedule independent of waveform shipments and response shipments. If response data becomes available before waveform data, the user can be assured that the response data will be shipped without delay. Also of note: data of different types are never merged together. You won’t find response data mixed with inventory data. They are kept separate, and only data of the same type is merged in a shipment. Should a user request response and waveform data, they will receive two separate shipment volumes, not one.

Upon shipment to the user, the data file, whether a merged data file or not, is renamed in the FTP directory to match the LABEL that the user specified in the request. The filename pattern in this case is:

<LABEL>.<DC_NAME>.<PID>
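
For example, if the sample request from Chapter 2.0 (label My_Request) were shipped from a hub running at IRIS_DMC, the shipment file might be named something like the following, where the trailing process ID is arbitrary:

My_Request.IRIS_DMC.4435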

If a label was not provided by the user, the NetDC system generates a random tag and uses that as a label for the shipment. The labeling feature is meant as a convenience to the user receiving the shipment, since it results in a less cryptic filename than what is used internally by NetDC and allows the user to impose a personalized form of organization on the data they collect. Once the user’s requested data has been placed in the FTP directory, the contents are either sent to the user through email or the user is notified that the data can be retrieved. If so requested, NetDC can even push the data to the user’s FTP site.

Once all data products have been shipped from a user’s request directory, a flag file “SHIPPED” is created so that the NetDC system will later remove the directory. Request directories, for the most part, will remain cleaned up and not accumulate on the file system.

Over time, shipment files will accumulate in the FTP shipment directory as NetDC fulfills requests. It is a good idea to institute a separate cleanup procedure for this directory, removing files as they age past a certain number of days. More will be said on this issue later in the manual.

Datagrams make it possible to carry out product merging between a delegate site and a hub site. A simple protocol is followed to help the transaction complete reliably. First, when a shipment is ready to be sent to a hub site, the delegate site sends a message with the action word SHIPRDY. This tells the hub site that a delegate wishes to send it some data belonging to a certain user’s request. This datagram looks something like this:

%%ACTION DATA::SHIPRDY
.HUB_ID GEOSCOPE:Feb_23,22:54:23:1176
.NAME Joe Seismologist
.EMAIL joe@host.seismolab.edu
.DELEGATE ORFEUS
.SHIPTO netdc@ipgp.fr
.REPLYTO netdc@knmi.nl
.SIZE 23 KB
.MEDIA FTP
.DISPOSITION_USER PULL
.DISPOSITION_HUB PUSH ftp.ipgp.jussieu.fr /pub/netdc
.FILENAME joe_request_1.ORFEUS.3498
.LABEL joe_request_1
.END

The receiving site will take this information and decide either to send back a RCVRDY datagram, which tells the delegate that it’s okay to send the data, or to send a NOMERGE datagram, which tells the delegate site to send the data directly to the user. The NOMERGE override will usually occur because the product being sent by the delegate is too large for the hub site to accommodate. If the hub site sends the RCVRDY notice, the datagram will contain all the same parameters as the SHIPRDY datagram.

Once the delegate site receives the RCVRDY datagram, it initiates the file transfer, either through email or through FTP, depending on how the parameters were set. In both cases a SHIPMENT datagram is sent. Data being sent by email will be attached to the SHIPMENT datagram. The data is extracted by the hub site’s NetDC routines from the email message and written to the appropriate request directory. If the data is being sent by FTP, the delegate site actually pushes the data to the hub site’s anonymous FTP directory, based on the instructions provided in the DISPOSITION_HUB parameter and using an automated FTP client. Once the data has been placed at the hub site, the delegate sends a SHIPMENT datagram telling the hub site to grab the data file.

Once the hub site has confirmed that it has received the data, it will send a datagram back with the message RCVOK, allowing the delegate site to free itself from that request, run cleanup routines, and so forth. Should the data not arrive at the hub site intact, the hub sends a RESEND message to the delegate site to have it try again. Only one RESEND will be attempted before failure: after a second failed attempt to merge, the hub site will instruct the delegate to switch to NOMERGE and send the data directly to the user. This is intended as an emergency fallback should the transfer of data for merging fail.

Fig. 9.2
Figure 9.2: Diagram of merge shipment protocol between two sites.

The user receiving data can also take advantage of some level of automation in the way data is received. The DISPOSITION line in the NetDC request allows the user to request that the data be pushed to his or her site through FTP, as opposed to having the user retrieve the data manually. Note that this only applies to FTP shipments, since email shipments are pushed to the user by default.

As indicated in an earlier chapter, the DISPOSITION line accepts one of two modes. The default mode is PULL, which requires no additional parameters and merely states that the user will retrieve the data manually from the hub site. The other mode is PUSH, which accepts two additional parameters. The first is the name of the FTP host that the user wishes the shipment to be transferred to, and the second is the anonymous FTP directory where the shipment is to go. If either of these parameters is not present or proves untenable, the transfer fails and NetDC falls back to PULL mode, simply notifying the user through email of the shipment and telling them where to pick the product up at the hub site. An example PUSH directive looks like:

.DISPOSITION PUSH ftp.gfz-potsdam.de /pub/dropoff/netdc

Another feature of NetDC is the facility to have the user declare a maximum duration for product merging. This is specified as a number in the MERGE_DATA field of the request, following the YES flag, and indicates the maximum number of days that NetDC will wait for incoming data before sending a data shipment to the user:

.MERGE_DATA YES 3

At times, certain delegate sites will be unable to deliver their data for request merging in a timely fashion. NetDC by default waits for all the delegate data products to be accounted for; should even one site be unable to deliver its data, NetDC could find itself waiting indefinitely. With this timeout feature, however, the user is able to get as much data as possible in a suitable amount of time. The maximum number of timeout days is 90; the default is zero, which implies same-day delivery.

The routine that carries out this timeout feature is called “shipment_handler” and is meant to be run on a daily basis. It runs through all of the current user request directories and checks to see if the request has expired. At the same time, it cleans up any directories that have been flagged as SHIPPED, which means that the request has been satisfied and the products shipped off to the user.

If “shipment_handler” finds that a request is overdue, it bundles up any products that are currently present and sends the merged data products off to the user. The directory is then flagged with an empty SHIPPED file so that it will be cleaned up on the next pass of “shipment_handler”. Delegate sites with overdue shipments that send a SHIPRDY datagram will be notified with a NOMERGE datagram in reply.

10.0 NETDC INSTALLATION AND SETUP

NetDC is a software distribution that contains a number of source code directories, each with its own “Makefile” for building executables. Online documentation is also included. A running NetDC system takes shape as the data center’s administrator builds the system up from the NetDC distribution.

Installation and setup will involve the following steps:

  1. Creation of a NetDC user account with email reception capabilities
  2. Un-tar-ing the NetDC software package in an installation directory
  3. Creating a request processing directory
  4. Setting up an FTP directory for outgoing shipments
  5. Building the software executables
  6. Setting up soft links to executables
  7. Customizing interface code to local system
  8. Installing a “.forward” file for email processing
  9. Testing the NetDC installation

Let us discuss each step in detail:

1. CREATION OF A NetDC USER ACCOUNT

Ask your system administrator to set up a user account called “netdc”. The account must be visible to the machine receiving email for NetDC, and the account must also have a mail spool set up for it. The account will require a home directory and can run any startup shell desired. In order to make installation easy, the “netdc” username should be accessible to whoever installs and maintains the software, through such avenues as “login” or “su”. Either the administrator of the NetDC software knows the password to the account, or that person has root access and can simply drop into the account. In the latter case, the password can be left as an unknown value for security purposes.

2. UN-tar-ING THE NetDC SOFTWARE PACKAGE

As user “netdc”, copy the NetDC tar distribution file to a write-capable directory (like the “netdc” home directory) and type:

tar -xvf <netdc_file>.tar

where <netdc_file> is the base name of the tar file. The tar archive will extract to the local directory only. What is extracted is the set of source code directories from which the executables will be built. The directories you will see there are:

  • date_calc: code for performing date calculations
  • doc: contains online documentation
  • ftpcl: code for the ftp client
  • netdc_req: the main code for NetDC; the majority of the functionality can be found here

There may be other directories present that are provided with enhancements and upgrades to the code. In each of the source code directories there is a file called “Makefile” that needs to be edited before building. This can be done after setting up the processing and FTP directories.

3. CREATING A REQUEST PROCESSING DIRECTORY

Find a write-capable directory to be used for request processing and type:

mkdir requests

Next type this:

mkdir tables

The routing table will be kept there, as well as a log file of processing activity. Next type:

mkdir tmp

This is a directory for writing temporary files. NetDC will do its best to keep the “tmp” and “requests” directories cleaned up, but it pays to check in on them now and then to see if any files need to be removed by hand.

4. MAKE AN FTP DIRECTORY

Set up a directory where shipments will be placed for anonymous FTP access. Make sure this directory is writable by the “netdc” username. Also, it is assumed that your site has its own cleanup methods in place for FTP directories to remove old files. The location of this FTP directory will be specified in the makefile for the main NetDC routines. Fig. 10.1 illustrates a typical anonymous FTP layout:

Fig. 10.1
Figure 10.1: Anonymous FTP directory layout

5. BUILDING THE SOFTWARE

Proceed inside each of the source code directories that came with the distribution and do the following:

a. Edit the makefile, changing just the variable settings so that they apply to your system environment. Take note of the directory settings that you have for your binary links, request processing, and FTP access. Also note that if your system specifications change in the future (new pathnames, new installation, name changes), the makefile needs to be edited and the executables rebuilt. Save these edits and return to the shell.
b. Type: make
c. If you get an “rm” error about not finding a “core” file or “.o” files, just ignore it. This is just noise from “make”.
d. Check to see that there are no compilation errors. If there are, it is possible that you need to do some fine-tuning with your compiler flags, specifying the proper compiler, and setting the proper environment variables for header files and libraries. There may be some system-specific conflicts that have to be rooted out in the source code, but the code was written to minimize this. Be sure that your compiler is ANSI C-compatible, because the code was written in this standard.
e. You can leave the created binaries in their respective directories and have symbolic links point to them from your “bin” directory. The other alternative is to copy all of the executables to the “bin” directory.

6. SET UP BINARY DIRECTORY

In your NetDC home directory, type:

mkdir bin

This will be the directory where the binaries are referenced; it is referred to as BIN_DIR in the NetDC configuration makefiles. Inside, soft links can be made to each of the executables that have been built in the source directories. This can be done by hand using the “ln” command, or with a script that is provided, called “setup_links.csh”.

In the NetDC home directory, you will find “setup_links.csh”. To run it, simply type:

./setup_links.csh

to have soft links placed in the bin directory. This will include links to all files that have executable status. Alternatively, as noted above, you can copy all of the executables into the “bin” directory instead of linking to them.

7. CUSTOMIZING INTERFACE CODE TO LOCAL SYSTEM

The installing administrator will have to do some customized coding to interface the NetDC request processing routines with their local data center’s information base. The three main routines that will need to be customized are process_inv, process_resp, and process_data. Example code is provided in C source format, and it is recommended that this code be modified for use as opposed to writing the interface code from scratch. These routines are called from their respective “local” functions, which are implemented in the source files local_inv.c, local_resp.c, and local_data.c.

More detail will be provided on interface code later in the manual.

8. INSTALLING A “.forward” FILE

The “.forward” file is necessary for automated mail processing to take place. Simply use your text editor as user “netdc” and create a file called “.forward”. Inside, enter just one line and save the file:

"|<BIN_DIR>/netdc_request"

where BIN_DIR is the full file path of your NetDC binary directory. Be sure to include the double-quotes. An example might look like this:

"/users/netdc/bin/netdc_request"

Then, type:

chmod 644 .forward

9. TESTING THE NetDC INSTALLATION

Once all of these steps have been followed, including the building of the interface code and the placement of a .forward file, NetDC is ready for testing. Conduct tests along the following lines as a user other than “netdc”:

  1. Mail an inventory request to the local site, asking for a list of data centers.
  2. Mail an inventory request to a remote site, asking for its list of data centers.
  3. Mail a request (for any kind of data), to the local site, requesting data for a network code that the local site supports.
  4. Mail a request to the local site for data that another site supports.
  5. Mail a request to a remote site for data that the local site supports.
  6. Mail a request to the local site requesting data for many different networks (note that a wildcard for the network code will only route to the local site).
  7. Perform the same request, but insist upon data merging.
  8. Mail the same request to another site and request data merging.

Acting in the role of a user requesting data, assess the feedback and the data returned to see that NetDC is working properly. Also be on the lookout for errors, and watch the “netdc_activity.log” file (in the “tables” directory) for anything unusual. More on troubleshooting and monitoring is provided in Chapter 12.0.

11.0 WRITING YOUR INTERFACE CODE

Perhaps the most challenging aspect of NetDC installation can be found in the necessary step of writing the routines to allow NetDC to “talk” to a data center’s information base, which can consist of a database, flat file tables, disk archives, and/or mass storage systems. Because of the wide variation between data centers in terms of how data is stored and retrieved, the writing of this code must be left up to the individual who is installing NetDC.

Effort has been made to reduce the amount of code that the installer must modify or write. The bulk of the code in NetDC can remain as-is, and generally should be left alone in the interest of technical support and ease of future upgrades of NetDC. Essentially, the local processing system is broken into data type branches, as described earlier in the manual. Currently, there are three branches: one for inventory processing, one for response data, and one for waveform data. Each of these branches has a stub that must be implemented by the installer. Each stub takes the form of an independent executable that NetDC calls. It is typical to name the executable with the prefix process_, followed by a shorthand variation of the data type: for DATA requests it is process_data, and for INV it is process_inv.

Example “C” code of these implementations is provided with the distribution and represents an actual working executable, most likely a version that currently runs at the IRIS Data Management Center. It is recommended that the administrator read the example code and consider how to modify it to work at their site, as opposed to writing the interface from scratch. The “process” code can be written in any number of compiled or scripting languages, but the key requirement is that the interface code must be able to run from the UNIX shell with no command line arguments. The “process” code gets its parameters and request lines through standard input. The output data always ends up in a file named by the FILENAME parameter that is passed to it.
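
To make that contract concrete, here is a minimal sketch of what such a stub can look like in C. This is not code from the distribution: the parsing is deliberately simplified and the output line is only a placeholder, so treat it strictly as an outline of the read-from-stdin, write-to-FILENAME behavior described above.

/* Minimal sketch of a "process_" style interface stub (illustrative only).
 * It reads parameter lines and request lines from standard input and
 * writes its output to the file named by the .FILENAME parameter.        */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[1024];
    char filename[1024] = "";
    FILE *out = NULL;

    while (fgets(line, sizeof(line), stdin) != NULL) {
        if (strncmp(line, ".FILENAME", 9) == 0) {
            /* Remember where the output must be written. */
            sscanf(line, ".FILENAME %1023s", filename);
        } else if (strncmp(line, ".INV", 4) == 0) {
            /* A request line: this is where the local information base
             * would be queried and the result formatted for NetDC.      */
            if (out == NULL && filename[0] != '\0')
                out = fopen(filename, "w");
            if (out != NULL)
                fprintf(out, "placeholder output for: %s", line);
        }
        /* Other parameters (.NAME, .EMAIL, .HUB_ID, .LABEL, ...) would
         * be stored here for use in the query or the output.            */
    }

    if (out != NULL)
        fclose(out);
    return 0;
}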

To see an example of the flow of a NetDC interface, let’s look at the code found in process_inv.c. First, the program is invoked from the NetDC function local_inventory() with a popen() call containing the command process_inv. The pipe is opened in write mode, so that local_inventory() can write the parameter information as well as the .INV request lines themselves, which process_inv reads on its standard input. When process_inv starts, it goes through a process of initialization, setting up a few preliminary values and some storage memory. Then comes the line-reading loop, where each line from standard input is read. The line is “dressed up” for parsing, and then the component fields are copied and examined separately (see Fig. 11.1).

Fig. 11.1
Figure 11.1: Process inventory flow of operation
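
As an aside before following the loop in detail, the calling side can be pictured with a small sketch along these lines. The names other than local_inventory(), popen(), process_inv, and the ".INV *" request line are assumptions made for illustration; the actual function in the distribution performs additional bookkeeping and error handling.

/* Sketch of how local_inventory() feeds process_inv through a write pipe.
 * The "params" and "request_lines" arrays stand in for information that
 * NetDC has already gathered from the user's request.                    */
#include <stdio.h>

static void send_inventory_request(const char *params[], int nparams,
                                   const char *request_lines[], int nlines)
{
    int i;
    /* Open the interface executable in write mode; whatever is written
     * here arrives on process_inv's standard input.                      */
    FILE *fp = popen("process_inv", "w");
    if (fp == NULL)
        return;                  /* real code would log and report the error */

    for (i = 0; i < nparams; i++)
        fprintf(fp, "%s\n", params[i]);        /* .NAME, .EMAIL, .FILENAME, ... */

    for (i = 0; i < nlines; i++)
        fprintf(fp, "%s\n", request_lines[i]); /* the .INV request lines */

    pclose(fp);   /* wait for process_inv to finish writing its output file */
}

int main(void)
{
    /* Example values only; process_inv must be reachable on the PATH. */
    const char *params[] = { ".NAME Joe Seismologist",
                             ".FILENAME /tmp/netdc_inv.out" };
    const char *lines[]  = { ".INV *" };

    send_inventory_request(params, 2, lines, 1);
    return 0;
}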

For each line that is read, the code assesses whether the input line has .INV as the first field. If it is not an inventory request line, then it runs through a series of checks to see if the first field indicates a particular parameter that will provide information regarding the request. process_inv, as well as the other interface code, will need to be able to accept and store the following parameters:

.NAME		(the name of the requesting user)
.EMAIL			(the email address of the requesting user)
.HUB_ID			(the assigned ID tag of the netdc_request)
.LABEL			(the label of the netdc_request)
.DISPOSITION_USER	(how to deliver output to the user)
.DISPOSITION_HUB	(how to deliver output to the hub data center)
.FILENAME			(the filename where the output will be placed)

When an inventory request line is read, each field is separately stored. Because of the variable-length nature of inventory requests, extra checks are put in place to determine how many request fields are present, which will set certain triggers as to what kind of information is being requested.

As the request lines are being read, an output file is opened using the filename specified with the FILENAME parameter. With the file opened and ready to have data written to it, process_inv goes on to form an SQL query for the local database. With the SQL string constructed from the request fields, a read pipe is opened that will call upon the database’s query program, with the SQL query being included on the command line. A loop is then created to read the lines that return through the read pipe. This will be the inventory output, which is filtered and formatted into NetDC inventory format (see Chapter 6.0) before being written to the output file. After all the output is read, the query pipe is closed.
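
A rough sketch of that query step follows. The query program name (“dbquery”) and the command syntax are invented for illustration; the actual call depends entirely on the local database installation.

/* Sketch of issuing the SQL query through a read pipe and copying the
 * filtered results to the already-opened output file.                   */
#include <stdio.h>

static void run_inventory_query(const char *sql, FILE *out)
{
    char command[2048];
    char result_line[1024];
    FILE *query;

    /* "dbquery" stands in for the local database's query program, with
     * the SQL statement passed on its command line.                     */
    snprintf(command, sizeof(command), "dbquery \"%s\"", sql);

    query = popen(command, "r");  /* the query output comes back through this pipe */
    if (query == NULL)
        return;

    while (fgets(result_line, sizeof(result_line), query) != NULL) {
        /* A real implementation filters and reformats each line into
         * NetDC inventory format (Chapter 6.0) before writing it out.   */
        fputs(result_line, out);
    }

    pclose(query);
}

int main(void)
{
    /* Stand-alone exercise of the routine; the SQL text is a dummy. */
    run_inventory_query("select * from inventory", stdout);
    return 0;
}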

As a special case, an inventory request asking just for information on Networked Data Centers (.INV *) will return information directly from the routing table, as opposed to the database.

After all of the request lines have been processed in this fashion, the output file is closed and the interface program exits. Control then returns to the local_inventory() function, which retrieves the output file named by the FILENAME parameter, and this data is forwarded on to the NetDC request directory.

The example code provided in the NetDC distribution may stray slightly in its behavior from the above description, but the basic flow of the code remains the same.

In some cases, it may be necessary to set up a different query calling structure than the example described above. One possibility is to have the SQL command passed through standard input (STDIN) and have the output written to a file that can be retrieved afterward. Another case could be where the query routine is called with both an input pipe and output pipe, with SQL being fed into the input pipe and the output being read from the other pipe. The application of these techniques is left up to the individual developer.

Fig. 11.2
Figure 11.2: Three examples of connecting to the information base

The end result of the interface code (such as process_inv) should be that it reads request input and places the output in a file with a specified name. This can be tested in stand-alone mode by setting up a test request input and passing it to your interface code. The interface code should be able to access the information base and produce a formatted output file that can be retrieved later by the local function (such as local_inventory()).

A slightly different approach must be taken with data types that take a protracted length of time to produce, such as waveform data. In this case, the “process” interface should not wait until the data is returned. Instead, process_data sets up a batch request, forwards it on to the waveform processing system, and then exits. The interface code does not wait for output to come back, and another instance of process_data will not be invoked later to gather the output.
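
One way such a hand-off can be pictured is sketched below; the batch spool path and the idea of appending the request to a queue file are assumptions for illustration only, since the actual mechanism depends on the local waveform processing system.

/* Sketch of a process_data stub that forwards the request to a batch
 * queue and exits without waiting for output (illustrative only).       */
#include <stdio.h>

int main(void)
{
    char line[1024];

    /* Hypothetical spool file watched by the waveform processing system. */
    FILE *batch = fopen("/usr/local/processing/netdc/batch/request.queue", "a");
    if (batch == NULL)
        return 1;

    /* Copy the parameters and request lines into the batch request. The
     * waveform system is expected to send a DATA::LOCALDONE (or
     * DATA::NODATA) datagram later, when the product is ready.           */
    while (fgets(line, sizeof(line), stdin) != NULL)
        fputs(line, batch);

    fclose(batch);
    return 0;   /* exit immediately; nothing waits here for the data */
}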

So how is the output retrieved? The answer is a datagram message sent by the waveform processing system that process_data called. When NetDC receives the message DATA::LOCALDONE, it immediately triggers the function local_data_receive(), where the output data file is retrieved and the contents are copied to the NetDC request directory of the intended recipient.

The datagram is formatted like the example below:

%%ACTION DATA::LOCALDONE
.HUB_ID NCEDC:Feb_25,22:07:30:17501	<this is the ID for the request>
.NAME Bill_Mantle				<the name of the user>
.EMAIL bmantle@upthrust.seismolab.edu	 <the user's email address>
.LABEL Sample_01				<the label for the request>
.DISPOSITION_USER				<how the user wants to receive data>
.DISPOSITION_HUB				<how to merge data to the hub site>
.FILENAME /usr/local/processing/netdc/local_data.Feb_25.001	<name of data file>
.END						<always include an END terminator>

In instances where the waveform information system cannot produce any data matching the request specifications, the datagram DATA::NODATA can be sent instead. This datagram contains the same information as LOCALDONE, but results in notification to the user that no data could be produced. As for specifying a FILENAME in the datagram, it is safe to use a dummy reference, since no attempt will be made to extract the information from the listed filename.

Finally, the reader is encouraged to examine the code provided in the distribution as it pertains to the local functions and the interface functions. Become familiar with how they are intended to perform before starting down the road of writing the interface code. Writing your interface from an informed standpoint will help you avoid development difficulties.

12.0 GENERAL HOUSEKEEPING AND TROUBLESHOOTING

NetDC is code that runs constantly in the background. It is meant to be automated and to involve little if any manual work from the data center administrators. However, errors will occur, processing will cease, machines will go down, and disks will fill up. It is for this reason that NetDC administrators should keep an eye on NetDC’s functioning and clean up after it when things go wrong.

As NetDC is set into full operation, some processes need to be put “on the clock”. That is to say, there are a few programs that need to be put into “cron”. As user “netdc” you can edit your cron table with the command crontab -e. On some systems, such as Sun Solaris®, you might have to set the EDITOR environment variable to the name of a text editor that you typically use, such as “vi”.

In the cron table, you will want to put in the following references. Note that these are just examples; each installation will have different pathnames and time settings. Refer to the man pages on “cron” and “crontab” if you do not understand these entries.

First:

0 * * * * /users/netdc/bin/shipment_handler

shipment_handler will pass through each NetDC request directory and check two different things. If the request is flagged as having all data products shipped to the user, then the directory is deleted. If the user has no other request directories, the directory named after the user is deleted as well. shipment_handler also checks for user directories where data merges are in progress. If the data merge has taken longer than the requested number of days to complete, a shipment is forced on the data present in the directory and the request is flagged as having all products shipped.
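
To make the per-directory decision concrete, a rough sketch of the logic is given below. Only the SHIPPED flag file and the general behavior come from this manual; the helper routines and the example path are placeholders, not NetDC code.

/* Rough sketch of shipment_handler's decision for one request directory. */
#include <stdio.h>
#include <sys/stat.h>

/* Placeholder helpers; a real implementation would supply these.          */
static int  days_since_request(const char *dir)  { (void)dir; return 0; }
static int  merge_timeout_days(const char *dir)  { (void)dir; return 0; }
static void force_shipment(const char *dir)      { (void)dir; }
static void remove_request_dir(const char *dir)  { (void)dir; }

static void handle_request_dir(const char *request_dir)
{
    char flag_path[1024];
    struct stat st;
    FILE *flag;

    snprintf(flag_path, sizeof(flag_path), "%s/SHIPPED", request_dir);

    if (stat(flag_path, &st) == 0) {
        /* All products were shipped on an earlier pass: clean up now.     */
        remove_request_dir(request_dir);
    } else if (days_since_request(request_dir) > merge_timeout_days(request_dir)) {
        /* The merge is overdue: ship whatever products are present and
         * leave an empty SHIPPED flag so the next pass removes the dir.   */
        force_shipment(request_dir);
        flag = fopen(flag_path, "w");
        if (flag != NULL)
            fclose(flag);
    }
}

int main(void)
{
    /* The request directory path here is only an example. */
    handle_request_dir("/users/netdc/requests/joe/request_1");
    return 0;
}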

There may be occasions where shipment_handler overlooks old request directories that never completed processing. The administrator should assess whether those old directories should be deleted manually.

Finally:

30 1 * * * find /users/netdc/tmp -mtime +1 -name 'netdc.*' -exec rm -f {} \;

This represents an example cleanup command. By using find with the -mtime parameter, it is possible to remove files that are older than a certain number of days. In the above example, the filename filter -name is also employed, so that only files in the directory “/users/netdc/tmp” matching the name pattern “netdc.*” are removed once they are more than one day old. Notice the -exec parameter at the end of the expression, which carries out the removal. For removing old directories, use rm -rf instead.

It is recommended that this removal technique be employed with the FTP directory where NetDC writes its shipments. That way, shipment files can remain in the FTP directory for a few days before being removed to free up disk space.
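
As an example, a cron entry along the same lines as the one above could be pointed at the shipment area; the path and the seven-day age here are placeholders to be adjusted for your site:

30 2 * * * find /home/ftp/pub/netdc -type f -mtime +7 -exec rm -f {} \;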

When NetDC encounters errors, it will more often than not return an ERROR message and an error code. Some of these messages are helpful, whereas others tend to be more technical and are only useful to those who understand the source code. There are two different places where NetDC reports errors.

The first avenue is through email to the NetDC administrator. Sometimes the mail is sent intentionally by NetDC to report a warning; other times the mail results from a non-zero exit code and is forwarded to the administrator with a message from the program describing the fatal error, or a message from the operating system. In practically all of the fatal exit cases, the error code reported can be tracked down to a specific section of source code within NetDC. Many times the error code is reported along with the name of the function that failed. A technically inclined administrator can look at the source code to see the exact point of the error and troubleshoot from there.

The other place where errors are reported is the file netdc_activity.log, which is located in the “tables” directory. This file is not just for logging error reports, but also for displaying NetDC’s current activity, especially as it relates to incoming and outgoing messages. Administrators are encouraged to monitor this file to assure NetDC’s proper functioning and also to look there for additional information in the case of a fatal error.

13.0 FUTURE IMPLEMENTATIONS

NetDC is being released in a “functional” form with the idea that future code will be added to make installation and running of NetDC easier, more robust, and increasingly feature-rich. Some of the ideas proposed are as follows:

  • interface NetDC with Portable Data Collection Center (PDCC) software to create a complete request processing package
  • implement more intelligent forms of routing, perhaps involving deeper routing layers (national data centers to regional data centers to installation sites)
  • implement processing for more kinds of data types
  • support different file formats of waveform data output
  • easier forms of NetDC configuration that can be changed without recompiling code
  • support conversion from other data request types
  • create interface tools to NetDC for administrative and request functions
  • use sockets protocol for messaging instead of email

Note that the above are only proposals and may or may not be implemented in follow-up versions of NetDC. User feedback will also contribute to determining the direction that NetDC takes in future developmental cycles.

14.0 CONCLUSION

As the scope of seismic data increases and the Internet becomes a more pervasive environment for information interchange, the expansion of request processing to a distributed network becomes the next logical step toward making seismic data easily and widely available. NetDC represents just such a distributed system and will reduce the need for data centers to “mirror” each other in order to increase data accessibility. Each of the data centers in the NetDC interchange will be able to provide its clients with data that is more up-to-date and of greater coverage than it was able to provide before. This also results in fewer maintenance demands on any one data center, since its responsibility is narrowed to a smaller, unique set of seismic networks.

A request format has been developed for user access to networked data that is both easy to use and flexible. The method of processing the request is fairly straightforward and melds well with current request processing methods. The intention of the NetDC system is for it to sit as a layer on top of the local data center’s processing system while it also carries out the duties of delegating requests to many sites and gathering that data to forward to the user. By using email as the primary means of interchange, technical complexity of the system is reduced while robustness of data exchange is retained.

It is hoped that participation in the NetDC program will foster a new level of cooperation and involvement between data centers located around the globe. Users and data centers alike will benefit from the interconnectedness that NetDC provides.

APPENDIX A: SUMMARY OF NETDC DATAGRAMS

general format:

%%ACTION <class>::<method>	<identifying header>
.PARAM1 VALUE1			<begin parameter list here>
.PARAM2 VALUE2
.
.
.PARAMn VALUEn
.END				<always put an END tag after params>
<optional data may follow>
class: INV or RESP or DATA
	method: SHIPRDY or RCVRDY or NOMERGE or SHIPMENT or RCVOK or RESEND (used for data merging)
		parameters:	.HUB_ID			<ID tag assigned by NetDC>
				.NAME			<name of requesting user>
   				.EMAIL			<email of requesting user>
				.DELEGATE		<data center name of delegate site>
   				.SHIPTO			<email of center receiving shipment>
   				.REPLYTO		<email of center sending shipment>
   				.SIZE			<size of shipment in kilobytes>
				.MEDIA			<selected shipment method>
   				.DISPOSITION_USER	<shipment mode to user>
 				.DISPOSITION_HUB	<shipment mode to hub site>
				.FILENAME		<name of file being shipped>
   				.LABEL			<label assigned to this shipment>
   				.RESEND			<present if a resend is requested>
		data: (for SHIPMENT) ASCII text data of shipment if applicable
 	method: LOCALDONE or NODATA (for delayed data processing notification)
		parameters: 	.HUB_ID			<ID tag assigned by NetDC>
				.NAME			<name of requesting user>
				.EMAIL			<email of requesting user>
				.LABEL			<label of data product>
				.DISPOSITION_USER	<shipment mode to user>
				.DISPOSITION_HUB	<shipment mode to hub site>
				.FILENAME		<name of file being shipped>
		data: none
