SAGE: Thread: includeavailability

Started: 2014-11-26 21:48:00

Last activity: 2014-12-02 22:29:29

Topics: Web Services

Joachim Saul

2014-11-26 21:48:00

Hello,

how frequently is the availability information updated on the IRIS
server side? If I now make an inventory request using e.g.
http://service.iris.edu/fdsnws/station/1/query?network=IU&station=ANMO&channel=BHZ&location=10&level=channel&includeavailability=TRUE&start=2014-11-26T12:00:00
then I get as availability time span for that stream

<DataAvailability><Extent start="1998-10-26T20:35:58"
end="2014-11-25T12:00:00"/></DataAvailability>

which ends about a day ago. However, the data are there (of course) and
I can retrieve them using
http://service.iris.edu/fdsnws/dataselect/1/query?network=IU&station=ANMO&channel=BHZ&location=10&start=2014-11-26T12:00:00&end=2014-11-26T12:10:00

In other words the availability information is not in sync with the
actual data holdings. Not just for IU.ANMO but for many more
stations/networks. Is this intentional? It currently prevents me from
making use of the otherwise very useful availability info when
requesting data from a few hours ago.
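
The mismatch described above can be reproduced offline. The following is a minimal sketch, assuming the response fragment quoted in this message and timestamps without fractional seconds; the function names `availability_extent` and `covers` are hypothetical helpers, not part of any IRIS client library.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Fragment copied from the response quoted above.
SAMPLE = (
    '<DataAvailability><Extent start="1998-10-26T20:35:58" '
    'end="2014-11-25T12:00:00"/></DataAvailability>'
)

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"  # assumption: no fractional seconds


def availability_extent(xml_text):
    """Return (start, end) datetimes of the first Extent element, or None."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Compare the local name only, so the sketch works whether or not
        # the tag carries an XML namespace.
        if elem.tag.rsplit("}", 1)[-1] == "Extent":
            start = datetime.strptime(elem.get("start"), TIME_FORMAT)
            end = datetime.strptime(elem.get("end"), TIME_FORMAT)
            return start, end
    return None


def covers(extent, t):
    """True if time t lies inside the reported availability extent."""
    start, end = extent
    return start <= t <= end


extent = availability_extent(SAMPLE)
requested = datetime(2014, 11, 26, 12, 0, 0)
print(extent[1].isoformat(), covers(extent, requested))
# prints: 2014-11-25T12:00:00 False
```

A client that trusts the extent alone would skip the request for 2014-11-26T12:00:00, although the dataselect query above does return data.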

Regards
Joachim

Chad Trabant

Re: includeavailability

2014-11-26 20:25:57

Hi Joachim,

The data availability sub-system at the DMC is updated every 12 hours for archive data. Data is added to the archive (i.e. from the real-time collection system) on a regular basis and is usually all in place ~18 hours after arriving, often sooner, but it varies depending on system load. The data availability you see in our fdsnws-station service is determined by the combination of those activities.

To report the near real-time data as available when setting matchtimeseries=TRUE, our service assumes that data that has been archived within the last 36 hours is still flowing into the data center. This is documented in the Data Availability section of the Help for the service: http://service.iris.edu/fdsnws/station/docs/1/help/
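
A client can mirror this rule on its own side. The sketch below is one reading of the 36-hour assumption as described here, not the service's actual implementation; `effective_end` and `assumed_available` are hypothetical names, and the value of `now` is taken from the timestamp of the original question.

```python
from datetime import datetime, timedelta

GRACE = timedelta(hours=36)  # the 36-hour window described above


def effective_end(extent_end, now, grace=GRACE):
    """Extend the archived extent to 'now' if it ended within the grace period."""
    if now - extent_end <= grace:
        # Recently archived: assume data is still flowing in.
        return max(extent_end, now)
    return extent_end


def assumed_available(extent_end, window_start, now, grace=GRACE):
    """True if the window start falls before the (possibly extended) end."""
    return window_start <= effective_end(extent_end, now, grace)


extent_end = datetime(2014, 11, 25, 12, 0, 0)    # end of the reported extent
window_start = datetime(2014, 11, 26, 12, 0, 0)  # start of the wanted data
now = datetime(2014, 11, 26, 21, 48, 0)          # time of the original question

print(assumed_available(extent_end, window_start, now))
# prints: True
```

The extent ended 33 hours 48 minutes before `now`, so it falls inside the grace period and the requested window is treated as available.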

So yes, the data availability information is not in perfect sync with the actual availability, and it never will be: in the extreme case, by the time you receive and parse the response from our service, even the real-time information is likely out of date. So it's really a question of how good is good enough. Frankly, we have found that the 'matchtimeseries' parameter with the 36 hour rule for near real-time data is sufficient for most needs.

If you just want to know what data is available from a couple of hours ago, consider using matchtimeseries with a correct time window, and then assume any returned metadata is for channels that have data available. It'll be pretty darn close to correct.
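
As a rough illustration of that suggestion, the sketch below builds such a request URL for the channel and time window from the original question. It only constructs the URL and sends nothing; `station_query_url` is a hypothetical helper, and the parameter names are the ones that appear in this thread.

```python
from urllib.parse import urlencode

BASE = "http://service.iris.edu/fdsnws/station/1/query"


def station_query_url(network, station, location, channel, start, end):
    """Build a channel-level station query restricted to channels with data."""
    params = {
        "network": network,
        "station": station,
        "location": location,
        "channel": channel,
        "level": "channel",
        "start": start,
        "end": end,
        "matchtimeseries": "TRUE",
    }
    # safe=":" keeps the colons in the timestamps readable.
    return BASE + "?" + urlencode(params, safe=":")


url = station_query_url(
    "IU", "ANMO", "10", "BHZ",
    "2014-11-26T12:00:00", "2014-11-26T12:10:00",
)
print(url)
```

Any channel returned for this query can then be passed on to a dataselect request for the same time window.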

We would consider putting effort into improving the near real-time data availability information if there is enough need, so I'd be happy to hear from folks that need such information. Possibly, we could find alternatives, like checking our real-time SeedLink export system, for many use cases.

Chad

- Joachim Saul
  
  Re: includeavailability
  
  2014-11-28 00:43:53
  
  Hi Chad
  
  Chad Trabant wrote on 11/26/14 21:25:
  
  The data availability sub-system at the DMC is updated every 12 hours
  for archive data. Data is added to the archive (i.e. from the real time
  collection system) on a regular basis and is usually all in place ~18
  hours after arriving, often sooner, but it varies depending on system
  load. The data availability you see in our fdsnws-station service is
  determined by the combination of those activities.
  
  To report the near real-time data as available when setting
  matchtimeseries=TRUE, our service assumes that data that has been
  archived within the last 36 hours is still flowing into the data center.
  This is documented in the Data Availability section of the Help for
  the service: http://service.iris.edu/fdsnws/station/docs/1/help/
  
  Indeed, thanks for pointing that out. It is now clear that this
  behaviour is not a mistake, so I can work around it in my client.
  
  On the other hand, quoting from the documentation:
  
  "Extents are not modified in real-time. The archive will likely be out
  of sync by up to a day, meaning:
  
  The service assumes that if channel data was archived within the
  last 36 hours, then data for the last 36 hours is available."
  
  Based on these assumptions and considering that any end time within the
  last 36 hours is equivalent to "this stream is probably producing
  near-real-time data now", wouldn't it be safe to set the end time of the
  availability time span to something very far in the future? I do
  understand that it may look odd to claim availability of not yet
  recorded data... but a similar trick is already used for the end times
  in other contexts, like e.g. "2500-12-31T23:59:59" as the "end time" for
  the IU network. One might also leave the end time unset to indicate
  "open end".
  
  This would eliminate the need for additional logic in the client code.
  Implementing such logic requires some knowledge of the
  operational internals of the particular server, which may also be
  subject to change.
  
  As an additional benefit the future end time would only require an
  update in the case of a prolonged data outage.
  
  Just an idea.
  
  Cheers
  Joachim
  - Chad Trabant
    
    Re: includeavailability
    
    2014-12-02 22:29:29
    
    Hi Joachim,
    
    On Nov 27, 2014, at 7:43 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
    
    Hi Chad
    
    Chad Trabant wrote on 11/26/14 21:25:
    
    The data availability sub-system at the DMC is updated every 12 hours
    for archive data. Data is added to the archive (i.e. from the real time
    collection system) on a regular basis and is usually all in place ~18
    hours after arriving, often sooner, but it varies depending on system
    load. The data availability you see in our fdsnws-station service is
    determined by the combination of those activities.
    
    To report the near real-time data as available when setting
    matchtimeseries=TRUE, our service assumes that data that has been
    archived within the last 36 hours is still flowing into the data center.
    This is documented in the Data Availability section of the Help for
    the service: http://service.iris.edu/fdsnws/station/docs/1/help/
    
    Indeed, thanks for pointing that out. It is now clear that this behaviour is not a mistake, so I can work around it in my client.
    
    On the other hand, quoting from the documentation:
    
    "Extents are not modified in real-time. The archive will likely be out of sync by up to a day, meaning:
    
    The service assumes that if channel data was archived within the last 36 hours, then data for the last 36 hours is available."
    
    Based on these assumptions and considering that any end time within the last 36 hours is equivalent to "this stream is probably producing near-real-time data now", wouldn't it be safe to set the end time of the availability time span to something very far in the future? I do understand that it may look odd to claim availability of not yet recorded data... but a similar trick is already used for the end times in other contexts, like e.g. "2500-12-31T23:59:59" as the "end time" for the IU network. One might also leave the end time unset to indicate "open end".
    
    Not a bad idea. But then we could not communicate what we are certain is a valid latest time, i.e. the archive holdings.
    
    This would eliminate the need for additional logic in the client code. Implementing such logic requires some knowledge of the operational internals of the particular server, which may also be subject to change.
    
    If the station service is used by a client to prepare for a time series request, which I believe is very common, the 'matchtimeseries' parameter solves the issue. In effect, our fdsnws-station service is already doing the time window matching logic when using that option, so the client does not need to do this again. In fact, the client does not even need to request the availability information: just set matchtimeseries=true and assume whatever comes back intersects with data availability (this is exactly what our internal data extraction routine is doing).
    
    Of course there are other use cases for data availability where you might need to know the actual date-times. In my opinion, your proposed change would be handier for some but at the cost of others. If you describe what you are doing, we can keep it in mind for consideration.
    
    Chad
