Hello,
How frequently is the availability information updated on the IRIS
server side? If I now make an inventory request using e.g.
http://service.iris.edu/fdsnws/station/1/query?network=IU&station=ANMO&channel=BHZ&location=10&level=channel&includeavailability=TRUE&start=2014-11-26T12:00:00
then I get as availability time span for that stream
<DataAvailability><Extent start="1998-10-26T20:35:58"
end="2014-11-25T12:00:00"/></DataAvailability>
which ends about a day ago. However, the data are there (of course) and
I can retrieve them using
http://service.iris.edu/fdsnws/dataselect/1/query?network=IU&station=ANMO&channel=BHZ&location=10&start=2014-11-26T12:00:00&end=2014-11-26T12:10:00
In other words the availability information is not in sync with the
actual data holdings. Not just for IU.ANMO but for many more
stations/networks. Is this intentional? It currently prevents me from
making use of the otherwise very useful availability info when
requesting data from a few hours ago.
Regards
Joachim
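The mismatch described in this message can be reproduced offline with a small sketch. It rebuilds the station query URL from the message and measures how far the reported extent ends before the requested start time. The helper names `station_query_url` and `extent_end` are hypothetical, and the XML is only the fragment quoted above, not a full StationXML document.

```python
from datetime import datetime
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

STATION_URL = "http://service.iris.edu/fdsnws/station/1/query"

def station_query_url(start):
    """Build the availability query from the message (hypothetical helper)."""
    params = {
        "network": "IU", "station": "ANMO", "channel": "BHZ",
        "location": "10", "level": "channel",
        "includeavailability": "TRUE", "start": start,
    }
    # Keep ':' unescaped so the URL reads like the one in the message.
    return STATION_URL + "?" + urlencode(params, safe=":")

def extent_end(xml_fragment):
    """Return the 'end' attribute of the first Extent element, or None."""
    root = ET.fromstring(xml_fragment)
    for elem in root.iter():
        # Compare the local tag name so an XML namespace does not matter.
        if elem.tag.rsplit("}", 1)[-1] == "Extent":
            return datetime.fromisoformat(elem.get("end"))
    return None

# Fragment copied from the message, not fetched from the service.
SAMPLE = ('<DataAvailability><Extent start="1998-10-26T20:35:58" '
          'end="2014-11-25T12:00:00"/></DataAvailability>')

requested_start = datetime.fromisoformat("2014-11-26T12:00:00")
lag = requested_start - extent_end(SAMPLE)
print(station_query_url("2014-11-26T12:00:00"))
print("extent ends", lag, "before the requested start")
```

A client that trusts the extent literally would skip this channel, although the dataselect request succeeds.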
-
Hi Joachim,
The data availability sub-system at the DMC is updated every 12 hours for archive data. Data is added to the archive (i.e. from the real time collection system) on a regular basis and is usually all in place ~18 hours after arriving, often sooner, but it varies depending on system load. The data availability you see on our fdsnws-station service is determined by the combination of those activities.
To report the near real-time data as available when setting matchtimeseries=TRUE, our service assumes that data that has been archived within the last 36 hours is still flowing into the data center. This is documented in the Data Availability section of the Help for the service: http://service.iris.edu/fdsnws/station/docs/1/help/
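A client that reads the extents directly could emulate the 36 hour assumption along these lines. This is only a sketch of the rule as described here, not the service's actual implementation; `effective_end` and `NEAR_REAL_TIME_WINDOW` are hypothetical names, and the value of "now" is an arbitrary example.

```python
from datetime import datetime, timedelta

# Window taken from the rule described above (assumption: exactly 36 hours).
NEAR_REAL_TIME_WINDOW = timedelta(hours=36)

def effective_end(extent_end, now, window=NEAR_REAL_TIME_WINDOW):
    """Return the end time a client might treat as available.

    If the archived extent ends within `window` of `now`, assume the
    channel is still flowing and treat data as available up to `now`.
    Otherwise fall back to the archived extent end.
    """
    if now - extent_end <= window:
        return now
    return extent_end

archived_end = datetime(2014, 11, 25, 12, 0, 0)   # extent end from the thread
example_now = datetime(2014, 11, 26, 13, 0, 0)    # arbitrary example time
print(effective_end(archived_end, example_now))
```

The drawback of doing this on the client side is that the 36 hour figure is an operational detail of one data center and may change.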
So yes, the data availability information is not in perfect sync with the actual availability, and it never will be: in the extreme case, by the time you receive and parse the response from our service, even the real-time information is likely out of date. So it's really a question of how good is good enough. Frankly, we have found that the 'matchtimeseries' parameter with the 36 hour rule for near real time data is sufficient for most needs.
If you just want to know what data is available a couple of hours ago, consider using matchtimeseries with a correct time window, and then assume any returned metadata is for channels that have data available. It'll be pretty darn close to correct.
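As a rough sketch of that approach, the request could be built as follows. The helper names are hypothetical, the parameter spelling follows the URLs used earlier in this thread, and treating any status other than 200 as "nothing matched" is an assumption (FDSN services conventionally answer 204 when no data matches the selection).

```python
from urllib.parse import urlencode

STATION_URL = "http://service.iris.edu/fdsnws/station/1/query"

def matchtimeseries_url(network, station, location, channel, start, end):
    """Build a channel-level query restricted to channels with time series
    in the given window (hypothetical helper)."""
    params = {
        "network": network, "station": station,
        "location": location, "channel": channel,
        "level": "channel", "matchtimeseries": "TRUE",
        "start": start, "end": end,
    }
    # Keep ':' unescaped so the time stamps stay readable.
    return STATION_URL + "?" + urlencode(params, safe=":")

def channels_returned(status_code):
    """Interpret the HTTP status: 200 means metadata came back, so the
    channels in it are assumed to have data in the window."""
    return status_code == 200

url = matchtimeseries_url("IU", "ANMO", "10", "BHZ",
                          "2014-11-26T12:00:00", "2014-11-26T12:10:00")
print(url)
```

With this pattern the client never inspects the extents itself; the time window matching stays on the server.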
We would consider putting effort into improving the near real-time data availability information if there is enough need, so I'd be happy to hear from folks that need such information. Possibly, we could find alternatives, like checking our real-time SeedLink export system, for many use cases.
Chad
On Nov 26, 2014, at 4:48 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
> Hello,
> How frequently is the availability information updated on the IRIS server side? If I now make an inventory request using e.g.
> http://service.iris.edu/fdsnws/station/1/query?network=IU&station=ANMO&channel=BHZ&location=10&level=channel&includeavailability=TRUE&start=2014-11-26T12:00:00
> then I get as availability time span for that stream
> <DataAvailability><Extent start="1998-10-26T20:35:58" end="2014-11-25T12:00:00"/></DataAvailability>
> which ends about a day ago. However, the data are there (of course) and I can retrieve them using
> http://service.iris.edu/fdsnws/dataselect/1/query?network=IU&station=ANMO&channel=BHZ&location=10&start=2014-11-26T12:00:00&end=2014-11-26T12:10:00
> In other words the availability information is not in sync with the actual data holdings. Not just for IU.ANMO but for many more stations/networks. Is this intentional? It currently prevents me from making use of the otherwise very useful availability info when requesting data from a few hours ago.
> Regards
> Joachim
-
Hi Chad
Chad Trabant wrote on 11/26/14 21:25:
> The data availability sub-system at the DMC is updated every 12 hours
> for archive data. Data is added to the archive (i.e. from the real time
> collection system) on a regular basis and is usually all in place ~18
> hours after arriving, often sooner, but it varies depending on system
> load. The data availability you see on our fdsnws-station service is
> determined by the combination of those activities.
> To report the near real-time data as available when setting
> matchtimeseries=TRUE, our service assumes that data that has been
> archived within the last 36 hours is still flowing into the data center.
> This is documented in the Data Availability section of the Help for
> the service: http://service.iris.edu/fdsnws/station/docs/1/help/
Indeed, thanks for pointing that out. It is now clear that this
behaviour is not a mistake, so I can work around it in my client.
On the other hand, quoting from the documentation:
"Extents are not modified in real-time. The archive will likely be out
of sync by up to a day, meaning:
The service assumes that if channel data was archived within the
last 36 hours, then data for the last 36 hours is available."
Based on these assumptions and considering that any end time within the
last 36 hours is equivalent to "this stream is probably producing
near-real-time data now", wouldn't it be safe to set the end time of the
availability time span to something very far in the future? I do
understand that it may look odd to claim availability of not yet
recorded data... but a similar trick is already used for end times
in other contexts, e.g. "2500-12-31T23:59:59" as the "end time" for
the IU network. One might also leave the end time unset to indicate
"open end".
This would eliminate the need for additional logic in the client code.
Implementing such logic requires some knowledge of the
operational internals of the particular server, which may also be
subject to change.
As an additional benefit the future end time would only require an
update in the case of a prolonged data outage.
Just an idea.
Cheers
Joachim
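To illustrate the proposal above, a client could treat either a missing end time or a far-future end time as "open end". This is a sketch of the suggested convention only, not of how the service currently behaves; `FAR_FUTURE`, `parse_end`, and `covers` are hypothetical names, and the year-2500 threshold is an assumption based on the "2500-12-31T23:59:59" example.

```python
from datetime import datetime
from typing import Optional

# Hypothetical threshold: any end time at or beyond this counts as open-ended.
FAR_FUTURE = datetime(2500, 1, 1)

def parse_end(value: Optional[str]) -> Optional[datetime]:
    """Return None for an open-ended extent, else the parsed end time."""
    if not value:
        return None  # end time left unset
    end = datetime.fromisoformat(value)
    return None if end >= FAR_FUTURE else end

def covers(start: datetime, end: Optional[datetime], t: datetime) -> bool:
    """True if time t falls inside the extent; an open end never excludes t."""
    return start <= t and (end is None or t <= end)

extent_start = datetime.fromisoformat("1998-10-26T20:35:58")
wanted = datetime.fromisoformat("2014-11-26T12:00:00")

print(covers(extent_start, parse_end("2500-12-31T23:59:59"), wanted))
print(covers(extent_start, parse_end("2014-11-25T12:00:00"), wanted))
```

Under this convention the client needs no knowledge of the server's update cycle, at the cost of no longer seeing the last archived time.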
-
Hi Joachim,
On Nov 27, 2014, at 7:43 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
> Hi Chad
> Chad Trabant wrote on 11/26/14 21:25:
>> The data availability sub-system at the DMC is updated every 12 hours
>> for archive data. Data is added to the archive (i.e. from the real time
>> collection system) on a regular basis and is usually all in place ~18
>> hours after arriving, often sooner, but it varies depending on system
>> load. The data availability you see on our fdsnws-station service is
>> determined by the combination of those activities.
>> To report the near real-time data as available when setting
>> matchtimeseries=TRUE, our service assumes that data that has been
>> archived within the last 36 hours is still flowing into the data center.
>> This is documented in the Data Availability section of the Help for
>> the service: http://service.iris.edu/fdsnws/station/docs/1/help/
> Indeed, thanks for pointing that out. It is now clear that this behaviour is not a mistake, so I can work around it in my client.
> On the other hand, quoting from the documentation:
> "Extents are not modified in real-time. The archive will likely be out of sync by up to a day, meaning:
> The service assumes that if channel data was archived within the last 36 hours, then data for the last 36 hours is available."
> Based on these assumptions and considering that any end time within the last 36 hours is equivalent to "this stream is probably producing near-real-time data now", wouldn't it be safe to set the end time of the availability time span to something very far in the future? I do understand that it may look odd to claim availability of not yet recorded data... but a similar trick is already used for end times in other contexts, e.g. "2500-12-31T23:59:59" as the "end time" for the IU network. One might also leave the end time unset to indicate "open end".
> This would eliminate the need for additional logic in the client code. Implementing such logic requires some knowledge of the operational internals of the particular server, which may also be subject to change.
Not a bad idea. But then we could not communicate what we are certain to be a valid latest time, i.e. the archive holdings.
If the station service is used by a client to prepare for a time series request, which I believe is very common, 'matchtimeseries' solves the issue. In effect, our fdsnws-station service already does the time window matching when that option is used, so the client does not need to do it again. In fact, the client does not even need to request the availability information: just set matchtimeseries=true and assume whatever comes back intersects with data availability (this is exactly what our internal data extraction routine is doing).
Of course there are other use cases for data availability where you might need to know the actual date-times. In my opinion, your proposed change would be handier for some but at the cost of others. If you describe what you are doing, we can keep it in mind for consideration.
Chad
-
-