SAGE: Thread: requesting full day using ws

Started: 2012-01-27 00:24:07

Last activity: 2012-01-28 18:23:43

Topics: Web Services

John West

requesting full day using ws_bulkdataselect

2012-01-27 00:24:07

Hi.

I'm retrieving continuous data using bulkdataselect, one day at a time. A
typical request line might look like "TA X16A -- BHZ 2007-03-30T00:00:00
2007-03-30T23:59:59.999"

Using this method, I occasionally miss one or two samples at the day
boundaries. I'm under the impression that the DMC internals make it more
efficient to request on day boundaries. How do you recommend I do this to
keep the data continuous and not miss samples at the day boundaries?

Thanks!

-- John

Doug Neuhauser

Re: requesting full day using ws_bulkdataselect

2012-01-26 07:19:02

One suggestion (which unfortunately changes the IRIS web service
time specification) is for time intervals to be half-open intervals,
represented in math notation as
[time1, time2)
This means the time interval where time t >= time1 and t < time2.

I believe that all IRIS services currently define a closed interval
[time1, time2], which means the time interval where
time t >= time1 and t <= time2.

Closed intervals make it very hard to issue a series of
requests whose results can be concatenated to generate a
contiguous timeseries with no overlap. For day requests,
two requests for:
2007-03-03T00:00:00.0000 to 2007-03-04T00:00:00.0000
2007-03-04T00:00:00.0000 to 2007-03-05T00:00:00.0000
will contain two copies of a sample whose timestamp is 2007-03-04T00:00:00.0000.
However, if the requests are half-open intervals, you will never miss a sample or
get a duplicate sample at a request boundary.

If you are missing 1 sample at a day boundary, it could be that you are missing a
sample timestamped between 59.999 and 00.000 seconds. If you are missing
more than one sample, there is either a timetear (or gap) in the timeseries,
or there is a problem with the IRIS web service.

- Doug N
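
The difference between the two interval conventions can be sketched in a few lines of Python. This is only an illustration on a made-up 1 Hz series; the `select` helper and the sample timestamps are assumptions for the example and are not part of any IRIS service.

```python
from datetime import datetime, timedelta


def select(samples, start, end, closed=True):
    """Return timestamps in [start, end] if closed, else in [start, end)."""
    if closed:
        return [t for t in samples if start <= t <= end]
    return [t for t in samples if start <= t < end]


# Hypothetical 1 Hz series straddling the 2007-03-04 day boundary.
midnight = datetime(2007, 3, 4)
samples = [midnight + timedelta(seconds=s) for s in range(-2, 3)]

day1 = (datetime(2007, 3, 3), midnight)
day2 = (midnight, datetime(2007, 3, 5))

# Closed intervals return the midnight sample twice; half-open ones do not.
closed = select(samples, *day1, closed=True) + select(samples, *day2, closed=True)
half_open = select(samples, *day1, closed=False) + select(samples, *day2, closed=False)
print(len(samples), len(closed), len(half_open))

# A sample just before midnight is missed by a closed interval that ends
# at 23:59:59.999, but kept by a half-open interval ending at midnight.
late = datetime(2007, 3, 30, 23, 59, 59, 999998)
missed_closed = select([late], datetime(2007, 3, 30),
                       datetime(2007, 3, 30, 23, 59, 59, 999000))
kept_half_open = select([late], datetime(2007, 3, 30),
                        datetime(2007, 3, 31), closed=False)
print(missed_closed, kept_half_open)
```

With closed intervals the concatenated result has one sample more than the original series (the duplicate at midnight), while the half-open version reproduces the series exactly.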

_______________________________________________
webservices mailing list
webservices<at>iris.washington.edu
http://www.iris.washington.edu/mailman/listinfo/webservices

--
Doug Neuhauser University of California, Berkeley
doug<at>seismo.berkeley.edu Berkeley Seismological Laboratory
Office: 510-642-0931 215 McCone Hall # 4760
Fax: 510-643-5811 Berkeley, CA 94720-4760
Remote: 530-752-5615 (Wed,Fri)
- John West
  
  Re: requesting full day using ws_bulkdataselect
  
  2012-01-27 04:01:04
  
  Thanks, Doug.
  
  That's exactly correct: in one case I'm missing a sample which should have
  been at 23:59:59.999998. I was trying to avoid crossing day boundaries because
  I understood that was more intensive processing on the DMC end. My process
  for stitching together traces can handle overlap, so the brute force method
  would be to just request to 00:00:01 the next day. I'd like to know if
  there is a more efficient way.
  
  -- John
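
The brute-force approach (request slightly past midnight and discard the overlap when stitching) might look roughly like the sketch below. It assumes each chunk is already a time-ordered list of `(timestamp, value)` pairs; the `stitch` function and the sample data are hypothetical and do not reflect the actual stitching code mentioned in the message.

```python
from datetime import datetime, timedelta


def stitch(chunks):
    """Concatenate time-ordered chunks, skipping samples already seen."""
    merged = []
    for chunk in chunks:
        for t, value in chunk:
            # Keep a sample only if it is later than the last one kept.
            if not merged or t > merged[-1][0]:
                merged.append((t, value))
    return merged


# Hypothetical 1 Hz data: the first request runs to 00:00:01 of the next
# day, so it overlaps the second request by two samples.
midnight = datetime(2007, 3, 31)
day1 = [(midnight + timedelta(seconds=s), s) for s in (-2, -1, 0, 1)]
day2 = [(midnight + timedelta(seconds=s), s) for s in (0, 1, 2)]

merged = stitch([day1, day2])
print(len(merged))
```

The cost of this approach is the extra data transferred for each overlap and the comparison pass on the client side.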
  
  - Philip Crotwell
    
    Re: requesting full day using ws_bulkdataselect
    
    2012-01-26 17:08:27
    
    Hi
    
    Not sure if this is related, but back in November I identified a bug
    where a request that asked for data till 12.999 seconds would only get
    data up to 12.000, so there would be almost a second of data missing
    within the request window.
    
    Chad said that this would be addressed in the next release of the web
    services, but I am not sure if that has happened or not. Chad, can you
    let us know the status of this bug fix?
    
    One thing I have done in the past to get continuous data is to make
    the request for the next window begin at the end time of the data returned
    by the previous request. So if you ask for a day and get data ending
    at 23:59:51.234, then the request for the next day would start at
    23:59:51.235. This obviously requires some bookkeeping, but might be more
    efficient than asking for a really small time window before moving to
    the next day, and also makes it pretty likely that you will not miss
    data.
    
    Philip
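
The bookkeeping described above could be sketched as follows. The `next_request` helper and the one-millisecond `RESOLUTION` constant are assumptions made for this illustration; the end time of the previous data would in practice come from whatever the previous request returned.

```python
from datetime import date, datetime, time, timedelta

# Assumed time resolution of the request interface (one millisecond).
RESOLUTION = timedelta(milliseconds=1)


def next_request(last_end, next_day):
    """Return (start, end) for the next day's request.

    The window starts one resolution step after the last sample received
    and runs to the last representable instant of next_day.
    """
    start = last_end + RESOLUTION
    end = datetime.combine(next_day, time.min) + timedelta(days=1) - RESOLUTION
    return start, end


# Data from the previous request ended at 23:59:51.234.
last_end = datetime(2007, 3, 30, 23, 59, 51, 234000)
start, end = next_request(last_end, date(2007, 3, 31))
print(start.isoformat(timespec="milliseconds"),
      end.isoformat(timespec="milliseconds"))
```

Because each window starts from the data actually received rather than from a fixed boundary, a sample that falls just before midnight is picked up by the following request instead of being lost.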
    
  - Chad Trabant
    
    Re: requesting full day using ws_bulkdataselect
    
    2012-01-28 18:23:43
    
    Hi John,
    
    The times are inclusive, as you've figured out, such that any sample occurring on the time specified will be included in the output. We do not anticipate changing this logic.
    
    Your request is exactly the right way to select a whole day; the problem is that the service currently only supports millisecond resolution. We will be updating the service to support microsecond resolution to match the resolution supported by the underlying miniSEED format. After that, you should be able to request, for example:
    
    start: 2007-03-30T00:00:00.000000
    end: 2007-03-30T23:59:59.999999
    
    and always get the entire day. Thanks for bringing this to light. It might be a week or two before we roll out an update.
    
    Requesting time ranges using the day boundaries would work, but this is not preferable because you must deal with any overlap and it adds a bit more load for the DMC request mechanism.
    
    Chad
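
For reference, whole-day request lines in the format quoted at the start of this thread could be generated along these lines. The `day_request_line` helper is hypothetical, and whether the service accepts microsecond timestamps depends on the update described above, so the `timespec` argument is left switchable.

```python
from datetime import date, datetime, time, timedelta

# Smallest time step for each supported timestamp precision.
STEPS = {
    "milliseconds": timedelta(milliseconds=1),
    "microseconds": timedelta(microseconds=1),
}


def day_request_line(net, sta, loc, cha, day, timespec="microseconds"):
    """Build one request line covering all of `day` as a closed interval."""
    start = datetime.combine(day, time.min)
    # Last representable instant of the day at the chosen precision.
    end = start + timedelta(days=1) - STEPS[timespec]
    return " ".join([net, sta, loc, cha,
                     start.isoformat(timespec=timespec),
                     end.isoformat(timespec=timespec)])


line = day_request_line("TA", "X16A", "--", "BHZ", date(2007, 3, 30))
ms_line = day_request_line("TA", "X16A", "--", "BHZ", date(2007, 3, 30),
                           timespec="milliseconds")
print(line)
print(ms_line)
```

At microsecond precision the end time is 23:59:59.999999, so a sample stamped 23:59:59.999998 falls inside the inclusive window; at millisecond precision it would not.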
    
