Thread: requesting full day using ws_bulkdataselect

Started: 2012-01-27 00:24:07
Last activity: 2012-01-28 18:23:43
Topics: Web Services
John West
2012-01-27 00:24:07
Hi.

I'm retrieving continuous data using bulkdataselect, one day at a time. A
typical request line might look like "TA X16A -- BHZ 2007-03-30T00:00:00
2007-03-30T23:59:59.999"

Using this method, I occasionally miss one or two samples at the day
boundaries. I'm under the impression that the DMC internals make it more
efficient to request on day boundaries. How do you recommend I do this to
keep the data continuous and not miss samples at the day boundaries?

Thanks!

-- John

  • Doug Neuhauser
    2012-01-26 07:19:02
    One suggestion (which unfortunately changes the IRIS web service
    time specification) is for time intervals to be half-open intervals,
    represented in math notation as
    [time1, time2)
    This means the time interval where time t >= time1 and t < time2.

    I believe that all IRIS services currently define a closed interval
    [time1, time2], which means the time interval where
    time t >= time1 and t <= time2.

    Closed intervals make it very hard to issue a series of
    requests whose results can be concatenated to generate a
    contiguous timeseries with no overlap. For day requests,
    two requests for:
    2007-03-03T00:00:00.0000 to 2007-03-04T00:00:00.0000
    2007-03-04T00:00:00.0000 to 2007-03-05T00:00:00.0000
    will contain two copies of a sample whose timestamp is 2007-03-04T00:00:00.0000.
    However, if the requests are half-open intervals, you will never miss a sample or
    get a duplicate sample at a request boundary.

    If you are missing 1 sample at a day boundary, it could be that you are missing a
    sample timestamped between 59.999 and 00.000 seconds. If you are missing
    more than one sample, there is either a timetear (or gap) in the timeseries,
    or there is a problem with the IRIS web service.
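
    The interval behaviour described above can be illustrated with a
    minimal sketch. It is not the IRIS implementation: the function
    names select_closed and select_half_open and the sample timestamps
    are hypothetical, chosen only to show the duplicate and the missed
    sample at a day boundary.

```python
from datetime import datetime, timedelta


def select_closed(samples, start, end):
    """Closed interval [start, end]: both endpoints are included."""
    return [t for t in samples if start <= t <= end]


def select_half_open(samples, start, end):
    """Half-open interval [start, end): the end time is excluded."""
    return [t for t in samples if start <= t < end]


# Hypothetical sample timestamps around a day boundary (not real data).
midnight = datetime(2007, 3, 4)
samples = [
    midnight - timedelta(microseconds=25002),  # 23:59:59.974998
    midnight - timedelta(microseconds=2),      # 23:59:59.999998
    midnight,                                  # 00:00:00.000000
    midnight + timedelta(microseconds=25000),  # 00:00:00.025000
]

day1, day2, day3 = datetime(2007, 3, 3), midnight, datetime(2007, 3, 5)

# Closed intervals that share an endpoint return the boundary sample twice.
closed = select_closed(samples, day1, day2) + select_closed(samples, day2, day3)

# A closed interval whose end time is cut to millisecond resolution
# skips the sample that falls between 59.999 and 00.000 seconds.
ms_end = datetime(2007, 3, 3, 23, 59, 59, 999000)
truncated = select_closed(samples, day1, ms_end) + select_closed(samples, day2, day3)

# Half-open intervals return every sample exactly once.
half_open = select_half_open(samples, day1, day2) + select_half_open(samples, day2, day3)
```

    In this sketch, closed holds the midnight sample twice, truncated
    lacks the 23:59:59.999998 sample, and half_open equals the original
    list.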

    - Doug N

    _______________________________________________
    webservices mailing list
    webservices<at>iris.washington.edu
    http://www.iris.washington.edu/mailman/listinfo/webservices


    --
    Doug Neuhauser                  University of California, Berkeley
    doug<at>seismo.berkeley.edu        Berkeley Seismological Laboratory
    Office: 510-642-0931            215 McCone Hall # 4760
    Fax:    510-643-5811            Berkeley, CA  94720-4760
    Remote: 530-752-5615 (Wed,Fri)

    • John West
      2012-01-27 04:01:04
      Thanks, Doug.

      That's exactly correct. In one case I'm missing a sample which should have
      been at 23:59:59.999998. I was trying to avoid crossing day boundaries because
      I understood that was more intensive processing on the DMC end. My process
      for stitching together traces can handle overlap, so the brute force method
      would be to just request to 00:00:01 the next day. I'd like to know if
      there is a more efficient way.
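
      The overlap-tolerant stitching mentioned above could look roughly
      like the following sketch. It is not John's actual process: the
      stitch function, the (timestamp, value) representation, and the
      2 Hz sample spacing are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def stitch(segments):
    """Merge lists of (timestamp, value) pairs into one time-ordered list.

    When the same timestamp appears in more than one segment, the copy
    from the earliest segment is kept and later copies are dropped.
    """
    merged = {}
    for segment in segments:
        for timestamp, value in segment:
            merged.setdefault(timestamp, value)
    return sorted(merged.items())


midnight = datetime(2007, 3, 31)
step = timedelta(milliseconds=500)  # hypothetical 2 Hz sampling, for brevity

# The first request runs one second past midnight (the brute force method).
day1_tail = [
    (midnight - step, 10),
    (midnight, 11),
    (midnight + step, 12),
    (midnight + 2 * step, 13),
]
# The second request starts at midnight, so three samples overlap.
day2_head = [
    (midnight, 11),
    (midnight + step, 12),
    (midnight + 2 * step, 13),
    (midnight + 3 * step, 14),
]

trace = stitch([day1_tail, day2_head])
```

      Here trace contains five samples, with each overlapping timestamp
      kept once.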

      -- John


      • Philip Crotwell
        2012-01-26 17:08:27
        Hi

        Not sure if this is related, but back in November I identified a bug
        where a request that asked for data till 12.999 seconds would only get
        data up to 12.000, so there would be almost a second of data missing
        within the request window.

        Chad said that this would be addressed in the next release of the web
        services, but I am not sure if that has happened or not. Chad, can you
        let us know the status of this bug fix?

        One thing I have done in the past to get continuous data is to make
        the request for the next window begin at the end time of the data returned
        by the previous request. So if you ask for a day and get data ending
        at 23:59:51.234 then the request for the next day would start at
        23:59:51.235. This obviously requires some bookkeeping, but might be more
        efficient than asking for a really small time window before moving to
        the next day, and it also makes it pretty likely that you will not miss
        data.
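
        The bookkeeping described above might be sketched as follows.
        The helper names fmt and next_request_line are hypothetical, the
        one-millisecond step assumes the millisecond resolution discussed
        in this thread, and the request-line layout is copied from the
        example in the original question.

```python
from datetime import datetime, timedelta


def fmt(t):
    """Format a datetime with millisecond resolution."""
    return t.strftime("%Y-%m-%dT%H:%M:%S.") + f"{t.microsecond // 1000:03d}"


def next_request_line(channel, last_end, next_end, step=timedelta(milliseconds=1)):
    """Build a request line that starts just after the data already received.

    channel  : "NET STA LOC CHA" prefix of the request line
    last_end : end time of the data returned by the previous request
    next_end : end time wanted for the next window
    """
    return f"{channel} {fmt(last_end + step)} {fmt(next_end)}"


# The previous day's data ended at 23:59:51.234, as in the example above.
last_end = datetime(2007, 3, 30, 23, 59, 51, 234000)
next_end = datetime(2007, 3, 31, 23, 59, 59, 999000)

line = next_request_line("TA X16A -- BHZ", last_end, next_end)
print(line)
```

        With these inputs the next request starts at
        2007-03-30T23:59:51.235, one millisecond after the last sample
        received.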

        Philip

      • Chad Trabant
        2012-01-28 18:23:43

        Hi John,

        The times are inclusive, as you've figured out, such that any sample occurring on the time specified will be included in output. We do not anticipate changing this logic.

        Your request is exactly the right way to select a whole day; the problem is that the service currently supports only millisecond resolution. We will be updating the service to support microsecond resolution to match the resolution supported by the underlying miniSEED format. After that, you should be able to request, for example:

        start: 2007-03-30T00:00:00.000000
        end: 2007-03-30T23:59:59.999999

        and always get the entire day. Thanks for bringing this to light. It might be a week or two before we roll out an update.

        Requesting time ranges using the day boundaries would work, but this is not preferable because you must deal with any overlap and it does add a bit more load for the DMC request mechanism.
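
        As a rough sketch, whole-day request lines with microsecond
        end times could be generated like this. It assumes the service
        accepts six fractional digits once the update described above is
        in place; the function names day_request_line and
        day_request_lines are hypothetical, and the line layout follows
        the example in the original question.

```python
from datetime import date, datetime, time, timedelta

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"  # %f gives six digits (microseconds)


def day_request_line(channel, day):
    """Return one request line covering a full day with inclusive end time."""
    start = datetime.combine(day, time.min)  # 00:00:00.000000
    end = datetime.combine(day, time.max)    # 23:59:59.999999
    return f"{channel} {start.strftime(TIME_FORMAT)} {end.strftime(TIME_FORMAT)}"


def day_request_lines(channel, first_day, n_days):
    """Return request lines for n_days consecutive days starting at first_day."""
    return [
        day_request_line(channel, first_day + timedelta(days=i))
        for i in range(n_days)
    ]


lines = day_request_lines("TA X16A -- BHZ", date(2007, 3, 30), 2)
for line in lines:
    print(line)
```

        Because consecutive windows end at .999999 and start at .000000,
        no microsecond-resolution timestamp falls between them or in
        both.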

        Chad

