Thread: default in bulkdataselect

Started: 2012-08-29 15:40:29
Last activity: 2012-09-04 20:50:37
Topics: Web Services
Philip Crotwell
2012-08-29 15:40:29
Hi

What are the default values for minimumlength and longestonly? I am
guessing 0 and false, but the docs don't say.
http://www.iris.edu/ws/bulkdataselect/

Also, have you found a performance increase with bulkdataselect over
dataselect for large miniSEED downloads?

thanks
Philip

  • Chad Trabant
    2012-08-29 15:51:54

    Hi,

    Yes, the default values are as you guessed. By default there is no limitation based on segment length. The documentation has been updated.
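    Client code can make those defaults explicit by filling them in before a request is built. The following is a minimal sketch: the helper name `with_defaults` and the key=value serialization are illustrative assumptions, not the service's documented request format. Only the default values (0 and false) come from this thread.

```python
from urllib.parse import urlencode

# Defaults confirmed in this thread: no minimum segment length,
# and longestonly disabled.
DEFAULTS = {"minimumlength": 0, "longestonly": "false"}


def with_defaults(params):
    """Return a copy of params with the thread's defaults filled in.

    Hypothetical helper: the parameter names come from the thread, but how
    they are serialized for the real service is an assumption.
    """
    merged = dict(DEFAULTS)
    merged.update(params)
    return merged


if __name__ == "__main__":
    # Override one value and leave the other at its default.
    print(urlencode(with_defaults({"minimumlength": 10})))
```

    Keeping the defaults in one dictionary means a later change to either value only has to be made in one place.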

    Regarding ws-dataselect versus ws-bulkdataselect performance for large miniSEED requests: early tests indicated there is a performance difference for the user, but I haven't tested for a while and it depends on a number of factors. Also, there is a difference for the DMC internally. In short, if you just want raw miniSEED data, ws-bulkdataselect is the preferred interface.

    A bit more explanation:
    For the ws-dataselect requests the data are placed into an internal cache for use, for example, by other services. The user needs to wait for the data to be extracted and cached. For ws-bulkdataselect requests the data are not cached, but are effectively streamed back to the user from the storage system directly. The extraction and caching can be really fast compared to the network connection to the user, so this difference is not always obvious but it would likely add up. For large requests of raw data ws-bulkdataselect is preferred as it uses fewer resources at the DMC.
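    The difference between the two paths can be pictured with a toy model. This is only a sketch of the general idea (extract into a cache and then serve, versus pass chunks straight through). The function names, the dictionary cache, and the chunk size are all assumptions made for illustration and say nothing about how the DMC actually implements either service.

```python
import io


def read_chunks(source, chunk_size=4096):
    """Yield successive chunks from a file-like object."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk


def serve_cached(source, cache, key):
    """Toy model of the ws-dataselect path: extract everything into a
    cache first, then hand back the cached copy."""
    if key not in cache:
        cache[key] = b"".join(read_chunks(source))
    return cache[key]


def serve_streamed(source):
    """Toy model of the ws-bulkdataselect path: pass chunks straight
    through without keeping a copy."""
    yield from read_chunks(source)


if __name__ == "__main__":
    data = b"x" * 10000
    cache = {}
    cached = serve_cached(io.BytesIO(data), cache, "IU.ANMO.00.BHZ")
    streamed = b"".join(serve_streamed(io.BytesIO(data)))
    print(len(cached), len(streamed), len(cache))
```

    In the cached model the caller gets nothing until the whole extraction has finished, while the streamed model can start handing over data after the first chunk and holds no copy afterwards.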

    Chad

    _______________________________________________
    webservices mailing list
    webservices<at>iris.washington.edu
    http://www.iris.washington.edu/mailman/listinfo/webservices



    • Philip Crotwell
      2012-08-30 23:57:14
      Hi Chad

      That is really useful information, thanks.

      I am finding one interesting thing. If I request the same data twice,
      the first time takes about twice as long as the second. I assume there
      is some caching going on somewhere in your systems.

      I have done some experiments here and am finding no difference between
      bulk and dataselect. The biggest difference is with request time
      window, as is to be expected. For 10 minutes of data I get 40 kb/s, for
      1 hour 240 kb/s, and for 10 hours 960 kb/s. That makes sense, as these
      data will be contiguous on your system and there is socket and other
      overhead.
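      The effect of a fixed per-request overhead on the observed rate can be sketched with a simple model: total time is the overhead plus the size divided by the raw bandwidth. The function name and the numbers in the usage section are illustrative assumptions, not values fitted to the measurements in this thread.

```python
def effective_rate(size_kb, overhead_sec, bandwidth_kb_per_sec):
    """Effective transfer rate when each request pays a fixed overhead.

    total time = overhead + size / bandwidth
    """
    total_sec = overhead_sec + size_kb / bandwidth_kb_per_sec
    return size_kb / total_sec


if __name__ == "__main__":
    # Illustrative values only, not measurements from the thread.
    for size_kb in (10, 100, 1000):
        print(size_kb, round(effective_rate(size_kb, 0.3, 2000.0), 1))
```

      Under this model small requests are dominated by the overhead, and the effective rate climbs toward the raw bandwidth as the request grows, which is the same trend as the figures quoted above.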

      Might do some more playing, but seems a wash from the outside
      perspective for at least single channel single time window requests.

      thanks
      Philip

      • Bruce Weertman
        2012-08-30 21:43:51
        Philip:

        The reason bulkdataselect is faster the second time round probably has to do with the underlying NFS filesystem.

        You would see the same behavior if you were on one of our internal machines and you just ran cat on one of the
        archive files. The first cat would take much longer than the second. The NFS filesystem
        reads the data into a buffer and holds it there for some period of time.
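        The same experiment can be tried on any machine by timing two consecutive reads of one file. This is a minimal sketch with a hypothetical helper name, `time_two_reads`. The temporary file in the usage section is only there to make the script runnable, and because it has just been written it is probably cached already, so it is unlikely to show a large gap.

```python
import os
import tempfile
import time


def time_two_reads(path):
    """Read the same file twice and return both durations in seconds."""
    durations = []
    for _ in range(2):
        start = time.perf_counter()
        with open(path, "rb") as handle:
            handle.read()
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # A freshly written temporary file is usually already in the local
    # cache, so point this at a large, not recently read file on a network
    # mount to look for the effect described above.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(1_000_000))
    try:
        first, second = time_two_reads(tmp.name)
        print(f"first read: {first:.6f} s, second read: {second:.6f} s")
    finally:
        os.remove(tmp.name)
```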

        -Bruce

        • Philip Crotwell
          2012-09-04 17:52:51
          Hi Bruce and Chad

          Turns out there was a bug in my code, and my tests of the
          bulkdataselect were actually using regular old dataselect, hence the
          very similar numbers. :(

          However, now that I really am using bulkdataselect, I am finding that
          it is slower, by about a factor of 1.5 to 2.5. This seems surprising
          given your information that bulkdataselect avoided hitting a disk on
          your end. For example here are some numbers asking for IU ANMO 00 BHZ
          starting at 2012-09-01T00:00:00:

          Window     dataselect                     bulk
                     time (sec)    rate (kb/s)      time (sec)    rate (kb/s)
          10 min     0.3761        36.75618         0.6533        21.160263
          1 hour     0.5165        153.64957        1.1879001     66.80697
          10 hour    1.0785        732.03894        1.8532999     425.99905

          Each of these is the average of 10 runs, with each request separated by 2 seconds.
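          The "factor of 1.5 to 2.5" can be checked directly from the timings in the table above by dividing each bulk time by the matching dataselect time. The variable and function names below are just for illustration; the numbers are copied from the message.

```python
# (window, dataselect seconds, bulkdataselect seconds), copied from the
# timings reported in the message.
TIMINGS = [
    ("10 min", 0.3761, 0.6533),
    ("1 hour", 0.5165, 1.1879001),
    ("10 hour", 1.0785, 1.8532999),
]


def slowdown_factors(timings):
    """Return {window: bulk_time / dataselect_time}."""
    return {window: bulk / single for window, single, bulk in timings}


if __name__ == "__main__":
    for window, factor in slowdown_factors(TIMINGS).items():
        print(f"{window}: bulk took {factor:.2f}x as long")
```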

          Any thoughts on why this would be the case? Basically I am trying to
          decide which of these two to use by default, and from what I can see
          dataselect is the winner.

          Of course these are only cases where it is a single channel, so maybe
          bulkdataselect becomes a winner when there are many channels and time
          windows.

          thanks
          Philip

          • Bruce Weertman
            2012-09-04 15:51:45
            Philip:

            Performance testing can be a little tricky.

            Are you by any chance making the same call(s) to ws-dataselect more than once?

            The reason I ask is that ws-dataselect caches requests. Subsequent calls after an initial request
            will normally be much faster.

            Other than that, I would be surprised to see one method being a lot faster than the other.

            -B


            • Philip Crotwell
              2012-09-04 20:50:37
              Hi

              I am doing the same request 10 times in a row, so I guess that
              explains it. Yep, performance testing can be VERY tricky. I guess I
              should increment the request each time.

              Doing that, I am seeing more variation within runs than between the
              two services. Sometimes one is faster, sometimes the other, but no
              clear winner. Ah well, I will just assume they are roughly the same
              and use whichever seems easiest at the time.
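              One way to increment the request on each run is to shift the time window so that no two runs ask for the same data. This is a minimal sketch, not the code actually used for the tests above; the function name `shifted_windows` is hypothetical, and issuing the requests themselves is left out.

```python
from datetime import datetime, timedelta


def shifted_windows(start, length, runs):
    """Return `runs` back-to-back (start, end) windows of the given length.

    Each window begins where the previous one ended, so no two requests
    ask for the same data.
    """
    return [(start + i * length, start + (i + 1) * length) for i in range(runs)]


if __name__ == "__main__":
    first = datetime(2012, 9, 1, 0, 0, 0)
    for begin, end in shifted_windows(first, timedelta(minutes=10), 3):
        print(begin.isoformat(), end.isoformat())
```

              Because every window is different, a repeat-request cache on the server side should not be able to answer any run from a previous one, although filesystem-level buffering may still play a role.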

              thanks
              Philip


              • Chad Trabant
                2012-09-04 18:00:08

                Hi Philip,

                All things being relatively equal, I would suggest using the bulkdataselect service whenever possible, as it takes fewer resources at the DMC.

                thanks,
                Chad

                On Sep 4, 2012, at 10:50 AM, Philip Crotwell wrote:

                Hi

                I am doing the same request 10 times in a row, so I guess that
                explains it. Yep, performance testing can be VERY tricky. I guess I
                should increment the request each time.

                Doing that, I am seeing more variation within runs than between the
                two services. Sometimes one is faster, sometimes the other, but no
                clear winner. Ah well, I will just assume they are roughly the same
                and use whichever seems easiest at the time.

                thanks
                Philip


                On Tue, Sep 4, 2012 at 11:51 AM, Bruce Weertman
                <bruce<at>iris.washington.edu> wrote:
                Philip:

                Performance testing can be a little tricky.

                Are you by any chance making the same call(s) to ws-dataselect more than once?

                The reason I ask is that ws-dataselect caches requests. Subsequent calls after an initial request
                will normally be much faster.

                Other than that, I would be surprised to see on method being a lot faster than the other.

                -B


                On Sep 4, 2012, at 7:52 AM, Philip Crotwell wrote:

                Hi Bruce and Chad

                Turns out there was a bug in my code, and my tests of the
                bulkdataselect were actually using regular old dataselect, hence the
                very similar numbers. :(

                However, now that I really am using bulkdataselect, I am finding that
                it is slower, by about a factor of 1.5 to 2.5. This seems surprising
                given your information that bulkdataselect avoided hitting a disk on
                your end. For example here are some numbers asking for IU ANMO 00 BHZ
                starting at 2012-09-01T00:00:00:

                dataselect bulk
                10 min 0.3761 sec 36.75618 kb/s 0.6533 sec 21.160263 kb/s
                1 hour 0.5165 sec 153.64957 kb/s 1.1879001 sec 66.80697 kb/s
                10 hour 1.0785 sec 732.03894 kb/s 1.8532999 sec 425.99905 kb/s

                Each of these is average of 10 runs, each request separated by 2 seconds.

                Any thoughts on why this would be the case? Basically I am trying to
                decide which of these two to use by default, and from what I can see
                dataselect is the winner.

                Of course these are only cases where it is a single channel, so maybe
                bulkdataselect becomes a winner when there are many channels and time
                windows.

                thanks
                Philip

                On Thu, Aug 30, 2012 at 5:43 PM, Bruce Weertman
                <bruce<at>iris.washington.edu> wrote:
                Philip:

                The reason bulkdataselect is faster the second time round probably has to do with the underlying NFS filesystem.

                You would see the same behavior if you were on one of our internal machines and you just cat - ed one of the
                archive files. The first time you did the cat it would take much longer than the second time. The NFS filesystem
                reads the data into a buffer and holds it there for some period of time.

                -Bruce

                On Aug 30, 2012, at 1:57 PM, Philip Crotwell wrote:

                HI Chad

                That is really useful information, thanks.

                I am finding one interesting thing. If I request the same data twice,
                the first time takes about twice as long as the second. I assume there
                is some caching going on somewhere in your systems.

                I have done some experiments here and am finding no difference between
                bulk and dataselect. The biggest difference is with request time
                window, as is to be expected. For 10 minutes of data I get 40 kb/s, for 1
                hour 240 kb/s and for 10 hours 960 kb/s. Makes sense as these data
                will be contiguous on your system and there is socket and other
                overhead.

                Might do some more playing, but seems a wash from the outside
                perspective for at least single channel single time window requests.

                thanks
                Philip

                On Wed, Aug 29, 2012 at 11:51 AM, Chad Trabant <chad<at>iris.washington.edu> wrote:

                Hi,

                Yes, the default values are as you guessed. By default there is no limitation based on segment length; the documentation has been updated.

                Regarding ws-dataselect versus ws-bulkdataselect performance for large miniSEED requests: early tests indicated there is a performance difference for the user, but I haven't tested for a while and it's dependent on a number of factors. Also, there is a difference for the DMC internally. In short, if you just want raw miniSEED data ws-bulkdataselect is the preferred interface.

                A bit more explanation:
                For the ws-dataselect requests the data are placed into an internal cache for use, for example, by other services. The user needs to wait for the data to be extracted and cached. For ws-bulkdataselect requests the data are not cached, but are effectively streamed back to the user from the storage system directly. The extraction and caching can be really fast compared to the network connection to the user, so this difference is not always obvious but it would likely add up. For large requests of raw data ws-bulkdataselect is preferred as it uses fewer resources at the DMC.

                Chad

                On Aug 29, 2012, at 5:40 AM, Philip Crotwell wrote:

                Hi

                What are the default values for minimumlength and longestonly? I am
                guessing 0 and false, but the docs don't say.
                http://www.iris.edu/ws/bulkdataselect/

                Also, have you found a performance increase with bulkdataselect over
                dataselect for large miniseed downloads?

                thanks
                Philip
                _______________________________________________
                webservices mailing list
                webservices<at>iris.washington.edu
                http://www.iris.washington.edu/mailman/listinfo/webservices





  • Bruce Weertman
    2012-08-29 16:25:13
    Philip:

    You are correct.

    Put another way, it will, by default, give you any data it can that fits in the time window.
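
As an illustration of what those defaults mean, here is a minimal local sketch. It assumes `minimumlength` is a minimum length in seconds for a continuous segment and `longestonly` keeps only the single longest segment; the thread confirms only the default values (0 and false), so these semantics and the function `select_segments` are assumptions for illustration, not the service's implementation.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) of a continuous segment, in seconds


def select_segments(segments: List[Segment],
                    minimumlength: float = 0.0,
                    longestonly: bool = False) -> List[Segment]:
    """Filter segments; with the defaults every segment is returned."""
    # Assumed meaning: drop segments shorter than `minimumlength` seconds.
    kept = [s for s in segments if (s[1] - s[0]) >= minimumlength]
    # Assumed meaning: keep only the single longest remaining segment.
    if longestonly and kept:
        kept = [max(kept, key=lambda s: s[1] - s[0])]
    return kept
```

With the defaults, a call such as `select_segments([(0, 5), (10, 100)])` returns both segments unchanged, which matches the "any data that fits in the time window" description.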

    Cheers,
    -Bruce

    On Aug 29, 2012, at 5:40 AM, Philip Crotwell wrote:

    Hi

    What are the default values for minimumlength and longestonly? I am
    guessing 0 and false, but the docs don't say.
    http://www.iris.edu/ws/bulkdataselect/

    Also, have you found a performance increase with bulkdataselect over
    dataselect for large miniseed downloads?

    thanks
    Philip


