SAGE: Thread: default in bulkdataselect

Started: 2012-08-29 15:40:29

Last activity: 2012-09-04 20:50:37

Topics: Web Services

Philip Crotwell

2012-08-29 15:40:29

Hi

What are the default values for minimumlength and longestonly? I am
guessing 0 and false, but the docs don't say.
http://www.iris.edu/ws/bulkdataselect/

Also, have you found a performance increase with bulkdataselect over
dataselect for large miniseed downloads?

thanks
Philip

Chad Trabant

Re: default in bulkdataselect

2012-08-29 15:51:54

Hi,

Yes, the default values are as you guessed. By default there is no limitation based on segment length; the documentation has been updated.
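As a minimal sketch of what those defaults mean for a client, the Python function below writes a request body with minimumlength and longestonly set explicitly to 0 and false. The body layout (key=value option lines followed by one "NET STA LOC CHA START END" line per selection) is an assumption modeled on the FDSN dataselect POST convention, not taken from the ws-bulkdataselect documentation. The function name build_request_body is hypothetical.

```python
from datetime import datetime
from typing import Iterable, Tuple

# Defaults confirmed in this thread: no minimum segment length (0) and
# every segment returned, not just the longest (false).
DEFAULT_MINIMUMLENGTH = 0.0
DEFAULT_LONGESTONLY = False

Selection = Tuple[str, str, str, str, datetime, datetime]


def build_request_body(selections: Iterable[Selection],
                       minimumlength: float = DEFAULT_MINIMUMLENGTH,
                       longestonly: bool = DEFAULT_LONGESTONLY) -> str:
    """Build a request body in an ASSUMED layout (modeled on FDSN dataselect POST):
    key=value option lines first, then one 'NET STA LOC CHA START END' line each."""
    lines = [
        f"minimumlength={minimumlength}",
        f"longestonly={'true' if longestonly else 'false'}",
    ]
    for net, sta, loc, cha, start, end in selections:
        loc = loc or "--"  # FDSN convention for an empty location code
        lines.append(f"{net} {sta} {loc} {cha} "
                     f"{start:%Y-%m-%dT%H:%M:%S} {end:%Y-%m-%dT%H:%M:%S}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_request_body([
        ("IU", "ANMO", "00", "BHZ", datetime(2012, 9, 1, 0), datetime(2012, 9, 1, 1)),
    ]), end="")
```

Writing the defaults out explicitly costs nothing and keeps a client's behavior stable even if a service's defaults were ever to change.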

Regarding ws-dataselect versus ws-bulkdataselect performance for large miniSEED requests: early tests indicated a performance difference for the user, but I haven't tested in a while and it depends on a number of factors. There is also a difference for the DMC internally. In short, if you just want raw miniSEED data, ws-bulkdataselect is the preferred interface.

A bit more explanation:
For ws-dataselect requests, the data are placed into an internal cache for use, for example, by other services, and the user has to wait for the data to be extracted and cached. For ws-bulkdataselect requests, the data are not cached but are effectively streamed back to the user directly from the storage system. Extraction and caching can be very fast compared to the network connection to the user, so the difference is not always obvious, but it would likely add up. For large requests of raw data, ws-bulkdataselect is preferred because it uses fewer resources at the DMC.
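To make the two paths concrete, here is a toy Python model (not the DMC's implementation) of the difference described above. The cache-then-send path must read every record before it sends the first one; the streaming path forwards each record as soon as it is read. The record source, the cache dict, and all function names are hypothetical stand-ins.

```python
from typing import Dict, Iterator, List

RECORD = 512  # bytes per fake miniSEED record


def read_records(n: int, log: List[str]) -> Iterator[bytes]:
    """Stand-in for pulling records off archive storage."""
    for i in range(n):
        log.append(f"read {i}")
        yield bytes(RECORD)


def cache_then_send(key: str, n: int, cache: Dict[str, bytes],
                    log: List[str]) -> Iterator[bytes]:
    """dataselect-like path: extract everything into a cache, then send."""
    if key not in cache:
        cache[key] = b"".join(read_records(n, log))  # client waits here
    data = cache[key]
    for off in range(0, len(data), RECORD):
        log.append(f"send {off // RECORD}")
        yield data[off:off + RECORD]


def stream_direct(n: int, log: List[str]) -> Iterator[bytes]:
    """bulkdataselect-like path: forward each record as soon as it is read."""
    for i, rec in enumerate(read_records(n, log)):
        log.append(f"send {i}")
        yield rec


if __name__ == "__main__":
    cached_log: List[str] = []
    list(cache_then_send("IU.ANMO.00.BHZ", 3, {}, cached_log))
    streamed_log: List[str] = []
    list(stream_direct(3, streamed_log))
    print("cached:  ", cached_log)    # all reads, then all sends
    print("streamed:", streamed_log)  # reads and sends interleaved
```

A second cache_then_send call with the same key performs no reads at all, which is the same effect that makes repeated identical ws-dataselect requests come back faster.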

Chad

_______________________________________________
webservices mailing list
webservices<at>iris.washington.edu
http://www.iris.washington.edu/mailman/listinfo/webservices
Philip Crotwell

Re: default in bulkdataselect

2012-08-30 23:57:14

Hi Chad

That is really useful information, thanks.

I am finding one interesting thing. If I request the same data twice,
the first time takes about twice as long as the second. I assume there
is some caching going on somewhere in your systems.
  
I have done some experiments here and am finding no difference between
bulk and dataselect. The biggest difference comes from the request time
window, as expected. For 10 minutes of data I get 40 kb/s, for 1
hour 240 kb/s and for 10 hours 960 kb/s. That makes sense, as these data
will be contiguous on your system and there is socket and other
overhead.
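The trend above is what a simple fixed-overhead model predicts. If every request pays a constant setup cost (connection, query, first-byte latency) before data flow at a steady link rate, effective throughput is size / (overhead + size / link rate), which rises with request size and levels off at the link rate. The overhead and link-rate values below are illustrative assumptions, not measurements from this thread.

```python
def effective_rate(size_kb: float, overhead_s: float, link_kbps: float) -> float:
    """Client-side throughput (kb/s) when each request pays a fixed setup cost."""
    total_s = overhead_s + size_kb / link_kbps
    return size_kb / total_s


# Illustrative, assumed parameters (not measured): 0.3 s per request, 1000 kb/s link.
OVERHEAD_S = 0.3
LINK_KBPS = 1000.0

if __name__ == "__main__":
    for size_kb in (10, 100, 1000, 10000):
        rate = effective_rate(size_kb, OVERHEAD_S, LINK_KBPS)
        print(f"{size_kb:>6} kb -> {rate:6.1f} kb/s")
```

With these assumed numbers the rate climbs from about 32 kb/s for a 10 kb request to about 971 kb/s for 10,000 kb, the same shape as the 40, 240 and 960 kb/s progression reported above.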
  
I might do some more playing, but from the outside it seems a wash,
at least for single-channel, single-time-window requests.

thanks
Philip
  
Bruce Weertman

Re: default in bulkdataselect

2012-08-30 21:43:51

Philip:

The reason bulkdataselect is faster the second time around probably has to do with the underlying NFS filesystem.
    
You would see the same behavior if you were on one of our internal machines and simply ran cat on one of the
archive files. The first cat would take much longer than the second. The NFS filesystem
reads the data into a buffer and holds it there for some period of time.
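The behavior described above can be modeled as a read-through buffer cache with time-limited retention: the first read goes to slow storage, and repeats within the retention window are served from memory. This Python sketch uses an injectable clock so it is deterministic. The class name, the 60-second retention, and the fixed-expiry policy are hypothetical simplifications; real NFS and operating-system page caches use more elaborate eviction rules.

```python
import time
from typing import Callable, Dict, Tuple


class TimedReadCache:
    """Read-through cache that keeps each file's bytes for ttl_s seconds."""

    def __init__(self, backing_read: Callable[[str], bytes], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._read = backing_read
        self._ttl = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}
        self.backing_reads = 0  # number of trips to slow storage

    def read(self, path: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # warm read: served from the buffer
        data = self._read(path)  # cold read: go to the backing store
        self.backing_reads += 1
        self._entries[path] = (now, data)
        return data


if __name__ == "__main__":
    fake_now = [0.0]
    cache = TimedReadCache(lambda p: b"\x00" * 4096, ttl_s=60.0,
                           clock=lambda: fake_now[0])
    cache.read("archive/file")   # cold
    cache.read("archive/file")   # warm
    fake_now[0] = 120.0          # retention expired
    cache.read("archive/file")   # cold again
    print("backing reads:", cache.backing_reads)  # 2
```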
    
-Bruce
    
    
Philip Crotwell

Re: default in bulkdataselect

2012-09-04 17:52:51

Hi Bruce and Chad

Turns out there was a bug in my code, and my tests of the
bulkdataselect were actually using regular old dataselect, hence the
very similar numbers. :(
    
However, now that I really am using bulkdataselect, I am finding that
it is slower, by about a factor of 1.5 to 2.5. This seems surprising
given your information that bulkdataselect avoided hitting a disk on
your end. For example, here are some numbers for IU ANMO 00 BHZ
starting at 2012-09-01T00:00:00:
    
           dataselect               bulkdataselect
window     time (s)   rate (kb/s)   time (s)   rate (kb/s)
10 min     0.3761     36.76         0.6533     21.16
1 hour     0.5165     153.65        1.1879     66.81
10 hour    1.0785     732.04        1.8533     426.00

Each of these is the average of 10 runs, with requests spaced 2 seconds apart.
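As a quick consistency check on these numbers, the Python snippet below multiplies each time by its rate to recover the transferred size and divides the bulk time by the dataselect time. It uses only the timings and rates posted in this message. Both services return the same amount of data for each window (about 13.8, 79.4 and 789.5 kb), and the slowdown works out to roughly 1.7 to 2.3, inside the quoted range of 1.5 to 2.5.

```python
# Times (s) and rates (kb/s) as posted for IU ANMO 00 BHZ.
results = {
    "10 min":  {"dataselect": (0.3761, 36.75618),  "bulk": (0.6533, 21.160263)},
    "1 hour":  {"dataselect": (0.5165, 153.64957), "bulk": (1.1879001, 66.80697)},
    "10 hour": {"dataselect": (1.0785, 732.03894), "bulk": (1.8532999, 425.99905)},
}

if __name__ == "__main__":
    for window, r in results.items():
        ds_time, ds_rate = r["dataselect"]
        bulk_time, bulk_rate = r["bulk"]
        # size = time x rate; equal sizes mean both services sent the same data
        ds_size = ds_time * ds_rate
        bulk_size = bulk_time * bulk_rate
        slowdown = bulk_time / ds_time
        print(f"{window:>7}: {ds_size:7.2f} kb vs {bulk_size:7.2f} kb, "
              f"bulk/dataselect time = {slowdown:.2f}")
```

Because the sizes match, the gap lies in the time per request rather than in the amount of data delivered.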
    
Any thoughts on why this would be the case? Basically I am trying to
decide which of these two to use by default, and from what I can see
dataselect is the winner.

Of course these are only single-channel cases, so maybe
bulkdataselect becomes the winner when there are many channels and time
windows.

thanks
Philip
    
    
Bruce Weertman

Re: default in bulkdataselect

2012-09-04 15:51:45

Philip:

Performance testing can be a little tricky.

Are you by any chance making the same call(s) to ws-dataselect more than once?

The reason I ask is that ws-dataselect caches requests. Subsequent calls after an initial request
will normally be much faster.
    
Other than that, I would be surprised to see one method being a lot faster than the other.

-B
    
    
Philip Crotwell

Re: default in bulkdataselect

2012-09-04 20:50:37

Hi

I am doing the same request 10 times in a row, so I guess that
explains it. Yep, performance testing can be VERY tricky. I guess I
should increment the request each time.
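Changing the request on every run, so that no run can be answered from a cache, can be sketched in Python as a generator of consecutive, non-overlapping time windows. The start time and 10-minute length mirror the earlier test (IU ANMO 00 BHZ starting at 2012-09-01T00:00:00). The function name is hypothetical, and the client that actually issues the requests is left out.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def fresh_windows(start: datetime, length: timedelta,
                  runs: int) -> Iterator[Tuple[datetime, datetime]]:
    """Yield `runs` back-to-back windows so every request asks for new data."""
    for i in range(runs):
        begin = start + i * length
        yield begin, begin + length


if __name__ == "__main__":
    for begin, end in fresh_windows(datetime(2012, 9, 1), timedelta(minutes=10), 3):
        print(begin.isoformat(), end.isoformat())
```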
    
Doing that, I am seeing more variation within runs than between the
two services. Sometimes one is faster, sometimes the other, but no
clear winner. Ah well, I will just assume they are roughly the same
and use whichever seems easiest at the time.
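One way to make "more variation within runs than between the two services" precise is to compare the gap in mean times with the spread of each service's own measurements. The Python sketch below uses the standard statistics module. The timing lists are placeholder values for illustration, not the measurements from this thread, and the "gap larger than the larger standard deviation" rule is a rough heuristic rather than a formal significance test.

```python
import statistics
from typing import Sequence


def summarize(times: Sequence[float]) -> str:
    return f"mean {statistics.mean(times):.3f} s, stdev {statistics.stdev(times):.3f} s"


def clear_winner(a: Sequence[float], b: Sequence[float]) -> bool:
    """Rough heuristic: only call a winner if the gap in means exceeds
    the larger of the two within-service standard deviations."""
    gap = abs(statistics.mean(a) - statistics.mean(b))
    spread = max(statistics.stdev(a), statistics.stdev(b))
    return gap > spread


if __name__ == "__main__":
    # Placeholder timings (seconds) for ten fresh-window runs per service.
    dataselect = [0.52, 0.61, 0.48, 0.75, 0.55, 0.66, 0.50, 0.71, 0.58, 0.63]
    bulk = [0.57, 0.49, 0.70, 0.54, 0.68, 0.52, 0.62, 0.59, 0.73, 0.51]
    print("dataselect:", summarize(dataselect))
    print("bulk:      ", summarize(bulk))
    print("clear winner:", clear_winner(dataselect, bulk))
```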
    
thanks
Philip
    
    
Chad Trabant

Re: default in bulkdataselect

2012-09-04 18:00:08

Hi Philip,

All things being relatively equal, I would suggest using the bulkdataselect service whenever possible, as it takes fewer resources at the DMC.

thanks,
Chad
    
Bruce Weertman

Re: default in bulkdataselect

2012-08-29 16:25:13

Philip:

You are correct.

To put it another way, by default it will give you any data it can that fits in the time window.
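The effect of the two options can be sketched in Python as a filter over a channel's continuous segments. This assumes minimumlength is a duration in seconds and longestonly keeps only the single longest segment, which is how the FDSN dataselect specification describes its equivalent parameters; applying minimumlength before longestonly is also an assumption. The Segment type and function name are hypothetical. With the defaults (0 and false) every segment in the window passes through unchanged.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    start: float  # seconds since an arbitrary epoch
    end: float

    @property
    def length(self) -> float:
        return self.end - self.start


def select_segments(segments: List[Segment], minimumlength: float = 0.0,
                    longestonly: bool = False) -> List[Segment]:
    """Drop segments shorter than minimumlength, then (assumed order)
    optionally keep only the longest remaining segment."""
    kept = [s for s in segments if s.length >= minimumlength]
    if longestonly and kept:
        kept = [max(kept, key=lambda s: s.length)]
    return kept


if __name__ == "__main__":
    segs = [Segment(0, 100), Segment(150, 400), Segment(500, 520)]
    print(select_segments(segs))                    # defaults: all three
    print(select_segments(segs, longestonly=True))  # only the 250 s segment
```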

Cheers,
-Bruce

