SAGE: Thread: ws_station network identifier

Started: 2011-06-14 21:00:27

Last activity: 2011-06-15 17:31:26

Topics: Web Services

John West

2011-06-14 21:00:27

Hello.

I'm using the station web service in EMERALD to maintain a local cache of
network, station, and component metadata. At the network level, reuse of
network codes makes it difficult to differentiate between new and modified
networks, e.g., if a network EndDate changes, my system registers it as a
new usage of the network code instead of a modification of an existing
network.

Is there some unique identifier for each network which can be included in
the web service?

Thanks!

-- John

Chad Trabant

Re: ws_station network identifier

2011-06-14 23:58:43

Hello.

In general, networks, like stations and channels, have a start time and an end time. For permanent networks there are normally no breaks in continuity. For temporary networks, blocks of years are often allocated to specific experiments, for example XY 2005-2006, XY 2007-2009 and XY 2010-2010. We would not consider those temporary networks to be modifications of an existing network, but logically different networks. Essentially, the network code combined with the start and end time uniquely identifies a "network"; when the dates change and the network code is recycled, it should be considered a new network. I'm not sure I understood your question. Did that help at all?

Chad
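
To make the identity rule in the reply above concrete, here is a minimal Python sketch. It is not code from this thread or from the web service: the `NetworkEpoch` type, its field names, and the exact dates are hypothetical. It shows why a client that keys on code + start + end treats an extended end date as a new network, which is the situation the original question describes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NetworkEpoch:
    """Hypothetical record: one usage of a network code over a time span."""
    code: str
    start: date
    end: date

    def key(self) -> tuple:
        # Identity as described above: code + start time + end time.
        return (self.code, self.start, self.end)


# Two allocations of the same recycled temporary code are distinct networks.
xy_first = NetworkEpoch("XY", date(2005, 1, 1), date(2006, 6, 30))
xy_second = NetworkEpoch("XY", date(2007, 1, 1), date(2009, 12, 31))

# The first allocation again, after its end date is extended (made-up dates).
xy_first_extended = NetworkEpoch("XY", date(2005, 1, 1), date(2006, 12, 31))

# Both comparisons are True: a pure date-based key cannot tell a
# recycled code apart from a corrected or extended end date.
distinct_allocations = xy_first.key() != xy_second.key()
extension_looks_new = xy_first.key() != xy_first_extended.key()
```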

_______________________________________________
webservices mailing list
webservices<at>iris.washington.edu
http://www.iris.washington.edu/mailman/listinfo/webservices
- John West
  
  Re: ws_station network identifier
  
  2011-06-15 00:04:49
  
  That was what I assumed from the output of the web service. The question is:
  can a start date or end date EVER change? If an incorrect date is entered
  and then later corrected, I end up with overlapping networks because network
  code + start date + end date combine to form the unique identifier.
  
  -- John
  
  - Chad Trabant
    
    Re: ws_station network identifier
    
    2011-06-15 00:45:36
    
    Got it. The network start/end dates don't change often, but on occasion they do. I think the most common case is when a temporary network code is extended to match an extended experiment time window. The only other useful identifier I can think of is the network description contained in the <Description> tags; it is also subject to change, but it doesn't change often either. By checking the description you can probably tell, more often than not, whether it's the same network or something new.
    
    Chad
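
The description check suggested above could be sketched roughly as follows. This is an illustration only, not something the web service provides: `NetworkRecord`, `probably_same_network`, the normalization step, and the sample descriptions are all assumptions. Because descriptions can change too, the result is a hint rather than a guarantee.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NetworkRecord:
    """Hypothetical cached row; field names are illustrative only."""
    code: str
    start: date
    end: date
    description: str


def _normalize(text: str) -> str:
    # Ignore case and whitespace differences when comparing descriptions.
    return " ".join(text.lower().split())


def probably_same_network(cached: NetworkRecord, incoming: NetworkRecord) -> bool:
    """Heuristic only: same code and same description suggests a date change."""
    if cached.code != incoming.code:
        return False
    return _normalize(cached.description) == _normalize(incoming.description)


cached = NetworkRecord("XY", date(2007, 1, 1), date(2008, 12, 31),
                       "Example Basin Experiment")
extended = NetworkRecord("XY", date(2007, 1, 1), date(2009, 12, 31),
                         "Example  basin experiment")
recycled = NetworkRecord("XY", date(2010, 1, 1), date(2010, 12, 31),
                         "Another Example Deployment")
```

With these made-up records, `probably_same_network(cached, extended)` is true (same code, same description, different end date), while `probably_same_network(cached, recycled)` is false.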
    
    
    John West
    
    Re: ws_station network identifier
    
    2011-06-15 00:49:59
    
    OK, thanks. I was hoping that your internal database had a unique identifier
    I could use to catch such changes and prevent duplicates.
    
    -- John
    
    
    Philip Crotwell
    
    Re: ws_station network identifier
    
    2011-06-15 04:08:37
    
    Hi John
    
    I have had more than a few headaches along the lines of what you are
    describing. There is good news and bad news from my experience. The
    good news is that you can mostly use the network code alone for
    permanent networks, and the network code plus begin year for temporary
    networks, i.e. BK and XA2007 are mostly unique and fixed. The bad news
    is that even this is only "mostly" a unique identifier. In general, I
    think permanent network codes are single and unique, and temporary
    network codes are issued for a given begin year. While a temporary
    network may be extended, i.e. its end date changes, it would be really
    weird for the begin date to change.
    
    You should NOT use the begin date as part of the key for permanent
    networks, as those have changed over the years. At some point in the
    past the begin time for permanent networks was dynamically determined
    from the earliest data at the DMC; I'm not sure if that is still the
    case. So some networks were in the database with some data and then
    later sent in additional "old" data, causing the begin times to move
    backwards. For example, BK used to start in the 80s, I think, but now
    starts in the 30s?
    
    More bad news is that the AF network (I think I am remembering
    correctly), a single permanent network, at some point split into two
    networks due to issues related to some data being restricted and some
    not. So my software started having real problems because it was coded
    to assume that the 2 char network code was unique for permanent
    networks and suddenly there were 2 distinct networks (at least at the
    software level) with the same code. I think there is work at the DMC
    to redo the notion of restricted data so that this bifurcation of that
    network will no longer be an issue in the future; I am just pointing it
    out as an example of how limited the options are for creating a unique
    ID based on any data "in" a network. Basically all fields are
    subject to change, meaning nothing can be assured to be a unique id.
    Big :(
    
    I think this is the argument given way back when people were creating
    database normalization theories and arguing for meaningless integer
    database ids: any ID based on real-world data is subject to
    change and so cannot be counted on as a good id.
    
    One more piece of bad news: the same problems that exist at the
    network level also exist at the station and channel level, except that
    those are even more likely to change.
    
    I should also say that this is not a fault of the DMC; they don't
    control when or how networks make changes to their metadata. But it is
    a problem nonetheless, as we simply do not have a globally unique,
    non-changing identifier for any of our metadata. You do the best you
    can and try to put code in to catch when things change. I have had
    very limited success and grumble with regularity about how hard it is
    to keep a metadata database in sync with the upstream one. It is just
    a really, really hard problem with no good solutions as far as I can
    see. If you come up with a good answer, please, please let me know.
    
    Good luck...
    Philip
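
The "mostly unique" key described in the message above (network code alone for permanent networks, code plus begin year for temporary ones) might look like this in Python. It is a hedged sketch, not code from the thread: the function names are hypothetical, and the rule used to recognize a temporary code (first character is X, Y, Z, or a digit) is an assumption that the thread itself does not state.

```python
def is_temporary(code: str) -> bool:
    # Assumption (not stated in the thread): temporary network codes
    # start with X, Y, Z, or a digit.
    return code[:1].upper() in set("XYZ0123456789")


def network_key(code: str, begin_year: int) -> str:
    """Return a 'mostly unique' key: code alone, or code + begin year."""
    if is_temporary(code):
        return f"{code}{begin_year}"
    # Permanent networks: leave the begin date out, since it can move.
    return code
```

Here `network_key("XA", 2007)` gives `"XA2007"`, while `network_key("BK", ...)` gives `"BK"` whatever begin year is passed in. As the message points out, a permanent network splitting in two would still break this key.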
    
    
    John West
    
    Re: ws_station network identifier
    
    2011-06-15 17:31:26
    
    Hi, Philip.
    
    Thanks for all of the info. I'm working on a set of rules on handling such
    updates and would like your thoughts on them when I'm done. It seems clear
    that there will always be exceptions, so I think EMERALD should include a
    way to automatically disseminate corrections when needed.
    
    Incidentally, I'm a big believer in numeric surrogate primary keys on
    database tables and use them throughout EMERALD.
    
    Thanks!
    
    -- John
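
As a small illustration of the numeric surrogate keys mentioned above, the following Python sketch uses the standard `sqlite3` module with an in-memory database. The table and column names are hypothetical and are not EMERALD's actual schema. It shows that once an upstream change has been matched to an existing row, the dates can be updated in place while the local `id` stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE network (
        id INTEGER PRIMARY KEY,      -- local surrogate key, no real-world meaning
        code TEXT NOT NULL,
        start_date TEXT NOT NULL,
        end_date TEXT NOT NULL
    )
    """
)

cur = conn.execute(
    "INSERT INTO network (code, start_date, end_date) VALUES (?, ?, ?)",
    ("XY", "2007-01-01", "2008-12-31"),
)
network_id = cur.lastrowid

# Once a change is matched to an existing row, update it in place;
# rows in other tables that reference `id` remain valid.
conn.execute(
    "UPDATE network SET end_date = ? WHERE id = ?",
    ("2009-12-31", network_id),
)

row = conn.execute(
    "SELECT id, code, end_date FROM network WHERE id = ?", (network_id,)
).fetchone()
row_count = conn.execute("SELECT COUNT(*) FROM network").fetchone()[0]
```

The surrogate key only keeps local references stable. Deciding which cached row an incoming record corresponds to still depends on matching rules like the ones discussed earlier in the thread.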
    
