Thread: ws_station network identifier

Started: 2011-06-14 21:00:27
Last activity: 2011-06-15 17:31:26
Topics: Web Services
John West
2011-06-14 21:00:27
Hello.

I'm using the station web service in EMERALD to maintain a local cache of
network, station, and component metadata. At the network level, reuse of
network codes makes it difficult to differentiate between new and modified
networks. For example, if a network's EndDate changes, my system registers it
as a new usage of the network code instead of a modification of an existing
network.

Is there some unique identifier for each network which can be included in
the web service?

Thanks!

-- John

  • Chad Trabant
    2011-06-14 23:58:43

    Hello.

    In general, networks, like stations and channels, have a start time and an end time. For permanent networks there are normally no breaks in continuity. For temporary networks, blocks of years are often allocated for specific experiments, for example XY 2005-2006, XY 2007-2009 and XY 2010-2010. We would not consider those temporary networks to be modifications of an existing network, but logically different networks. Essentially, the network code combined with the start and end time uniquely identifies a "network"; when the dates change and the network code is recycled, it should be considered a new network. I'm not sure I understood your question. Did that help at all?
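
A minimal sketch of the identity rule described above, assuming each network is held locally as a (code, start, end) triple. The `NetworkEpoch` class and the exact day boundaries are hypothetical; the thread only gives years.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NetworkEpoch:
    """One usage of a network code (hypothetical local record type)."""
    code: str
    start: date
    end: date


# Hypothetical epochs mirroring the XY example; day boundaries are assumed.
epochs = [
    NetworkEpoch("XY", date(2005, 1, 1), date(2006, 12, 31)),
    NetworkEpoch("XY", date(2007, 1, 1), date(2009, 12, 31)),
    NetworkEpoch("XY", date(2010, 1, 1), date(2010, 12, 31)),
]

# Frozen dataclasses hash and compare by value, so the triple acts as the key.
distinct = set(epochs)

# The 2007 epoch with an edited end date no longer matches any stored triple.
corrected = NetworkEpoch("XY", date(2007, 1, 1), date(2009, 6, 30))
is_known = corrected in distinct
```

Under this rule an edited end date yields a different triple, so a cache keyed this way would see the edit as a new network rather than a change to an existing one.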

    Chad

    _______________________________________________
    webservices mailing list
    webservices<at>iris.washington.edu
    http://www.iris.washington.edu/mailman/listinfo/webservices



    • John West
      2011-06-15 00:04:49
      That was what I assumed from the output of the web service. The question is:
      can a start date or end date EVER change? If an incorrect date is entered
      and then later corrected, I end up with overlapping networks because network
      code + start date + end date combine to form the unique identifier.
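
One way to notice the overlap described above is to scan the local cache for records that share a code and overlap in time. This is only a sketch: the tuple layout, the function name `overlapping_epochs`, and the sample dates are all hypothetical.

```python
from datetime import date
from itertools import combinations


def overlapping_epochs(epochs):
    """Return pairs of (code, start, end) tuples that share a code and overlap in time."""
    pairs = []
    for a, b in combinations(sorted(epochs), 2):
        same_code = a[0] == b[0]
        # Two closed intervals overlap when each starts on or before the other ends.
        if same_code and a[1] <= b[2] and b[1] <= a[2]:
            pairs.append((a, b))
    return pairs


# Hypothetical cache: the second XY 2007 record is the first one with a corrected end date.
cached = [
    ("XY", date(2005, 1, 1), date(2006, 12, 31)),
    ("XY", date(2007, 1, 1), date(2009, 12, 31)),
    ("XY", date(2007, 1, 1), date(2010, 6, 30)),
]
suspects = overlapping_epochs(cached)
```

Any pair returned is a candidate for "same network, corrected dates" and would still need a rule, or a person, to decide which record to keep.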

      -- John





      • Chad Trabant
        2011-06-15 00:45:36

        Got it. The network start/end dates don't change often, but on occasion they do. I think the most common case is when a temporary network code is extended to match an extended experiment time window. The only other useful identifier that I can think of is the network description contained in the <Description> tags. That is subject to change as well, but it also doesn't change often. By checking the description you can perhaps figure out, more often than not, whether it's the same network or something new.
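
The description check suggested above could be sketched as follows. The dictionary fields, the function name `find_same_network`, and the sample description text are hypothetical, and the match is a heuristic only, since descriptions can change too.

```python
def find_same_network(incoming, cached):
    """Return the cached record that looks like the same network as `incoming`, or None."""

    def norm(text):
        # Ignore case and whitespace differences when comparing descriptions.
        return " ".join(text.lower().split())

    for record in cached:
        if (record["code"] == incoming["code"]
                and norm(record["description"]) == norm(incoming["description"])):
            return record
    return None


# Hypothetical records: same code and description, different end date.
cached = [{"code": "XY", "start": "2007", "end": "2009",
           "description": "Example temporary experiment"}]
incoming = {"code": "XY", "start": "2007", "end": "2010",
            "description": "Example  Temporary experiment"}
match = find_same_network(incoming, cached)
```

A match suggests updating the cached record in place; no match suggests a new usage of the code.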

        Chad





        • John West
          2011-06-15 00:49:59
          OK, thanks. I was hoping that your internal database had a unique identifier
          I could use to catch such changes and prevent duplicates.

          -- John







        • Philip Crotwell
          2011-06-15 04:08:37
          Hi John

          I have had more than a few headaches along the lines of what you are
          describing. There is good news and bad news from my experience. The
          good news is that you can mostly use the network code alone for
          permanent networks, and network code plus begin year for temporary
          networks, i.e. BK and XA2007 are mostly unique and fixed. The bad news
          is that even this is only "mostly" a unique identifier. In general I
          think the permanent network codes are single and unique, and temporary
          network codes are issued for a given begin year. While they may be
          extended, i.e. the end date changes, it would be really weird for the
          begin date to change.
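
A rough sketch of that keying rule. It assumes, which the thread does not state, that temporary codes can be recognized by a leading digit or a leading X, Y, or Z; the function name `network_key` is hypothetical.

```python
def network_key(code, begin_year):
    """Build a "mostly unique" key: code alone if permanent, code plus begin year if temporary."""
    # Assumption: temporary network codes start with a digit or with X, Y, or Z.
    temporary = code[0].isdigit() or code[0] in "XYZ"
    return f"{code}{begin_year}" if temporary else code


permanent_key = network_key("BK", 1980)   # begin year is ignored for permanent codes
temporary_key = network_key("XA", 2007)
```

Because the end date is left out of the key, extending a temporary network does not change its identity in the cache.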

          You should NOT use the begin date as part of the key for permanent
          networks, as those have changed over the years. At some point in the
          past the begin time for permanent networks was dynamically determined
          from the earliest data at the DMC; I'm not sure if that is still the
          case. So some networks were in the database with some data and then
          later sent in additional "old" data, causing the begin times to move
          backwards. For example, BK used to start in the 80s I think, but now
          starts in the 30s?

          More bad news is that the AF network (I think I am remembering
          correctly), a single permanent network, at some point split into two
          networks due to issues related to some data being restricted and some
          not. So my software started having real problems, because it was coded
          to assume that the 2-character network code was unique for permanent
          networks, and suddenly there were 2 distinct networks (at least at the
          software level) with the same code. I think there is work at the DMC
          to redo the notion of restricted data so that this bifurcation of that
          network will no longer be an issue in the future. I'm just pointing it
          out as an example of how limited the options are for creating a unique
          ID based on any data "in" a network. Basically all fields are
          subject to change, meaning nothing can be assured to be a unique ID.
          Big :(

          I think this is the argument given way back when people were creating
          database normalization theories and arguing for meaningless integer
          database IDs: any ID based on real-world data is subject to
          change and so cannot be counted on as a good ID.
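
To illustrate the surrogate-ID idea, here is a small sketch using Python's built-in sqlite3 module. The table layout, column names, and dates are hypothetical and are not taken from EMERALD or the DMC database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE network ("
    " id INTEGER PRIMARY KEY,"      # meaningless surrogate key
    " code TEXT NOT NULL,"
    " start_date TEXT NOT NULL,"
    " end_date TEXT NOT NULL)"
)

cur = conn.execute(
    "INSERT INTO network (code, start_date, end_date) VALUES (?, ?, ?)",
    ("XY", "2007-01-01", "2009-12-31"),
)
network_id = cur.lastrowid

# An extended end date is an update of the same row; the id does not change.
conn.execute(
    "UPDATE network SET end_date = ? WHERE id = ?",
    ("2010-06-30", network_id),
)
rows = conn.execute("SELECT id, code, end_date FROM network").fetchall()
```

A local surrogate key keeps references inside one database stable, but it does not by itself tell you which upstream record a row corresponds to after the upstream fields change.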

          One more piece of bad news: the same problems that exist at the
          network level also exist at the station and channel level, except that
          those are even more likely to change.

          I should also say that this is not a fault of the DMC; they don't
          control when or how networks make changes to their metadata. But it is
          a problem nonetheless, as we simply do not have a globally unique,
          non-changing identifier for any of our metadata. You do the best you
          can and try to put code in to catch when things change. I have had
          very limited success and grumble with regularity about how hard it is
          to keep a metadata database in sync with the upstream one. It is just
          a really, really hard problem with no good solutions as far as I can
          see. If you come up with a good answer, please, please let me know.
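
As one possible shape for "code to catch when things change", the sketch below compares a cached snapshot against a fresh upstream snapshot. The key scheme, field names, and values are hypothetical placeholders, and this only reports differences; it does not decide what they mean.

```python
def diff_snapshots(cached, upstream):
    """Compare two {key: fields} mappings and report added, removed, and changed keys."""
    added = sorted(upstream.keys() - cached.keys())
    removed = sorted(cached.keys() - upstream.keys())
    changed = sorted(
        key for key in cached.keys() & upstream.keys()
        if cached[key] != upstream[key]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots with placeholder values.
cached = {
    "BK": {"start": "1980"},
    "XA2007": {"end": "2009"},
}
upstream = {
    "BK": {"start": "1930"},
    "XA2007": {"end": "2009"},
    "AF": {"start": "2000"},
}
report = diff_snapshots(cached, upstream)
```

Keys listed under "removed" and "added" in the same run may be one network whose key fields changed, so they are worth reviewing together.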

          Good luck...
          Philip





          • John West
            2011-06-15 17:31:26
            Hi, Philip.

            Thanks for all of the info. I'm working on a set of rules for handling such
            updates and would like your thoughts on them when I'm done. It seems clear
            that there will always be exceptions, so I think EMERALD should include a
            way to automatically disseminate corrections when needed.

            Incidentally, I'm a big believer in numeric surrogate primary keys on
            database tables and use them throughout EMERALD.

            Thanks!

            -- John





