Thread: A question of location ID, how to represent empty IDs in XML?

Started: 2014-07-23 17:30:46
Last activity: 2014-08-14 15:59:34
Topics: Web Services

Hello WS users and developers,

A recent discussion between FDSN data centers has centered on the representation of empty location IDs in StationXML, the default format returned by the fdsnws-station web service. The DMC may be changing how it represents location ID in XML and text formats based on these discussions. We are asking for input, as any such change will affect users of our metadata service.

Some background: In the SEED channel naming scheme there is a hierarchy of network, station, location and channel identifiers. Of these, it is only the location ID that is commonly accepted to be empty. In the SEED format the location ID is a two-character field, where the value is left justified and padded with spaces if needed. When the value is empty the field is simply two spaces of padding.
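
As a rough illustration of the padding rule described above, the following Python sketch writes and reads a two-character location field. The function names `pack_location` and `unpack_location` are hypothetical, made up for this example; they are not taken from any SEED library.

```python
def pack_location(loc: str) -> bytes:
    """Left-justify a location ID in a two-character field, padded with spaces."""
    if len(loc) > 2:
        raise ValueError("location ID is at most two characters")
    return loc.ljust(2).encode("ascii")


def unpack_location(field: bytes) -> str:
    """Read a two-character field and strip the trailing padding spaces."""
    if len(field) != 2:
        raise ValueError("location field must be exactly two bytes")
    return field.decode("ascii").rstrip(" ")


print(repr(pack_location("00")))      # b'00'
print(repr(pack_location("")))        # b'  ' (two spaces of padding)
print(repr(unpack_location(b"  ")))   # '' (empty after trimming)
```

Whether a reader keeps the padded field or the trimmed value is exactly the choice that leads to the two-space and empty-string representations discussed below.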

Historically, and presumably to avoid having an empty location ID, the DMC has represented “empty” location IDs as a string of two spaces. Following this practice, we express this in StationXML by setting the locationCode attribute to a string of two spaces. We have done this for so long that we sometimes forget it is not compliant with a strict reading of SEED; at best it falls into the vagaries of SEED. On the other hand, we have been doing it for years with no apparent problems (in fact it has helpfully avoided an empty core identifier).

There now exists another fdsnws-station implementation that returns StationXML with the locationCode attribute set to an empty string when the SEED value is empty. The justification is that this follows the SEED rules of trimming the padding spaces from the values.

Unfortunately this means there are now flavors of StationXML that are incompatible in the core channel name identifiers. In other words, two StationXML documents for the same SEED channel appear, without extra field translation, to be different channels.

As most of you are users of SEED and StationXML metadata (at some level) and some of you have written code to parse these formats and manage the data returned by the DMC and other FDSN data centers, we are asking for your input regarding the potential solutions.

Here are the options being considered for mapping an empty location ID in SEED to StationXML:

1) Set locationCode to two spaces. While the DMC and users have been using this for a long while, it is not precisely the SEED value (but the mapping could be formalized). Also, whitespace in attributes does have some theoretical challenges: the wonky rules for XML attributes related to whitespace handling require removal of spaces in some cases (we have never heard of problems though).

2) Set locationCode to an empty string. This would match the strict value present in SEED, an empty identifier.

3) Set locationCode to “--” (two dashes). This avoids issues with whitespace in XML attribute values and avoids issues with an empty identifier. Also, this matches the request mechanisms where “--” is accepted as a synonym for an empty location ID.
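
To see how the three options look to a reader of the XML, here is a small sketch using Python's standard-library `xml.etree.ElementTree`. The `Channel` fragments are trimmed-down illustrations rather than complete StationXML, and other parsers or schema-aware tools may handle whitespace differently.

```python
import xml.etree.ElementTree as ET

# One illustrative fragment per option; not complete StationXML.
fragments = {
    "two spaces": '<Channel code="BHE" locationCode="  "/>',
    "empty string": '<Channel code="BHE" locationCode=""/>',
    "two dashes": '<Channel code="BHE" locationCode="--"/>',
}

parsed = {}
for label, xml_text in fragments.items():
    channel = ET.fromstring(xml_text)
    # get() returns the attribute value as a string, or None if it is absent.
    parsed[label] = channel.get("locationCode")
    print(f"{label:>12}: {parsed[label]!r}")
```

With this parser the three values come back as three distinct strings, so code that compares identifiers directly would see three different location IDs for the same SEED channel.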

All of these solutions are viable in that we can make them work in code; it is a matter of choosing one for future FDSN metadata. Pick your poison, so to speak.

In my personal opinion, an empty location ID is an unfortunate quirk of SEED that we should rectify in StationXML. An empty identifier can be confused for “unknown” if the programmer is not careful, which is semantically very different from “set to empty”. The two-space strings that the DMC is currently using are also not ideal: they are hard for humans to read and potentially weird with XML rules. The dashed location ID avoids these issues but requires the most change. I also think requiring all readers of StationXML to translate (e.g. remove padding) is a bad idea; the values in SEED should be uniquely mapped to values in StationXML.
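
A minimal sketch of what a unique, reversible mapping between the SEED field and StationXML could look like if option 3 were chosen. The function names and the `EMPTY_LOCATION` constant are assumptions for illustration only, and the sketch assumes that "--" never occurs as a real location ID in SEED; nothing here is an adopted FDSN convention.

```python
EMPTY_LOCATION = "--"  # assumed StationXML spelling of an empty location ID (option 3)


def seed_to_stationxml(field: str) -> str:
    """Map a two-character SEED location field to a StationXML locationCode."""
    code = field.rstrip(" ")
    return code if code else EMPTY_LOCATION


def stationxml_to_seed(code: str) -> str:
    """Map a StationXML locationCode back to a space-padded two-character SEED field."""
    if code == EMPTY_LOCATION:
        code = ""
    return code.ljust(2)


# Round trip: every SEED field maps to one locationCode and back to itself.
for field in ("  ", "00", "1 "):
    assert stationxml_to_seed(seed_to_stationxml(field)) == field
```

The point of the design is that translation happens once, at the boundary between formats, so readers of StationXML never have to trim or pad values themselves.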

Thanks for reading this far. Your opinions and input are appreciated.

regards,
Chad



  • Hi

    Years ago we had full SEED. Then, to keep metadata updated,
    we switched to a separation into dataless SEED + miniseed. Now,
    because of the complexities and limitations of dataless SEED, the
    future looks like StationXML + miniseed. I am all for this change, but
    how the location id is resolved really needs to address not just what
    do we do in StationXML, but what do we do in StationXML + miniseed.

    I also lean towards "--" for the simple reason that there are so many
    instances where I have been bitten by spaces or nulls. Even though I
    know about this, I still get caught. File names, URLs, user GUI
    displays, etc. all have problems with spaces or nulls, and as a
    practical matter it is harder to see something that isn't there than
    something that is there. Furthermore, using null or space-space is
    really hard as a command line argument in the shell. That said, "--"
    already means "long option name" in many *nix programs, so if we were
    starting from scratch, underscores like "__" might be a better choice.
    The SEED manual already lists underscore as a separate item in the
    flags section (p32), so maybe worth considering.

    But if option 3 is chosen, would there be any possibility of amending
    the SEED spec so that "--" is actually valid within the location id
    field, with the caveat that it is synonymous with space-space/null,
    but "--" is the preferred value? I realize that doing a global search
    and replace on a petabyte of miniseed data is probably not going to
    happen, but it would be really nice if whatever location id is in
    StationXML, it is exactly 2 characters and is the exact same 2
    characters as in miniseed.

    Frankly the whole idea of making location ids "optional" was a real
    mistake IMHO. I am sure that anyone that has ever written code to
    deal with location ids has something that looks like:
    if (locid == null or locid == "" or locid == " " or locid == "--")
    then locid = "--"
    which is just a painfully stupid thing to have to do over and over and
    over again. Grumble grumble grumble. :(

    Lastly, as far as I can tell the SEED spec doesn't disallow null/empty
    station or channel codes, so addressing that at the same time might be
    wise.

    My $0.02, please pick one string, and only one string, and use it everywhere.

    thanks
    Philip



    _______________________________________________
    webservices mailing list
    webservices<at>iris.washington.edu
    http://www.iris.washington.edu/mailman/listinfo/webservices



    • On Jul 23, 2014, at 11:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:

      Hi

      Years ago we had full SEED. Then, to keep metadata updated,
      we switched to a separation into dataless SEED + miniseed. Now,
      because of the complexities and limitations of dataless SEED, the
      future looks like StationXML + miniseed. I am all for this change, but
      how the location id is resolved really needs to address not just what
      do we do in StationXML, but what do we do in StationXML + miniseed.

      I also lean towards "--" for the simple reason that there are so many
      instances where I have been bitten by spaces or nulls. Even though I
      know about this, I still get caught. File names, URLs, user GUI
      displays, etc. all have problems with spaces or nulls, and as a
      practical matter it is harder to see something that isn't there than
      something that is there. Furthermore, using null or space-space is
      really hard as a command line argument in the shell. That said, "--"
      already means "long option name" in many *nix programs, so if we were
      starting from scratch, underscores like "__" might be a better choice.
      The SEED manual already lists underscore as a separate item in the
      flags section (p32), so maybe worth considering.

      Hi Philip, thanks for your thoughts.

      The underscore character is certainly another option. What I do not like about it is its low readability; in URLs in particular, underscores can become completely lost.

      But if option 3 is chosen, would there be any possibility of amending
      the SEED spec so that "--" is actually valid within the location id
      field, with the caveat that it is synonymous with space-space/null,
      but "--" is the preferred value? I realize that doing a global search
      and replace on a petabyte of miniseed data is probably not going to
      happen, but it would be really nice if whatever location id is in
      StationXML, it is exactly 2 characters and is the exact same 2
      characters as in miniseed.

      If the FDSN were to go the route of "--" in StationXML, it seems natural to extend the conversation to potential changes in SEED headers and data records. That is just a bigger can of worms and would take more time to address. The idea that the two should be treated as synonyms is just what I have in mind, and it would allow us to transition over time.

      Frankly the whole idea of making location ids "optional" was a real
      mistake IMHO. I am sure that anyone that has ever written code to
      deal with location ids has something that looks like:
      if (locid == null or locid == "" or locid == " " or locid == "--")
      then locid = "--"
      which is just a painfully stupid thing to have to do over and over and
      over again. Grumble grumble grumble. :(

      Lastly, as far as I can tell the SEED spec doesn't disallow null/empty
      station or channel codes, so addressing that at the same time might be
      wise.

      Indeed, this should be clarified in the SEED spec.

      Chad

      My $0.02, please pick one string, and only one string, and use it everywhere.

      thanks
      Philip





  • Is modifying the StationXML schema (to allow a null location, required=false) a
    possibility? Example:
    <Channel startDate="1992-09-23T00:00:00" restrictedStatus="open" endDate="1994-04-01T00:00:00" code="BHE">
    vs
    <Channel locationCode=" " startDate="1992-09-23T00:00:00" restrictedStatus="open" endDate="1994-04-01T00:00:00" code="BHE">
    vs
    <Channel locationCode="--" startDate="1992-09-23T00:00:00" restrictedStatus="open" endDate="1994-04-01T00:00:00" code="BHE">

    It is very reasonable to have a null value for location in any object
    representation of station schema. " " or "" is inaccurate and only
    introduces more trouble and complexity.


    If changing the schema is not an option, then " " or "" is a very bad idea.
    Many parsers treat "" or " " as empty and will ignore them. If
    translating this into SEED is the issue, then it is the converter's
    responsibility to take care of the conversion.

    Yazan




    • Hi WS folks,

      For what it's worth, I would also vote for the "--" standard. To quote from the Zen of Python (my language of choice):

      "Beautiful is better than ugly.
      Explicit is better than implicit.
      Simple is better than complex.
      Complex is better than complicated.
      Flat is better than nested.
      Sparse is better than dense.
      Readability counts.
      Special cases aren't special enough to break the rules.
      Although practicality beats purity.
      Errors should never pass silently.
      Unless explicitly silenced."

      Number 2 is especially relevant here:
      "Explicit is better than implicit."

      Quoted whitespace and nulls are painful. Code what you mean, and mean what you code. It's easier for everyone.

      Just my $0.02.

      - Rob Newman, IRIS DMC



      • Hello Rob,

        Rob Newman wrote on 24.07.2014 18:51:
        For what it's worth, I would also vote for the "--" standard. To quote from the Zen of Python http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html (my language of choice):

        "Beautiful is better than ugly.
        Explicit is better than implicit.
        Simple is better than complex.
        Complex is better than complicated.
        Flat is better than nested.
        Sparse is better than dense.
        Readability counts.
        Special cases aren't special enough to break the rules.
        Although practicality beats purity.
        Errors should never pass silently.
        Unless explicitly silenced."

        I'd add "Compatible is better than incompatible." :)

        Number 2 is especially relevant here:
        "Explicit is better than implicit."

        My favorite would be:
        "Special cases aren't special enough to break the rules."

        Quoted whitespace and nulls are painful. Code what you mean, and mean what you code. It's easier for everyone.

        But what if we simply *mean* "empty string"?

        The issue is not about beauty, pain or ease. It's about standard
        conformance. We already have a channel naming standard. If a new data
        format cannot accommodate existing channel naming, then the new format
        is flawed. But that's not even the case here...

        An XML document that contains

        <Channel locationCode="" ...

        is not malformed. There's an attribute that *explicitly* contains an
        empty string and a parser has to produce it as such. Not as null, nil or
        none, but as an empty string. Otherwise the parser is broken and needs
        to be fixed, not the data!

        Again: It's not about beauty. We all agree that current channel naming
        is not particularly beautiful and has limitations. But our business is
        not to try to solve that issue now and here.

        Cheers
        Joachim

        • It sounds like you are saying "change is hard, so we shouldn't do it".
          I would argue that change is hard and so if we don't do it now it will
          never happen. StationXML is new enough that there is already a
          disruption, we should seize the chance. If we do not do something now
          about null loc ids, it will be a decade or two before we get another
          chance.

          It is time to drive the stake through the heart of null location ids.
          Kill the evil while we have a chance.

          Philip


          On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
          _______________________________________________
          webservices mailing list
          webservices<at>iris.washington.edu
          http://www.iris.washington.edu/mailman/listinfo/webservices

          • Hi Philip and All,

            I totally agree with Joachim; I was planning to answer, but he was much
            faster. What you are proposing is not a solution. StationXML
            supports the empty string nicely, and an empty string is not null. There
            is a type difference between the two in Python and in any other language,
            and it can be handled cleanly internally.

            Also, the location ID is not just a string; it is a key that links
            miniSEED to metadata. Making an exception at this level just
            because a user interface cannot render it without ambiguity
            does not sound like a proper proposal. I am not in favor of
            creating an exception that will have to be carried along for
            decades to come. Alternative solutions for this issue should be
            sought in the end-user interface.

            with my best regards,

            Marcelo Bianchi
            --


            2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell<at>seis.sc.edu>:


            • Hi Marcelo,

              Thanks for your thoughts as well. What you and Joachim are not addressing are the concerns about an empty ID that more than one person has raised. The answer that empty strings are technically possible and that it all works in Python/SeisComP is less than satisfying. The observations from Python, ObsPy and SeisComP are a few of many that need to be taken into account.

              I agree that there is a long-tail consideration for the "--" location ID solution. If you understand that some folks find an empty ID problematic regardless of whether it is in XML, SEED, text or anything else, you might see where this proposal comes from. Yes, we would need to treat empty location IDs and "--" as synonyms for a very long time. Empty strings in XML mean you will need to map empty IDs to empty strings, NULL and whatever an XML parser might or might not produce for a long time as well (think beyond Python and SeisComP). Either is possible, but only one of them is a unique mapping.

              If the main consideration is the least amount of disruption, then the answer is obvious to me: the FDSN can sanction the two-space string as the XML synonym for the empty SEED location ID, and we adjust the schema to make sure a string of whitespace is preserved. Then SeisComP can change its relatively new StationXML implementation and ALL existing clients will be compatible with all metadata and, most importantly, we would have consistent metadata.

              If the empty string ID representation is adopted it would, in effect, mean that the DMC would need to change its metadata service and (more importantly) all users of the DMC's metadata service would need to transition to a new metadata channel naming scheme. This is certainly not out of the question, but it is not something we would do without careful consideration. I do not find the two-space strings all that great, but they are here and are something the DMC and its users have dealt with. Issues with empty location IDs have been identified by us and our users. If the DMC is going to change, and push the change onto all users of the DMC's StationXML, it would be much more compelling to have a solution that addresses the low-level issues.

              regards,
              Chad
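
To make the "synonyms" point concrete, here is a minimal sketch of the translation a client would need if more than one spelling of the empty location ID stays in circulation. The function names are hypothetical, and treating None as empty is a policy assumption rather than anything the thread settles.

```python
def normalize_location(code):
    """Map every spelling of an empty location ID to one canonical value.

    Assumed synonyms for "empty": "", any all-space string, and "--".
    None (attribute absent, or a parser that returned null) is also
    treated as empty here; that is an assumption, not an agreed rule.
    """
    if code is None:
        return ""
    trimmed = code.strip(" ")
    return "" if trimmed in ("", "--") else trimmed


def to_seed_field(code):
    """Render a location ID as the two-character, space-padded SEED field."""
    return normalize_location(code).ljust(2)


# All four spellings collapse to the same canonical value.
canonical = {normalize_location(c) for c in ("", "  ", "--", None)}
print(canonical)
```

Any canonical value would work inside normalize_location; the empty string is used here only because it matches the trimmed SEED value.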


              ----- Original Message -----
              From: "Marcelo Bianchi" <m.tchelo<at>gmail.com>
              To: "IRIS Web Services List" <webservices<at>iris.washington.edu>
              Sent: Friday, July 25, 2014 7:38:17 PM
              Subject: Re: [webservices] A question of location ID, how to represent empty IDs in XML?


              • Hello all,

                Can someone give a concise statement of the original problem being
                discussed? Is it only or primarily a concern about XML?

                It seems to me that with modern languages a string that is empty or has
                1-N spaces is the same thing: there is often an implicit or explicit
                trim() function hiding in a processing pipeline. A null string is not
                the same. So an empty or blank string is the same, valid location code,
                and null is an undefined or uninitialized location code.

                With regard to the "--" pseudo-value for the location code, is it not
                needed because it is sometimes impossible or difficult to represent an
                empty string, or even a string? For example on the command line or in a
                RESTful WS URI? (Or a URI on the command line!) So it may be that the
                use of "--" for intermediate processing and requests could be tolerated
                and somehow made official, while empty or blanks-only strings remain
                official for persistent data.

                Just my 0.02EUR = $0.0268

                Best regards to all,

                Anthony
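
The URI case can be sketched as follows. The endpoint and the net, sta, cha, level, format and nodata parameters are taken from the query URL quoted later in this thread; the `loc` parameter name and the helper function are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the query URL quoted later in this thread.
BASE_URL = "http://service.iris.edu/fdsnws/station/1/query"


def station_query_url(net, sta, loc, cha):
    """Build a channel-level query, spelling a blank location ID as "--"."""
    loc_param = loc.strip() or "--"  # "" and "  " both become "--"
    params = {"net": net, "sta": sta, "loc": loc_param, "cha": cha,
              "level": "channel", "format": "xml", "nodata": "404"}
    return BASE_URL + "?" + urlencode(params)


# Without the substitution, two spaces are encoded as "loc=++" and an
# empty string as "loc=", neither of which is easy to read or to type.
raw_spaces = urlencode({"loc": "  "})
raw_empty = urlencode({"loc": ""})

url = station_query_url("GE", "UGM", "  ", "BHZ")
print(url)
```

The substitution happens only when the request is built, so whatever value is stored or compared elsewhere is left alone.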


                On 27/07/2014 04:52, Chad Trabant wrote:


                --
                Sent from my iClayTablet

                Anthony Lomax
                161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
                tel: +33 (0)4 93 75 25 02
                e-mail: anthony<at>alomax.net
                web: http://www.alomax.net
                Twitter: @ALomaxNet http://twitter.com/ALomaxNet
                Science & Special Topics: http://www.alomax.net/science
                Software: http://www.alomax.net/software
                Updates: https://twitter.com/ALomaxNet

                • Hi

                  Being on the cheap side of the Atlantic, I'll save us $0.00068 and
                  make a stab at the underlying issue. :)

                  Here, with lots of stuff cut out, is how a channel is "identified" in
                  stationXML via the fdsn station web service at the IRIS DMC,
                  http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404

                  <Network code="GE" >
                  <Station code="UGM">
                  <Channel locationCode="  " code="BHZ">

                  Another implementation of the same web service (not sure of url) gives
                  back this:

                  <Network code="GE" >
                  <Station code="UGM">
                  <Channel locationCode="" code="BHZ">

                  with locationCode="" vs ="  " being the difference under consideration.

                  There are two basic issues being discussed (and yes, more beer would help! :)

                  1) Should all valid stationXML documents be required to use the exact
                  same string of characters to represent the location id for this
                  channel? This would allow a comparison operation to be "simple" in
                  that it can compare the attribute values without additional
                  processing.

                  2) If we agree to 1), then what should those exact characters be? The
                  current top choices are
                  a) empty=""
                  b) two spaces="  "
                  c) two dashes="--".

                  1) seems less controversial than 2) in that greater compatibility is
                  generally seen as positive.

                  This is primarily a question about the form of the stationXML
                  documents, but obviously there are connections to the way requests are
                  formed, the relationship to miniseed/seed, the way things are coded in
                  software and how much detailed understanding we expect of end users.

                  Philip
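
Issue 1) can be illustrated with a short sketch that compares the two fragments above, once attribute for attribute and once after trimming the location ID. It assumes Python's standard xml.etree.ElementTree and drops namespaces and all other content, as the fragments above do; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Cut-down documents for the same SEED channel, two-space vs empty form.
DOC_A = ('<Network code="GE"><Station code="UGM">'
         '<Channel locationCode="  " code="BHZ"/></Station></Network>')
DOC_B = ('<Network code="GE"><Station code="UGM">'
         '<Channel locationCode="" code="BHZ"/></Station></Network>')


def channel_ids(xml_text):
    """Return (net, sta, loc, cha) tuples with attribute values untouched."""
    net = ET.fromstring(xml_text)
    return [(net.get("code"), sta.get("code"),
             cha.get("locationCode"), cha.get("code"))
            for sta in net.iter("Station")
            for cha in sta.iter("Channel")]


def normalized(ids):
    """Trim padding spaces from the location ID before comparing."""
    return [(n, s, (loc or "").strip(), c) for n, s, loc, c in ids]


ids_a, ids_b = channel_ids(DOC_A), channel_ids(DOC_B)
print(ids_a == ids_b)                          # simple comparison: False
print(normalized(ids_a) == normalized(ids_b))  # after translation: True
```

The "simple" comparison only succeeds if every producer emits the same characters; otherwise each consumer has to carry a translation step like normalized().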



                  On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:
                  Hello all,

                  Can someone give a concise statement of the original problem being
                  discussed, it only or primarily a concern about XML?

                  It seems to me that with modern languages a string that is empty or has 1-N
                  spaces is the same thing - there are often implicit or explicit trim()
                  function hiding in a processing pipeline. A null string is not the same.
                  So an empty or blank string is the same, valid location code, and null is
                  undefined or uninitialized location code.

                  With regards to the "--" pseudo for the location code, is this not needed
                  because sometimes it is not possible or difficult to represent an empty
                  string or even a string? For example on the command line or in a restful WS
                  URI? (Or a URI on the command line!) So it may be that the use of "--" for
                  intermediate processing and requests could be tolerated and somehow
                  official, while empty or only-blanks strings official and for persistent
                  data.

                  Just my 0.02€ = $0.0268

                  Best regards to all,

                  Anthony



                  On 27/07/2014 04:52, Chad Trabant wrote:

                  Hi Marcelo,

                  Thanks for your thoughts as well. Something that you and Joachim are not
                  addressing are the concerns about an empty ID that have been brought up by
                  more than one person. The answer that empty strings are technically
                  possible and it all works in Python/SeisComP is less than satisfying. The
                  observations from Python, ObsPy and SeisComP are a few of many that need to
                  be taken into account.

                  I agree that there is a long tail consideration for the "--" location ID
                  solution. Understand that some folks find an empty ID to be problematic
                  regardless of whether it is XML, SEED, text, whatever, then you might see
                  where this proposal comes from. Yes, we would need to treat empty location
                  IDs and "--" as synonyms for a very long time. Empty strings in XML mean
                  you will need to map empty IDs to empty strings, NULL and whatever an XML
                  parser might or might not produce for a long time as well (think beyond
                  Python and SeisComP). Either is possible, only one of them is a unique
                  mapping.

                  If the main considerations are for the least amount of disruption the the
                  answer is obvious to me: the FDSN can sanction that the two-space string is
                  the XML synonym for the empty SEED location ID and we adjust the schema to
                  make sure a string of whitespaces is preserved. Then SeisComP can change
                  its relatively new StationXML implementation and ALL existing clients will
                  be compatible with all metadata and, mostly importantly, we would have
                  consistent metadata.

                  If the empty string ID representation is adopted it would would, in effect,
                  mean that the DMC would need to change its metadata service and (more
                  importantly) all users of the DMC's metadata service would need to
                  transition to a new metadata channel naming scheme. This is certainly not
                  out of the question, but it is not something we would do without careful
                  consideration. I do not find the two-space strings all that great, but they
                  are here and something the DMC and users of the DMC have dealt with. Issues
                  have been identified with empty location IDs by us and our users. If DMC is
                  going to change, and push the change on all users of the DMC's StationXML,
                  it would be much more compelling to have a solution that addresses the low
                  level issues.

                  regards,
                  Chad


                  ----- Original Message -----
                  From: "Marcelo Bianchi" <m.tchelo<at>gmail.com>
                  To: "IRIS Web Services List" <webservices<at>iris.washington.edu>
                  Sent: Friday, July 25, 2014 7:38:17 PM
                  Subject: Re: [webservices] A question of location ID, how to represent empty
                  IDs in XML?

                  Hi Philip and All,

                  I totally agree with Joachim; I was planning to answer but he was much
                  faster. What you guys are proposing is not a solution. StationXML
                  supports the empty string nicely, and it is not null. There is a type
                  difference here in Python and in any other language, and it can be nicely
                  handled internally.

                  Also, the location ID is not just a string; it is a key entry that links
                  miniSEED to metadata, and making an exception at this level just
                  because a user interface cannot properly render it without ambiguity
                  does not sound like a proper proposal. I am not in favor of
                  creating an exception that will have to be carried along for
                  decades to come. Alternative solutions for this issue should be
                  sought in the end-user interface.

                  with my best regards,

                  Marcelo Bianchi
                  --


                  2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell<at>seis.sc.edu>:

                  It sounds like you are saying "change is hard, so we shouldn't do it".
                  I would argue that change is hard and so if we don't do it now it will
                  never happen. StationXML is new enough that there is already a
                  disruption, we should seize the chance. If we do not do something now
                  about null loc ids, it will be a decade or two before we get another
                  chance.

                  It is time to drive the stake through the heart of null location ids.
                  Kill the evil while we have a chance.

                  Philip


                  On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:

                  Hello Rob,

                  Rob Newman wrote on 24.07.2014 18:51:

                  For what it's worth, I would also vote for the "--" standard. To quote
                  from the Zen of Python
                  http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html
                  (my language of choice):


                  "Beautiful is better than ugly.
                  Explicit is better than implicit.
                  Simple is better than complex.
                  Complex is better than complicated.
                  Flat is better than nested.
                  Sparse is better than dense.
                  Readability counts.
                  Special cases aren't special enough to break the rules.
                  Although practicality beats purity.
                  Errors should never pass silently.
                  Unless explicitly silenced."

                  I'd add "Compatible is better than incompatible." :)


                  Number 2 is especially relevant here:
                  "Explicit is better than implicit."

                  My favorite would be:

                  "Special cases aren't special enough to break the rules."

                  Quoted whitespace and nulls are painful. Code what you mean, and mean what
                  you code. It's easier for everyone.

                  But what if we simply *mean* "empty string"?

                  The issue is not about beauty, pain or ease. It's about standard
                  conformance. We already have a channel naming standard. If a new data format
                  cannot accommodate existing channel naming, then the new format is flawed.
                  But that's not even the case here...

                  An XML document that contains

                  <Channel locationCode="" ...

                  is not malformed. There's an attribute that *explicitly* contains an empty
                  string and a parser has to produce it as such. Not as null, nil or none, but
                  as an empty string. Otherwise the parser is broken and needs to be fixed,
                  not the data!
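
To illustrate the point about empty versus missing values, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. It shows one parser only; other parsers or data-binding layers may behave differently. The element and attribute names follow the fragment above, and the one-element documents are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element documents, modeled on the fragment above.
explicit_empty = ET.fromstring('<Channel locationCode="" code="BHZ"/>')
attribute_missing = ET.fromstring('<Channel code="BHZ"/>')

# An attribute that is present but empty comes back as an empty string...
empty_value = explicit_empty.get("locationCode")
# ...while an attribute that is absent comes back as None.
missing_value = attribute_missing.get("locationCode")

print(repr(empty_value), repr(missing_value))
```

With this parser the two cases stay distinguishable, so client code can tell "empty location ID" apart from "no location ID given".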

                  Again: It's not about beauty. We all agree that current channel naming is
                  not particularly beautiful and has limitations. But our business is not to
                  try to solve that issue now and here.

                  Cheers
                  Joachim

                  _______________________________________________
                  webservices mailing list
                  webservices<at>iris.washington.edu
                  http://www.iris.washington.edu/mailman/listinfo/webservices



                  --
                  Sent from my iClayTablet

                  ________________________________

                  Anthony Lomax
                  161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
                  tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net web:
                  http://www.alomax.net

                  Twitter: @ALomaxNet
                  Science & Special Topics: http://www.alomax.net/science
                  Software: http://www.alomax.net/software - updates:
                  https://twitter.com/ALomaxNet
                  ________________________________




                  • One more thing is that this is not something that we can resolve based
                    on the XML spec as all three variations are well-formed and can be
                    valid XML depending on the schema.

                    There is another issue in that whitespace in XML attributes can be
                    normalized by the parsers, but this behavior is not standard across
                    all parsers, so dealing with attributes that are not limited to
                    non-whitespace characters means that you likely have to consider
                    empty, one space, two spaces, and even N spaces as all being
                    equivalent. Depending on the parser, you may be able to have this
                    handled for you, or you may have to code explicitly for the cases.

                    I think per the XML spec, even these two are considered "the same" as well:
                    locationCode="
                    "
                    locationCode="

                    "
                    as newlines in attributes can be normalized to whitespace on parsing.
                    But again, exactly how it is done depends on the parser.

                    Philip

                    PS I am NOT advocating we choose newline-newline as the default
                    location id!!! :)
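
One way to check what a particular parser does with these variants is to feed them in and look at what comes back. The sketch below uses Python's standard-library xml.etree.ElementTree; it says nothing about other parsers, and the one-element documents and the helper name location_code are invented for the example.

```python
import xml.etree.ElementTree as ET

def location_code(xml_text):
    """Return the locationCode attribute as this parser reports it."""
    return ET.fromstring(xml_text).get("locationCode")

# Hypothetical documents covering the variants discussed above.
variants = {
    "empty": '<Channel locationCode="" code="BHZ"/>',
    "two spaces": '<Channel locationCode="  " code="BHZ"/>',
    "one newline": '<Channel locationCode="\n" code="BHZ"/>',
    "two newlines": '<Channel locationCode="\n\n" code="BHZ"/>',
}

reported = {name: location_code(doc) for name, doc in variants.items()}
for name, value in reported.items():
    print(f"{name:>12}: {value!r}")

# A whitespace-insensitive comparison treats every variant as "blank".
all_blank = all(value.strip() == "" for value in reported.values())
```

Comparing after strip() is the "code explicitly for the cases" route: it does not depend on whether the parser has already turned newlines into spaces.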



                    On Mon, Jul 28, 2014 at 9:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
                    Hi

                    Being on the cheap side of the Atlantic, I'll save us $0.00068 and
                    make a stab at the underlying issue. :)

                    Here, with lots of stuff cut out, is how a channel is "identified" in
                    stationXML via the fdsn station web service at the IRIS DMC,
                    http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404

                    <Network code="GE" >
                    <Station code="UGM">
                    <Channel locationCode="  " code="BHZ">

                    Another implementation of the same web service (not sure of url) gives
                    back this:

                    <Network code="GE" >
                    <Station code="UGM">
                    <Channel locationCode="" code="BHZ">

                    with locationCode="" vs ="  " being the difference under consideration.

                    There are two basic issues being discussed (and yes, more beer would help! :)

                    1) Should all valid StationXML documents be required to use the exact
                    same string of characters to represent the location ID for this
                    channel? This would allow a comparison operation to be "simple" in
                    that it can compare the attribute values without additional
                    processing.

                    2) If we agree to 1), then what should those exact characters be? The
                    current top choices are
                    a) empty=""
                    b) two spaces="  "
                    c) two dashes="--".

                    1) seems less controversial than 2) in that greater compatibility is
                    generally seen as positive.
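
Until point 1) is settled, a client that wants to compare channels across data centers needs the "additional processing" mentioned above. A minimal sketch of that processing follows; the function names and the choice of "" as the internal canonical form are assumptions made for illustration, not something the thread has agreed on.

```python
# Hypothetical canonical form; the thread has not settled on one.
CANONICAL_EMPTY = ""

def canonical_location(code):
    """Map the candidate spellings of an empty location ID to one value."""
    if code is None:
        raise ValueError("location code is missing, not empty")
    stripped = code.strip()
    if stripped in ("", "--"):
        return CANONICAL_EMPTY
    return stripped

def same_channel(a, b):
    """Compare two (network, station, location, channel) tuples."""
    net_a, sta_a, loc_a, cha_a = a
    net_b, sta_b, loc_b, cha_b = b
    return (net_a, sta_a, canonical_location(loc_a), cha_a) == (
        net_b, sta_b, canonical_location(loc_b), cha_b)

# The two GE.UGM documents quoted above would then match.
print(same_channel(("GE", "UGM", "  ", "BHZ"), ("GE", "UGM", "", "BHZ")))
```

If all documents used the same string, same_channel would reduce to a plain tuple comparison, which is the simplification point 1) is after.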

                    This is primarily a question about the form of the stationXML
                    documents, but obviously there are connections to the way requests are
                    formed, the relationship to miniseed/seed, the way things are coded in
                    software and how much detailed understanding we expect of end users.

                    Philip



                    On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:
                    Hello all,

                    Can someone give a concise statement of the original problem being
                    discussed? Is it only or primarily a concern about XML?

                    It seems to me that with modern languages a string that is empty or has 1-N
                    spaces is the same thing - there is often an implicit or explicit trim()
                    function hiding in a processing pipeline. A null string is not the same.
                    So an empty or blank string is the same, valid location code, and null is
                    an undefined or uninitialized location code.

                    With regard to the "--" pseudo-value for the location code, is this not needed
                    because it is sometimes impossible or difficult to represent an empty
                    string or even a string? For example on the command line or in a RESTful WS
                    URI? (Or a URI on the command line!) So it may be that the use of "--" for
                    intermediate processing and requests could be tolerated and somehow made
                    official, while empty or blank-only strings are official for persistent
                    data.
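
The split suggested here, "--" in requests and an empty or blank string in stored data, could look roughly like the sketch below. The helper names are hypothetical, and the loc parameter name is an assumption; the other parameter names are modeled on the query URL quoted elsewhere in this thread.

```python
from urllib.parse import urlencode

def location_for_request(code):
    """Spell a location code for a URL or command line: blank becomes "--"."""
    return code if code.strip() else "--"

def location_from_request(value):
    """Turn the request spelling back into the stored (empty) value."""
    return "" if value == "--" else value

# "loc" is an assumed parameter name; the rest follow the quoted query URL.
params = {
    "net": "GE",
    "sta": "UGM",
    "loc": location_for_request("  "),
    "cha": "BHZ",
    "level": "channel",
}
query = urlencode(params)
print(query)
```

The translation lives only at the request boundary, so stored identifiers never contain the "--" spelling.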

                    Just my 0.02€ = $0.0268

                    Best regards to all,

                    Anthony



                    On 27/07/2014 04:52, Chad Trabant wrote:

                    Hi Marcelo,

                    Thanks for your thoughts as well. Something that you and Joachim are not
                    addressing are the concerns about an empty ID that have been brought up by
                    more than one person. The answer that empty strings are technically
                    possible and it all works in Python/SeisComP is less than satisfying. The
                    observations from Python, ObsPy and SeisComP are a few of many that need to
                    be taken into account.

                    I agree that there is a long tail consideration for the "--" location ID
                    solution. If you understand that some folks find an empty ID to be problematic
                    regardless of whether it is XML, SEED, text, whatever, then you might see
                    where this proposal comes from. Yes, we would need to treat empty location
                    IDs and "--" as synonyms for a very long time. Empty strings in XML mean
                    you will need to map empty IDs to empty strings, NULL and whatever an XML
                    parser might or might not produce for a long time as well (think beyond
                    Python and SeisComP). Either is possible, only one of them is a unique
                    mapping.

                    If the main consideration is the least amount of disruption, then the
                    answer is obvious to me: the FDSN can sanction that the two-space string is
                    the XML synonym for the empty SEED location ID and we adjust the schema to
                    make sure a string of whitespace is preserved. Then SeisComP can change
                    its relatively new StationXML implementation and ALL existing clients will
                    be compatible with all metadata and, most importantly, we would have
                    consistent metadata.

                    If the empty string ID representation is adopted it would, in effect,
                    mean that the DMC would need to change its metadata service and (more
                    importantly) all users of the DMC's metadata service would need to
                    transition to a new metadata channel naming scheme. This is certainly not
                    out of the question, but it is not something we would do without careful
                    consideration. I do not find the two-space strings all that great, but they
                    are here and something the DMC and users of the DMC have dealt with. Issues
                    have been identified with empty location IDs by us and our users. If DMC is
                    going to change, and push the change on all users of the DMC's StationXML,
                    it would be much more compelling to have a solution that addresses the low
                    level issues.

                    regards,
                    Chad





                    • The spec does appear to state that all whitespace characters are converted to the same character. So it does distinguish between the number of whitespace characters but not the type.

                      http://www.w3.org/TR/REC-xml/#AVNormalize

                      I think we can just expect all XML parsers to adhere to that; otherwise an empty string seems the safest solution.

                      Lion


                      On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:

                      Hello Rob,

                      Rob Newman wrote on 24.07.2014 18:51:

                      For what it's worth, I would also vote for the "--" standard. To quote
                      from the Zen of Python
                      http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html
                      (my language of choice):


                      "Beautiful is better than ugly.
                      Explicit is better than implicit.
                      Simple is better than complex.
                      Complex is better than complicated.
                      Flat is better than nested.
                      Sparse is better than dense.
                      Readability counts.
                      Special cases aren't special enough to break the rules.
                      Although practicality beats purity.
                      Errors should never pass silently.
                      Unless explicitly silenced."

                      I'd add "Compatible is better than incompatible." :)


                      Number 2 is especially relevant here:
                      "Explicit is better than implicit."

                      My favorite would be:

                      "Special cases aren't special enough to break the rules."

                      Quoted whitespace and nulls are painful. Code what you mean, and mean what
                      you code. It's easier for everyone.

                      But what if we simply *mean* "empty string"?

                      The issue is not about beauty, pain or ease. It's about standard
                      conformance. We already have a channel naming standard. If a new data format
                      cannot accommodate existing channel naming, then the new format is flawed.
                      But that's not even the case here...

                      An XML document that contains

                      <Channel locationCode="" ...

                      is not malformed. There's an attribute that *explicitly* contains an empty
                      string and a parser has to produce it as such. Not as null, nil or none, but
                      as an empty string. Otherwise the parser is broken and needs to be fixed,
                      not the data!

                      Again: It's not about beauty. We all agree that current channel naming is
                      not particularly beautiful and has limitations. But our business is not to
                      try to solve that issue now and here.

                      Cheers
                      Joachim
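To illustrate the point above about an empty attribute not being null, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. The two `Channel` fragments are made up for illustration and are not real service output; other parsers and languages may behave differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: one with an explicitly empty locationCode,
# one where the attribute is missing altogether.
empty = ET.fromstring('<Channel locationCode="" code="BHZ"/>')
missing = ET.fromstring('<Channel code="BHZ"/>')

empty_value = empty.get("locationCode")      # attribute present: empty string
missing_value = missing.get("locationCode")  # attribute absent: None

print(repr(empty_value), repr(missing_value))
```

With this parser, "empty" and "not there" stay distinguishable, which is the distinction being argued for.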



                      --
                      Sent from my iClayTablet

                      ________________________________

                      Anthony Lomax
                      161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
                      tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net web:
                      http://www.alomax.net

                      Twitter: @ALomaxNet
                      Science & Special Topics: http://www.alomax.net/science
                      Software: http://www.alomax.net/software - updates:
                      https://twitter.com/ALomaxNet
                      ________________________________




                      • That spec also says:
                        If the attribute type is not CDATA, then the XML processor MUST
                        further process the normalized attribute value by discarding any
                        leading and trailing space (#x20) characters, and by replacing
                        sequences of space (#x20) characters by a single space (#x20)
                        character.

                        So, by this you should always end up with an empty string even if you
                        have two or more spaces. My experience with parsers is that this does
                        not happen, but since it is in the spec it could. Your mileage may
                        vary...

                        Philip
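A quick way to check what a given parser does with whitespace-only attribute values is to try it. The sketch below uses Python's standard-library `xml.etree.ElementTree` (expat-based) on made-up `Channel` fragments with no DTD or schema attached, so the attribute is treated as CDATA. It only shows this one parser's behavior; others may differ, as noted above.

```python
import xml.etree.ElementTree as ET

def location_code(xml_text):
    """Return the locationCode attribute of a single hypothetical Channel element."""
    return ET.fromstring(xml_text).get("locationCode")

# Two literal spaces: kept as-is, since no non-CDATA type is declared.
two_spaces = location_code('<Channel locationCode="  " code="BHZ"/>')

# Two literal newlines: each one is normalized to a space on parsing.
two_newlines = location_code('<Channel locationCode="\n\n" code="BHZ"/>')

print(repr(two_spaces), repr(two_newlines))
```

So with this parser the two-space value survives, and the newline-newline variant ends up looking identical to it.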


                        On Mon, Jul 28, 2014 at 10:16 AM, Lion Krischer
                        <krischer<at>geophysik.uni-muenchen.de> wrote:
                        The spec does appear to state that all whitespace characters are converted to the same character. So it does distinguish between the number of whitespace characters but not the type.

                        http://www.w3.org/TR/REC-xml/#AVNormalize

                        I think we can just expect all XML parsers to adhere to that, otherwise an empty string seems the safest solution.

                        Lion


                        On 28 Jul 2014, at 15:53, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:

                        One more thing is that this is not something that we can resolve based
                        on the XML spec as all three variations are well-formed and can be
                        valid XML depending on the schema.

                        There is another issue in that white space in xml attributes can be
                        normalized by the parsers, but this behavior is not standard across
                        all parsers, so dealing with attributes that are not limited to
                        non-whitespace characters means that you likely have to consider
                        empty, one space and two spaces, and even N spaces as all being
                        equivalent. Depending on the parser, you may be able to have this
                        handled for you, or you may have to code explicitly for the cases.

                        I think per the xml spec, even these two are considered "the same" as well:
                        locationCode="
                        "
                        locationCode="

                        "
                        as newlines in attributes can be normalized to whitespace on parsing.
                        But again, exactly how it is done depends on the parser.

                        Philip

                        PS I am NOT advocating we choose newline-newline as the default
                        location id!!! :)



                        On Mon, Jul 28, 2014 at 9:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
                        Hi

                        Being on the cheap side of the Atlantic, I'll save us $0.00068 and
                        make a stab at the underlying issue. :)

                        Here, with lots of stuff cut out, is how a channel is "identified" in
                        stationXML via the fdsn station web service at the IRIS DMC,
                        http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404

                        <Network code="GE" >
                        <Station code="UGM">
                        <Channel locationCode="  " code="BHZ">

                        Another implementation of the same web service (not sure of url) gives
                        back this:

                        <Network code="GE" >
                        <Station code="UGM">
                        <Channel locationCode="" code="BHZ">

                        with locationCode="" vs ="  " being the difference under consideration.

                        There are two basic issues being discussed (and yes, more beer would help! :)

                        1) Should all valid stationXML documents be required to use the exact
                        same string of characters to represent the location id for this
                        channel. This would allow a comparison operation to be "simple" in
                        that it can compare the attribute values without additional
                        processing.

                        2) If we agree to 1), then what should those exact characters be? The
                        current top choices are
                        a) empty=""
                        b) two spaces="  "
                        c) two dashes="--".

                        1) seems less controversial than 2) in that greater compatibility is
                        generally seen as positive.

                        This is primarily a question about the form of the stationXML
                        documents, but obviously there are connections to the way requests are
                        formed, the relationship to miniseed/seed, the way things are coded in
                        software and how much detailed understanding we expect of end users.

                        Philip
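Until issue 1) is settled, a client comparing channels from different data centers has to do the "additional processing" itself. A minimal sketch, assuming the three candidate spellings are all meant as the same empty location ID; the function names and the choice of "" as the internal form are arbitrary assumptions, not anything agreed on in this thread:

```python
# Spellings treated as "empty" once surrounding blanks are trimmed (assumption).
EMPTY_SYNONYMS = {"", "--"}

def canonical_location(code):
    """Map "", blank-only strings and "--" to one internal form ("")."""
    trimmed = code.strip()
    return "" if trimmed in EMPTY_SYNONYMS else trimmed

def channel_key(net, sta, loc, cha):
    """Build a comparable key from the four SEED identifiers."""
    return (net, sta, canonical_location(loc), cha)

dmc_style = channel_key("GE", "UGM", "  ", "BHZ")     # two spaces
other_style = channel_key("GE", "UGM", "", "BHZ")     # empty string
dashed_style = channel_key("GE", "UGM", "--", "BHZ")  # two dashes

print(dmc_style == other_style == dashed_style)
```

If all documents used one exact string, the `canonical_location` step would not be needed and the attribute values could be compared directly.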



                        On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:
                        Hello all,

                        Can someone give a concise statement of the original problem being
                        discussed? Is it only or primarily a concern about XML?

                        It seems to me that with modern languages a string that is empty or has 1-N
                        spaces is the same thing - there are often implicit or explicit trim()
                        function hiding in a processing pipeline. A null string is not the same.
                        So an empty or blank string is the same, valid location code, and null is
                        undefined or uninitialized location code.

                        With regards to the "--" pseudo-value for the location code, is this not needed
                        because sometimes it is difficult or impossible to represent an empty
                        string or even a string? For example on the command line or in a RESTful WS
                        URI? (Or a URI on the command line!) So it may be that the use of "--" for
                        intermediate processing and requests could be tolerated and somehow made
                        official, while empty or blank-only strings stay official for persistent
                        data.

                        Just my 0.02€ = $0.0268

                        Best regards to all,

                        Anthony
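As a sketch of the request-side use of "--" described above: the helper below builds an fdsnws-station query URL and substitutes "--" whenever the location ID is empty or blank. The base URL and the other parameters are taken from the example query quoted elsewhere in this thread; the function name `station_query_url` is hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Base URL as it appears in the example query quoted in this thread.
BASE_URL = "http://service.iris.edu/fdsnws/station/1/query"

def station_query_url(net, sta, loc, cha):
    """Build a channel-level query URL, writing an empty location as "--"."""
    loc_param = loc.strip() or "--"  # "" and blank-only strings become "--"
    params = {
        "net": net,
        "sta": sta,
        "loc": loc_param,
        "cha": cha,
        "level": "channel",
        "format": "xml",
        "nodata": "404",
    }
    return BASE_URL + "?" + urlencode(params)

print(station_query_url("GE", "UGM", "  ", "BHZ"))
```

The point is only that "--" survives URL encoding and shell quoting unchanged, whereas an empty or blank value is easy to lose along the way.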



                        On 27/07/2014 04:52, Chad Trabant wrote:

                        Hi Marcelo,

                        Thanks for your thoughts as well. Something that you and Joachim are not
                        addressing are the concerns about an empty ID that have been brought up by
                        more than one person. The answer that empty strings are technically
                        possible and it all works in Python/SeisComP is less than satisfying. The
                        observations from Python, ObsPy and SeisComP are a few of many that need to
                        be taken into account.

                        I agree that there is a long tail consideration for the "--" location ID
                        solution. If you understand that some folks find an empty ID to be problematic
                        regardless of whether it is XML, SEED, text, whatever, then you might see
                        where this proposal comes from. Yes, we would need to treat empty location
                        IDs and "--" as synonyms for a very long time. Empty strings in XML mean
                        you will need to map empty IDs to empty strings, NULL and whatever an XML
                        parser might or might not produce for a long time as well (think beyond
                        Python and SeisComP). Either is possible, only one of them is a unique
                        mapping.

                        If the main considerations are for the least amount of disruption then the
                        answer is obvious to me: the FDSN can sanction that the two-space string is
                        the XML synonym for the empty SEED location ID and we adjust the schema to
                        make sure a string of whitespaces is preserved. Then SeisComP can change
                        its relatively new StationXML implementation and ALL existing clients will
                        be compatible with all metadata and, most importantly, we would have
                        consistent metadata.

                        If the empty string ID representation is adopted it would, in effect,
                        mean that the DMC would need to change its metadata service and (more
                        importantly) all users of the DMC's metadata service would need to
                        transition to a new metadata channel naming scheme. This is certainly not
                        out of the question, but it is not something we would do without careful
                        consideration. I do not find the two-space strings all that great, but they
                        are here and something the DMC and users of the DMC have dealt with. Issues
                        have been identified with empty location IDs by us and our users. If DMC is
                        going to change, and push the change on all users of the DMC's StationXML,
                        it would be much more compelling to have a solution that addresses the low
                        level issues.

                        regards,
                        Chad


                        ----- Original Message -----
                        From: "Marcelo Bianchi" <m.tchelo<at>gmail.com>
                        To: "IRIS Web Services List" <webservices<at>iris.washington.edu>
                        Sent: Friday, July 25, 2014 7:38:17 PM
                        Subject: Re: [webservices] A question of location ID, how to represent empty
                        IDs in XML?

                        Hi Philip and All,

                        I totally agree with Joachim; I was planning to answer but he was much
                        faster. What you guys are proposing is not a solution. StationXML
                        supports the empty string nicely, and it is not null. There is a type
                        difference here in Python and in any other language, and it can be nicely
                        handled internally.

                        Also the location id is not just a string, it is a key entry to link
                        miniseed to metadata, and making an exception at this level just
                        because a user interface cannot properly render it without ambiguity
                        does not sound like a proper proposal. I am not in favor of
                        creating an exception that will have to be carried along for
                        decades to come. Alternative solutions for this issue should be
                        sought in the end-user interface.

                        with my best regards,

                        Marcelo Bianchi
                        --


                        2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell<at>seis.sc.edu>:

                        It sounds like you are saying "change is hard, so we shouldn't do it".
                        I would argue that change is hard and so if we don't do it now it will
                        never happen. StationXML is new enough that there is already a
                        disruption, we should seize the chance. If we do not do something now
                        about null loc ids, it will be a decade or two before we get another
                        chance.

                        It is time to drive the stake through the heart of null location ids.
                        Kill the evil while we have a chance.

                        Philip


                        On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:

                        Hello Rob,

                        Rob Newman wrote on 24.07.2014 18:51:

                        For what it's worth, I would also vote for the "--" standard. To quote
                        from the Zen of Python
                        http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html
                        (my language of choice):


                        "Beautiful is better than ugly.
                        Explicit is better than implicit.
                        Simple is better than complex.
                        Complex is better than complicated.
                        Flat is better than nested.
                        Sparse is better than dense.
                        Readability counts.
                        Special cases aren't special enough to break the rules.
                        Although practicality beats purity.
                        Errors should never pass silently.
                        Unless explicitly silenced."

                        I'd add "Compatible is better than incompatible." :)


                        Number 2 is especially relevant here:
                        "Explicit is better than implicit."

                        My favorite would be:

                        "Special cases aren't special enough to break the rules."

                        Quoted whitespace and nulls are painful. Code what you mean, and mean what
                        you code. It's easier for everyone.

                        But what if we simply *mean* "empty string"?

                        The issue is not about beauty, pain or ease. It's about standard
                        conformance. We already have a channel naming standard. If a new data format
                        cannot accommodate existing channel naming, then the new format is flawed.
                        But that's not even the case here...

                        An XML document that contains

                        <Channel locationCode="" ...

                        is not malformed. There's an attribute that *explicitly* contains an empty
                        string and a parser has to produce it as such. Not as null, nil or none, but
                        as an empty string. Otherwise the parser is broken and needs to be fixed,
                        not the data!

                        Again: It's not about beauty. We all agree that current channel naming is
                        not particularly beautiful and has limitations. But our business is not to
                        try to solve that issue now and here.

                        Cheers
                        Joachim



                        --
                        Sent from my iClayTablet

                        ________________________________

                        Anthony Lomax
                        161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
                        tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net web:
                        http://www.alomax.net

                        Twitter: @ALomaxNet
                        Science & Special Topics: http://www.alomax.net/science
                        Software: http://www.alomax.net/software - updates:
                        https://twitter.com/ALomaxNet
                        ________________________________



                        • Well in that case the only sensible solution seems to be to use an empty string to encode an empty location.

                          Lion


                          On 28 Jul 2014, at 16:48, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:

                          That spec also says:
                          If the attribute type is not CDATA, then the XML processor MUST
                          further process the normalized attribute value by discarding any
                          leading and trailing space (#x20) characters, and by replacing
                          sequences of space (#x20) characters by a single space (#x20)
                          character.

                          So, by this you should always end up with an empty string even if you
                          have two or more spaces. My experience with parsers is that this does
                          not happen, but since it is in the spec it could. Your mileage may
                          vary...

                          Philip


                          On Mon, Jul 28, 2014 at 10:16 AM, Lion Krischer
                          <krischer<at>geophysik.uni-muenchen.de> wrote:
                          The spec does appear to state that all whitespace characters are converted to the same character. So it does distinguish between the number of whitespace characters but not the type.

                          http://www.w3.org/TR/REC-xml/#AVNormalize

                          I think we can just expect all XML parsers to adhere to that, otherwise an empty string seems the safest solution.

                          Lion


                          On 28 Jul 2014, at 15:53, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:

                          One more thing is that this is not something that we can resolve based
                          on the XML spec as all three variations are well-formed and can be
                          valid XML depending on the schema.

                          There is another issue in that white space in xml attributes can be
                          normalized by the parsers, but this behavior is not standard across
                          all parsers, so dealing with attributes that are not limited to
                          non-whitespace characters means that you likely have to consider
                          empty, one space and two spaces, and even N spaces as all being
                          equivalent. Depending on the parser, you may be able to have this
                          handled for you, or you may have to code explicitly for the cases.

                          I think per the xml spec, even these two are considered "the same" as well:
                          locationCode="
                          "
                          locationCode="

                          "
                          as newlines in attributes can be normalized to whitespace on parsing.
                          But again, exactly how it is done depends on the parser.

                          Philip

                          PS I am NOT advocating we choose newline-newline as the default
                          location id!!! :)



                          On Mon, Jul 28, 2014 at 9:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
                          Hi

                          Being on the cheap side of the Atlantic, I'll save us $0.00068 and
                          make a stab at the underlying issue. :)

                          Here, with lots of stuff cut out, is how a channel is "identified" in
                          stationXML via the fdsn station web service at the IRIS DMC,
                          http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404

                          <Network code="GE" >
                          <Station code="UGM">
                          <Channel locationCode="  " code="BHZ">

                          Another implementation of the same web service (not sure of url) gives
                          back this:

                          <Network code="GE" >
                          <Station code="UGM">
                          <Channel locationCode="" code="BHZ">

                          with locationCode="" vs ="  " being the difference under consideration.

                          There are two basic issues being discussed (and yes, more beer would help! :)

                          1) Should all valid stationXML documents be required to use the exact
                          same string of characters to represent the location id for this
                          channel. This would allow a comparison operation to be "simple" in
                          that it can compare the attribute values without additional
                          processing.

                          2) If we agree to 1), then what should those exact characters be? The
                          current top choices are
                          a) empty=""
                          b) two spaces="  "
                          c) two dashes="--".

                          1) seems less controversial than 2) in that greater compatibility is
                          generally seen as positive.

                          This is primarily a question about the form of the stationXML
                          documents, but obviously there are connections to the way requests are
                          formed, the relationship to miniseed/seed, the way things are coded in
                          software and how much detailed understanding we expect of end users.

                          Philip



                          On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:
                          Hello all,

                          Can someone give a concise statement of the original problem being
                          discussed? Is it only or primarily a concern about XML?

                          It seems to me that with modern languages a string that is empty or has 1-N
                          spaces is the same thing - there are often implicit or explicit trim()
                          function hiding in a processing pipeline. A null string is not the same.
                          So an empty or blank string is the same, valid location code, and null is
                          undefined or uninitialized location code.

                          With regards to the "--" pseudo-value for the location code, is this not needed
                          because sometimes it is difficult or impossible to represent an empty
                          string or even a string? For example on the command line or in a RESTful WS
                          URI? (Or a URI on the command line!) So it may be that the use of "--" for
                          intermediate processing and requests could be tolerated and somehow made
                          official, while empty or blank-only strings stay official for persistent
                          data.

                          Just my 0.02€ = $0.0268

                          Best regards to all,

                          Anthony



                          On 27/07/2014 04:52, Chad Trabant wrote:

                          Hi Marcelo,

                          Thanks for your thoughts as well. Something that you and Joachim are not
                          addressing are the concerns about an empty ID that have been brought up by
                          more than one person. The answer that empty strings are technically
                          possible and it all works in Python/SeisComP is less than satisfying. The
                          observations from Python, ObsPy and SeisComP are a few of many that need to
                          be taken into account.

                          I agree that there is a long tail consideration for the "--" location ID
                          solution. If you understand that some folks find an empty ID to be problematic
                          regardless of whether it is XML, SEED, text, whatever, then you might see
                          where this proposal comes from. Yes, we would need to treat empty location
                          IDs and "--" as synonyms for a very long time. Empty strings in XML mean
                          you will need to map empty IDs to empty strings, NULL and whatever an XML
                          parser might or might not produce for a long time as well (think beyond
                          Python and SeisComP). Either is possible, only one of them is a unique
                          mapping.

                          If the main considerations are for the least amount of disruption then the
                          answer is obvious to me: the FDSN can sanction that the two-space string is
                          the XML synonym for the empty SEED location ID and we adjust the schema to
                          make sure a string of whitespaces is preserved. Then SeisComP can change
                          its relatively new StationXML implementation and ALL existing clients will
                          be compatible with all metadata and, most importantly, we would have
                          consistent metadata.

                          If the empty string ID representation is adopted it would, in effect,
                          mean that the DMC would need to change its metadata service and (more
                          importantly) all users of the DMC's metadata service would need to
                          transition to a new metadata channel naming scheme. This is certainly not
                          out of the question, but it is not something we would do without careful
                          consideration. I do not find the two-space strings all that great, but they
                          are here and something the DMC and users of the DMC have dealt with. Issues
                          have been identified with empty location IDs by us and our users. If DMC is
                          going to change, and push the change on all users of the DMC's StationXML,
                          it would be much more compelling to have a solution that addresses the low
                          level issues.

                          regards,
                          Chad


                          ----- Original Message -----
                          From: "Marcelo Bianchi" <m.tchelo<at>gmail.com>
                          To: "IRIS Web Services List" <webservices<at>iris.washington.edu>
                          Sent: Friday, July 25, 2014 7:38:17 PM
                          Subject: Re: [webservices] A question of location ID, how to represent empty
                          IDs in XML?

                          Hi Philip and All,

                          I totally agree with Joachim; I was planning to answer but he was much
                          faster. What you guys are proposing is not a solution. StationXML
                          supports the empty string nicely, and it is not null. There is a type
                          difference here in Python and in any other language, and it can be
                          handled cleanly internally.

                          Also, the location ID is not just a string; it is a key that links
                          miniseed to metadata, and making an exception at this level just
                          because a user interface cannot properly render it without ambiguity
                          does not sound like a proper proposal. I am not in favor of
                          creating an exception that will have to be carried along for
                          decades to come. Alternative solutions for this issue should be
                          sought in the end-user interface.

                          with my best regards,

                          Marcelo Bianchi
                          --


                          2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell<at>seis.sc.edu>:

                          It sounds like you are saying "change is hard, so we shouldn't do it".
                          I would argue that change is hard and so if we don't do it now it will
                          never happen. StationXML is new enough that there is already a
                          disruption, we should seize the chance. If we do not do something now
                          about null loc ids, it will be a decade or two before we get another
                          chance.

                          It is time to drive the stake through the heart of null location ids.
                          Kill the evil while we have a chance.

                          Philip


                          On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:

                          Hello Rob,

                          Rob Newman wrote on 24.07.2014 18:51:

                          For what it's worth, I would also vote for the "--" standard. To quote
                          from the Zen of Python
                          http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html
                          (my language of choice):


                          "Beautiful is better than ugly.
                          Explicit is better than implicit.
                          Simple is better than complex.
                          Complex is better than complicated.
                          Flat is better than nested.
                          Sparse is better than dense.
                          Readability counts.
                          Special cases aren't special enough to break the rules.
                          Although practicality beats purity.
                          Errors should never pass silently.
                          Unless explicitly silenced."

                          I'd add "Compatible is better than incompatible." :)


                          Number 2 is especially relevant here:
                          "Explicit is better than implicit."

                          My favorite would be:

                          "Special cases aren't special enough to break the rules."

                          Quoted whitespace and nulls are painful. Code what you mean, and mean what
                          you code. It's easier for everyone.

                          But what if we simply *mean* "empty string"?

                          The issue is not about beauty, pain or ease. It's about standard
                          conformance. We already have a channel naming standard. If a new data format
                          cannot accommodate existing channel naming, then the new format is flawed.
                          But that's not even the case here...

                          An XML document that contains

                          <Channel locationCode="" ...

                          is not malformed. There's an attribute that *explicitly* contains an empty
                          string and a parser has to produce it as such. Not as null, nil or none, but
                          as an empty string. Otherwise the parser is broken and needs to be fixed,
                          not the data!
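
As a small illustration of the distinction drawn here, the sketch below parses three stripped-down Channel elements with Python's standard-library xml.etree.ElementTree. The element snippets are invented for the example and omit the StationXML namespace. It only shows what this one non-validating parser returns; schema-aware or data-binding parsers in other languages may behave differently, which is the concern raised elsewhere in this thread.

```python
import xml.etree.ElementTree as ET

# Three hypothetical Channel elements: empty code, two-space code, attribute absent.
empty = ET.fromstring('<Channel locationCode="" code="BHZ"/>')
spaces = ET.fromstring('<Channel locationCode="  " code="BHZ"/>')
missing = ET.fromstring('<Channel code="BHZ"/>')

empty_code = empty.get("locationCode")      # an empty string, not None
spaces_code = spaces.get("locationCode")    # two spaces, preserved as written
missing_code = missing.get("locationCode")  # None, because the attribute is absent

print(repr(empty_code), repr(spaces_code), repr(missing_code))
```

With this parser, "empty" and "absent" stay distinguishable, and the two-space value is not trimmed unless the client does so itself.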

                          Again: It's not about beauty. We all agree that current channel naming is
                          not particularly beautiful and has limitations. But our business is not to
                          try to solve that issue now and here.

                          Cheers
                          Joachim

                          _______________________________________________
                          webservices mailing list
                          webservices<at>iris.washington.edu
                          http://www.iris.washington.edu/mailman/listinfo/webservices



                          --
                          Sent from my iClayTablet

                          ________________________________

                          Anthony Lomax
                          161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
                          tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net web:
                          http://www.alomax.net

                          Twitter: @ALomaxNet
                          Science & Special Topics: http://www.alomax.net/science
                          Software: http://www.alomax.net/software - updates:
                          https://twitter.com/ALomaxNet
                          ________________________________



                  • Hi all,

                    leaving the greater issues aside: why not just force the location code to have a certain form with a regex in the schema?

                    The following pattern will match any uppercase alphanumeric two-character code, or two spaces:

                    ^([A-Z0-9]{2}|  )$

                    It matches "AA", "00", "10", "A1", "  ", …
                    but not "--", "", "-", "a1", ...

                    Everything else will get rejected. Then one can be sure that it is consistent everywhere (assuming people test their web services against the schema which is a good idea in any case). Similar regexes should also be defined for the network, station, and channel codes to assure compatibility with SEED. In general it would be a good idea to have the schema enforce as many things as possible and leave little to no room for interpretations.
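
A quick way to try the proposed pattern outside of a schema is sketched below in Python. The function name is made up for the example, and re.fullmatch is used instead of the ^ and $ anchors so that a trailing newline cannot slip through. This checks only the regular expression itself, not how an XML Schema validator would apply it.

```python
import re

# Two uppercase alphanumerics, or exactly two spaces (the proposed pattern).
LOCATION_CODE = re.compile(r"[A-Z0-9]{2}|  ")


def is_valid_location_code(code: str) -> bool:
    """Return True if the whole string matches the proposed pattern."""
    return LOCATION_CODE.fullmatch(code) is not None


accepted = ["AA", "00", "10", "A1", "  "]
rejected = ["--", "", "-", "a1"]

for code in accepted + rejected:
    print(repr(code), is_valid_location_code(code))
```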


                    Now whether one uses two spaces, two dashes, an empty string or what not for an empty location code does not really matter. All are syntactically valid XML and thus any parser can be expected to be able to deal with them. Consistency is by far the most important thing in my opinion. So best choose one and force it with the schema. This will reduce errors and misinterpretations in the long run.

                    In terms of existing StationXML parsers I assume most are just stripping whitespace from the location code and thus "" and "  " should already work, resulting in minimal disruption in the users' workflows. "--" would require software to be updated and looks a little bit weird in my opinion, and unsuspecting users might interpret it as an invalid location code.


                    Cheers!

                    Lion


                    On 28 Jul 2014, at 15:37, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:

                    Hi

                    Being on the cheap side of the Atlantic, I'll save us $0.00068 and
                    make a stab at the underlying issue. :)

                    Here, with lots of stuff cut out, is how a channel is "identified" in
                    stationXML via the fdsn station web service at the IRIS DMC,
                    http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404

                    <Network code="GE" >
                    <Station code="UGM">
                    <Channel locationCode="  " code="BHZ">

                    Another implementation of the same web service (not sure of url) gives
                    back this:

                    <Network code="GE" >
                    <Station code="UGM">
                    <Channel locationCode="" code="BHZ">

                    with locationCode="" vs ="  " being the difference under consideration.

                    There are two basic issues being discussed (and yes, more beer would help! :)

                    1) Should all valid stationXML documents be required to use the exact
                    same string of characters to represent the location id for this
                    channel? This would allow a comparison operation to be "simple" in
                    that it can compare the attribute values without additional
                    processing.

                    2) If we agree to 1), then what should those exact characters be? The
                    current top choices are
                    a) empty=""
                    b) two spaces="  "
                    c) two dashes="--".

                    1) seems less controversial than 2) in that greater compatibility is
                    generally seen as positive.

                    This is primarily a question about the form of the stationXML
                    documents, but obviously there are connections to the way requests are
                    formed, the relationship to miniseed/seed, the way things are coded in
                    software and how much detailed understanding we expect of end users.
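
To make point 1) concrete, here is a minimal Python sketch comparing the channel identifiers from the two fragments above. The documents are reduced to the three elements shown and carry no namespace, and channel_ids is a hypothetical helper, so this is an illustration rather than a real StationXML reader.

```python
import xml.etree.ElementTree as ET

# Stripped-down versions of the two responses shown above.
DOC_A = ('<Network code="GE"><Station code="UGM">'
         '<Channel locationCode="  " code="BHZ"/></Station></Network>')
DOC_B = ('<Network code="GE"><Station code="UGM">'
         '<Channel locationCode="" code="BHZ"/></Station></Network>')


def channel_ids(xml_text, normalize=False):
    """Collect (network, station, location, channel) tuples from one document."""
    net = ET.fromstring(xml_text)
    ids = set()
    for sta in net.iter("Station"):
        for cha in sta.iter("Channel"):
            # A missing attribute is treated as empty here (an assumption).
            loc = cha.get("locationCode", "")
            if normalize:
                loc = loc.strip(" ")  # the extra processing step under debate
            ids.add((net.get("code"), sta.get("code"), loc, cha.get("code")))
    return ids


print(channel_ids(DOC_A) == channel_ids(DOC_B))              # plain comparison
print(channel_ids(DOC_A, True) == channel_ids(DOC_B, True))  # after trimming
```

Without the trimming step the two documents appear to describe different channels; with it they agree, but every client has to know to apply it.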

                    Philip



                    On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:
                    Hello all,

                    Can someone give a concise statement of the original problem being
                    discussed? Is it only or primarily a concern about XML?

                    It seems to me that with modern languages a string that is empty or has 1-N
                    spaces is the same thing - there is often an implicit or explicit trim()
                    function hiding in a processing pipeline. A null string is not the same.
                    So an empty or blank string is the same, valid location code, and null is
                    an undefined or uninitialized location code.

                    With regard to the "--" pseudo-value for the location code, is this not needed
                    because it is sometimes impossible or difficult to represent an empty
                    string, or even a string? For example on the command line or in a RESTful WS
                    URI? (Or a URI on the command line!) So it may be that the use of "--" for
                    intermediate processing and requests could be tolerated and somehow made
                    official, while empty or blanks-only strings remain official for persistent
                    data.
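
The split suggested here, "--" for requests and an empty or blank string for stored data, could look roughly like the following Python sketch. Both function names are hypothetical, and the choice of the empty string as the stored form is an assumption made only for the example.

```python
def location_to_query(code: str) -> str:
    """Render a location code for a URL or command line; blank becomes "--"."""
    trimmed = code.strip(" ")
    return trimmed if trimmed else "--"


def location_from_query(value: str) -> str:
    """Map a request value back to the stored form; "--" becomes empty."""
    return "" if value == "--" else value.strip(" ")


for stored in ["", "  ", "00", "10"]:
    print(repr(stored), "->", location_to_query(stored))
```

The round trip is lossless only because "--" is not itself a usable location code, which is the "synonym" cost discussed elsewhere in the thread.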

                    Just my 0.02€ = $0.0268

                    Best regards to all,

                    Anthony





                    • Hi Lion,

                      Lion Krischer [07/28/2014 03:54 PM]:
                      ^([A-Z0-9]{2}|  )$

                      It matches "AA", "00", "10", "A1", "  ", …
                      but not "--", "", "-", "a1", ...

                      'Not ""' is a problem as "" is a valid location code according to SEED
                      specification. Which is what all this is actually about. :)

                      In general I like the idea of using regular expressions if we use
                      ^([A-Z0-9]{2}|  |)$

                      Everything else will get rejected. Then one can be sure that it is consistent everywhere (assuming people test their web services against the schema which is a good idea in any case). Similar regexes should also be defined for the network, station, and channel codes to assure compatibility with SEED. In general it would be a good idea to have the schema enforce as many things as possible and leave little to no room for interpretations.


                      Now whether one uses two spaces, two dashes, an empty string or what not for an empty location code does not really matter. All are syntactically valid XML and thus any parser can be expected to be able to deal with them. Consistency is by far the most important thing in my opinion.

                      The most important consistency is with the SEED standard.

                      Cheers
                      Joachim


                      • On Mon, Jul 28, 2014 at 10:18 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:


                        The most important consistency is with the SEED standard.

                        I would argue that consistency for end users is the only thing that
                        matters. Consistency with the SEED spec may be a means to that end,
                        but if the end users do not perceive it as being consistent, it isn't
                        consistent.

                        To me, that means we need to look at the bigger picture. Ideally we
                        would have location ids that could be represented by exactly the same
                        characters in:
                        stationXML
                        miniseed
                        URLS
                        client displays
                        databases
                        and even email
                        in a way that is explicit, consistent and natural for the end user.

                        To be honest, I don't like any of the choices. If I had my way, loc
                        ids would have been defined as strictly two characters like
                        ^([A-Z0-9]{2})$, and 00 would have been what you used if you didn't
                        care. Alas, that horse has left the barn.

                        Maybe not even worth $0.02... :)
                        Philip





                        • On Jul 28, 2014, at 9:03 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:

                          On Mon, Jul 28, 2014 at 10:18 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:


                          The most important consistency is with the SEED standard.

                          I would argue that consistency for end users is the only thing that
                          matters. Consistency with the SEED spec may be a means to that end,
                          but if the end users do not perceive it as being consistent, it isn't
                          consistent.

                          To me, that means we need to look at the bigger picture. Ideally we
                          would have location ids that could be represented by exactly the same
                          characters in:
                          stationXML
                          miniseed
                          URLS
                          client displays
                          databases
                          and even email
                          in a way that is explicit, consistent and natural for the end user.

                          I completely agree that this should be our ultimate goal. The idea of making this change in XML is to set us on just such a path.

                          Here are some others I would add to the list:

                          use in command lines
                          use in other data formats

                          Chad



                          • Chad Trabant wrote on 31.07.2014 07:42:
                            On Jul 28, 2014, at 9:03 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
                            On Mon, Jul 28, 2014 at 10:18 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
                            The most important consistency is with the SEED standard.

                            I would argue that consistency for end users is the only thing that
                            matters. Consistency with the SEED spec may be a means to that end,
                            but if the end users do not perceive it as being consistent, it isn't
                            consistent.

                            To me, that means we need to look at the bigger picture. Ideally we
                            would have location ids that could be represented by exactly the same
                            characters in:
                            stationXML
                            miniseed
                            URLS
                            client displays
                            databases
                            and even email
                            in a way that is explicit, consistent and natural for the end user.

                            I completely agree that this should be our ultimate goal. The idea of making this change in XML is to set us on just such a path.

                            Here are some others I would add to the list:

                            use in command lines
                            use in other data formats

                            That's a pretty ambitious list considering...

                            Chad Trabant wrote on 31.07.2014 06:57:
                            There are many more clients than there are servers, many clients written by users and out of our direct control. Requiring every client to know some post-parsing processing rules is a terrible idea, [...]

                            We are still talking here about a metadata format, aren't we? And you want to prescribe how users shall display empty location codes in GUI displays? You must be kidding...

                            The issue is *not* about other data formats. It is up to every developer to save empty location codes in whatever way they like in their formats, databases, bulletins etc. That is absolutely no problem and hence doesn't require a solution.

                            Here the issue is about representing data in XML. Since we have a well accepted and widely implemented channel naming standard *already*, and since users are working with StationXML *already*, what we need *now* is a clarification about the proper representation of *current* channel naming in StationXML.

                            Joachim

                      • Hi Joachim,

                        'Not ""' is a problem as "" is a valid location code according to SEED specification. Which is what all this is actually about. :)

                        In general I like the idea of using regular expressions if we use ^([A-Z0-9]{2}|  |)$
                        The idea was to choose either “  ” or “” which both denote an empty location id. In SEED it is not possible to specify two actual spaces (and not an empty string) as the location identifier as right aligned spaces are considered padding characters and have to be removed by the processing software.

                        Allowing both would mean having two separate “encodings” for the same thing. I am fine with either; it is just important that one is picked as the proper representation of an empty location id.

                        According to the SEED spec it appears that single letter location codes are also valid. Does that happen in the wild?

                        Cheers!

                        Lion



                        Everything else will get rejected. Then one can be sure that it is consistent everywhere (assuming people test their web services against the schema which is a good idea in any case). Similar regexes should also be defined for the network, station, and channel codes to assure compatibility with SEED. In general it would be a good idea to have the schema enforce as many things as possible and leave little to no room for interpretations.


                        Now whether one uses two spaces, two dashes, an empty string or what not for an empty location code does not really matter. All are syntactically valid XML and thus any parser can be expected to be able to deal with them. Consistency is by far the most important thing in my opinion.

                        The most important consistency is with the SEED standard.

                        Cheers
                        Joachim

                        _______________________________________________
                        webservices mailing list
                        webservices<at>iris.washington.edu
                        http://www.iris.washington.edu/mailman/listinfo/webservices




                    • On Jul 28, 2014, at 6:54 AM, Lion Krischer <krischer<at>geophysik.uni-muenchen.de> wrote:

                      Hi all,

                      leaving the greater issues aside: why not just force the location code to have a certain form with a regex in the schema?

                      Hi Lion,

                      We should definitely add the rules to the schema, we just need to decide what they are!

                      The following group will match any uppercase alphanumeric two letter code and two spaces:

                      ^([A-Z0-9]{2}|  )$

                      It matches “AA”, “00”, “10”, “A1”, “  ”, …
                      but not “--”, “”, “-”, “a1”, ...

                      Everything else will get rejected. Then one can be sure that it is consistent everywhere (assuming people test their web services against the schema which is a good idea in any case). Similar regexes should also be defined for the network, station, and channel codes to assure compatibility with SEED. In general it would be a good idea to have the schema enforce as many things as possible and leave little to no room for interpretations.
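As a rough illustration of the pattern idea above, here is a minimal Python sketch that checks candidate location codes against two variants: one accepting two spaces for the empty ID, and one accepting the empty string instead. The names `TWO_SPACE_PATTERN`, `EMPTY_STRING_PATTERN` and `is_valid` are made up for this example. XML Schema patterns are implicitly anchored to the whole value, so `fullmatch` is used here to mimic that. This is not the actual StationXML schema rule, only a way to see which strings each variant would accept.

```python
import re

# Variant discussed in the thread: two uppercase alphanumerics, or exactly two spaces.
TWO_SPACE_PATTERN = re.compile(r"[A-Z0-9]{2}|  ")

# Hypothetical alternative: the empty string stands for the empty location ID.
EMPTY_STRING_PATTERN = re.compile(r"[A-Z0-9]{2}|")


def is_valid(pattern, code):
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(code) is not None


if __name__ == "__main__":
    for code in ["AA", "00", "A1", "  ", "", "--", "-", "a1"]:
        print(repr(code),
              is_valid(TWO_SPACE_PATTERN, code),
              is_valid(EMPTY_STRING_PATTERN, code))
```

Whichever variant is chosen, only one of "  " and "" passes, which is the point of enforcing a single representation.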


                      Now whether one uses two spaces, two dashes, an empty string or what not for an empty location code does not really matter. All are syntactically valid XML and thus any parser can be expected to be able to deal with them. Consistency is by far the most important thing in my opinion. So best choose one and force it with the schema. This will reduce errors and misinterpretations in the long run.

                      In terms of existing StationXML parsers I assume most are just stripping whitespaces from the location code and thus “” and “  ” should already work resulting in minimal disruption in the users’ workflows.

                      Actually, this does not appear to be happening; in the parsers I’ve used, whitespace is not stripped. I have read through the XML specifications until my eyes were crossed to try and understand why this would be the case. Then I wrote some test cases and observed no trimming, see test data and code below. Perhaps this attribute is CDATA for some reason? I think we are stuck with the fact that empty string and two spaces are different.

                      Has anyone observed this automatic trimming on any system?

                      “--“ would require software to be updated and looks a little bit weird in my opinion and unsuspecting users might interpret it as an invalid location code.

                      Yes, it would require software changes; the question is whether what we gain would be worth it. Maybe it looks a little weird, but it is already becoming synonymous in the minds of many because "--" is used for selecting the empty SEED location IDs.

                      Chad


                      PS. here is my test data:

                      ------- chan.xml
                      <FDSNStationXML schemaVersion="1.0">
                      <Channel locationCode="  " startDate="2012-03-12T20:28:00" restrictedStatus="open" endDate="2599-12-31T23:59:59" code="BHZ">
                      </Channel>
                      </FDSNStationXML>
                      -------

                      Here is a test with Python:
                      -------
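                      from xml.etree import ElementTree

                      # Parse the test document shown above
                      with open('chan.xml', 'rt') as f:
                          tree = ElementTree.parse(f)

                      # The root is <FDSNStationXML>; look up its <Channel> child
                      node = tree.find('./Channel')
                      print(node.tag)
                      # Print each attribute in quotes so any whitespace is visible
                      for name, value in sorted(node.attrib.items()):
                          print(' %-4s = "%s"' % (name, value))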
                      -------

                      which produces:
                      -------
                      Channel
                       code = "BHZ"
                       endDate = "2599-12-31T23:59:59"
                       locationCode = "  "
                       restrictedStatus = "open"
                       startDate = "2012-03-12T20:28:00"
                      -------

                      No trimming.

                      Here is a test with Perl:
                      -------
                      use strict;
                      use warnings;
                      use XML::Simple;
                      use Data::Dumper;

                      my $file = 'chan.xml';

                      my $test_data = XMLin($file);
                      print Dumper($test_data);
                      -------

                      which produces:
                      -------
                      $VAR1 = {
                                'schemaVersion' => '1.0',
                                'Channel' => {
                                               'locationCode' => '  ',
                                               'endDate' => '2599-12-31T23:59:59',
                                               'restrictedStatus' => 'open',
                                               'startDate' => '2012-03-12T20:28:00',
                                               'code' => 'BHZ'
                                             }
                              };
                      -------

                      No trimming.

                      There are many more parsing options for Perl and Python and other languages of course, but this is pretty basic stuff. It is how a user such as myself would go about parsing and using StationXML.
                      • Dear all,

                        Maybe some stupid questions: Are there actually any valid use cases
                        for having to distinguish between empty and unknown location codes within
                        the data? If so, does this then also apply to network, station, and
                        channel codes? So if the community opts to go for unknown as well as
                        empty/unset markers for the location field, shouldn't the same
                        markers be used for unknown/unset network etc.?

                        In terms of existing StationXML parsers I assume most are just
                        stripping whitespaces from the location code and thus “” and “  ”
                        should already work resulting in minimal disruption in the
                        users’ workflows.

                        Usually, without a DTD or XML schema definition, all whitespace is
                        significant and should be preserved by any XML parser. I
                        guess that by "StationXML parser" Lion meant more than the plain XML
                        parser. I don't know what other clients do, but ObsPy internally
                        strips all Net/Sta/Loc/Cha field values.

                        Cheers,
                        Robert


                        PS: for some reason my previous mail, sent last weekend, did not appear
                        on this list. I also didn't receive all replies to this thread as
                        archived at
                        http://www.iris.washington.edu/pipermail/webservices/2014-July/thread.html
                        (e.g.
                        http://www.iris.washington.edu/pipermail/webservices/2014-July/000554.html
                        was missing). I didn't get any bounce or error message from mailman
                        either. Any idea?

                        --
                        Dr. Robert Barsch

                        EGU Office Munich
                        Luisenstr. 37
                        80333 Munich
                        Germany

                        Phone: +49-89-21806565
                        Fax: +49-89-218017855
                        eMail: barsch<at>egu.eu


                        • On Jul 31, 2014, at 12:20 AM, Robert Barsch <barsch<at>egu.eu> wrote:

                          Dear all,

                          Maybe some stupid questions: Are there actually any valid use cases
                          for having to distinguish between empty and unknown location codes within
                          the data? If so, does this then also apply to network, station, and
                          channel codes? So if the community opts to go for unknown as well as
                          empty/unset markers for the location field, shouldn't the same
                          markers be used for unknown/unset network etc.?

                          Hi Robert,

                          There is no rule in the SEED world preventing two channel names differing only by location ID; in fact it happens often. Since location can be empty, we can have both XX.STA.00.LHZ and XX.STA..LHZ; if the location were described as "unknown", these two would become ambiguous. I do not know offhand of any cases where the difference is between an empty location ID and a filled one, but it would be a weird case to eliminate (or even describe) in the specification.

                          In terms of existing StationXML parsers I assume most are just
                          stripping whitespaces from the location code and thus “” and “  ”
                          should already work resulting in minimal disruption in the
                          users’ workflows.

                          Usually without DTD or XML schema definition, all whitespaces are
                          significant whitespaces and should be preserved by any XML parsers.

                          Ah. I think that is basically what I have been finding, thanks for the confirmation.

                          Chad


                          I guess that by "StationXML parser" Lion meant more than the plain XML
                          parser. I don't know what other clients do, but ObsPy internally
                          strips all Net/Sta/Loc/Cha field values.




                      • Chad Trabant wrote on 31.07.2014 08:49:
                        In terms of existing StationXML parsers I assume most are just stripping whitespaces from the location code and thus “” and “  ” should already work resulting in minimal disruption in the users’ workflows.

                        Actually, this does not appear to be happening, in the parsers I’ve used the whitespaces are not stripped. I have read through the XML specifications until my eyes were crossed to try and understand why this would be the case. Then I wrote some test cases and observed no trimming, see test data and code below. Perhaps this attribute is CDATA for some reason? I think we are stuck with the fact that empty string and two spaces are different.

                        There may be parsers that do strip whitespaces, but I also doubt that this is required by any standard.

                        Has anyone observed this automatic trimming on any system?

                        No, and I agree that a generic parser should return a raw string as it is in the XML without implicit trimming, nullifying etc. To obtain a trimmed string, it's trivial to trim() the input strings as needed. That's what's done in ObsPy, too.

                        In fact even the *already* empty string location codes from libmseed are trimmed again at ObsPy level, just in case. ;)
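The trim-on-read approach described above can be sketched in a few lines of Python. This is only an illustration: `location_code` is a hypothetical helper, not a function from ObsPy or any other library, and treating a missing attribute like an empty one is an assumption made for the example.

```python
import xml.etree.ElementTree as ET


def location_code(channel):
    """Return the channel's location code with SEED padding spaces removed.

    Hypothetical helper: a missing attribute is treated like an empty one.
    """
    return channel.get("locationCode", "").strip()


# The two competing representations of an empty location ID
padded = ET.fromstring('<Channel code="BHZ" locationCode="  "/>')
empty = ET.fromstring('<Channel code="BHZ" locationCode=""/>')

if __name__ == "__main__":
    # The raw attribute values differ: the parser does not trim
    print(repr(padded.get("locationCode")), repr(empty.get("locationCode")))
    # After explicit trimming both compare equal
    print(location_code(padded) == location_code(empty))
```

The raw attribute values stay different, so the two documents only agree for readers that apply the trimming step themselves.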

                        “--“ would require software to be updated and looks a little bit weird in my opinion and unsuspecting users might interpret it as an invalid location code.

                        Yes, it would require software changes, the question is would what we gain be worth it. Maybe it looks a little weird, but it is already becoming synonymous in the minds of many because "--" is used for *selecting* the empty SEED location IDs.

                        The software changes are just one aspect. In fact, software changes are trivial compared to the nightmare of changing the existing metadata in databases, decades of SEED data, parametric data and so on.

                        Joachim


                        • On Jul 31, 2014, at 12:33 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:

                          Chad Trabant wrote on 31.07.2014 08:49:
                          In terms of existing StationXML parsers I assume most are just stripping whitespaces from the location code and thus “” and “  ” should already work resulting in minimal disruption in the users’ workflows.

                          Actually, this does not appear to be happening, in the parsers I’ve used the whitespaces are not stripped. I have read through the XML specifications until my eyes were crossed to try and understand why this would be the case. Then I wrote some test cases and observed no trimming, see test data and code below. Perhaps this attribute is CDATA for some reason? I think we are stuck with the fact that empty string and two spaces are different.

                          There may be parsers that do strip whitespaces, but I also doubt that this is required by any standard.

                          Has anyone observed this automatic trimming on any system?

                          No, and I agree that a generic parser should return a raw string as it is in the XML without implicit trimming, nullifying etc. To obtain a trimmed string, it's trivial to trim() the input strings as needed. That's what's done in ObsPy, too.

                          In fact even the *already* empty string location codes from libmseed are trimmed again at ObsPy level, just in case. ;)

                          Hi Joachim,

                          You keep coming back to this as if it is meaningful. libmseed and the parts of ObsPy getting information from libmseed are dealing with SEED data, where the current rules of parsing are clear. What is your point exactly?

                          “--“ would require software to be updated and looks a little bit weird in my opinion and unsuspecting users might interpret it as an invalid location code.

                          Yes, it would require software changes, the question is would what we gain be worth it. Maybe it looks a little weird, but it is already becoming synonymous in the minds of many because "--" is used for *selecting* the empty SEED location IDs.

                          The software changes are just one aspect. In fact, software changes are trivial compared to the nightmare of changing the existing metadata in databases, decades of SEED data, parametric data and so on.

                          Why do you think the existing metadata and decades of SEED would need to be changed? Please explain.

                          Chad



                          • Chad Trabant wrote on 31.07.2014 09:42:
                            On Jul 31, 2014, at 12:33 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
                            Has anyone observed this automatic trimming on any system?

                            No, and I agree that a generic parser should return a raw string as it is in the XML without implicit trimming, nullifying etc. To obtain a trimmed string, it's trivial to trim() the input strings as needed. That's what's done in ObsPy, too.

                            In fact even the *already* empty string location codes from libmseed are trimmed again at ObsPy level, just in case. ;)

                            HI Joachim,

                            You keep coming back to this as if it is meaningful. libmseed and the parts of ObsPy getting information from libmseed are dealing with SEED data, where the current rules of parsing are clear. What is your point exactly?

                            The point is that in libmseed you use a different empty location code naming than in StationXML. As I said a number of times, for me that's not a problem at all. Many clients can handle this and those that cannot can be modified easily. In particular, if you applied the same naming rules as in libmseed also in e.g. FetchData (by making a trivial change in the code) they would become consistent at a very low cost. It would be a benefit for the user.

                            Why do you think the existing metadata and decades of SEED would need to be changed? Please explain.

                            Because otherwise a mapping would be required "forever". Until at least very recently you were strongly against any mapping, even calling the idea "rubbish" at one point.

                            Joachim


                            • On Jul 31, 2014, at 1:13 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:

                              Chad Trabant wrote on 31.07.2014 09:42:
                              On Jul 31, 2014, at 12:33 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
                              Has anyone observed this automatic trimming on any system?

                              No, and I agree that a generic parser should return a raw string as it is in the XML without implicit trimming, nullifying etc. To obtain a trimmed string, it's trivial to trim() the input strings as needed. That's what's done in ObsPy, too.

                              In fact even the *already* empty string location codes from libmseed are trimmed again at ObsPy level, just in case. ;)

                              HI Joachim,

                              You keep coming back to this as if it is meaningful. libmseed and the parts of ObsPy getting information from libmseed are dealing with SEED data, where the current rules of parsing are clear. What is your point exactly?

                              The point is that in libmseed you use a different empty location code naming than in StationXML. As I said a number of times, for me that's not a problem at all. Many clients can handle this and those that cannot can be modified easily. In particular, if you applied the same naming rules as in libmseed also in e.g. FetchData (by making a trivial change in the code) they would become consistent at a very low cost. It would be a benefit for the user.

                              Hi Joachim,

                              That is a strange transition from libmseed to web service clients that I do not understand. You appear fixated on updating the clients but, as I have said many times, that by itself will not solve the actual problem; the metadata remains inconsistent and at any rate we do not control many of the most popular parsers of this information, such as user-created programs.

                              Why do you think the existing metadata and decades of SEED would need to be changed? Please explain.

                              Because otherwise a mapping would be required "forever". Until at least very recently you were strongly against any mapping, even calling the idea "rubbish" at one point

                              Here is what you said about mapping:

                              On Jul 28, 2014, at 4:51 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
                              In general mappings are not the problem and are widely used anyway.


                              So what is the problem with mapping?

                              I was certainly not against mapping to/from "--", after all it was my proposal! You have taken my words out of context. Please stick to the technical issues and leave your personal indignation off of this mailing list.

                              Chad

                              Joachim



                              • Hi Chad,

                                after a well-deserved creative break a little more feedback from Potsdam
                                on our favorite topic. :)

                                Chad Trabant wrote on 31.07.2014 10:37:
                                On Jul 31, 2014, at 1:13 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:

                                Chad Trabant wrote on 31.07.2014 09:42:
                                On Jul 31, 2014, at 12:33 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
                                Has anyone observed this automatic trimming on any system?

                                No, and I agree that a generic parser should return a raw string as it is in the XML without implicit trimming, nullifying etc. To obtain a trimmed string, it's trivial to trim() the input strings as needed. That's what's done in ObsPy, too.

                                In fact even the *already* empty string location codes from libmseed are trimmed again at ObsPy level, just in case. ;)

                                HI Joachim,

                                You keep coming back to this as if it is meaningful. libmseed and the parts of ObsPy getting information from libmseed are dealing with SEED data, where the current rules of parsing are clear. What is your point exactly?

                                The point is that in libmseed you use a different empty location code naming than in StationXML. As I said a number of times, for me that's not a problem at all. Many clients can handle this and those that cannot can be modified easily. In particular, if you applied the same naming rules as in libmseed also in e.g. FetchData (by making a trivial change in the code) they would become consistent at a very low cost. It would be a benefit for the user.

                                Hi Joachim,

                                That is a strange transition from libmseed to web service clients that I do not understand.

                                In libmseed you treat the two spaces differently than some web service
                                client code. While in libmseed you trim the spaces, resulting in an
                                empty string, in web service clients (like FetchData) you keep the
                                spaces. By simply trimming them there, too, and then matching against an
                                empty string, you would not only maintain consistency in your
                                interpretation of waveform and meta data, but also be more "accepting"
                                in what your clients are able to process. In particular this would
                                enable your clients to parse strictly SEED compliant empty location
                                codes, which currently is not possible.

                                You appear fixated on updating the clients,

                                Yes, absolutely! Because that's where the current problem can be fixed
                                most easily.

                                but as I have said many times that, by itself, will not solve the actual problem;

                                That depends on what you consider as "the actual problem". If it is
                                empty location codes, I would not view that as a problem at all.
                                Cosmetics at worst, but it is as it is and we can live with it. Can't you?

                                the metadata remains inconsistent and at any rate we do not control many of the most popular parsers of this information, such as user-created programs.

                                Once the issue is clarified, the clients will naturally be adapted to
                                the specification. The inconsistency is currently still at a
                                low/manageable level. In particular, there is absolutely nowhere an
                                inconsistency with (Mini)SEED headers, it's *currently* *only* a
                                relatively minor inconsistency at XML level that is not too big to be
                                handled. Besides the standard conformance this is IMHO the main
                                advantage of "" compared to "--".

                                Why do you think the existing metadata and decades of SEED would need to be changed? Please explain.

                                Because otherwise a mapping would be required "forever". Until at least very recently you were strongly against any mapping, even calling the idea "rubbish" at one point

                                Here is what you said about mapping:

                                On Jul 28, 2014, at 4:51 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
                                In general mappings are not the problem and are widely used anyway.

                                Yes, of course mappings are not a problem, especially not from a
                                technical point of view. But it also depends on the kind of mapping.

                                BTW, I stated the above in the context of the mapping "" <-> "  ", which
                                is very easy using trim() et al. and in particular does not require any
                                change to the *current* channel naming conventions. And which is also
                                why I wrote "widely used". Technically a mapping to/from "--" would be
                                quite different, because the range of values that need to be tested
                                against e.g. in a simple comparison makes this more complicated. In
                                practice, of course, one can implement this once as a library function
                                or by creating a location code class and overloading the == operator.
                                This is still considerably more work than just calling trim().
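To make the comparison above concrete, here is a minimal Python sketch of the "location code class" idea. The class name `LocationCode` and its choice of aliases are hypothetical, picked only to show what a mapping that also has to recognize "--" involves compared with a bare trim; it is not code from any existing client.

```python
class LocationCode:
    """Hypothetical wrapper treating "", "  " and "--" as the same empty ID."""

    # Spellings that map to the empty ID once padding spaces are trimmed
    EMPTY_ALIASES = {"", "--"}

    def __init__(self, raw):
        trimmed = raw.strip()
        self.value = "" if trimmed in self.EMPTY_ALIASES else trimmed

    def __eq__(self, other):
        if not isinstance(other, LocationCode):
            return NotImplemented
        return self.value == other.value

    def __hash__(self):
        # Keep hashing consistent with equality
        return hash(self.value)

    def __repr__(self):
        return "LocationCode(%r)" % self.value


if __name__ == "__main__":
    print(LocationCode("  ") == LocationCode("--"))
    print(LocationCode("00") == LocationCode(""))
```

With only "" and "  " in play, the whole class reduces to a single `strip()` call, which is the difference in effort being argued here.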

                                With a mapping to/from "--" we also have the "forever" issue. With ""
                                vs. "  " this is not an issue at all, especially in view of the existing
                                SEED headers. That's a big difference.

                                So what is the problem with mapping?

                                Because as already said, the mapping would be required "forever" due to
                                persistence of the data. In particular, you cannot declare existing
                                metadata invalid. Hence you would have to keep supporting "" and "  ",
                                too, to maintain backward compatibility.

                                I was certainly not against mapping to/from "--", after all it was my proposal! You have taken my words out of context. Please stick to the technical issues and leave your personal indignation off of this mailing list.

                                This is already a very technical discussion and where you detect
                                "personal indignation" is left as an exercise to the reader.

                                Here is the context: "StationXML is the new dataless SEED, as such it
                                should be compatible between data centers for at least the core
                                parameters. Currently StationXML produced by SeisComP3 and other data
                                centers for the exact same channel can be documents that are
                                semantically different channels (NSLC do not match). We would not do
                                this with dataless SEED, right? Any notion that a reader of the XML
                                must apply rules to the core name values is rubbish, no transformation
                                should be needed at that point. These are documents that are being
                                stored as files, loaded into databases, and otherwise saved."

                                In my interpretation this is a statement against any mapping.

                                But I recognize we are all in a learning curve and opinions evolve and
                                sometimes change. Plus we are having two discussions on the list and
                                off-list, each with several sub-threads. This probably creates
                                additional confusion and doesn't quite help to focus on what the real
                                issues are *currently*. Channel naming can be discussed and should,
                                actually has been many times before, but can we not focus on what needs
                                to be solved in very short time without introducing additional
                                incompatibilities?

                                All frustration about ugly empty location codes aside, I maintain that
                                there are technically hardly any issues with them. Nothing that cannot be
                                solved quickly with rather few modifications plus a clarification in the
                                FDSN StationXML specification. In fact I already proposed a clear
                                timeline you might want to comment on. What follows is a quote from my
                                email of July-24, 18:43 UTC to this list.

                                ----------------------------------------------------------------------

                                Actually we are currently seeking to solve a particular incompatibility
                                between FDSN StationXML produced by different services, but technically
                                that is much, *much* easier to achieve than the introduction of a new
                                and incompatible channel naming. I would welcome an intensified
                                discussion on the latter, but not in the context of the current FDSN
                                StationXML or web services.

                                It's actually quite strange that already now, early after the
                                introduction of FDSN StationXML, we are not only choking over minor
                                incompatibilities, but are discussing "solutions" to problems that
                                apparently no one had noticed they existed before StationXML... Looks
                                like shooting at sparrows with cannons, IMO.

                                There used to be an IASPEI working group on station codes that even came
                                up with a new channel naming "standard"[*], which, however, doesn't seem
                                to have gained much acceptance so far. Nevertheless this is the level at
                                which changes to channel naming need to be discussed, even though the
                                process may be frustratingly slow. But the impact of such a change is
                                just too big to be decided ad hoc.


                                To summarize:

                                We will not find a future-proof channel naming convention quickly.
                                Partial changes, especially if incompatible, should be absolutely avoided.

                                The particular problem we attempted (and still need) to solve in the
                                first place is a location code incompatibility due to differently strict
                                adherence to the SEED specification. Not surprisingly I prefer the
                                empty-string representation for the empty location code. To be
                                pragmatic, I propose the following timeline:

                                * Acknowledge that, at least for a transitional period, we have to accept the
                                existence of space-space and empty location codes.

                                * During a transitional period, don't change the servers that now
                                produce space-space location codes, as that would break compatibility
                                with some clients. We want to keep compatibility rather than introducing
                                new incompatibility.

                                * Instead update the clients to accept both space-space and empty
                                location codes by trimming trailing spaces if present. This is a
                                relatively minor change and IIRC this is on IRIS's agenda already, which
                                is highly appreciated.

                                At this point in time, interoperability is restored, even without
                                server-side changes. This is important as it may take quite some time
                                for the users to actually upgrade their clients; but it doesn't hurt anyone.

                                * Finally the server upgrades where needed. The decision as to when to
                                upgrade the server side can be made once it is considered appropriate;
                                there is absolutely no hurry from the client side.
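The client-side step in the timeline above (accept both spellings by trimming trailing spaces) could look roughly like the following Python sketch. The two documents are simplified stand-ins for the real service output: the StationXML namespace and most attributes are omitted, the two-space value follows the description in this thread, and treating a missing attribute as empty is an assumption of the example.

```python
import xml.etree.ElementTree as ET

# Simplified documents for the same channel, one per server flavor.
DOC_SPACES = (
    '<Network code="GE"><Station code="UGM">'
    '<Channel locationCode="  " code="BHZ"/>'
    "</Station></Network>"
)
DOC_EMPTY = (
    '<Network code="GE"><Station code="UGM">'
    '<Channel locationCode="" code="BHZ"/>'
    "</Station></Network>"
)


def channel_ids(xml_text):
    """Return (net, sta, loc, cha) tuples with trailing spaces trimmed from loc."""
    net = ET.fromstring(xml_text)
    ids = []
    for sta in net.findall("Station"):
        for cha in sta.findall("Channel"):
            # Assumption: a missing attribute is treated like an empty one.
            loc = cha.get("locationCode", "").rstrip(" ")
            ids.append((net.get("code"), sta.get("code"), loc, cha.get("code")))
    return ids
```

After this normalization both documents yield the same identifier tuple, so no server-side change is needed for the two flavors to compare equal on the client.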

                                The needed changes for the above proposal are very small compared to the
                                huge changes that would be required at every level to implement a new
                                channel naming convention. This may (and hopefully will) take place some
                                time in the future, but it requires a lot of preparation and
                                coordination. I am pretty sure that we will have a considerable number
                                of beers in the meantime.

                                Besides the beers, we should focus on finalizing the specification of
                                FDSN StationXML. There are too many under-defined elements even in the
                                xsd and the risk of serious incompatibilities is very high.

                                Cheers
                                Joachim


                                [*] http://www.isc.ac.uk/registries/download/IR_implementation.pdf

                  • Hi Philip

                    Philip Crotwell [07/28/2014 03:37 PM]:
                    Being on the cheap side of the Atlantic, I'll save us $0.00068 and
                    make a stab at the underlying issue.:)

                    Here, with lots of stuff cut out, is how a channel is "identified" in
                    stationXML via the fdsn station web service at the IRIS DMC,
                    http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404

                    <Network code="GE" >
                    <Station code="UGM">
                    <Channel locationCode="  " code="BHZ">

                    Another implementation of the same web service (not sure of url) gives
                    back this:

                    <Network code="GE" >
                    <Station code="UGM">
                    <Channel locationCode="" code="BHZ">

                    with locationCode="" vs ="  " being the difference under consideration.

                    Exactly. Good that you provided this as example because we were already
                    getting lost so deep within the details that we may have forgotten that
                    this thread has just moved to this list and that it might not have been
                    clear to everybody what the issue actually is...

                    Even a few lines of XML can (sometimes) help make things clearer. ;)

                    There are two basic issues being discussed (and yes, more beer would help!:)

                    1) Should all valid stationXML documents be required to use the exact
                    same string of characters to represent the location id for this
                    channel? This would allow a comparison operation to be "simple" in
                    that it can compare the attribute values without additional
                    processing.

                    This would be ideal, but I think it is not realistic:

                    If "--" were introduced, it would be impossible not to keep supporting
                    "  " and "" practically forever in order to maintain backward compatibility.

                    If "" were to become the preferred empty location code, we still have
                    probably billions of instances of "  " out in the wild that should not
                    be declared invalid.

                    The same is true the other way round, if "  " were preferred over "".

                    In short some mapping is required anyway. Fortunately the mapping
                    between "" and "  " is trivial using methods like trim(), strip() or
                    similar (depending on the language). Most seismic data handling software
                    already does it anyway because it's so obvious. For XML it's at least
                    ObsPy and SeisComP. SEED readers that trim the location code include
                    rdseed, libmseed and qlib2. All database engines provide a trim()
                    method, so database queries are not a problem either.

                    if trim(loc1) == trim(loc2) ...

                    may be slightly more expensive in terms of CPU cycles than

                    if loc1 == loc2 ...

                    but I presume that this is nowhere a real issue. With the added benefit
                    that the currently not strictly SEED compliant "  " location code is
                    then within the valid range kind of automatically.
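To make the database remark above concrete, here is a small sketch using SQLite through Python's standard `sqlite3` module. The table name `channel` and its columns are hypothetical and chosen only for this example; other database engines offer a comparable trim function, though the exact syntax may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE channel (net TEXT, sta TEXT, loc TEXT, cha TEXT)")
conn.executemany(
    "INSERT INTO channel VALUES (?, ?, ?, ?)",
    [
        ("GE", "UGM", "  ", "BHZ"),  # space-space spelling
        ("GE", "UGM", "", "BHZ"),    # empty-string spelling
        ("GE", "UGM", "00", "BHZ"),  # a non-empty location code
    ],
)


def count_matching(loc):
    """Count rows whose trimmed location code equals the trimmed argument."""
    row = conn.execute(
        "SELECT COUNT(*) FROM channel WHERE TRIM(loc) = TRIM(?)", (loc,)
    ).fetchone()
    return row[0]
```

Querying with either `""` or `"  "` finds both of the first two rows, while `"00"` matches only the third.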

                    2) If we agree to 1), then what should those exact characters be? The
                    current top choices are
                    a) empty=""
                    b) two spaces="  "
                    c) two dashes="--".

                    1) seems less controversial than 2) in that greater compatibility is
                    generally seen as positive.

                    Compatibility is absolutely essential. This is probably the main reason
                    why even after more than 10 years of discussion about new channel
                    naming, there hasn't been any real progress AFAICS. And despite all
                    shortcomings the current NSLC is really remarkable as it is accepted and
                    used nearly everywhere. Don't put that at stake.

                    Thanks btw for your other comments about potential issues related to
                    white space.

                    Cheers
                    Joachim


                  • Thanks Philip, I think you have outlined the issues well.

                    Regarding issue #1, I strongly feel that we need to choose one representation; the sooner we stop creating incompatible metadata the better.

                    Regarding issue #2:

                    b) two spaces="  "

                    This is what IRIS currently does, not strictly SEED but avoids empty identifiers.

                    c) two dashes="--".

                    This would require work and continued mapping; the mapping is clear between SEED-based holdings and StationXML. SEED headers and data records could also be considered, but that is a bigger can of worms.

                    a) empty=""


                    This is possibly the most straightforward mapping of SEED information, but leaves us with an empty string identifier.

                    Below are a few of the issues we note regarding empty identifiers

                    1) They are too similar to "unknown" (which results in potential ambiguity where channels are only differentiated by location ID):

                    a) In many languages an empty string evaluates to false; if, for example, a program tests for and then extracts a value from an XML document parsed into a structure/object, it could appear as if the value was not present. Of course the coding in probably every language can be done to avoid such a false negative, but it is a pitfall that we would be asking all future users and coders to know about.

                    b) In XPath (the query language for XSLT), which is used to search or translate XML, the matching of a string attribute usually uses the string() function. Specifying the string attribute to match when the attribute has a value is straightforward; when trying to match the empty string, the query is for NOT string. In the boolean functions of XPath "a string is true if and only if its length is non-zero" (http://www.w3.org/TR/xpath/#function-boolean). So in XPath, hardly a fringe technology, an empty string is not just another kind of string but an anomaly.

                    c) In JavaScript the getAttribute() method returns the same value whether the attribute was an empty string or unspecified. The method is no longer recommended but illustrates that such thinking is not limited to niche projects.

                    2) Organizing data in structures such as a nested hash is pretty common: %{net}{sta}{loc}{chan} = "some lvalue" (sorry for the Perl). The empty identifier as a key works in some languages but it is obtuse and unclear. I'm sure there are many other data structures that would use location by itself as a key.

                    3) Empty identifiers are difficult to specify on the command line, in URLs, etc. and non-obvious in many other places such as GUI fields. We have largely addressed this issue for FDSN web services (at the DMC for other mechanisms as well) by making "--" a synonym for the empty location ID. In other words we are already mapping "--" into the empty location ID for requests and users are learning this association. A further adoption of the synonym into the metadata would solve many of these problems.

                    4) While it is certainly not the FDSN's task to define data formats outside of its purview, the adoption or matching of the core channel naming fields in other formats is certainly in the FDSN's best interest. This has been happening for a long time already (ISF/IASPEI, GSE, etc.). The potentially empty (optional?) location ID could make such adoption harder as it is a wrinkle, especially for space-delimited formats. I believe these broader implications deserve some consideration.

                    I'm sure most developers could come up with solutions to the technical problems, but an empty identifier leaves the unfortunate wrinkles for all future users and coders.
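The "empty string evaluates to false" pitfall from point 1a can be shown in a few lines of Python. The attribute dictionary and the two function names are hypothetical, standing in for whatever structure an XML parser hands back.

```python
def describe_naive(attrs):
    """Pitfall: a truthiness test cannot tell "" from a missing key."""
    loc = attrs.get("locationCode")
    return f"loc={loc!r}" if loc else "no location code"


def describe_explicit(attrs):
    """Test for presence explicitly so that "" stays a valid value."""
    loc = attrs.get("locationCode")
    return "no location code" if loc is None else f"loc={loc!r}"
```

For `{"locationCode": ""}` the naive version reports that no location code is present, while the explicit version reports an empty one. Both behave identically for non-empty codes, which is why the mistake is easy to miss.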

                    Here is an example of someone that was confused by current metadata, I'll bet if there was a value in the locationCode it would have been easier:
                    https://stackoverflow.com/questions/19348855/checking-for-empty-attributes-while-parsing-an-xml-file

                    There is a chance we will end up with the empty location identifier, but the considerations should go beyond an assumption that an empty string is the only choice.

                    Since an empty location field in SEED essentially means unset, perhaps we should consider making the locationCode attribute optional and leaving it out of the XML when it is empty in SEED. In this line of thinking, the empty string is just a hack to include a required attribute when in fact there is nothing to include. For me the "unset" aspect is unsettlingly similar to "unknown", but it's an idea preferred by at least one engineer at the DMC.

                    Chad


                    On Jul 28, 2014, at 6:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:

                    Hi

                    Being on the cheap side of the Atlantic, I'll save us $0.00068 and
                    make a stab at the underlying issue. :)

                    Here, with lots of stuff cut out, is how a channel is "identified" in
                    stationXML via the fdsn station web service at the IRIS DMC,
                    http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404

                    <Network code="GE" >
                    <Station code="UGM">
                    <Channel locationCode="  " code="BHZ">

                    Another implementation of the same web service (not sure of url) gives
                    back this:

                    <Network code="GE" >
                    <Station code="UGM">
                    <Channel locationCode="" code="BHZ">

                    with locationCode="" vs ="  " being the difference under consideration.

                    There are two basic issues being discussed (and yes, more beer would help! :)

                    1) Should all valid stationXML documents be required to use the exact
                    same string of characters to represent the location id for this
                    channel? This would allow a comparison operation to be "simple" in
                    that it can compare the attribute values without additional
                    processing.

                    2) If we agree to 1), then what should those exact characters be? The
                    current top choices are
                    a) empty=""
                    b) two spaces="  "
                    c) two dashes="--".

                    1) seems less controversial than 2) in that greater compatibility is
                    generally seen as positive.

                    This is primarily a question about the form of the stationXML
                    documents, but obviously there are connections to the way requests are
                    formed, the relationship to miniseed/seed, the way things are coded in
                    software and how much detailed understanding we expect of end users.

                    Philip



                    On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:
                    Hello all,

                    Can someone give a concise statement of the original problem being
                    discussed? Is it only or primarily a concern about XML?

                    It seems to me that with modern languages a string that is empty or has 1-N
                    spaces is the same thing - there are often implicit or explicit trim()
                    function hiding in a processing pipeline. A null string is not the same.
                    So an empty or blank string is the same, valid location code, and null is
                    undefined or uninitialized location code.

                    With regard to the "--" pseudo-value for the location code, is this not needed
                    because sometimes it is not possible or difficult to represent an empty
                    string or even a string? For example on the command line or in a restful WS
                    URI? (Or a URI on the command line!) So it may be that the use of "--" for
                    intermediate processing and requests could be tolerated and somehow
                    official, while empty or only-blanks strings are official for persistent
                    data.
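The split suggested above ("--" in requests, empty or blank strings in persistent data) amounts to a small mapping at the request boundary. The sketch below is one possible reading of it; the helper names are hypothetical, and the `loc` query parameter is assumed by analogy with the `net`, `sta` and `cha` parameters in the service URL quoted earlier in the thread.

```python
from urllib.parse import urlencode

BASE_URL = "http://service.iris.edu/fdsnws/station/1/query"


def loc_for_request(loc):
    """Map an empty or all-blank location code to "--" for use in a request."""
    return loc if loc.strip() else "--"


def loc_from_request(value):
    """Inverse mapping: "--" in a request means the empty location code."""
    return "" if value == "--" else value


def station_query_url(net, sta, loc, cha):
    """Build a channel-level query URL, never sending an empty loc value."""
    params = {
        "net": net,
        "sta": sta,
        "loc": loc_for_request(loc),
        "cha": cha,
        "level": "channel",
    }
    return BASE_URL + "?" + urlencode(params)
```

Keeping the translation in two small functions confines "--" to the request layer, so stored identifiers never contain it.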

                    Just my 0.02€ = $0.0268

                    Best regards to all,

                    Anthony



                    On 27/07/2014 04:52, Chad Trabant wrote:

                    Hi Marcelo,

                    Thanks for your thoughts as well. Something that you and Joachim are not
                    addressing are the concerns about an empty ID that have been brought up by
                    more than one person. The answer that empty strings are technically
                    possible and it all works in Python/SeisComP is less than satisfying. The
                    observations from Python, ObsPy and SeisComP are a few of many that need to
                    be taken into account.

                    I agree that there is a long tail consideration for the "--" location ID
                    solution. Understand that some folks find an empty ID to be problematic
                    regardless of whether it is XML, SEED, text, whatever, then you might see
                    where this proposal comes from. Yes, we would need to treat empty location
                    IDs and "--" as synonyms for a very long time. Empty strings in XML mean
                    you will need to map empty IDs to empty strings, NULL and whatever an XML
                    parser might or might not produce for a long time as well (think beyond
                    Python and SeisComP). Either is possible, only one of them is a unique
                    mapping.

                    If the main considerations are for the least amount of disruption then the
                    answer is obvious to me: the FDSN can sanction that the two-space string is
                    the XML synonym for the empty SEED location ID and we adjust the schema to
                    make sure a string of whitespaces is preserved. Then SeisComP can change
                    its relatively new StationXML implementation and ALL existing clients will
                    be compatible with all metadata and, most importantly, we would have
                    consistent metadata.

                    If the empty string ID representation is adopted it would, in effect,
                    mean that the DMC would need to change its metadata service and (more
                    importantly) all users of the DMC's metadata service would need to
                    transition to a new metadata channel naming scheme. This is certainly not
                    out of the question, but it is not something we would do without careful
                    consideration. I do not find the two-space strings all that great, but they
                    are here and something the DMC and users of the DMC have dealt with. Issues
                    have been identified with empty location IDs by us and our users. If DMC is
                    going to change, and push the change on all users of the DMC's StationXML,
                    it would be much more compelling to have a solution that addresses the low
                    level issues.

                    regards,
                    Chad


                    ----- Original Message -----
                    From: "Marcelo Bianchi" <m.tchelo<at>gmail.com>
                    To: "IRIS Web Services List" <webservices<at>iris.washington.edu>
                    Sent: Friday, July 25, 2014 7:38:17 PM
                    Subject: Re: [webservices] A question of location ID, how to represent empty
                    IDs in XML?

                    Hi Philip and All,

                    I totally agree with Joachim, was planning to answer but he was much
                    faster. What you guys are proposing is not a solution. The station XML
                    supports nicely the empty string and it is not null. There is a type
                    difference here in Python and in any other language and can be nicely
                    handled internally.

                    Also the location id is not just a string; it is a key entry to link
                    miniseed to metadata, and making an exception at this level just
                    because a user interface cannot properly render it without ambiguity
                    does not sound like a proper proposal. I am not in favor of
                    creating an exception that will have to be carried along for the
                    decades to come. Alternative solutions for this issue should be
                    sought in the end-user interface.

                    with my best regards,

                    Marcelo Bianchi
                    --


                    2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell<at>seis.sc.edu>:

                    It sounds like you are saying "change is hard, so we shouldn't do it".
                    I would argue that change is hard and so if we don't do it now it will
                    never happen. StationXML is new enough that there is already a
                    disruption, we should seize the chance. If we do not do something now
                    about null loc ids, it will be a decade or two before we get another
                    chance.

                    It is time to drive the stake through the heart of null location ids.
                    Kill the evil while we have a chance.

                    Philip


                    On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:

                    Hello Rob,

                    Rob Newman wrote on 24.07.2014 18:51:

                    For what it's worth, I would also vote for the "--" standard. To quote
                    from the Zen of Python
                    http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html
                    (my language of choice):


                    "Beautiful is better than ugly.
                    Explicit is better than implicit.
                    Simple is better than complex.
                    Complex is better than complicated.
                    Flat is better than nested.
                    Sparse is better than dense.
                    Readability counts.
                    Special cases aren't special enough to break the rules.
                    Although practicality beats purity.
                    Errors should never pass silently.
                    Unless explicitly silenced."

                    I'd add "Compatible is better than incompatible." :)


                    Number 2 is especially relevant here:
                    "Explicit is better than implicit."

                    My favorite would be:

                    "Special cases aren't special enough to break the rules."

                    Quoted whitespace and nulls are painful. Code what you mean, and mean what
                    you code. It's easier for everyone.

                    But what if we simply *mean* "empty string"?

                    The issue is not about beauty, pain or ease. It's about standard
                    conformance. We already have a channel naming standard. If a new data format
                    cannot accommodate existing channel naming, then the new format is flawed.
                    But that's not even the case here...

                    An XML document that contains

                    <Channel locationCode="" ...

                    is not malformed. There's an attribute that *explicitly* contains an empty
                    string and a parser has to produce it as such. Not as null, nil or none, but
                    as an empty string. Otherwise the parser is broken and needs to be fixed,
                    not the data!
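The distinction drawn above between an explicitly empty attribute and a missing one can be checked with Python's standard-library parser. This sketch only says something about `xml.etree.ElementTree`; other parsers and language bindings may behave differently, which is part of what the thread is debating.

```python
import xml.etree.ElementTree as ET

with_empty = ET.fromstring('<Channel locationCode="" code="BHZ"/>')
without_attr = ET.fromstring('<Channel code="BHZ"/>')

# Attribute present but empty: the parser hands back an empty string.
empty_value = with_empty.get("locationCode")

# Attribute absent: Element.get() falls back to its default, None.
missing_value = without_attr.get("locationCode")
```

Here `empty_value` is `""` and `missing_value` is `None`, so the two cases remain distinguishable as long as the calling code tests with `is None` rather than relying on truthiness.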

                    Again: It's not about beauty. We all agree that current channel naming is
                    not particularly beautiful and has limitations. But our business is not to
                    try to solve that issue now and here.

                    Cheers
                    Joachim

                    _______________________________________________
                    webservices mailing list
                    webservices<at>iris.washington.edu
                    http://www.iris.washington.edu/mailman/listinfo/webservices



                    --
                    Sent from my iClayTablet

                    ________________________________

                    Anthony Lomax
                    161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
                    tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net web:
                    http://www.alomax.net

                    Twitter: @ALomaxNet
                    Science & Special Topics: http://www.alomax.net/science
                    Software: http://www.alomax.net/software - updates:
                    https://twitter.com/ALomaxNet
                    ________________________________



                    • Hi

                      Just another data point, Earthworm, which is widely used by regional
                      networks globally, has long had the "dash dash is the same as space
                      space" convention. So dash dash is not something pulled out of thin
                      air, it is how at least I do things already.

                      And this shows that it is fairly common (if not technically correct)
                      for users to regard space-space as the location id instead of
                      regarding it as null with 2 spaces for padding. My guess is that very
                      few users are aware of this, and even as someone who has been writing
                      seismic software for a couple of decades I still think of the location
                      id as space-space, not null.

                      http://www.isti2.com/ew/PROGRAMMER/location_codes/EW_Loc_policy.txt

                      Philip


                      On Thu, Jul 31, 2014 at 6:18 AM, Chad Trabant <chad<at>iris.washington.edu> wrote:

                      Thanks Philip, I think you have outlined the issues well.

                      Regarding issue #1, I strongly feel that we need to choose one
                      representation; the sooner we stop creating incompatible metadata the
                      better.

                      Regarding issue #2:

                      b) two spaces="  "


                      This is what IRIS currently does, not strictly SEED but avoids empty
                      identifiers.

                      c) two dashes="--".


                      This would require work and continued mapping; the mapping is clear between
                      SEED-based holdings and StationXML. SEED headers and data records could
                      also be considered, but that is a bigger can of worms.

                      a) empty=""


                      This is possibly the most straightforward mapping of SEED information, but
                      leaves us with an empty string identifier.

                      Below are a few of the issues we note regarding empty identifiers

                      1) They are too similar to "unknown" (which results in potential ambiguity
                      where channels are only differentiated by location ID):

                      a) In many languages an empty string evaluates to false. If, for example,
                      a program tests for and then extracts a value from an XML document
                      parsed into a structure/object, it could appear as if the value was not
                      present. Of course the coding in probably every language can be done to
                      avoid such a false negative, but it is a pitfall that we would be asking all
                      future users and coders to know about.

                      b) In XPath (the query language for XSLT), which is used to search or
                      translate XML, the matching of a string attribute usually uses the string()
                      function. Specifying the string attribute to match when the attribute has a
                      value is straightforward, but when trying to match the empty string the query
                      is for NOT string. In the boolean functions of XPath "a string is true if and
                      only if its length is non-zero"
                      (http://www.w3.org/TR/xpath/#function-boolean). So in XPath, hardly a
                      fringe technology, an empty string is not just another kind of string but an
                      anomaly.

                      c) In JavaScript the getAttribute() method returns the same value whether
                      the attribute was an empty string or unspecified. The method is no longer
                      recommended but illustrates that such thinking is not limited to niche
                      projects.

                      2) Organizing data in structures such as a nested hash is pretty common:
                      %{net}{sta}{loc}{chan} = "some lvalue" (sorry for the Perl). The empty
                      identifier as a key works in some languages but it is obtuse and unclear.
                      I'm sure there are many other data structures that would use location by
                      itself as a key.

                      3) Empty identifiers are difficult to specify on the command line, in URLs,
                      etc. and are non-obvious in many other places such as GUI fields. We have largely
                      addressed this issue for FDSN web services (at the DMC for other mechanisms
                      as well) by making "--" a synonym for the empty location ID. In other words
                      we are already mapping "--" into the empty location ID for requests and
                      users are learning this association. A further adoption of the synonym into
                      the metadata would solve many of these problems.

                      4) While it is certainly not the FDSN's task to define data formats outside
                      of its purview, the adoption or matching of the core channel naming fields
                      in other formats is certainly in the FDSN's best interest. This has been
                      happening for a long time already (ISF/IASPEI, GSE, etc.). The potentially
                      empty (optional?) location ID could make such adoption harder as it is a
                      wrinkle, especially for space-delimited formats. I believe these broader
                      implications deserve some consideration.
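
Points 1a and 2 above can be illustrated with a minimal Python sketch. The dictionary below stands in for attributes already parsed from a Channel element; the variable names are hypothetical and not taken from any particular library.

```python
# Attributes as they might look after parsing <Channel locationCode="" code="BHZ">
# (a plain dict is assumed here purely for illustration).
attrs = {"locationCode": "", "code": "BHZ"}

# Pitfall: a truthiness test treats the empty string like a missing value.
looks_missing = not attrs.get("locationCode")

# Safer: test for presence of the key rather than the truthiness of the value.
is_present = "locationCode" in attrs

# The empty string is a legal key in a nested mapping, but it is easy to overlook.
inventory = {}
inventory.setdefault("GE", {}).setdefault("UGM", {}).setdefault(
    attrs["locationCode"], {}
)[attrs["code"]] = "some value"
```

Here `looks_missing` and `is_present` are both true, which is exactly the ambiguity being described: the value is present, yet a casual test reports it as absent.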

                      I'm sure most developers could come up with solutions to the technical
                      problems, but an empty identifier leaves the unfortunate wrinkles for all
                      future users and coders.

                      Here is an example of someone that was confused by current metadata, I'll
                      bet if there was a value in the locationCode it would have been easier:
                      https://stackoverflow.com/questions/19348855/checking-for-empty-attributes-while-parsing-an-xml-file

                      There is a chance we will end up with the empty location identifier, but the
                      considerations should go beyond an assumption that an empty string is the
                      only choice.

                      Since an empty location field in SEED essentially means unset, perhaps we
                      should consider making the locationCode attribute optional and leaving it
                      out of the XML when it is empty in SEED. In this line of thinking, the
                      empty string is just a hack to include a required attribute when in fact
                      there is nothing to include. For me the "unset" aspect is unsettlingly
                      similar to "unknown", but it's an idea preferred by at least one engineer at
                      the DMC.

                      Chad


                      On Jul 28, 2014, at 6:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:

                      Hi

                      Being on the cheap side of the Atlantic, I'll save us $0.00068 and
                      make a stab at the underlying issue. :)

                      Here, with lots of stuff cut out, is how a channel is "identified" in
                      stationXML via the fdsn station web service at the IRIS DMC,
                      http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404

                      <Network code="GE" >
                      <Station code="UGM">
                      <Channel locationCode="  " code="BHZ">

                      Another implementation of the same web service (not sure of url) gives
                      back this:

                      <Network code="GE" >
                      <Station code="UGM">
                      <Channel locationCode="" code="BHZ">

                      with locationCode="" vs ="  " being the difference under consideration.
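
As a rough sketch of why the two forms do not compare equal without extra processing, consider the following Python fragment. The helper names `normalize_location` and `channel_key` are hypothetical, and trimming is only one possible normalization, assumed here for illustration.

```python
def normalize_location(code):
    """Hypothetical helper: treat "" and all-space codes as the same empty ID."""
    return code.strip()


def channel_key(net, sta, loc, cha):
    """Build a comparable channel identifier with the location ID normalized."""
    return (net, sta, normalize_location(loc), cha)


two_space_style = ("GE", "UGM", "  ", "BHZ")  # locationCode="  "
empty_style = ("GE", "UGM", "", "BHZ")        # locationCode=""

# A direct comparison sees two different channels.
naive_match = two_space_style == empty_style

# After normalization both forms identify the same channel.
normalized_match = channel_key(*two_space_style) == channel_key(*empty_style)
```

Every client that compares identifiers from different data centers would need a step like `normalize_location`, which is the extra processing that issue 1) below aims to make unnecessary.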

                      There are two basic issues being discussed (and yes, more beer would help!
                      :)

                      1) Should all valid stationXML documents be required to use the exact
                      same string of characters to represent the location id for this
                      channel? This would allow a comparison operation to be "simple" in
                      that it can compare the attribute values without additional
                      processing.

                      2) If we agree to 1), then what should those exact characters be? The
                      current top choices are
                      a) empty=""
                      b) two spaces="  "
                      c) two dashes="--".

                      1) seems less controversial than 2) in that greater compatibility is
                      generally seen as positive.

                      This is primarily a question about the form of the stationXML
                      documents, but obviously there are connections to the way requests are
                      formed, the relationship to miniseed/seed, the way things are coded in
                      software and how much detailed understanding we expect of end users.

                      Philip



                      On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:

                      Hello all,

                      Can someone give a concise statement of the original problem being
                      discussed? Is it only or primarily a concern about XML?

                      It seems to me that with modern languages a string that is empty or has 1-N
                      spaces is the same thing - there is often an implicit or explicit trim()
                      function hiding in a processing pipeline. A null string is not the same.
                      So an empty or blank string is the same, valid location code, and null is
                      an undefined or uninitialized location code.

                      With regards to the "--" pseudo for the location code, is this not needed
                      because it is sometimes impossible or difficult to represent an empty
                      string or even a string? For example on the command line or in a restful WS
                      URI? (Or a URI on the command line!) So it may be that the use of "--" for
                      intermediate processing and requests could be tolerated and somehow
                      official, while empty or blank-only strings are official for persistent
                      data.
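
The split suggested here, "--" for requests and an empty or blank string for stored data, could be sketched in Python as follows. The function names are hypothetical, and the `loc` query parameter name is an assumption based on common fdsnws usage rather than anything stated in this thread.

```python
from urllib.parse import urlencode


def location_for_request(loc):
    """Return "--" for an empty or blank location ID, else the trimmed code."""
    trimmed = loc.strip()
    return trimmed if trimmed else "--"


def location_from_request(value):
    """Map the request synonym "--" back to the empty location ID."""
    return "" if value == "--" else value


# Build a query string for a channel whose stored location ID is two spaces.
# The "loc" parameter name is assumed for illustration.
query = urlencode(
    {"net": "GE", "sta": "UGM", "loc": location_for_request("  "), "cha": "BHZ"}
)
```

With this arrangement the synonym never reaches persistent metadata; it exists only at the request boundary.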

                      Just my 0.02€ = $0.0268

                      Best regards to all,

                      Anthony



                      On 27/07/2014 04:52, Chad Trabant wrote:

                      Hi Marcelo,

                      Thanks for your thoughts as well. Something that you and Joachim are not
                      addressing are the concerns about an empty ID that have been brought up by
                      more than one person. The answer that empty strings are technically
                      possible and it all works in Python/SeisComP is less than satisfying. The
                      observations from Python, ObsPy and SeisComP are a few of many that need to
                      be taken into account.

                      I agree that there is a long tail consideration for the "--" location ID
                      solution. If you understand that some folks find an empty ID to be problematic
                      regardless of whether it is XML, SEED, text, or whatever, then you might see
                      where this proposal comes from. Yes, we would need to treat empty location
                      IDs and "--" as synonyms for a very long time. Empty strings in XML mean
                      you will need to map empty IDs to empty strings, NULL and whatever an XML
                      parser might or might not produce for a long time as well (think beyond
                      Python and SeisComP). Either is possible, only one of them is a unique
                      mapping.

                      If the main consideration is the least amount of disruption, then the
                      answer is obvious to me: the FDSN can sanction that the two-space string is
                      the XML synonym for the empty SEED location ID and we adjust the schema to
                      make sure a string of whitespaces is preserved. Then SeisComP can change
                      its relatively new StationXML implementation and ALL existing clients will
                      be compatible with all metadata and, most importantly, we would have
                      consistent metadata.

                      If the empty string ID representation is adopted it would, in effect,
                      mean that the DMC would need to change its metadata service and (more
                      importantly) all users of the DMC's metadata service would need to
                      transition to a new metadata channel naming scheme. This is certainly not
                      out of the question, but it is not something we would do without careful
                      consideration. I do not find the two-space strings all that great, but they
                      are here and something the DMC and users of the DMC have dealt with. Issues
                      have been identified with empty location IDs by us and our users. If the DMC is
                      going to change, and push the change on all users of the DMC's StationXML,
                      it would be much more compelling to have a solution that addresses the low
                      level issues.

                      regards,
                      Chad


                      ----- Original Message -----
                      From: "Marcelo Bianchi" <m.tchelo<at>gmail.com>
                      To: "IRIS Web Services List" <webservices<at>iris.washington.edu>
                      Sent: Friday, July 25, 2014 7:38:17 PM
                      Subject: Re: [webservices] A question of location ID, how to represent empty
                      IDs in XML?

                      Hi Philip and All,

                      I totally agree with Joachim; I was planning to answer but he was much
                      faster. What you guys are proposing is not a solution. StationXML
                      nicely supports the empty string, and it is not null. There is a type
                      difference here in Python and in any other language, and it can be nicely
                      handled internally.

                      Also, the location id is not just a string; it is a key entry to link
                      miniseed to metadata, and making an exception at this level just
                      because a user interface cannot properly render it without ambiguity
                      does not sound like a proper proposal. I am not in favor of
                      creating an exception that will have to be carried along for the
                      decades to come. Alternative solutions for this issue should be
                      sought in the end user interface.

                      with my best regards,

                      Marcelo Bianchi
                      --


                      2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell<at>seis.sc.edu>:

                      It sounds like you are saying "change is hard, so we shouldn't do it".
                      I would argue that change is hard and so if we don't do it now it will
                      never happen. StationXML is new enough that there is already a
                      disruption, we should seize the chance. If we do not do something now
                      about null loc ids, it will be a decade or two before we get another
                      chance.

                      It is time to drive the stake through the heart of null location ids.
                      Kill the evil while we have a chance.

                      Philip


                      On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:

                      Hello Rob,

                      Rob Newman wrote on 24.07.2014 18:51:

                      For what it's worth, I would also vote for the "--" standard. To quote
                      from the Zen of Python
                      http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html
                      (my language of choice):


                      "Beautiful is better than ugly.
                      Explicit is better than implicit.
                      Simple is better than complex.
                      Complex is better than complicated.
                      Flat is better than nested.
                      Sparse is better than dense.
                      Readability counts.
                      Special cases aren't special enough to break the rules.
                      Although practicality beats purity.
                      Errors should never pass silently.
                      Unless explicitly silenced."

                      I'd add "Compatible is better than incompatible." :)


                      Number 2 is especially relevant here:
                      "Explicit is better than implicit."

                      My favorite would be:

                      "Special cases aren't special enough to break the rules."

                      Quoted whitespace and nulls are painful. Code what you mean, and mean what
                      you code. It's easier for everyone.

                      But what if we simply *mean* "empty string"?

                      The issue is not about beauty, pain or ease. It's about standard
                      conformance. We already have a channel naming standard. If a new data format
                      cannot accommodate existing channel naming, then the new format is flawed.
                      But that's not even the case here...

                      An XML document that contains

                      <Channel locationCode="" ...

                      is not malformed. There's an attribute that *explicitly* contains an empty
                      string and a parser has to produce it as such. Not as null, nil or none, but
                      as an empty string. Otherwise the parser is broken and needs to be fixed,
                      not the data!
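
The behaviour described here can be checked with Python's standard library parser. This is only one parser and says nothing about others; the fragments below are cut down to a single element for illustration.

```python
import xml.etree.ElementTree as ET

# Attribute explicitly present with an empty value.
explicit_empty = ET.fromstring('<Channel locationCode="" code="BHZ"/>')
# Attribute explicitly present with two spaces.
two_spaces = ET.fromstring('<Channel locationCode="  " code="BHZ"/>')
# Attribute not present at all.
absent = ET.fromstring('<Channel code="BHZ"/>')

empty_value = explicit_empty.get("locationCode")   # an empty string
padded_value = two_spaces.get("locationCode")      # the two spaces, unchanged
missing_value = absent.get("locationCode")         # None, not an empty string
```

ElementTree keeps the three cases distinct, so in this parser at least, an empty attribute is delivered as an empty string and not as a null.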

                      Again: It's not about beauty. We all agree that current channel naming is
                      not particularly beautiful and has limitations. But our business is not to
                      try to solve that issue now and here.

                      Cheers
                      Joachim

                      _______________________________________________
                      webservices mailing list
                      webservices<at>iris.washington.edu
                      http://www.iris.washington.edu/mailman/listinfo/webservices



                      --
                      Sent from my iClayTablet

                      ________________________________

                      Anthony Lomax
                      161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
                      tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net web:
                      http://www.alomax.net

                      Twitter: @ALomaxNet
                      Science & Special Topics: http://www.alomax.net/science
                      Software: http://www.alomax.net/software - updates:
                      https://twitter.com/ALomaxNet
                      ________________________________




                      • Yet another data point, going all the way back to vol 1 issue 1 of the
                        DMC newsletter introducing location ids:

                        "The Location Identifier is a two character code that, when used in
                        conjunction with the other data specifiers, uniquely identifies a data
                        stream."
                        and
                        "Historically, within a SEED volume, the Location Identifier was left
                        “blank” (consisted of two spaces)."
                        and
                        "GSN Use of Location Identifiers
                        Valid characters for location identifiers are [space, 0-9, A-Z][space,
                        0-9, A-Z]. (So space-space is a legitimate Location Identifier.) "

                        http://www.iris.edu/ds/newsletter/vol1/no1/specification-of-seismograms-the-location-identifier/

                        From this it seems that location id was intended to be exactly 2
                        characters, not zero or two. My feeling is that we have a long
                        tradition of the location id being "space-space" and not null or
                        empty. Personally I really dislike space-space, but the only thing I
                        dislike more than space-space is empty.
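
The character set quoted from the newsletter can be written as a small validity check. This is a sketch of that one rule only, with a hypothetical function name, and is not an official FDSN or SEED validator.

```python
import re

# Two characters, each one of: space, 0-9, A-Z (as quoted from the newsletter).
LOCATION_ID = re.compile(r"[ 0-9A-Z]{2}")


def is_valid_location_id(code):
    """Return True if code is exactly two characters from the quoted set."""
    return LOCATION_ID.fullmatch(code) is not None
```

Under this reading, `is_valid_location_id("  ")` holds while `is_valid_location_id("")` and `is_valid_location_id("--")` do not, which matches the observation that space-space is a legitimate identifier and that the field was meant to be exactly two characters.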

                        Philip

                        On Thu, Jul 31, 2014 at 7:18 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
                        Hi

                        Just another data point, Earthworm, which is widely used by regional
                        networks globally, has long had the "dash dash is the same as space
                        space" convention. So dash dash is not something pulled out of thin
                        air, it is how at least I do things already.

                        And this shows that it is fairly common (if not technically correct)
                        for users to regard space-space as the location id instead of
                        regarding it as null with 2 spaces for padding. My guess is that very
                        few users are aware of this, and even as someone who has been writing
                        seismic software for a couple of decades I still think of the location
                        id as space-space, not null.

                        http://www.isti2.com/ew/PROGRAMMER/location_codes/EW_Loc_policy.txt

                        Philip


                        On Thu, Jul 31, 2014 at 6:18 AM, Chad Trabant <chad<at>iris.washington.edu> wrote:

                        Thanks Philip, I think you have outlined the issues well.

                        Regarding issue #1, I strongly feel that we need to choose one
                        representation, the sooner we stop creating incompatible metadata the
                        better.

                        Regarding issue #2:

                        b) two spaces="  "


                        This is what IRIS currently does, not strictly SEED but avoids empty
                        identifiers.

                        c) two dashes="--".


                        This would require work and continued mapping, though the mapping between
                        SEED-based holdings and StationXML is clear. SEED headers and data records
                        could also be considered, but that is a bigger can of worms.

                        a) empty=""


                        This is possibly the most straightforward mapping of SEED information, but
                        leaves us with an empty string identifier.

                        Below are a few of the issues we note regarding empty identifiers

                        1) They are too similar to "unknown" (which results in potential ambiguity
                        where channels are only differentiated by location ID):

                        a) In many languages an empty string evaluates to false. If, for example,
                        a program tests for and then extracts a value from an XML document
                        parsed into a structure/object, it could appear as if the value was not
                        present. Of course the coding in probably every language can be done to
                        avoid such a false negative, but it is a pitfall that we would be asking all
                        future users and coders to know about.

                        b) In XPath (the query language for XSLT), which is used to search or
                        translate XML, the matching of a string attribute usually uses the string()
                        function. Specifying the string attribute to match when the attribute has a
                        value is straightforward, but when trying to match the empty string the query
                        is for NOT string. In the boolean functions of XPath "a string is true if and
                        only if its length is non-zero"
                        (http://www.w3.org/TR/xpath/#function-boolean). So in XPath, hardly a
                        fringe technology, an empty string is not just another kind of string but an
                        anomaly.

                        c) In JavaScript the getAttribute() method returns the same value whether
                        the attribute was an empty string or unspecified. The method is no longer
                        recommended but illustrates that such thinking is not limited to niche
                        projects.

                        2) Organizing data in structures such as a nested hash is pretty common:
                        %{net}{sta}{loc}{chan} = "some lvalue" (sorry for the Perl). The empty
                        identifier as a key works in some languages but it is obtuse and unclear.
                        I'm sure there are many other data structures that would use location by
                        itself as a key.

                        3) Empty identifiers are difficult to specify on the command line, in URLs,
                        etc. and are non-obvious in many other places such as GUI fields. We have largely
                        addressed this issue for FDSN web services (at the DMC for other mechanisms
                        as well) by making "--" a synonym for the empty location ID. In other words
                        we are already mapping "--" into the empty location ID for requests and
                        users are learning this association. A further adoption of the synonym into
                        the metadata would solve many of these problems.

                        4) While it is certainly not the FDSN's task to define data formats outside
                        of its purview, the adoption or matching of the core channel naming fields
                        in other formats is certainly in the FDSN's best interest. This has been
                        happening for a long time already (ISF/IASPEI, GSE, etc.). The potentially
                        empty (optional?) location ID could make such adoption harder as it is a
                        wrinkle, especially for space-delimited formats. I believe these broader
                        implications deserve some consideration.

                        I'm sure most developers could come up with solutions to the technical
                        problems, but an empty identifier leaves the unfortunate wrinkles for all
                        future users and coders.

                        Here is an example of someone that was confused by current metadata, I'll
                        bet if there was a value in the locationCode it would have been easier:
                        https://stackoverflow.com/questions/19348855/checking-for-empty-attributes-while-parsing-an-xml-file

                        There is a chance we will end up with the empty location identifier, but the
                        considerations should go beyond an assumption that an empty string is the
                        only choice.

                        Since an empty location field in SEED essentially means unset, perhaps we
                        should consider making the locationCode attribute optional and leaving it
                        out of the XML when it is empty in SEED. In this line of thinking, the
                        empty string is just a hack to include a required attribute when in fact
                        there is nothing to include. For me the "unset" aspect is unsettlingly
                        similar to "unknown", but it's an idea preferred by at least one engineer at
                        the DMC.

                        Chad


                        On Jul 28, 2014, at 6:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:

                        Hi

                        Being on the cheap side of the Atlantic, I'll save us $0.00068 and
                        make a stab at the underlying issue. :)

                        Here, with lots of stuff cut out, is how a channel is "identified" in
                        stationXML via the fdsn station web service at the IRIS DMC,
                        http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404

                        <Network code="GE" >
                        <Station code="UGM">
                        <Channel locationCode="  " code="BHZ">

                        Another implementation of the same web service (not sure of url) gives
                        back this:

                        <Network code="GE" >
                        <Station code="UGM">
                        <Channel locationCode="" code="BHZ">

                        with locationCode="" vs ="  " being the difference under consideration.

                        There are two basic issues being discussed (and yes, more beer would help!
                        :)

                        1) Should all valid stationXML documents be required to use the exact
                        same string of characters to represent the location id for this
                        channel? This would allow a comparison operation to be "simple" in
                        that it can compare the attribute values without additional
                        processing.

                        2) If we agree to 1), then what should those exact characters be? The
                        current top choices are
                        a) empty=""
                        b) two spaces="  "
                        c) two dashes="--".

                        1) seems less controversial than 2) in that greater compatibility is
                        generally seen as positive.

                        This is primarily a question about the form of the stationXML
                        documents, but obviously there are connections to the way requests are
                        formed, the relationship to miniseed/seed, the way things are coded in
                        software and how much detailed understanding we expect of end users.

                        Philip



                        On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:

                        Hello all,

                        Can someone give a concise statement of the original problem being
                        discussed? Is it only or primarily a concern about XML?

                        It seems to me that with modern languages a string that is empty or has 1-N
                        spaces is the same thing - there is often an implicit or explicit trim()
                        function hiding in a processing pipeline. A null string is not the same.
                        So an empty or blank string is the same, valid location code, and null is
                        an undefined or uninitialized location code.

                        With regards to the "--" pseudo for the location code, is this not needed
                        because sometimes it is not possible or difficult to represent an empty
                        string or even a string? For example on the command line or in a restful WS
                        URI? (Or a URI on the command line!) So it may be that the use of "--" for
                        intermediate processing and requests could be tolerated and somehow
                        official, while empty or only-blanks strings official and for persistent
                        data.
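The split suggested above ("--" tolerated in requests, empty or blank strings in persistent data) can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, and treating blank-only strings as padding follows the trim() reasoning in this message rather than any agreed FDSN rule.

```python
def request_loc_to_stored(loc: str) -> str:
    """Map a location code as typed in a request to the stored form.

    Assumption: "--" is only a request-side stand-in for the empty code,
    and blanks are padding that can be trimmed.
    """
    if loc == "--":
        return ""
    return loc.strip()


def stored_loc_to_request(loc: str) -> str:
    """Map a stored location code to a form that is safe in a URL or shell."""
    # An empty or blank-only code becomes "--"; anything else passes through.
    return loc.strip() or "--"
```

With this mapping, "--", "" and two spaces all end up as the same stored value, while codes such as "00" pass through unchanged.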

                        Just my 0.02€ = $0.0268

                        Best regards to all,

                        Anthony



                        On 27/07/2014 04:52, Chad Trabant wrote:

                        Hi Marcelo,

                        Thanks for your thoughts as well. Something that you and Joachim are not
                        addressing are the concerns about an empty ID that have been brought up by
                        more than one person. The answer that empty strings are technically
                        possible and it all works in Python/SeisComP is less than satisfying. The
                        observations from Python, ObsPy and SeisComP are a few of many that need to
                        be taken into account.

                        I agree that there is a long tail consideration for the "--" location ID
                        solution. Understand that some folks find an empty ID to be problematic
                        regardless of whether it is XML, SEED, text, whatever, then you might see
                        where this proposal comes from. Yes, we would need to treat empty location
                        IDs and "--" as synonyms for a very long time. Empty strings in XML mean
                        you will need to map empty IDs to empty strings, NULL and whatever an XML
                        parser might or might not produce for a long time as well (think beyond
                        Python and SeisComP). Either is possible, only one of them is a unique
                        mapping.

                        If the main considerations are for the least amount of disruption then the
                        answer is obvious to me: the FDSN can sanction that the two-space string is
                        the XML synonym for the empty SEED location ID and we adjust the schema to
                        make sure a string of whitespaces is preserved. Then SeisComP can change
                        its relatively new StationXML implementation and ALL existing clients will
                        be compatible with all metadata and, most importantly, we would have
                        consistent metadata.

                        If the empty string ID representation is adopted it would, in effect,
                        mean that the DMC would need to change its metadata service and (more
                        importantly) all users of the DMC's metadata service would need to
                        transition to a new metadata channel naming scheme. This is certainly not
                        out of the question, but it is not something we would do without careful
                        consideration. I do not find the two-space strings all that great, but they
                        are here and something the DMC and users of the DMC have dealt with. Issues
                        have been identified with empty location IDs by us and our users. If DMC is
                        going to change, and push the change on all users of the DMC's StationXML,
                        it would be much more compelling to have a solution that addresses the low
                        level issues.

                        regards,
                        Chad


                        ----- Original Message -----
                        From: "Marcelo Bianchi" <m.tchelo<at>gmail.com>
                        To: "IRIS Web Services List" <webservices<at>iris.washington.edu>
                        Sent: Friday, July 25, 2014 7:38:17 PM
                        Subject: Re: [webservices] A question of location ID, how to represent empty
                        IDs in XML?

                        Hi Philip and All,

                        I totally agree with Joachim; I was planning to answer but he was much
                        faster. What you guys are proposing is not a solution. StationXML
                        supports the empty string nicely, and it is not null. There is a type
                        difference here in Python and in any other language, and it can be
                        handled nicely internally.

                        Also, the location id is not just a string; it is a key entry that
                        links miniseed to metadata. Making an exception at this level just
                        because a user interface cannot properly render it without ambiguity
                        does not sound like a proper proposal. I am not in favor of
                        creating an exception that will have to be carried along for
                        decades to come. Alternative solutions for this issue should be
                        sought in the end user interface.

                        with my best regards,

                        Marcelo Bianchi
                        --


                        2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell<at>seis.sc.edu>:

                        It sounds like you are saying "change is hard, so we shouldn't do it".
                        I would argue that change is hard and so if we don't do it now it will
                        never happen. StationXML is new enough that there is already a
                        disruption, we should seize the chance. If we do not do something now
                        about null loc ids, it will be a decade or two before we get another
                        chance.

                        It is time to drive the stake through the heart of null location ids.
                        Kill the evil while we have a chance.

                        Philip


                        On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:

                        Hello Rob,

                        Rob Newman wrote on 24.07.2014 18:51:

                        For what it's worth, I would also vote for the "--" standard. To quote
                        from the Zen of Python
                        http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html
                        (my language of choice):


                        "Beautiful is better than ugly.
                        Explicit is better than implicit.
                        Simple is better than complex.
                        Complex is better than complicated.
                        Flat is better than nested.
                        Sparse is better than dense.
                        Readability counts.
                        Special cases aren't special enough to break the rules.
                        Although practicality beats purity.
                        Errors should never pass silently.
                        Unless explicitly silenced."

                        I'd add "Compatible is better than incompatible." :)


                        Number 2 is especially relevant here:
                        "Explicit is better than implicit."

                        My favorite would be:

                        "Special cases aren't special enough to break the rules."

                        Quoted whitespace and nulls are painful. Code what you mean, and mean what
                        you code. It's easier for everyone.

                        But what if we simply *mean* "empty string"?

                        The issue is not about beauty, pain or ease. It's about standard
                        conformance. We already have a channel naming standard. If a new data format
                        cannot accommodate existing channel naming, then the new format is flawed.
                        But that's not even the case here...

                        An XML document that contains

                        <Channel locationCode="" ...

                        is not malformed. There's an attribute that *explicitly* contains an empty
                        string and a parser has to produce it as such. Not as null, nil or none, but
                        as an empty string. Otherwise the parser is broken and needs to be fixed,
                        not the data!
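As a small check of the parser behaviour described above, here is a sketch using Python's standard library xml.etree.ElementTree. It shows what this one parser returns for an empty, a two-space, and a missing locationCode attribute; it says nothing about other parsers or languages, and the helper name location_code is made up for the example.

```python
import xml.etree.ElementTree as ET


def location_code(xml_text):
    """Return the locationCode attribute of a single <Channel> element."""
    # Element.get() returns None when the attribute is absent.
    return ET.fromstring(xml_text).get("locationCode")


empty = location_code('<Channel locationCode="" code="BHZ"/>')
blanks = location_code('<Channel locationCode="  " code="BHZ"/>')
missing = location_code('<Channel code="BHZ"/>')

print(repr(empty), repr(blanks), repr(missing))
```

In this parser the empty attribute comes back as an empty string, the two-space attribute keeps both spaces, and only the missing attribute yields None.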

                        Again: It's not about beauty. We all agree that current channel naming is
                        not particularly beautiful and has limitations. But our business is not to
                        try to solve that issue now and here.

                        Cheers
                        Joachim

                        _______________________________________________
                        webservices mailing list
                        webservices<at>iris.washington.edu
                        http://www.iris.washington.edu/mailman/listinfo/webservices



                        --
                        Sent from my iClayTablet

                        ________________________________

                        Anthony Lomax
                        161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
                        tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net web:
                        http://www.alomax.net

                        Twitter: @ALomaxNet
                        Science & Special Topics: http://www.alomax.net/science
                        Software: http://www.alomax.net/software - updates:
                        https://twitter.com/ALomaxNet
                        ________________________________




                        • Hi,

                          I realize I am coming pretty late to the party here, but I'll chime in
                          anyway. At SCEDC (the archive for the Southern California Seismic
                          Network), we represent our empty location codes with two blank spaces as
                          well. I suspect the same is done at the Northern California Earthquake
                          Data Center.

                          There are great arguments here for each of the options presented, but I
                          think unless we decide to make location id optional we should not use an
                          empty string to denote an unset location id in StationXML. I think there
                          is enough variety in how an empty string is treated in different
                          programming languages and databases to be problematic. From some
                          databases' perspectives, you really might as well make it null at that
                          point.

                          So I would say, if location id is required, use a two character
                          substitution; my personal preference is two spaces as that seems to be the
                          convention (although it ain't pretty) - and we should consider in a future
                          version of stationXML making the location id optional.


                          Ellen


                          On Thu, Jul 31, 2014 at 5:59 AM, Philip Crotwell <crotwell<at>seis.sc.edu>
                          wrote:

                          Yet another data point, going all the way back to vol 1 issue 1 of the
                          DMC newsletter introducing location ids:

                          "The Location Identifier is a two character code that, when used in
                          conjunction with the other data specifiers, uniquely identifies a data
                          stream."
                          and
                          "Historically, within a SEED volume, the Location Identifier was left
                          "blank" (consisted of two spaces)."
                          and
                          "GSN Use of Location Identifiers
                          Valid characters for location identifiers are [space, 0-9, A-Z][space,
                          0-9, A-Z]. (So space-space is a legitimate Location Identifier.) "


                          http://www.iris.edu/ds/newsletter/vol1/no1/specification-of-seismograms-the-location-identifier/

                          From this it seems that location id was intended to be exactly 2
                          characters, not zero or two. My feeling is that we have a long
                          tradition of the location id being "space-space" and not null or
                          empty. Personally I really dislike space-space, but the only thing I
                          dislike more than space-space is empty.
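The character rule quoted from the newsletter, [space, 0-9, A-Z][space, 0-9, A-Z], can be written as a short check. This is a sketch of that quoted rule only, not of any current FDSN or StationXML schema constraint, and the function name is hypothetical.

```python
import re

# Exactly two characters, each a space, a digit, or an upper-case letter,
# as in the newsletter text quoted above.
LOCATION_ID = re.compile(r"[ 0-9A-Z]{2}")


def is_valid_location_id(code: str) -> bool:
    """Return True if code matches the two-character rule quoted above."""
    return LOCATION_ID.fullmatch(code) is not None
```

Under this reading space-space is valid, while the empty string and "--" are not, which is the tension the thread is about.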

                          Philip

                          On Thu, Jul 31, 2014 at 7:18 AM, Philip Crotwell <crotwell<at>seis.sc.edu>
                          wrote:
                          Hi

                          Just another data point, Earthworm, which is widely used by regional
                          networks globally, has long had the "dash dash is the same as space
                          space" convention. So dash dash is not something pulled out of thin
                          air, it is how at least I do things already.

                          And this shows that it is fairly common (if not technically correct)
                          for users to regard space-space as the location id instead of
                          regarding it as null with 2 spaces for padding. My guess is that very
                          few users are aware of this, and even as someone who has been writing
                          seismic software for a couple of decades I still think of the location
                          id as space-space, not null.

                          http://www.isti2.com/ew/PROGRAMMER/location_codes/EW_Loc_policy.txt

                          Philip


                          On Thu, Jul 31, 2014 at 6:18 AM, Chad Trabant <chad<at>iris.washington.edu>
                          wrote:

                          Thanks Philip, I think you have outlined the issues well.

                          Regarding issue #1, I strongly feel that we need to choose one
                          representation, the sooner we stop creating incompatible metadata the
                          better.

                          Regarding issue #2:

                          b) two spaces="  "


                          This is what IRIS currently does, not strictly SEED but avoids empty
                          identifiers.

                          c) two dashes="--".


                          This would require work and continued mapping, but the mapping is
                          clear between SEED-based holdings and StationXML. SEED headers and
                          data records could also be considered, but that is a bigger can of
                          worms.

                          a) empty=""


                          This is possibly the most straightforward mapping of SEED
                          information, but leaves us with an empty string identifier.

                          Below are a few of the issues we note regarding empty identifiers

                          1) They are too similar to "unknown" (which results in potential
                          ambiguity where channels are only differentiated by location ID):

                          a) In many languages an empty string evaluates to false; if, for
                          example, a program is testing for and then extracting a value from
                          an XML document parsed into a structure/object, it could appear as
                          if the value was not present. Of course the coding in probably
                          every language can be done to avoid such a false negative, but it
                          is a pitfall that we would be asking all future users and coders to
                          know about.

                          b) In XPath (the query language for XSLT), which is used to search
                          or translate XML, the matching of a string attribute usually uses
                          the string() function. Specifying the string attribute to match
                          when the attribute has a value is straightforward; when trying to
                          match the empty string the query is for NOT string. In the boolean
                          functions of XPath "a string is true if and only if its length is
                          non-zero" (http://www.w3.org/TR/xpath/#function-boolean). So in
                          XPath, hardly a fringe technology, an empty string is not just
                          another kind of string but an anomaly.

                          c) In JavaScript the getAttribute() method returns the same value
                          whether the attribute was an empty string or unspecified. The
                          method is no longer recommended but illustrates that such thinking
                          is not limited to niche projects.

                          2) Organizing data in structures such as a nested hash is pretty
                          common: %{net}{sta}{loc}{chan} = "some lvalue" (sorry for the
                          Perl). The empty identifier as a key works in some languages but it
                          is obtuse and unclear. I'm sure there are many other data
                          structures that would use location by itself as a key.

                          3) Empty identifiers are difficult to specify on the command line,
                          in URLs, etc. and non-obvious in many other places such as GUI
                          fields. We have largely addressed this issue for FDSN web services
                          (at the DMC for other mechanisms as well) by making "--" a synonym
                          for the empty location ID. In other words we are already mapping
                          "--" into the empty location ID for requests and users are learning
                          this association. A further adoption of the synonym into the
                          metadata would solve many of these problems.

                          4) While it is certainly not the FDSN's task to define data formats
                          outside of its purview, the adoption or matching of the core
                          channel naming fields in other formats is certainly in the FDSN's
                          best interest. This has been happening for a long time already
                          (ISF/IASPEI, GSE, etc.). The potentially empty (optional?) location
                          ID could make such adoption harder as it is a wrinkle, especially
                          for space delimited formats. I believe these broader implications
                          deserve some consideration.

                          I'm sure most developers could come up with solutions to the
                          technical problems, but an empty identifier leaves the unfortunate
                          wrinkles for all future users and coders.
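Two of the points above (an empty string evaluating to false, and an empty string used as a hash key) can be illustrated in Python. This is a sketch with made-up names and data; the parsed channel is represented as a plain dict of attributes rather than the output of any particular StationXML library.

```python
def has_location_naive(attrs: dict) -> bool:
    """Pitfall: an empty string is falsy, so "" looks like "not present"."""
    return bool(attrs.get("locationCode"))


def has_location(attrs: dict) -> bool:
    """Explicit test: only a truly missing attribute counts as absent."""
    return attrs.get("locationCode") is not None


# Hypothetical parsed attributes for a channel with an empty location code.
channel = {"locationCode": "", "code": "BHZ"}

# Nested mapping net -> sta -> loc -> chan, with the empty string as a key.
inventory = {"GE": {"UGM": {"": {"BHZ": "some value"}}}}
value = inventory["GE"]["UGM"][""]["BHZ"]
```

The naive test reports the location code as absent while the explicit test finds it. The empty key works in a Python dict, though it is easy to overlook when reading the code.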

                          Here is an example of someone that was confused by current
                          metadata, I'll bet if there was a value in the locationCode it
                          would have been easier:

                          https://stackoverflow.com/questions/19348855/checking-for-empty-attributes-while-parsing-an-xml-file

                          There is a chance we will end up with the empty location
                          identifier, but the considerations should go beyond an assumption
                          that an empty string is the only choice.

                          Since an empty location field in SEED essentially means unset,
                          perhaps we should consider making the locationCode attribute
                          optional and leaving it out of the XML when it is empty in SEED. In
                          this line of thinking, the empty string is just a hack to include a
                          required attribute when in fact there is nothing to include. For me
                          the "unset" aspect is unsettlingly similar to "unknown", but it's
                          an idea preferred by at least one engineer at the DMC.

                          Chad


                          On Jul 28, 2014, at 6:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu>
                          wrote:

                          Hi

                          Being on the cheap side of the Atlantic, I'll save us $0.00068 and
                          make a stab at the underlying issue. :)

                          Here, with lots of stuff cut out, is how a channel is "identified" in
                          stationXML via the fdsn station web service at the IRIS DMC,

                          http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404

                          <Network code="GE" >
                          <Station code="UGM">
                          <Channel locationCode="  " code="BHZ">

                          Another implementation of the same web service (not sure of url) gives
                          back this:

                          <Network code="GE" >
                          <Station code="UGM">
                          <Channel locationCode="" code="BHZ">

                          with locationCode="" vs ="  " being the difference under consideration.

                          There are two basic issues being discussed (and yes, more beer would
                          help! :)

                          1) Should all valid stationXML documents be required to use the exact
                          same string of characters to represent the location id for this
                          channel. This would allow a comparison operation to be "simple" in
                          that it can compare the attribute values without additional
                          processing.

                          2) If we agree to 1), then what should those exact characters be? The
                          current top choices are
                          a) empty=""
                          b) two spaces="  "
                          c) two dashes="--".

                          1) seems less controversial than 2) in that greater compatibility is
                          generally seen as positive.

                          This is primarily a question about the form of the stationXML
                          documents, but obviously there are connections to the way requests are
                          formed, the relationship to miniseed/seed, the way things are coded in
                          software and how much detailed understanding we expect of end users.
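Issue 1) can be made concrete with a small Python sketch. A plain comparison of the identifier fields treats the two StationXML flavors as different channels, and only an extra translation step makes them match. The canonical form chosen here (empty string) is an arbitrary assumption for the illustration, not a recommendation, and the function name is hypothetical.

```python
def canonical_channel_id(net: str, sta: str, loc: str, cha: str) -> tuple:
    """Return a channel identifier with the location code normalized.

    Assumption: "--" and blank-only strings are synonyms for the empty code.
    """
    loc = "" if loc == "--" else loc.strip()
    return (net, sta, loc, cha)


# Identifiers as they would be read from the two flavors shown above.
dmc_style = ("GE", "UGM", "  ", "BHZ")
other_style = ("GE", "UGM", "", "BHZ")

simple_match = dmc_style == other_style
normalized_match = (
    canonical_channel_id(*dmc_style) == canonical_channel_id(*other_style)
)
```

Here simple_match is False and normalized_match is True. If all documents used the same characters, the normalization step would not be needed.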

                          Philip



                          On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:

                          Hello all,

                          Can someone give a concise statement of the original problem being
                          discussed? Is it only or primarily a concern about XML?

                          It seems to me that with modern languages a string that is empty or
                          has 1-N spaces is the same thing - there are often implicit or
                          explicit trim() function hiding in a processing pipeline. A null
                          string is not the same. So an empty or blank string is the same,
                          valid location code, and null is undefined or uninitialized
                          location code.

                          With regards to the "--" pseudo for the location code, is this not
                          needed because sometimes it is not possible or difficult to
                          represent an empty string or even a string? For example on the
                          command line or in a restful WS URI? (Or a URI on the command
                          line!) So it may be that the use of "--" for intermediate
                          processing and requests could be tolerated and somehow official,
                          while empty or only-blanks strings official and for persistent
                          data.

                          Just my 0.02 EURO = $0.0268

                          Best regards to all,

                          Anthony



                          On 27/07/2014 04:52, Chad Trabant wrote:

                          Hi Marcelo,

                          Thanks for your thoughts as well. Something that you and Joachim
                          are not addressing are the concerns about an empty ID that have
                          been brought up by more than one person. The answer that empty
                          strings are technically possible and it all works in
                          Python/SeisComP is less than satisfying. The observations from
                          Python, ObsPy and SeisComP are a few of many that need to be taken
                          into account.

                          I agree that there is a long tail consideration for the "--"
                          location ID solution. Understand that some folks find an empty ID
                          to be problematic regardless of whether it is XML, SEED, text,
                          whatever, then you might see where this proposal comes from. Yes,
                          we would need to treat empty location IDs and "--" as synonyms for
                          a very long time. Empty strings in XML mean you will need to map
                          empty IDs to empty strings, NULL and whatever an XML parser might
                          or might not produce for a long time as well (think beyond Python
                          and SeisComP). Either is possible, only one of them is a unique
                          mapping.

                          If the main considerations are for the least amount of disruption
                          then the answer is obvious to me: the FDSN can sanction that the
                          two-space string is the XML synonym for the empty SEED location ID
                          and we adjust the schema to make sure a string of whitespaces is
                          preserved. Then SeisComP can change its relatively new StationXML
                          implementation and ALL existing clients will be compatible with all
                          metadata and, most importantly, we would have consistent metadata.

                          If the empty string ID representation is adopted it would, in
                          effect, mean that the DMC would need to change its metadata service
                          and (more importantly) all users of the DMC's metadata service
                          would need to transition to a new metadata channel naming scheme.
                          This is certainly not out of the question, but it is not something
                          we would do without careful consideration. I do not find the
                          two-space strings all that great, but they are here and something
                          the DMC and users of the DMC have dealt with. Issues have been
                          identified with empty location IDs by us and our users. If DMC is
                          going to change, and push the change on all users of the DMC's
                          StationXML, it would be much more compelling to have a solution
                          that addresses the low level issues.

                          regards,
                          Chad


                          ----- Original Message -----
                          From: "Marcelo Bianchi" <m.tchelo<at>gmail.com>
                          To: "IRIS Web Services List" <webservices<at>iris.washington.edu>
                          Sent: Friday, July 25, 2014 7:38:17 PM
                          Subject: Re: [webservices] A question of location ID, how to represent
                          empty IDs in XML?

                          Hi Philip and All,

                          I totally agree with Joachim; I was planning to answer but he was much
                          faster. What you guys are proposing is not a solution. StationXML
                          supports the empty string nicely, and it is not null. There is a type
                          difference here in Python and in any other language, and it can be
                          handled nicely internally.

                          Also, the location id is not just a string; it is a key entry that
                          links miniseed to metadata. Making an exception at this level just
                          because a user interface cannot properly render it without ambiguity
                          does not sound like a proper proposal. I am not in favor of
                          creating an exception that will have to be carried along for
                          decades to come. Alternative solutions for this issue should be
                          sought in the end user interface.

                          with my best regards,

                          Marcelo Bianchi
                          --


                          2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell<at>seis.sc.edu>:

                          It sounds like you are saying "change is hard, so we shouldn't do it".
                          I would argue that change is hard and so if we don't do it now it will
                          never happen. StationXML is new enough that there is already a
                          disruption, we should seize the chance. If we do not do something now
                          about null loc ids, it will be a decade or two before we get another
                          chance.

                          It is time to drive the stake through the heart of null location ids.
                          Kill the evil while we have a chance.

                          Philip


                          On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de>
                          wrote:

                          Hello Rob,

                          Rob Newman wrote on 24.07.2014 18:51:

                          For what it's worth, I would also vote for the "--" standard. To quote
                          from the Zen of Python
                          <http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html>
                          (my language of choice):


                          "Beautiful is better than ugly.
                          Explicit is better than implicit.
                          Simple is better than complex.
                          Complex is better than complicated.
                          Flat is better than nested.
                          Sparse is better than dense.
                          Readability counts.
                          Special cases aren't special enough to break the rules.
                          Although practicality beats purity.
                          Errors should never pass silently.
                          Unless explicitly silenced."

                          I'd add "Compatible is better than incompatible." :)


                          Number 2 is especially relevant here:
                          "Explicit is better than implicit."

                          My favorite would be:

                          "Special cases aren't special enough to break the rules."

                          Quoted whitespace and nulls are painful. Code what you mean, and mean
                          what you code. It's easier for everyone.

                          But what if we simply *mean* "empty string"?

                          The issue is not about beauty, pain or ease. It's about standard
                          conformance. We already have a channel naming standard. If a new data
                          format cannot accommodate existing channel naming, then the new format
                          is flawed. But that's not even the case here...

                          An XML document that contains

                          <Channel locationCode="" ...

                          is not malformed. There's an attribute that *explicitly* contains an
                          empty string and a parser has to produce it as such. Not as null, nil
                          or none, but as an empty string. Otherwise the parser is broken and
                          needs to be fixed, not the data!
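
A minimal sketch of the parser behaviour described above, using Python's standard xml.etree.ElementTree. The element and attribute names follow the snippets in this thread; the three input strings and the variable names are made up for illustration. Other parsers and data-binding layers may behave differently, which is the concern raised elsewhere in the thread.

```python
import xml.etree.ElementTree as ET

# Three hypothetical Channel elements: an empty attribute, a two-space
# attribute, and no locationCode attribute at all.
empty = ET.fromstring('<Channel code="BHE" locationCode=""/>')
spaces = ET.fromstring('<Channel code="BHE" locationCode="  "/>')
absent = ET.fromstring('<Channel code="BHE"/>')

loc_empty = empty.get("locationCode")    # attribute present, empty string
loc_spaces = spaces.get("locationCode")  # attribute present, two spaces
loc_absent = absent.get("locationCode")  # attribute missing, so None

print(repr(loc_empty), repr(loc_spaces), repr(loc_absent))
```

With this parser the empty string, the two-space string and a missing attribute are three distinct results, so client code can tell "set to empty" apart from "not present".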

                          Again: It's not about beauty. We all agree that current channel naming
                          is not particularly beautiful and has limitations. But our business is
                          not to try to solve that issue now and here.

                          Cheers
                          Joachim

                          _______________________________________________
                          webservices mailing list
                          webservices<at>iris.washington.edu
                          http://www.iris.washington.edu/mailman/listinfo/webservices


                          --
                          Sent from my iClayTablet

                          ________________________________

                          Anthony Lomax
                          161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
                          tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net web:
                          http://www.alomax.net

                          Twitter: @ALomaxNet
                          Science & Special Topics: http://www.alomax.net/science
                          Software: http://www.alomax.net/software - updates:
                          https://twitter.com/ALomaxNet
                          ________________________________


                      • Hi Philip,

                        Philip Crotwell wrote on 07/31/2014 02:59 PM:
                        http://www.iris.edu/ds/newsletter/vol1/no1/specification-of-seismograms-the-location-identifier/

                        From this it seems that location id was intended to be exactly 2
                        characters, not zero or two. My feeling is that we have a long
                        tradition of the location id being "space-space" and not null or
                        empty. Personally I really dislike space-space, but the only thing I
                        dislike more than space-space is empty.

                        Now we have the above IRIS newsletter article vs. the FDSN standard.
                        Which one should be considered authoritative?

                        Philip Crotwell wrote on 07/31/2014 01:18 PM:
                        Just another data point, Earthworm, which is widely used by regional
                        networks globally, has long had the "dash dash is the same as space
                        space" convention. So dash dash is not something pulled out of thin
                        air, it is how at least I do things already.

                        OK, at least we know now where Chad and you got that idea from. ;)

                        And this shows that it is fairly common (if not technically correct)
                        for users to regard space-space as the location id instead of
                        regarding it as null with 2 spaces for padding. My guess is that very
                        few users are aware of this, and even as someone who has been writing
                        seismic software for a couple of decades I still think of the location
                        id as space-space, not null.

                        http://www.isti2.com/ew/PROGRAMMER/location_codes/EW_Loc_policy.txt

                        If you say Earthworm I say SAC. The location code in SAC is trimmed just
                        like in other software already mentioned. Does that convince you? Of
                        course not, as we are here discussing neither Earthworm nor SAC channel
                        naming convention.

                        This discussion is about FDSN standard channel naming. Obviously neither
                        Earthworm nor SAC count. Both use their own formats and within their
                        respective ecosystems they can of course represent the location code in
                        whatever way is considered appropriate, as long as the export to FDSN
                        formats is done properly.

                        Cheers
                        Joachim

              • Hi Chad

                Chad Trabant wrote on 27.07.2014 04:52:
                The answer that empty strings are technically possible and it all
                works in Python/SeisComP is less than satisfying. The observations
                from Python, ObsPy and SeisComP are a few of many that need to be
                taken into account.

                Please name a few. Not abstract claims or hearsay. Point us to client
                code that cannot parse an empty location code; only then can someone
                take a closer look at the matter and quite possibly provide help.

                Yes, we would need to treat empty location IDs and "--" as synonyms
                for a very long time. Empty strings in XML mean you will need to map
                empty IDs to empty strings, NULL and whatever an XML parser might or
                might not produce for a long time as well (think beyond Python and
                SeisComP). Either is possible, only one of them is a unique
                mapping.

                I don't accept the parser issues unless you provide examples; see above.

                In general mappings are not the problem and are widely used anyway. Can
                you name a single piece of software that, when reading (Mini)SEED, does
                *not* map the location code from "  " to ""? Even libmseed does!

                So why not be consistent and do the same when parsing XML? It would
                solve the current issues. You can then keep your two spaces as long as
                you like. ;)
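
As a sketch of the mapping discussed above, assuming the location ID arrives as the raw two-byte SEED field: the padding is stripped on read, and the same trimming is applied to a StationXML attribute value. Both function names are hypothetical and not taken from libmseed or any other library.

```python
def trim_seed_field(raw: bytes) -> str:
    """Decode a fixed-width SEED text field and drop the trailing space padding."""
    return raw.decode("ascii").rstrip(" ")


def location_from_xml(attr: str) -> str:
    """Apply the same trimming to a StationXML locationCode attribute value."""
    return attr.rstrip(" ")


# A padded SEED field and both XML flavours end up as the same value.
print(repr(trim_seed_field(b"  ")))      # empty location ID
print(repr(trim_seed_field(b"00")))      # ordinary two-character ID
print(location_from_xml("  ") == location_from_xml(""))
```

Under this rule a server could keep sending two spaces while clients still see a single value, at the cost of every client having to apply the trimming step.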

                If the main considerations are for the least amount of disruption then
                the answer is obvious to me: the FDSN can sanction that the two-space
                string is the XML synonym for the empty SEED location ID and we
                adjust the schema to make sure a string of whitespace is preserved.
                Then SeisComP can change its relatively new StationXML implementation
                and ALL existing clients will be compatible with all metadata and,
                most importantly, we would have consistent metadata.

                Chad, this whole discussion started back in early January with your
                complaint about the SeisComP fdsnws server implementation. You were
                alleging that 'The resulting StationXML includes empty location IDs
                (locationCode=“”), this is not allowed in SEED and therefore not allowed
                in StationXML.' If the SeisComP server were indeed producing wrong XML
                it would have been corrected long ago. But that's not the case! It's
                actually SeisComP that produces the more correct FDSN StationXML
                compared to IRIS XML, not only w.r.t. locationCode.

                Don't you think it is now time to roll up the sleeves and make your
                client codes work with standard compliant FDSN StationXML rather than
                doctoring an FDSN standard?

                If the empty string ID representation is adopted it would, in
                effect, mean that the DMC would need to change its metadata service
                and (more importantly) all users of the DMC's metadata service would
                need to transition to a new metadata channel naming scheme. This is
                certainly not out of the question, but it is not something we would
                do without careful consideration. I do not find the two-space
                strings all that great, but they are here and something the DMC and
                users of the DMC have dealt with. Issues have been identified with
                empty location IDs by us and our users. If DMC is going to change,
                and push the change on all users of the DMC's StationXML, it would be
                much more compelling to have a solution that addresses the low level
                issues.

                Did you read my email of Thursday, 18:43 UTC? Following the ideas I
                outlined there, you are technically *not* required to change any of your
                servers. Only a few client codes are actually affected and even I was
                able to make the changes in one of those in 10 minutes. Of course, in
                total it will take longer, but if specific problematic cases related to
                parsing are identified and discussed, I am sure solutions can be found
                quickly. We have this list, we have skilled and enthusiastic people
                working on this, so why not use this as a platform even for more
                technical discussions? Or how about creating a "developer's corner"
                webservices-devel or so?

                Cheers
                Joachim


                • On Jul 28, 2014, at 4:51 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:

                  Hi Chad

                  Chad Trabant wrote on 27.07.2014 04:52:
                  The answer that empty strings are technically possible and it all
                  works in Python/SeisComP is less than satisfying. The observations
                  from Python, ObsPy and SeisComP are a few of many that need to be
                  taken into account.

                  Please name a few. Not abstract claims or hearsay. Point us to client
                  code that cannot parse an empty location code; only then can someone
                  take a closer look at the matter and quite possibly provide help.

                  OK, here are a few: IRIS-WS, IRIS Fetch scripts, irisFetch.m, JWEED and probably: SeisFile, EMERALD, Epicentral and all the other codes that users of the DMC have created to read the metadata we send them.

                  The statement that observations from Python, ObsPy and SeisComP alone are insufficient evidence for key changes to FDSN formats is not an abstract claim or hearsay, it is rather obvious since they are not the only (or even majority) systems handling these formats.

                  Yes, we would need to treat empty location IDs and "--" as synonyms
                  for a very long time. Empty strings in XML mean you will need to map
                  empty IDs to empty strings, NULL and whatever an XML parser might or
                  might not produce for a long time as well (think beyond Python and
                  SeisComP). Either is possible, only one of them is a unique
                  mapping.

                  I don't accept the parser issues unless you provide examples; see above.

                  In general mappings are not the problem and are widely used anyway. Can
                  you name a single piece of software that, when reading (Mini)SEED, does
                  *not* map the location code from "  " to ""? Even libmseed does!

                  The code that reads dataless SEED into the DMC's metadata tables. If you want two: the code that reads the values from the DMC's database and creates StationXML. But it doesn’t really matter.

                  Yes, collapsing the spaces is very common and is in fact how SEED specifies that it be done; no one is arguing against this, as far as I have read.

                  So why not be consistent and do the same when parsing XML? It would
                  solve the current issues. You can then keep your two spaces as long as
                  you like. ;)

                  Yes, it totally makes sense to keep the same thing going in XML, except that there have been some issues identified in both SEED and XML and this is an opportunity to begin addressing the low level issue. In essence, the empty string solution is not ideal, even if it is the most appropriate mapping given the current rules. More on this later.

                  If the main considerations are for the least amount of disruption then
                  the answer is obvious to me: the FDSN can sanction that the two-space
                  string is the XML synonym for the empty SEED location ID and we
                  adjust the schema to make sure a string of whitespace is preserved.
                  Then SeisComP can change its relatively new StationXML implementation
                  and ALL existing clients will be compatible with all metadata and,
                  most importantly, we would have consistent metadata.

                  Chad, this whole discussion started back in early January with your
                  complaint about the SeisComP fdsnws server implementation. You were
                  alleging that 'The resulting StationXML includes empty location IDs
                  (locationCode=“”), this is not allowed in SEED and therefore not allowed
                  in StationXML.'

                  And I have since written that my thoughts have changed, that indeed the location code in SEED does not contain spaces, nor is it required to be two characters.

                  The point I was making is that the least number of users would be affected if the FDSN decided to require two characters and allow spaces. I say this because I believe most of the users of StationXML get their metadata from the DMC at the moment and have already dealt with the metadata in some way.

                  If the SeisComP server were indeed producing wrong XML
                  it would have been corrected long ago. But that's not the case! It's
                  actually SeisComP that produces the more correct FDSN StationXML
                  compared to IRIS XML, not only w.r.t. locationCode.

                  This statement is heavy on hubris and naivety.

                  There is no easy way to determine if any given StationXML document is fully "correct". The schema does not have enough information to vet the contents of a StationXML document; it basically checks to make sure the layout is correct, so XML schema validity is not sufficient for "correct". Currently, the StationXML contents are supposed to follow the guidelines defined in SEED. I think many of us agree that we should work to put as many of the content rules as possible into future versions of the schema to clarify many of the gray areas of StationXML. The concept of "more correct" is qualitative when used generally and is rarely or never more important than "compatible with the consensus".

                  Such gray areas exist even within SEED. Within the FDSN, here is how we have traditionally dealt with the gray areas: when implementing a piece of software to produce something already in production at another center(s), you usually use the other(s) as a reference (or collaborate with them). If important differences are found, they are brought up and discussed civilly and a plan is made to make things compatible, usually with user impact being a high priority. Unfortunately, this is not how this current situation unfolded and we are left with incompatible metadata.

                  Don't you think it is now time to roll up the sleeves and make your
                  client codes work with standard compliant FDSN StationXML rather than
                  doctoring an FDSN standard?

                  You do not unilaterally decide what compliant FDSN StationXML is. As you well know, I have made a proposal to the FDSN and asked for clarity on this issue; it seems worth knowing where we are going.

                  If the empty string ID representation is adopted it would, in
                  effect, mean that the DMC would need to change its metadata service
                  and (more importantly) all users of the DMC's metadata service would
                  need to transition to a new metadata channel naming scheme. This is
                  certainly not out of the question, but it is not something we would
                  do without careful consideration. I do not find the two-space
                  strings all that great, but they are here and something the DMC and
                  users of the DMC have dealt with. Issues have been identified with
                  empty location IDs by us and our users. If DMC is going to change,
                  and push the change on all users of the DMC's StationXML, it would be
                  much more compelling to have a solution that addresses the low level
                  issues.

                  Did you read my email of Thursday, 18:43 UTC? Following the ideas I
                  outlined there, you are technically *not* required to change any of your
                  servers. Only a few client codes are actually affected and even I was
                  able to make the changes in one of those in 10 minutes.

                  You miss the main issue: the metadata is incompatible, so some servers must change. There are many more clients than there are servers, and many clients are written by users and out of our direct control. Requiring every client to know some post-parsing processing rules is a terrible idea; in fact it is an artifact of the same “anachronism” that you claim to dislike, bringing us back to SEED-like parsing.

                  Of course, in
                  total it will take longer, but if specific problematic cases related to
                  parsing are identified and discussed, I am sure solutions can be found
                  quickly. We have this list, we have skilled and enthusiastic people
                  working on this, so why not use this as a platform even for more
                  technical discussions? Or how about creating a "developer's corner"
                  webservices-devel or so?

                  Thanks for the suggestions. Technical discussions in other sub-threads.

                  Chad

                  Cheers
                  Joachim


          • Philip Crotwell [07/25/14 15:35]:
            It sounds like you are saying "change is hard, so we shouldn't do it".

            That depends very much on the kind of change I would say. The change
            that is currently being discussed is a hack that might help XML parser
            developers, with hefty repercussions otherwise.

            If that is the change, it indeed shouldn't be done.

            What I would highly welcome and support is a mature, future-proof
            channel naming concept (involving network codes, too!) with a clear
            implementation roadmap. There have been attempts in this direction, led
            by the USGS and the ISC, but they are not reflected in current FDSN
            StationXML.

            Cheers
            Joachim


    • Hi Yazan,

      (passing along our in-person conversation for the list)

      I do not think allowing a null or optional location ID is a good idea, here is why: in SEED there is always a location ID (the two-byte field cannot be left out), it is always known; when it is empty it is still a specific location ID. Allowing optional location ID in XML leaves a translation from StationXML to SEED a bit ambiguous. The spec would have to clarify that "not present" always means the empty location ID in SEED, I find this translation not nearly as clear and obvious as having a real value present.

      As you say, many parsers will have problems with "" or "  ". It should not be up to every reader (e.g. converters) of StationXML to properly interpret the multiple possible results coming out of any parser; the formats should have a unique and unambiguous mapping.
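
The translation concern can be sketched as follows, assuming a converter that writes the StationXML value back into the two-byte SEED field. The function name and the decision to reject a missing value are illustrative assumptions, not part of any existing converter or of the StationXML specification.

```python
from typing import Optional


def to_seed_location_field(location_code: Optional[str]) -> bytes:
    """Write a location ID into the fixed two-byte SEED field (hypothetical helper)."""
    if location_code is None:
        # Attribute not present in the XML: a specification would have to
        # state what this means before a converter could map it safely.
        raise ValueError("locationCode missing; mapping to SEED is undefined here")
    trimmed = location_code.rstrip(" ")
    if len(trimmed) > 2:
        raise ValueError("location ID longer than two characters")
    # Left-justify and pad with spaces, as the SEED field requires.
    return trimmed.ljust(2).encode("ascii")


print(to_seed_location_field(""))    # empty ID becomes two padding spaces
print(to_seed_location_field("00"))  # ordinary ID is written unchanged
```

An explicit value, whichever one is chosen, always has a defined SEED field in this sketch; only the absent attribute needs an extra rule.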

      Chad

      On Jul 24, 2014, at 9:29 AM, Yazan Suleiman <yazan.suleiman<at>gmail.com> wrote:

      Is modifying the StationXML schema (to allow null location, required=false) a possibility? Example:
      <Channel startDate="1992-09-23T00:00:00" restrictedStatus="open" endDate="1994-04-01T00:00:00" code="BHE">
      vs
      <Channel locationCode="  " startDate="1992-09-23T00:00:00" restrictedStatus="open" endDate="1994-04-01T00:00:00" code="BHE">
      vs
      <Channel locationCode="--" startDate="1992-09-23T00:00:00" restrictedStatus="open" endDate="1994-04-01T00:00:00" code="BHE">

      It is very reasonable to have a null value for location in any object representation of the station schema. "  " or "" is inaccurate and only introduces more trouble and complexity.


      If changing the schema is not an option then "  " or "" is a very bad idea. Many parsers treat "" or "  " as empty and will ignore them. If translating this into SEED is the issue, then it is the converter's responsibility to take care of the conversion.

      Yazan


      On Wed, Jul 23, 2014 at 10:30 AM, Chad Trabant <chad<at>iris.washington.edu> wrote:

      Hello WS users and developers,

      A recent discussion between FDSN data centers is centered on representation of empty location IDs in StationXML, the default format returned by the fdsnws-station web service. The DMC may be changing how it represents location ID in XML and text formats based on these discussions. We are asking for input as any such change will affect users of our metadata service.

      Some background: In the SEED channel naming scheme there is a hierarchy of network, station, location and channel identifiers. Of these, it is only the location ID that is commonly accepted to be empty. In the SEED format the location ID is a two-character field, where the value is left justified and padded with spaces if needed. When the value is empty the field is simply two spaces of padding.

      Historically, and presumably to avoid having an empty location ID, the DMC has represented “empty” location IDs as a string of two spaces. Following this practice, we express this in StationXML by setting the locationCode attribute to a string of two spaces. We have done this so long we sometimes forget that it is not compliant with a strict reading of SEED, at best it falls into the vagaries of SEED, on the other hand we have been doing it for years with no apparent problems (in fact it has helpfully avoided an empty core identifier).

      There now exists another fdsnws-station implementation that returns StationXML with the locationCode attribute set to an empty string when the SEED value is empty. The justification is that this follows the SEED rules of trimming the padding spaces from the values.

      Unfortunately this means there are now flavors of StationXML that are incompatible in the core channel name identifiers. In other words, two StationXML documents for the same SEED channel appear, without extra field translation, to be different channels.

      As most of you are users of SEED and StationXML metadata (at some level) and some of you have written code to parse these formats and manage the data returned by the DMC and other FDSN data centers, we are asking for your input regarding the potential solutions.

      Here are the options being considered for mapping an empty location ID in SEED to StationXML:

      1) Set locationCode to two spaces. While the DMC and users have been using this for a long while, it is not precisely the SEED value (but the mapping could be formalized). Also, whitespace in attributes does have some theoretical challenges: the wonky rules for XML attributes related to whitespace handling require removal of spaces in some cases (we have never heard of problems though).

      2) Set locationCode to an empty string. This would match the strict value present in SEED, an empty identifier.

      3) Set locationCode to “--” (two dashes). This avoids issues with whitespace in XML attribute values and avoids issues with an empty identifier. Also, this matches the request mechanisms where “--” is accepted as a synonym for an empty location ID.

      All of these solutions are viable in that we can make them work in code, it is a matter of choosing one for future FDSN metadata, pick your poison so to speak.
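
To make the incompatibility and the three options concrete, here is a small sketch that treats the three candidate encodings as synonyms for the empty location ID. The function names, the network code "XX" and the station code "STA" are placeholders invented for the example; only the channel code "BHE" and the three encodings come from this thread.

```python
# The three encodings under discussion for an empty SEED location ID.
EMPTY_SYNONYMS = {"", "  ", "--"}


def canonical_location(code: str, empty_as: str = "") -> str:
    """Map any of the empty-ID encodings onto one chosen representation."""
    return empty_as if code in EMPTY_SYNONYMS else code


def channel_key(net: str, sta: str, loc: str, cha: str, empty_as: str = "") -> tuple:
    """Build a comparable channel identifier with a normalized location ID."""
    return (net, sta, canonical_location(loc, empty_as), cha)


# Raw identifiers from two servers differ for the same SEED channel...
raw_a = ("XX", "STA", "  ", "BHE")
raw_b = ("XX", "STA", "", "BHE")
print(raw_a == raw_b)

# ...but compare equal once the location ID is normalized.
print(channel_key(*raw_a) == channel_key(*raw_b))
```

The `empty_as` parameter stands for whichever option is eventually adopted; passing `"--"` or `"  "` instead of `""` changes the canonical form without changing the comparison logic.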

      In my personal opinion, an empty location ID is an unfortunate quirk of SEED that we should rectify in StationXML. An empty identifier can be confused for “unknown” if the programmer is not careful, which is semantically very different than “set to empty”. The two-space strings that the DMC is currently using are also not ideal, they are hard for humans to read and potentially weird with XML rules. The dashed location ID avoids these issues but requires the most change. I also think requiring all readers of StationXML to translate (e.g. remove padding) is a bad idea, the values in SEED should be uniquely mapped to values in StationXML.

      Thanks for reading this far. Your opinion and input is appreciated.

      regards,
      Chad



  • Hi Chad/Philip,

    thanks for reviving this discussion on the appropriate mailing list.

    Chad Trabant [07/23/14 19:30]:
    Some background: In the SEED channel naming scheme there is a
    hierarchy of network, station, location and channel identifiers. Of
    these, it is only the location ID that is commonly accepted to be
    empty. In the SEED format the location ID is a two-character field,
    where the value is left justified and padded with spaces if needed.
    When the value is empty the field is simply two spaces of padding.

    Historically, and presumably to avoid having an empty location ID,
    the DMC has represented “empty” location IDs as a string of two
    spaces.

    Note that the padding spaces do not form part of the location code
    string itself, according to the SEED specification, which only allows
    alphanumeric characters.

    Actually, the SEED specification treats the location code no
    differently than e.g. a station code, from which trailing spaces are
    removed by every piece of software that I know of.

    BTW, I think the two spaces are not there to avoid having an empty
    location ID, but are a relic from Fortran 77 days. :)

    Following this practice, we express this in StationXML by setting
    the locationCode attribute to a string of two spaces. We have done
    this so long we sometimes forget that it is not compliant with a
    strict reading of SEED, at best it falls into the vagaries of SEED,
    on the other hand we have been doing it for years with no apparent
    problems (in fact it has helpfully avoided an empty core
    identifier).

    On the other hand, even in the IRIS ecosystem the empty location code is
    prominently represented as an empty string. Not everywhere, but e.g. the
    well-known rdseed program removes the trailing spaces when reading SEED,
    resulting in an empty C string if there are two padding spaces in the
    location code field. This is a very natural way of dealing with the
    trailing spaces, especially in view of the clear specifications in the
    SEED manual. Also in the IRIS BUD file name convention (e.g. [1]), empty
    location codes become empty strings, with no apparent problems with
    mapping or otherwise.

    There now exists another fdsnws-station implementation that returns
    StationXML with the locationCode attribute set to an empty string
    when the SEED value is empty. The justification is that this
    follows the SEED rules of trimming the padding spaces from the
    values.

    Unfortunately this means there are now flavors of StationXML that
    are incompatible in the core channel name identifiers. In other
    words, two StationXML documents for the same SEED channel appear,
    without extra field translation, to be different channels.

    This depends on how you evaluate the location code. If you simply follow
    the SEED specification and always trim the location code, like e.g.
    ObsPy and rdseed do, the problem you describe is avoided altogether.

    Of course, the requirement for removing trailing white space doesn't
    come without the cost of a few more CPU cycles. But if that were an
    issue we wouldn't be using XML, would we? Also, this rule would need to
    be written into the future specification of FDSN StationXML.
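
    A minimal sketch of that trimming rule in Python (not part of any
    existing client; the two fragments are hypothetical, stripped of the
    StationXML namespace and of all other elements, and channel_keys is an
    illustrative name). It shows that the two flavors compare as different
    channels until the padding is removed:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments for the same SEED channel, as two servers might emit them.
DOC_SPACES = (
    '<Network code="GE"><Station code="UGM">'
    '<Channel locationCode="  " code="BHZ"/></Station></Network>'
)
DOC_EMPTY = (
    '<Network code="GE"><Station code="UGM">'
    '<Channel locationCode="" code="BHZ"/></Station></Network>'
)


def channel_keys(xml_text, trim=True):
    """Return the set of (net, sta, loc, cha) tuples found in a fragment."""
    keys = set()
    net = ET.fromstring(xml_text)
    for sta in net.iter("Station"):
        for cha in sta.iter("Channel"):
            loc = cha.get("locationCode", "")
            if trim:
                # Drop the padding spaces, as for the other SEED codes.
                loc = loc.rstrip(" ")
            keys.add((net.get("code"), sta.get("code"), loc, cha.get("code")))
    return keys


print(channel_keys(DOC_SPACES) == channel_keys(DOC_EMPTY))
print(channel_keys(DOC_SPACES, trim=False) == channel_keys(DOC_EMPTY, trim=False))
```

    With trimming, both fragments yield the single key
    ("GE", "UGM", "", "BHZ"); without it, the keys differ only in the
    location field.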

    As most of you are users of SEED and StationXML metadata (at some
    level) and some of you have written code to parse these formats and
    manage the data returned by the DMC and other FDSN data centers, we
    are asking for your input regarding the potential solutions.

    Here are the options being considered for mapping an empty location
    ID in SEED to StationXML:

    1) Set locationCode to two spaces. While the DMC and users have
    been using this for a long while, it is not precisely the SEED value
    (but the mapping could be formalized). Also, whitespace in
    attributes does have some theoretical challenges: the wonky rules for
    XML attributes related to whitespace handling require removal of
    spaces in some cases (we have never heard of problems though).

    2) Set locationCode to an empty string. This would match the strict
    value present in SEED, an empty identifier.

    And it would be easy to keep compatible with the two spaces.

    This representation has also been widely used for a long time already,
    including at IRIS (see above).

    3) Set locationCode to “--“ (two dashes). This avoids issues with
    whitespace in XML attribute values and avoids issues with an empty
    identifier. Also, this matches the request mechanisms where “--“ is
    accepted as a synonym for an empty location ID.

    Let's not mix request mechanisms with the data format. Data formats are
    a holy grail whereas request mechanisms change more frequently.

    Suppose we could retrieve full SEED using the web services. Even then it
    would be equally appropriate to use "--" on the request side. But there
    is no justification for breaking data format compatibility just for
    matching particular request mechanisms.

    All of these solutions are viable in that we can make them work in
    code, it is a matter of choosing one for future FDSN metadata, pick
    your poison so to speak.

    In my personal opinion, an empty location ID is an unfortunate quirk
    of SEED that we should rectify in StationXML. An empty identifier
    can be confused for “unknown” if the programmer is not careful,
    which is semantically very different than “set to empty”. The
    two-space strings that the DMC is currently using are also not ideal,
    they are hard for humans to read and potentially weird with XML
    rules. The dashed location ID avoids these issues but requires the
    most change. I also think requiring all readers of StationXML to
    translate (e.g. remove padding) is a bad idea, the values in SEED
    should be uniquely mapped to values in StationXML.

    I share your view that the empty location code is not optimal. However,
    the world is not perfect and the empty location code is a fact we have
    to live with and have been able to live with for decades. Seismologists
    have learned how to handle it. Existing software libraries make the
    empty location code as painless as possible. Technically it is a
    non-issue.

    The solution to the empty location code is not to break a data format
    incompatibly for purely aesthetic reasons, without a technical need.
    Empty strings are represented in XML without problems, particularly if
    used in XML attributes. In fact, it is an advantage of a modern XML
    format that we don't need the padding spaces etc. any more.

    Philip Crotwell [07/23/14 20:37]:
    Years ago we had full SEED. Then because of keeping metadata updated,
    we switched to a separation into dataless SEED + miniseed. Now,
    because of the complexities and limitations of dataless SEED, the
    future looks like StationXML + miniseed. I am all for this change,
    but how the location id is resolved really needs to address not just
    what do we do in StationXML, but what do we do in StationXML +
    miniseed.

    I also lean towards "--" for the simple reason that there are so many
    instances where I have been bitten by spaces or nulls. Even though I
    know about this, I still get caught. File names, URLs, user GUI
    displays, etc. all have problems with spaces or nulls, and as a
    practical matter it is harder to see something that isn't there than
    something that is there. Furthermore, using null or space-space is
    really hard as a command line argument in the shell. That said, "--"
    already means "long option name" in many *nix programs, so if we
    were starting from scratch, underscores like "__" might be a better
    choice. The SEED manual already lists underscore as a separate item
    in the flags section (p32), so maybe worth considering.

    In all of the above cases it is the interfaces that have to deal with
    the empty location code. I agree that an empty string is not always easy
    to visualize, but we know how to deal with it. Nothing prevents us from
    using "--" or "__" in GUIs or external formats or input to the fdsnws's.
    I myself use "__" e.g. in pick lists for ease of visualization,
    awk/grep'ing etc.; but that has nothing to do with the XML or SEED
    representation. The same is true for the request formats; as long as the
    user knows how to explicitly specify an empty location code, it's fine.
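
    As an illustration of keeping the placeholder at the interface only,
    here is a small Python sketch. The function names and the use of the
    short parameter names net/sta/loc/cha are assumptions made for the
    example; the stored location code stays an empty string throughout:

```python
from urllib.parse import urlencode

# Stand-in used only for display and requests, never stored as the code itself.
EMPTY_LOC_PLACEHOLDER = "--"


def display_stream_id(net, sta, loc, cha, placeholder=EMPTY_LOC_PLACEHOLDER):
    """Format a stream ID for humans, making an empty location code visible."""
    return ".".join([net, sta, loc if loc else placeholder, cha])


def request_query(net, sta, loc, cha):
    """Build a query string, substituting the placeholder for an empty location."""
    params = {"net": net, "sta": sta, "loc": loc or EMPTY_LOC_PLACEHOLDER, "cha": cha}
    return urlencode(params)


print(display_stream_id("GE", "UGM", "", "BHZ"))
print(display_stream_id("GE", "UGM", "", "BHZ", placeholder="__"))
print(request_query("GE", "UGM", "", "BHZ"))
```

    The first call prints GE.UGM.--.BHZ and the second GE.UGM.__.BHZ; a
    non-empty location code such as "00" passes through unchanged in both
    functions.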

    But if option 3 is chosen, would there be any possibility of
    amending the SEED spec so that "--" is actually valid within the
    location id field, with the caveat that it is synonymous with
    space-space/null, but "--" is the preferred value?

    This would mean that GE.UGM.--.BHZ and GE.UGM..BHZ are equivalent, in
    fact: identical stream ID's. Technically this is feasible. But are the
    downstream software repercussions, let alone the confusion among the
    data users a price we are willing to pay? I don't think so.

    I realize that doing a global search and replace on a petabyte of
    miniseed data is probably not going to happen, but it would be
    really nice if whatever location id is in StationXML, it is exactly
    2 characters and is the exact same 2 characters as in miniseed.

    On the other hand, the use of XML is a chance to get rid of the fixed
    field values with padding. This may not be relevant today, but it might
    become so in the future.

    Frankly the whole idea of making location ids "optional" was a real
    mistake IMHO. I am sure that anyone who has ever written code to
    deal with location ids has something that looks like:

        if (locid == null or locid == "" or locid == " " or locid == "--")
            then locid = "--"

    which is just a painfully stupid thing to have to do over and
    over and over again. Grumble grumble grumble. :(

    But fortunately you do that only once and wrapping this into a library
    function is a no-brainer.
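
    Such a library function could look roughly like this in Python. The
    name normalize_location and the choice of the empty string as the
    canonical value are assumptions for this sketch; a library could just
    as well settle on another single value, as long as it does so in
    exactly one place:

```python
def normalize_location(locid):
    """Map every known spelling of "no location code" to the empty string."""
    if locid is None:
        return ""
    # Remove padding spaces; an all-space field becomes the empty string.
    trimmed = locid.strip(" ")
    # Treat the "--" request-side synonym as empty as well.
    return "" if trimmed == "--" else trimmed


for raw in (None, "", " ", "  ", "--", "00"):
    print(repr(raw), "->", repr(normalize_location(raw)))
```

    Every variant in the loop except "00" maps to the empty string, so
    callers compare normalized values and never repeat the four-way test.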

    On a side note, I am curious to know (technically) under what
    circumstances locid==null would evaluate to true, considering

    <xs:attribute name="locationCode" type="xs:string" use="required"/>

    from the xsd[2].
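
    For what it's worth, a parser that does not validate against the
    schema will report a missing attribute as null. A small Python sketch
    with the standard library's ElementTree (which does not validate; the
    fragments are hypothetical and omit the namespace) shows the three
    cases a reader might see:

```python
import xml.etree.ElementTree as ET


def location_code(channel_xml):
    """Return the raw locationCode attribute, or None if it is absent."""
    return ET.fromstring(channel_xml).get("locationCode")


absent = location_code('<Channel code="BHZ"/>')                    # None
empty = location_code('<Channel locationCode="" code="BHZ"/>')     # ''
padded = location_code('<Channel locationCode="  " code="BHZ"/>')  # '  '
print(repr(absent), repr(empty), repr(padded))
```

    Only the first fragment would be rejected by a validating parser under
    the use="required" declaration; the other two are both valid xs:string
    values.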

    Lastly, as far as I can tell the SEED spec doesn't disallow
    null/empty station or channel codes, so addressing that at the same
    time might be wise.

    I haven't come across any of those but there it makes sense. Yet I don't
    think we can or should prevent empty location codes. They are a very
    common reality.

    My $0.02, please pick one string, and only one string, and use it
    everywhere.

    If "only one string" is a requirement, it is probably the strongest
    argument against a change.

    "Only one string" will only work without deviation from the current use
    of SEED location code. We can't recode the archives, let alone the local
    archives users have built for their work over the years. Well,
    technically it could be done, but I think we all agree that we don't
    want to, as this would have to involve not only (Mini)SEED waveform data
    but also meta data and parametric data. How about... QuakeML archives?
    Datalogger firmware? We can't change all of that and if we add e.g. "--"
    to the range of *possible* location codes, we still have to continue to
    "forever" support the other representations in order to be backward
    compatible.

    Generally speaking, it is good to discuss future possibilities for
    channel naming conventions, not only with respect to the location code.
    But the naming should ideally be independent of the used data formats.
    XML is a big step towards becoming less dependent on the limits imposed
    by SEED, but we are not going to get rid of SEED for many years to come.

    Actually, we are currently seeking to solve a particular incompatibility
    between FDSN StationXML documents produced by different services.
    Technically that is much, *much* easier to achieve than the
    introduction of a new and incompatible channel naming. I would welcome
    an intensified discussion on the latter, but not in the context of the
    current FDSN StationXML or web services.

    It's actually quite strange that already now, early after the
    introduction of FDSN StationXML, we are not only choking over minor
    incompatibilities, but are discussing "solutions" to problems that
    apparently no one had noticed before StationXML... Looks
    like shooting at sparrows with cannons, IMO.

    There used to be an IASPEI working group on station codes that even came
    up with a new channel naming "standard"[3], which, however, doesn't seem
    to have gained much acceptance so far. Nevertheless this is the level at
    which changes to channel naming need to be discussed, even though the
    process may be frustratingly slow. But the impact of such a change is
    just too big to be decided ad hoc.


    To summarize:

    We will not find a future-proof channel naming convention quickly.
    Partial changes, especially if incompatible, should be absolutely avoided.

    The particular problem we attempted (and still need) to solve in the
    first place is a location code incompatibility caused by differing
    degrees of adherence to the SEED specification. Not surprisingly I
    prefer the empty-string representation for the empty location code. To
    be pragmatic, I propose the following timeline:

    * Accept that, at least for a transitional period, space-space and
    empty location codes will both exist.

    * During a transitional period, don't change the servers that now
    produce space-space location codes, as that would break compatibility
    with some clients. We want to keep compatibility rather than introducing
    new incompatibility.

    * Instead update the clients to accept both space-space and empty
    location codes by trimming trailing spaces if present. This is a
    relatively minor change and IIRC this is on IRIS's agenda already, which
    is highly appreciated.

    At this point in time, interoperability is restored, even without
    server-side changes. This is important as it may take quite some time
    for the users to actually upgrade their clients; but it doesn't hurt anyone.

    * Finally, upgrade the servers where needed. The decision as to when to
    upgrade the server side can be made once it is considered appropriate;
    there is absolutely no hurry from the client side.

    The needed changes for the above proposal are very small compared to the
    huge changes that would be required at every level to implement a new
    channel naming convention. This may (and hopefully will) take place some
    time in the future, but it requires a lot of preparation and
    coordination. I am pretty sure that we will have a considerable number
    of beers in the meantime. ;)

    Besides the beers, we should focus on finalizing the specification of
    FDSN StationXML. There are too many under-defined elements even in the
    xsd and the risk of serious incompatibilities is very high.

    Cheers
    Joachim


    [1] http://www.iris.edu/bud_stuff/bud_dir/GE/UGM/UGM.GE..BHZ.2014.205
    [2] http://www.fdsn.org/xml/station/fdsn-station-1.0.xsd
    [3] http://www.isc.ac.uk/registries/download/IR_implementation.pdf

  • I've been following this thread, and thought it was time to chime in.

    IMHO, the FDSN web services should follow the SEED convention.
    The SEED convention states that station, network, channel, and location
    are all blank-padded fields of fixed lengths.
    To me, this means that we should either use the full blank-padded
    fields for ALL of these identifiers, or for none of them.

    eg:

    <Network code="G " >
    <Station code="KIP  ">
    <Channel locationCode="  " code="BHZ">

    or

    <Network code="G" >
    <Station code="KIP">
    <Channel locationCode="" code="BHZ">

    Personally I think the latter (blank trimmed) is better.

    I agree that the blank location code is a pain when dealing with
    Oracle, white-space delimited fields such as command lines, etc,
    but unless we change the SEED convention, I don't see that making
    an alias of "-" or "--" in FDSN station XML improves the situation.

    AFAIK, the ONLY reason that we struggle with the two-blank issue is
    that certain software (eg Oracle) cannot distinguish between the
    empty string (string of length 0) and NULL. Therefore, the DMC,
    NCEDC, AQMS, etc have been forced to not use a blank-trimmed string
    for the location code.

    Unless we propose to change the SEED standard, all of our data in
    our archives, and all of our current acquisition systems, I think
    that we have to live with "empty" location codes.

    I have not seen any compelling argument for representing a blank (empty)
    location code in FDSN station XML as anything but the empty string.

    If you want to have "" and " " be equivalent in FDSN station XML,
    you can simply change the schema definition of the field to be a "token"
    rather than a "string", in which case any representation with blanks will
    be reduced to the empty string. Problem solved?
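
    To make the effect of that schema change concrete, here is a rough
    Python sketch of the whitespace collapsing that XML Schema prescribes
    for xs:token values (tabs, line feeds and carriage returns become
    spaces, runs of spaces are merged, leading and trailing spaces are
    dropped). The function name is made up for the example, and note that
    only schema-aware processing applies this rule; a plain parser still
    hands back the raw attribute value:

```python
import re

# The four XML whitespace characters: space, tab, line feed, carriage return.
_XML_WS_RUN = re.compile(r"[ \t\n\r]+")


def collapse_token(value):
    """Collapse whitespace the way the xs:token datatype does."""
    # Replace each whitespace run with one space, then trim both ends.
    return _XML_WS_RUN.sub(" ", value).strip(" ")


for raw in ("", " ", "  ", "00", "KIP  "):
    print(repr(raw), "->", repr(collapse_token(raw)))
```

    Under this rule "", " " and "  " all end up as the empty string, while
    "00" is untouched and a padded "KIP  " becomes "KIP".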

    I note that the NCEDC implementation currently uses 1 blank " "
    for empty location code. I have no problem changing this if we can
    agree on a convention.

    I also note ironically that the TA network run by IRIS is one of the
    largest networks in terms of stations, and uses blank location codes.

    My 2 cents...

    - Doug N


    --
    ------------------------------------------------------------------------
    Doug Neuhauser University of California, Berkeley
    doug<at>seismo.berkeley.edu Berkeley Seismological Laboratory
    Office: 510-642-0931 215 McCone Hall # 4760
    Fax: 510-643-5811 Berkeley, CA 94720-4760
    Remote: 530-752-5615 (Wed,Fri)



    • SEED is a fixed width record format, most likely the reason for blank
      padded fields. I'd recommend not carrying that over into the XML format.

      The primary purpose of the channel code is to be a unique identifier, and
      an empty string is distinct from any non-empty value.


      Jeremy





      • No argument that the padding spaces should be left behind.

        Assuming we are unwilling to actually address the blank location IDs we should consider making the attribute optional. It has been suggested by a few folks already.

        One could argue that from a purist's point of view a blank location in SEED is an unset location, so the purist mapping of this is to leave it unset in XML. This could be done simply by changing the schema to make the attribute optional. (This is not my favorite idea, and I've argued against it, but at least it cleanly follows SEED common practice.)

        Allowing the value to be optional in SEED (within the limitations of a fixed width format) and required in StationXML is trying to eat your cake and keep it too. We do not force other optional string values of SEED to be required in the XML, so why make an exception for location?

        Chad

        On Aug 12, 2014, at 12:18 PM, Fee, Jeremy <jmfee<at>usgs.gov> wrote:

        SEED is a fixed width record format, most likely the reason for blank padded fields. I'd recommend not carrying that over into the XML format.

        The primary purpose of the channel code is to be a unique identifier, and an empty string is distinct from any non-empty value.

        Jeremy


        On Tue, Aug 12, 2014 at 12:53 PM, Doug Neuhauser <doug<at>seismo.berkeley.edu> wrote:
        I've been following this thread, and thought it was time to chime in.

        IMHO, the FDSN web services should follow the SEED convention.
        The SEED convention states that station, network, channel, and location
        are all blank-padded fields of fixed lengths.
        To me, this means that that we should either use the full blank-padded
        fields for ALL of these identifiers, or for none of them.

        eg:

        <Network code="G " >
        <Station code="KIP ">
        <Channel locationCode=" " code="BHZ">

        or

        <Network code="G" >
        <Station code="KIP">
        <Channel locationCode="" code="BHZ">

        Personally I think the latter (blank trimmed) is better.

        I agree that the blank location code is a pain when dealing with
        Oracle, white-space delimited fields such as command lines, etc,
        but unless we change the SEED convention, I don't see that making
        an aliases of "-" or "--" in FDSN station XML improves the situation.

        AFAIK, the ONLY reason that we struggle with the two-blank issue is
        that certain software (eg Oracle) cannot distinguish between the
        the empty string (string of length 0) and NULL. Therefore, the DMC,
        NCEDC, AQMS, etc have been forced to not use a blank-trimmed string
        for the location code.

        Unless we propose to change the SEED standard, all of our data in
        our archives, and all of our current acquisition systems, I think
        that we have to live with "empty" location codes.

        I have not seen any compelling argument for representing a blank (empty)
        location code in FDSN station XML as anything but the empty string.

        If you want to have "" and " " be equivalent in FDSN station XML,
        you can simply change the schema definition of the field to be a "token"
        rather than a "string", in which case any representation with blanks will
        be reduced to the empty string. Problem solved?
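For readers unfamiliar with the "token" suggestion: the XML Schema `token` type uses whitespace collapsing, so a schema-aware processor would treat any all-blank value as the empty string. The sketch below imitates that rule in plain Python. The function name `collapse_whitespace` is hypothetical, and a parser that does not apply the schema would still hand back the blanks unchanged.

```python
import re


def collapse_whitespace(value: str) -> str:
    """Imitate the whitespace 'collapse' rule used by the XML Schema token type."""
    # Step 1: tabs, line feeds and carriage returns become plain spaces.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    # Step 2: runs of spaces shrink to one space, then both ends are trimmed.
    return re.sub(r" +", " ", replaced).strip(" ")


# The empty string, one blank and two blanks all end up identical.
for candidate in ["", " ", "  ", "00", "--"]:
    print(repr(candidate), "->", repr(collapse_whitespace(candidate)))
```

Note that this makes the blank variants equivalent to each other but does nothing for "--", which would still be a distinct value.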

        I note that the NCEDC implementation currently uses 1 blank " "
        for empty location code. I have no problem changing this if we can
        agree on a convention.

        I also note ironically that the TA network run by IRIS is one of the
        largest networks in terms of stations, and uses blank location codes.

        My 2 cents...

        - Doug N


        On 07/23/2014 10:30 AM, Chad Trabant wrote:

        Hello WS users and developers,

        A recent discussion between FDSN data centers is centered on
        representation of empty location IDs in StationXML, the default
        format returned by the fdsnws-station web service. The DMC may be
        changing how it represents location ID in XML and text formats based
        on these discussions. We are asking for input as any such change will
        affect users of our metadata service.

        Some background: In the SEED channel naming scheme there is a
        hierarchy of network, station, location and channel identifiers. Of
        these, it is only the location ID that is commonly accepted to be
        empty. In the SEED format the location ID is a two-character field,
        where the value is left justified and padded with spaces if needed.
        When the value is empty the field is simply two spaces of padding.

        Historically, and presumably to avoid having an empty location ID,
        the DMC has represented “empty” location IDs as a string of two
        spaces. Following this practice, we express this in StationXML by
        setting the locationCode attribute to a string of two spaces. We have
        done this so long we sometimes forget that it is not compliant with a
        strict reading of SEED, at best it falls into the vagaries of SEED,
        on the other hand we have been doing it for years with no apparent
        problems (in fact it has helpfully avoided an empty core
        identifier).

        There now exists another fdsnws-station implementation that returns
        StationXML with the locationCode attribute set to an empty string
        when the SEED value is empty. The justification is that this follows
        the SEED rules of trimming the padding spaces from the values.

        Unfortunately this means there are now flavors of StationXML that are
        incompatible in the core channel name identifiers. In other words,
        two StationXML documents for the same SEED channel appear, without
        extra field translation, to be different channels.

        As most of you are users of SEED and StationXML metadata (at some
        level) and some of you have written code to parse these formats and
        manage the data returned by the DMC and other FDSN data centers, we
        are asking for your input regarding the potential solutions.

        Here are the options being considered for mapping an empty location
        ID in SEED to StationXML:

        1) Set locationCode to two spaces. While the DMC and users have been
        using this for a long while, it is not precisely the SEED value (but
        the mapping could be formalized). Also, whitespace in attributes does
        have some theoretical challenges: the wonky rules for XML attributes
        related to whitespace handling require removal of spaces in some
        cases (we have never heard of problems though).

        2) Set locationCode to an empty string. This would match the strict
        value present in SEED, an empty identifier.

        3) Set locationCode to “--” (two dashes). This avoids issues with
        whitespace in XML attribute values and avoids issues with an empty
        identifier. Also, this matches the request mechanisms where “--” is
        accepted as a synonym for an empty location ID.

        All of these solutions are viable in that we can make them work in
        code, it is a matter of choosing one for future FDSN metadata, pick
        your poison so to speak.

        In my personal opinion, an empty location ID is an unfortunate quirk
        of SEED that we should rectify in StationXML. An empty identifier can
        be confused for “unknown” if the programmer is not careful, which is
        semantically very different than “set to empty”. The two-space
        strings that the DMC is currently using are also not ideal, they are
        hard for humans to read and potentially weird with XML rules. The
        dashed location ID avoids these issues but requires the most change.
        I also think requiring all readers of StationXML to translate (e.g.
        remove padding) is a bad idea, the values in SEED should be uniquely
        mapped to values in StationXML.

        Thanks for reading this far. Your opinion and input is appreciated.

        regards,
        Chad


        _______________________________________________
        webservices mailing list
        webservices<at>iris.washington.edu
        http://www.iris.washington.edu/mailman/listinfo/webservices


        --
        ------------------------------------------------------------------------
        Doug Neuhauser University of California, Berkeley
        doug<at>seismo.berkeley.edu Berkeley Seismological Laboratory
        Office: 510-642-0931 215 McCone Hall # 4760
        Fax: 510-643-5811 Berkeley, CA 94720-4760
        Remote: 530-752-5615 (Wed,Fri)





        • Hi Chad,

          Chad Trabant wrote on 08/12/2014 11:41 PM:
          No argument that the padding spaces should be left behind.

          Good.

          Assuming we are unwilling to actually address the blank location IDs we
          should consider making the attribute optional. It has been suggested by
          a few folks already.

          The "empty" location code format does need to be addressed, because even
          if an attribute is optional, it may be present and it may represent an
          empty location code.

          Making the location code optional is of course possible. Actually in
          QuakeML it is optional, too. There the reason and semantics are special,
          though, because in parametric data like picks the location code *may* be
          unknown indeed (same for the channel code).

          By contrast, in SEED or StationXML the location code is not unknown.

          One could argue that from a purists' point of view a blank location in
          SEED is an unset location, so the purist mapping of this is to leave it
          unset in XML.

          A location code that is encoded in SEED as two spaces is not unset but
          "empty" and still a string. The purist mapping is therefore the empty
          string.

          That said, we can make the location code optional in XML, but is it
          really any more than a cosmetic trick to hide an "ugly", empty
          location code in the XML?

          Conforming clients need to be prepared to receive an "empty" (not
          unset!) value anyway. In other words, none of

          <Channel locationCode="" ...
          <Channel locationCode="  " ...
          <Channel locationCode="--" ...

          is forbidden by just making locationCode optional. You can leave it
          unset but you don't have to. A parser still has to accept at least ""
          and "  " (in fact a single blank " ", too, as we have just learned). And since
          "explicit is better than implicit" it seems wise not to make the
          location code optional. Though technically it is feasible provided there
          is a clear default value, as otherwise a missing location code might (at
          least in principle) be mistaken as unknown like in QuakeML.
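To make the "empty is not unset" distinction concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. The snippet is a simplified stand-in for StationXML (the real namespace and most elements are omitted), and the helper `read_location` is hypothetical. It assumes a reader that trims blanks itself rather than relying on the schema.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a StationXML fragment; the real namespace is omitted.
SNIPPET = """
<Station code="KIP">
  <Channel code="BHZ" locationCode=""/>
  <Channel code="BHN" locationCode="  "/>
  <Channel code="BHE" locationCode="00"/>
  <Channel code="LHZ"/>
</Station>
"""


def read_location(channel):
    """Return None if the attribute is absent, else the blank-trimmed code."""
    raw = channel.get("locationCode")  # None when the attribute is missing
    if raw is None:
        return None
    return raw.strip(" ")


station = ET.fromstring(SNIPPET)
locations = {ch.get("code"): read_location(ch) for ch in station.iter("Channel")}
print(locations)
```

The absent attribute comes back as `None` while the empty and blank-padded attributes both come back as `""`, so a reader that wants to treat a missing attribute as an empty location code has to decide that explicitly.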

          This would be simply done by changing the schema to make
          the attribute optional. (this is not my favorite idea, I've argued
          against it, but at least it cleanly follows SEED common practice).

          I would prefer to make optional only those fields, for which information
          may indeed be unknown, like a digitizer serial number.

          Allowing the value to be optional in SEED (within the limitations of a
          fixed width format) and required in StationXML is trying to eat your
          cake and keep it too. We do not force other optional string values of
          SEED to be required in the XML, so why make an exception for location?

          In SEED, the location code is always present even if it is an empty
          string. It's not optional.

          But even "optional" still implies the option to explicitly specify an
          "empty" location code. And the question of how to properly represent
          this in XML is still an open one (even though opinions seem to converge
          towards the empty string).

          Yazan Suleiman wrote on 08/13/2014 06:17 AM:
          In my opinion StationXml shouldn’t force me to provide a value for
          an attribute that is unknown to me.

          But empty is not the same as unknown. The "empty" location code in SEED
          does carry specific information that needs to be represented in XML,
          whether we like the value or not.

          While the purpose of StationXml schema is to map between SEED and
          XML, limitations of Seed shouldn’t be carried over.

          +1

          Cheers
          Joachim


    • Hi Doug,

      Thanks for your 2 cents.

      Regarding only certain software being the problem with blank location, I guess you did not like any of the others pointed out here?
      http://www.iris.washington.edu/pipermail/webservices/2014-July/000583.html

      If you want a non-Oracle database example, the 'ltree' data type in Postgres is a natural fit for N.S.L.C hierarchical data and it cannot take a blank identifier either. I do not see how the number of pain points with empty identifiers will not grow over time.

      The proposal for a "--" location ID was to change SEED by starting with StationXML as a transition. The first step could be done without changing all the miniSEED in all the archives; the next step could be done with a future revision of miniSEED. This would require mapping, which we are already doing for requests and will continue to do indefinitely. For sure this would be a non-trivial change over time; the question is whether it is worth it or not.
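The mapping mentioned here can be sketched in a few lines of Python. This is only an illustration of the idea, assuming "--" is the single alias for an empty location ID as in the request interface. The names `EMPTY_ALIAS`, `to_alias` and `from_alias` are invented for this example.

```python
# Assumed alias for an empty location ID, as accepted in service requests.
EMPTY_ALIAS = "--"


def to_alias(location: str) -> str:
    """Map a SEED location (possibly blank-padded or empty) to its dashed form."""
    trimmed = location.strip(" ")
    return trimmed if trimmed else EMPTY_ALIAS


def from_alias(location: str) -> str:
    """Map the dashed form back to the blank-trimmed SEED value."""
    return "" if location == EMPTY_ALIAS else location.strip(" ")


for seed_value in ["", "  ", "00", "10"]:
    print(repr(seed_value), "->", repr(to_alias(seed_value)))
```

The round trip is unambiguous only as long as "--" is never used as a real location code, which is the part a change to SEED itself would have to guarantee.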

      If we are going to continue to shoot ourselves in the foot with unset location IDs let's do so with clear eyes, the problems are not limited to esoteric software or use cases. Also, a blank string is not the only choice, more on that next.

      Chad

      PS. The TA started 10 years ago and followed the common conventions of the time; that network now has many non-blank IDs. The GSN has converted and has few to no blank IDs left and, ironically?, the BK network appears to use many non-blank location IDs too. Not sure how important it is, but it does show the trend towards increased use of non-blank location IDs.






      • On 08/12/2014 02:31 PM, Chad Trabant wrote:

        Hi Doug,

        Thanks for your 2 cents.

        Regarding only certain software being the problem with blank
        location, I guess you did not like any of the others pointed out
        here?
        http://www.iris.washington.edu/pipermail/webservices/2014-July/000583.html

        Most of these arguments are not related directly to stationxml, but to the
        empty location code. However, those that are related to empty location code
        appear to be the inability to distinguish between an attribute that is not
        supplied vs an empty string attribute. If you make the LocationCode optional,
        it seems like you are in the same boat. If it is not specified, what do
        you use for location code? blank-blank? Then that is the same logic you
        use if your query does not return a location code.

        Your example of:
        %{net}{sta}{loc}{chan} = "some lvalue"
        is not a good one, because of no separation between components.
        How do you distinguish between
        net = G, sta = ABCD
        and net = GA, sta = BCD?

        If you want a non-Oracle database example, the 'ltree' data type in
        Postgres is a natural fit for N.S.L.C hierarchical data and it cannot
        take a blank identifier either. I do not see how the number of pain
        points with empty identifiers will not grow over time.

        The proposal for a "--" location ID was to change SEED by starting
        with StationXML as a transition. The first step could be done without
        changing all the miniSEED in all the archives; the next step could be
        done with a future revision of miniSEED. This would require mapping,
        which we are already doing for requests and will continue to do
        indefinitely. For sure this would be a non-trivial change over time;
        the question is whether it is worth it or not.

        I don't see anything in your original proposal about changing SEED.
        I only see a proposal to change the SEED representation in StationXML.

        If we are going to continue to shoot ourselves in the foot with unset
        location IDs let's do so with clear eyes, the problems are not
        limited to esoteric software or use cases. Also, a blank string is
        not the only choice, more on that next.

        I agree with the above statement. However, trying to address the issue
        just within StationXML I think is just another bandaid, and I don't see
        why the StationXML needs this bandaid.

        Since StationXML does not appear to need this bandaid, I don't understand
        the need.

        IF YOU WANT TO CHANGE SEED, THEN PROPOSE TO CHANGE SEED.

        - Doug N

        Chad

        PS. The TA started 10 years ago and followed the common conventions of the time; that network now has many non-blank IDs. The GSN has converted and has few to no blank IDs left and, ironically?, the BK network appears to use many non-blank location IDs too. Not sure how important it is, but it does show the trend towards increased use of non-blank location IDs.





        • On Aug 12, 2014, at 4:12 PM, Doug Neuhauser <doug<at>seismo.berkeley.edu> wrote:

          On 08/12/2014 02:31 PM, Chad Trabant wrote:

          Hi Doug,

          Thanks for your 2 cents.

          Regarding only certain software being the problem with blank
          location, I guess you did not like any of the others pointed out
          here?
          http://www.iris.washington.edu/pipermail/webservices/2014-July/000583.html

          Most of these arguments are not related directly to stationxml, but to the
          empty location code. However, those that are related to empty location code
          appear to be the inability to distinguish between an attribute that is not
          supplied vs an empty string attribute. If you make the LocationCode optional,
          it seems like you are in the same boat. If it is not specified, what do
          you use for location code? blank-blank? Then that is the same logic you
          use if your query does not return a location code.

          I completely agree, mostly the same boat. The points were: a) empty IDs have challenges that are not limited to any esoteric software and b) if a value is unset, why represent it with anything at all? Playing devil's advocate: why is location special?

          Your example of:
          %{net}{sta}{loc}{chan} = "some lvalue"
          is not a good one, because of no separation between components.
          How do you distinguish between
          net = G, sta = ABCD
          and net = GA, sta = BCD?

          Those are completely distinct values in a nested hash (they are not a concatenated string). {G}{ABCD} is a different path than {GA}{BCD}.
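
To illustrate the nested-hash point, here is a minimal sketch using a Python dictionary in place of the Perl-style hash in the example. The `nested` helper and the stored values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def nested():
    """Return a dict that creates deeper levels on first access."""
    return defaultdict(nested)


channels = nested()

# Each identifier is its own key level: net -> sta -> loc -> chan.
channels["G"]["ABCD"][""]["BHZ"] = "some value"
channels["GA"]["BCD"][""]["BHZ"] = "another value"

# The two entries sit under different top-level keys, so they never collide.
assert channels["G"]["ABCD"][""]["BHZ"] != channels["GA"]["BCD"][""]["BHZ"]

# Naive concatenation without a separator, by contrast, is ambiguous.
assert "G" + "ABCD" == "GA" + "BCD"
```

Note that the empty location ID is used here as an ordinary (empty-string) key, which works in Python but is exactly the kind of key some other stores reject.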

          If you want a non-Oracle database example, the 'ltree' data type in
          Postgres is a natural fit for N.S.L.C hierarchical data and it cannot
          take a blank identifier either. I do not see how the number of pain
          points with empty identifiers will not grow over time.
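
A rough sketch of why a dotted label path struggles with an empty identifier. The label rule below (one or more of A-Za-z0-9_) is an assumed approximation of the ltree label syntax, not the extension's actual parser, and `to_path` is a hypothetical helper.

```python
import re

# Assumed approximation of a path label: one or more of A-Za-z0-9_.
LABEL_RE = re.compile(r"[A-Za-z0-9_]+")


def to_path(net, sta, loc, chan):
    """Join N.S.L.C identifiers into a dotted path, rejecting invalid labels."""
    labels = [net, sta, loc, chan]
    for label in labels:
        if not LABEL_RE.fullmatch(label):
            raise ValueError(f"invalid path label: {label!r}")
    return ".".join(labels)


print(to_path("IU", "ANMO", "00", "BHZ"))

try:
    to_path("G", "KIP", "", "BHZ")
except ValueError as err:
    print("rejected:", err)
```

Under this assumed rule both the empty string and the space-padded form are rejected, so the location level would need some non-empty stand-in before it could be stored this way.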

          The proposal for a "--" location ID was to change SEED by starting
          with StationXML as a transition. The first step could be done without
          changing all the miniSEED in all the archives, the next step could be
          done with a future revision in miniSEED. This would require mapping,
          which we are already doing for requests and will continue to do
          indefinitely. For sure this would be a non-trivial change over time,
          the question is whether it is worth it or not.

          I don't see anything in your original proposal about changing SEED.
          I only see a proposal to change the SEED representation in StationXML.

          Indeed the 2nd step was not explicitly described, it would be another proposal.

          If we are going to continue to shoot ourselves in the foot with unset
          location IDs let's do so with clear eyes, the problems are not
          limited to esoteric software or use cases. Also, a blank string is
          not the only choice, more on that next.

          I agree with the above statement. However, trying to address the issue
          just within StationXML I think is just another bandaid, and I don't see
          why the StationXML needs this bandaid.

          Well, the "bandaid" would be a first step away from empty location IDs. You agree they are problematic but the solution is not radical enough? You prefer all the changes proposed at once, fair enough.

          Since StationXML does not appear to need this bandaid, I don't understand
          the need.

          IF YOU WANT TO CHANGE SEED, THEN PROPOSE TO CHANGE SEED.

          It depends on what you mean by SEED. StationXML IS SEED for most intents and purposes. Changing all aspects of SEED at once is a much larger can of worms, and this would be an opportune time to change just the StationXML representation of SEED. Over the next couple of months and years as folks convert from dataless SEED to StationXML, an opportunity exists to make such low level changes, which will get much harder once the adoption is farther along.

          Chad

          - Doug N

          Chad

          PS. The TA started 10 years ago and followed common conventions at the time; that network now has many non-blank IDs. The GSN has converted to few or no blank IDs and, ironically?, the BK network appears to use many non-blank location IDs too. Not sure how important it is, but it does show the trend towards increased use of non-blank location IDs.

          On Aug 12, 2014, at 10:53 AM, Doug Neuhauser <doug<at>seismo.berkeley.edu> wrote:

          I've been following this thread, and thought it was time to chime in.

          IMHO, the FDSN web services should follow the SEED convention.
          The SEED convention states that station, network, channel, and location
          are all blank-padded fields of fixed lengths.
          To me, this means that we should either use the full blank-padded
          fields for ALL of these identifiers, or for none of them.

          eg:

          <Network code="G " >
          <Station code="KIP  ">
          <Channel locationCode="  " code="BHZ">

          or

          <Network code="G" >
          <Station code="KIP">
          <Channel locationCode="" code="BHZ">

          Personally I think the latter (blank trimmed) is better.

          I agree that the blank location code is a pain when dealing with
          Oracle, white-space delimited fields such as command lines, etc,
          but unless we change the SEED convention, I don't see that making
          an alias of "-" or "--" in FDSN station XML improves the situation.

          AFAIK, the ONLY reason that we struggle with the two-blank issue is
          that certain software (eg Oracle) cannot distinguish between the
          empty string (string of length 0) and NULL. Therefore, the DMC,
          NCEDC, AQMS, etc have been forced to not use a blank-trimmed string
          for the location code.

          Unless we propose to change the SEED standard, all of our data in
          our archives, and all of our current acquisition systems, I think
          that we have to live with "empty" location codes.

          I have not seen any compelling argument for representing a blank (empty)
          location code in FDSN station XML as anything but the empty string.

          If you want to have "" and " " be equivalent in FDSN station XML,
          you can simply change the schema definition of the field to be a "token"
          rather than a "string", in which case any representation with blanks will
          be reduced to the empty string. Problem solved?
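
For readers unfamiliar with the "token" suggestion: XML Schema's xs:token type applies whitespace collapsing to the value during schema validation. A minimal sketch of that collapsing rule follows; `collapse_whitespace` is a hypothetical helper, and a non-validating parser would not apply this step on its own.

```python
import re


def collapse_whitespace(value: str) -> str:
    """Mimic the XML Schema whiteSpace="collapse" rule used by xs:token."""
    # Step 1 (replace): tab, line feed and carriage return become spaces.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    # Step 2 (collapse): runs of spaces shrink to one and the ends are trimmed.
    return re.sub(r" +", " ", replaced).strip(" ")


# Two spaces, one space and the empty string all collapse to the same value.
for raw in ("  ", " ", ""):
    assert collapse_whitespace(raw) == ""

# Non-blank codes pass through unchanged.
assert collapse_whitespace("00") == "00"
```

This makes the space-padded and empty-string flavors equivalent for schema-aware clients, but it leaves a dashed value as a distinct string.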

          I note that the NCEDC implementation currently uses 1 blank " "
          for empty location code. I have no problem changing this if we can
          agree on a convention.

          I also note ironically that the TA network run by IRIS is one of the
          largest networks in terms of stations, and uses blank location codes.

          My 2 cents...

          - Doug N

          On 07/23/2014 10:30 AM, Chad Trabant wrote:

          Hello WS users and developers,

          A recent discussion between FDSN data centers is centered on
          representation of empty location IDs in StationXML, the default
          format returned by the fdsnws-station web service. The DMC may be
          changing how it represents location ID in XML and text formats based
          on these discussions. We are asking for input as any such change will
          affect users of our metadata service.

          Some background: In the SEED channel naming scheme there is a
          hierarchy of network, station, location and channel identifiers. Of
          these, it is only the location ID that is commonly accepted to be
          empty. In the SEED format the location ID is a two-character field,
          where the value is left justified and padded with spaces if needed.
          When the value is empty the field is simply two spaces of padding.

          Historically, and presumably to avoid having an empty location ID,
          the DMC has represented “empty” location IDs as a string of two
          spaces. Following this practice, we express this in StationXML by
          setting the locationCode attribute to a string of two spaces. We have
          done this so long we sometimes forget that it is not compliant with a
          strict reading of SEED, at best it falls into the vagaries of SEED,
          on the other hand we have been doing it for years with no apparent
          problems (in fact it has helpfully avoided an empty core
          identifier).

          There now exists another fdsnws-station implementation that returns
          StationXML with the locationCode attribute set to an empty string
          when the SEED value is empty. The justification is that this follows
          the SEED rules of trimming the padding spaces from the values.

          Unfortunately this means there are now flavors of StationXML that are
          incompatible in the core channel name identifiers. In other words,
          two StationXML documents for the same SEED channel appear, without
          extra field translation, to be different channels.

          As most of you are users of SEED and StationXML metadata (at some
          level) and some of you have written code to parse these formats and
          manage the data returned by the DMC and other FDSN data centers, we
          are asking for your input regarding the potential solutions.

          Here are the options being considered for mapping an empty location
          ID in SEED to StationXML:

          1) Set locationCode to two spaces. While the DMC and users have been
          using this for a long while, it is not precisely the SEED value (but
          the mapping could be formalized). Also, whitespace in attributes does
          have some theoretical challenges: the wonky rules for XML attributes
          related to whitespace handling require removal of spaces in some
          cases (we have never heard of problems though).

          2) Set locationCode to an empty string. This would match the strict
          value present in SEED, an empty identifier.

          3) Set locationCode to “--” (two dashes). This avoids issues with
          whitespace in XML attribute values and avoids issues with an empty
          identifier. Also, this matches the request mechanisms where “--” is
          accepted as a synonym for an empty location ID.

          All of these solutions are viable in that we can make them work in
          code, it is a matter of choosing one for future FDSN metadata, pick
          your poison so to speak.
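
As a sketch of the "extra field translation" a reader would need while all three flavors are in circulation, the hypothetical helper below maps each option to one canonical value. The function name, the alias set, and the choice of canonical form are assumptions made for illustration, not an agreed convention.

```python
# Values that, after trimming padding, are taken to mean "empty location ID".
EMPTY_ALIASES = {"", "--"}


def normalize_location(code, canonical=""):
    """Map the three StationXML flavors of an empty location ID to one value."""
    if code is None:
        # Attribute absent: left as "unknown" rather than guessed.
        return None
    trimmed = code.strip(" ")
    return canonical if trimmed in EMPTY_ALIASES else trimmed


# Options 1, 2 and 3 all end up as the same identifier.
assert normalize_location("  ") == normalize_location("") == normalize_location("--")

# A reader that prefers dashes can ask for that canonical form instead.
assert normalize_location("  ", canonical="--") == "--"
```

Every consumer would need to apply the same translation for two documents describing the same SEED channel to compare equal, which is the cost the options above are trying to avoid.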

          In my personal opinion, an empty location ID is an unfortunate quirk
          of SEED that we should rectify in StationXML. An empty identifier can
          be confused for “unknown” if the programmer is not careful, which is
          semantically very different than “set to empty”. The two-space
          strings that the DMC is currently using are also not ideal, they are
          hard for humans to read and potentially weird with XML rules. The
          dashed location ID avoids these issues but requires the most change.
          I also think requiring all readers of StationXML to translate (e.g.
          remove padding) is a bad idea, the values in SEED should be uniquely
          mapped to values in StationXML.

          Thanks for reading this far. Your opinion and input is appreciated.

          regards,
          Chad



        • On Tue, Aug 12, 2014 at 7:12 PM, Doug Neuhauser
          <doug<at>seismo.berkeley.edu> wrote:




          IF YOU WANT TO CHANGE SEED, THEN PROPOSE TO CHANGE SEED.

          - Doug N



          OK, I would like to offer a proposal to change SEED to eliminate
          blank location ids.

          Wait, can I do that?
          :)

          Philip

          • SEED represents unidentified locations as “  ” [2 empty spaces] (this is a
            limitation of SEED), while most (if not all) modern languages represent
            such a thing as NULL. NULL does not equal “” or “  ” or “--”. NULL is NULL
            and should be represented as such. While SEED is limited, XML is not. Why
            should we incorporate SEED limitations into any new XML schema?



            Unidentified attributes (values=unidentified or not provided or null) are
            omitted in XML. When an attribute does not appear in the document, then it
            is NULL (no confusion there).
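
A small sketch of the distinction being drawn here, using Python's standard xml.etree.ElementTree parser. The XML fragment is a simplified, namespace-free stand-in for real StationXML, invented for illustration.

```python
import xml.etree.ElementTree as ET

# Three channels: attribute omitted, attribute empty, attribute two spaces.
doc = ET.fromstring(
    "<Station code='KIP'>"
    "<Channel code='BHZ'/>"
    "<Channel code='BHZ' locationCode=''/>"
    "<Channel code='BHZ' locationCode='  '/>"
    "</Station>"
)

# Element.get() returns None when the attribute is absent from the document.
locs = [ch.get("locationCode") for ch in doc.findall("Channel")]

for loc in locs:
    if loc is None:
        print("location not provided")
    else:
        print(f"location provided: {loc!r}")
```

The parser reports the omitted attribute as `None` and the other two as distinct strings, so the reader's code can tell "not provided" apart from "provided but empty" as long as it checks for `None` explicitly.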



            In my opinion StationXML shouldn’t force me to provide a value for an
            attribute that is unknown to me. While the purpose of the StationXML schema is
            to map between SEED and XML, limitations of SEED shouldn’t be carried over.



            For historical data, software should take care of any conversion needed.
            What you store in your database is outside the scope of StationXML.










            On Tue, Aug 12, 2014 at 5:50 PM, Philip Crotwell <crotwell<at>seis.sc.edu>
            wrote:

            On Tue, Aug 12, 2014 at 7:12 PM, Doug Neuhauser
            <doug<at>seismo.berkeley.edu> wrote:




            IF YOU WANT TO CHANGE SEED, THEN PROPOSE TO CHANGE SEED.

            - Doug N



            OK, I would like to offer a proposal to change SEED to eliminate
            blank location ids.

            Wait, can I do that?
            :)

            Philip

            • Hear hear Yazan!


              Ellen


              On Tue, Aug 12, 2014 at 9:17 PM, Yazan Suleiman <yazan.suleiman<at>gmail.com>
              wrote:

              SEED represents unidentified locations as "  " [2 empty spaces] (this is a
              limitation of SEED), while most (if not all) modern languages represent
              such a thing as NULL. NULL does not equal "" or "  " or "--". NULL is NULL
              and should be represented as such. While SEED is limited, XML is not. Why
              should we incorporate SEED limitations into any new XML schema?



              Unidentified attributes (values=unidentified or not provided or null) are
              omitted in XML. When an attribute does not appear in the document, then it
              is NULL (no confusion there).



              In my opinion StationXML shouldn't force me to provide a value for an
              attribute that is unknown to me. While the purpose of the StationXML schema is
              to map between SEED and XML, limitations of SEED shouldn't be carried over.



              For historical data, software should take care of any conversion needed.
              What you store in your database is outside the scope of StationXML.










              On Tue, Aug 12, 2014 at 5:50 PM, Philip Crotwell <crotwell<at>seis.sc.edu>
              wrote:

              On Tue, Aug 12, 2014 at 7:12 PM, Doug Neuhauser
              <doug<at>seismo.berkeley.edu> wrote:




              IF YOU WANT TO CHANGE SEED, THEN PROPOSE TO CHANGE SEED.

              - Doug N



              OK, I would like to offer a proposal to change SEED to eliminate
              blank location ids.

              Wait, can I do that?
              :)

              Philip

          • On Tue, Aug 12, 2014 at 7:12 PM, Doug Neuhauser
            <doug<at>seismo.berkeley.edu> wrote:

            IF YOU WANT TO CHANGE SEED, THEN PROPOSE TO CHANGE SEED.

            - Doug N


            Hi all,

            In fact changing SEED is the ultimate goal, and while the proposal was to start that process with StationXML, Doug is correct that the core issue is with the SEED rules themselves. With that, it's probably time to move any SEED-changing discussion to the FDSN mailing lists. Thank you to the users who voiced their opinions; you are welcome to continue to chime in with thoughts here if you would like.

            The idea of changing the schema type for locationCode to a token is appealing if that will help make the now 3 flavors of locationCode more compatible for XML-parsing clients. We should discuss this more in the FDSN context; it might buy us more time to discuss and address the lower-level issue. Unfortunately it does not address the text output used by many.

            Chad
