SAGE: Thread: A question of location ID, how to represent empty IDs in XML?

Started: 2014-07-23 17:30:46

Last activity: 2014-08-14 15:59:34

Topics: Web Services

Chad Trabant

A question of location ID, how to represent empty IDs in XML?

2014-07-23 17:30:46

Hello WS users and developers,

A recent discussion among FDSN data centers has centered on the representation of empty location IDs in StationXML, the default format returned by the fdsnws-station web service. The DMC may change how it represents location IDs in XML and text formats based on these discussions. We are asking for input, as any such change will affect users of our metadata service.

Some background: In the SEED channel naming scheme there is a hierarchy of network, station, location and channel identifiers. Of these, it is only the location ID that is commonly accepted to be empty. In the SEED format the location ID is a two-character field, where the value is left justified and padded with spaces if needed. When the value is empty the field is simply two spaces of padding.
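
As a rough illustration of the padding rule described above, here is a minimal Python sketch. The function names pack_location and unpack_location are hypothetical and not part of any SEED library; the sketch assumes only what is stated here, a two-character, left-justified, space-padded field.

```python
def pack_location(loc: str) -> bytes:
    """Left-justify a location ID in a two-character, space-padded field."""
    if len(loc) > 2:
        raise ValueError("a location ID is at most two characters")
    return loc.ljust(2).encode("ascii")


def unpack_location(field: bytes) -> str:
    """Remove the trailing padding to recover the logical value."""
    return field.decode("ascii").rstrip(" ")


# An empty location ID is stored as two spaces and reads back as "".
print(repr(pack_location("")))
print(repr(unpack_location(b"  ")))
```

The round trip shows where the ambiguity comes from: the stored field is two spaces, while the logical value after trimming is the empty string.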

Historically, and presumably to avoid having an empty location ID, the DMC has represented “empty” location IDs as a string of two spaces. Following this practice, we express this in StationXML by setting the locationCode attribute to a string of two spaces. We have done this for so long that we sometimes forget it is not compliant with a strict reading of SEED; at best it falls into the vagaries of SEED. On the other hand, we have been doing it for years with no apparent problems (in fact, it has helpfully avoided an empty core identifier).

There now exists another fdsnws-station implementation that returns StationXML with the locationCode attribute set to an empty string when the SEED value is empty. The justification is that this follows the SEED rules of trimming the padding spaces from the values.

Unfortunately this means there are now flavors of StationXML that are incompatible in the core channel name identifiers. In other words, two StationXML documents for the same SEED channel appear, without extra field translation, to be different channels.
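
To make the mismatch concrete, here is a small Python sketch. The dotted identifier format and the codes XX, TEST and BHZ are placeholders chosen for illustration, not taken from any data center.

```python
def channel_id(net: str, sta: str, loc: str, cha: str) -> str:
    """Join the four identifiers into one dotted key (illustrative format)."""
    return f"{net}.{sta}.{loc}.{cha}"


# The same SEED channel, with the empty location ID written three ways.
two_spaces = channel_id("XX", "TEST", "  ", "BHZ")
empty = channel_id("XX", "TEST", "", "BHZ")
dashes = channel_id("XX", "TEST", "--", "BHZ")

# Without a translation step, a program sees three distinct channels.
distinct = {two_spaces, empty, dashes}
print(len(distinct))
```

Any code that uses such a key in a dictionary, a database index or a file name will treat the variants as unrelated channels unless it translates the location field first.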

As most of you are users of SEED and StationXML metadata (at some level) and some of you have written code to parse these formats and manage the data returned by the DMC and other FDSN data centers, we are asking for your input regarding the potential solutions.

Here are the options being considered for mapping an empty location ID in SEED to StationXML:

1) Set locationCode to two spaces. While the DMC and users have been using this for a long while, it is not precisely the SEED value (but the mapping could be formalized). Also, whitespace in attributes does have some theoretical challenges: the wonky rules for XML attributes related to whitespace handling require removal of spaces in some cases (we have never heard of problems though).

2) Set locationCode to an empty string. This would match the strict value present in SEED, an empty identifier.

3) Set locationCode to “--” (two dashes). This avoids issues with whitespace in XML attribute values and avoids issues with an empty identifier. Also, this matches the request mechanisms where “--” is accepted as a synonym for an empty location ID.

All of these solutions are viable in that we can make them work in code; it is a matter of choosing one for future FDSN metadata. Pick your poison, so to speak.
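
As a quick check of how the three candidate values behave in one toolchain, the following sketch round-trips each through Python's standard-library xml.etree.ElementTree. The helper round_trip is hypothetical and the Channel element is stripped down to two attributes. The sketch covers only this parser with no DTD or schema involved. Under the XML 1.0 attribute-value normalization rules, leading and trailing spaces are trimmed only for attributes declared with a non-CDATA type, so other toolchains may behave differently.

```python
import xml.etree.ElementTree as ET


def round_trip(location_code: str) -> str:
    """Serialize a minimal Channel element and read the attribute back."""
    channel = ET.Element("Channel", {"code": "BHE", "locationCode": location_code})
    serialized = ET.tostring(channel, encoding="unicode")
    return ET.fromstring(serialized).get("locationCode")


# Options 1, 2 and 3 from the list above.
for value in ("  ", "", "--"):
    print(repr(value), "->", repr(round_trip(value)))
```

In this setup all three values come back unchanged, which is consistent with the observation that two-space attributes have not caused reported problems so far.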

In my personal opinion, an empty location ID is an unfortunate quirk of SEED that we should rectify in StationXML. An empty identifier can be confused for “unknown” if the programmer is not careful, which is semantically very different from “set to empty”. The two-space strings that the DMC is currently using are also not ideal: they are hard for humans to read and potentially weird with XML rules. The dashed location ID avoids these issues but requires the most change. I also think requiring all readers of StationXML to translate (e.g. remove padding) is a bad idea; the values in SEED should be uniquely mapped to values in StationXML.

Thanks for reading this far. Your opinions and input are appreciated.

regards,
Chad

Philip Crotwell

Re: A question of location ID, how to represent empty IDs in XML?

2014-07-23 21:37:09

Hi

Years ago we had full SEED. Then because of keeping metadata updated,
we switched to a separation into dataless SEED + miniseed. Now,
because of the complexities and limitations of dataless SEED, the
future looks like StationXML + miniseed. I am all for this change, but
how the location id is resolved really needs to address not just what
do we do in StationXML, but what do we do in StationXML + miniseed.

I also lean towards "--" for the simple reason that there are so many
instances where I have been bitten by spaces or nulls. Even though I
know about this, I still get caught. File names, URLs, user GUI
displays, etc. all have problems with spaces or nulls and as a
practical matter it is harder to see something that isn't there than
something that is there. Furthermore, using null or space-space is
really hard as a command line argument in the shell. That said, "--"
already means "long option name" in many *nix programs, so if we were
starting from scratch, underscores like "__" might be a better choice.
The SEED manual already lists underscore as a separate item in the
flags section (p32), so maybe worth considering.

But if option 3 is chosen, would there be any possibility of amending
the SEED spec so that "--" is actually valid within the location id
field, with the caveat that it is synonymous with space-space/null,
but "--" is the preferred value? I realize that doing a global search
and replace on a petabyte of miniseed data is probably not going to
happen, but it would be really nice if whatever location id is in
StationXML, it is exactly 2 characters and is the exact same 2
characters as in miniseed.

Frankly the whole idea of making location ids "optional" was a real
mistake IMHO. I am sure that anyone that has ever written code to
deal with location ids has something that looks like:
if (locid == null or locid == "" or locid == " " or locid == "--")
then locid = "--"
which is just a painfully stupid thing to have to do over and over and
over again. Grumble grumble grumble. :(
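
A runnable version of that check might look like the following Python sketch. The name normalize_location and the choice of "--" as the canonical value are assumptions made for illustration; nothing in this thread has settled on them.

```python
from typing import Optional

# Assumed canonical spelling of an empty location ID (option 3 in the thread).
EMPTY_LOCATION = "--"


def normalize_location(locid: Optional[str]) -> str:
    """Map every spelling of an empty location ID to a single value."""
    if locid is None:
        return EMPTY_LOCATION
    stripped = locid.strip(" ")  # drop SEED-style space padding
    if stripped == "" or stripped == EMPTY_LOCATION:
        return EMPTY_LOCATION
    return stripped


# None, "", one space, two spaces and "--" all collapse to "--".
for raw in (None, "", " ", "  ", "--", "00"):
    print(repr(raw), "->", repr(normalize_location(raw)))
```

Every reader of the metadata needs some equivalent of this function for as long as more than one spelling is in circulation, which is the cost being complained about here.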

Lastly, as far as I can tell the SEED spec doesn't disallow null/empty
station or channel codes, so addressing that at the same time might be
wise.

My $0.02, please pick one string, and only one string, and use it everywhere.

thanks
Philip

_______________________________________________
webservices mailing list
webservices<at>iris.washington.edu
http://www.iris.washington.edu/mailman/listinfo/webservices
- Chad Trabant
  
  Re: A question of location ID, how to represent empty IDs in XML?
  
  2014-07-24 17:48:52
  
  On Jul 23, 2014, at 11:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
  
  Hi
  
  Years ago we had full SEED. Then because of keeping metadata updated,
  we switched to a separation into dataless SEED + miniseed. Now,
  because of the complexities and limitations of dataless SEED, the
  future looks like StationXML + miniseed. I am all for this change, but
  how the location id is resolved really needs to address not just what
  do we do in StationXML, but what do we do in StationXML + miniseed.
  
  I also lean towards "--" for the simple reason that there are so many
  instances where I have been bitten by spaces or nulls. Even though I
  know about this, I still get caught. File names, URLs, user GUI
  displays, etc. all have problems with spaces or nulls and as a
  practical matter it is harder to see something that isn't there than
  something that is there. Furthermore, using null or space-space is
  really hard as a command line argument in the shell. That said, "--"
  already means "long option name" in many *nix programs, so if we were
  starting from scratch, underscores like "__" might be a better choice.
  The SEED manual already lists underscore as a separate item in the
  flags section (p32), so maybe worth considering.
  
  Hi Philip, thanks for your thoughts.
  
  The underscore character is certainly another option. What I do not like about it is its low readability; in URLs in particular, underscores can become completely lost.
  
  But if option 3 is chosen, would there be any possibility of amending
  the SEED spec so that "--" is actually valid within the location id
  field, with the caveat that it is synonymous with space-space/null,
  but "--" is the preferred value? I realize that doing a global search
  and replace on a petabyte of miniseed data is probably not going to
  happen, but it would be really nice if whatever location id is in
  StationXML, it is exactly 2 characters and is the exact same 2
  characters as in miniseed.
  
  If the FDSN were to go the route of "--" in StationXML, it seems natural to extend the conversation to potential changes in SEED headers and data records. That is just a bigger can of worms and would take more time to address. The idea that the two should be treated synonymously is just what I have in mind, and it would allow us to transition over time.
  
  Frankly the whole idea of making location ids "optional" was a real
  mistake IMHO. I am sure that anyone that has ever written code to
  deal with location ids has something that looks like:
  if (locid == null or locid == "" or locid == " " or locid == "--")
  then locid = "--"
  which is just a painfully stupid thing to have to do over and over and
  over again. Grumble grumble grumble. :(
  
  Lastly, as far as I can tell the SEED spec doesn't disallow null/empty
  station or channel codes, so addressing that at the same time might be
  wise.
  
  Indeed, this should be clarified in the SEED spec.
  
  Chad
  
Yazan Suleiman

Re: A question of location ID, how to represent empty IDs in XML?

2014-07-24 16:29:55

Is modifying the StationXML schema (to allow a null location, required=false) a
possibility? Example:
<Channel startDate="1992-09-23T00:00:00" restrictedStatus="open" endDate="1994-04-01T00:00:00" code="BHE">
vs
<Channel locationCode=" " startDate="1992-09-23T00:00:00" restrictedStatus="open" endDate="1994-04-01T00:00:00" code="BHE">
vs
<Channel locationCode="--" startDate="1992-09-23T00:00:00" restrictedStatus="open" endDate="1994-04-01T00:00:00" code="BHE">

It is very reasonable to have a null value for location in any object
representation of station schema. " " or "" is inaccurate and only
introduces more trouble and complexity.

If changing the schema is not an option then " " or "" is a very bad idea.
Many parsers treat "" or " " as empty and will ignore them. If
translating this into SEED is the issue, then it is the converter's
responsibility to take care of the conversion.
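
For comparison, here is a short sketch of how one common parser, Python's standard-library xml.etree.ElementTree, reports an omitted attribute versus an empty one. The helper location_code and the trimmed-down Channel elements are illustrative only, and other parsers or data-binding layers may well behave differently.

```python
import xml.etree.ElementTree as ET


def location_code(xml_text: str):
    """Return the locationCode attribute of a single Channel element."""
    return ET.fromstring(xml_text).get("locationCode")


# Attribute omitted entirely (the "null location" schema change).
absent = location_code('<Channel code="BHE"/>')
# Attribute present but empty.
empty = location_code('<Channel locationCode="" code="BHE"/>')

print(repr(absent))
print(repr(empty))
```

With this parser the omitted attribute comes back as None and the empty attribute as an empty string, so application code has to decide explicitly whether the two mean the same thing.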

Yazan

- Rob Newman
  
  Re: A question of location ID, how to represent empty IDs in XML?
  
  2014-07-24 16:51:16
  
  Hi WS folks,
  
  For what it's worth, I would also vote for the "--" standard. To quote from the Zen of Python (my language of choice):
  
  "Beautiful is better than ugly.
  Explicit is better than implicit.
  Simple is better than complex.
  Complex is better than complicated.
  Flat is better than nested.
  Sparse is better than dense.
  Readability counts.
  Special cases aren't special enough to break the rules.
  Although practicality beats purity.
  Errors should never pass silently.
  Unless explicitly silenced."
  
  Number 2 is especially relevant here:
  "Explicit is better than implicit."
  
  Quoted whitespace and nulls are painful. Code what you mean, and mean what you code. It's easier for everyone.
  
  Just my $0.02.
  
  - Rob Newman, IRIS DMC
  
  - Joachim Saul
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-25 22:26:52
    
    Hello Rob,
    
    Rob Newman wrote on 24.07.2014 18:51:
    
    For what it's worth, I would also vote for the "--" standard. To quote from the Zen of Python http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html (my language of choice):
    
    "Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced."
    
    I'd add "Compatible is better than incompatible." :)
    
    Number 2 is especially relevant here:
    "Explicit is better than implicit."
    
    My favorite would be:
    "Special cases aren't special enough to break the rules."
    
    Quoted whitespace and nulls are painful. Code what you mean, and mean what you code. It's easier for everyone.
    
    But what if we simply *mean* "empty string"?
    
    The issue is not about beauty, pain or ease. It's about standard
    conformance. We already have a channel naming standard. If a new data
    format cannot accommodate existing channel naming, then the new format
    is flawed. But that's not even the case here...
    
    An XML document that contains
    
    <Channel locationCode="" ...
    
    is not malformed. There's an attribute that *explicitly* contains an
    empty string and a parser has to produce it as such. Not as null, nil or
    none, but as an empty string. Otherwise the parser is broken and needs
    to be fixed, not the data!
    
    Again: It's not about beauty. We all agree that current channel naming
    is not particularly beautiful and has limitations. But our business is
    not to try to solve that issue now and here.
    
    Cheers
    Joachim
    
    Philip Crotwell
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-25 16:35:45
    
    It sounds like you are saying "change is hard, so we shouldn't do it".
    I would argue that change is hard and so if we don't do it now it will
    never happen. StationXML is new enough that there is already a
    disruption, we should seize the chance. If we do not do something now
    about null loc ids, it will be a decade or two before we get another
    chance.
    
    It is time to drive the stake through the heart of null location ids.
    Kill the evil while we have a chance.
    
    Philip
    
    _______________________________________________
    webservices mailing list
    webservices<at>iris.washington.edu
    http://www.iris.washington.edu/mailman/listinfo/webservices
    
    Marcelo Bianchi
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-26 06:38:17
    
    Hi Philip and All,
    
    I totally agree with Joachim; I was planning to answer, but he was much
    faster. What you are proposing is not a solution. StationXML
    supports the empty string nicely, and an empty string is not null. The
    two are different types in Python and in any other language, and the
    difference can be handled cleanly internally.

    Also, the location ID is not just a string; it is a key that links
    miniSEED to metadata. Making an exception at this level just
    because a user interface cannot render it properly and without ambiguity
    does not sound like a proper proposal. I am not in favor of
    creating an exception that will have to be carried along for
    decades to come. Alternative solutions to this issue should be
    sought in the end-user interface.
    
    with my best regards,
    
    Marcelo Bianchi
    --
    
    Chad Trabant
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-27 02:52:55
    
    Hi Marcelo,
    
    Thanks for your thoughts as well. Something that you and Joachim are not addressing is the set of concerns about an empty ID that have been raised by more than one person. The answer that empty strings are technically possible and that it all works in Python/SeisComP is less than satisfying. The observations from Python, ObsPy and SeisComP are a few of many that need to be taken into account.
    
    I agree that there is a long-tail consideration for the "--" location ID solution. If you understand that some folks find an empty ID problematic regardless of whether it is in XML, SEED, text or whatever, then you might see where this proposal comes from. Yes, we would need to treat empty location IDs and "--" as synonyms for a very long time. Empty strings in XML mean you will need to map empty IDs to empty strings, NULL and whatever an XML parser might or might not produce for a long time as well (think beyond Python and SeisComP). Either is possible, but only one of them is a unique mapping.
    
    If the main consideration is the least amount of disruption, then the answer is obvious to me: the FDSN can sanction the two-space string as the XML synonym for the empty SEED location ID, and we adjust the schema to make sure a string of whitespace is preserved. Then SeisComP can change its relatively new StationXML implementation and ALL existing clients will be compatible with all metadata and, most importantly, we would have consistent metadata.
    
    If the empty string ID representation is adopted it would, in effect, mean that the DMC would need to change its metadata service and (more importantly) all users of the DMC's metadata service would need to transition to a new metadata channel naming scheme. This is certainly not out of the question, but it is not something we would do without careful consideration. I do not find the two-space strings all that great, but they are here and are something the DMC and users of the DMC have dealt with. Issues have been identified with empty location IDs by us and our users. If the DMC is going to change, and push the change on all users of the DMC's StationXML, it would be much more compelling to have a solution that addresses the low-level issues.
    
    regards,
    Chad
    
    Anthony Lomax
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-28 16:59:18
    
    Hello all,
    
    Can someone give a concise statement of the original problem being
    discussed? Is it only or primarily a concern about XML?
    
    It seems to me that with modern languages a string that is empty or has
    1-N spaces is the same thing - there is often an implicit or explicit
    trim() function hiding in a processing pipeline. A null string is not
    the same. So an empty or blank string is the same, valid location code,
    and null is an undefined or uninitialized location code.
    
    With regard to the "--" pseudo-value for the location code, is this not
    needed because it is sometimes impossible or difficult to represent an
    empty string, or even a string at all? For example on the command line or in a
    RESTful WS URI? (Or a URI on the command line!) So it may be that the
    use of "--" for intermediate processing and requests could be tolerated
    and made somehow official, while empty or blank-only strings remain official
    for persistent data.
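
    To make the request-side problem concrete, here is a minimal sketch (an editorial illustration, not part of the original message) of building an fdsnws-station query in which a blank location ID is spelled "--". The helper names are hypothetical. The base URL is the one quoted elsewhere in this thread, and the use of a loc parameter with "--" for blank IDs is an assumption based on my reading of the FDSN web service conventions.

```python
from urllib.parse import urlencode

# Base URL as quoted elsewhere in this thread.
BASE_URL = "http://service.iris.edu/fdsnws/station/1/query"


def location_for_request(location_code: str) -> str:
    """Return the request-side spelling of a location code.

    A blank code (empty or spaces only) becomes "--" so that it survives
    shells and URLs; any other code is passed through without padding.
    """
    trimmed = location_code.strip()
    return trimmed if trimmed else "--"


def build_station_query(net: str, sta: str, loc: str, cha: str) -> str:
    """Assemble a channel-level query URL (hypothetical helper)."""
    params = {
        "net": net,
        "sta": sta,
        "loc": location_for_request(loc),
        "cha": cha,
        "level": "channel",
    }
    return BASE_URL + "?" + urlencode(params)


url = build_station_query("GE", "UGM", "", "BHZ")
print(url)
```

    Without the substitution, the blank ID would appear as "loc=" in the URL, which is easy to confuse with "no location constraint given".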
    
    Just my 0.02EUR = $0.0268
    
    Best regards to all,
    
    Anthony
    
    --
    Sent from my iClayTablet
    
    ------------------------------------------------------------------------

    Anthony Lomax
    161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
    tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net web: http://www.alomax.net

    Twitter: @ALomaxNet http://twitter.com/ALomaxNet
    Science & Special Topics: http://www.alomax.net/science
    Software: http://www.alomax.net/software - updates: https://twitter.com/ALomaxNet
    ------------------------------------------------------------------------
    
    Philip Crotwell
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-28 16:37:26
    
    Hi
    
    Being on the cheap side of the Atlantic, I'll save us $0.00068 and
    make a stab at the underlying issue. :)
    
    Here, with lots of stuff cut out, is how a channel is "identified" in
    stationXML via the fdsn station web service at the IRIS DMC,
    http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
    
    <Network code="GE" >
    <Station code="UGM">
    <Channel locationCode="  " code="BHZ">
    
    Another implementation of the same web service (not sure of url) gives
    back this:
    
    <Network code="GE" >
    <Station code="UGM">
    <Channel locationCode="" code="BHZ">
    
    with locationCode="" vs ="  " being the difference under consideration.
    
    There are two basic issues being discussed (and yes, more beer would help! :)
    
    1) Should all valid stationXML documents be required to use the exact
    same string of characters to represent the location id for this
    channel? This would allow a comparison operation to be "simple" in
    that it can compare the attribute values without additional
    processing.
    
    2) If we agree to 1), then what should those exact characters be? The
    current top choices are
    a) empty=""
    b) two spaces="  "
    c) two dashes="--".
    
    1) seems less controversial than 2) in that greater compatibility is
    generally seen as positive.
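
    Until 1) is settled, a client that compares channels from different data centers has to do the "additional processing" itself. A minimal sketch of what that looks like, assuming the three candidate spellings from 2) are the only blank forms in circulation (the function names and the choice of "" as the internal canonical value are illustrative assumptions, not a recommendation):

```python
# Spellings of a blank location ID once surrounding spaces are trimmed.
# Assumption: only the three candidates from the list above occur.
BLANK_AFTER_TRIM = {"", "--"}


def canonical_location(value: str, blank: str = "") -> str:
    """Map every known spelling of a blank location ID to one value."""
    trimmed = value.strip()
    return blank if trimmed in BLANK_AFTER_TRIM else trimmed


def channel_key(net: str, sta: str, loc: str, cha: str) -> tuple:
    """Build a comparable identifier for a channel (hypothetical helper)."""
    return (net, sta, canonical_location(loc), cha)


two_spaces = channel_key("GE", "UGM", "  ", "BHZ")
empty = channel_key("GE", "UGM", "", "BHZ")
dashes = channel_key("GE", "UGM", "--", "BHZ")

# All three spellings now identify the same channel.
print(two_spaces == empty == dashes)

# A real location code is left alone.
print(channel_key("GE", "UGM", "00", "BHZ"))
```

    If every document used the same characters, the mapping step would disappear and the raw attribute values could be compared directly.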
    
    This is primarily a question about the form of the stationXML
    documents, but obviously there are connections to the way requests are
    formed, the relationship to miniseed/seed, the way things are coded in
    software and how much detailed understanding we expect of end users.
    
    Philip
    
    On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:
    
    Hello all,
    
    Can someone give a concise statement of the original problem being
    discussed, it only or primarily a concern about XML?
    
    It seems to me that with modern languages a string that is empty or has 1-N
    spaces is the same thing - there are often implicit or explicit trim()
    function hiding in a processing pipeline. A null string is not the same.
    So an empty or blank string is the same, valid location code, and null is
    undefined or uninitialized location code.
    
    With regards to the "--" pseudo for the location code, is this not needed
    because sometimes it is not possible or difficult to represent an empty
    string or even a string? For example on the command line or in a restful WS
    URI? (Or a URI on the command line!) So it may be that the use of "--" for
    intermediate processing and requests could be tolerated and somehow
    official, while empty or only-blanks strings official and for persistent
    data.
    
    Just my 0.02€ = $0.0268
    
    Best regards to all,
    
    Anthony
    
    On 27/07/2014 04:52, Chad Trabant wrote:
    
    Hi Marcelo,
    
    Thanks for your thoughts as well. Something that you and Joachim are not
    addressing are the concerns about an empty ID that have been brought up by
    more than one person. The answer that empty strings are technically
    possible and it all works in Python/SeisComP is less than satisfying. The
    observations from Python, ObsPy and SeisComP are a few of many that need to
    be taken into account.
    
    I agree that there is a long tail consideration for the "--" location ID
    solution. Understand that some folks find an empty ID to be problematic
    regardless of whether it is XML, SEED, text, whatever, then you might see
    where this proposal comes from. Yes, we would need to treat empty location
    IDs and "--" as synonyms for a very long time. Empty strings in XML mean
    you will need to map empty IDs to empty strings, NULL and whatever an XML
    parser might or might not produce for a long time as well (think beyond
    Python and SeisComP). Either is possible, only one of them is a unique
    mapping.
    
    If the main considerations are for the least amount of disruption the the
    answer is obvious to me: the FDSN can sanction that the two-space string is
    the XML synonym for the empty SEED location ID and we adjust the schema to
    make sure a string of whitespaces is preserved. Then SeisComP can change
    its relatively new StationXML implementation and ALL existing clients will
    be compatible with all metadata and, mostly importantly, we would have
    consistent metadata.
    
    If the empty string ID representation is adopted it would would, in effect,
    mean that the DMC would need to change its metadata service and (more
    importantly) all users of the DMC's metadata service would need to
    transition to a new metadata channel naming scheme. This is certainly not
    out of the question, but it is not something we would do without careful
    consideration. I do not find the two-space strings all that great, but they
    are here and something the DMC and users of the DMC have dealt with. Issues
    have been identified with empty location IDs by us and our users. If DMC is
    going to change, and push the change on all users of the DMC's StationXML,
    it would be much more compelling to have a solution that addresses the low
    level issues.
    
    regards,
    Chad
    
    ----- Original Message -----
    From: "Marcelo Bianchi" <m.tchelo<at>gmail.com>
    To: "IRIS Web Services List" <webservices<at>iris.washington.edu>
    Sent: Friday, July 25, 2014 7:38:17 PM
    Subject: Re: [webservices] A question of location ID, how to represent empty
    IDs in XML?
    
    Hi Philip and All,
    
    I totaly agree with Joachim, was planning to answer but he was much
    faster. What you guys are proposing is not a solution. the station XML
    supports nicely the empty string and it is not null. There is a type
    difference here in Python and in any other language and can be nicely
    handled internally.
    
    Also the location id is not just a string it is a key entry to link
    miniseed to metadata and making an exception at this level just
    because a user interface cannot proper render it without ambiguity
    does not sounds like a proper way proposal. I am not favorable in
    creating an exception that will have to be carried over along the
    decades to come. Alternatives solutions for this issue should be
    searched on the end user interface.
    
    with my best regards,
    
    Marcelo Bianchi
    --
    
    2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell<at>seis.sc.edu>:
    
    It sounds like you are saying "change is hard, so we shouldn't do it".
    I would argue that change is hard and so if we don't do it now it will
    never happen. StationXML is new enough that there is already a
    disruption, we should seize the chance. If we do not do something now
    about null loc ids, it will be a decade or two before we get another
    chance.
    
    It is time to drive the stake through the heart of null location ids.
    Kill the evil while we have a chance.
    
    Philip
    
    On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
    
    Hello Rob,
    
    Rob Newman wrote on 24.07.2014 18:51:
    
    For what it's worth, I would also vote for the "--" standard. To quote
    from the Zen of Python
    http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html
    (my language of choice):
    
    "Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced."
    
    I'd add "Compatible is better than incompatible." :)
    
    Number 2 is especially relevant here:
    "Explicit is better than implicit."
    
    My favorite would be:
    
    "Special cases aren't special enough to break the rules."
    
    Quoted whitespace and nulls are painful. Code what you mean, and mean what
    you code. It's easier for everyone.
    
    But what if we simply *mean* "empty string"?
    
    The issue is not about beauty, pain or ease. It's about standard
    conformance. We already have a channel naming standard. If a new data format
    cannot accommodate existing channel naming, then the new format is flawed.
    But that's not even the case here...
    
    An XML document that contains
    
    <Channel locationCode="" ...
    
    is not malformed. There's an attribute that *explicitly* contains an empty
    string and a parser has to produce it as such. Not as null, nil or none, but
    as an empty string. Otherwise the parser is broken and needs to be fixed,
    not the data!
    
    Again: It's not about beauty. We all agree that current channel naming is
    not particularly beautiful and has limitations. But our business is not to
    try to solve that issue now and here.
    
    Cheers
    Joachim
    
    _______________________________________________
    webservices mailing list
    webservices<at>iris.washington.edu
    http://www.iris.washington.edu/mailman/listinfo/webservices
    
    _______________________________________________
    webservices mailing list
    webservices<at>iris.washington.edu
    http://www.iris.washington.edu/mailman/listinfo/webservices
    
    _______________________________________________
    webservices mailing list
    webservices<at>iris.washington.edu
    http://www.iris.washington.edu/mailman/listinfo/webservices
    _______________________________________________
    webservices mailing list
    webservices<at>iris.washington.edu
    http://www.iris.washington.edu/mailman/listinfo/webservices
    
    --
    Sent from my iClayTablet
    
    ________________________________
    
    Anthony Lomax
    161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
    tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net web:
    http://www.alomax.net
    
    Twitter: @ALomaxNet
    Science & Special Topics: http://www.alomax.net/science
    Software: http://www.alomax.net/software - updates:
    https://twitter.com/ALomaxNet
    ________________________________
    
    
    Philip Crotwell
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-28 16:53:09
    
    One more thing is that this is not something that we can resolve based
    on the XML spec as all three variations are well-formed and can be
    valid XML depending on the schema.
    
    There is another issue in that white space in XML attributes can be
    normalized by the parsers, but this behavior is not standard across
    all parsers, so dealing with attributes that are not limited to
    non-whitespace characters means that you likely have to consider
    empty, one space, two spaces, and even N spaces as all being
    equivalent. Depending on the parser, you may be able to have this
    handled for you, or you may have to code explicitly for the cases.
    
    I think per the xml spec, even these two are considered "the same" as well:
    locationCode="
    "
    locationCode="
    
    "
    as newlines in attributes can be normalized to whitespace on parsing.
    But again, exactly how it is done depends on the parser.
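
One way to check what a given parser actually reports is to feed it the variants and print the raw attribute value. The sketch below uses Python's standard xml.etree.ElementTree (backed by expat) purely as one example parser; the helper name location_code and the sample documents are made up for illustration, and other parsers may behave differently.

```python
import xml.etree.ElementTree as ET


def location_code(xml_text):
    """Return the locationCode attribute exactly as the parser reports it."""
    return ET.fromstring(xml_text).get("locationCode")


# Hypothetical minimal documents; no DTD or schema is attached, so the
# attribute is treated as plain character data (CDATA) by the parser.
samples = {
    "empty": '<Channel locationCode="" code="BHZ"/>',
    "two spaces": '<Channel locationCode="  " code="BHZ"/>',
    "one newline": '<Channel locationCode="\n" code="BHZ"/>',
    "two newlines": '<Channel locationCode="\n\n" code="BHZ"/>',
}

for label, text in samples.items():
    # repr() makes the otherwise invisible whitespace visible.
    print(f"{label:>12}: {location_code(text)!r}")
```

With this particular parser each literal newline comes back as one space, so the one-newline and two-newline variants stay distinguishable from each other and from the empty string unless the caller trims them.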
    
    Philip
    
    PS I am NOT advocating we choose newline-newline as the default
    location id!!! :)
    
    On Mon, Jul 28, 2014 at 9:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
    
    Hi
    
    Being on the cheap side of the Atlantic, I'll save us $0.00068 and
    make a stab at the underlying issue. :)
    
    Here, with lots of stuff cut out, is how a channel is "identified" in
    stationXML via the fdsn station web service at the IRIS DMC,
    http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
    
    <Network code="GE" >
    <Station code="UGM">
    <Channel locationCode="  " code="BHZ">
    
    Another implementation of the same web service (not sure of url) gives
    back this:
    
    <Network code="GE" >
    <Station code="UGM">
    <Channel locationCode="" code="BHZ">
    
    with locationCode="" vs ="  " being the difference under consideration.
    
    There are two basic issues being discussed (and yes, more beer would help! :)
    
    1) Should all valid stationXML documents be required to use the exact
    same string of characters to represent the location id for this
    channel? This would allow a comparison operation to be "simple" in
    that it can compare the attribute values without additional
    processing.
    
    2) If we agree to 1), then what should those exact characters be? The
    current top choices are
    a) empty=""
    b) two spaces="  "
    c) two dashes="--".
    
    1) seems less controversial than 2) in that greater compatibility is
    generally seen as positive.
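
To make issue 1) concrete, here is a small sketch of what a "simple" comparison does today versus a comparison with extra processing. ChannelId and normalized are hypothetical names, and the rule "whitespace-only means empty" is an assumption for illustration, not an agreed FDSN convention.

```python
from typing import NamedTuple


class ChannelId(NamedTuple):
    network: str
    station: str
    location: str
    channel: str


def normalized(cid):
    # Assumed rule: any whitespace-only location code means "empty".
    return cid._replace(location=cid.location.strip())


# The same SEED channel as written by the two implementations.
dmc = ChannelId("GE", "UGM", "  ", "BHZ")
other = ChannelId("GE", "UGM", "", "BHZ")

print(dmc == other)                          # plain comparison of the raw values
print(normalized(dmc) == normalized(other))  # comparison after extra processing
```

If every document used the same characters, the first comparison would be enough and the helper would not be needed.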
    
    This is primarily a question about the form of the stationXML
    documents, but obviously there are connections to the way requests are
    formed, the relationship to miniseed/seed, the way things are coded in
    software and how much detailed understanding we expect of end users.
    
    Philip
    
    On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:
    
    Hello all,
    
    Can someone give a concise statement of the original problem being
    discussed? Is it only or primarily a concern about XML?
    
    It seems to me that with modern languages a string that is empty or has 1-N
    spaces is the same thing - there is often an implicit or explicit trim()
    function hiding in a processing pipeline. A null string is not the same.
    So an empty or blank string is the same, valid location code, and null is
    an undefined or uninitialized location code.
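
A minimal sketch of that distinction, using Python's standard xml.etree.ElementTree as an example parser: an absent attribute comes back as None, while an empty or blank attribute comes back as a string. The function name describe_location and its return labels are invented for this illustration.

```python
import xml.etree.ElementTree as ET


def describe_location(channel_xml):
    """Classify the locationCode of a single <Channel> element."""
    # Element.get() returns None when the attribute is not present at all.
    value = ET.fromstring(channel_xml).get("locationCode")
    if value is None:
        return "undefined"   # no location code was given
    if value.strip() == "":
        return "empty"       # given, but empty or only blanks
    return value             # an ordinary code such as "00"


print(describe_location('<Channel code="BHZ"/>'))
print(describe_location('<Channel locationCode="" code="BHZ"/>'))
print(describe_location('<Channel locationCode="  " code="BHZ"/>'))
print(describe_location('<Channel locationCode="00" code="BHZ"/>'))
```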
    
    With regards to the "--" pseudo-value for the location code, is this not needed
    because it is sometimes impossible or difficult to represent an empty
    string or even a string? For example on the command line or in a RESTful WS
    URI? (Or a URI on the command line!) So it may be that the use of "--" for
    intermediate processing and requests could be tolerated and somehow
    made official, while empty or only-blanks strings remain official for persistent
    data.
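
As a sketch of "--" used only at the request boundary: the helper below builds a query URL and substitutes "--" when the stored location code is empty or blank. The base URL is the one quoted elsewhere in this thread; the function name station_query_url and the use of the short parameter name loc are assumptions for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://service.iris.edu/fdsnws/station/1/query"


def station_query_url(net, sta, loc, cha):
    """Build a channel-level query URL; an empty or blank loc is sent as '--'."""
    # Assumption: "--" is the request-side stand-in for an empty location ID.
    loc_param = loc.strip() or "--"
    params = {
        "net": net,
        "sta": sta,
        "loc": loc_param,
        "cha": cha,
        "level": "channel",
        "format": "xml",
    }
    return BASE_URL + "?" + urlencode(params)


print(station_query_url("GE", "UGM", "", "BHZ"))
print(station_query_url("GE", "UGM", "00", "BHZ"))
```

The stored value stays empty; only the URL carries the placeholder.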
    
    Just my 0.02€ = $0.0268
    
    Best regards to all,
    
    Anthony
    
    On 27/07/2014 04:52, Chad Trabant wrote:
    
    Hi Marcelo,
    
    Thanks for your thoughts as well. Something that you and Joachim are not
    addressing are the concerns about an empty ID that have been brought up by
    more than one person. The answer that empty strings are technically
    possible and it all works in Python/SeisComP is less than satisfying. The
    observations from Python, ObsPy and SeisComP are a few of many that need to
    be taken into account.
    
    I agree that there is a long tail consideration for the "--" location ID
    solution. If you understand that some folks find an empty ID to be problematic
    regardless of whether it is XML, SEED, text, whatever, then you might see
    where this proposal comes from. Yes, we would need to treat empty location
    IDs and "--" as synonyms for a very long time. Empty strings in XML mean
    you will need to map empty IDs to empty strings, NULL and whatever an XML
    parser might or might not produce for a long time as well (think beyond
    Python and SeisComP). Either is possible, but only one of them is a unique
    mapping.
    
    If the main consideration is the least amount of disruption, then the
    answer is obvious to me: the FDSN can sanction that the two-space string is
    the XML synonym for the empty SEED location ID and we adjust the schema to
    make sure a string of whitespaces is preserved. Then SeisComP can change
    its relatively new StationXML implementation and ALL existing clients will
    be compatible with all metadata and, most importantly, we would have
    consistent metadata.
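
For reference, the two XML candidates differ only in whether the SEED padding is trimmed. A minimal sketch, assuming the two-character, left-justified, space-padded SEED field described at the start of this thread; the function names are hypothetical.

```python
SEED_LOC_WIDTH = 2  # the SEED location ID is a two-character field


def from_seed_field(field):
    """Trim the padding spaces, giving the empty-string style of value."""
    return field.rstrip(" ")


def to_seed_field(code):
    """Left-justify and pad with spaces, giving the fixed-width SEED style."""
    if len(code) > SEED_LOC_WIDTH:
        raise ValueError(f"location code too long: {code!r}")
    return code.ljust(SEED_LOC_WIDTH)


print(repr(from_seed_field("  ")))  # trimmed form of an empty location ID
print(repr(to_seed_field("")))      # padded form of an empty location ID
print(repr(to_seed_field("00")))
```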
    
    If the empty string ID representation is adopted it would, in effect,
    mean that the DMC would need to change its metadata service and (more
    importantly) all users of the DMC's metadata service would need to
    transition to a new metadata channel naming scheme. This is certainly not
    out of the question, but it is not something we would do without careful
    consideration. I do not find the two-space strings all that great, but they
    are here and something the DMC and users of the DMC have dealt with. Issues
    have been identified with empty location IDs by us and our users. If DMC is
    going to change, and push the change on all users of the DMC's StationXML,
    it would be much more compelling to have a solution that addresses the low
    level issues.
    
    regards,
    Chad
    
    ----- Original Message -----
    From: "Marcelo Bianchi" <m.tchelo<at>gmail.com>
    To: "IRIS Web Services List" <webservices<at>iris.washington.edu>
    Sent: Friday, July 25, 2014 7:38:17 PM
    Subject: Re: [webservices] A question of location ID, how to represent empty
    IDs in XML?
    
    Hi Philip and All,
    
    I totally agree with Joachim; I was planning to answer but he was much
    faster. What you guys are proposing is not a solution. StationXML
    supports the empty string nicely, and it is not null. There is a type
    difference here in Python and in any other language, and it can be nicely
    handled internally.

    Also, the location id is not just a string: it is a key entry that links
    miniseed to metadata, and making an exception at this level just
    because a user interface cannot properly render it without ambiguity
    does not sound like a proper proposal. I am not in favor of
    creating an exception that will have to be carried along for
    decades to come. Alternative solutions for this issue should be
    sought in the end user interface.
    
    with my best regards,
    
    Marcelo Bianchi
    --
    
    2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell<at>seis.sc.edu>:
    
    It sounds like you are saying "change is hard, so we shouldn't do it".
    I would argue that change is hard and so if we don't do it now it will
    never happen. StationXML is new enough that there is already a
    disruption, we should seize the chance. If we do not do something now
    about null loc ids, it will be a decade or two before we get another
    chance.
    
    It is time to drive the stake through the heart of null location ids.
    Kill the evil while we have a chance.
    
    Philip
    
    On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
    
    Hello Rob,
    
    Rob Newman wrote on 24.07.2014 18:51:
    
    For what it's worth, I would also vote for the "--" standard. To quote
    from the Zen of Python
    http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html
    (my language of choice):
    
    "Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced."
    
    I'd add "Compatible is better than incompatible." :)
    
    Number 2 is especially relevant here:
    "Explicit is better than implicit."
    
    My favorite would be:
    
    "Special cases aren't special enough to break the rules."
    
    Quoted whitespace and nulls are painful. Code what you mean, and mean what
    you code. It's easier for everyone.
    
    But what if we simply *mean* "empty string"?
    
    The issue is not about beauty, pain or ease. It's about standard
    conformance. We already have a channel naming standard. If a new data format
    cannot accommodate existing channel naming, then the new format is flawed.
    But that's not even the case here...
    
    An XML document that contains
    
    <Channel locationCode="" ...
    
    is not malformed. There's an attribute that *explicitly* contains an empty
    string and a parser has to produce it as such. Not as null, nil or none, but
    as an empty string. Otherwise the parser is broken and needs to be fixed,
    not the data!
    
    Again: It's not about beauty. We all agree that current channel naming is
    not particularly beautiful and has limitations. But our business is not to
    try to solve that issue now and here.
    
    Cheers
    Joachim
    
    
    Lion Krischer
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-28 23:16:26
    
    The spec does appear to state that all whitespace characters are converted to the same character. So it does distinguish between the number of whitespace characters but not the type.
    
    http://www.w3.org/TR/REC-xml/#AVNormalize
    
    I think we can just expect all XML parsers to adhere to that; otherwise an empty string seems the safest solution.
    
    Lion
    
    
    Philip Crotwell
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-28 17:48:54
    
    That spec also says:
    If the attribute type is not CDATA, then the XML processor MUST
    further process the normalized attribute value by discarding any
    leading and trailing space (#x20) characters, and by replacing
    sequences of space (#x20) characters by a single space (#x20)
    character.
    
    So, by this you should always end up with an empty string even if you
    have two or more spaces. My experience with parsers is that this does
    not happen, but since it is in the spec it could. Your mileage may
    vary...
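
The quoted rule only applies when the parser knows the attribute is declared as something other than CDATA, which normally requires a DTD. The sketch below contrasts an undeclared attribute with one declared as NMTOKENS in an internal DTD subset, again using Python's expat-based xml.etree.ElementTree as one example parser. The ATTLIST declaration is invented for this test and is not part of StationXML.

```python
import xml.etree.ElementTree as ET

# No declaration: the parser treats the attribute as CDATA.
UNDECLARED = '<Channel locationCode="  " code="BHZ"/>'

# Hypothetical internal DTD subset declaring the attribute as tokenized.
DECLARED = (
    "<!DOCTYPE Channel ["
    "<!ATTLIST Channel locationCode NMTOKENS #IMPLIED>"
    "]>"
    '<Channel locationCode="  " code="BHZ"/>'
)


def location_code(xml_text):
    """Return the locationCode attribute exactly as the parser reports it."""
    return ET.fromstring(xml_text).get("locationCode")


print(repr(location_code(UNDECLARED)))  # spaces kept when undeclared
print(repr(location_code(DECLARED)))    # spaces discarded when declared non-CDATA
```

This may explain the mismatch between the spec text and everyday experience: documents are usually parsed without such a declaration, in which case the two spaces survive.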
    
    Philip
    
    On Mon, Jul 28, 2014 at 10:16 AM, Lion Krischer
    <krischer<at>geophysik.uni-muenchen.de> wrote:
    
    The spec does appear to state that all white spaces characters are converted to the same character. So it does distinguish between the number of whitespace characters but not the type.
    
    http://www.w3.org/TR/REC-xml/#AVNormalize
    
    I think we can just expect all XML parsers to adhere to that, otherwise an empty strings seems the safest solution.
    
    Lion
    
    On 28 Jul 2014, at 15:53, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
    
    One more thing is that this is not something that we can resolve based
    on the XML spec as all three variations are well-formed and can be
    valid XML depending on the schema.
    
    There is another issue in that white space in xml attributes can be
    normalized by the parsers, but this behavior is not standard across
    all parsers, so dealing with attributes that are not limited to
    non-whitespace characters means that you likely have to consider
    empth, one space and two spaces, and even N spaces as all being
    equivalent. Depending on the parser, you may be able to have this
    handled for you, or you may have to code explicitly for the cases.
    
    I think per the xml spec, even these two are considered "the same" as well:
    locationCode="
    "
    locationCode="
    
    "
    as newlines in attributes can be normalized to whilespace on parsing.
    But again, exactly how it is done depends on the parser.
    
    Philip
    
    PS I am NOT advocating we choose newline-newline as the default
    location id!!! :)
    
    On Mon, Jul 28, 2014 at 9:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
    
    Hi
    
    Being on the cheap side of the Atlantic, I'll save us $0.00068 and
    make a stab at the underlying issue. :)
    
    Here, with lots of stuff cut out, is how a channel is "identified" in
    stationXML via the fdsn station web service at the IRIS DMC,
    http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
    
    <Network code="GE" >
    <Station code="UGM">
    <Channel locationCode=" " code="BHZ">
    
    Another implementation of the same web service (not sure of url) gives
    back this:
    
    <Network code="GE" >
    <Station code="UGM">
    <Channel locationCode="" code="BHZ">
    
    with locationCode="" vs =" " being the difference under consideration.
    
    There are two basic issues being discussed (and yes, more beer would help! :)
    
    1) Should all valid stationXML documents be required to use the exact
    same string of characters to represent the location id for this
    channel. This is would allow a comparison operation to be "simple" in
    that it can compare the attribute values without additional
    processing.
    
    2) If we agree to 1), then what should those exact characters be? The
    current top choices are
    a) empty=""
    b) two spaces=" "
    c) two dashes="--".
    
    1) seems less controversial than 2) in that greater compatibility is
    generally seen as positive.
    
    This is primarily a question about the form of the stationXML
    documents, but obviously there are connections to the way requests are
    formed, the relationship to miniseed/seed, the way things are coded in
    software and how much detailed understanding we expect of end users.
    
    Philip
    
    On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:
    
    Hello all,
    
    Can someone give a concise statement of the original problem being
    discussed, it only or primarily a concern about XML?
    
    It seems to me that with modern languages a string that is empty or has 1-N
    spaces is the same thing - there are often implicit or explicit trim()
    function hiding in a processing pipeline. A null string is not the same.
    So an empty or blank string is the same, valid location code, and null is
    undefined or uninitialized location code.
    
    With regards to the "--" pseudo for the location code, is this not needed
    because sometimes it is not possible or difficult to represent an empty
    string or even a string? For example on the command line or in a restful WS
    URI? (Or a URI on the command line!) So it may be that the use of "--" for
    intermediate processing and requests could be tolerated and somehow
    official, while empty or only-blanks strings official and for persistent
    data.
    
    Just my 0.02€ = $0.0268
    
    Best regards to all,
    
    Anthony
    
    On 27/07/2014 04:52, Chad Trabant wrote:
    
    Hi Marcelo,
    
    Thanks for your thoughts as well. Something that you and Joachim are not
    addressing are the concerns about an empty ID that have been brought up by
    more than one person. The answer that empty strings are technically
    possible and it all works in Python/SeisComP is less than satisfying. The
    observations from Python, ObsPy and SeisComP are a few of many that need to
    be taken into account.
    
    I agree that there is a long tail consideration for the "--" location ID
    solution. Understand that some folks find an empty ID to be problematic
    regardless of whether it is XML, SEED, text, whatever, then you might see
    where this proposal comes from. Yes, we would need to treat empty location
    IDs and "--" as synonyms for a very long time. Empty strings in XML mean
    you will need to map empty IDs to empty strings, NULL and whatever an XML
    parser might or might not produce for a long time as well (think beyond
    Python and SeisComP). Either is possible, only one of them is a unique
    mapping.
    
    If the main considerations are for the least amount of disruption the the
    answer is obvious to me: the FDSN can sanction that the two-space string is
    the XML synonym for the empty SEED location ID and we adjust the schema to
    make sure a string of whitespaces is preserved. Then SeisComP can change
    its relatively new StationXML implementation and ALL existing clients will
    be compatible with all metadata and, mostly importantly, we would have
    consistent metadata.
    
    If the empty string ID representation is adopted it would would, in effect,
    mean that the DMC would need to change its metadata service and (more
    importantly) all users of the DMC's metadata service would need to
    transition to a new metadata channel naming scheme. This is certainly not
    out of the question, but it is not something we would do without careful
    consideration. I do not find the two-space strings all that great, but they
    are here and something the DMC and users of the DMC have dealt with. Issues
    have been identified with empty location IDs by us and our users. If DMC is
    going to change, and push the change on all users of the DMC's StationXML,
    it would be much more compelling to have a solution that addresses the low
    level issues.
    
    regards,
    Chad
    
    ----- Original Message -----
    From: "Marcelo Bianchi" <m.tchelo<at>gmail.com>
    To: "IRIS Web Services List" <webservices<at>iris.washington.edu>
    Sent: Friday, July 25, 2014 7:38:17 PM
    Subject: Re: [webservices] A question of location ID, how to represent empty
    IDs in XML?
    
    Hi Philip and All,
    
    I totally agree with Joachim; I was planning to answer but he was much
    faster. What you guys are proposing is not a solution. StationXML
    supports the empty string nicely, and an empty string is not null. There is
    a type difference here in Python and in any other language, and it can be
    handled nicely internally.

    Also, the location ID is not just a string; it is a key that links
    miniSEED to metadata, and making an exception at this level just
    because a user interface cannot properly render it without ambiguity
    does not sound like a proper proposal. I am not in favor of
    creating an exception that will have to be carried along for
    decades to come. Alternative solutions for this issue should be
    sought in the end-user interface.
    
    with my best regards,
    
    Marcelo Bianchi
    --
    
    2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell<at>seis.sc.edu>:
    
    It sounds like you are saying "change is hard, so we shouldn't do it".
    I would argue that change is hard and so if we don't do it now it will
    never happen. StationXML is new enough that there is already a
    disruption, we should seize the chance. If we do not do something now
    about null loc ids, it will be a decade or two before we get another
    chance.
    
    It is time to drive the stake through the heart of null location ids.
    Kill the evil while we have a chance.
    
    Philip
    
    On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
    
    Hello Rob,
    
    Rob Newman wrote on 24.07.2014 18:51:
    
    For what it's worth, I would also vote for the "--" standard. To quote
    from the Zen of Python
    http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html
    (my language of choice):
    
    "Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced."
    
    I'd add "Compatible is better than incompatible." :)
    
    Number 2 is especially relevant here:
    "Explicit is better than implicit."
    
    My favorite would be:
    
    "Special cases aren't special enough to break the rules."
    
    Quoted whitespace and nulls are painful. Code what you mean, and mean what
    you code. It's easier for everyone.
    
    But what if we simply *mean* "empty string"?
    
    The issue is not about beauty, pain or ease. It's about standard
    conformance. We already have a channel naming standard. If a new data format
    cannot accommodate existing channel naming, then the new format is flawed.
    But that's not even the case here...
    
    An XML document that contains
    
    <Channel locationCode="" ...
    
    is not malformed. There's an attribute that *explicitly* contains an empty
    string and a parser has to produce it as such. Not as null, nil or none, but
    as an empty string. Otherwise the parser is broken and needs to be fixed,
    not the data!
    
    Again: It's not about beauty. We all agree that current channel naming is
    not particularly beautiful and has limitations. But our business is not to
    try to solve that issue now and here.
    
    Cheers
    Joachim
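
Joachim's point, that an explicitly empty attribute is an empty string rather than null, can be checked with a minimal sketch. It assumes Python's standard `xml.etree.ElementTree` and two hypothetical, cut-down fragments (real StationXML carries namespaces and many more attributes); other parsers and language bindings may behave differently, which is the concern Chad raises.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments for illustration only, not complete StationXML.
EMPTY = '<Channel locationCode="" code="BHZ"/>'
ABSENT = '<Channel code="BHZ"/>'


def location_code(xml_text):
    """Return the locationCode attribute, or None when the attribute is absent."""
    return ET.fromstring(xml_text).get("locationCode")


print(repr(location_code(EMPTY)))   # attribute present but empty
print(repr(location_code(ABSENT)))  # attribute missing altogether
```

With this parser the two cases stay distinguishable, so "empty" and "missing" need not be conflated by client code.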
    
    _______________________________________________
    webservices mailing list
    webservices<at>iris.washington.edu
    http://www.iris.washington.edu/mailman/listinfo/webservices
    
    --
    Sent from my iClayTablet
    
    ________________________________
    
    Anthony Lomax
    161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
    tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net web:
    http://www.alomax.net
    
    Twitter: @ALomaxNet
    Science & Special Topics: http://www.alomax.net/science
    Software: http://www.alomax.net/software - updates:
    https://twitter.com/ALomaxNet
    ________________________________
    
    Lion Krischer
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-29 00:10:23
    
    Well in that case the only sensible solution seems to be to use an empty string to encode an empty location.
    
    Lion
    
    On 28 Jul 2014, at 16:48, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
    
    That spec also says:
    If the attribute type is not CDATA, then the XML processor MUST
    further process the normalized attribute value by discarding any
    leading and trailing space (#x20) characters, and by replacing
    sequences of space (#x20) characters by a single space (#x20)
    character.
    
    So, by this you should always end up with an empty string even if you
    have two or more spaces. My experience with parsers is that this does
    not happen, but since it is in the spec it could. Your mileage may
    vary...
    
    Philip
    
    On Mon, Jul 28, 2014 at 10:16 AM, Lion Krischer
    <krischer<at>geophysik.uni-muenchen.de> wrote:
    
    The spec does appear to state that all whitespace characters are converted to the same character. So it does distinguish between the number of whitespace characters but not the type.

    http://www.w3.org/TR/REC-xml/#AVNormalize

    I think we can just expect all XML parsers to adhere to that; otherwise an empty string seems the safest solution.
    
    Lion
    
    On 28 Jul 2014, at 15:53, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
    
    One more thing is that this is not something that we can resolve based
    on the XML spec as all three variations are well-formed and can be
    valid XML depending on the schema.
    
    There is another issue in that white space in xml attributes can be
    normalized by the parsers, but this behavior is not standard across
    all parsers, so dealing with attributes that are not limited to
    non-whitespace characters means that you likely have to consider
    empty, one space and two spaces, and even N spaces as all being
    equivalent. Depending on the parser, you may be able to have this
    handled for you, or you may have to code explicitly for the cases.
    
    I think per the xml spec, even these two are considered "the same" as well:
    locationCode="
    "
    locationCode="
    
    "
    as newlines in attributes can be normalized to whitespace on parsing.
    But again, exactly how it is done depends on the parser.
    
    Philip
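
The attribute-value normalization described here can be observed with a short sketch, assuming Python's standard `xml.etree.ElementTree` (backed by expat) and a hypothetical one-element document with no DTD, so the attribute is treated as CDATA. The helper name `parsed_location` is made up for illustration. Other parsers, or a schema-aware processing step, may collapse or trim the value further.

```python
import xml.etree.ElementTree as ET


def parsed_location(raw_attribute_text):
    """Parse a tiny hypothetical Channel element and return locationCode as the parser reports it."""
    doc = '<Channel locationCode="' + raw_attribute_text + '" code="BHZ"/>'
    return ET.fromstring(doc).get("locationCode")


# Compare what was written in the document with what the parser hands back.
for raw in ["", "  ", "\n", "\n\n", "\t"]:
    print(repr(raw), "->", repr(parsed_location(raw)))
```

In this setup literal newlines and tabs come back as spaces while the number of whitespace characters is kept, which matches Lion's reading of the spec for the CDATA case.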
    
    PS I am NOT advocating we choose newline-newline as the default
    location id!!! :)
    
    On Mon, Jul 28, 2014 at 9:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
    
    Hi
    
    Being on the cheap side of the Atlantic, I'll save us $0.00068 and
    make a stab at the underlying issue. :)
    
    Here, with lots of stuff cut out, is how a channel is "identified" in
    stationXML via the fdsn station web service at the IRIS DMC,
    http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
    
    <Network code="GE" >
    <Station code="UGM">
    <Channel locationCode="  " code="BHZ">
    
    Another implementation of the same web service (not sure of url) gives
    back this:
    
    <Network code="GE" >
    <Station code="UGM">
    <Channel locationCode="" code="BHZ">
    
    with locationCode="" vs ="  " being the difference under consideration.
    
    There are two basic issues being discussed (and yes, more beer would help! :)
    
    1) Should all valid stationXML documents be required to use the exact
    same string of characters to represent the location id for this
    channel? This would allow a comparison operation to be "simple" in
    that it can compare the attribute values without additional
    processing.
    
    2) If we agree to 1), then what should those exact characters be? The
    current top choices are
    a) empty=""
    b) two spaces="  "
    c) two dashes="--".
    
    1) seems less controversial than 2) in that greater compatibility is
    generally seen as positive.
    
    This is primarily a question about the form of the stationXML
    documents, but obviously there are connections to the way requests are
    formed, the relationship to miniseed/seed, the way things are coded in
    software and how much detailed understanding we expect of end users.
    
    Philip
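
Until issue 1) is settled, a client comparing channels across data centers has to treat the candidate spellings as synonyms. The following is a minimal sketch of that extra processing, not anyone's actual implementation: the names `canonical_location` and `same_channel` are hypothetical, and choosing "" as the internal canonical form is an arbitrary assumption for the example.

```python
# Spellings (after trimming) that are assumed here to mean "empty location ID".
EMPTY_LOCATION_SPELLINGS = {"", "--"}


def canonical_location(code):
    """Map any spelling of the empty location ID to "" and trim padding from the rest."""
    if code is None:
        # A missing value is not the same thing as an empty one.
        raise ValueError("location code is missing")
    trimmed = code.strip()
    return "" if trimmed in EMPTY_LOCATION_SPELLINGS else trimmed


def same_channel(a, b):
    """Compare two (network, station, location, channel) tuples after normalization."""
    return (a[0], a[1], canonical_location(a[2]), a[3]) == (
        b[0], b[1], canonical_location(b[2]), b[3])


print(same_channel(("GE", "UGM", "  ", "BHZ"), ("GE", "UGM", "", "BHZ")))
print(same_channel(("GE", "UGM", "--", "BHZ"), ("GE", "UGM", "", "BHZ")))
print(same_channel(("GE", "UGM", "00", "BHZ"), ("GE", "UGM", "", "BHZ")))
```

If all documents used one agreed spelling, the comparison would reduce to plain tuple equality.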
    
    On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:
    
    Hello all,
    
    Can someone give a concise statement of the original problem being
    discussed? Is it only or primarily a concern about XML?

    It seems to me that with modern languages a string that is empty or has 1-N
    spaces is the same thing - there is often an implicit or explicit trim()
    function hiding in a processing pipeline. A null string is not the same.
    So an empty or blank string is the same, valid location code, and null is
    an undefined or uninitialized location code.
    
    With regards to the "--" pseudo for the location code, is this not needed
    because sometimes it is not possible or difficult to represent an empty
    string or even a string? For example on the command line or in a restful WS
    URI? (Or a URI on the command line!) So it may be that the use of "--" for
    intermediate processing and requests could be tolerated and somehow
    official, while empty or only-blanks strings official and for persistent
    data.
    
    Just my 0.02€ = $0.0268
    
    Best regards to all,
    
    Anthony
    
    Lion Krischer
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-28 22:54:21
    
    Hi all,
    
    leaving the greater issues aside: why not just force the location code to have a certain form with a regex in the schema?
    
    The following group will match any uppercase alphanumeric two-character code or two spaces:

    ^([A-Z0-9]{2}|  )$

    It matches "AA", "00", "10", "A1", "  ", ...
    but not "--", "", "-", "a1", ...
    
    Everything else will get rejected. Then one can be sure that it is consistent everywhere (assuming people test their web services against the schema which is a good idea in any case). Similar regexes should also be defined for the network, station, and channel codes to assure compatibility with SEED. In general it would be a good idea to have the schema enforce as many things as possible and leave little to no room for interpretations.
    
    Now whether one uses two spaces, two dashes, an empty string or what not for an empty location code does not really matter. All are syntactically valid XML and thus any parser can be expected to be able to deal with them. Consistency is by far the most important thing in my opinion. So best choose one and force it with the schema. This will reduce errors and misinterpretations in the long run.
    
    In terms of existing StationXML parsers I assume most are just stripping whitespace from the location code, and thus "" and "  " should already work, resulting in minimal disruption to users' workflows. "--" would require software to be updated and looks a little weird in my opinion, and unsuspecting users might interpret it as an invalid location code.
    
    Cheers!
    
    Lion
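
The proposed pattern is easy to try outside a schema. This is a minimal sketch using Python's `re` module, with `re.fullmatch` standing in for the `^...$` anchors; the function name `is_valid_location` is hypothetical. In an XML Schema `pattern` facet the expression is implicitly anchored, so the `^` and `$` would be left out there.

```python
import re

# Two uppercase alphanumeric characters, or exactly two spaces.
LOCATION_PATTERN = re.compile(r"[A-Z0-9]{2}|  ")


def is_valid_location(code):
    """Return True when the whole string matches the proposed location code pattern."""
    return LOCATION_PATTERN.fullmatch(code) is not None


for candidate in ["AA", "00", "10", "A1", "  ", "--", "", "-", "a1"]:
    print(repr(candidate), is_valid_location(candidate))
```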
    
    On 28 Jul 2014, at 15:37, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
    
    Hi
    
    Being on the cheap side of the Atlantic, I'll save us $0.00068 and
    make a stab at the underlying issue. :)
    
    Here, with lots of stuff cut out, is how a channel is "identified" in
    stationXML via the fdsn station web service at the IRIS DMC,
    http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
    
    <Network code="GE" >
    <Station code="UGM">
    <Channel locationCode=" " code="BHZ">
    
    Another implementation of the same web service (not sure of url) gives
    back this:
    
    <Network code="GE" >
    <Station code="UGM">
    <Channel locationCode="" code="BHZ">
    
    with locationCode="" vs =" " being the difference under consideration.
    
    There are two basic issues being discussed (and yes, more beer would help! :)
    
    1) Should all valid stationXML documents be required to use the exact
    same string of characters to represent the location id for this
    channel. This is would allow a comparison operation to be "simple" in
    that it can compare the attribute values without additional
    processing.
    
    2) If we agree to 1), then what should those exact characters be? The
    current top choices are
    a) empty=""
    b) two spaces=" "
    c) two dashes="--".
    
    1) seems less controversial than 2) in that greater compatibility is
    generally seen as positive.
    
    This is primarily a question about the form of the stationXML
    documents, but obviously there are connections to the way requests are
    formed, the relationship to miniseed/seed, the way things are coded in
    software and how much detailed understanding we expect of end users.
    
    Philip
    
    On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:
    
    Hello all,
    
    Can someone give a concise statement of the original problem being
    discussed, it only or primarily a concern about XML?
    
    It seems to me that with modern languages a string that is empty or has 1-N
    spaces is the same thing - there are often implicit or explicit trim()
    function hiding in a processing pipeline. A null string is not the same.
    So an empty or blank string is the same, valid location code, and null is
    undefined or uninitialized location code.
    
    With regards to the "--" pseudo for the location code, is this not needed
    because sometimes it is not possible or difficult to represent an empty
    string or even a string? For example on the command line or in a restful WS
    URI? (Or a URI on the command line!) So it may be that the use of "--" for
    intermediate processing and requests could be tolerated and somehow
    official, while empty or only-blanks strings official and for persistent
    data.
    
    Just my 0.02€ = $0.0268
    
    Best regards to all,
    
    Anthony
    
    On 27/07/2014 04:52, Chad Trabant wrote:
    
    Hi Marcelo,
    
    Thanks for your thoughts as well. Something that you and Joachim are not
    addressing are the concerns about an empty ID that have been brought up by
    more than one person. The answer that empty strings are technically
    possible and it all works in Python/SeisComP is less than satisfying. The
    observations from Python, ObsPy and SeisComP are a few of many that need to
    be taken into account.
    
    I agree that there is a long tail consideration for the "--" location ID
    solution. Understand that some folks find an empty ID to be problematic
    regardless of whether it is XML, SEED, text, whatever, then you might see
    where this proposal comes from. Yes, we would need to treat empty location
    IDs and "--" as synonyms for a very long time. Empty strings in XML mean
    you will need to map empty IDs to empty strings, NULL and whatever an XML
    parser might or might not produce for a long time as well (think beyond
    Python and SeisComP). Either is possible, only one of them is a unique
    mapping.
    
    If the main considerations are for the least amount of disruption the the
    answer is obvious to me: the FDSN can sanction that the two-space string is
    the XML synonym for the empty SEED location ID and we adjust the schema to
    make sure a string of whitespaces is preserved. Then SeisComP can change
    its relatively new StationXML implementation and ALL existing clients will
    be compatible with all metadata and, mostly importantly, we would have
    consistent metadata.
    
    If the empty string ID representation is adopted it would would, in effect,
    mean that the DMC would need to change its metadata service and (more
    importantly) all users of the DMC's metadata service would need to
    transition to a new metadata channel naming scheme. This is certainly not
    out of the question, but it is not something we would do without careful
    consideration. I do not find the two-space strings all that great, but they
    are here and something the DMC and users of the DMC have dealt with. Issues
    have been identified with empty location IDs by us and our users. If DMC is
    going to change, and push the change on all users of the DMC's StationXML,
    it would be much more compelling to have a solution that addresses the low
    level issues.
    
    regards,
    Chad
    
    ----- Original Message -----
    From: "Marcelo Bianchi" <m.tchelo<at>gmail.com>
    To: "IRIS Web Services List" <webservices<at>iris.washington.edu>
    Sent: Friday, July 25, 2014 7:38:17 PM
    Subject: Re: [webservices] A question of location ID, how to represent empty
    IDs in XML?
    
    Hi Philip and All,
    
    I totaly agree with Joachim, was planning to answer but he was much
    faster. What you guys are proposing is not a solution. the station XML
    supports nicely the empty string and it is not null. There is a type
    difference here in Python and in any other language and can be nicely
    handled internally.
    
    Also the location id is not just a string it is a key entry to link
    miniseed to metadata and making an exception at this level just
    because a user interface cannot proper render it without ambiguity
    does not sounds like a proper way proposal. I am not favorable in
    creating an exception that will have to be carried over along the
    decades to come. Alternatives solutions for this issue should be
    searched on the end user interface.
    
    with my best regards,
    
    Marcelo Bianchi
    --
    
    2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell<at>seis.sc.edu>:
    
    It sounds like you are saying "change is hard, so we shouldn't do it".
    I would argue that change is hard and so if we don't do it now it will
    never happen. StationXML is new enough that there is already a
    disruption, we should seize the chance. If we do not do something now
    about null loc ids, it will be a decade or two before we get another
    chance.
    
    It is time to drive the stake through the heart of null location ids.
    Kill the evil while we have a chance.
    
    Philip
    
    On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
    
    Hello Rob,
    
    Rob Newman wrote on 24.07.2014 18:51:
    
    For what it's worth, I would also vote for the "--" standard. To quote
    from the Zen of Python
    http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html
    (my language of choice):
    
    "Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced."
    
    I'd add "Compatible is better than incompatible." :)
    
    Number 2 is especially relevant here:
    "Explicit is better than implicit."
    
    My favorite would be:
    
    "Special cases aren't special enough to break the rules."
    
    Quoted whitespace and nulls are painful. Code what you mean, and mean what
    you code. It's easier for everyone.
    
    But what if we simply *mean* "empty string"?
    
    The issue is not about beauty, pain or ease. It's about standard
    conformance. We already have a channel naming standard. If a new data format
    cannot accommodate existing channel naming, then the new format is flawed.
    But that's not even the case here...
    
    An XML document that contains
    
    <Channel locationCode="" ...
    
    is not malformed. There's an attribute that *explicitly* contains an empty
    string and a parser has to produce it as such. Not as null, nil or none, but
    as an empty string. Otherwise the parser is broken and needs to be fixed,
    not the data!
    
    Again: It's not about beauty. We all agree that current channel naming is
    not particularly beautiful and has limitations. But our business is not to
    try to solve that issue now and here.
    
    Cheers
    Joachim
    
    _______________________________________________
    webservices mailing list
    webservices<at>iris.washington.edu
    http://www.iris.washington.edu/mailman/listinfo/webservices
    
    _______________________________________________
    webservices mailing list
    webservices<at>iris.washington.edu
    http://www.iris.washington.edu/mailman/listinfo/webservices
    
    _______________________________________________
    webservices mailing list
    webservices<at>iris.washington.edu
    http://www.iris.washington.edu/mailman/listinfo/webservices
    _______________________________________________
    webservices mailing list
    webservices<at>iris.washington.edu
    http://www.iris.washington.edu/mailman/listinfo/webservices
    
    --
    Sent from my iClayTablet
    
    ________________________________
    
    Anthony Lomax
    161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
    tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net web:
    http://www.alomax.net
    
    Twitter: @ALomaxNet
    Science & Special Topics: http://www.alomax.net/science
    Software: http://www.alomax.net/software - updates:
    https://twitter.com/ALomaxNet
    ________________________________
    
    Joachim Saul
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-28 23:18:39
    
    Hi Lion,
    
    Lion Krischer [07/28/2014 03:54 PM]:
    
    ^([A-Z0-9]{2}|  )$

    It matches "AA", "00", "10", "A1", "  ", ...
    but not "--", "", "-", "a1", ...
    
    'Not ""' is a problem as "" is a valid location code according to SEED
    specification. Which is what all this is actually about. :)
    
    In general I like the idea of using regular expressions if we use
    ^([A-Z0-9]{2}|  |)$
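
    To make the difference between the two patterns concrete, here is a rough sketch in Python. It assumes the blank alternative is meant to be two spaces (the SEED padding); the names SPACES_ONLY, SPACES_OR_EMPTY and is_valid are made up for this illustration.

```python
import re

# Assumption: the blank alternative is two spaces, as in the
# two-character SEED location field.
SPACES_ONLY = re.compile(r"([A-Z0-9]{2}|  )")
SPACES_OR_EMPTY = re.compile(r"([A-Z0-9]{2}|  |)")

def is_valid(pattern, code):
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(code) is not None

# Print how each candidate location code fares under both patterns.
for code in ["00", "A1", "  ", "", "--", "a1"]:
    print(repr(code), is_valid(SPACES_ONLY, code), is_valid(SPACES_OR_EMPTY, code))
```

    The only input on which the two patterns disagree is the empty string, which is exactly the point under discussion.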
    
    Everything else will get rejected. Then one can be sure that it is consistent everywhere (assuming people test their web services against the schema which is a good idea in any case). Similar regexes should also be defined for the network, station, and channel codes to assure compatibility with SEED. In general it would be a good idea to have the schema enforce as many things as possible and leave little to no room for interpretations.
    
    Now whether one uses two spaces, two dashes, an empty string or what not for an empty location code does not really matter. All are syntactically valid XML and thus any parser can be expected to be able to deal with them. Consistency is by far the most important thing in my opinion.
    
    The most important consistency is with the SEED standard.
    
    Cheers
    Joachim
    
    Philip Crotwell
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-28 19:03:19
    
    On Mon, Jul 28, 2014 at 10:18 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
    
    The most important consistency is with the SEED standard.
    
    I would argue that consistency for end users is the only thing that
    matters. Consistency with the SEED spec may be a means to that end,
    but if the end users do not perceive it as being consistent, it isn't
    consistent.
    
    To me, that means we need to look at the bigger picture. Ideally we
    would have location ids that could be represented by exactly the same
    characters in:
    stationXML
    miniseed
    URLS
    client displays
    databases
    and even email
    in a way that is explicit, consistent and natural for the end user.
    
    To be honest, I don't like any of the choices. If I had my way, loc
    ids would have been defined as strictly two characters like
    ^([A-Z0-9]{2})$, and 00 would have been what you used if you didn't
    care. Alas, that horse has left the barn.
    
    Maybe not even worth $0.02... :)
    Philip
    
    Chad Trabant
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-31 05:42:33
    
    On Jul 28, 2014, at 9:03 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
    
    On Mon, Jul 28, 2014 at 10:18 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
    
    The most important consistency is with the SEED standard.
    
    I would argue that consistency for end users is the only thing that
    matters. Consistency with the SEED spec may be a means to that end,
    but if the end users do not perceive it as being consistent, it isn't
    consistent.
    
    To me, that means we need to look at the bigger picture. Ideally we
    would have location ids that could be represented by exactly the same
    characters in:
    stationXML
    miniseed
    URLS
    client displays
    databases
    and even email
    in a way that is explicit, consistent and natural for the end user.
    
    I completely agree that this should be our ultimate goal. The idea of making this change in XML is to set us on just such a path.
    
    Here are some others I would add to the list:
    
    use in command lines
    use in other data formats
    
    Chad
    
    Joachim Saul
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-31 17:07:54
    
    Chad Trabant wrote on 31.07.2014 07:42:
    
    On Jul 28, 2014, at 9:03 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
    
    On Mon, Jul 28, 2014 at 10:18 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
    
    The most important consistency is with the SEED standard.
    
    I would argue that consistency for end users is the only thing that
    matters. Consistency with the SEED spec may be a means to that end,
    but if the end users do not perceive it as being consistent, it isn't
    consistent.
    
    To me, that means we need to look at the bigger picture. Ideally we
    would have location ids that could be represented by exactly the same
    characters in:
    stationXML
    miniseed
    URLS
    client displays
    databases
    and even email
    in a way that is explicit, consistent and natural for the end user.
    
    I completely agree that this should be our ultimate goal. The idea of making this change in XML is to set us on just such a path.
    
    Here are some others I would add to the list:
    
    use in command lines
    use in other data formats
    
    That's a pretty ambitious list considering...
    
    Chad Trabant wrote on 31.07.2014 06:57:
    
    There are many more clients than there are servers, many clients written by users and out of our direct control. Requiring every client to know some post-parsing processing rules is a terrible idea, [...]
    
    We are still talking here about a metadata format, aren't we? And you want to prescribe how users shall display empty location codes in GUI displays? You must be kidding...
    
    The issue is *not* about other data formats. It is up to every developer to save empty location codes in whatever way they like in their formats, databases, bulletins etc. That is absolutely no problem and hence doesn't require a solution.
    
    Here the issue is about representing data in XML. Since we have a well accepted and widely implemented channel naming standard *already*, and since users are working with StationXML *already*, what we need *now* is a clarification about the proper representation of *current* channel naming in StationXML.
    
    Joachim
    
    Lion Krischer
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-28 23:33:37
    
    Hi Joachim,
    
    'Not ""' is a problem as "" is a valid location code according to SEED specification. Which is what all this is actually about. :)
    
    In general I like the idea of using regular expressions if we use ^([A-Z0-9]{2}|  |)$

    The idea was to choose either "  " or "", which both denote an empty location id. In SEED it is not possible to specify two actual spaces (as opposed to an empty string) as the location identifier, because trailing spaces are considered padding characters and have to be removed by the processing software.

    Allowing both would mean having two separate "encodings" for the same thing. I am fine with either; it is just important that one is picked as the proper representation of an empty location id.
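
    The padding rule can be sketched as follows in Python. This is an illustration of the rule as described in this thread, not code from any SEED library, and the function names decode_location and encode_location are made up.

```python
def decode_location(field: str) -> str:
    """Decode a two-character SEED location field by removing right padding."""
    if len(field) != 2:
        raise ValueError("SEED location field must be exactly two characters")
    return field.rstrip(" ")

def encode_location(code: str) -> str:
    """Encode a location code as a left-justified, space-padded field."""
    if len(code) > 2:
        raise ValueError("location code is at most two characters")
    return code.ljust(2)

print(repr(decode_location("  ")))  # all padding
print(repr(decode_location("00")))  # no padding
print(repr(encode_location("")))    # empty code written back as padding
```

    Under this reading, two spaces never survive decoding as a value of their own, which is why "  " and "" end up describing the same thing.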
    
    According to the SEED spec it appears that single letter location codes are also valid. Does that happen in the wild?
    
    Cheers!
    
    Lion
    
    Chad Trabant
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-31 06:49:09
    
    On Jul 28, 2014, at 6:54 AM, Lion Krischer <krischer<at>geophysik.uni-muenchen.de> wrote:
    
    Hi all,
    
    leaving the greater issues aside: why not just force the location code to have a certain form with a regex in the schema?
    
    Hi Lion,
    
    We should definitely add the rules to the schema, we just need to decide what they are!
    
    The following group will match any uppercase alphanumeric two letter code and two spaces:
    
    ^([A-Z0-9]{2}|  )$

    It matches "AA", "00", "10", "A1", "  ", ...
    but not "--", "", "-", "a1", ...
    
    Everything else will get rejected. Then one can be sure that it is consistent everywhere (assuming people test their web services against the schema which is a good idea in any case). Similar regexes should also be defined for the network, station, and channel codes to assure compatibility with SEED. In general it would be a good idea to have the schema enforce as many things as possible and leave little to no room for interpretations.
    
    Now whether one uses two spaces, two dashes, an empty string or what not for an empty location code does not really matter. All are syntactically valid XML and thus any parser can be expected to be able to deal with them. Consistency is by far the most important thing in my opinion. So best choose one and force it with the schema. This will reduce errors and misinterpretations in the long run.
    
    In terms of existing StationXML parsers I assume most are just stripping whitespaces from the location code and thus "" and "  " should already work resulting in minimal disruption in the users’ workflows.
    
    Actually, this does not appear to be happening; in the parsers I’ve used, whitespace is not stripped. I have read through the XML specifications until my eyes were crossed to try to understand why this would be the case. Then I wrote some test cases and observed no trimming; see test data and code below. Perhaps this attribute is CDATA for some reason? I think we are stuck with the fact that empty string and two spaces are different.
    
    Has anyone observed this automatic trimming on any system?
    
    "--" would require software to be updated and looks a little bit weird in my opinion and unsuspecting users might interpret it as an invalid location code.

    Yes, it would require software changes; the question is whether what we gain would be worth it. Maybe it looks a little weird, but it is already becoming synonymous in the minds of many because "--" is used for selecting the empty SEED location IDs.
    
    Chad
    
    PS. here is my test data:
    
    ------- chan.xml
    <FDSNStationXML schemaVersion="1.0">
      <Channel locationCode="  " startDate="2012-03-12T20:28:00" restrictedStatus="open" endDate="2599-12-31T23:59:59" code="BHZ">
      </Channel>
    </FDSNStationXML>
    -------
    
    Here is a test with Python:
    -------
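    from xml.etree import ElementTree

    # Parse the test document shown above.
    with open('chan.xml', 'rt') as f:
        tree = ElementTree.parse(f)

    # Look up the Channel element (a direct child of the root element).
    node = tree.find('./Channel')
    print(node.tag)
    # Print every attribute, quoted so that any whitespace stays visible.
    for name, value in sorted(node.attrib.items()):
        print(' %-4s = "%s"' % (name, value))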
    -------
    
    which produces:
    -------
    Channel
    code = "BHZ"
    endDate = "2599-12-31T23:59:59"
    locationCode = "  "
    restrictedStatus = "open"
    startDate = "2012-03-12T20:28:00"
    -------
    
    No trimming.
    
    Here is a test with Perl:
    -------
    use strict;
    use warnings;
    use XML::Simple;
    use Data::Dumper;
    
    my $file = 'chan.xml';
    
    my $test_data = XMLin($file);
    print Dumper($test_data);
    -------
    
    which produces:
    -------
    $VAR1 = {
    'schemaVersion' => '1.0',
    'Channel' => {
    'locationCode' => '  ',
    'endDate' => '2599-12-31T23:59:59',
    'restrictedStatus' => 'open',
    'startDate' => '2012-03-12T20:28:00',
    'code' => 'BHZ'
    }
    };
    -------
    
    No trimming.
    
    There are many more parsing options for Perl and Python and other languages of course, but this is pretty basic stuff. It is how a user such as myself would go about parsing and using StationXML.
    
    Robert Barsch
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-31 16:20:16
    
    Dear all,
    
    Maybe some stupid questions: Are there actually any valid use cases
    for having to distinguish between empty and unknown location codes within
    the data? If so, does this then also apply to network, station, and
    channel codes? So if the community opts for both an unknown and an
    empty/unset marker for the location field, shouldn't the same
    markers be used for unknown/unset network etc.?
    
    In terms of existing StationXML parsers I assume most are just
    stripping whitespaces from the location code and thus "" and "  "
    should already work resulting in minimal disruption in the
    users’ workflows.
    
    Usually, without a DTD or XML schema definition, all whitespace is
    significant and should be preserved by any XML parser. I
    guess by "StationXML parser" Lion meant more than the plain XML
    parser. I don't know what other clients do, but ObsPy internally
    strips all Net/Sta/Loc/Cha field values.
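
    The split between the plain XML parser and a reader that trims on top of it might look roughly like this in Python. This is a sketch only, not ObsPy's actual code; the sample element and the helper name channel_codes are made up for illustration.

```python
from xml.etree import ElementTree

# Hypothetical element using the two-space convention for an empty location.
DOC = '<Channel locationCode="  " code="BHZ"/>'

def channel_codes(element):
    """Return (location, channel) codes with surrounding whitespace removed."""
    location = element.get("locationCode", "").strip()
    channel = element.get("code", "").strip()
    return location, channel

element = ElementTree.fromstring(DOC)
raw_location = element.get("locationCode")  # as delivered by the XML parser
location, channel = channel_codes(element)  # after the reader's own trimming

print(repr(raw_location), repr(location), repr(channel))
```

    A reader written this way accepts both "" and "  " without caring which one the server sent.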
    
    Cheers,
    Robert
    
    PS: for some reason my previous mail, sent last weekend, did not appear
    on this list. I also didn't receive all replies to this thread as
    archived in
    http://www.iris.washington.edu/pipermail/webservices/2014-July/thread.html
    (e.g.
    http://www.iris.washington.edu/pipermail/webservices/2014-July/000554.html
    was missing), and I didn't get any bounce or error message from mailman
    either. Any idea?
    
    - --
    Dr. Robert Barsch
    
    EGU Office Munich
    Luisenstr. 37
    80333 Munich
    Germany
    
    Phone: +49-89-21806565
    Fax: +49-89-218017855
    eMail: barsch<at>egu.eu
    
    Chad Trabant
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-31 08:01:30
    
    On Jul 31, 2014, at 12:20 AM, Robert Barsch <barsch<at>egu.eu> wrote:
    
    Dear all,
    
    Maybe some stupid questions: Are there actually any valid use cases
    for having to distinguish between empty and unknown location codes within
    the data? If so, does this then also apply to network, station, and
    channel codes? So if the community opts for both an unknown and an
    empty/unset marker for the location field, shouldn't the same
    markers be used for unknown/unset network etc.?
    
    Hi Robert,
    
    There is no rule in the SEED world preventing two channel names differing only by location ID; in fact it happens often. Since location can be empty, we can have both XX.STA.00.LHZ and XX.STA..LHZ; if location were described as "unknown", these two become ambiguous. I do not know offhand of any cases where the only difference is between an empty location ID and a filled one, but it would be a weird case to eliminate (or even describe) in the specification.
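
    To illustrate with the two names above, here is a small Python sketch that splits a dotted NET.STA.LOC.CHA name while keeping an empty location as an empty string. The helper name split_nslc is made up for this example.

```python
def split_nslc(channel_id: str):
    """Split 'NET.STA.LOC.CHA' into its four codes, keeping an empty location."""
    parts = channel_id.split(".")
    if len(parts) != 4:
        raise ValueError("expected NET.STA.LOC.CHA, got %r" % channel_id)
    return tuple(parts)

with_location = split_nslc("XX.STA.00.LHZ")
empty_location = split_nslc("XX.STA..LHZ")

print(with_location)
print(empty_location)
print(with_location == empty_location)
```

    The empty location is a value like any other here, so the two names stay distinct channels.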
    
    In terms of existing StationXML parsers I assume most are just
    stripping whitespaces from the location code and thus "" and "  "
    should already work resulting in minimal disruption in the
    users’ workflows.
    
    Usually, without a DTD or XML schema definition, all whitespace is
    significant and should be preserved by any XML parser.
    
    Ah. I think that is basically what I have been finding, thanks for the confirmation.
    
    Chad
    
    
    Joachim Saul
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-31 16:33:08
    
    Chad Trabant wrote on 31.07.2014 08:49:
    
    In terms of existing StationXML parsers I assume most are just stripping whitespaces from the location code and thus "" and "  " should already work resulting in minimal disruption in the users’ workflows.

    Actually, this does not appear to be happening; in the parsers I’ve used, whitespace is not stripped. I have read through the XML specifications until my eyes were crossed to try to understand why this would be the case. Then I wrote some test cases and observed no trimming; see test data and code below. Perhaps this attribute is CDATA for some reason? I think we are stuck with the fact that empty string and two spaces are different.
    
    There may be parsers that do strip whitespaces, but I also doubt that this is required by any standard.
    
    Has anyone observed this automatic trimming on any system?
    
    No, and I agree that a generic parser should return a raw string as it is in the XML without implicit trimming, nullifying etc. To obtain a trimmed string, it's trivial to trim() the input strings as needed. That's what's done in ObsPy, too.
    
    In fact even the *already* empty string location codes from libmseed are trimmed again at ObsPy level, just in case. ;)
    
    "--" would require software to be updated and looks a little bit weird in my opinion and unsuspecting users might interpret it as an invalid location code.

    Yes, it would require software changes; the question is whether what we gain would be worth it. Maybe it looks a little weird, but it is already becoming synonymous in the minds of many because "--" is used for *selecting* the empty SEED location IDs.
    
    The software changes are just one aspect. In fact, software changes are trivial compared to the nightmare of changing the existing metadata in databases, decades of SEED data, parametric data and so on.
    
    Joachim
    
    Chad Trabant
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-31 07:42:29
    
    On Jul 31, 2014, at 12:33 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
    
    Chad Trabant wrote on 31.07.2014 08:49:
    
    In terms of existing StationXML parsers I assume most are just stripping whitespaces from the location code and thus "" and "  " should already work resulting in minimal disruption in the users’ workflows.

    Actually, this does not appear to be happening; in the parsers I’ve used, whitespace is not stripped. I have read through the XML specifications until my eyes were crossed to try to understand why this would be the case. Then I wrote some test cases and observed no trimming; see test data and code below. Perhaps this attribute is CDATA for some reason? I think we are stuck with the fact that empty string and two spaces are different.
    
    There may be parsers that do strip whitespaces, but I also doubt that this is required by any standard.
    
    Has anyone observed this automatic trimming on any system?
    
    No, and I agree that a generic parser should return a raw string as it is in the XML without implicit trimming, nullifying etc. To obtain a trimmed string, it's trivial to trim() the input strings as needed. That's what's done in ObsPy, too.
    
    In fact even the *already* empty string location codes from libmseed are trimmed again at ObsPy level, just in case. ;)
    
    Hi Joachim,
    
    You keep coming back to this as if it is meaningful. libmseed and the parts of ObsPy getting information from libmseed are dealing with SEED data, where the current rules of parsing are clear. What is your point exactly?
    
    "--" would require software to be updated and looks a little bit weird in my opinion and unsuspecting users might interpret it as an invalid location code.

    Yes, it would require software changes; the question is whether what we gain would be worth it. Maybe it looks a little weird, but it is already becoming synonymous in the minds of many because "--" is used for *selecting* the empty SEED location IDs.
    
    The software changes are just one aspect. In fact, software changes are trivial compared to the nightmare of changing the existing metadata in databases, decades of SEED data, parametric data and so on.
    
    Why do you think the existing metadata and decades of SEED would need to be changed? Please explain.
    
    Chad
    
    Joachim Saul
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-31 17:13:37
    
    Chad Trabant wrote on 31.07.2014 09:42:
    
    On Jul 31, 2014, at 12:33 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
    
    Has anyone observed this automatic trimming on any system?
    
    No, and I agree that a generic parser should return a raw string as it is in the XML without implicit trimming, nullifying etc. To obtain a trimmed string, it's trivial to trim() the input strings as needed. That's what's done in ObsPy, too.
    
    In fact even the *already* empty string location codes from libmseed are trimmed again at ObsPy level, just in case. ;)
    
    Hi Joachim,
    
    You keep coming back to this as if it is meaningful. libmseed and the parts of ObsPy getting information from libmseed are dealing with SEED data, where the current rules of parsing are clear. What is your point exactly?
    
    The point is that in libmseed you use a different empty location code naming than in StationXML. As I said a number of times, for me that's not a problem at all. Many clients can handle this and those that cannot can be modified easily. In particular, if you applied the same naming rules as in libmseed also in e.g. FetchData (by making a trivial change in the code) they would become consistent at a very low cost. It would be a benefit for the user.
    
    Why do you think the existing metadata and decades of SEED would need to be changed? Please explain.
    
    Because otherwise a mapping would be required "forever". Until at least very recently you were strongly against any mapping, even calling the idea "rubbish" at one point.
    
    Joachim
    
    Chad Trabant
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-31 08:37:36
    
    On Jul 31, 2014, at 1:13 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
    
    Chad Trabant wrote on 31.07.2014 09:42:
    
    On Jul 31, 2014, at 12:33 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
    
    Has anyone observed this automatic trimming on any system?
    
    No, and I agree that a generic parser should return a raw string as it is in the XML without implicit trimming, nullifying etc. To obtain a trimmed string, it's trivial to trim() the input strings as needed. That's what's done in ObsPy, too.
    
    In fact even the *already* empty string location codes from libmseed are trimmed again at ObsPy level, just in case. ;)
    
    Hi Joachim,
    
    You keep coming back to this as if it is meaningful. libmseed and the parts of ObsPy getting information from libmseed are dealing with SEED data, where the current rules of parsing are clear. What is your point exactly?
    
    The point is that in libmseed you use a different empty location code naming than in StationXML. As I said a number of times, for me that's not a problem at all. Many clients can handle this and those that cannot can be modified easily. In particular, if you applied the same naming rules as in libmseed also in e.g. FetchData (by making a trivial change in the code) they would become consistent at a very low cost. It would be a benefit for the user.
    
    Hi Joachim,
    
    That is a strange transition from libmseed to web service clients that I do not understand. You appear fixated on updating the clients, but as I have said many times, that by itself will not solve the actual problem; the metadata remains inconsistent and at any rate we do not control many of the most popular parsers of this information, such as user-created programs.
    
    Why do you think the existing metadata and decades of SEED would need to be changed? Please explain.
    
    Because otherwise a mapping would be required "forever". Until at least very recently you were strongly against any mapping, even calling the idea "rubbish" at one point
    
    Here is what you said about mapping:
    
    On Jul 28, 2014, at 4:51 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
    
    In general mappings are not the problem and are widely used anyway.
    
    So what is the problem with mapping?
    
    I was certainly not against mapping to/from "--", after all it was my proposal! You have taken my words out of context. Please stick to the technical issues and leave your personal indignation off of this mailing list.
    
    Chad
    
    Joachim
    
    Joachim Saul
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-08-08 00:44:07
    
    Hi Chad,
    
    after a well-deserved creative break a little more feedback from Potsdam
    on our favorite topic. :)
    
    Chad Trabant wrote on 31.07.2014 10:37:
    
    On Jul 31, 2014, at 1:13 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
    
    Chad Trabant wrote on 31.07.2014 09:42:
    
    On Jul 31, 2014, at 12:33 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
    
    Has anyone observed this automatic trimming on any system?
    
    No, and I agree that a generic parser should return a raw string as it is in the XML without implicit trimming, nullifying etc. To obtain a trimmed string, it's trivial to trim() the input strings as needed. That's what's done in ObsPy, too.
    
    In fact even the *already* empty string location codes from libmseed are trimmed again at ObsPy level, just in case. ;)
    
    Hi Joachim,
    
    You keep coming back to this as if it is meaningful. libmseed and the parts of ObsPy getting information from libmseed are dealing with SEED data, where the current rules of parsing are clear. What is your point exactly?
    
    The point is that in libmseed you use a different empty location code naming than in StationXML. As I said a number of times, for me that's not a problem at all. Many clients can handle this and those that cannot can be modified easily. In particular, if you applied the same naming rules as in libmseed also in e.g. FetchData (by making a trivial change in the code) they would become consistent at a very low cost. It would be a benefit for the user.
    
    Hi Joachim,
    
    That is a strange transition from libmseed to web service clients that I do not understand.
    
    In libmseed you treat the two spaces differently than some web service
    client code. While in libmseed you trim the spaces, resulting in an
    empty string, in web service clients (like FetchData) you keep the
    spaces. By simply trimming them there, too, and then matching against an
    empty string, you would not only maintain consistency in your
    interpretation of waveform and meta data, but also be more "accepting"
    in what your clients are able to process. In particular this would
    enable your clients to parse strictly SEED compliant empty location
    codes, which currently is not possible.
    
    You appear fixated on updating the clients,
    
    Yes, absolutely! Because that's where the current problem can be fixed
    most easily.
    
    but as I have said many times, that by itself will not solve the actual problem;
    
    That depends on what you consider as "the actual problem". If it is
    empty location codes, I would not view that as a problem at all.
    Cosmetics at worst, but it is as it is and we can live with it. Can't you?
    
    the metadata remains inconsistent and at any rate we do not control many of the most popular parsers of this information, such as user-created programs.
    
    Once the issue is clarified, the clients will naturally be adapted to
    the specification. The inconsistency is currently still at a
    low/manageable level. In particular, there is absolutely nowhere an
    inconsistency with (Mini)SEED headers, it's *currently* *only* a
    relatively minor inconsistency at XML level that is not too big to be
    handled. Besides the standard conformance this is IMHO the main
    advantage of "" compared to "--".
    
    Why do you think the existing metadata and decades of SEED would need to be changed? Please explain.
    
    Because otherwise a mapping would be required "forever". Until at least very recently you were strongly against any mapping, even calling the idea "rubbish" at one point
    
    Here is what you said about mapping:
    
    On Jul 28, 2014, at 4:51 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
    
    In general mappings are not the problem and are widely used anyway.
    
    Yes, of course mappings are not a problem, especially not from a
    technical point of view. But it also depends on the kind of mapping.
    
    BTW, I stated the above in the context of the mapping "" <-> " ", which
    is very easy using trim() et al. and in particular does not require any
    change to the *current* channel naming conventions. And which is also
    why I wrote "widely used". Technically a mapping to/from "--" would be
    quite different, because the range of values that need to be tested
    against e.g. in a simple comparison makes this more complicated. In
    practice, of course, one can implement this once as a library function
    or by creating a location code class and overloading the == operator.
    This is still considerably more work than just calling trim().
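
    To make the comparison above concrete, here is a minimal Python sketch of the "location code class" approach. The class name LocationCode and its EMPTY_SYNONYMS set are hypothetical and not taken from any library mentioned in this thread; the sketch only assumes that "", "  " and "--" should all compare equal.

```python
class LocationCode:
    """Hypothetical value type: "", "  " and "--" all denote the empty location ID."""

    # Assumption for this sketch: "--" is treated as a synonym for empty.
    EMPTY_SYNONYMS = {"", "--"}

    def __init__(self, raw):
        trimmed = raw.strip()  # handles the "" <-> "  " mapping
        self.value = "" if trimmed in self.EMPTY_SYNONYMS else trimmed

    def __eq__(self, other):
        if not isinstance(other, LocationCode):
            return NotImplemented
        return self.value == other.value

    def __hash__(self):
        # Equal codes must hash equally so they can serve as dict keys.
        return hash(self.value)

    def __repr__(self):
        return f"LocationCode({self.value!r})"


print(LocationCode("  ") == LocationCode(""))    # True
print(LocationCode("--") == LocationCode("  "))  # True
print(LocationCode("00") == LocationCode("--"))  # False
```

    With only "" and "  " in play, the whole class collapses to a single strip() call, which is the difference in effort described above.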
    
    With a mapping to/from "--" we also have the "forever" issue. With ""
    vs. "  " this is not an issue at all, especially in view of the existing
    SEED headers. That's a big difference.
    
    So what is the problem with mapping?
    
    Because as already said, the mapping would be required "forever" due to
    persistence of the data. In particular, you cannot declare existing
    metadata invalid. Hence you would have to keep supporting "" and "  ",
    too, to maintain backward compatibility.
    
    I was certainly not against mapping to/from "--"; after all, it was my proposal! You have taken my words out of context. Please stick to the technical issues and leave your personal indignation off of this mailing list.
    
    This is already a very technical discussion and where you detect
    "personal indignation" is left as an exercise to the reader.
    
    Here is the context: "StationXML is the new dataless SEED, as such it
    should be compatible between data centers for at least the core
    parameters. Currently StationXML produced by SeisComP3 and other data
    centers for the exact same channel can be documents that are
    semantically different channels (NSLC do not match). We would not do
    this with dataless SEED, right? Any notion that a reader of the XML
    must apply rules to the core name values is rubbish, no transformation
    should be needed at that point. These are documents that are being
    stored as files, loaded into databases, and otherwise saved."
    
    In my interpretation this is a statement against any mapping.
    
    But I recognize we are all in a learning curve and opinions evolve and
    sometimes change. Plus we are having two discussions on the list and
    off-list, each with several sub-threads. This probably creates
    additional confusion and doesn't quite help to focus on what the real
    issues are *currently*. Channel naming can be discussed and should,
    actually has been many times before, but can we not focus on what needs
    to be solved in very short time without introducing additional
    incompatibilities?
    
    All frustration about ugly empty location codes aside, I maintain that
    there are technically hardly any issues with them. Nothing that cannot be
    solved quickly with rather few modifications plus a clarification in the
    FDSN StationXML specification. In fact I already proposed a clear
    timeline you might want to comment on. What follows is a quote from my
    email of July-24, 18:43 UTC to this list.
    
    ----------------------------------------------------------------------
    
    Actually we are currently seeking to solve a particular incompatibility
    between FDSN StationXML produced by different services, but technically
    that is much, *much* easier to achieve than the introduction of a new
    and incompatible channel naming. I would welcome an intensified
    discussion on the latter, but not in the context of the current FDSN
    StationXML or web services.
    
    It's actually quite strange that already now, early after the
    introduction of FDSN StationXML, we are not only choking over minor
    incompatibilities, but are discussing "solutions" to problems that
    apparently no one had noticed existed before StationXML... Looks
    like shooting at sparrows with cannons, IMO.
    
    There used to be an IASPEI working group on station codes that even came
    up with a new channel naming "standard"[*], which, however, doesn't seem
    to have gained much acceptance so far. Nevertheless this is the level at
    which changes to channel naming need to be discussed, even though the
    process may be frustratingly slow. But the impact of such a change is
    just too big to be decided ad hoc.
    
    To summarize:
    
    We will not find a future-proof channel naming convention quickly.
    Partial changes, especially if incompatible, should be absolutely avoided.
    
    The particular problem we attempted (and still need) to solve in the
    first place is a location code incompatibility due to differently strict
    adherence to the SEED specification. Not surprisingly I prefer the
    empty-string representation for the empty location code. To be
    pragmatic, I propose the following time line:
    
    * Accept that, at least for a transitional period, we have to live with
    the existence of space-space and empty location codes.
    
    * During a transitional period, don't change the servers that now
    produce space-space location codes, as that would break compatibility
    with some clients. We want to keep compatibility rather than introducing
    new incompatibility.
    
    * Instead update the clients to accept both space-space and empty
    location codes by trimming trailing spaces if present. This is a
    relatively minor change and IIRC this is on IRIS's agenda already, which
    is highly appreciated.
    
    At this point in time, interoperability is restored, even without
    server-side changes. This is important as it may take quite some time
    for the users to actually upgrade their clients; but it doesn't hurt anyone.
    
    * Finally the server upgrades where needed. The decision as to when to
    upgrade the server side can be made once it is considered appropriate;
    there is absolutely no hurry from the client side.
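
    The client-side step in the timeline above could look roughly like the following Python sketch. The two XML strings are simplified stand-ins for the two StationXML flavors (namespaces and most elements omitted), and channel_ids is a hypothetical helper name. Treating a missing locationCode attribute as empty is an assumption of this sketch, not something the specification states.

```python
import xml.etree.ElementTree as ET

# Simplified stand-ins for the two flavors (real StationXML has a namespace).
TWO_SPACES = (
    '<Network code="GE"><Station code="UGM">'
    '<Channel locationCode="  " code="BHZ"/></Station></Network>'
)
EMPTY = (
    '<Network code="GE"><Station code="UGM">'
    '<Channel locationCode="" code="BHZ"/></Station></Network>'
)


def channel_ids(xml_text):
    """Return (net, sta, loc, cha) tuples with trailing spaces trimmed from loc."""
    network = ET.fromstring(xml_text)
    ids = []
    for station in network.iter("Station"):
        for channel in station.iter("Channel"):
            # Assumption: a missing attribute is treated like an empty one.
            loc = channel.get("locationCode", "").rstrip(" ")
            ids.append(
                (network.get("code"), station.get("code"), loc, channel.get("code"))
            )
    return ids


print(channel_ids(TWO_SPACES) == channel_ids(EMPTY))  # True
```

    A client that normalizes on read in this way works against either kind of server, which is why the server-side change can wait.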
    
    The needed changes for the above proposal are very small compared to the
    huge changes that would be required at every level to implement a new
    channel naming convention. This may (and hopefully will) take place some
    time in the future, but it requires a lot of preparation and
    coordination. I am pretty sure that we will have a considerable number
    of beers in the meantime.
    
    Besides the beers, we should focus on finalizing the specification of
    FDSN StationXML. There are too many under-defined elements even in the
    xsd and the risk of serious incompatibilities is very high.
    
    Cheers
    Joachim
    
    [*] http://www.isc.ac.uk/registries/download/IR_implementation.pdf
    
    Joachim Saul
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-29 01:24:36
    
    Hi Philip
    
    Philip Crotwell [07/28/2014 03:37 PM]:
    
    Being on the cheap side of the Atlantic, I'll save us $0.00068 and
    make a stab at the underlying issue. :)
    
    Here, with lots of stuff cut out, is how a channel is "identified" in
    stationXML via the fdsn station web service at the IRIS DMC,
    http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
    
    <Network code="GE" >
    <Station code="UGM">
    <Channel locationCode="  " code="BHZ">
    
    Another implementation of the same web service (not sure of url) gives
    back this:
    
    <Network code="GE" >
    <Station code="UGM">
    <Channel locationCode="" code="BHZ">
    
    with locationCode="" vs ="  " being the difference under consideration.
    
    Exactly. Good that you provided this as example because we were already
    getting lost so deep within the details that we may have forgotten that
    this thread has just moved to this list and that it might not have been
    clear to everybody what the issue actually is...
    
    Even few lines of XML can (sometimes) help make things clearer. ;)
    
    There are two basic issues being discussed (and yes, more beer would help! :)
    
    1) Should all valid stationXML documents be required to use the exact
    same string of characters to represent the location id for this
    channel? This would allow a comparison operation to be "simple" in
    that it can compare the attribute values without additional
    processing.
    
    This would be ideal, but I think it is not realistic:
    
    If "--" were introduced, it would be impossible not to keep supporting
    "  " and "" practically forever in order to maintain backward compatibility.
    
    If "" were to become the preferred empty location code, we still have
    probably billions of instances of "  " out in the wild that should not
    be declared invalid.
    
    The same is true the other way round, with "  " preferred over "".
    
    In short some mapping is required anyway. Fortunately the mapping
    between "" and "  " is trivial using methods such as trim() or strip()
    (depending on the language). Most seismic data handling software
    already does it anyway because it's so obvious. For XML it's at least
    ObsPy and SeisComP. SEED readers that trim the location code include
    rdseed, libmseed and qlib2. All database engines provide a trim()
    method, so database queries are not a problem either.
    
    if trim(loc1) == trim(loc2) ...
    
    may be slightly more expensive in terms of CPU cycles than
    
    if loc1 == loc2 ...
    
    but I presume that this is nowhere a real issue. An added benefit is
    that the "  " location code, currently not strictly SEED compliant,
    then falls within the valid range more or less automatically.
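
    For the database case mentioned above, a small sketch using Python's built-in sqlite3 module shows the same idea with SQL TRIM(). The table and column names are hypothetical and only serve the illustration; other database engines offer an equivalent function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layout for this sketch.
conn.execute("CREATE TABLE channel (net TEXT, sta TEXT, loc TEXT, cha TEXT)")
conn.executemany(
    "INSERT INTO channel VALUES (?, ?, ?, ?)",
    [
        ("GE", "UGM", "  ", "BHZ"),  # padded location code
        ("GE", "UGM", "", "BHN"),    # trimmed location code
        ("GE", "UGM", "00", "BHE"),  # non-empty location code
    ],
)

# Trim both sides of the comparison so "" and "  " select the same rows.
rows = conn.execute(
    "SELECT cha FROM channel WHERE TRIM(loc) = TRIM(?) ORDER BY cha",
    ("  ",),
).fetchall()
print(rows)  # [('BHN',), ('BHZ',)]
```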
    
    2) If we agree to 1), then what should those exact characters be? The
    current top choices are
    a) empty=""
    b) two spaces="  "
    c) two dashes="--".
    
    1) seems less controversial than 2) in that greater compatibility is
    generally seen as positive.
    
    Compatibility is absolutely essential. This is probably the main reason
    why even after more than 10 years of discussion about new channel
    naming, there hasn't been any real progress AFAICS. And despite all
    shortcomings the current NSLC is really remarkable as it is accepted and
    used nearly everywhere. Don't put that at stake.
    
    Thanks btw for your other comments about potential issues related to
    white space.
    
    Cheers
    Joachim
    
    Chad Trabant
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-31 10:18:47
    
    Thanks Philip, I think you have outlined the issues well.
    
    Regarding issue #1, I strongly feel that we need to choose one representation; the sooner we stop creating incompatible metadata the better.
    
    Regarding issue #2:
    
    b) two spaces="  "
    
    This is what IRIS currently does; it is not strictly SEED but avoids empty identifiers.
    
    c) two dashes="--".
    
    This would require work and continued mapping, although the mapping between SEED-based holdings and StationXML is clear. SEED headers and data records could also be considered, but that is a bigger can of worms.
    
    a) empty=""
    
    This is possibly the most straightforward mapping of SEED information, but leaves us with an empty string identifier.
    
    Below are a few of the issues we note regarding empty identifiers
    
    1) They are too similar to "unknown" (which results in potential ambiguity where channels are only differentiated by location ID):
    
    a) In many languages an empty string evaluates to false. If, for example, a program tests for a value and then extracts it from an XML document parsed into a structure/object, it could appear as if the value was not present. Of course the coding in probably every language can be done to avoid such a false negative, but it is a pitfall that we would be asking all future users and coders to know about.
    
    b) In XPath (the query language for XSLT), which is used to search or translate XML, the matching of a string attribute usually uses the string() function. Specifying the string attribute to match when the attribute has a value is straightforward; when trying to match the empty string, the query is for NOT string. In the boolean functions of XPath "a string is true if and only if its length is non-zero" (http://www.w3.org/TR/xpath/#function-boolean). So in XPath, hardly a fringe technology, an empty string is not just another kind of string but an anomaly.
    
    c) In JavaScript the getAttribute() method returns the same value whether the attribute was an empty string or unspecified. The method is no longer recommended but illustrates that such thinking is not limited to niche projects.
    
    2) Organizing data in structures such as a nested hash is pretty common: %{net}{sta}{loc}{chan} = "some lvalue" (sorry for the Perl). The empty identifier as a key works in some languages but it is obtuse and unclear. I'm sure there are many other data structures that would use location by itself as a key.
    
    3) Empty identifiers are difficult to specify on the command line, in URLs, etc., and are non-obvious in many other places such as GUI fields. We have largely addressed this issue for FDSN web services (at the DMC for other mechanisms as well) by making "--" a synonym for the empty location ID. In other words we are already mapping "--" into the empty location ID for requests and users are learning this association. A further adoption of the synonym into the metadata would solve many of these problems.
    
    4) While it is certainly not the FDSN's task to define data formats outside of its purview, the adoption or matching of the core channel naming fields in other formats is certainly in the FDSN's best interest. This has been happening for a long time already (ISF/IASPEI, GSE, etc.). The potentially empty (optional?) location ID could make such adoption harder as it is a wrinkle, especially for space-delimited formats. I believe these broader implications deserve some consideration.
    
    I'm sure most developers could come up with solutions to the technical problems, but an empty identifier leaves the unfortunate wrinkles for all future users and coders.
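
    Points 1a and 2 above can be illustrated with a few lines of Python. The dictionaries below are hypothetical stand-ins for parsed channel attributes and for a nested net/sta/loc/chan structure; the point is only to show where a truthiness test goes wrong and that an empty key, while legal, is easy to overlook.

```python
# Hypothetical attributes of one parsed Channel element.
attrs = {"code": "BHZ", "locationCode": ""}

# Pitfall: a truthiness test cannot tell "present but empty" from "absent".
looks_missing = not attrs.get("locationCode")  # True, although the key exists

# Explicit test: compare against None instead of relying on truthiness.
is_present = attrs.get("locationCode") is not None  # True

# Nested mapping keyed by net, sta, loc, chan: the empty key works in Python,
# but the lookup is easy to misread.
inventory = {"GE": {"UGM": {"": {"BHZ": "some value"}}}}
value = inventory["GE"]["UGM"][""]["BHZ"]

print(looks_missing, is_present, value)  # True True some value
```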
    
    Here is an example of someone who was confused by current metadata; I'll bet that if there were a value in the locationCode it would have been easier:
    https://stackoverflow.com/questions/19348855/checking-for-empty-attributes-while-parsing-an-xml-file
    
    There is a chance we will end up with the empty location identifier, but the considerations should go beyond an assumption that an empty string is the only choice.
    
    Since an empty location field in SEED essentially means unset, perhaps we should consider making the locationCode attribute optional and leaving it out of the XML when it is empty in SEED. In this line of thinking, the empty string is just a hack to include a required attribute when in fact there is nothing to include. For me the "unset" aspect is unsettlingly similar to "unknown", but it's an idea preferred by at least one engineer at the DMC.
    
    Chad
    
    On Jul 28, 2014, at 6:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
    
    Hi
    
    Being on the cheap side of the Atlantic, I'll save us $0.00068 and
    make a stab at the underlying issue. :)
    
    Here, with lots of stuff cut out, is how a channel is "identified" in
    stationXML via the fdsn station web service at the IRIS DMC,
    http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
    
    <Network code="GE" >
    <Station code="UGM">
    <Channel locationCode="  " code="BHZ">
    
    Another implementation of the same web service (not sure of url) gives
    back this:
    
    <Network code="GE" >
    <Station code="UGM">
    <Channel locationCode="" code="BHZ">
    
    with locationCode="" vs ="  " being the difference under consideration.
    
    There are two basic issues being discussed (and yes, more beer would help! :)
    
    1) Should all valid stationXML documents be required to use the exact
    same string of characters to represent the location id for this
    channel? This would allow a comparison operation to be "simple" in
    that it can compare the attribute values without additional
    processing.
    
    2) If we agree to 1), then what should those exact characters be? The
    current top choices are
    a) empty=""
    b) two spaces="  "
    c) two dashes="--".
    
    1) seems less controversial than 2) in that greater compatibility is
    generally seen as positive.
    
    This is primarily a question about the form of the stationXML
    documents, but obviously there are connections to the way requests are
    formed, the relationship to miniseed/seed, the way things are coded in
    software and how much detailed understanding we expect of end users.
    
    Philip
    
    On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:
    
    Hello all,
    
    Can someone give a concise statement of the original problem being
    discussed? Is it only or primarily a concern about XML?
    
    It seems to me that with modern languages a string that is empty or has 1-N
    spaces is the same thing - there is often an implicit or explicit trim()
    function hiding in a processing pipeline. A null string is not the same.
    So an empty or blank string is the same, valid location code, and null is
    an undefined or uninitialized location code.
    
    With regard to the "--" pseudo-value for the location code, is this not needed
    because it is sometimes impossible or difficult to represent an empty
    string or even a string? For example on the command line or in a RESTful WS
    URI? (Or a URI on the command line!) So it may be that the use of "--" for
    intermediate processing and requests could be tolerated and somehow made
    official, while empty or blank-only strings remain official for persistent
    data.
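
    The split suggested above, "--" in requests and empty or blank strings in stored data, might be sketched in Python as follows. The helper names are hypothetical. The base URL and the net, sta, cha and level parameters are taken from the query quoted in this thread, while the loc parameter name is an assumption of this sketch.

```python
from urllib.parse import urlencode

# Base URL as quoted in this thread.
BASE_URL = "http://service.iris.edu/fdsnws/station/1/query"


def request_location(stored_loc):
    """Map a stored location code ("" or space-padded) to its request form."""
    trimmed = stored_loc.strip()
    return trimmed if trimmed else "--"


def stored_location(request_loc):
    """Map a request value back to the stored form; "--" means empty."""
    return "" if request_loc == "--" else request_loc.strip()


def station_query(net, sta, loc, cha):
    """Build (but do not send) a station query URL."""
    params = {
        "net": net,
        "sta": sta,
        "loc": request_location(loc),  # assumed parameter name
        "cha": cha,
        "level": "channel",
    }
    return BASE_URL + "?" + urlencode(params)


print(station_query("GE", "UGM", "", "BHZ"))
```

    Keeping the translation in two small functions at the request boundary means "--" never needs to reach persistent metadata.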
    
    Just my 0.02€ = $0.0268
    
    Best regards to all,
    
    Anthony
    
    On 27/07/2014 04:52, Chad Trabant wrote:
    
    Hi Marcelo,
    
    Thanks for your thoughts as well. What you and Joachim are not
    addressing are the concerns about an empty ID that have been brought up by
    more than one person. The answer that empty strings are technically
    possible and it all works in Python/SeisComP is less than satisfying. The
    observations from Python, ObsPy and SeisComP are a few of many that need to
    be taken into account.
    
    I agree that there is a long tail consideration for the "--" location ID
    solution. If you understand that some folks find an empty ID to be problematic
    regardless of whether it is XML, SEED, text, or whatever, then you might see
    where this proposal comes from. Yes, we would need to treat empty location
    IDs and "--" as synonyms for a very long time. Empty strings in XML mean
    you will need to map empty IDs to empty strings, NULL and whatever an XML
    parser might or might not produce for a long time as well (think beyond
    Python and SeisComP). Either is possible, only one of them is a unique
    mapping.
    
    If the main consideration is the least amount of disruption, then the
    answer is obvious to me: the FDSN can sanction that the two-space string is
    the XML synonym for the empty SEED location ID and we adjust the schema to
    make sure a string of whitespaces is preserved. Then SeisComP can change
    its relatively new StationXML implementation and ALL existing clients will
    be compatible with all metadata and, most importantly, we would have
    consistent metadata.
    
    If the empty string ID representation is adopted it would, in effect,
    mean that the DMC would need to change its metadata service and (more
    importantly) all users of the DMC's metadata service would need to
    transition to a new metadata channel naming scheme. This is certainly not
    out of the question, but it is not something we would do without careful
    consideration. I do not find the two-space strings all that great, but they
    are here and something the DMC and users of the DMC have dealt with. Issues
    have been identified with empty location IDs by us and our users. If the DMC is
    going to change, and push the change on all users of the DMC's StationXML,
    it would be much more compelling to have a solution that addresses the low
    level issues.
    
    regards,
    Chad
    
    ----- Original Message -----
    From: "Marcelo Bianchi" <m.tchelo<at>gmail.com>
    To: "IRIS Web Services List" <webservices<at>iris.washington.edu>
    Sent: Friday, July 25, 2014 7:38:17 PM
    Subject: Re: [webservices] A question of location ID, how to represent empty
    IDs in XML?
    
    Hi Philip and All,
    
    I totally agree with Joachim; I was planning to answer but he was much
    faster. What you guys are proposing is not a solution. StationXML
    supports the empty string nicely, and it is not null. There is a type
    difference here in Python and in any other language, and it can be nicely
    handled internally.
    
    Also, the location id is not just a string; it is a key entry to link
    miniseed to metadata, and making an exception at this level just
    because a user interface cannot properly render it without ambiguity
    does not sound like a proper proposal. I am not in favor of
    creating an exception that will have to be carried along for the
    decades to come. Alternative solutions for this issue should be
    sought in the end user interface.
    
    with my best regards,
    
    Marcelo Bianchi
    --
    
    2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell<at>seis.sc.edu>:
    
    It sounds like you are saying "change is hard, so we shouldn't do it".
    I would argue that change is hard and so if we don't do it now it will
    never happen. StationXML is new enough that there is already a
    disruption, we should seize the chance. If we do not do something now
    about null loc ids, it will be a decade or two before we get another
    chance.
    
    It is time to drive the stake through the heart of null location ids.
    Kill the evil while we have a chance.
    
    Philip
    
    On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
    
    Hello Rob,
    
    Rob Newman wrote on 24.07.2014 18:51:
    
    For what it's worth, I would also vote for the "--" standard. To quote
    from the Zen of Python
    http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html
    (my language of choice):
    
    "Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced."
    
    I'd add "Compatible is better than incompatible." :)
    
    Number 2 is especially relevant here:
    "Explicit is better than implicit."
    
    My favorite would be:
    
    "Special cases aren't special enough to break the rules."
    
    Quoted whitespace and nulls are painful. Code what you mean, and mean what
    you code. It's easier for everyone.
    
    But what if we simply *mean* "empty string"?
    
    The issue is not about beauty, pain or ease. It's about standard
    conformance. We already have a channel naming standard. If a new data format
    cannot accommodate existing channel naming, then the new format is flawed.
    But that's not even the case here...
    
    An XML document that contains
    
    <Channel locationCode="" ...
    
    is not malformed. There's an attribute that *explicitly* contains an empty
    string and a parser has to produce it as such. Not as null, nil or none, but
    as an empty string. Otherwise the parser is broken and needs to be fixed,
    not the data!
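
    As a quick check of the behavior described above, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The two one-element documents are simplified stand-ins for StationXML. With this parser an explicitly empty attribute comes back as an empty string and only a missing attribute comes back as None.

```python
import xml.etree.ElementTree as ET

with_empty = ET.fromstring('<Channel locationCode="" code="BHZ"/>')
without = ET.fromstring('<Channel code="BHZ"/>')

print(repr(with_empty.get("locationCode")))  # ''
print(repr(without.get("locationCode")))     # None

# Membership in .attrib distinguishes the two cases without truthiness tests.
print("locationCode" in with_empty.attrib)   # True
print("locationCode" in without.attrib)      # False
```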
    
    Again: It's not about beauty. We all agree that current channel naming is
    not particularly beautiful and has limitations. But our business is not to
    try to solve that issue now and here.
    
    Cheers
    Joachim
    
    _______________________________________________
    webservices mailing list
    webservices<at>iris.washington.edu
    http://www.iris.washington.edu/mailman/listinfo/webservices
    
    --
    Sent from my iClayTablet
    
    ________________________________
    
    Anthony Lomax
    161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
    tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net web:
    http://www.alomax.net
    
    Twitter: @ALomaxNet
    Science & Special Topics: http://www.alomax.net/science
    Software: http://www.alomax.net/software - updates:
    https://twitter.com/ALomaxNet
    ________________________________
    
    Philip Crotwell
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-31 14:18:39
    
    Hi
    
    Just another data point, Earthworm, which is widely used by regional
    networks globally, has long had the "dash dash is the same as space
    space" convention. So dash dash is not something pulled out of thin
    air, it is how at least I do things already.
    
    And this shows that it is fairly common (if not technically correct)
    for users to regard space-space as the location id instead of
    regarding it as null with 2 spaces for padding. My guess is that very
    few users are aware of this, and even as someone who has been writing
    seismic software for a couple of decades I still think of the location
    id as space-space, not null.
    
    http://www.isti2.com/ew/PROGRAMMER/location_codes/EW_Loc_policy.txt
    
    Philip
    
    On Thu, Jul 31, 2014 at 6:18 AM, Chad Trabant <chad<at>iris.washington.edu> wrote:
    
    Thanks Philip, I think you have outlined the issues well.
    
    Regarding issue #1, I strongly feel that we need to choose one
    representation; the sooner we stop creating incompatible metadata the
    better.
    
    Regarding issue #2:
    
    b) two spaces="  "
    
    This is what IRIS currently does; it is not strictly SEED but avoids empty
    identifiers.
    
    c) two dashes="--".
    
    This would require work and continued mapping, although the mapping between
    SEED-based holdings and StationXML is clear. SEED headers and data records
    could also be considered, but that is a bigger can of worms.
    
    a) empty=""
    
    This is possibly the most straightforward mapping of SEED information, but
    leaves us with an empty string identifier.
    
    Below are a few of the issues we note regarding empty identifiers
    
    1) They are too similar to "unknown" (which results in potential ambiguity
    where channels are only differentiated by location ID):
    
    a) In many languages an empty string evaluates to false. If, for example,
    a program tests for a value and then extracts it from an XML document
    parsed into a structure/object, it could appear as if the value was not
    present. Of course the coding in probably every language can be done to
    avoid such a false negative, but it is a pitfall that we would be asking all
    future users and coders to know about.
    
    b) In XPath (the query language for XSLT), which is used to search or
    translate XML, the matching of a string attribute usually uses the string()
    function. Specifying the string attribute to match when the attribute has a
    value is straightforward; when trying to match the empty string, the query is
    for NOT string. In the boolean functions of XPath "a string is true if and
    only if its length is non-zero"
    (http://www.w3.org/TR/xpath/#function-boolean). So in XPath, hardly a
    fringe technology, an empty string is not just another kind of string but an
    anomaly.
    
    c) In JavaScript the getAttribute() method returns the same value whether
    the attribute was an empty string or unspecified. The method is no longer
    recommended but illustrates that such thinking is not limited to niche
    projects.
    
    2) Organizing data in structures such as a nested hash is pretty common:
    %{net}{sta}{loc}{chan} = "some lvalue" (sorry for the Perl). The empty
    identifier as a key works in some languages but it is obtuse and unclear.
    I'm sure there are many other data structures that would use location by
    itself as a key.
    
    3) Empty identifiers are difficult to specify on the command line, in URLs,
    etc., and are non-obvious in many other places such as GUI fields. We have largely
    addressed this issue for FDSN web services (at the DMC for other mechanisms
    as well) by making "--" a synonym for the empty location ID. In other words
    we are already mapping "--" into the empty location ID for requests and
    users are learning this association. A further adoption of the synonym into
    the metadata would solve many of these problems.
    
    4) While it is certainly not the FDSN's task to define data formats outside
    of its purview, the adoption or matching of the core channel naming fields
    in other formats is certainly in the FDSN's best interest. This has been
    happening for a long time already (ISF/IASPEI, GSE, etc.). The potentially
    empty (optional?) location ID could make such adoption harder as it is a
    wrinkle, especially for space-delimited formats. I believe these broader
    implications deserve some consideration.
    
    I'm sure most developers could come up with solutions to the technical
    problems, but an empty identifier leaves the unfortunate wrinkles for all
    future users and coders.
    
    Here is an example of someone who was confused by current metadata; I'll
    bet that if there were a value in the locationCode it would have been easier:
    https://stackoverflow.com/questions/19348855/checking-for-empty-attributes-while-parsing-an-xml-file
    
    There is a chance we will end up with the empty location identifier, but the
    considerations should go beyond an assumption that an empty string is the
    only choice.
    
    Since an empty location field in SEED essentially means unset, perhaps we
    should consider making the locationCode attribute optional and leaving it
    out of the XML when it is empty in SEED. In this line of thinking, the
    empty string is just a hack to include a required attribute when in fact
    there is nothing to include. For me the "unset" aspect is unsettlingly
    similar to "unknown", but it's an idea preferred by at least one engineer at
    the DMC.
    
    Chad
    
    On Jul 28, 2014, at 6:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
    
    Hi
    
    Being on the cheap side of the Atlantic, I'll save us $0.00068 and
    make a stab at the underlying issue. :)
    
    Here, with lots of stuff cut out, is how a channel is "identified" in
    stationXML via the fdsn station web service at the IRIS DMC,
    http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
    
    <Network code="GE" >
    <Station code="UGM">
    <Channel locationCode="  " code="BHZ">
    
    Another implementation of the same web service (not sure of url) gives
    back this:
    
    <Network code="GE" >
    <Station code="UGM">
    <Channel locationCode="" code="BHZ">
    
    with locationCode="" vs ="  " being the difference under consideration.
    
    There are two basic issues being discussed (and yes, more beer would help!
    :)
    
    1) Should all valid stationXML documents be required to use the exact
    same string of characters to represent the location id for this
    channel? This would allow a comparison operation to be "simple" in
    that it can compare the attribute values without additional
    processing.
    
    2) If we agree to 1), then what should those exact characters be? The
    current top choices are
    a) empty=""
    b) two spaces="  "
    c) two dashes="--".
    
    1) seems less controversial than 2) in that greater compatibility is
    generally seen as positive.
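    As a rough illustration of issue 1), here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The two fragments are hypothetical, heavily trimmed stand-ins for the two responses quoted in this message (real StationXML uses namespaces and many more elements), and channel_key() is a helper invented for this example. It suggests that the two documents only compare equal after an extra normalization step.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed fragments: same channel, two location ID conventions.
DOC_TWO_SPACES = (
    '<Network code="GE"><Station code="UGM">'
    '<Channel locationCode="  " code="BHZ"/></Station></Network>'
)
DOC_EMPTY = (
    '<Network code="GE"><Station code="UGM">'
    '<Channel locationCode="" code="BHZ"/></Station></Network>'
)


def channel_key(xml_text, normalize=False):
    """Return (network, station, location, channel) for the first Channel.

    Illustrative helper, not part of any StationXML library.
    """
    net = ET.fromstring(xml_text)
    sta = net.find("Station")
    cha = sta.find("Channel")
    loc = cha.get("locationCode")
    if normalize:
        loc = loc.strip()  # drop padding spaces before comparing
    return (net.get("code"), sta.get("code"), loc, cha.get("code"))


# A plain comparison sees two different channels.
print(channel_key(DOC_TWO_SPACES) == channel_key(DOC_EMPTY))
# With the extra trimming step they match.
print(channel_key(DOC_TWO_SPACES, normalize=True)
      == channel_key(DOC_EMPTY, normalize=True))
```

    The point of agreeing on a single representation is that the normalize step would no longer be needed by every client.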
    
    This is primarily a question about the form of the stationXML
    documents, but obviously there are connections to the way requests are
    formed, the relationship to miniseed/seed, the way things are coded in
    software and how much detailed understanding we expect of end users.
    
    Philip
    
    On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:
    
    Hello all,
    
    Can someone give a concise statement of the original problem being
    discussed? Is it only or primarily a concern about XML?
    
    It seems to me that with modern languages a string that is empty or has 1-N
    spaces is the same thing - there is often an implicit or explicit trim()
    function hiding in a processing pipeline. A null string is not the same.
    So an empty or blank string is the same, valid location code, and null is
    an undefined or uninitialized location code.
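    A minimal Python sketch of this view, where None stands in for "null" and same_location() is a hypothetical helper written only for illustration: empty and blank strings are treated as one valid code, while null is kept distinct.

```python
def same_location(a, b):
    """Compare two location codes, ignoring space padding.

    None (undefined/uninitialized) only ever equals None.
    """
    if a is None or b is None:
        return a is b
    return a.strip() == b.strip()


print(same_location("", "  "))    # empty and blank are the same code
print(same_location("00", "00 ")) # padding is ignored
print(same_location(None, ""))    # null is not the same as empty
```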
    
    With regard to the "--" pseudo-value for the location code, is this not needed
    because sometimes it is difficult or impossible to represent an empty
    string or even a string? For example on the command line or in a RESTful WS
    URI? (Or a URI on the command line!) So it may be that the use of "--" for
    intermediate processing and requests could be tolerated and made somehow
    official, while empty or only-blanks strings remain official for persistent
    data.
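    To make the request side concrete, here is a small Python sketch that builds a query URL and substitutes "--" when the location code is empty or blank. The base URL is the one quoted elsewhere in this thread; the "loc" parameter name and the station_query_url() helper are assumptions made for this example, so check the service documentation before relying on them.

```python
from urllib.parse import urlencode

BASE_URL = "http://service.iris.edu/fdsnws/station/1/query"


def station_query_url(net, sta, loc, cha):
    """Build a channel-level query URL (illustrative helper).

    An empty or all-blank location code is sent as "--", since an
    empty value is awkward to express in a URL or on a command line.
    """
    loc_param = loc.strip() or "--"
    params = {"net": net, "sta": sta, "loc": loc_param,
              "cha": cha, "level": "channel"}
    return BASE_URL + "?" + urlencode(params)


print(station_query_url("GE", "UGM", "  ", "BHZ"))
```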
    
    Just my 0.02€ = $0.0268
    
    Best regards to all,
    
    Anthony
    
    On 27/07/2014 04:52, Chad Trabant wrote:
    
    Hi Marcelo,
    
    Thanks for your thoughts as well. Something that you and Joachim are not
    addressing is the set of concerns about an empty ID that have been brought up by
    more than one person. The answer that empty strings are technically
    possible and it all works in Python/SeisComP is less than satisfying. The
    observations from Python, ObsPy and SeisComP are a few of many that need to
    be taken into account.
    
    I agree that there is a long tail consideration for the "--" location ID
    solution. If you understand that some folks find an empty ID to be problematic
    regardless of whether it is XML, SEED, text, whatever, then you might see
    where this proposal comes from. Yes, we would need to treat empty location
    IDs and "--" as synonyms for a very long time. Empty strings in XML mean
    you will need to map empty IDs to empty strings, NULL and whatever an XML
    parser might or might not produce for a long time as well (think beyond
    Python and SeisComP). Either is possible, only one of them is a unique
    mapping.
    
    If the main consideration is the least amount of disruption, then the
    answer is obvious to me: the FDSN can sanction that the two-space string is
    the XML synonym for the empty SEED location ID and we adjust the schema to
    make sure a string of whitespaces is preserved. Then SeisComP can change
    its relatively new StationXML implementation and ALL existing clients will
    be compatible with all metadata and, most importantly, we would have
    consistent metadata.
    
    If the empty string ID representation is adopted it would, in effect,
    mean that the DMC would need to change its metadata service and (more
    importantly) all users of the DMC's metadata service would need to
    transition to a new metadata channel naming scheme. This is certainly not
    out of the question, but it is not something we would do without careful
    consideration. I do not find the two-space strings all that great, but they
    are here and something the DMC and users of the DMC have dealt with. Issues
    have been identified with empty location IDs by us and our users. If DMC is
    going to change, and push the change on all users of the DMC's StationXML,
    it would be much more compelling to have a solution that addresses the low
    level issues.
    
    regards,
    Chad
    
    ----- Original Message -----
    From: "Marcelo Bianchi" <m.tchelo<at>gmail.com>
    To: "IRIS Web Services List" <webservices<at>iris.washington.edu>
    Sent: Friday, July 25, 2014 7:38:17 PM
    Subject: Re: [webservices] A question of location ID, how to represent empty
    IDs in XML?
    
    Hi Philip and All,
    
    I totally agree with Joachim; I was planning to answer but he was much
    faster. What you guys are proposing is not a solution. The station XML
    supports the empty string nicely, and it is not null. There is a type
    difference here in Python and in any other language, and it can be nicely
    handled internally.
    
    Also, the location id is not just a string; it is a key entry to link
    miniseed to metadata, and making an exception at this level just
    because a user interface cannot properly render it without ambiguity
    does not sound like a proper proposal. I am not in favor of
    creating an exception that will have to be carried along for
    decades to come. Alternative solutions for this issue should be
    sought in the end user interface.
    
    with my best regards,
    
    Marcelo Bianchi
    --
    
    2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell<at>seis.sc.edu>:
    
    It sounds like you are saying "change is hard, so we shouldn't do it".
    I would argue that change is hard and so if we don't do it now it will
    never happen. StationXML is new enough that there is already a
    disruption, we should seize the chance. If we do not do something now
    about null loc ids, it will be a decade or two before we get another
    chance.
    
    It is time to drive the stake through the heart of null location ids.
    Kill the evil while we have a chance.
    
    Philip
    
    On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
    
    Hello Rob,
    
    Rob Newman wrote on 24.07.2014 18:51:
    
    For what it's worth, I would also vote for the "--" standard. To quote
    from the Zen of Python
    http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html
    (my language of choice):
    
    "Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced."
    
    I'd add "Compatible is better than incompatible." :)
    
    Number 2 is especially relevant here:
    "Explicit is better than implicit."
    
    My favorite would be:
    
    "Special cases aren't special enough to break the rules."
    
    Quoted whitespace and nulls are painful. Code what you mean, and mean what
    you code. It's easier for everyone.
    
    But what if we simply *mean* "empty string"?
    
    The issue is not about beauty, pain or ease. It's about standard
    conformance. We already have a channel naming standard. If a new data format
    cannot accommodate existing channel naming, then the new format is flawed.
    But that's not even the case here...
    
    An XML document that contains
    
    <Channel locationCode="" ...
    
    is not malformed. There's an attribute that *explicitly* contains an empty
    string and a parser has to produce it as such. Not as null, nil or none, but
    as an empty string. Otherwise the parser is broken and needs to be fixed,
    not the data!
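    A short Python sketch of this distinction, using the standard library's xml.etree.ElementTree as one example parser. The one-element fragments are hypothetical, and other parsers or DOM bindings may not behave the same way, which is part of what is being debated in this thread.

```python
import xml.etree.ElementTree as ET

# Attribute present but empty, versus attribute not present at all.
with_empty = ET.fromstring('<Channel locationCode="" code="BHZ"/>')
without_attr = ET.fromstring('<Channel code="BHZ"/>')

print(repr(with_empty.get("locationCode")))    # ''
print(repr(without_attr.get("locationCode")))  # None

# Membership in the attribute mapping also tells the two cases apart.
print("locationCode" in with_empty.attrib)     # True
print("locationCode" in without_attr.attrib)   # False
```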
    
    Again: It's not about beauty. We all agree that current channel naming is
    not particularly beautiful and has limitations. But our business is not to
    try to solve that issue now and here.
    
    Cheers
    Joachim
    
    _______________________________________________
    webservices mailing list
    webservices<at>iris.washington.edu
    http://www.iris.washington.edu/mailman/listinfo/webservices
    
    
    --
    Sent from my iClayTablet
    
    ________________________________
    
    Anthony Lomax
    161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
    tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net web:
    http://www.alomax.net
    
    Twitter: @ALomaxNet
    Science & Special Topics: http://www.alomax.net/science
    Software: http://www.alomax.net/software - updates:
    https://twitter.com/ALomaxNet
    ________________________________
    
    
    Philip Crotwell
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-31 15:59:04
    
    Yet another data point, going all the way back to vol 1 issue 1 of the
    DMC newsletter introducing location ids:
    
    "The Location Identifier is a two character code that, when used in
    conjunction with the other data specifiers, uniquely identifies a data
    stream."
    and
    "Historically, within a SEED volume, the Location Identifier was left
    “blank” (consisted of two spaces)."
    and
    "GSN Use of Location Identifiers
    Valid characters for location identifiers are [space, 0-9, A-Z][space,
    0-9, A-Z]. (So space-space is a legitimate Location Identifier.) "
    
    http://www.iris.edu/ds/newsletter/vol1/no1/specification-of-seismograms-the-location-identifier/
    
    From this it seems that location id was intended to be exactly 2
    characters, not zero or two. My feeling is that we have a long
    tradition of the location id being "space-space" and not null or
    empty. Personally I really dislike space-space, but the only thing I
    dislike more than space-space is empty.
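    The character rule quoted from the newsletter can be written as a small check. This is only a sketch of that one sentence, with a helper name (is_valid_location_id) invented here; it is not an official validator.

```python
import re

# Exactly two characters, each one a space, a digit, or an uppercase letter.
LOCATION_ID = re.compile(r"[ 0-9A-Z]{2}")


def is_valid_location_id(value):
    """Return True if value matches the quoted two-character rule."""
    return LOCATION_ID.fullmatch(value) is not None


for candidate in ["  ", "00", "A1", "", "0", "--"]:
    print(repr(candidate), is_valid_location_id(candidate))
```

    Under this reading, space-space passes while both the empty string and "--" fail, so adopting either of those in metadata would go beyond the rule as it was originally worded.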
    
    Philip
    
    On Thu, Jul 31, 2014 at 7:18 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
    
    Hi
    
    Just another data point, Earthworm, which is widely used by regional
    networks globally, has long had the "dash dash is the same as space
    space" convention. So dash dash is not something pulled out of thin
    air, it is how at least I do things already.
    
    And this shows that it is fairly common (if not technically correct)
    for users to regard space-space as the location id instead of
    regarding it as null with 2 spaces for padding. My guess is that very
    few users are aware of this, and even as someone who has been writing
    seismic software for a couple of decades I still think of the location
    id as space-space, not null.
    
    http://www.isti2.com/ew/PROGRAMMER/location_codes/EW_Loc_policy.txt
    
    Philip
    
    On Thu, Jul 31, 2014 at 6:18 AM, Chad Trabant <chad<at>iris.washington.edu> wrote:
    
    Thanks Philip, I think you have outlined the issues well.
    
    Regarding issue #1, I strongly feel that we need to choose one
    representation, the sooner we stop creating incompatible metadata the
    better.
    
    Regarding issue #2:
    
    b) two spaces="  "
    
    This is what IRIS currently does, not strictly SEED but avoids empty
    identifiers.
    
    c) two dashes="--".
    
    This would require work and continued mapping, but the mapping is clear between
    SEED-based holdings and StationXML. SEED headers and data records could
    also be considered, but that is a bigger can of worms.
    
    a) empty=""
    
    This is possibly the most straightforward mapping of SEED information, but
    leaves us with an empty string identifier.
    
    Below are a few of the issues we note regarding empty identifiers
    
    1) They are too similar to "unknown" (which results in potential ambiguity
    where channels are only differentiated by location ID):
    
    a) In many languages an empty string evaluates to false; if, for example,
    a program is testing for and then extracting a value from an XML document
    parsed into a structure/object, it could appear as if the value was not
    present. Of course the coding in probably every language can be done to
    avoid such a false negative, but it is a pitfall that we would be asking all
    future users and coders to know about.
    
    b) In XPath (the query language for XSLT), which is used to search or
    translate XML, the matching of a string attribute usually uses the string()
    function. Specifying the string attribute to match when the attribute has a
    value is straightforward; when trying to match the empty string the query is
    for NOT string. In the boolean functions of XPath "a string is true if and
    only if its length is non-zero"
    (http://www.w3.org/TR/xpath/#function-boolean). So in XPath, hardly a
    fringe technology, an empty string is not just another kind of string but an
    anomaly.
    
    c) In JavaScript the getAttribute() method returns the same value whether
    the attribute was an empty string or unspecified. The method is no longer
    recommended but illustrates that such thinking is not limited to niche
    projects.
    
    2) Organizing data in structures such as a nested hash is pretty common:
    %{net}{sta}{loc}{chan} = "some lvalue" (sorry for the Perl). The empty
    identifier as a key works in some languages but it is obtuse and unclear.
    I'm sure there are many other data structures that would use location by
    itself as a key.
    
    3) Empty identifiers are difficult to specify on the command line, URLs,
    etc. and non-obvious in many other places such as GUI fields. We have largely
    addressed this issue for FDSN web services (at the DMC for other mechanisms
    as well) by making "--" a synonym for the empty location ID. In other words
    we are already mapping "--" into the empty location ID for requests and
    users are learning this association. A further adoption of the synonym into
    the metadata would solve many of these problems.
    
    4) While it is certainly not the FDSN's task to define data formats outside
    of its purview, the adoption or matching of the core channel naming fields
    in other formats is certainly in the FDSN's best interest. This has been
    happening for a long time already (ISF/IASPEI, GSE, etc.). The potentially
    empty (optional?) location ID could make such adoption harder as it is a
    wrinkle, especially for space delimited formats. I believe these broader
    implications deserve some consideration.
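    Point 4 can be illustrated with a made-up space-delimited line; the layout below is hypothetical and does not represent ISF, GSE or any other real format. In this Python sketch, whitespace splitting silently drops the empty location field, so the channel code lands in the wrong column.

```python
def parse_line(line):
    """Split a space-delimited record into fields (illustrative only)."""
    return line.split()


with_dashes = parse_line("GE UGM -- BHZ")
with_empty = parse_line("GE UGM  BHZ")  # empty location between delimiters

print(with_dashes)  # four fields: net, sta, loc, cha
print(with_empty)   # three fields: the location column has vanished
```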
    
    I'm sure most developers could come up with solutions to the technical
    problems, but an empty identifier leaves the unfortunate wrinkles for all
    future users and coders.
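    The pitfalls in points 1a and 2 can be sketched in a few lines of Python. The plain dictionaries here are assumptions standing in for whatever structure an XML parser or application would actually produce.

```python
# Hypothetical parsed attributes of a Channel element with an empty location.
channel_attrs = {"locationCode": "", "code": "BHZ"}

loc = channel_attrs.get("locationCode")

# Pitfall: a truthiness test treats the present-but-empty value as missing.
looks_missing = not loc
# The careful test distinguishes "empty" from "not there".
really_missing = loc is None

print(looks_missing, really_missing)

# Nested lookup keyed by net/sta/loc/chan: the empty string is a legal key,
# but it is easy to overlook when reading or writing the code.
inventory = {"GE": {"UGM": {"": {"BHZ": "some value"}}}}
print(inventory["GE"]["UGM"][""]["BHZ"])
```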
    
    Here is an example of someone that was confused by current metadata, I'll
    bet if there was a value in the locationCode it would have been easier:
    https://stackoverflow.com/questions/19348855/checking-for-empty-attributes-while-parsing-an-xml-file
    
    There is a chance we will end up with the empty location identifier, but the
    considerations should go beyond an assumption that an empty string is the
    only choice.
    
    Since an empty location field in SEED essentially means unset, perhaps we
    should consider making the locationCode attribute optional and leaving it
    out of the XML when it is empty in SEED. In this line of thinking, the
    empty string is just a hack to include a required attribute when in fact
    there is nothing to include. For me the "unset" aspect is unsettlingly
    similar to "unknown", but it's an idea preferred by at least one engineer at
    the DMC.
    
    Chad
    
    
    Ellen Yu
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-08-02 05:18:34
    
    Hi,
    
    I realize I am coming pretty late to the party here, but I'll chime in
    anyway. At SCEDC (the archive for the Southern California Seismic
    Network), we represent our empty location codes with two blank spaces as
    well. I suspect the same is done at the Northern California Earthquake
    Data Center.
    
    There are great arguments here for each of the options presented, but I
    think unless we decide to make location id optional we should not use an
    empty string to denote an unset location id in StationXML. I think there
    is enough variety in how an empty string is treated in different
    programming languages and databases to be problematic. From some
    databases' perspectives, you really might as well make it null at that
    point.
    
    So I would say, if location id is required, use a two character
    substitution; my personal preference is two spaces as that seems to be the
    convention (although it ain't pretty) - and we should consider in a future
    version of stationXML making the location id optional.
    
    Ellen
    
    On Thu, Jul 31, 2014 at 5:59 AM, Philip Crotwell <crotwell<at>seis.sc.edu>
    wrote:
    
    Yet another data point, going all the way back to vol 1 issue 1 of the
    DMC newsletter introducing location ids:
    
    "The Location Identifier is a two character code that, when used in
    conjunction with the other data specifiers, uniquely identifies a data
    stream."
    and
    "Historically, within a SEED volume, the Location Identifier was left
    "blank" (consisted of two spaces)."
    and
    "GSN Use of Location Identifiers
    Valid characters for location identifiers are [space, 0-9, A-Z][space,
    0-9, A-Z]. (So space-space is a legitimate Location Identifier.) "
    
    http://www.iris.edu/ds/newsletter/vol1/no1/specification-of-seismograms-the-location-identifier/
    
    From this it seems that location id was intended to be exactly 2
    characters, not zero or two. My feeling is that we have a long
    tradition of the location id being "space-space" and not null or
    empty. Personally I really dislike space-space, but the only thing I
    dislike more than space-space is empty.
    
    Philip
    
    On Thu, Jul 31, 2014 at 7:18 AM, Philip Crotwell <crotwell<at>seis.sc.edu>
    wrote:
    
    Hi
    
    Just another data point, Earthworm, which is widely used by regional
    networks globally, has long had the "dash dash is the same as space
    space" convention. So dash dash is not something pulled out of thin
    air, it is how at least I do things already.
    
    And this shows that it is fairly common (if not technically correct)
    for users to regard space-space as the location id instead of
    regarding it as null with 2 spaces for padding. My guess is that very
    few users are aware of this, and even as someone who has been writing
    seismic software for a couple of decades I still think of the location
    id as space-space, not null.
    
    http://www.isti2.com/ew/PROGRAMMER/location_codes/EW_Loc_policy.txt
    
    Philip
    
    On Thu, Jul 31, 2014 at 6:18 AM, Chad Trabant <chad<at>iris.washington.edu>
    wrote:
    
    Thanks Philip, I think you have outlined the issues well.
    
    Regarding issue #1, I strongly feel that we need to choose one
    representation, the sooner we stop creating incompatible metadata the
    better.
    
    Regarding issue #2:
    
    b) two spaces="  "
    
    This is what IRIS currently does, not strictly SEED but avoids empty
    identifiers.
    
    c) two dashes="--".
    
    This would require work and continued mapping, although the mapping between
    SEED-based holdings and StationXML is clear. SEED headers and data records
    could also be considered, but that is a bigger can of worms.

    a) empty=""

    This is possibly the most straightforward mapping of SEED information, but
    leaves us with an empty string identifier.
    
    Below are a few of the issues we note regarding empty identifiers:

    1) They are too similar to "unknown" (which results in potential ambiguity
    where channels are only differentiated by location ID):

    a) In many languages an empty string evaluates to false; if, for example,
    a program is testing for and then extracting a value from an XML document
    parsed into a structure/object, it could appear as if the value was not
    present. Of course the coding in probably every language can be done to
    avoid such a false negative, but it is a pitfall that we would be asking all
    future users and coders to know about.
    
    b) In XPath (the query language for XSLT), which is used to search or
    translate XML, the matching of a string attribute usually uses the string()
    function. Specifying the string attribute to match when the attribute has a
    value is straightforward, but when trying to match the empty string the
    query is for NOT string. In the boolean functions of XPath "a string is true
    if and only if its length is non-zero"
    (http://www.w3.org/TR/xpath/#function-boolean). So in XPath, hardly a
    fringe technology, an empty string is not just another kind of string but an
    anomaly.
    
    c) In JavaScript the getAttribute() method returns the same value whether
    the attribute was an empty string or unspecified. The method is no longer
    recommended but illustrates that such thinking is not limited to niche
    projects.
    
    2) Organizing data in structures such as a nested hash is pretty common:
    %{net}{sta}{loc}{chan} = "some lvalue" (sorry for the Perl). The empty
    identifier as a key works in some languages but it is obtuse and unclear.
    I'm sure there are many other data structures that would use location by
    itself as a key.
    
    3) Empty identifiers are difficult to specify on the command line, in URLs,
    etc. and are non-obvious in many other places such as GUI fields. We have
    largely addressed this issue for FDSN web services (at the DMC for other
    mechanisms as well) by making "--" a synonym for the empty location ID. In
    other words we are already mapping "--" into the empty location ID for
    requests and users are learning this association. A further adoption of the
    synonym into the metadata would solve many of these problems.
    
    4) While it is certainly not the FDSN's task to define data formats outside
    of its purview, the adoption or matching of the core channel naming fields
    in other formats is certainly in the FDSN's best interest. This has been
    happening for a long time already (ISF/IASPEI, GSE, etc.). The potentially
    empty (optional?) location ID could make such adoption harder as it is a
    wrinkle, especially for space-delimited formats. I believe these broader
    implications deserve some consideration.
    
    I'm sure most developers could come up with solutions to the technical
    problems, but an empty identifier leaves the unfortunate wrinkles for all
    future users and coders.
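
    To make pitfall 1a above concrete, here is a minimal Python sketch. It assumes the standard-library xml.etree.ElementTree parser and three made-up Channel elements; the helper names are hypothetical and nothing here is taken from DMC or SeisComP code. It only shows how a truthiness test blurs "empty" and "absent", while an explicit comparison against None keeps them apart.

    ```python
    import xml.etree.ElementTree as ET

    # Hypothetical Channel elements: empty, two-space, and absent locationCode.
    SAMPLES = {
        "empty": '<Channel locationCode="" code="BHZ"/>',
        "two_spaces": '<Channel locationCode="  " code="BHZ"/>',
        "missing": '<Channel code="BHZ"/>',
    }


    def naive_has_location(xml_text):
        # Pitfall: "" and None (attribute absent) are both falsy.
        return bool(ET.fromstring(xml_text).get("locationCode"))


    def explicit_location(xml_text):
        # Returns None only when the attribute is absent; "" when it is empty.
        return ET.fromstring(xml_text).get("locationCode")


    results = {
        name: (naive_has_location(text), explicit_location(text))
        for name, text in SAMPLES.items()
    }
    ```

    With this parser the empty and absent cases are indistinguishable under the naive test, and can only be separated by checking the returned value against None.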
    
    Here is an example of someone that was confused by current metadata, I'll
    bet if there was a value in the locationCode it would have been easier:

    https://stackoverflow.com/questions/19348855/checking-for-empty-attributes-while-parsing-an-xml-file
    
    There is a chance we will end up with the empty location identifier, but the
    considerations should go beyond an assumption that an empty string is the
    only choice.
    
    Since an empty location field in SEED essentially means unset, perhaps we
    should consider making the locationCode attribute optional and leaving it
    out of the XML when it is empty in SEED. In this line of thinking, the
    empty string is just a hack to include a required attribute when in fact
    there is nothing to include. For me the "unset" aspect is unsettlingly
    similar to "unknown", but it's an idea preferred by at least one engineer at
    the DMC.
    
    Chad
    
    On Jul 28, 2014, at 6:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
    
    Hi
    
    Being on the cheap side of the Atlantic, I'll save us $0.00068 and
    make a stab at the underlying issue. :)
    
    Here, with lots of stuff cut out, is how a channel is "identified" in
    stationXML via the fdsn station web service at the IRIS DMC,
    
    http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
    
    <Network code="GE" >
    <Station code="UGM">
    <Channel locationCode="  " code="BHZ">
    
    Another implementation of the same web service (not sure of url) gives
    back this:
    
    <Network code="GE" >
    <Station code="UGM">
    <Channel locationCode="" code="BHZ">
    
    with locationCode="" vs ="  " being the difference under consideration.
    
    There are two basic issues being discussed (and yes, more beer would
    help! :)

    1) Should all valid stationXML documents be required to use the exact
    same string of characters to represent the location id for this
    channel? This would allow a comparison operation to be "simple" in
    that it can compare the attribute values without additional
    processing.

    2) If we agree to 1), then what should those exact characters be? The
    current top choices are
    a) empty=""
    b) two spaces="  "
    c) two dashes="--".
    
    1) seems less controversial than 2) in that greater compatibility is
    generally seen as positive.
    
    This is primarily a question about the form of the stationXML
    documents, but obviously there are connections to the way requests are
    formed, the relationship to miniseed/seed, the way things are coded in
    software and how much detailed understanding we expect of end users.
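
    As an illustration of the "additional processing" that issue 1) would make unnecessary, here is a minimal Python sketch. The function names are hypothetical, and using "" as the canonical form is an assumption made only for the example; the thread has not settled which spelling should win.

    ```python
    def canonical_location(code):
        """Collapse the spellings discussed in this thread to a single form.

        Assumption: "" is the canonical value, chosen only for illustration.
        """
        if code is None:
            # An absent attribute is treated as an error here, not as "empty".
            raise ValueError("location code is missing")
        stripped = code.strip(" ")
        return "" if stripped in ("", "--") else stripped


    def same_channel(a, b):
        """Compare two (net, sta, loc, cha) tuples after normalizing loc."""
        return (a[0], a[1], canonical_location(a[2]), a[3]) == (
            b[0], b[1], canonical_location(b[2]), b[3])


    # The same SEED channel as the two implementations above would express it.
    two_space_style = ("GE", "UGM", "  ", "BHZ")
    empty_style = ("GE", "UGM", "", "BHZ")
    ```

    A plain tuple comparison treats these as different channels; only the normalizing comparison matches them, which is the extra step every client would need if documents are allowed to differ.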
    
    Philip
    
    On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:
    
    Hello all,
    
    Can someone give a concise statement of the original problem being
    discussed? Is it only or primarily a concern about XML?
    
    It seems to me that with modern languages a string that is empty or has 1-N
    spaces is the same thing - there are often implicit or explicit trim()
    functions hiding in a processing pipeline. A null string is not the same.
    So an empty or blank string is the same, valid location code, and null is
    an undefined or uninitialized location code.
    
    With regards to the "--" pseudo for the location code, is this not needed
    because sometimes it is not possible or difficult to represent an empty
    string or even a string? For example on the command line or in a restful WS
    URI? (Or a URI on the command line!) So it may be that the use of "--" for
    intermediate processing and requests could be tolerated and somehow
    official, while empty or only-blanks strings remain official for persistent
    data.
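
    A minimal Python sketch of that split: "--" is used only when forming a request, while the stored value stays empty. The base URL and the net, sta and cha parameters are copied from the request quoted earlier in this thread; the `loc` parameter and the helper name are assumptions made for the example.

    ```python
    from urllib.parse import urlencode

    # Base URL taken from the example request quoted earlier in the thread.
    BASE = "http://service.iris.edu/fdsnws/station/1/query"


    def station_query_url(net, sta, loc, cha):
        """Build a request URL, writing an empty or blank location ID as "--"."""
        loc_param = loc.strip(" ") or "--"
        params = {
            "net": net,
            "sta": sta,
            "loc": loc_param,  # assumed parameter name for the location ID
            "cha": cha,
            "level": "channel",
            "format": "xml",
        }
        return BASE + "?" + urlencode(params)


    url_empty = station_query_url("GE", "UGM", "", "BHZ")
    url_blank = station_query_url("GE", "UGM", "  ", "BHZ")
    url_coded = station_query_url("GE", "UGM", "00", "BHZ")
    ```

    Both the empty and the two-space spelling produce the same request, so the "--" synonym stays confined to the request layer.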
    
    Just my 0.02 EURO = $0.0268
    
    Best regards to all,
    
    Anthony
    
    On 27/07/2014 04:52, Chad Trabant wrote:
    
    Hi Marcelo,
    
    Thanks for your thoughts as well. Something that you and Joachim are not
    addressing is the set of concerns about an empty ID that have been brought
    up by more than one person. The answer that empty strings are technically
    possible and it all works in Python/SeisComP is less than satisfying. The
    observations from Python, ObsPy and SeisComP are a few of many that need to
    be taken into account.
    
    I agree that there is a long tail consideration for the "--" location ID
    solution. If you understand that some folks find an empty ID to be
    problematic regardless of whether it is XML, SEED, text, whatever, then you
    might see where this proposal comes from. Yes, we would need to treat empty
    location IDs and "--" as synonyms for a very long time. Empty strings in XML
    mean you will need to map empty IDs to empty strings, NULL and whatever an
    XML parser might or might not produce for a long time as well (think beyond
    Python and SeisComP). Either is possible, only one of them is a unique
    mapping.
    
    If the main considerations are for the least amount of disruption then the
    answer is obvious to me: the FDSN can sanction that the two-space string is
    the XML synonym for the empty SEED location ID and we adjust the schema to
    make sure a string of whitespaces is preserved. Then SeisComP can change
    its relatively new StationXML implementation and ALL existing clients will
    be compatible with all metadata and, most importantly, we would have
    consistent metadata.
    
    If the empty string ID representation is adopted it would, in effect,
    mean that the DMC would need to change its metadata service and (more
    importantly) all users of the DMC's metadata service would need to
    transition to a new metadata channel naming scheme. This is certainly not
    out of the question, but it is not something we would do without careful
    consideration. I do not find the two-space strings all that great, but they
    are here and something the DMC and users of the DMC have dealt with. Issues
    have been identified with empty location IDs by us and our users. If the DMC
    is going to change, and push the change on all users of the DMC's
    StationXML, it would be much more compelling to have a solution that
    addresses the low level issues.
    
    regards,
    Chad
    
    ----- Original Message -----
    From: "Marcelo Bianchi" <m.tchelo<at>gmail.com>
    To: "IRIS Web Services List" <webservices<at>iris.washington.edu>
    Sent: Friday, July 25, 2014 7:38:17 PM
    Subject: Re: [webservices] A question of location ID, how to represent empty IDs in XML?
    
    Hi Philip and All,
    
    I totally agree with Joachim; I was planning to answer but he was much
    faster. What you guys are proposing is not a solution. StationXML
    supports the empty string nicely, and it is not null. There is a type
    difference here in Python and in any other language, and it can be
    handled nicely internally.

    Also, the location id is not just a string; it is a key entry that links
    miniseed to metadata, and making an exception at this level just
    because a user interface cannot properly render it without ambiguity
    does not sound like a proper proposal. I am not in favor of
    creating an exception that will have to be carried along for
    decades to come. Alternative solutions for this issue should be
    sought in the end-user interface.
    
    with my best regards,
    
    Marcelo Bianchi
    --
    
    2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell<at>seis.sc.edu>:
    
    It sounds like you are saying "change is hard, so we shouldn't do it".
    I would argue that change is hard and so if we don't do it now it will
    never happen. StationXML is new enough that there is already a
    disruption, we should seize the chance. If we do not do something now
    about null loc ids, it will be a decade or two before we get another
    chance.
    
    It is time to drive the stake through the heart of null location ids.
    Kill the evil while we have a chance.
    
    Philip
    
    On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
    
    Hello Rob,
    
    Rob Newman wrote on 24.07.2014 18:51:
    
    For what it's worth, I would also vote for the "--" standard. To quote
    from the Zen of Python
    <http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html>
    (my language of choice):
    
    "Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced."
    
    I'd add "Compatible is better than incompatible." :)
    
    Number 2 is especially relevant here:
    "Explicit is better than implicit."
    
    My favorite would be:
    
    "Special cases aren't special enough to break the rules."
    
    Quoted whitespace and nulls are painful. Code what you mean, and mean what
    you code. It's easier for everyone.
    
    But what if we simply *mean* "empty string"?
    
    The issue is not about beauty, pain or ease. It's about standard
    conformance. We already have a channel naming standard. If a new data format
    cannot accommodate existing channel naming, then the new format is flawed.
    But that's not even the case here...
    
    An XML document that contains
    
    <Channel locationCode="" ...
    
    is not malformed. There's an attribute that *explicitly* contains an empty
    string and a parser has to produce it as such. Not as null, nil or none, but
    as an empty string. Otherwise the parser is broken and needs to be fixed,
    not the data!
    
    Again: It's not about beauty. We all agree that current channel naming is
    not particularly beautiful and has limitations. But our business is not to
    try to solve that issue now and here.
    
    Cheers
    Joachim
    
    _______________________________________________
    webservices mailing list
    webservices<at>iris.washington.edu
    http://www.iris.washington.edu/mailman/listinfo/webservices
    
    --
    Sent from my iClayTablet
    
    ________________________________
    
    Anthony Lomax
    161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
    tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net web:
    http://www.alomax.net
    
    Twitter: @ALomaxNet
    Science & Special Topics: http://www.alomax.net/science
    Software: http://www.alomax.net/software - updates:
    https://twitter.com/ALomaxNet
    ________________________________
    
    Joachim Saul
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-08-08 01:08:33
    
    Hi Philip,
    
    Philip Crotwell wrote on 07/31/2014 02:59 PM:
    
    http://www.iris.edu/ds/newsletter/vol1/no1/specification-of-seismograms-the-location-identifier/
    
    From this it seems that location id was intended to be exactly 2
    characters, not zero or two. My feeling is that we have a long
    tradition of the location id being "space-space" and not null or
    empty. Personally I really dislike space-space, but the only thing I
    dislike more than space-space is empty.
    
    Now we have the above IRIS newsletter article vs. the FDSN standard.
    Which one should be considered authoritative?
    
    Philip Crotwell wrote on 07/31/2014 01:18 PM:
    
    Just another data point, Earthworm, which is widely used by regional
    networks globally, has long had the "dash dash is the same as space
    space" convention. So dash dash is not something pulled out of thin
    air, it is how at least I do things already.
    
    OK, at least we know now where Chad and you got that idea from. ;)
    
    And this shows that it is fairly common (if not technically correct)
    for users to regard space-space as the location id instead of
    regarding it as null with 2 spaces for padding. My guess is that very
    few users are aware of this, and even as someone who has been writing
    seismic software for a couple of decades I still think of the location
    id as space-space, not null.
    
    http://www.isti2.com/ew/PROGRAMMER/location_codes/EW_Loc_policy.txt
    
    If you say Earthworm I say SAC. The location code in SAC is trimmed just
    like in other software already mentioned. Does that convince you? Of
    course not, as we are here discussing neither Earthworm nor SAC channel
    naming convention.
    
    This discussion is about FDSN standard channel naming. Obviously neither
    Earthworm nor SAC count. Both use their own formats and within their
    respective ecosystems they can of course represent the location code in
    whatever way is considered appropriate, as long as the export to FDSN
    formats is done properly.
    
    Cheers
    Joachim
    
    Joachim Saul
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-28 20:51:27
    
    Hi Chad
    
    Chad Trabant wrote on 27.07.2014 04:52:
    
    The answer that empty strings are technically possible and it all
    works in Python/SeisComP is less than satisfying. The observations
    from Python, ObsPy and SeisComP are a few of many that need to be
    taken into account.
    
    Please name a few. Not abstract claims or hearsay. Point us to client
    code that cannot parse an empty location code; only then someone can
    take a closer look at the matter and quite possibly provide help.
    
    Yes, we would need to treat empty location IDs and "--" as synonyms
    for a very long time. Empty strings in XML mean you will need to map
    empty IDs to empty strings, NULL and whatever an XML parser might or
    might not produce for a long time as well (think beyond Python and
    SeisComP). Either is possible, only one of them is a unique
    mapping.
    
    I don't accept the parser issues unless you provide examples; see above.
    
    In general mappings are not the problem and are widely used anyway. Can
    you name a single software that when reading (Mini)SEED does *not* map
    the location code from " " to ""? Even libmseed does!
    
    So why not be consistent and do the same when parsing XML? It would
    solve the current issues. You can then keep your two spaces as long as
    you like. ;)
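
    For reference, the padding-and-trimming round trip described here can be sketched in a few lines of Python. This is a hypothetical illustration of the two-character, left-justified, space-padded SEED field as described at the start of this thread; it is not code from libmseed or any other package named here.

    ```python
    def pack_location(code):
        """Format a location ID as the two-character, space-padded SEED field."""
        if len(code) > 2:
            raise ValueError("location ID is at most two characters")
        return code.ljust(2).encode("ascii")


    def unpack_location(field):
        """Recover the location ID from a two-byte field by trimming the padding."""
        if len(field) != 2:
            raise ValueError("SEED location field is exactly two bytes")
        return field.decode("ascii").rstrip(" ")


    # Round trip for an empty, a one-character and a two-character ID.
    round_trip = {code: unpack_location(pack_location(code)) for code in ("", "1", "00")}
    ```

    Under this reading the two spaces exist only in the fixed-width field, and the value handed to the rest of the program is the trimmed string.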
    
    If the main considerations are for the least amount of disruption then
    the answer is obvious to me: the FDSN can sanction that the two-space
    string is the XML synonym for the empty SEED location ID and we
    adjust the schema to make sure a string of whitespaces is preserved.
    Then SeisComP can change its relatively new StationXML implementation
    and ALL existing clients will be compatible with all metadata and,
    most importantly, we would have consistent metadata.
    
    Chad, this whole discussion started back in early January with your
    complaint about the SeisComP fdsnws server implementation. You were
    alleging that 'The resulting StationXML includes empty location IDs
    (locationCode=""), this is not allowed in SEED and therefore not allowed
    in StationXML.' If the SeisComP server were indeed producing wrong XML
    it would have been corrected long ago. But that's not the case! It's
    actually SeisComP that produces the more correct FDSN StationXML
    compared to IRIS XML, not only w.r.t. locationCode.
    
    Don't you think it is now time to roll up the sleeves and make your
    client codes work with standard compliant FDSN StationXML rather than
    doctoring an FDSN standard?
    
    If the empty string ID representation is adopted it would, in
    effect, mean that the DMC would need to change its metadata service
    and (more importantly) all users of the DMC's metadata service would
    need to transition to a new metadata channel naming scheme. This is
    certainly not out of the question, but it is not something we would
    do without careful consideration. I do not find the two-space
    strings all that great, but they are here and something the DMC and
    users of the DMC have dealt with. Issues have been identified with
    empty location IDs by us and our users. If DMC is going to change,
    and push the change on all users of the DMC's StationXML, it would be
    much more compelling to have a solution that addresses the low level
    issues.
    
    Did you read my email of Thursday, 18:43 UTC? Following the ideas I
    outlined there, you are technically *not* required to change any of your
    servers. Only a few client codes are actually affected and even I was
    able to make the changes in one of those in 10 minutes. Of course, in
    total it will take longer, but if specific problematic cases related to
    parsing are identified and discussed, I am sure solutions can be found
    quickly. We have this list, we have skilled and enthusiastic people
    working on this, so why not use this as a platform even for more
    technical discussions? Or how about creating a "developer's corner"
    webservices-devel or so?
    
    Cheers
    Joachim
    
    Chad Trabant
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-31 04:57:34
    
    On Jul 28, 2014, at 4:51 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
    
    Hi Chad
    
    Chad Trabant wrote on 27.07.2014 04:52:
    
    The answer that empty strings are technically possible and it all
    works in Python/SeisComP is less than satisfying. The observations
    from Python, ObsPy and SeisComP are a few of many that need to be
    taken into account.
    
    Please name a few. Not abstract claims or hearsay. Point us to client
    code that cannot parse an empty location code; only then someone can
    take a closer look at the matter and quite possibly provide help.
    
    OK, here are a few: IRIS-WS, IRIS Fetch scripts, irisFetch.m, JWEED and probably: SeisFile, EMERALD, Epicentral and all the other codes that users of the DMC have created to read the metadata we send them.
    
    The statement that observations from Python, ObsPy and SeisComP alone are insufficient evidence for key changes to FDSN formats is not an abstract claim or hearsay, it is rather obvious since they are not the only (or even majority) systems handling these formats.
    
    Yes, we would need to treat empty location IDs and "--" as synonyms
    for a very long time. Empty strings in XML mean you will need to map
    empty IDs to empty strings, NULL and whatever an XML parser might or
    might not produce for a long time as well (think beyond Python and
    SeisComP). Either is possible, only one of them is a unique
    mapping.
    
    I don't accept the parser issues unless you provide examples; see above.
    
    In general mappings are not the problem and are widely used anyway. Can
    you name a single software that when reading (Mini)SEED does *not* map
    the location code from " " to ""? Even libmseed does!
    
    The code that reads dataless SEED into the DMC's metadata tables. If you want two: the code that reads the values from the DMC's database and creates StationXML. But it doesn’t really matter.
    
    Yes, collapsing the spaces is very common and in fact how SEED specifies that it be done; no one that I have read is disputing this.
    
    So why not be consistent and do the same when parsing XML? It would
    solve the current issues. You can then keep your two spaces as long as
    you like. ;)
    
    Yes, it totally makes sense to keep the same thing going in XML, except that there have been some issues identified in both SEED and XML and this is an opportunity to begin addressing the low level issue. In essence, the empty string solution is not ideal, even if it is the most appropriate mapping given the current rules. More on this later.
    
    If the main considerations are for the least amount of disruption then
    the answer is obvious to me: the FDSN can sanction that the two-space
    string is the XML synonym for the empty SEED location ID and we
    adjust the schema to make sure a string of whitespaces is preserved.
    Then SeisComP can change its relatively new StationXML implementation
    and ALL existing clients will be compatible with all metadata and,
    most importantly, we would have consistent metadata.
    
    Chad, this whole discussion started back in early January with your
    complaint about the SeisComP fdsnws server implementation. You were
    alleging that 'The resulting StationXML includes empty location IDs
    (locationCode=""), this is not allowed in SEED and therefore not allowed
    in StationXML.'
    
    And I have since written that my thoughts have changed, that indeed the location code in SEED does not contain spaces and is not required to be two characters.
    
    The point I was making is that the fewest users would be affected if the FDSN decided to require two characters and allow spaces. I say this because I believe most of the users of StationXML get their metadata from the DMC at the moment and have already dealt with the metadata in some way.
    
    If the SeisComP server were indeed producing wrong XML
    it would have been corrected long ago. But that's not the case! It's
    actually SeisComP that produces the more correct FDSN StationXML
    compared to IRIS XML, not only w.r.t. locationCode.
    
    This statement is heavy on hubris and naivety.
    
    There is no easy way to determine if any given StationXML document is fully "correct". The schema does not have enough information to vet the contents of a StationXML document; it basically checks to make sure the layout is correct, so XML schema validity is not sufficient for "correct". Currently, the StationXML contents are supposed to follow the guidelines defined in SEED. I think many of us agree that we should work to put as many of the content rules as possible into future versions of the schema to clarify many of the gray areas of StationXML. The concept of "more correct" is qualitative when used generally and is rarely or never more important than "compatible with the consensus".
    
    Such gray areas exist even within SEED. Within the FDSN here is how we have traditionally dealt with the gray areas: when implementing a piece of software to produce something already in production at one or more other centers, you usually use the others as a reference (or collaborate with them). If important differences are found they are brought up and discussed civilly and a plan is made to make things compatible, usually with user impact being a high priority. Unfortunately, this is not how this current situation unfolded and we are left with incompatible metadata.
    
    Don't you think it is now time to roll up the sleeves and make your
    client codes work with standard compliant FDSN StationXML rather than
    doctoring an FDSN standard?
    
    You do not unilaterally decide what compliant FDSN StationXML is. As you well know I have made a proposal to the FDSN and asked for clarity on this issue; it seems worth knowing where we are going.
    
    If the empty string ID representation is adopted it would, in
    effect, mean that the DMC would need to change its metadata service
    and (more importantly) all users of the DMC's metadata service would
    need to transition to a new metadata channel naming scheme. This is
    certainly not out of the question, but it is not something we would
    do without careful consideration. I do not find the two-space
    strings all that great, but they are here and something the DMC and
    users of the DMC have dealt with. Issues have been identified with
    empty location IDs by us and our users. If DMC is going to change,
    and push the change on all users of the DMC's StationXML, it would be
    much more compelling to have a solution that addresses the low level
    issues.
    
    Did you read my email of Thursday, 18:43 UTC? Following the ideas I
    outlined there, you are technically *not* required to change any of your
    servers. Only a few client codes are actually affected and even I was
    able to make the changes in one of those in 10 minutes.
    
    You miss the main issue: the metadata is incompatible, some servers must change. There are many more clients than there are servers, many clients written by users and out of our direct control. Requiring every client to know some post-parsing processing rules is a terrible idea, in fact it is an artifact of the same “anachronism” that you claim to dislike, bringing us back to SEED-like parsing.
    
    Of course, in
    total it will take longer, but if specific problematic cases related to
    parsing are identified and discussed, I am sure solutions can be found
    quickly. We have this list, we have skilled and enthusiastic people
    working on this, so why not use this as a platform even for more
    technical discussions? Or how about creating a "developer's corner"
    webservices-devel or so?
    
    Thanks for the suggestions. Technical discussions in other sub-threads.
    
    Chad
    
    Cheers
    Joachim
    
    Joachim Saul
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-07-28 21:05:30
    
    Philip Crotwell [07/25/14 15:35]:
    
    It sounds like you are saying "change is hard, so we shouldn't do it".
    
    That depends very much on the kind of change, I would say. The change
    that is currently being discussed is a hack that might help XML parser
    developers, with hefty repercussions otherwise.
    
    If that is the change, it indeed shouldn't be done.
    
    What I would highly welcome and support is a mature, future-proof
    channel naming concept (involving network codes, too!) with a clear
    implementation roadmap. There have been attempts in this direction, led
    by the USGS and the ISC, but they are not reflected in current FDSN
    StationXML.
    
    Cheers
    Joachim
- Chad Trabant
  
  Re: A question of location ID, how to represent empty IDs in XML?
  
  2014-07-24 18:34:07
  
  Hi Yazan,
  
  (passing along our in-person conversation for the list)
  
  I do not think allowing a null or optional location ID is a good idea, here is why: in SEED there is always a location ID (the two-byte field cannot be left out), it is always known; when it is empty it is still a specific location ID. Allowing optional location ID in XML leaves a translation from StationXML to SEED a bit ambiguous. The spec would have to clarify that "not present" always means the empty location ID in SEED, I find this translation not nearly as clear and obvious as having a real value present.
  
  As you say, many parsers will have problems with "" or " ". It should not be up to every reader (e.g. converters) of StationXML to properly interpret the multiple possible results coming out of any parser, the formats should have a unique and unambiguous mapping.
  
  Chad
  
  On Jul 24, 2014, at 9:29 AM, Yazan Suleiman <yazan.suleiman<at>gmail.com> wrote:
  
  Is modifying stationxml schema (to allow null location, required=false) a possibility? example:
  <Channel startDate="1992-09-23T00:00:00" restrictedStatus="open" endDate="1994-04-01T00:00:00" code="BHE">
  vs
  <Channel locationCode=" " startDate="1992-09-23T00:00:00" restrictedStatus="open" endDate="1994-04-01T00:00:00" code="BHE">
  vs
  <Channel locationCode="--" startDate="1992-09-23T00:00:00" restrictedStatus="open" endDate="1994-04-01T00:00:00" code="BHE">
  
  It is very reasonable to have a null value for location in any object representation of station schema. " " or "" is inaccurate and only introduces more trouble and complexity.
  
  If changing the schema is not an option then " " or "" is a very bad idea. Many parsers treat "" or " " as empty and will ignore them. If translating this into SEED is the issue, then it is the converter's responsibility to take care of the conversion.
  
  Yazan
  
  On Wed, Jul 23, 2014 at 10:30 AM, Chad Trabant <chad<at>iris.washington.edu> wrote:
  
  Hello WS users and developers,
  
  A recent discussion between FDSN data centers is centered on representation of empty location IDs in StationXML, the default format returned by the fdsnws-station web service. The DMC may be changing how it represents location ID in XML and text formats based on these discussions. We are asking for input as any such change will affect users of our metadata service.
  
  Some background: In the SEED channel naming scheme there is a hierarchy of network, station, location and channel identifiers. Of these, it is only the location ID that is commonly accepted to be empty. In the SEED format the location ID is a two-character field, where the value is left justified and padded with spaces if needed. When the value is empty the field is simply two spaces of padding.
  
  Historically, and presumably to avoid having an empty location ID, the DMC has represented “empty” location IDs as a string of two spaces. Following this practice, we express this in StationXML by setting the locationCode attribute to a string of two spaces. We have done this so long we sometimes forget that it is not compliant with a strict reading of SEED, at best it falls into the vagaries of SEED, on the other hand we have been doing it for years with no apparent problems (in fact it has helpfully avoided an empty core identifier).
  
  There now exists another fdsnws-station implementation that returns StationXML with the locationCode attribute set to an empty string when the SEED value is empty. The justification is that this follows the SEED rules of trimming the padding spaces from the values.
  
  Unfortunately this means there are now flavors of StationXML that are incompatible in the core channel name identifiers. In other words, two StationXML documents for the same SEED channel appear, without extra field translation, to be different channels.
  
  As most of you are users of SEED and StationXML metadata (at some level) and some of you have written code to parse these formats and manage the data returned by the DMC and other FDSN data centers, we are asking for your input regarding the potential solutions.
  
  Here are the options being considered for mapping an empty location ID in SEED to StationXML:
  
  1) Set locationCode to two spaces. While the DMC and users have been using this for a long while, it is not precisely the SEED value (but the mapping could be formalized). Also, whitespace in attributes does have some theoretical challenges: the wonky rules for XML attributes related to whitespace handling require removal of spaces in some cases (we have never heard of problems though).
  
  2) Set locationCode to an empty string. This would match the strict value present in SEED, an empty identifier.
  
  3) Set locationCode to “--” (two dashes). This avoids issues with whitespace in XML attribute values and avoids issues with an empty identifier. Also, this matches the request mechanisms where “--” is accepted as a synonym for an empty location ID.
  
  All of these solutions are viable in that we can make them work in code, it is a matter of choosing one for future FDSN metadata, pick your poison so to speak.
  
  In my personal opinion, an empty location ID is an unfortunate quirk of SEED that we should rectify in StationXML. An empty identifier can be confused for “unknown” if the programmer is not careful, which is semantically very different than “set to empty”. The two-space strings that the DMC is currently using are also not ideal, they are hard for humans to read and potentially weird with XML rules. The dashed location ID avoids these issues but requires the most change. I also think requiring all readers of StationXML to translate (e.g. remove padding) is a bad idea, the values in SEED should be uniquely mapped to values in StationXML.
  
  Thanks for reading this far. Your opinion and input is appreciated.
  
  regards,
  Chad
  
Joachim Saul

Re: A question of location ID, how to represent empty IDs in XML?

2014-07-25 03:43:31

Hi Chad/Philip,

thanks for reviving this discussion on the appropriate mailing list.

Chad Trabant [07/23/14 19:30]:

Some background: In the SEED channel naming scheme there is a
hierarchy of network, station, location and channel identifiers. Of
these, it is only the location ID that is commonly accepted to be
empty. In the SEED format the location ID is a two-character field,
where the value is left justified and padded with spaces if needed.
When the value is empty the field is simply two spaces of padding.

Historically, and presumably to avoid having an empty location ID,
the DMC has represented “empty” location IDs as a string of two
spaces.

Note that the padding spaces do not form part of the location code
string itself, according to the SEED specification, which only allows
alphanumeric characters.

Actually the location code is treated in the SEED specification not
differently than e.g. a station code, from which trailing spaces are
removed in every software that I know of.

BTW, I think the two spaces are not there to avoid having an empty
location ID, but are a relic from Fortran 77 days. :)

Following this practice, we express this in StationXML by setting
the locationCode attribute to a string of two spaces. We have done
this so long we sometimes forget that it is not compliant with a
strict reading of SEED, at best it falls into the vagaries of SEED,
on the other hand we have been doing it for years with no apparent
problems (in fact it has helpfully avoided an empty core
identifier).

On the other hand, even in the IRIS ecosystem the empty location code is
prominently used as empty string. Not everywhere, but e.g. the
well-known rdseed program removes the trailing spaces when reading SEED,
resulting in an empty C string if there are two padding spaces in the
location code field. A very natural way of dealing with the trailing
spaces, especially in view of the clear specifications in the SEED
manual. Also in the IRIS BUD file name convention (e.g. [1]), empty
location codes become empty strings, with no apparent problems with
mapping or otherwise.

There now exists another fdsnws-station implementation that returns
StationXML with the locationCode attribute set to an empty string
when the SEED value is empty. The justification is that this
follows the SEED rules of trimming the padding spaces from the
values.

Unfortunately this means there are now flavors of StationXML that
are incompatible in the core channel name identifiers. In other
words, two StationXML documents for the same SEED channel appear,
without extra field translation, to be different channels.

This depends on how you evaluate the location code. If you simply follow
the SEED specification and always trim the location code, like e.g.
ObsPy and rdseed do, the problem you describe is avoided altogether.

Of course, the requirement for removing trailing white space doesn't
come without the cost of a few more CPU cycles. But if that were an
issue we wouldn't be using XML, would we? Also, this rule would need to
be written into the future specification of FDSN StationXML.
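
A minimal sketch of the trimming rule described above, in Python. The
helper name `channel_key` and the two sample fragments are illustrative
assumptions, not part of any FDSN library; only the standard-library
`xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Two hypothetical fragments for the same SEED channel, one per "flavor".
PADDED = '<Channel code="BHZ" locationCode="  "/>'
TRIMMED = '<Channel code="BHZ" locationCode=""/>'


def channel_key(xml_text):
    """Return (location, channel) with trailing padding spaces removed."""
    elem = ET.fromstring(xml_text)
    location = elem.get("locationCode", "").rstrip(" ")
    channel = elem.get("code", "").rstrip(" ")
    return (location, channel)


# Compared as raw attribute values, the two flavors look like different channels.
raw_equal = (
    ET.fromstring(PADDED).get("locationCode")
    == ET.fromstring(TRIMMED).get("locationCode")
)

# Compared after trimming, they identify the same channel.
trimmed_equal = channel_key(PADDED) == channel_key(TRIMMED)
```

With trimming applied on the client side, both flavors yield the same
key, which is the interoperability argument made here.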

As most of you are users of SEED and StationXML metadata (at some
level) and some of you have written code to parse these formats and
manage the data returned by the DMC and other FDSN data centers, we
are asking for your input regarding the potential solutions.

Here are the options being considered for mapping an empty location
ID in SEED to StationXML:

1) Set locationCode to two spaces. While the DMC and users have
been using this for a long while, it is not precisely the SEED value
(but the mapping could be formalized). Also, whitespace in
attributes does have some theoretical challenges: the wonky rules for
XML attributes related to whitespace handling require removal of
spaces in some cases (we have never heard of problems though).

2) Set locationCode to an empty string. This would match the strict
value present in SEED, an empty identifier.

And would be easy to keep compatible with the two spaces.

This representation has also been widely used for a long time,
including at IRIS (see above).

3) Set locationCode to “--” (two dashes). This avoids issues with
whitespace in XML attribute values and avoids issues with an empty
identifier. Also, this matches the request mechanisms where “--” is
accepted as a synonym for an empty location ID.

Let's not mix request mechanisms with the data format. Data formats are
a holy grail whereas request mechanisms change more frequently.

Suppose we could retrieve full SEED using the web services. Even then it
would be equally appropriate to use "--" on the request side. But there
is no justification for breaking data format compatibility just for
matching particular request mechanisms.

All of these solutions are viable in that we can make them work in
code, it is a matter of choosing one for future FDSN metadata, pick
your poison so to speak.

In my personal opinion, an empty location ID is an unfortunate quirk
of SEED that we should rectify in StationXML. An empty identifier
can be confused for “unknown” if the programmer is not careful,
which is semantically very different than “set to empty”. The
two-space strings that the DMC is currently using are also not ideal,
they are hard for humans to read and potentially weird with XML
rules. The dashed location ID avoids these issues but requires the
most change. I also think requiring all readers of StationXML to
translate (e.g. remove padding) is a bad idea, the values in SEED
should be uniquely mapped to values in StationXML.

I share your view that the empty location code is not optimal. However,
the world is not perfect and the empty location code is a fact we have
to live with and have been able to live with for decades. Seismologists
have learned how to handle it. Existing software libraries make the
empty location code as painless as possible. Technically it is a no-issue.

The solution to the empty location code is not to incompatibly break a
data format without a technical reason but only because of aesthetics.
Empty strings are represented in XML without problems, particularly if
used in XML attributes. In fact, it is an advantage of a modern XML
format that we don't need the padding spaces etc. any more.

Philip Crotwell [07/23/14 20:37]:

Years ago we had full SEED. Then because of keeping metadata updated,
we switched to a separation into dataless SEED + miniseed. Now,
because of the complexities and limitations of dataless SEED, the
future looks like StationXML + miniseed. I am all for this change,
but how the location id is resolved really needs to address not just
what do we do in StationXML, but what do we do in StationXML +
miniseed.

I also lean towards "--" for the simple reason that there are so many
instances where I have been bitten by spaces or nulls. Even though I
know about this, I still get caught. File names, urls, user gui
displays, etc. all have problems with spaces or nulls, and as a
practical matter it is harder to see something that isn't there than
something that is there. Furthermore, using null or space-space is
really hard as a command line argument in the shell. That said, "--"
already means "long option name" in many *nix programs, so if we
were starting from scratch, underscores like "__" might be a better
choice. The SEED manual already lists underscore as a separate item
in the flags section (p32), so maybe worth considering.

In all of the above cases it is the interfaces that have to deal with
the empty location code. I agree that an empty string is not always easy
to visualize, but we know how to deal with it. Nothing prevents us from
using "--" or "__" in GUIs or external formats or input to the fdsnws's.
I myself use "__" e.g. in pick lists for ease of visualization,
awk/grep'ing etc.; but that has nothing to do with the XML or SEED
representation. The same is true for the request formats; as long as the
user knows how to explicitly specify an empty location code, it's fine.

But if option 3 is chosen, would there be any possibility of
amending the SEED spec so that "--" is actually valid within the
location id field, with the caveat that it is synonymous with
space-space/null, but "--" is the preferred value?

This would mean that GE.UGM.--.BHZ and GE.UGM..BHZ are equivalent, in
fact: identical stream ID's. Technically this is feasible. But are the
downstream software repercussions, let alone the confusion among the
data users a price we are willing to pay? I don't think so.

I realize that doing a global search and replace on a petabyte of
miniseed data is probably not going to happen, but it would be
really nice if whatever location id is in StationXML, it is exactly
2 characters and is the exact same 2 characters as in miniseed.

On the other hand, the use of XML is a chance to get rid of the fixed
field values with padding. This may not be relevant today, but it might
become so in the future.

Frankly the whole idea of making location ids "optional" was a real
mistake IMHO. I am sure that anyone that has ever written code to
deal with location ids has something that looks like: if (locid ==
null or locid == "" or locid == " " or locid == "--") then locid =
"--" which is just a painfully stupid thing to have to do over and
over and over again. Grumble grumble grumble.:(

But fortunately you do that only once and wrapping this into a library
function is a no-brainer.
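
As an illustration, such a library function might look like the
following Python sketch. The names `normalize_location` and
`display_location` are hypothetical, and treating a missing value
(`None`) as the empty location code is an assumption made only for this
example.

```python
# Spellings that, after trimming, stand for the empty location code.
EMPTY_ALIASES = {"", "--"}


def normalize_location(locid):
    """Map every known spelling of the empty location code to ""."""
    if locid is None:
        # Assumption: a missing value is treated as the empty code.
        return ""
    trimmed = locid.rstrip(" ")
    return "" if trimmed in EMPTY_ALIASES else trimmed


def display_location(locid, placeholder="--"):
    """Return a visible form for GUIs, file names, or command lines."""
    normalized = normalize_location(locid)
    return normalized if normalized else placeholder
```

Internally the code is kept as an empty string, while a placeholder
such as "--" or "__" is substituted only at the interfaces.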

On a side note, I am curious to know (technically) under what
circumstances locid==null would evaluate to true, considering

<xs:attribute name="locationCode" type="xs:string" use="required"/>

from the xsd[2].
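
One circumstance can be sketched in Python: a parser that does not
validate against the xsd will not enforce `use="required"`, so a
document that omits the attribute yields a null value. The fragments
below are made up for illustration; `xml.etree.ElementTree` is a
non-validating parser from the standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: attribute missing, empty, and two spaces.
missing = ET.fromstring('<Channel code="BHE"/>')
empty = ET.fromstring('<Channel code="BHE" locationCode=""/>')
spaces = ET.fromstring('<Channel code="BHE" locationCode="  "/>')

# Element.get() returns None when the attribute is absent.
values = [elem.get("locationCode") for elem in (missing, empty, spaces)]
```

Here `values` holds three distinct results, so client code that never
validates has to be prepared for all of them.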

Lastly, as far as I can tell the SEED spec doesn't disallow
null/empty station or channel codes, so addressing that at the same
time might be wise.

I haven't come across any of those but there it makes sense. Yet I don't
think we can or should prevent empty location codes. They are a very
common reality.

My $0.02, please pick one string, and only one string, and use it
everywhere.

If "only one string" is a requirement, it is probably the strongest
argument against a change.

"Only one string" will only work without deviation from the current use
of SEED location code. We can't recode the archives, let alone the local
archives users have built for their work over the years. Well,
technically it could be done, but I think we all agree that we don't
want to, as this would have to involve not only (Mini)SEED waveform data
but also meta data and parametric data. How about... QuakeML archives?
Datalogger firmware? We can't change all of that and if we add e.g. "--"
to the range of *possible* location codes, we still have to continue to
"forever" support the other representations in order to be backward
compatible.

Generally speaking, it is good to discuss future possibilities for
channel naming conventions, not only with respect to the location code.
But the naming should ideally be independent of the data formats used.
XML is a big step towards becoming less dependent on the limits imposed
by SEED, but we are not going to get rid of SEED for many years to come.

Actually, what we are currently seeking to solve is a particular
incompatibility between FDSN StationXML produced by different services.
Technically that is much, *much* easier to achieve than the
introduction of a new and incompatible channel naming. I would welcome
an intensified discussion on the latter, but not in the context of the
current FDSN StationXML or web services.

It's actually quite strange that already now, early after the
introduction of FDSN StationXML, we are not only choking over minor
incompatibilities, but are discussing "solutions" to problems that
apparently no one had noticed existed before StationXML... Looks
like shooting at sparrows with cannons, IMO.

There used to be a IASPEI working group on station codes that even came
up with a new channel naming "standard"[3], which, however, doesn't seem
to have gained much acceptance so far. Nevertheless this is the level at
which changes to channel naming need to be discussed, even though the
process may be frustratingly slow. But the impact of such a change is
just too big to be decided ad hoc.

To summarize:

We will not find a future-proof channel naming convention quickly.
Partial changes, especially if incompatible, should be absolutely avoided.

The particular problem we attempted (and still need) to solve in the
first place is a location code incompatibility caused by varying
strictness in adhering to the SEED specification. Not surprisingly, I
prefer the empty-string representation for the empty location code. To be
pragmatic, I propose the following timeline:

* Accept that, at least for a transitional period, space-space and empty
location codes will both exist.

* During a transitional period, don't change the servers that now
produce space-space location codes, as that would break compatibility
with some clients. We want to keep compatibility rather than introducing
new incompatibility.

* Instead update the clients to accept both space-space and empty
location codes by trimming trailing spaces if present. This is a
relatively minor change and IIRC this is on IRIS's agenda already, which
is highly appreciated.

At this point in time, interoperability is restored, even without
server-side changes. This is important as it may take quite some time
for the users to actually upgrade their clients; but it doesn't hurt anyone.

* Finally, upgrade the servers where needed. The decision as to when to
upgrade the server side can be made once it is considered appropriate;
there is absolutely no hurry from the client side.

The needed changes for the above proposal are very small compared to the
huge changes that would be required at every level to implement a new
channel naming convention. This may (and hopefully will) take place some
time in the future, but it requires a lot of preparation and
coordination. I am pretty sure that we will have a considerable number
of beers in the meantime. ;)

Besides the beers, we should focus on finalizing the specification of
FDSN StationXML. There are too many under-defined elements even in the
xsd and the risk of serious incompatibilities is very high.

Cheers
Joachim

[1] http://www.iris.edu/bud_stuff/bud_dir/GE/UGM/UGM.GE..BHZ.2014.205
[2] http://www.fdsn.org/xml/station/fdsn-station-1.0.xsd
[3] http://www.isc.ac.uk/registries/download/IR_implementation.pdf
Doug Neuhauser

Re: A question of location ID, how to represent empty IDs in XML?

2014-08-12 17:53:26

I've been following this thread, and thought it was time to chime in.

IMHO, the FDSN web services should follow the SEED convention.
The SEED convention states that station, network, channel, and location
are all blank-padded fields of fixed lengths.
To me, this means that we should either use the full blank-padded
fields for ALL of these identifiers, or for none of them.

eg:

<Network code="G " >
<Station code="KIP ">
<Channel locationCode=" " code="BHZ">

or

<Network code="G" >
<Station code="KIP">
<Channel locationCode="" code="BHZ">

Personally I think the latter (blank trimmed) is better.
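
The two alternatives can be sketched in Python as a pad/trim round
trip. The field widths are those of the SEED fixed header (network 2,
station 5, location 2, channel 3); the helper names `pad` and `trim`
are hypothetical and exist only for this example.

```python
# Character widths of the identifier fields in the SEED fixed header.
WIDTHS = {"network": 2, "station": 5, "location": 2, "channel": 3}


def pad(field, value):
    """Left-justify a code and blank-pad it to its fixed SEED width."""
    width = WIDTHS[field]
    if len(value) > width:
        raise ValueError(f"{field} code {value!r} exceeds {width} characters")
    return value.ljust(width)


def trim(value):
    """Remove the trailing padding blanks from a fixed-width field."""
    return value.rstrip(" ")


codes = {"network": "G", "station": "KIP", "location": "", "channel": "BHZ"}
padded = {name: pad(name, value) for name, value in codes.items()}
trimmed = {name: trim(value) for name, value in padded.items()}
```

Padding and trimming are inverse operations for left-justified codes,
so either convention works as long as it is applied to all four
identifiers consistently.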

I agree that the blank location code is a pain when dealing with
Oracle, white-space delimited fields such as command lines, etc,
but unless we change the SEED convention, I don't see that making
an alias of "-" or "--" in FDSN station XML improves the situation.

AFAIK, the ONLY reason that we struggle with the two-blank issue is
that certain software (eg Oracle) cannot distinguish between the
empty string (string of length 0) and NULL. Therefore, the DMC,
NCEDC, AQMS, etc have been forced to not use a blank-trimmed string
for the location code.

Unless we propose to change the SEED standard, all of our data in
our archives, and all of our current acquisition systems, I think
that we have to live with "empty" location codes.

I have not seen any compelling argument for representing a blank (empty)
location code in FDSN station XML as anything but the empty string.

If you want to have "" and " " be equivalent in FDSN station XML,
you can simply change the schema definition of the field to be a "token"
rather than a "string", in which case any representation with blanks will
be reduced to the empty string. Problem solved?
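
A rough Python emulation of what the "token" type implies, as a sketch
only. XML Schema defines `xs:token` with whitespace collapsing: tabs
and line breaks become spaces, runs of spaces are merged, and leading
and trailing spaces are dropped. The function name
`collapse_whitespace` is hypothetical. Note that this normalization is
applied by schema-aware processors; a parser that does not validate
still hands back the raw attribute value.

```python
import re


def collapse_whitespace(value):
    """Emulate the whitespace collapsing that XML Schema applies to xs:token."""
    replaced = re.sub(r"[\t\n\r]", " ", value)   # tabs and line breaks become spaces
    collapsed = re.sub(r" +", " ", replaced)     # merge runs of spaces
    return collapsed.strip(" ")                  # drop leading and trailing spaces


# Every blank-only spelling of the location code reduces to the empty string.
blank_variants = [collapse_whitespace(v) for v in ("", " ", "  ")]
```

Under this rule "", " " and "  " all become the empty string, while a
code such as "00" is left unchanged.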

I note that the NCEDC implementation currently uses 1 blank " "
for empty location code. I have no problem changing this if we can
agree on a convention.

I also note ironically that the TA network run by IRIS is one of the
largest networks in terms of stations, and uses blank location codes.

My 2 cents...

- Doug N

On 07/23/2014 10:30 AM, Chad Trabant wrote:

Hello WS users and developers,

A recent discussion between FDSN data centers is centered on
representation of empty location IDs in StationXML, the default
format returned by the fdsnws-station web service. The DMC may be
changing how it represents location ID in XML and text formats based
on these discussions. We are asking for input as any such change will
affect users of our metadata service.

Some background: In the SEED channel naming scheme there is a
hierarchy of network, station, location and channel identifiers. Of
these, it is only the location ID that is commonly accepted to be
empty. In the SEED format the location ID is a two-character field,
where the value is left justified and padded with spaces if needed.
When the value is empty the field is simply two spaces of padding.

Historically, and presumably to avoid having an empty location ID,
the DMC has represented “empty” location IDs as a string of two
spaces. Following this practice, we express this in StationXML by
setting the locationCode attribute to a string of two spaces. We have
done this so long we sometimes forget that it is not compliant with a
strict reading of SEED, at best it falls into the vagaries of SEED,
on the other hand we have been doing it for years with no apparent
problems (in fact it has helpfully avoided an empty core
identifier).

There now exists another fdsnws-station implementation that returns
StationXML with the locationCode attribute set to an empty string
when the SEED value is empty. The justification is that this follows
the SEED rules of trimming the padding spaces from the values.

Unfortunately this means there are now flavors of StationXML that are
incompatible in the core channel name identifiers. In other words,
two StationXML documents for the same SEED channel appear, without
extra field translation, to be different channels.

As most of you are users of SEED and StationXML metadata (at some
level) and some of you have written code to parse these formats and
manage the data returned by the DMC and other FDSN data centers, we
are asking for your input regarding the potential solutions.

Here are the options being considered for mapping an empty location
ID in SEED to StationXML:

1) Set locationCode to two spaces. While the DMC and users have been
using this for a long while, it is not precisely the SEED value (but
the mapping could be formalized). Also, whitespace in attributes does
have some theoretical challenges: the wonky rules for XML attributes
related to whitespace handling require removal of spaces in some
cases (we have never heard of problems though).

2) Set locationCode to an empty string. This would match the strict
value present in SEED, an empty identifier.

3) Set locationCode to “--” (two dashes). This avoids issues with
whitespace in XML attribute values and avoids issues with an empty
identifier. Also, this matches the request mechanisms where “--” is
accepted as a synonym for an empty location ID.

All of these solutions are viable in that we can make them work in
code, it is a matter of choosing one for future FDSN metadata, pick
your poison so to speak.

In my personal opinion, an empty location ID is an unfortunate quirk
of SEED that we should rectify in StationXML. An empty identifier can
be confused for “unknown” if the programmer is not careful, which is
semantically very different than “set to empty”. The two-space
strings that the DMC is currently using are also not ideal, they are
hard for humans to read and potentially weird with XML rules. The
dashed location ID avoids these issues but requires the most change.
I also think requiring all readers of StationXML to translate (e.g.
remove padding) is a bad idea, the values in SEED should be uniquely
mapped to values in StationXML.

Thanks for reading this far. Your opinion and input is appreciated.

regards,
Chad

--
------------------------------------------------------------------------
Doug Neuhauser University of California, Berkeley
doug<at>seismo.berkeley.edu Berkeley Seismological Laboratory
Office: 510-642-0931 215 McCone Hall # 4760
Fax: 510-643-5811 Berkeley, CA 94720-4760
Remote: 530-752-5615 (Wed,Fri)
- Jeremy Fee
  
  Re: A question of location ID, how to represent empty IDs in XML?
  
  2014-08-12 21:18:55
  
  SEED is a fixed width record format, most likely the reason for blank
  padded fields. I'd recommend not carrying that over into the XML format.
  
  The primary purpose of the channel code is to be a unique identifier, and
  an empty string is distinct from any non-empty value.
  
  Jeremy
  
  On Tue, Aug 12, 2014 at 12:53 PM, Doug Neuhauser <doug<at>seismo.berkeley.edu>
  wrote:
  
  I've been following this thread, and thought it was time to chime in.
  
  IMHO, the FDSN web services should follow the SEED convention.
  The SEED convention states that station, network, channel, and location
  are all blank-padded fields of fixed lengths.
  To me, this means that we should either use the full blank-padded
  fields for ALL of these identifiers, or for none of them.
  
  eg:
  
  <Network code="G " >
  <Station code="KIP ">
  <Channel locationCode=" " code="BHZ">
  
  or
  
  <Network code="G" >
  <Station code="KIP">
  <Channel locationCode="" code="BHZ">
  
  Personally I think the latter (blank trimmed) is better.
  
  I agree that the blank location code is a pain when dealing with
  Oracle, white-space delimited fields such as command lines, etc,
  but unless we change the SEED convention, I don't see that making
  an alias of "-" or "--" in FDSN station XML improves the situation.
  
  AFAIK, the ONLY reason that we struggle with the two-blank issue is
  that certain software (eg Oracle) cannot distinguish between the
  empty string (string of length 0) and NULL. Therefore, the DMC,
  NCEDC, AQMS, etc have been forced to not use a blank-trimmed string
  for the location code.
  
  Unless we propose to change the SEED standard, all of our data in
  our archives, and all of our current acquisition systems, I think
  that we have to live with "empty" location codes.
  
  I have not seen any compelling argument for representing a blank (empty)
  location code in FDSN station XML as anything but the empty string.
  
  If you want to have "" and " " be equivalent in FDSN station XML,
  you can simply change the schema definition of the field to be a "token"
  rather than a "string", in which case any representation with blanks will
  be reduced to the empty string. Problem solved?
  
  I note that the NCEDC implementation currently uses 1 blank " "
  for empty location code. I have no problem changing this if we can
  agree on a convention.
  
  I also note ironically that the TA network run by IRIS is one of the
  largest networks in terms of stations, and uses blank location codes.
  
  My 2 cents...
  
  - Doug N
  
  On 07/23/2014 10:30 AM, Chad Trabant wrote:
  
  Hello WS users and developers,
  
  A recent discussion between FDSN data centers is centered on
  representation of empty location IDs in StationXML, the default
  format returned by the fdsnws-station web service. The DMC may be
  changing how it represents location ID in XML and text formats based
  on these discussions. We are asking for input as any such change will
  affect users of our metadata service.
  
  Some background: In the SEED channel naming scheme there is a
  hierarchy of network, station, location and channel identifiers. Of
  these, it is only the location ID that is commonly accepted to be
  empty. In the SEED format the location ID is a two-character field,
  where the value is left justified and padded with spaces if needed.
  When the value is empty the field is simply two spaces of padding.
  
  Historically, and presumably to avoid having an empty location ID,
  the DMC has represented “empty” location IDs as a string of two
  spaces. Following this practice, we express this in StationXML by
  setting the locationCode attribute to a string of two spaces. We have
  done this so long we sometimes forget that it is not compliant with a
  strict reading of SEED, at best it falls into the vagaries of SEED,
  on the other hand we have been doing it for years with no apparent
  problems (in fact it has helpfully avoided an empty core
  identifier).
  
  There now exists another fdsnws-station implementation that returns
  StationXML with the locationCode attribute set to an empty string
  when the SEED value is empty. The justification is that this follows
  the SEED rules of trimming the padding spaces from the values.
  
  Unfortunately this means there are now flavors of StationXML that are
  incompatible in the core channel name identifiers. In other words,
  two StationXML documents for the same SEED channel appear, without
  extra field translation, to be different channels.
  
  As most of you are users of SEED and StationXML metadata (at some
  level) and some of you have written code to parse these formats and
  manage the data returned by the DMC and other FDSN data centers, we
  are asking for your input regarding the potential solutions.
  
  Here are the options being considered for mapping an empty location
  ID in SEED to StationXML:
  
  1) Set locationCode to two spaces. While the DMC and users have been
  using this for a long while, it is not precisely the SEED value (but
  the mapping could be formalized). Also, whitespace in attributes does
  have some theoretical challenges: the wonky rules for XML attributes
  related to whitespace handling require removal of spaces in some
  cases (we have never heard of problems though).
  
  2) Set locationCode to an empty string. This would match the strict
  value present in SEED, an empty identifier.
  
  3) Set locationCode to “--“ (two dashes). This avoids issues with
  whitespace in XML attribute values and avoids issues with an empty
  identifier. Also, this matches the request mechanisms where “--“ is
  accepted as a synonym for an empty location ID.
  
  All of these solutions are viable in that we can make them work in
  code, it is a matter of choosing one for future FDSN metadata, pick
  your poison so to speak.
  
  In my personal opinion, an empty location ID is an unfortunate quirk
  of SEED that we should rectify in StationXML. An empty identifier can
  be confused for “unknown” if the programmer is not careful, which is
  semantically very different than “set to empty”. The two-space
  strings that the DMC is currently using are also not ideal, they are
  hard for humans to read and potentially weird with XML rules. The
  dashed location ID avoids these issues but requires the most change.
  I also think requiring all readers of StationXML to translate (e.g.
  remove padding) is a bad idea, the values in SEED should be uniquely
  mapped to values in StationXML.
  
  Thanks for reading this far. Your opinion and input is appreciated.
  
  regards,
  Chad
  
  _______________________________________________
  webservices mailing list
  webservices<at>iris.washington.edu
  http://www.iris.washington.edu/mailman/listinfo/webservices
  
  --
  ------------------------------------------------------------------------
  Doug Neuhauser University of California, Berkeley
  doug<at>seismo.berkeley.edu Berkeley Seismological Laboratory
  Office: 510-642-0931 215 McCone Hall # 4760
  Fax: 510-643-5811 Berkeley, CA 94720-4760
  Remote: 530-752-5615 (Wed,Fri)
  
  - Chad Trabant
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-08-12 21:41:46
    
    No argument that the padding spaces should be left behind.
    
    Assuming we are unwilling to actually address the blank location IDs, we should consider making the attribute optional. A few folks have already suggested it.
    
    One could argue that, from a purist's point of view, a blank location in SEED is an unset location, so the purist mapping is to leave it unset in XML. This could be done simply by changing the schema to make the attribute optional. (This is not my favorite idea, and I've argued against it, but at least it cleanly follows SEED common practice.)
    
    Allowing the value to be optional in SEED (within the limitations of a fixed width format) and required in StationXML is trying to eat your cake and keep it too. We do not force other optional string values of SEED to be required in the XML, so why make an exception for location?
    
    Chad
    
    On Aug 12, 2014, at 12:18 PM, Fee, Jeremy <jmfee<at>usgs.gov> wrote:
    
    SEED is a fixed-width record format, which is most likely the reason for the blank-padded fields. I'd recommend not carrying that over into the XML format.
    
    The primary purpose of the channel code is to be a unique identifier, and an empty string is distinct from any non-empty value.
    
    Jeremy
    
    Joachim Saul
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-08-14 15:59:34
    
    Hi Chad,
    
    Chad Trabant wrote on 08/12/2014 11:41 PM:
    
    No argument that the padding spaces should be left behind.
    
    Good.
    
    Assuming we are unwilling to actually address the blank location IDs we
    should consider making the attribute optional. It has been suggested by
    a few folks already.
    
    The "empty" location code format does need to be addressed, because even
    if an attribute is optional, it may be present and it may represent an
    empty location code.
    
    Making the location code optional is of course possible. Actually in
    QuakeML it is optional, too. There the reason and semantics are special,
    though, because in parametric data like picks the location code *may* be
    unknown indeed (same for the channel code).
    
    By contrast, in SEED or StationXML the location code is not unknown.
    
    One could argue that from a purist's point of view a blank location in
    SEED is an unset location, so the purist mapping of this is to leave it
    unset in XML.
    
    A location code that is encoded in SEED as two spaces is not unset but
    "empty" and still a string. The purist mapping is therefore the empty
    string.
    
    That said, we can make the location code optional in XML, but is that
    really anything more than a cosmetic trick to hide an "ugly", empty
    location code in the XML?
    
    Conforming clients need to be prepared to receive an "empty" (not
    unset!) value anyway. In other words, none of
    
    <Channel locationCode="" ...
    <Channel locationCode=" " ...
    <Channel locationCode="--" ...
    
    is forbidden by just making locationCode optional. You can leave it
    unset but you don't have to. A parser still has to accept at least the
    empty string and two spaces (in fact a single space, too, as we have
    just learned). And since "explicit is better than implicit", it seems
    wise not to make the location code optional, though technically it is
    feasible provided there is a clear default value; otherwise a missing
    location code might (at least in principle) be mistaken for unknown,
    as in QuakeML.
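
    The point about readers having to cope with every spelling can be sketched in Python with the standard library's ElementTree. The helper read_location_code is hypothetical, and treating "--" as empty is an assumption borrowed from the request syntax, not something StationXML defines. A missing attribute is kept distinct (None) from an empty but present one.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_location_code(channel: ET.Element) -> Optional[str]:
    """Return None when locationCode is absent, otherwise the trimmed code.

    "", " " and "  " all come back as "" (empty, but set). Mapping "--"
    to "" as well is an assumption made for this sketch.
    """
    raw = channel.get("locationCode")  # None if the attribute is missing
    if raw is None:
        return None
    code = raw.strip(" ")
    return "" if code == "--" else code


samples = [
    '<Channel locationCode="" code="BHZ"/>',
    '<Channel locationCode=" " code="BHZ"/>',
    '<Channel locationCode="  " code="BHZ"/>',
    '<Channel locationCode="--" code="BHZ"/>',
    '<Channel locationCode="00" code="BHZ"/>',
    '<Channel code="BHZ"/>',
]
for xml_text in samples:
    print(xml_text, "->", repr(read_location_code(ET.fromstring(xml_text))))
```

    Only the last sample yields None; whether a reader should then fall back to the empty code or treat the value as unknown is exactly the open question.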
    
    This would be simply done by changing the schema to make
    the attribute optional. (this is not my favorite idea, I've argued
    against it, but at least it cleanly follows SEED common practice).
    
    I would prefer to make optional only those fields for which information
    may indeed be unknown, like a digitizer serial number.
    
    Allowing the value to be optional in SEED (within the limitations of a
    fixed width format) and required in StationXML is trying to eat your
    cake and keep it too. We do not force other optional string values of
    SEED to be required in the XML, so why make an exception for location?
    
    In SEED, the location code is always present even if it is an empty
    string. It's not optional.
    
    But even "optional" still implies the option to explicitly specify an
    "empty" location code. And the question of how to properly represent
    this in XML is still an open one (even though opinions seem to converge
    towards the empty string).
    
    Yazan Suleiman wrote on 08/13/2014 06:17 AM:
    
    In my opinion StationXml shouldn’t force me to provide a value for
    an attribute that is unknown to me.
    
    But empty is not the same as unknown. The "empty" location code in SEED
    does carry specific information that needs to be represented in XML,
    whether we like the value or not.
    
    While the purpose of StationXml schema is to map between SEED and
    XML, limitations of Seed shouldn’t be carried over.
    
    +1
    
    Cheers
    Joachim
- Chad Trabant
  
  Re: A question of location ID, how to represent empty IDs in XML?
  
  2014-08-12 21:31:33
  
  Hi Doug,
  
  Thanks for your 2 cents.
  
  Regarding only certain software being the problem with blank location, I guess you did not like any of the others pointed out here?
  http://www.iris.washington.edu/pipermail/webservices/2014-July/000583.html
  
  If you want a non-Oracle database example, the 'ltree' data type in Postgres is a natural fit for N.S.L.C hierarchical data and it cannot take a blank identifier either. I do not see how the number of pain points with empty identifiers will not grow over time.
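
  A rough illustration of that pain point, as a Python sketch rather than actual Postgres code: a dotted N.S.L.C path needs a non-empty label at every level, so an empty location ID has to be either rejected or replaced. The helper nslc_path and the placeholder value are hypothetical; the placeholder is not an agreed convention, and which characters ltree accepts in a label depends on the Postgres version.

```python
def nslc_path(net: str, sta: str, loc: str, cha: str, empty_loc: str = "") -> str:
    """Build a dotted network.station.location.channel path.

    Blank padding is trimmed from each code. If the location is empty,
    `empty_loc` is substituted; if that is empty too, a ValueError is
    raised, because the path would contain an empty label.
    """
    labels = [net.strip(), sta.strip(), loc.strip() or empty_loc, cha.strip()]
    if any(label == "" for label in labels):
        raise ValueError(f"empty label in N.S.L.C path: {labels!r}")
    return ".".join(labels)


# Codes are illustrative values only.
print(nslc_path("G", "KIP", "00", "BHZ"))
print(nslc_path("G ", "KIP  ", "  ", "BHZ", empty_loc="EMPTY"))
try:
    nslc_path("G", "KIP", "  ", "BHZ")
except ValueError as err:
    print("rejected:", err)
```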
  
  The proposal for a "--" location ID was to change SEED by starting with StationXML as a transition. The first step could be done without changing all the miniSEED in all the archives; the next step could be done with a future revision of miniSEED. This would require mapping, which we are already doing for requests and will continue to do indefinitely. For sure this would be a non-trivial change over time; the question is whether it is worth it or not.
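
  The mapping mentioned here could look roughly like the following Python sketch. The function names are hypothetical, and the sketch assumes the only special case is the empty location ID, with "--" standing in for it and every other code passed through after trimming.

```python
def seed_to_dashed(seed_field: str) -> str:
    """Map a two-character SEED location field to the dashed form."""
    code = seed_field.strip(" ")
    return code if code else "--"


def dashed_to_seed(location: str) -> str:
    """Map a location ID back to the blank-padded two-character SEED field."""
    code = "" if location == "--" else location.strip(" ")
    return code.ljust(2)


# Round trip: SEED field -> dashed form -> SEED field.
for field in ["  ", "00", "1 "]:
    dashed = seed_to_dashed(field)
    print(repr(field), "->", repr(dashed), "->", repr(dashed_to_seed(dashed)))
```

  The mapping is one-to-one only as long as "--" is never used as a real location code.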
  
  If we are going to continue to shoot ourselves in the foot with unset location IDs, let's do so with clear eyes: the problems are not limited to esoteric software or use cases. Also, a blank string is not the only choice; more on that next.
  
  Chad
  
  PS. The TA started 10 years ago and followed the common conventions of the time; that network now has many non-blank IDs. The GSN has converted so that few, if any, blank IDs remain and, ironically?, the BK network appears to use many non-blank location IDs too. Not sure how important it is, but it does show the trend towards increased use of non-blank location IDs.
  
  - Doug Neuhauser
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-08-12 23:12:03
    
    On 08/12/2014 02:31 PM, Chad Trabant wrote:
    
    Hi Doug,
    
    Thanks for your 2 cents.
    
    Regarding only certain software being the problem with blank
    location, I guess you did not like any of the others pointed out
    here?
    http://www.iris.washington.edu/pipermail/webservices/2014-July/000583.html
    
    Most of these arguments are not related directly to StationXML, but to the
    empty location code. However, those that are related to the empty location code
    appear to concern the inability to distinguish between an attribute that is not
    supplied and an empty string attribute. If you make the locationCode optional,
    it seems like you are in the same boat. If it is not specified, what do
    you use for location code? blank-blank? Then that is the same logic you
    use if your query does not return a location code.
    
    Your example of:
    %{net}{sta}{loc}{chan} = "some lvalue"
    is not a good one, because of no separation between components.
    How do you distinguish between
    net = G, sta = ABCD
    and net = GA, sta = BCD?
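
    The ambiguity is easy to reproduce. In this Python sketch (the helper joined_key is hypothetical), the two cases above produce the same key when the codes are concatenated without a separator, and distinct keys once a separator is used, even though the location is empty in both.

```python
def joined_key(net: str, sta: str, loc: str, cha: str, sep: str = "") -> str:
    """Join the four codes into one key, optionally with a separator."""
    return sep.join([net, sta, loc, cha])


a = ("G", "ABCD", "", "BHZ")
b = ("GA", "BCD", "", "BHZ")

# Without a separator both channels collapse to the same key.
print(joined_key(*a), joined_key(*b), joined_key(*a) == joined_key(*b))

# With a separator the keys stay distinct, empty location included.
print(joined_key(*a, sep="."), joined_key(*b, sep="."))
```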
    
    If you want a non-Oracle database example, the 'ltree' data type in
    Postgres is a natural fit for N.S.L.C hierarchical data and it cannot
    take a blank identifier either. I do not see how the number of pain
    points with empty identifiers will not grow over time.
    
    The proposal for a "--" location ID was to change SEED by starting
    with StationXML as a transition. The first step could be done without
    changing all the miniSEED in all the archives, the next step could be
    done with a future revision of miniSEED. This would require mapping,
    which we are already doing for requests and will continue to do
    indefinitely. For sure this would be a non-trivial change over time,
    the question is whether it is worth it or not.
    
    I don't see anything in your original proposal about changing SEED.
    I only see a proposal to change the SEED representation in StationXML.
    
    If we are going to continue to shoot ourselves in the foot with unset
    location IDs let's do so with clear eyes, the problems are not
    limited to esoteric software or use cases. Also, a blank string is
    not the only choice, more on that next.
    
    I agree with the above statement. However, trying to address the issue
    just within StationXML I think is just another bandaid, and I don't see
    why the StationXML needs this bandaid.
    
    Since StationXML does not appear to need this bandaid, I don't understand
    the need.
    
    IF YOU WANT TO CHANGE SEED, THEN PROPOSE TO CHANGE SEED.
    
    - Doug N
    
    Chad
    
    PS. The TA started 10 years ago and followed the common conventions of the time; that network now has many non-blank IDs. The GSN has converted so that few, if any, blank IDs remain and, ironically?, the BK network appears to use many non-blank location IDs too. Not sure how important it is, but it does show the trend towards increased use of non-blank location IDs.
    
    On 07/23/2014 10:30 AM, Chad Trabant wrote:
    
    Hello WS users and developers,
    
    A recent discussion between FDSN data centers is centered on
    representation of empty location IDs in StationXML, the default
    format returned by the fdsnws-station web service. The DMC may be
    changing how it represents location ID in XML and text formats based
    on these discussions. We are asking for input as any such change will
    affect users of our metadata service.
    
    Some background: In the SEED channel naming scheme there is a
    hierarchy of network, station, location and channel identifiers. Of
    these, it is only the location ID that is commonly accepted to be
    empty. In the SEED format the location ID is a two-character field,
    where the value is left justified and padded with spaces if needed.
    When the value is empty the field is simply two spaces of padding.
    
    Historically, and presumably to avoid having an empty location ID,
    the DMC has represented “empty” location IDs as a string of two
    spaces. Following this practice, we express this in StationXML by
    setting the locationCode attribute to a string of two spaces. We have
    done this so long we sometimes forget that it is not compliant with a
    strict reading of SEED, at best it falls into the vagaries of SEED,
    on the other hand we have been doing it for years with no apparent
    problems (in fact it has helpfully avoided an empty core
    identifier).
    
    There now exists another fdsnws-station implementation that returns
    StationXML with the locationCode attribute set to an empty string
    when the SEED value is empty. The justification is that this follows
    the SEED rules of trimming the padding spaces from the values.
    
    Unfortunately this means there are now flavors of StationXML that are
    incompatible in the core channel name identifiers. In other words,
    two StationXML documents for the same SEED channel appear, without
    extra field translation, to be different channels.
    
    As most of you are users of SEED and StationXML metadata (at some
    level) and some of you have written code to parse these formats and
    manage the data returned by the DMC and other FDSN data centers, we
    are asking for your input regarding the potential solutions.
    
    Here are the options being considered for mapping an empty location
    ID in SEED to StationXML:
    
    1) Set locationCode to two spaces. While the DMC and users have been
    using this for a long while, it is not precisely the SEED value (but
    the mapping could be formalized). Also, whitespace in attributes does
    have some theoretical challenges: the wonky rules for XML attributes
    related to whitespace handling require removal of spaces in some
    cases (we have never heard of problems though).
    
    2) Set locationCode to an empty string. This would match the strict
    value present in SEED, an empty identifier.
    
    3) Set locationCode to “--” (two dashes). This avoids issues with
    whitespace in XML attribute values and avoids issues with an empty
    identifier. Also, this matches the request mechanisms where “--” is
    accepted as a synonym for an empty location ID.
    
    All of these solutions are viable in that we can make them work in
    code; it is a matter of choosing one for future FDSN metadata. Pick
    your poison, so to speak.
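
To make the incompatibility concrete, here is a minimal sketch of the translation a client would need today to treat all three flavors as the same channel. The function names are hypothetical, the network and station codes are made up, and treating a missing value (None) as empty is an assumption rather than anything the thread agreed on.

```python
from typing import Optional, Tuple

# "--" is the request-side synonym for an empty location ID mentioned above.
DASH_PLACEHOLDER = "--"


def canonical_location(code: Optional[str]) -> str:
    """Map every 'empty' flavor (spaces, empty string, dashes) to ""."""
    if code is None:  # assumption: an absent value is treated as empty
        return ""
    trimmed = code.strip(" ")
    if trimmed == DASH_PLACEHOLDER:
        return ""
    return trimmed


def channel_key(net: str, sta: str, loc: Optional[str], cha: str) -> Tuple[str, str, str, str]:
    """Build a comparable key with padding removed from every identifier."""
    return (net.strip(" "), sta.strip(" "), canonical_location(loc), cha.strip(" "))


def same_channel(a, b) -> bool:
    """Compare two (net, sta, loc, cha) tuples after normalization."""
    return channel_key(*a) == channel_key(*b)
```

Without a step like this, `("XX", "STA1", "  ", "BHZ")` and `("XX", "STA1", "", "BHZ")` compare as different channels, which is the problem described above.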
    
    In my personal opinion, an empty location ID is an unfortunate quirk
    of SEED that we should rectify in StationXML. An empty identifier can
    be confused for “unknown” if the programmer is not careful, which is
    semantically very different from “set to empty”. The two-space
    strings that the DMC is currently using are also not ideal: they are
    hard for humans to read and potentially weird with XML rules. The
    dashed location ID avoids these issues but requires the most change.
    I also think requiring all readers of StationXML to translate (e.g.
    remove padding) is a bad idea; the values in SEED should be uniquely
    mapped to values in StationXML.
    
    Thanks for reading this far. Your opinion and input are appreciated.
    
    regards,
    Chad
    
    _______________________________________________
    webservices mailing list
    webservices<at>iris.washington.edu
    http://www.iris.washington.edu/mailman/listinfo/webservices
    
    --
    ------------------------------------------------------------------------
    Doug Neuhauser University of California, Berkeley
    doug<at>seismo.berkeley.edu Berkeley Seismological Laboratory
    Office: 510-642-0931 215 McCone Hall # 4760
    Fax: 510-643-5811 Berkeley, CA 94720-4760
    Remote: 530-752-5615 (Wed,Fri)
    
    Chad Trabant
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-08-13 00:45:22
    
    On Aug 12, 2014, at 4:12 PM, Doug Neuhauser <doug<at>seismo.berkeley.edu> wrote:
    
    On 08/12/2014 02:31 PM, Chad Trabant wrote:
    
    Hi Doug,
    
    Thanks for your 2 cents.
    
    Regarding only certain software being the problem with blank
    location, I guess you did not like any of the others pointed out
    here?
    http://www.iris.washington.edu/pipermail/webservices/2014-July/000583.html
    
    Most of these arguments are not related directly to stationxml, but to the
    empty location code. However, those that are related to empty location code
    appear to be the inability to distinguish between an attribute that is not
    supplied vs an empty string attribute. If you make the LocationCode optional,
    it seems like you are in the same boat. If it is not specified, what do
    you use for location code? blank-blank? Then that is the same logic you
    use if your query does not return a location code.
    
    I completely agree, mostly the same boat. The points were: a) empty IDs have challenges that are not limited to any esoteric software and b) if a value is unset, why represent it with anything at all? Playing devil's advocate: why is location special?
    
    Your example of:
    %{net}{sta}{loc}{chan} = "some lvalue"
    is not a good one, because of no separation between components.
    How do you distinguish between
    net = G, sta = ABCD
    and net = GA, sta = BCD?
    
    Those are completely distinct values in a nested hash (they are not a concatenated string). {G}{ABCD} is a different path than {GA}{BCD}.
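
The distinction between a nested hash and a concatenated key can be sketched as follows. This uses Python dictionaries in place of the hash syntax quoted above; `set_channel` is a hypothetical helper and the codes are the ones from the example.

```python
def set_channel(tree: dict, net: str, sta: str, loc: str, chan: str, value: str) -> None:
    """Store a value under the nested path net -> sta -> loc -> chan."""
    tree.setdefault(net, {}).setdefault(sta, {}).setdefault(loc, {})[chan] = value


inventory: dict = {}
set_channel(inventory, "G", "ABCD", "00", "BHZ", "first")
set_channel(inventory, "GA", "BCD", "00", "BHZ", "second")

# Nested paths stay distinct: each identifier is its own level of the tree.
nested_distinct = inventory["G"]["ABCD"] is not inventory["GA"]["BCD"]

# A naive concatenated key would collide for the same two channels.
concatenated_collide = ("G" + "ABCD" + "00" + "BHZ") == ("GA" + "BCD" + "00" + "BHZ")
```

A Python dictionary happens to accept an empty string as a key, so this sketch illustrates only the path-versus-concatenation point, not the storage systems that reject empty identifiers.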
    
    If you want a non-Oracle database example, the 'ltree' data type in
    Postgres is a natural fit for N.S.L.C hierarchical data and it cannot
    take a blank identifier either. I do not see how the number of pain
    points with empty identifiers will not grow over time.
    
    The proposal for a "--" location ID was to change SEED by starting
    with StationXML as a transition. The first step could be done without
    changing all the miniSEED in all the archives, the next step could be
    done with a future revision in miniSEED. This would require mapping,
    which we are already doing for requests and will continue to do
    indefinitely. For sure this would be a non-trivial change over time;
    the question is whether it is worth it or not.
    
    I don't see anything in your original proposal about changing SEED.
    I only see a proposal to change the SEED representation in StationXML.
    
    Indeed the 2nd step was not explicitly described, it would be another proposal.
    
    If we are going to continue to shoot ourselves in the foot with unset
    location IDs let's do so with clear eyes, the problems are not
    limited to esoteric software or use cases. Also, a blank string is
    not the only choice, more on that next.
    
    I agree with the above statement. However, trying to address the issue
    just within StationXML I think is just another bandaid, and I don't see
    why the StationXML needs this bandaid.
    
    Well, the "bandaid" would be a first step away from empty location IDs. You agree they are problematic but the solution is not radical enough? You prefer all the changes proposed at once, fair enough.
    
    Since StationXML does not appear to need this bandaid, I don't understand
    the need.
    
    IF YOU WANT TO CHANGE SEED, THEN PROPOSE TO CHANGE SEED.
    
    It depends on what you mean by SEED. StationXML IS SEED for most intents and purposes. Changing all aspects of SEED at once is a much larger can of worms, and this would be an opportune time to change just the StationXML representation of SEED. Over the next couple of months and years as folks convert from dataless SEED to StationXML, an opportunity exists to make such low level changes, which will get much harder once the adoption is farther along.
    
    Chad
    
    - Doug N
    
    Chad
    
    PS. The TA started 10 years ago and followed common conventions at the time; that network now has many non-blank IDs. The GSN has converted to few or no blank IDs and, ironically?, the BK network appears to use many non-blank location IDs too. Not sure how important it is, but it does show the trend towards increased use of non-blank location IDs.
    
    On Aug 12, 2014, at 10:53 AM, Doug Neuhauser <doug<at>seismo.berkeley.edu> wrote:
    
    I've been following this thread, and thought it was time to chime in.
    
    IMHO, the FDSN web services should follow the SEED convention.
    The SEED convention states that station, network, channel, and location
    are all blank-padded fields of fixed lengths.
    To me, this means that we should either use the full blank-padded
    fields for ALL of these identifiers, or for none of them.
    
    e.g.:
    
    <Network code="G " >
    <Station code="KIP  ">
    <Channel locationCode="  " code="BHZ">
    
    or
    
    <Network code="G" >
    <Station code="KIP">
    <Channel locationCode="" code="BHZ">
    
    Personally I think the latter (blank trimmed) is better.
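
A small sketch of what a client sees for the two conventions, using Python's standard `xml.etree.ElementTree`. The snippets are simplified, closed-off versions of the fragments above (no namespace, padding widths assumed from the SEED field lengths), and `channel_ids` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

PADDED = (
    '<Network code="G "><Station code="KIP  ">'
    '<Channel locationCode="  " code="BHZ"/></Station></Network>'
)
TRIMMED = (
    '<Network code="G"><Station code="KIP">'
    '<Channel locationCode="" code="BHZ"/></Station></Network>'
)


def channel_ids(xml_text: str, trim: bool = False) -> list:
    """Return (net, sta, loc, cha) tuples, optionally with padding removed."""
    network = ET.fromstring(xml_text)
    ids = []
    for station in network.iter("Station"):
        for channel in station.iter("Channel"):
            parts = (
                network.get("code"),
                station.get("code"),
                channel.get("locationCode"),
                channel.get("code"),
            )
            if trim:
                parts = tuple(p.strip(" ") for p in parts)
            ids.append(parts)
    return ids
```

Read as-is, the two documents yield different identifiers. They agree only when the reader trims every field, which is the extra translation step under discussion.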
    
    I agree that the blank location code is a pain when dealing with
    Oracle, white-space delimited fields such as command lines, etc,
    but unless we change the SEED convention, I don't see that making
    an alias of "-" or "--" in FDSN station XML improves the situation.
    
    AFAIK, the ONLY reason that we struggle with the two-blank issue is
    that certain software (eg Oracle) cannot distinguish between the
    empty string (string of length 0) and NULL. Therefore, the DMC,
    NCEDC, AQMS, etc have been forced to not use a blank-trimmed string
    for the location code.
    
    Unless we propose to change the SEED standard, all of our data in
    our archives, and all of our current acquisition systems, I think
    that we have to live with "empty" location codes.
    
    I have not seen any compelling argument for representing a blank (empty)
    location code in FDSN station XML as anything but the empty string.
    
    If you want to have "" and " " be equivalent in FDSN station XML,
    you can simply change the schema definition of the field to be a "token"
    rather than a "string", in which case any representation with blanks will
    be reduced to the empty string. Problem solved?
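
For readers unfamiliar with the "token" type, here is a rough emulation of the XML Schema whiteSpace="collapse" rule that applies to it. A schema-aware parser would apply the rule itself; the function name is hypothetical, and a parser that does not validate against the schema (as many clients do not) would still see the raw spaces.

```python
import re


def collapse_like_xs_token(value: str) -> str:
    """Emulate XML Schema whitespace collapsing for a token-typed value."""
    # Step 1: tabs, line feeds and carriage returns become spaces.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    # Step 2: runs of spaces collapse to one; leading/trailing spaces go.
    return re.sub(r" +", " ", replaced).strip(" ")
```

Under this rule `""`, `" "` and `"  "` all reduce to the empty string, while `"--"` is left unchanged, so the dashed flavor would still need its own mapping.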
    
    I note that the NCEDC implementation currently uses 1 blank " "
    for empty location code. I have no problem changing this if we can
    agree on a convention.
    
    I also note ironically that the TA network run by IRIS is one of the
    largest networks in terms of stations, and uses blank location codes.
    
    My 2 cents...
    
    - Doug N
    
    Philip Crotwell
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-08-13 03:50:25
    
    On Tue, Aug 12, 2014 at 7:12 PM, Doug Neuhauser
    <doug<at>seismo.berkeley.edu> wrote:
    
    IF YOU WANT TO CHANGE SEED, THEN PROPOSE TO CHANGE SEED.
    
    - Doug N
    
    OK, I would like to offer a proposal to change SEED to eliminate
    blank location ids.
    
    Wait, can I do that?
    :)
    
    Philip
    
    Yazan Suleiman
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-08-13 04:17:19
    
    SEED represents unidentified locations as “  ” [2 empty spaces] (This is a
    limitation of SEED), while most (if not all) modern languages represent
    such a thing as NULL. NULL does not equal “” or “  ” or “--”. NULL is NULL
    and should be represented as such. While SEED is limited, XML is not. Why
    should we incorporate SEED limitations into any new XML schema?
    
    Unidentified attributes (values=unidentified or not provided or null) are
    omitted in XML. When an attribute does not appear in the document, then it
    is NULL (no confusion there).
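
A short sketch of the distinction between an omitted attribute and an empty one, as seen from Python's standard `xml.etree.ElementTree`. The bare `<Channel>` elements are simplified stand-ins for real StationXML, and `location_of` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def location_of(channel_xml: str) -> Optional[str]:
    """Return the locationCode attribute, or None when it is absent."""
    return ET.fromstring(channel_xml).get("locationCode")


absent = location_of('<Channel code="BHZ"/>')                    # attribute omitted
empty = location_of('<Channel locationCode="" code="BHZ"/>')     # set to empty
dashed = location_of('<Channel locationCode="--" code="BHZ"/>')  # placeholder
```

Here `absent` is `None` while `empty` is a zero-length string, so a careful client can tell them apart. A check such as `if not location:` treats both the same way, which is the kind of confusion raised earlier in the thread.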
    
    In my opinion StationXML shouldn’t force me to provide a value for an
    attribute that is unknown to me. While the purpose of the StationXML schema is
    to map between SEED and XML, limitations of SEED shouldn’t be carried over.
    
    For historical data, software should take care of any conversion needed.
    What you store in your database is outside the scope of StationXML.
    
    Ellen Yu
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-08-13 16:50:27
    
    Hear hear Yazan!
    
    Ellen
    
    Chad Trabant
    
    Re: A question of location ID, how to represent empty IDs in XML?
    
    2014-08-13 22:12:29
    
    On Tue, Aug 12, 2014 at 7:12 PM, Doug Neuhauser
    <doug<at>seismo.berkeley.edu> wrote:
    
    IF YOU WANT TO CHANGE SEED, THEN PROPOSE TO CHANGE SEED.
    
    - Doug N
    
    Hi all,
    
    In fact, changing SEED is the ultimate goal, and while the proposal was to start that process with StationXML, Doug is correct that the core issue is with the SEED rules themselves. With that, it's probably time to move any SEED-changing discussion to the FDSN mailing lists. Thank you to the users who voiced an opinion; you are welcome to continue to chime in with thoughts here if you would like.
    
    The idea of changing the schema type for locationCode to a token is appealing if that will help make the now 3 flavors of locationCode more compatible to XML-parsing clients. We should discuss this more in the FDSN context; it might buy us more time to discuss and address the lower level issue. Unfortunately, it does not address the text output used by many.
    
    Chad
