Hello WS users and developers,
A recent discussion between FDSN data centers is centered on representation of empty location IDs in StationXML, the default format returned by the fdsnws-station web service. The DMC may be changing how it represents location ID in XML and text formats based on these discussions. We are asking for input as any such change will affect users of our metadata service.
Some background: In the SEED channel naming scheme there is a hierarchy of network, station, location and channel identifiers. Of these, it is only the location ID that is commonly accepted to be empty. In the SEED format the location ID is a two-character field, where the value is left justified and padded with spaces if needed. When the value is empty the field is simply two spaces of padding.
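The padding rule described above can be sketched in a few lines of Python. This is only an illustration of a two-character, left-justified, space-padded field; the function names are made up for the example and do not come from any SEED library.

```python
def pack_location(code: str) -> bytes:
    """Encode a location ID as a 2-byte, left-justified, space-padded field."""
    if len(code) > 2:
        raise ValueError("location ID is at most two characters")
    return code.ljust(2).encode("ascii")


def unpack_location(field: bytes) -> str:
    """Decode the 2-byte field and trim the trailing padding spaces."""
    return field.decode("ascii").rstrip(" ")


print(pack_location("00"))            # b'00'
print(pack_location(""))              # b'  '  (two spaces of padding)
print(repr(unpack_location(b"  ")))   # ''    (empty after trimming)
```

Note that trimming the padding is what turns the stored two spaces back into an empty value, which is the crux of the discussion that follows.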
Historically, and presumably to avoid having an empty location ID, the DMC has represented “empty” location IDs as a string of two spaces. Following this practice, we express this in StationXML by setting the locationCode attribute to a string of two spaces. We have done this so long that we sometimes forget it is not compliant with a strict reading of SEED; at best it falls into the vagaries of SEED. On the other hand, we have been doing it for years with no apparent problems (in fact it has helpfully avoided an empty core identifier).
There now exists another fdsnws-station implementation that returns StationXML with the locationCode attribute set to an empty string when the SEED value is empty. The justification is that this follows the SEED rules of trimming the padding spaces from the values.
Unfortunately this means there are now flavors of StationXML that are incompatible in the core channel name identifiers. In other words, two StationXML documents for the same SEED channel appear, without extra field translation, to be different channels.
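To make the incompatibility concrete, here is a small Python sketch. The network and station codes are hypothetical, and the tuple layout is just one assumed way a client might key its channels.

```python
# Hypothetical network/station codes, for illustration only.
key_two_spaces = ("XX", "STA1", "  ", "BHZ")  # empty location written as two spaces
key_empty = ("XX", "STA1", "", "BHZ")         # empty location written as empty string


def normalize_key(key):
    """The 'extra field translation': strip padding from the location ID."""
    net, sta, loc, cha = key
    return (net, sta, loc.strip(), cha)


print(key_two_spaces == key_empty)                                # False
print(normalize_key(key_two_spaces) == normalize_key(key_empty))  # True
```

Without the translation step, a lookup table built from one flavor of StationXML will not match identifiers read from the other.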
As most of you are users of SEED and StationXML metadata (at some level) and some of you have written code to parse these formats and manage the data returned by the DMC and other FDSN data centers, we are asking for your input regarding the potential solutions.
Here are the options being considered for mapping an empty location ID in SEED to StationXML:
1) Set locationCode to two spaces. While the DMC and users have been using this for a long while, it is not precisely the SEED value (but the mapping could be formalized). Also, whitespace in attributes does have some theoretical challenges: the wonky rules for XML attributes related to whitespace handling require removal of spaces in some cases (we have never heard of problems though).
2) Set locationCode to an empty string. This would match the strict value present in SEED, an empty identifier.
3) Set locationCode to “--” (two dashes). This avoids issues with whitespace in XML attribute values and avoids issues with an empty identifier. Also, this matches the request mechanisms where “--” is accepted as a synonym for an empty location ID.
All of these solutions are viable in that we can make them work in code; it is a matter of choosing one for future FDSN metadata. Pick your poison, so to speak.
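As a rough check of how the three candidate values survive a round trip through a parser, the sketch below uses Python's standard xml.etree.ElementTree on a stripped-down Channel element (not a complete StationXML document). It assumes a non-validating parse with no DTD or schema attached; the XML specification additionally trims and collapses spaces for attributes declared with a non-CDATA type, which is the theoretical concern raised in option 1.

```python
import xml.etree.ElementTree as ET


def parsed_location(value: str):
    """Parse a one-element document and return locationCode as the parser reports it."""
    elem = ET.fromstring(f'<Channel code="BHE" locationCode="{value}"/>')
    return elem.get("locationCode")


for value in ("  ", "", "--"):
    print(repr(value), "->", repr(parsed_location(value)))
```

With this parser all three values come back unchanged, so the differences between the options are about convention and readability rather than whether the values can be represented at all.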
In my personal opinion, an empty location ID is an unfortunate quirk of SEED that we should rectify in StationXML. An empty identifier can be confused for “unknown” if the programmer is not careful, which is semantically very different from “set to empty”. The two-space strings that the DMC is currently using are also not ideal: they are hard for humans to read and potentially weird with XML rules. The dashed location ID avoids these issues but requires the most change. I also think requiring all readers of StationXML to translate (e.g. remove padding) is a bad idea; the values in SEED should be uniquely mapped to values in StationXML.
Thanks for reading this far. Your opinion and input are appreciated.
regards,
Chad
-
Philip Crotwell 2014-07-23 21:37:09
Hi
Years ago we had full SEED. Then because of keeping metadata updated,
we switched to a separation into dataless SEED + miniseed. Now,
because of the complexities and limitations of dataless SEED, the
future looks like StationXML + miniseed. I am all for this change, but
how the location id is resolved really needs to address not just what
do we do in StationXML, but what do we do in StationXML + miniseed.
I also lean towards "--" for the simple reason that there are so many
instances where I have been bitten by spaces or nulls. Even though I
know about this, I still get caught. File names, urls, user gui
displays, etc. all have problems with spaces or nulls and as a
practical matter it is harder to see something that isn't there than
something that is there. Furthermore, using null or space-space is
really hard as a command line argument in the shell. That said, "--"
already means "long option name" in many *nix programs, so if we were
starting from scratch, underscores like "__" might be a better choice.
The SEED manual already lists underscore as a separate item in the
flags section (p32), so maybe worth considering.
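The URL point can be illustrated with Python's standard urllib. The parameter names and the network code below are placeholders chosen for the example, not a statement about any particular service.

```python
from urllib.parse import urlencode


def query_for(loc: str) -> str:
    """Build a query string for a placeholder network code and the given location ID."""
    return urlencode({"net": "XX", "loc": loc})


for loc in ("  ", "", "--", "__"):
    print(repr(loc), "->", query_for(loc))
# '  ' -> net=XX&loc=++
# ''   -> net=XX&loc=
# '--' -> net=XX&loc=--
# '__' -> net=XX&loc=__
```

The two-space and empty values are either rewritten by the encoder or invisible in the result, while the dashed and underscore forms pass through as typed.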
But if option 3 is chosen, would there be any possibility of amending
the SEED spec so that "--" is actually valid within the location id
field, with the caveat that it is synonymous with space-space/null,
but "--" is the preferred value? I realize that doing a global search
and replace on a petabyte of miniseed data is probably not going to
happen, but it would be really nice if whatever location id is in
StationXML, it is exactly 2 characters and is the exact same 2
characters as in miniseed.
Frankly the whole idea of making location ids "optional" was a real
mistake IMHO. I am sure that anyone that has ever written code to
deal with location ids has something that looks like:
if (locid == null or locid == "" or locid == " " or locid == "--")
then locid = "--"
which is just a painfully stupid thing to have to do over and over and
over again. Grumble grumble grumble. :(
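For reference, the pseudocode above might look like this in Python. It is a minimal sketch that assumes "--" is the canonical form (option 3 in Chad's list); the function and constant names are invented for the example.

```python
from typing import Optional

EMPTY_LOCATION = "--"  # assumed canonical form for an empty location ID


def normalize_location(locid: Optional[str]) -> str:
    """Map every common spelling of 'no location ID' onto one canonical string."""
    if locid is None or locid.strip() == "" or locid == EMPTY_LOCATION:
        return EMPTY_LOCATION
    return locid


for value in (None, "", " ", "  ", "--", "00"):
    print(repr(value), "->", repr(normalize_location(value)))
```

Every input except "00" maps to "--", which is exactly the repeated translation step that a single agreed string would make unnecessary.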
Lastly, as far as I can tell the SEED spec doesn't disallow null/empty
station or channel codes, so addressing that at the same time might be
wise.
My $0.02, please pick one string, and only one string, and use it everywhere.
thanks
Philip
On Wed, Jul 23, 2014 at 1:30 PM, Chad Trabant <chad<at>iris.washington.edu> wrote:
-
On Jul 23, 2014, at 11:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
Hi
Hi Philip, thanks for your thoughts.
Years ago we had full SEED. Then because of keeping metadata updated,
we switched to a separation into dataless SEED + miniseed. Now,
because of the complexities and limitations of dataless SEED, the
future looks like StationXML + miniseed. I am all for this change, but
how the location id is resolved really needs to address not just what
do we do in StationXML, but what do we do in StationXML + miniseed.
I also lean towards "--" for the simple reason that there are so many
instances where I have been bitten by spaces or nulls. Even though I
know about this, I still get caught. File names, urls, user gui
displays, etc. all have problems with spaces or nulls and as a
practical matter it is harder to see something that isn't there than
something that is there. Furthermore, using null or space-space is
really hard as a command line argument in the shell. That said, "--"
already means "long option name" in many *nix programs, so if we were
starting from scratch, underscores like "__" might be a better choice.
The SEED manual already lists underscore as a separate item in the
flags section (p32), so maybe worth considering.
The underscore character is certainly another option. What I do not like about it is its low readability; in URLs in particular, underscores can become completely lost.
But if option 3 is chosen, would there be any possibility of amending
the SEED spec so that "--" is actually valid within the location id
field, with the caveat that it is synonymous with space-space/null,
but "--" is the preferred value? I realize that doing a global search
and replace on a petabyte of miniseed data is probably not going to
happen, but it would be really nice if whatever location id is in
StationXML, it is exactly 2 characters and is the exact same 2
characters as in miniseed.
If the FDSN were to go the route of "--" in StationXML it seems natural to extend the conversation to potential changes in SEED headers and data records. That is just a bigger can of worms and would take more time to address. The idea that the two should be treated synonymously is just what I have in mind and would allow us to transition over time.
Frankly the whole idea of making location ids "optional" was a real
mistake IMHO. I am sure that anyone that has ever written code to
deal with location ids has something that looks like:
if (locid == null or locid == "" or locid == " " or locid == "--")
then locid = "--"
which is just a painfully stupid thing to have to do over and over and
over again. Grumble grumble grumble. :(
Indeed, this should be clarified in the SEED spec.
Lastly, as far as I can tell the SEED spec doesn't disallow null/empty
station or channel codes, so addressing that at the same time might be
wise.
Chad
-
Yazan Suleiman 2014-07-24 16:29:55
Is modifying the StationXML schema (to allow a null location, required=false) a
possibility? Example:
<Channel startDate="1992-09-23T00:00:00" restrictedStatus="open" endDate="1994-04-01T00:00:00" code="BHE">
vs
<Channel locationCode=" " startDate="1992-09-23T00:00:00" restrictedStatus="open" endDate="1994-04-01T00:00:00" code="BHE">
vs
<Channel locationCode="--" startDate="1992-09-23T00:00:00" restrictedStatus="open" endDate="1994-04-01T00:00:00" code="BHE">
It is very reasonable to have a null value for location in any object
representation of station schema. " " or "" is inaccurate and only
introduces more trouble and complexity.
If changing the schema is not an option then " " or "" is a very bad idea.
Many parsers treat "" or " " as empty and will ignore them. If
translating this into SEED is the issue, then it is the converter's
responsibility to take care of the conversion.
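A quick sketch of what a reader would see for the three variants, using Python's standard xml.etree.ElementTree on shortened Channel elements (most attributes dropped for brevity, and the space variant written with two spaces). The helper name is invented for the example.

```python
import xml.etree.ElementTree as ET


def location_of(xml_text: str):
    """Return the locationCode attribute, or None when the attribute is absent."""
    return ET.fromstring(xml_text).get("locationCode")


ABSENT = '<Channel code="BHE"/>'
SPACES = '<Channel locationCode="  " code="BHE"/>'
DASHES = '<Channel locationCode="--" code="BHE"/>'

print(repr(location_of(ABSENT)))  # None
print(repr(location_of(SPACES)))  # '  '
print(repr(location_of(DASHES)))  # '--'
```

With an optional attribute, "no location" surfaces as a missing value (None here) instead of a special string, which is the object representation suggested above.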
Yazan
-
Hi WS folks,
For what it's worth, I would also vote for the "--" standard. To quote from the Zen of Python (my language of choice):
"Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced."
Number 2 is especially relevant here:
"Explicit is better than implicit."
Quoted whitespace and nulls are painful. Code what you mean, and mean what you code. It's easier for everyone.
Just my $0.02.
- Rob Newman, IRIS DMC
-
Hello Rob,
Rob Newman wrote on 24.07.2014 18:51:
For what it's worth, I would also vote for the "--" standard. To quote from the Zen of Python http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html (my language of choice):
"Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced."
I'd add "Compatible is better than incompatible." :)
Number 2 is especially relevant here:
"Explicit is better than implicit."
My favorite would be:
"Special cases aren't special enough to break the rules."
Quoted whitespace and nulls are painful. Code what you mean, and mean what you code. It's easier for everyone.
But what if we simply *mean* "empty string"?
The issue is not about beauty, pain or ease. It's about standard
conformance. We already have a channel naming standard. If a new data
format cannot accommodate existing channel naming, then the new format
is flawed. But that's not even the case here...
An XML document that contains
<Channel locationCode="" ...
is not malformed. There's an attribute that *explicitly* contains an
empty string and a parser has to produce it as such. Not as null, nil or
none, but as an empty string. Otherwise the parser is broken and needs
to be fixed, not the data!
Again: It's not about beauty. We all agree that current channel naming
is not particularly beautiful and has limitations. But our business is
not to try to solve that issue now and here.
Cheers
Joachim
-
Philip Crotwell 2014-07-25 16:35:45
It sounds like you are saying "change is hard, so we shouldn't do it".
I would argue that change is hard and so if we don't do it now it will
never happen. StationXML is new enough that there is already a
disruption, we should seize the chance. If we do not do something now
about null loc ids, it will be a decade or two before we get another
chance.
It is time to drive the stake through the heart of null location ids.
Kill the evil while we have a chance.
Philip
On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
-
Marcelo Bianchi 2014-07-26 06:38:17
Hi Philip and All,
I totally agree with Joachim; I was planning to answer but he was much
faster. What you guys are proposing is not a solution. StationXML
supports the empty string nicely, and it is not null. There is a type
difference here in Python and in any other language, and it can be
handled nicely internally.
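A minimal sketch of that type difference, assuming Python's standard xml.etree.ElementTree parser (the XML fragments are made up for illustration, and other parsers or languages may behave differently): an attribute that is present but empty comes back as an empty string, while an attribute that is absent comes back as None.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments; only the locationCode attribute matters here.
with_empty = ET.fromstring('<Channel locationCode="" code="BHZ"/>')
without_attr = ET.fromstring('<Channel code="BHZ"/>')

empty_value = with_empty.get("locationCode")      # attribute present, empty
missing_value = without_attr.get("locationCode")  # attribute absent

print(repr(empty_value))    # ''
print(repr(missing_value))  # None
```
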
Also, the location ID is not just a string; it is a key that links
miniSEED to metadata, and making an exception at this level just
because a user interface cannot properly render it without ambiguity
does not sound like a proper proposal. I am not in favor of
creating an exception that will have to be carried along for
decades to come. Alternative solutions for this issue should be
sought in the end-user interface.
with my best regards,
Marcelo Bianchi
-
Hi Marcelo,
Thanks for your thoughts as well. Something that you and Joachim are not addressing is the set of concerns about an empty ID that have been brought up by more than one person. The answer that empty strings are technically possible and it all works in Python/SeisComP is less than satisfying. The observations from Python, ObsPy and SeisComP are a few of many that need to be taken into account.
I agree that there is a long tail consideration for the "--" location ID solution. If you understand that some folks find an empty ID to be problematic regardless of whether it is XML, SEED, text, whatever, then you might see where this proposal comes from. Yes, we would need to treat empty location IDs and "--" as synonyms for a very long time. Empty strings in XML mean you will need to map empty IDs to empty strings, NULL and whatever an XML parser might or might not produce for a long time as well (think beyond Python and SeisComP). Either is possible, but only one of them is a unique mapping.
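A minimal sketch of what treating these spellings as synonyms could look like in client code. The helper name normalize_location is hypothetical, and returning "" as the canonical form is an arbitrary choice made only for illustration, not a statement of what the FDSN or the DMC will adopt:

```python
def normalize_location(loc):
    """Map the known spellings of an empty location ID to one canonical form.

    Hypothetical helper: treats None, "", any all-space string and "--"
    as the same empty ID and returns "" for all of them. Non-empty IDs
    are returned with SEED padding spaces trimmed.
    """
    if loc is None:
        return ""
    stripped = loc.strip()
    return "" if stripped in ("", "--") else stripped


variants = [None, "", " ", "  ", "--", "00", "10"]
normalized = [normalize_location(v) for v in variants]
print(normalized)
```

Every client that compares channel names would need a step like this for as long as more than one spelling is in circulation.
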
If the main consideration is the least amount of disruption, then the answer is obvious to me: the FDSN can sanction that the two-space string is the XML synonym for the empty SEED location ID and we adjust the schema to make sure a string of whitespace is preserved. Then SeisComP can change its relatively new StationXML implementation and ALL existing clients will be compatible with all metadata and, most importantly, we would have consistent metadata.
If the empty string ID representation is adopted it would, in effect, mean that the DMC would need to change its metadata service and (more importantly) all users of the DMC's metadata service would need to transition to a new metadata channel naming scheme. This is certainly not out of the question, but it is not something we would do without careful consideration. I do not find the two-space strings all that great, but they are here and something the DMC and users of the DMC have dealt with. Issues have been identified with empty location IDs by us and our users. If the DMC is going to change, and push the change on all users of the DMC's StationXML, it would be much more compelling to have a solution that addresses the low level issues.
regards,
Chad
-
Hello all,
Can someone give a concise statement of the original problem being
discussed? Is it only or primarily a concern about XML?
It seems to me that with modern languages a string that is empty or has
1-N spaces is the same thing - there are often implicit or explicit
trim() functions hiding in a processing pipeline. A null string is not
the same. So an empty or blank string is the same, valid location code,
and null is an undefined or uninitialized location code.
With regards to the "--" pseudo value for the location code, is this not
needed because sometimes it is impossible or difficult to represent an
empty string or even a string? For example on the command line or in a
restful WS URI? (Or a URI on the command line!) So it may be that the
use of "--" for intermediate processing and requests could be tolerated
and somehow made official, while empty or only-blanks strings remain
official for persistent data.
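A small sketch of the request side, using the IRIS station service URL that appears elsewhere in this thread. The helper name station_query_url is hypothetical, and both the "loc" parameter name and the service accepting "--" for an empty location ID are assumptions here:

```python
from urllib.parse import urlencode

BASE_URL = "http://service.iris.edu/fdsnws/station/1/query"


def station_query_url(net, sta, loc, cha):
    # An empty query value is easy to lose or misread in a URL,
    # so spell an empty or all-blank location ID as "--".
    loc_param = "--" if loc.strip() == "" else loc
    params = {
        "net": net,
        "sta": sta,
        "loc": loc_param,
        "cha": cha,
        "level": "channel",
        "format": "xml",
    }
    return BASE_URL + "?" + urlencode(params)


url = station_query_url("GE", "UGM", "", "BHZ")
print(url)
```
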
Just my 0.02EUR = $0.0268
Best regards to all,
Anthony
--
Sent from my iClayTablet
Anthony Lomax
161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net web: http://www.alomax.net
Twitter: @ALomaxNet http://twitter.com/ALomaxNet
Science & Special Topics: http://www.alomax.net/science
Software: http://www.alomax.net/software - updates: https://twitter.com/ALomaxNet
-
Philip Crotwell 2014-07-28 16:37:26
Hi
Being on the cheap side of the Atlantic, I'll save us $0.00068 and
make a stab at the underlying issue. :)
Here, with lots of stuff cut out, is how a channel is "identified" in
stationXML via the fdsn station web service at the IRIS DMC,
http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
<Network code="GE" >
<Station code="UGM">
<Channel locationCode="  " code="BHZ">
Another implementation of the same web service (not sure of url) gives
back this:
<Network code="GE" >
<Station code="UGM">
<Channel locationCode="" code="BHZ">
with locationCode="" vs ="  " being the difference under consideration.
There are two basic issues being discussed (and yes, more beer would help! :)
1) Should all valid stationXML documents be required to use the exact
same string of characters to represent the location id for this
channel? This would allow a comparison operation to be "simple" in
that it can compare the attribute values without additional
processing.
2) If we agree to 1), then what should those exact characters be? The
current top choices are
a) empty=""
b) two spaces="  "
c) two dashes="--".
1) seems less controversial than 2) in that greater compatibility is
generally seen as positive.
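To make issue 1) concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The two documents are stripped-down, namespace-free fragments modelled on the snippets above (real StationXML carries a namespace and many more elements), and the helper name channel_ids is hypothetical:

```python
import xml.etree.ElementTree as ET

DOC_TWO_SPACES = (
    '<Network code="GE"><Station code="UGM">'
    '<Channel locationCode="  " code="BHZ"/>'
    '</Station></Network>'
)
DOC_EMPTY = (
    '<Network code="GE"><Station code="UGM">'
    '<Channel locationCode="" code="BHZ"/>'
    '</Station></Network>'
)


def channel_ids(xml_text):
    """Return (network, station, location, channel) tuples as parsed."""
    net = ET.fromstring(xml_text)
    ids = []
    for sta in net.iter("Station"):
        for cha in sta.iter("Channel"):
            ids.append((net.get("code"), sta.get("code"),
                        cha.get("locationCode"), cha.get("code")))
    return ids


ids_a = channel_ids(DOC_TWO_SPACES)
ids_b = channel_ids(DOC_EMPTY)

# "Simple" comparison of the raw attribute values.
simple_equal = ids_a == ids_b

# Comparison after trimming the location ID on both sides.
trimmed_a = [(n, s, loc.strip(), c) for n, s, loc, c in ids_a]
trimmed_b = [(n, s, loc.strip(), c) for n, s, loc, c in ids_b]
trimmed_equal = trimmed_a == trimmed_b

print(simple_equal, trimmed_equal)
```

The same SEED channel only compares equal once the extra trimming step is applied, which is the additional processing that 1) would make unnecessary.
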
This is primarily a question about the form of the stationXML
documents, but obviously there are connections to the way requests are
formed, the relationship to miniseed/seed, the way things are coded in
software and how much detailed understanding we expect of end users.
Philip
-
Philip Crotwell 2014-07-28 16:53:09
One more thing is that this is not something that we can resolve based
on the XML spec as all three variations are well-formed and can be
valid XML depending on the schema.
There is another issue in that white space in xml attributes can be
normalized by the parsers, but this behavior is not standard across
all parsers, so dealing with attributes that are not limited to
non-whitespace characters means that you likely have to consider
empty, one space, two spaces, and even N spaces as all being
equivalent. Depending on the parser, you may be able to have this
handled for you, or you may have to code explicitly for the cases.
I think per the xml spec, even these two are considered "the same" as well:
locationCode="
"
locationCode="
"
as newlines in attributes can be normalized to whitespace on parsing.
But again, exactly how it is done depends on the parser.
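As one data point, a minimal sketch of what a single parser does, here Python's standard xml.etree.ElementTree (expat-based). The fragments are made up for illustration; other parsers, or a schema-aware parser that treats the attribute as a non-string type, may collapse or trim the whitespace further:

```python
import xml.etree.ElementTree as ET

raw_docs = {
    "empty": '<Channel locationCode="" code="BHZ"/>',
    "two spaces": '<Channel locationCode="  " code="BHZ"/>',
    "one newline": '<Channel locationCode="\n" code="BHZ"/>',
    "two newlines": '<Channel locationCode="\n\n" code="BHZ"/>',
}

# Parse each fragment and keep the attribute value the parser hands back.
parsed = {name: ET.fromstring(doc).get("locationCode")
          for name, doc in raw_docs.items()}

for name, value in parsed.items():
    print(f"{name:>12}: {value!r}")

# All four variants are "empty" once trimmed, but not before.
all_blank_after_trim = all(v.strip() == "" for v in parsed.values())
print(all_blank_after_trim)
```

With this parser each literal newline in the attribute is handed back as a space, so the two-newline form becomes indistinguishable from the two-space form, while the empty string stays empty.
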
Philip
PS I am NOT advocating we choose newline-newline as the default
location id!!! :)
On Mon, Jul 28, 2014 at 9:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
Hi
Being on the cheap side of the Atlantic, I'll save us $0.00068 and
make a stab at the underlying issue. :)
Here, with lots of stuff cut out, is how a channel is "identified" in
stationXML via the fdsn station web service at the IRIS DMC,
http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
<Network code="GE" >
<Station code="UGM">
<Channel locationCode=" " code="BHZ">
Another implementation of the same web service (not sure of url) gives
back this:
<Network code="GE" >
<Station code="UGM">
<Channel locationCode="" code="BHZ">
with locationCode="" vs =" " being the difference under consideration.
There are two basic issues being discussed (and yes, more beer would help! :)
1) Should all valid stationXML documents be required to use the exact
same string of characters to represent the location id for this
channel. This is would allow a comparison operation to be "simple" in
that it can compare the attribute values without additional
processing.
2) If we agree to 1), then what should those exact characters be? The
current top choices are
a) empty=""
b) two spaces=" "
c) two dashes="--".
1) seems less controversial than 2) in that greater compatibility is
generally seen as positive.
This is primarily a question about the form of the stationXML
documents, but obviously there are connections to the way requests are
formed, the relationship to miniseed/seed, the way things are coded in
software and how much detailed understanding we expect of end users.
Philip
On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:
Hello all,
Can someone give a concise statement of the original problem being
discussed, it only or primarily a concern about XML?
It seems to me that with modern languages a string that is empty or has 1-N
spaces is the same thing - there are often implicit or explicit trim()
function hiding in a processing pipeline. A null string is not the same.
So an empty or blank string is the same, valid location code, and null is
undefined or uninitialized location code.
With regards to the "--" pseudo for the location code, is this not needed
because sometimes it is not possible or difficult to represent an empty
string or even a string? For example on the command line or in a restful WS
URI? (Or a URI on the command line!) So it may be that the use of "--" for
intermediate processing and requests could be tolerated and somehow
official, while empty or only-blanks strings official and for persistent
data.
Just my 0.02€ = $0.0268
Best regards to all,
Anthony
On 27/07/2014 04:52, Chad Trabant wrote:
Hi Marcelo,
Thanks for your thoughts as well. Something that you and Joachim are not
addressing are the concerns about an empty ID that have been brought up by
more than one person. The answer that empty strings are technically
possible and it all works in Python/SeisComP is less than satisfying. The
observations from Python, ObsPy and SeisComP are a few of many that need to
be taken into account.
I agree that there is a long tail consideration for the "--" location ID
solution. Understand that some folks find an empty ID to be problematic
regardless of whether it is XML, SEED, text, whatever, then you might see
where this proposal comes from. Yes, we would need to treat empty location
IDs and "--" as synonyms for a very long time. Empty strings in XML mean
you will need to map empty IDs to empty strings, NULL and whatever an XML
parser might or might not produce for a long time as well (think beyond
Python and SeisComP). Either is possible, only one of them is a unique
mapping.
If the main consideration is the least amount of disruption, then the
answer is obvious to me: the FDSN can sanction that the two-space string is
the XML synonym for the empty SEED location ID and we adjust the schema to
make sure a string of whitespace is preserved. Then SeisComP can change
its relatively new StationXML implementation and ALL existing clients will
be compatible with all metadata and, most importantly, we would have
consistent metadata.
If the empty string ID representation is adopted it would, in effect,
mean that the DMC would need to change its metadata service and (more
importantly) all users of the DMC's metadata service would need to
transition to a new metadata channel naming scheme. This is certainly not
out of the question, but it is not something we would do without careful
consideration. I do not find the two-space strings all that great, but they
are here and something the DMC and users of the DMC have dealt with. Issues
have been identified with empty location IDs by us and our users. If DMC is
going to change, and push the change on all users of the DMC's StationXML,
it would be much more compelling to have a solution that addresses the low
level issues.
regards,
Chad
----- Original Message -----
From: "Marcelo Bianchi" <m.tchelo<at>gmail.com>
To: "IRIS Web Services List" <webservices<at>iris.washington.edu>
Sent: Friday, July 25, 2014 7:38:17 PM
Subject: Re: [webservices] A question of location ID, how to represent empty
IDs in XML?
Hi Philip and All,
I totally agree with Joachim; I was planning to answer but he was much
faster. What you are proposing is not a solution. StationXML
supports the empty string nicely, and it is not null. There is a type
difference here in Python and in any other language, and it can be
handled cleanly internally.
Also, the location ID is not just a string; it is a key that links
miniSEED to metadata, and making an exception at this level just
because a user interface cannot render it without ambiguity
does not sound like a proper proposal. I am not in favor of
creating an exception that will have to be carried along for
decades to come. Alternative solutions for this issue should be
sought in the end-user interface.
with my best regards,
Marcelo Bianchi
--
2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell<at>seis.sc.edu>:
It sounds like you are saying "change is hard, so we shouldn't do it".
I would argue that change is hard and so if we don't do it now it will
never happen. StationXML is new enough that there is already a
disruption, we should seize the chance. If we do not do something now
about null loc ids, it will be a decade or two before we get another
chance.
It is time to drive the stake through the heart of null location ids.
Kill the evil while we have a chance.
Philip
On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
Hello Rob,
Rob Newman wrote on 24.07.2014 18:51:
For what it's worth, I would also vote for the "--" standard. To quote
from the Zen of Python
http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html
(my language of choice):
"Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced."
I'd add "Compatible is better than incompatible." :)
Number 2 is especially relevant here:
"Explicit is better than implicit."
My favorite would be:
"Special cases aren't special enough to break the rules."
Quoted whitespace and nulls are painful. Code what you mean, and mean what
you code. It's easier for everyone.
But what if we simply *mean* "empty string"?
The issue is not about beauty, pain or ease. It's about standard
conformance. We already have a channel naming standard. If a new data format
cannot accommodate existing channel naming, then the new format is flawed.
But that's not even the case here...
An XML document that contains
<Channel locationCode="" ...
is not malformed. There's an attribute that *explicitly* contains an empty
string and a parser has to produce it as such. Not as null, nil or none, but
as an empty string. Otherwise the parser is broken and needs to be fixed,
not the data!
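As an illustration of that point with one particular parser, here is a small sketch using Python's standard xml.etree.ElementTree. The one-element document is a cut-down stand-in written for this example, not real service output.

```python
import xml.etree.ElementTree as ET

# A cut-down Channel element with an explicitly empty locationCode.
channel = ET.fromstring('<Channel locationCode="" code="BHZ"/>')

# An attribute that is present but empty comes back as the empty string.
present_but_empty = channel.get("locationCode")

# An attribute that is absent comes back as None, which is a different value.
absent = channel.get("noSuchAttribute")

# The same holds when the attribute contains two literal spaces: with no
# declared attribute type, the value is kept as two spaces.
two_spaces = ET.fromstring('<Channel locationCode="  " code="BHZ"/>').get(
    "locationCode"
)
```

This only shows the behavior of one parser; other XML libraries and data-binding layers may map empty attributes differently, which is the concern raised elsewhere in the thread.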
Again: It's not about beauty. We all agree that current channel naming is
not particularly beautiful and has limitations. But our business is not to
try to solve that issue now and here.
Cheers
Joachim
_______________________________________________
webservices mailing list
webservices<at>iris.washington.edu
http://www.iris.washington.edu/mailman/listinfo/webservices
--
Sent from my iClayTablet
________________________________
Anthony Lomax
161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net web:
http://www.alomax.net
Twitter: @ALomaxNet
Science & Special Topics: http://www.alomax.net/science
Software: http://www.alomax.net/software - updates:
https://twitter.com/ALomaxNet
________________________________
-
The spec does appear to state that all whitespace characters are converted to the same character. So it does distinguish between the number of whitespace characters but not the type.
http://www.w3.org/TR/REC-xml/#AVNormalize
I think we can just expect all XML parsers to adhere to that; otherwise an empty string seems the safest solution.
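A short sketch of that normalization as observed with Python's standard xml.etree.ElementTree (which is backed by expat). The element below is made up for the example; the attribute value holds one literal tab and one literal newline.

```python
import xml.etree.ElementTree as ET

# The Python escapes \t and \n put a literal tab and a literal newline
# inside the attribute value of the XML text handed to the parser.
xml_text = '<Channel locationCode="\t\n" code="BHZ"/>'

value = ET.fromstring(xml_text).get("locationCode")

# Each whitespace character is replaced by a space (#x20): the count of
# characters is preserved, the kind of whitespace is not.
length = len(value)
only_spaces = set(value) == {" "}
```

So with this parser a tab plus a newline is indistinguishable from two spaces after parsing, but it is still distinguishable from the empty string.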
Lion
On 28 Jul 2014, at 15:53, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
One more thing is that this is not something that we can resolve based
on the XML spec as all three variations are well-formed and can be
valid XML depending on the schema.
There is another issue in that white space in xml attributes can be
normalized by the parsers, but this behavior is not standard across
all parsers, so dealing with attributes that are not limited to
non-whitespace characters means that you likely have to consider
empty, one space and two spaces, and even N spaces as all being
equivalent. Depending on the parser, you may be able to have this
handled for you, or you may have to code explicitly for the cases.
I think per the xml spec, even these two are considered "the same" as well:
locationCode="
"
locationCode="
"
as newlines in attributes can be normalized to whitespace on parsing.
But again, exactly how it is done depends on the parser.
Philip
PS I am NOT advocating we choose newline-newline as the default
location id!!! :)
On Mon, Jul 28, 2014 at 9:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
Hi
Being on the cheap side of the Atlantic, I'll save us $0.00068 and
make a stab at the underlying issue. :)
Here, with lots of stuff cut out, is how a channel is "identified" in
stationXML via the fdsn station web service at the IRIS DMC,
http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
<Network code="GE" >
<Station code="UGM">
<Channel locationCode="  " code="BHZ">
Another implementation of the same web service (not sure of url) gives
back this:
<Network code="GE" >
<Station code="UGM">
<Channel locationCode="" code="BHZ">
with locationCode="" vs ="  " being the difference under consideration.
There are two basic issues being discussed (and yes, more beer would help! :)
1) Should all valid stationXML documents be required to use the exact
same string of characters to represent the location id for this
channel? This would allow a comparison operation to be "simple" in
that it can compare the attribute values without additional
processing.
2) If we agree to 1), then what should those exact characters be? The
current top choices are
a) empty=""
b) two spaces="  "
c) two dashes="--".
1) seems less controversial than 2) in that greater compatibility is
generally seen as positive.
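To make issue 1) concrete, the sketch below compares channel identifiers from two simplified documents, first on the raw attribute values and then after a normalization step. The documents are closed-up stand-ins for the snippets above (no namespace, everything else cut), the function name channel_keys is hypothetical, and the choice of "" as the canonical form is made only for this example.

```python
import xml.etree.ElementTree as ET

# Simplified stand-ins for the two responses quoted above.
TWO_SPACE_DOC = (
    '<Network code="GE"><Station code="UGM">'
    '<Channel locationCode="  " code="BHZ"/>'
    "</Station></Network>"
)
EMPTY_DOC = (
    '<Network code="GE"><Station code="UGM">'
    '<Channel locationCode="" code="BHZ"/>'
    "</Station></Network>"
)


def channel_keys(xml_text, normalize):
    """Return (network, station, location, channel) tuples for a document."""
    network = ET.fromstring(xml_text)
    keys = []
    for station in network.iter("Station"):
        for channel in station.iter("Channel"):
            loc = channel.get("locationCode", "")
            # Treat blank-only strings and "--" as synonyms for "empty".
            if normalize and loc.strip() in ("", "--"):
                loc = ""
            keys.append(
                (network.get("code"), station.get("code"), loc, channel.get("code"))
            )
    return keys


raw_match = channel_keys(TWO_SPACE_DOC, False) == channel_keys(EMPTY_DOC, False)
normalized_match = channel_keys(TWO_SPACE_DOC, True) == channel_keys(EMPTY_DOC, True)
```

Without the extra step the two documents appear to describe different channels; agreeing on a single representation would make the raw comparison sufficient.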
This is primarily a question about the form of the stationXML
documents, but obviously there are connections to the way requests are
formed, the relationship to miniseed/seed, the way things are coded in
software and how much detailed understanding we expect of end users.
Philip
-
Philip Crotwell 2014-07-28 17:48:54
That spec also says:
If the attribute type is not CDATA, then the XML processor MUST
further process the normalized attribute value by discarding any
leading and trailing space (#x20) characters, and by replacing
sequences of space (#x20) characters by a single space (#x20)
character.
So, by this, a non-CDATA attribute should always end up as an empty string
even if it holds two or more spaces. My experience with parsers is that this
does not happen, but since it is in the spec it could. Your mileage may
vary...
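The rule quoted above can be written out directly, independent of any parser. This is a simplified sketch: the function name is hypothetical, it works on the literal attribute text only, and it ignores character and entity references, which the spec treats separately. Whether a given parser applies the second step depends on it knowing the attribute's declared type.

```python
import re


def normalize_non_cdata(value: str) -> str:
    """Apply XML attribute-value normalization for a non-CDATA attribute.

    Simplification: operates on literal text, ignoring character and
    entity references.
    """
    # Step 1 (applies to all attributes): each literal tab, newline or
    # carriage return is replaced by a space (#x20).
    value = re.sub(r"[\t\n\r]", " ", value)
    # Step 2 (non-CDATA only): discard leading and trailing spaces and
    # collapse runs of spaces to a single space.
    return re.sub(r" +", " ", value.strip(" "))


two_spaces = normalize_non_cdata("  ")
two_newlines = normalize_non_cdata("\n\n")
padded_code = normalize_non_cdata(" 00 ")
```

Under this rule every blank-only value, of any length, ends up as the empty string, while a padded real code keeps only its non-space content.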
Philip
-
Well in that case the only sensible solution seems to be to use an empty string to encode an empty location.
Lion
On 28 Jul 2014, at 16:48, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
That spec also says:
If the attribute type is not CDATA, then the XML processor MUST
further process the normalized attribute value by discarding any
leading and trailing space (#x20) characters, and by replacing
sequences of space (#x20) characters by a single space (#x20)
character.
So, by this you should always end up with an empty string even if you
have two or more spaces. My experience with parsers is that this does
not happen, but since it is in the spec it could. You mileage may
vary...
Philip
On Mon, Jul 28, 2014 at 10:16 AM, Lion Krischer
<krischer<at>geophysik.uni-muenchen.de> wrote:
The spec does appear to state that all white spaces characters are converted to the same character. So it does distinguish between the number of whitespace characters but not the type.
_______________________________________________
http://www.w3.org/TR/REC-xml/#AVNormalize
I think we can just expect all XML parsers to adhere to that, otherwise an empty strings seems the safest solution.
Lion
On 28 Jul 2014, at 15:53, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
One more thing is that this is not something that we can resolve based
_______________________________________________
on the XML spec as all three variations are well-formed and can be
valid XML depending on the schema.
There is another issue in that white space in xml attributes can be
normalized by the parsers, but this behavior is not standard across
all parsers, so dealing with attributes that are not limited to
non-whitespace characters means that you likely have to consider
empth, one space and two spaces, and even N spaces as all being
equivalent. Depending on the parser, you may be able to have this
handled for you, or you may have to code explicitly for the cases.
I think per the xml spec, even these two are considered "the same" as well:
locationCode="
"
locationCode="
"
as newlines in attributes can be normalized to whilespace on parsing.
But again, exactly how it is done depends on the parser.
Philip
PS I am NOT advocating we choose newline-newline as the default
location id!!! :)
On Mon, Jul 28, 2014 at 9:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
Hi
_______________________________________________
Being on the cheap side of the Atlantic, I'll save us $0.00068 and
make a stab at the underlying issue. :)
Here, with lots of stuff cut out, is how a channel is "identified" in
stationXML via the fdsn station web service at the IRIS DMC,
http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
<Network code="GE" >
<Station code="UGM">
<Channel locationCode=" " code="BHZ">
Another implementation of the same web service (not sure of url) gives
back this:
<Network code="GE" >
<Station code="UGM">
<Channel locationCode="" code="BHZ">
with locationCode="" vs ="  " being the difference under consideration.
There are two basic issues being discussed (and yes, more beer would help! :)
1) Should all valid stationXML documents be required to use the exact
same string of characters to represent the location id for this
channel? This would allow a comparison operation to be "simple" in
that it can compare the attribute values without additional
processing.
2) If we agree to 1), then what should those exact characters be? The
current top choices are
a) empty=""
b) two spaces="  "
c) two dashes="--".
1) seems less controversial than 2) in that greater compatibility is
generally seen as positive.
This is primarily a question about the form of the stationXML
documents, but obviously there are connections to the way requests are
formed, the relationship to miniseed/seed, the way things are coded in
software and how much detailed understanding we expect of end users.
Philip
On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:
Hello all,
Can someone give a concise statement of the original problem being
discussed? Is it only or primarily a concern about XML?
It seems to me that with modern languages a string that is empty or has 1-N
spaces is the same thing - there are often implicit or explicit trim()
function hiding in a processing pipeline. A null string is not the same.
So an empty or blank string is the same, valid location code, and null is
undefined or uninitialized location code.
With regards to the "--" pseudo for the location code, is this not needed
because sometimes it is not possible or difficult to represent an empty
string or even a string? For example on the command line or in a restful WS
URI? (Or a URI on the command line!) So it may be that the use of "--" for
intermediate processing and requests could be tolerated and somehow
official, while empty or only-blanks strings official and for persistent
data.
Just my 0.02€ = $0.0268
Best regards to all,
Anthony
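The distinction drawn above (empty and blank strings are equivalent, null is not, and "--" is a request-side stand-in) can be sketched as a small helper. The function name and the choice of the empty string as the canonical form are assumptions for illustration only; the thread has not settled on one:

```python
from typing import Optional


def normalize_location(code: Optional[str]) -> Optional[str]:
    """Map the different spellings of an empty location ID to one value."""
    if code is None:
        # Undefined or uninitialized: kept distinct from "empty"
        return None
    stripped = code.strip()
    if stripped == "--":
        # Request-style placeholder for the empty location ID
        return ""
    return stripped


print([normalize_location(c) for c in ["", " ", "  ", "--", "00", None]])
```

Any client that compares location IDs across data centers would need some step like this until a single representation is agreed upon.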
On 27/07/2014 04:52, Chad Trabant wrote:
Hi Marcelo,
Thanks for your thoughts as well. Something that you and Joachim are not
addressing are the concerns about an empty ID that have been brought up by
more than one person. The answer that empty strings are technically
possible and it all works in Python/SeisComP is less than satisfying. The
observations from Python, ObsPy and SeisComP are a few of many that need to
be taken into account.
I agree that there is a long tail consideration for the "--" location ID
solution. If you understand that some folks find an empty ID to be problematic
regardless of whether it is XML, SEED, text, whatever, then you might see
where this proposal comes from. Yes, we would need to treat empty location
IDs and "--" as synonyms for a very long time. Empty strings in XML mean
you will need to map empty IDs to empty strings, NULL and whatever an XML
parser might or might not produce for a long time as well (think beyond
Python and SeisComP). Either is possible, only one of them is a unique
mapping.
If the main considerations are for the least amount of disruption then the
answer is obvious to me: the FDSN can sanction that the two-space string is
the XML synonym for the empty SEED location ID and we adjust the schema to
make sure a string of whitespaces is preserved. Then SeisComP can change
its relatively new StationXML implementation and ALL existing clients will
be compatible with all metadata and, most importantly, we would have
consistent metadata.
If the empty string ID representation is adopted it would, in effect,
mean that the DMC would need to change its metadata service and (more
importantly) all users of the DMC's metadata service would need to
transition to a new metadata channel naming scheme. This is certainly not
out of the question, but it is not something we would do without careful
consideration. I do not find the two-space strings all that great, but they
are here and something the DMC and users of the DMC have dealt with. Issues
have been identified with empty location IDs by us and our users. If DMC is
going to change, and push the change on all users of the DMC's StationXML,
it would be much more compelling to have a solution that addresses the low
level issues.
regards,
Chad
----- Original Message -----
From: "Marcelo Bianchi" <m.tchelo<at>gmail.com>
To: "IRIS Web Services List" <webservices<at>iris.washington.edu>
Sent: Friday, July 25, 2014 7:38:17 PM
Subject: Re: [webservices] A question of location ID, how to represent empty
IDs in XML?
Hi Philip and All,
I totally agree with Joachim; I was planning to answer but he was much
faster. What you guys are proposing is not a solution. StationXML
supports the empty string nicely, and it is not null. There is a type
difference here in Python and in any other language, and it can be nicely
handled internally.
Also, the location id is not just a string; it is a key entry that links
miniseed to metadata, and making an exception at this level just
because a user interface cannot properly render it without ambiguity
does not sound like a proper proposal. I am not in favor of
creating an exception that will have to be carried along for the
decades to come. Alternative solutions for this issue should be
sought in the end user interface.
with my best regards,
Marcelo Bianchi
--
2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell<at>seis.sc.edu>:
It sounds like you are saying "change is hard, so we shouldn't do it".
I would argue that change is hard and so if we don't do it now it will
never happen. StationXML is new enough that there is already a
disruption, we should seize the chance. If we do not do something now
about null loc ids, it will be a decade or two before we get another
chance.
It is time to drive the stake through the heart of null location ids.
Kill the evil while we have a chance.
Philip
On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
Hello Rob,
Rob Newman wrote on 24.07.2014 18:51:
For what it's worth, I would also vote for the "--" standard. To quote
from the Zen of Python
http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html
(my language of choice):
"Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced."
I'd add "Compatible is better than incompatible." :)
Number 2 is especially relevant here:
"Explicit is better than implicit."
My favorite would be:
"Special cases aren't special enough to break the rules."
Quoted whitespace and nulls are painful. Code what you mean, and mean what
you code. It's easier for everyone.
But what if we simply *mean* "empty string"?
The issue is not about beauty, pain or ease. It's about standard
conformance. We already have a channel naming standard. If a new data format
cannot accommodate existing channel naming, then the new format is flawed.
But that's not even the case here...
An XML document that contains
<Channel locationCode="" ...
is not malformed. There's an attribute that *explicitly* contains an empty
string and a parser has to produce it as such. Not as null, nil or none, but
as an empty string. Otherwise the parser is broken and needs to be fixed,
not the data!
Again: It's not about beauty. We all agree that current channel naming is
not particularly beautiful and has limitations. But our business is not to
try to solve that issue now and here.
Cheers
Joachim
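The claim that an explicitly empty attribute is delivered as an empty string, not as null, can be checked with a short sketch. It assumes Python's standard xml.etree.ElementTree; the two fragments are hypothetical and other parsers or data-binding layers may map things differently:

```python
from xml.etree import ElementTree

# Attribute present but empty
explicit_empty = ElementTree.fromstring('<Channel locationCode="" code="BHZ"/>')
# Attribute not present at all
absent = ElementTree.fromstring('<Channel code="BHZ"/>')

print(repr(explicit_empty.get("locationCode")))  # empty string
print(repr(absent.get("locationCode")))          # None
```

Only the missing attribute comes back as None here; the empty attribute stays a string of length zero.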
_______________________________________________
webservices mailing list
webservices<at>iris.washington.edu
http://www.iris.washington.edu/mailman/listinfo/webservices
--
Sent from my iClayTablet
________________________________
Anthony Lomax
161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net web:
http://www.alomax.net
Twitter: @ALomaxNet
Science & Special Topics: http://www.alomax.net/science
Software: http://www.alomax.net/software - updates:
https://twitter.com/ALomaxNet
________________________________
-
-
-
-
Hi all,
leaving the greater issues aside: why not just force the location code to have a certain form with a regex in the schema?
The following group will match any uppercase alphanumeric two letter code and two spaces:
^([A-Z0-9]{2}|  )$
It matches “AA”, “00”, “10”, “A1”, “  “ , …
but not “--“, “”, “-“, “a1”, ...
Everything else will get rejected. Then one can be sure that it is consistent everywhere (assuming people test their web services against the schema which is a good idea in any case). Similar regexes should also be defined for the network, station, and channel codes to assure compatibility with SEED. In general it would be a good idea to have the schema enforce as many things as possible and leave little to no room for interpretations.
Now whether one uses two spaces, two dashes, an empty string or what not for an empty location code does not really matter. All are syntactically valid XML and thus any parser can be expected to be able to deal with them. Consistency is by far the most important thing in my opinion. So best choose one and force it with the schema. This will reduce errors and misinterpretations in the long run.
In terms of existing StationXML parsers I assume most are just stripping whitespaces from the location code and thus “” and “ “ should already work resulting in minimal disruption in the users’ workflows. “--“ would require software to be updated and looks a little bit weird in my opinion and unsuspecting users might interpret it as an invalid location code.
Cheers!
Lion
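A minimal sketch of how the proposed pattern behaves, using Python's re module rather than an XML schema validator (an assumption made for brevity; schema patterns are implicitly anchored, which fullmatch imitates here). The helper name is hypothetical:

```python
import re

# Proposed rule: two uppercase alphanumerics, or exactly two spaces
LOCATION_CODE = re.compile(r"[A-Z0-9]{2}|  ")


def is_valid_location(code: str) -> bool:
    """Return True if the whole string satisfies the proposed pattern."""
    return LOCATION_CODE.fullmatch(code) is not None


for candidate in ["AA", "00", "10", "A1", "  ", "--", "", "-", "a1"]:
    print(repr(candidate), is_valid_location(candidate))
```

Swapping the two-space alternative for an empty one, or for "--", changes only the pattern; the consistency argument stays the same.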
-
Hi Lion,
Lion Krischer [07/28/2014 03:54 PM]:
^([A-Z0-9]{2}|  )$
It matches “AA”, “00”, “10”, “A1”, “  “ , …
but not “--“, “”, “-“, “a1”, ...
'Not ""' is a problem as "" is a valid location code according to the SEED
specification. Which is what all this is actually about. :)
In general I like the idea of using regular expressions if we use
^([A-Z0-9]{2}|  |)$
Everything else will get rejected. Then one can be sure that it is consistent everywhere (assuming people test their web services against the schema which is a good idea in any case). Similar regexes should also be defined for the network, station, and channel codes to assure compatibility with SEED. In general it would be a good idea to have the schema enforce as many things as possible and leave little to no room for interpretations.
The most important consistency is with the SEED standard.
Now whether one uses two spaces, two dashes, an empty string or what not for an empty location code does not really matter. All are syntactically valid XML and thus any parser can be expected to be able to deal with them. Consistency is by far the most important thing in my opinion.
Cheers
Joachim
-
Philip Crotwell, 2014-07-28 19:03:19
On Mon, Jul 28, 2014 at 10:18 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
The most important consistency is with the SEED standard.
I would argue that consistency for end users is the only thing that
matters. Consistency with the SEED spec may be a means to that end,
but if the end users do not perceive it as being consistent, it isn't
consistent.
To me, that means we need to look at the bigger picture. Ideally we
would have location ids that could be represented by exactly the same
characters in:
stationXML
miniseed
URLS
client displays
databases
and even email
in a way that is explicit, consistent and natural for the end user.
To be honest, I don't like any of the choices. If I had my way, loc
ids would have been defined as strictly two characters like
^([A-Z0-9]{2})$, and 00 would have been what you used if you didn't
care. Alas, that horse has left the barn.
Maybe not even worth $0.02... :)
Philip
Cheers
Joachim
-
On Jul 28, 2014, at 9:03 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
On Mon, Jul 28, 2014 at 10:18 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
I completely agree that this should be our ultimate goal. The idea of making this change in XML is to set us on just such a path.
Here are some others I would add to the list:
use in command lines
use in other data formats
Chad
-
Chad Trabant wrote on 31.07.2014 07:42:
On Jul 28, 2014, at 9:03 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
That's a pretty ambitious list considering...
Chad Trabant wrote on 31.07.2014 06:57:
There are many more clients than there are servers, many clients written by users and out of our direct control. Requiring every client to know some post-parsing processing rules is a terrible idea, [...]
We are still talking here about a metadata format, aren't we? And you want to prescribe how users shall display empty location codes in GUI displays? You must be kidding...
The issue is *not* about other data formats. It is up to every developer to save empty location codes in whatever way they like in their formats, databases, bulletins etc. That is absolutely no problem and hence doesn't require a solution.
Here the issue is about representing data in XML. Since we have a well accepted and widely implemented channel naming standard *already*, and since users are working with StationXML *already*, what we need *now* is a clarification about the proper representation of *current* channel naming in StationXML.
Joachim
-
-
-
Hi Joachim,
'Not ""' is a problem as "" is a valid location code according to SEED specification. Which is what all this is actually about. :)
The idea was to choose either “  “ or “”, which both denote an empty location id. In SEED it is not possible to specify two actual spaces (and not an empty string) as the location identifier as right aligned spaces are considered padding characters and have to be removed by the processing software.
In general I like the idea of using regular expressions if we use ^([A-Z0-9]{2}|  |)$
Allowing both would mean having two separate “encodings” for the same thing. I am fine with either; it is just important that one is picked as the proper representation of an empty location id.
According to the SEED spec it appears that single letter location codes are also valid. Does that happen in the wild?
Cheers!
Lion
-
-
On Jul 28, 2014, at 6:54 AM, Lion Krischer <krischer<at>geophysik.uni-muenchen.de> wrote:
Hi all,
Hi Lion,
leaving the greater issues aside: why not just force the location code to have a certain form with a regex in the schema?
We should definitely add the rules to the schema, we just need to decide what they are!
The following group will match any uppercase alphanumeric two letter code and two spaces:
^([A-Z0-9]{2}|  )$
It matches “AA”, “00”, “10”, “A1”, “  “ , …
but not “--“, “”, “-“, “a1”, ...
Everything else will get rejected. Then one can be sure that it is consistent everywhere (assuming people test their web services against the schema which is a good idea in any case). Similar regexes should also be defined for the network, station, and channel codes to assure compatibility with SEED. In general it would be a good idea to have the schema enforce as many things as possible and leave little to no room for interpretations.
Now whether one uses two spaces, two dashes, an empty string or what not for an empty location code does not really matter. All are syntactically valid XML and thus any parser can be expected to be able to deal with them. Consistency is by far the most important thing in my opinion. So best choose one and force it with the schema. This will reduce errors and misinterpretations in the long run.
In terms of existing StationXML parsers I assume most are just stripping whitespaces from the location code and thus “” and “  “ should already work resulting in minimal disruption in the users’ workflows.
Actually, this does not appear to be happening; in the parsers I’ve used the whitespaces are not stripped. I have read through the XML specifications until my eyes were crossed to try and understand why this would be the case. Then I wrote some test cases and observed no trimming, see test data and code below. Perhaps this attribute is CDATA for some reason? I think we are stuck with the fact that empty string and two spaces are different.
Has anyone observed this automatic trimming on any system?
“--“ would require software to be updated and looks a little bit weird in my opinion and unsuspecting users might interpret it as an invalid location code.
Yes, it would require software changes, the question is would what we gain be worth it. Maybe it looks a little weird, but it is already becoming synonymous in the minds of many because "--" is used for selecting the empty SEED location IDs.
Chad
PS. here is my test data:
------- chan.xml
<FDSNStationXML schemaVersion="1.0">
<Channel locationCode="  " startDate="2012-03-12T20:28:00" restrictedStatus="open" endDate="2599-12-31T23:59:59" code="BHZ">
</Channel>
</FDSNStationXML>
-------
Here is a test with Python:
-------
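from xml.etree import ElementTree

# Parse the test document; the root element is FDSNStationXML
with open('chan.xml', 'rt') as f:
    tree = ElementTree.parse(f)

# Find the Channel child of the root (the test data has no namespace)
node = tree.find('./Channel')
print(node.tag)

# Print each attribute, quoted so that any whitespace is visible
for name, value in sorted(node.attrib.items()):
    print(' %-4s = "%s"' % (name, value))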
-------
which produces:
-------
Channel
code = "BHZ"
endDate = "2599-12-31T23:59:59"
locationCode = "  "
restrictedStatus = "open"
startDate = "2012-03-12T20:28:00"
-------
No trimming.
Here is a test with Perl:
-------
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
my $file = 'chan.xml';
my $test_data = XMLin($file);
print Dumper($test_data);
-------
which produces:
-------
$VAR1 = {
    'schemaVersion' => '1.0',
    'Channel' => {
        'locationCode' => '  ',
        'endDate' => '2599-12-31T23:59:59',
        'restrictedStatus' => 'open',
        'startDate' => '2012-03-12T20:28:00',
        'code' => 'BHZ'
    }
};
-------
No trimming.
There are many more parsing options for Perl and Python and other languages of course, but this is pretty basic stuff. It is how a user such as myself would go about parsing and using StationXML.
-
Dear all,
Maybe some stupid questions: Are there actually any valid use cases
for having to distinguish between empty and unknown location codes within
the data? If so, does this then also apply to network, station, and
channel codes? So if the community opts for an unknown marker as well as
an empty/unset marker for the location field, shouldn't the same
markers be used for unknown/unset network etc.?
In terms of existing StationXML parsers I assume most are just
stripping whitespaces from the location code and thus “” and “  “
should already work resulting in minimal disruption in the
users’ workflows.
significant whitespaces and should be preserved by any XML parsers. I
guess Lion meant with StationXML parser more than the plain XML
parser. I don't know what other clients do, but ObsPy strips
internally all Net/Sta/Loc/Cha field values.
Cheers,
Robert
PS: For some reason my previous mail, sent last weekend, did not appear
on this list. I also didn't receive all replies to this thread as
archived in
http://www.iris.washington.edu/pipermail/webservices/2014-July/thread.html
(e.g.
http://www.iris.washington.edu/pipermail/webservices/2014-July/000554.html
was missing). I didn't get any bounce or error message from mailman
either. Any idea?
--
Dr. Robert Barsch
EGU Office Munich
Luisenstr. 37
80333 Munich
Germany
Phone: +49-89-21806565
Fax: +49-89-218017855
eMail: barsch<at>egu.eu
-
On Jul 31, 2014, at 12:20 AM, Robert Barsch <barsch<at>egu.eu> wrote:
Hi Robert,
Dear all,
Maybe some stupid questions: Are there actually any valid use cases
for having to distinguish between empty and unknown location codes within
the data? If so, does this then also apply to network, station, and
channel codes? So if the community opts for unknown as well as
empty/unset markers for the location field, shouldn't the same
markers be used for unknown/unset network etc.?
There is no rule in the SEED world preventing two channel names from differing only by location ID; in fact it happens often. Since location can be empty, we can have both XX.STA.00.LHZ and XX.STA..LHZ; if location were described as "unknown" these two would become ambiguous. I do not know offhand of any cases where the difference is between an empty location ID and a filled one, but it would be a weird case to eliminate (or even describe) in the specification.
In terms of existing StationXML parsers I assume most are just
stripping whitespaces from the location code and thus “” and “  ”
should already work resulting in minimal disruption in the
users’ workflows.
significant whitespaces and should be preserved by any XML parsers. I
guess Lion meant with StationXML parser more than the plain XML
parser. I don't know what other clients do, but ObsPy strips
internally all Net/Sta/Loc/Cha field values.
Chad
_______________________________________________
webservices mailing list
webservices<at>iris.washington.edu
http://www.iris.washington.edu/mailman/listinfo/webservices
-
Chad Trabant wrote on 31.07.2014 08:49:
In terms of existing StationXML parsers I assume most are just stripping whitespaces from the location code and thus "" and "  " should already work resulting in minimal disruption in the users' workflows.
Actually, this does not appear to be happening, in the parsers I've used the whitespaces are not stripped. I have read through the XML specifications until my eyes were crossed to try and understand why this would be the case. Then I wrote some test cases and observed no trimming, see test data and code below. Perhaps this attribute is CDATA for some reason? I think we are stuck with the fact that empty string and two spaces are different.
Has anyone observed this automatic trimming on any system?
No, and I agree that a generic parser should return a raw string as it is in the XML without implicit trimming, nullifying etc. To obtain a trimmed string, it's trivial to trim() the input strings as needed. That's what's done in ObsPy, too.
In fact even the *already* empty string location codes from libmseed are trimmed again at ObsPy level, just in case. ;)
-- would require software to be updated and looks a little bit weird in my opinion and unsuspecting users might interpret it as an invalid location code.
Yes, it would require software changes, the question is would what we gain be worth it. Maybe it looks a little weird, but it is already becoming synonymous in the minds of many because "--" is used for *selecting* the empty SEED location IDs.
Joachim
-
On Jul 31, 2014, at 12:33 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
Chad Trabant wrote on 31.07.2014 08:49:
Hi Joachim,
In terms of existing StationXML parsers I assume most are just stripping whitespaces from the location code and thus "" and "  " should already work resulting in minimal disruption in the users' workflows.
Actually, this does not appear to be happening, in the parsers I've used the whitespaces are not stripped. I have read through the XML specifications until my eyes were crossed to try and understand why this would be the case. Then I wrote some test cases and observed no trimming, see test data and code below. Perhaps this attribute is CDATA for some reason? I think we are stuck with the fact that empty string and two spaces are different.
Has anyone observed this automatic trimming on any system?
No, and I agree that a generic parser should return a raw string as it is in the XML without implicit trimming, nullifying etc. To obtain a trimmed string, it's trivial to trim() the input strings as needed. That's what's done in ObsPy, too.
In fact even the *already* empty string location codes from libmseed are trimmed again at ObsPy level, just in case. ;)
You keep coming back to this as if it is meaningful. libmseed and the parts of ObsPy getting information from libmseed are dealing with SEED data, where the current rules of parsing are clear. What is your point exactly?
-- would require software to be updated and looks a little bit weird in my opinion and unsuspecting users might interpret it as an invalid location code.
Yes, it would require software changes, the question is would what we gain be worth it. Maybe it looks a little weird, but it is already becoming synonymous in the minds of many because "--" is used for *selecting* the empty SEED location IDs.
Chad
-
Chad Trabant wrote on 31.07.2014 09:42:
On Jul 31, 2014, at 12:33 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
Has anyone observed this automatic trimming on any system?
No, and I agree that a generic parser should return a raw string as it is in the XML without implicit trimming, nullifying etc. To obtain a trimmed string, it's trivial to trim() the input strings as needed. That's what's done in ObsPy, too.
In fact even the *already* empty string location codes from libmseed are trimmed again at ObsPy level, just in case. ;)
You keep coming back to this as if it is meaningful. libmseed and the parts of ObsPy getting information from libmseed are dealing with SEED data, where the current rules of parsing are clear. What is your point exactly?
The point is that in libmseed you use a different empty location code naming than in StationXML. As I said a number of times, for me that's not a problem at all. Many clients can handle this and those that cannot can be modified easily. In particular, if you applied the same naming rules as in libmseed also in e.g. FetchData (by making a trivial change in the code) they would become consistent at a very low cost. It would be a benefit for the user.
Why do you think the existing metadata and decades of SEED would need to be changed? Please explain.
Because otherwise a mapping would be required "forever". Until at least very recently you were strongly against any mapping, even calling the idea "rubbish" at one point.
Joachim
-
On Jul 31, 2014, at 1:13 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
Chad Trabant wrote on 31.07.2014 09:42:
Hi Joachim,
On Jul 31, 2014, at 12:33 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
Has anyone observed this automatic trimming on any system?
No, and I agree that a generic parser should return a raw string as it is in the XML without implicit trimming, nullifying etc. To obtain a trimmed string, it's trivial to trim() the input strings as needed. That's what's done in ObsPy, too.
In fact even the *already* empty string location codes from libmseed are trimmed again at ObsPy level, just in case. ;)
You keep coming back to this as if it is meaningful. libmseed and the parts of ObsPy getting information from libmseed are dealing with SEED data, where the current rules of parsing are clear. What is your point exactly?
The point is that in libmseed you use a different empty location code naming than in StationXML. As I said a number of times, for me that's not a problem at all. Many clients can handle this and those that cannot can be modified easily. In particular, if you applied the same naming rules as in libmseed also in e.g. FetchData (by making a trivial change in the code) they would become consistent at a very low cost. It would be a benefit for the user.
That is a strange transition from libmseed to web service clients that I do not understand. You appear fixated on updating the clients, but as I have said many times, that by itself will not solve the actual problem; the metadata remains inconsistent and at any rate we do not control many of the most popular parsers of this information, such as user-created programs.
Why do you think the existing metadata and decades of SEED would need to be changed? Please explain.
Because otherwise a mapping would be required "forever". Until at least very recently you were strongly against any mapping, even calling the idea "rubbish" at one point
On Jul 28, 2014, at 4:51 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
In general mappings are not the problem and are widely used anyway.
So what is the problem with mapping?
I was certainly not against mapping to/from "--", after all it was my proposal! You have taken my words out of context. Please stick to the technical issues and leave your personal indignation off of this mailing list.
Chad
Joachim
-
Hi Chad,
after a well-deserved creative break a little more feedback from Potsdam
on our favorite topic. :)
Chad Trabant wrote on 31.07.2014 10:37:
On Jul 31, 2014, at 1:13 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
Chad Trabant wrote on 31.07.2014 09:42:
Hi Joachim,
On Jul 31, 2014, at 12:33 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
Has anyone observed this automatic trimming on any system?
No, and I agree that a generic parser should return a raw string as it is in the XML without implicit trimming, nullifying etc. To obtain a trimmed string, it's trivial to trim() the input strings as needed. That's what's done in ObsPy, too.
In fact even the *already* empty string location codes from libmseed are trimmed again at ObsPy level, just in case. ;)
You keep coming back to this as if it is meaningful. libmseed and the parts of ObsPy getting information from libmseed are dealing with SEED data, where the current rules of parsing are clear. What is your point exactly?
The point is that in libmseed you use a different empty location code naming than in StationXML. As I said a number of times, for me that's not a problem at all. Many clients can handle this and those that cannot can be modified easily. In particular, if you applied the same naming rules as in libmseed also in e.g. FetchData (by making a trivial change in the code) they would become consistent at a very low cost. It would be a benefit for the user.
That is a strange transition from libmseed to web service clients that I do not understand.
In libmseed you treat the two spaces differently than some web service
client code. While in libmseed you trim the spaces, resulting in an
empty string, in web service clients (like FetchData) you keep the
spaces. By simply trimming them there, too, and then matching against an
empty string, you would not only maintain consistency in your
interpretation of waveform and meta data, but also be more "accepting"
in what your clients are able to process. In particular this would
enable your clients to parse strictly SEED compliant empty location
codes, which currently is not possible.
You appear fixated on updating the clients,
Yes, absolutely! Because that's where the current problem can be fixed
most easily.
but as I have said many times that, by itself, will not solve the actual problem;
That depends on what you consider as "the actual problem". If it is
empty location codes, I would not view that as a problem at all.
Cosmetics at worst, but it is as it is and we can live with it. Can't you?
the metadata remains inconsistent and at any rate we do not control many of the most popular parsers of this information such user-created programs.
Once the issue is clarified, the clients will naturally be adapted to
the specification. The inconsistency is currently still at a
low/manageable level. In particular, there is absolutely nowhere an
inconsistency with (Mini)SEED headers, it's *currently* *only* a
relatively minor inconsistency at XML level that is not too big to be
handled. Besides the standard conformance this is IMHO the main
advantage of "" compared to "--".
Why do you think the existing metadata and decades of SEED would need to be changed? Please explain.
Because otherwise a mapping would be required "forever". Until at least very recently you were strongly against any mapping, even calling the idea "rubbish" at one point
On Jul 28, 2014, at 4:51 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
In general mappings are not the problem and are widely used anyway.
technical point of view. But it also depends on the kind of mapping.
BTW, I stated the above in the context of the mapping "" <-> "  ", which
is very easy using trim() et al. and in particular does not require any
change to the *current* channel naming conventions. And which is also
why I wrote "widely used". Technically a mapping to/from "--" would be
quite different, because the range of values that need to be tested
against e.g. in a simple comparison makes this more complicated. In
practice, of course, one can implement this once as a library function
or by creating a location code class and overloading the == operator.
This is still considerably more work than just calling trim().
With a mapping to/from "--" we also have the "forever" issue. With ""
vs. "  " this is not an issue at all, especially in view of the existing
SEED headers. That's a big difference.
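To make the comparison of the two kinds of mapping concrete, here is a minimal Python sketch of the "location code class" idea mentioned above. The class name `LocationCode` and the set `EMPTY_SYNONYMS` are hypothetical and exist only for this illustration; treating "--" as an empty synonym is an assumption that only applies if that proposal were adopted.

```python
class LocationCode:
    """Location ID that compares equal across its 'empty' spellings (sketch)."""

    # Spellings treated as the empty location ID in this sketch (assumption).
    EMPTY_SYNONYMS = {"", "--"}

    def __init__(self, raw):
        code = raw.strip()  # "  " and "" both become ""
        if code in self.EMPTY_SYNONYMS:
            code = ""
        self.code = code

    def __eq__(self, other):
        # Allow direct comparison against plain strings as well.
        if isinstance(other, str):
            other = LocationCode(other)
        if not isinstance(other, LocationCode):
            return NotImplemented
        return self.code == other.code

    def __hash__(self):
        # Hash on the normalized value so instances work as dict keys.
        return hash(self.code)

    def __repr__(self):
        return "LocationCode(%r)" % self.code


print(LocationCode("  ") == LocationCode(""))    # True
print(LocationCode("--") == "")                  # True
print(LocationCode("00") == LocationCode("--"))  # False
```

Without the "--" synonym the whole class reduces to a single `strip()` call, which is the difference in effort being argued here.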
So what is the problem with mapping?
Because as already said, the mapping would be required "forever" due to
persistence of the data. In particular, you cannot declare existing
metadata invalid. Hence you would have to keep supporting "" and "  ",
too, to maintain backward compatibility.
I was certainly not against mapping to/from "--", after all it was my proposal! You have taken my words out of context. Please stick to the technical issues and leave your personal indignation off of this mailing list.
This is already a very technical discussion and where you detect
"personal indignation" is left as an exercise to the reader.
Here is the context: "StationXML is the new dataless SEED, as such it
should be compatible between data centers for at least the core
parameters. Currently StationXML produced by SeisComP3 and other data
centers for the exact same channel can be documents that are
semantically different channels (NSLC do not match). We would not do
this with dataless SEED, right? Any notion that a reader of the XML
must apply rules to the core name values is rubbish, no transformation
should be needed at that point. These are documents that are being
stored as files, loaded into databases, and otherwise saved."
In my interpretation this is a statement against any mapping.
But I recognize we are all in a learning curve and opinions evolve and
sometimes change. Plus we are having two discussions on the list and
off-list, each with several sub-threads. This probably creates
additional confusion and doesn't quite help to focus on what the real
issues are *currently*. Channel naming can be discussed and should,
actually has been many times before, but can we not focus on what needs
to be solved in very short time without introducing additional
incompatibilities?
All frustration about ugly empty location codes aside, I maintain that
there are technically rather no issues with them. Nothing that cannot be
solved quickly with rather few modifications plus a clarification in the
FDSN StationXML specification. In fact I already proposed a clear
timeline you might want to comment on. What follows is a quote from my
email of July-24, 18:43 UTC to this list.
----------------------------------------------------------------------
Actually we are currently seeking to solve a particular incompatibility
between FDSN StationXML produced by different services, but technically
that is much, *much* easier to achieve than the introduction of a new
and incompatible channel naming. I would welcome an intensified
discussion on the latter, but not in the context of the current FDSN
StationXML or web services.
It's actually quite strange that already now, early after the
introduction of FDSN StationXML, we are not only choking over minor
incompatibilities, but are discussing "solutions" to problems that
apparently no one had noticed they existed before StationXML... Looks
like shooting at sparrows with cannons, IMO.
There used to be a IASPEI working group on station codes that even came
up with a new channel naming "standard"[*], which, however, doesn't seem
to have gained much acceptance so far. Nevertheless this is the level at
which changes to channel naming need to be discussed, even though the
process may be frustratingly slow. But the impact of such a change is
just too big to be decided ad hoc.
To summarize:
We will not find a future-proof channel naming convention quickly.
Partial changes, especially if incompatible, should be absolutely avoided.
The particular problem we attempted (and still need) to solve in the
first place is a location code incompatibility due to differently strict
adherence to the SEED specification. Not surprisingly I prefer the
empty-string representation for the empty location code. To be
pragmatic, I propose the following time line:
* Accept that at least for a transitional period we have to accept the
existence of space-space and empty location codes.
* During a transitional period, don't change the servers that now
produce space-space location codes, as that would break compatibility
with some clients. We want to keep compatibility rather than introducing
new incompatibility.
* Instead update the clients to accept both space-space and empty
location codes by trimming trailing spaces if present. This is a
relatively minor change and IIRC this is on IRIS's agenda already, which
is highly appreciated.
At this point in time, interoperability is restored, even without
server-side changes. This is important as it may take quite some time
for the users to actually upgrade their clients; but it doesn't hurt anyone.
* Finally the server upgrades where needed. The decision as to when to
upgrade the server side can be made once it is considered appropriate;
there is absolutely no hurry from the client side.
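The client-side step of this timeline could look roughly like the following Python sketch. The two documents are heavily trimmed, namespace-free stand-ins for real StationXML, and the function name `channel_ids` is hypothetical; a real client would also have to handle the StationXML namespace.

```python
from xml.etree import ElementTree

# Two simplified documents for the same channel (illustrative only).
DOC_TWO_SPACES = (
    '<Network code="GE"><Station code="UGM">'
    '<Channel locationCode="  " code="BHZ"/></Station></Network>'
)
DOC_EMPTY = (
    '<Network code="GE"><Station code="UGM">'
    '<Channel locationCode="" code="BHZ"/></Station></Network>'
)


def channel_ids(xml_text):
    """Return (net, sta, loc, cha) tuples with the location code stripped."""
    network = ElementTree.fromstring(xml_text)
    ids = []
    for station in network.findall("Station"):
        for channel in station.findall("Channel"):
            # Assumption of this sketch: a missing attribute is treated as empty.
            loc = channel.get("locationCode", "").strip()
            ids.append((network.get("code"), station.get("code"),
                        loc, channel.get("code")))
    return ids


# Both flavors yield the same identifier once the padding is stripped.
print(channel_ids(DOC_TWO_SPACES) == channel_ids(DOC_EMPTY))  # True
```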
The needed changes for the above proposal are very small compared to the
huge changes that would be required at every level to implement a new
channel naming convention. This may (and hopefully will) take place some
time in the future, but it requires a lot of preparation and
coordination. I am pretty sure that we will have a considerable number
of beers in the meantime.
Besides the beers, we should focus on finalizing the specification of
FDSN StationXML. There are too many under-defined elements even in the
xsd and the risk of serious incompatibilities is very high.
Cheers
Joachim
[*] http://www.isc.ac.uk/registries/download/IR_implementation.pdf
-
Hi Philip
Philip Crotwell [07/28/2014 03:37 PM]:
Being on the cheap side of the Atlantic, I'll save us $0.00068 and
make a stab at the underlying issue. :)
Here, with lots of stuff cut out, is how a channel is "identified" in
stationXML via the fdsn station web service at the IRIS DMC,
http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
<Network code="GE" >
<Station code="UGM">
<Channel locationCode="  " code="BHZ">
Another implementation of the same web service (not sure of url) gives
back this:
<Network code="GE" >
<Station code="UGM">
<Channel locationCode="" code="BHZ">
with locationCode="" vs ="  " being the difference under consideration.
Exactly. Good that you provided this as an example because we were already
getting lost so deep within the details that we may have forgotten that
this thread has just moved to this list and that it might not have been
clear to everybody what the issue actually is...
Even a few lines of XML can (sometimes) help make things clearer. ;)
There are two basic issues being discussed (and yes, more beer would help!:)
1) Should all valid stationXML documents be required to use the exact
same string of characters to represent the location id for this
channel. This would allow a comparison operation to be "simple" in
that it can compare the attribute values without additional
processing.
This would be ideal, but I think it is not realistic:
If "--" were introduced, it would be impossible not to keep supporting
"  " and "" practically forever in order to maintain backward compatibility.
If "" were to become the preferred empty location code, we still have
probably billions of instances of "  " out in the wild that should not
be declared invalid.
The same is true for "  " resp. "".
In short some mapping is required anyway. Fortunately the mapping
between "" and "  " is trivial by using methods like trim(), strip() or
so (depending on the language). Most seismic data handling software
already does it anyway because it's so obvious. For XML it's at least
ObsPy and SeisComP. SEED readers that trim the location code include
rdseed, libmseed and qlib2. All database engines provide a trim()
method, so database queries are not a problem either.
if trim(loc1) == trim(loc2) ...
may be slightly more expensive in terms of CPU cycles than
if loc1 == loc2 ...
but I presume that this is nowhere a real issue. With the added benefit
that the currently not strictly SEED compliant "  " location code is
then within the valid range kind of automatically.
2) If we agree to 1), then what should those exact characters be? The
current top choices are
a) empty=""
b) two spaces="  "
c) two dashes="--".
1) seems less controversial than 2) in that greater compatibility is
generally seen as positive.
Compatibility is absolutely essential. This is probably the main reason
why even after more than 10 years of discussion about new channel
naming, there hasn't been any real progress AFAICS. And despite all
shortcomings the current NSLC is really remarkable as it is accepted and
used nearly everywhere. Don't put that at stake.
Thanks btw for your other comments about potential issues related to
white space.
Cheers
Joachim
-
Thanks Philip, I think you have outlined the issues well.
Regarding issue #1, I strongly feel that we need to choose one representation; the sooner we stop creating incompatible metadata the better.
Regarding issue #2:
b) two spaces="  "
This is what IRIS currently does; it is not strictly SEED but avoids empty identifiers.
c) two dashes="--".
This would require work and continued mapping; the mapping is clear between SEED-based holdings and StationXML. SEED headers and data records could also be considered, but that is a bigger can of worms.
a) empty=""
This is possibly the most straightforward mapping of SEED information, but leaves us with an empty string identifier.
Below are a few of the issues we note regarding empty identifiers
1) They are too similar to "unknown" (which results in potential ambiguity where channels are only differentiated by location ID):
a) In many languages an empty string evaluates to false; if, for example, a program is testing for and then extracting a value from an XML document parsed into a structure/object, it could appear as if the value was not present. Of course the coding in probably every language can be done to avoid such a false negative, but it is a pitfall that we would be asking all future users and coders to know about.
b) In XPath (the query language for XSLT), which is used to search or translate XML, the matching of a string attribute usually uses the string() function. Specifying the string attribute to match when the attribute has a value is straightforward; when trying to match the empty string the query is for NOT string. In the boolean functions of XPath "a string is true if and only if its length is non-zero" (http://www.w3.org/TR/xpath/#function-boolean). So in XPath, hardly a fringe technology, an empty string is not just another kind of string but an anomaly.
c) In JavaScript the getAttribute() method returns the same value whether the attribute was an empty string or unspecified. The method is no longer recommended but illustrates that such thinking is not limited to niche projects.
2) Organizing data in structures such as a nested hash is pretty common: %{net}{sta}{loc}{chan} = "some lvalue" (sorry for the Perl). The empty identifier as a key works in some languages but it is obtuse and unclear. I'm sure there are many other data structures that would use location by itself as a key.
3) Empty identifiers are difficult to specify on the command line, URLs, etc. and non-obvious in many other places such as GUI fields. We have largely addressed this issue for FDSN web services (at the DMC for other mechanisms as well) by making "--" a synonym for the empty location ID. In other words we are already mapping "--" into the empty location ID for requests and users are learning this association. A further adoption of the synonym into the metadata would solve many of these problems.
4) While it is certainly not the FDSN's task to define data formats outside of its purview, the adoption or matching of the core channel naming fields in other formats is certainly in the FDSN's best interest. This has been happening for a long time already (ISF/IASPEI, GSE, etc.). The potentially empty (optional?) location ID could make such adoption harder as it is a wrinkle, especially for space delimited formats. I believe these broader implications deserve some consideration.
I'm sure most developers could come up with solutions to the technical problems, but an empty identifier leaves the unfortunate wrinkles for all future users and coders.
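The pitfall in 1a) can be shown with a short Python sketch. The two one-element documents and the function names `describe_naive` and `describe_explicit` are made up for this illustration; only the behavior of `Element.get()` (returning None for an absent attribute and "" for an empty one) is taken from the standard library.

```python
from xml.etree import ElementTree

with_empty = ElementTree.fromstring('<Channel locationCode="" code="BHZ"/>')
without = ElementTree.fromstring('<Channel code="BHZ"/>')


def describe_naive(channel):
    # Pitfall: "" is falsy, so an empty location looks like a missing one.
    if channel.get("locationCode"):
        return "present"
    return "missing"


def describe_explicit(channel):
    # get() returns None only when the attribute is absent.
    loc = channel.get("locationCode")
    if loc is None:
        return "missing"
    return "empty" if loc == "" else "present"


print(describe_naive(with_empty))     # missing (the false negative)
print(describe_explicit(with_empty))  # empty
print(describe_explicit(without))     # missing
```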
Here is an example of someone that was confused by current metadata, I'll bet if there was a value in the locationCode it would have been easier:
https://stackoverflow.com/questions/19348855/checking-for-empty-attributes-while-parsing-an-xml-file
There is a chance we will end up with the empty location identifier, but the considerations should go beyond an assumption that an empty string is the only choice.
Since an empty location field in SEED essentially means unset, perhaps we should consider making the locationCode attribute optional and leaving it out of the XML when it is empty in SEED. In this line of thinking, the empty string is just a hack to include a required attribute when in fact there is nothing to include. For me the "unset" aspect is unsettlingly similar to "unknown", but it's an idea preferred by at least one engineer at the DMC.
Chad
On Jul 28, 2014, at 6:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
Hi
Being on the cheap side of the Atlantic, I'll save us $0.00068 and
make a stab at the underlying issue. :)
Here, with lots of stuff cut out, is how a channel is "identified" in
stationXML via the fdsn station web service at the IRIS DMC,
http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
<Network code="GE" >
<Station code="UGM">
<Channel locationCode="  " code="BHZ">
Another implementation of the same web service (not sure of url) gives
back this:
<Network code="GE" >
<Station code="UGM">
<Channel locationCode="" code="BHZ">
with locationCode="" vs ="  " being the difference under consideration.
There are two basic issues being discussed (and yes, more beer would help! :)
1) Should all valid stationXML documents be required to use the exact
same string of characters to represent the location id for this
channel. This would allow a comparison operation to be "simple" in
that it can compare the attribute values without additional
processing.
2) If we agree to 1), then what should those exact characters be? The
current top choices are
a) empty=""
b) two spaces="  "
c) two dashes="--".
1) seems less controversial than 2) in that greater compatibility is
generally seen as positive.
This is primarily a question about the form of the stationXML
documents, but obviously there are connections to the way requests are
formed, the relationship to miniseed/seed, the way things are coded in
software and how much detailed understanding we expect of end users.
Philip
On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:
Hello all,
Can someone give a concise statement of the original problem being
discussed? Is it only or primarily a concern about XML?
It seems to me that with modern languages a string that is empty or has 1-N
spaces is the same thing - there are often implicit or explicit trim()
functions hiding in a processing pipeline. A null string is not the same.
So an empty or blank string is the same, valid location code, and null is
undefined or uninitialized location code.
With regards to the "--" pseudo for the location code, is this not needed
because sometimes it is not possible or difficult to represent an empty
string or even a string? For example on the command line or in a restful WS
URI? (Or a URI on the command line!) So it may be that the use of "--" for
intermediate processing and requests could be tolerated and somehow
official, while empty or only-blanks strings are official and for persistent
data.
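As a rough sketch of that split (a "--" spelling in requests, empty or blank strings in stored data), a client might build its request URL like this in Python. The base URL and the net, sta, cha and level parameters are taken from the query quoted earlier in this thread; the `loc` parameter name and the function name `station_query_url` are assumptions of this example.

```python
from urllib.parse import urlencode

BASE_URL = "http://service.iris.edu/fdsnws/station/1/query"


def station_query_url(net, sta, loc, cha):
    """Build a channel-level query URL, spelling an empty location as "--"."""
    # Stored data may hold "" or "  "; the request uses the "--" synonym.
    loc = loc.strip() or "--"
    params = [("net", net), ("sta", sta), ("loc", loc),
              ("cha", cha), ("level", "channel")]
    return BASE_URL + "?" + urlencode(params)


print(station_query_url("GE", "UGM", "  ", "BHZ"))
```

No request is sent here; the sketch only shows where the mapping from the persistent form to the request form would sit.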
Just my 0.02€ = $0.0268
Best regards to all,
Anthony
On 27/07/2014 04:52, Chad Trabant wrote:
Hi Marcelo,
Thanks for your thoughts as well. Something that you and Joachim are not
addressing are the concerns about an empty ID that have been brought up by
more than one person. The answer that empty strings are technically
possible and it all works in Python/SeisComP is less than satisfying. The
observations from Python, ObsPy and SeisComP are a few of many that need to
be taken into account.
I agree that there is a long tail consideration for the "--" location ID
solution. If you understand that some folks find an empty ID to be problematic
regardless of whether it is XML, SEED, text, whatever, then you might see
where this proposal comes from. Yes, we would need to treat empty location
IDs and "--" as synonyms for a very long time. Empty strings in XML mean
you will need to map empty IDs to empty strings, NULL and whatever an XML
parser might or might not produce for a long time as well (think beyond
Python and SeisComP). Either is possible, only one of them is a unique
mapping.
If the main considerations are for the least amount of disruption then the
answer is obvious to me: the FDSN can sanction that the two-space string is
the XML synonym for the empty SEED location ID and we adjust the schema to
make sure a string of whitespaces is preserved. Then SeisComP can change
its relatively new StationXML implementation and ALL existing clients will
be compatible with all metadata and, most importantly, we would have
consistent metadata.
If the empty string ID representation is adopted it would, in effect,
mean that the DMC would need to change its metadata service and (more
importantly) all users of the DMC's metadata service would need to
transition to a new metadata channel naming scheme. This is certainly not
out of the question, but it is not something we would do without careful
consideration. I do not find the two-space strings all that great, but they
are here and something the DMC and users of the DMC have dealt with. Issues
have been identified with empty location IDs by us and our users. If DMC is
going to change, and push the change on all users of the DMC's StationXML,
it would be much more compelling to have a solution that addresses the low
level issues.
regards,
Chad
----- Original Message -----
From: "Marcelo Bianchi" <m.tchelo<at>gmail.com>
To: "IRIS Web Services List" <webservices<at>iris.washington.edu>
Sent: Friday, July 25, 2014 7:38:17 PM
Subject: Re: [webservices] A question of location ID, how to represent empty
IDs in XML?
Hi Philip and All,
I totally agree with Joachim; I was planning to answer but he was much
faster. What you guys are proposing is not a solution. StationXML
supports the empty string nicely and it is not null. There is a type
difference here in Python and in any other language, and it can be nicely
handled internally.
Also, the location id is not just a string; it is a key entry that links
miniseed to metadata, and making an exception at this level just
because a user interface cannot properly render it without ambiguity
does not sound like a proper proposal. I am not in favor of
creating an exception that will have to be carried along for
decades to come. Alternative solutions for this issue should be
sought in the end-user interface.
with my best regards,
Marcelo Bianchi
--
2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell<at>seis.sc.edu>:
It sounds like you are saying "change is hard, so we shouldn't do it".
I would argue that change is hard and so if we don't do it now it will
never happen. StationXML is new enough that there is already a
disruption, we should seize the chance. If we do not do something now
about null loc ids, it will be a decade or two before we get another
chance.
It is time to drive the stake through the heart of null location ids.
Kill the evil while we have a chance.
Philip
On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
Hello Rob,
Rob Newman wrote on 24.07.2014 18:51:
For what it's worth, I would also vote for the "--" standard. To quote
from the Zen of Python
http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html
(my language of choice):
"Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced."
I'd add "Compatible is better than incompatible." :)
Number 2 is especially relevant here:
"Explicit is better than implicit."
My favorite would be:
"Special cases aren't special enough to break the rules."
Quoted whitespace and nulls are painful. Code what you mean, and mean what
you code. It's easier for everyone.
But what if we simply *mean* "empty string"?
The issue is not about beauty, pain or ease. It's about standard
conformance. We already have a channel naming standard. If a new data format
cannot accommodate existing channel naming, then the new format is flawed.
But that's not even the case here...
An XML document that contains
<Channel locationCode="" ...
is not malformed. There's an attribute that *explicitly* contains an empty
string and a parser has to produce it as such. Not as null, nil or none, but
as an empty string. Otherwise the parser is broken and needs to be fixed,
not the data!
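The distinction drawn here can be checked with a small sketch using Python's standard xml.etree.ElementTree module. The one-element documents are stripped-down illustrations, not complete StationXML, and the variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Illustrative fragments only; real StationXML has namespaces and more content.
explicit_empty = ET.fromstring('<Channel locationCode="" code="BHZ"/>')
two_spaces = ET.fromstring('<Channel locationCode="  " code="BHZ"/>')
attribute_missing = ET.fromstring('<Channel code="BHZ"/>')

# An explicitly empty attribute comes back as an empty string...
empty_value = explicit_empty.get("locationCode")
# ...a two-space attribute keeps both spaces with this non-validating parser...
spaces_value = two_spaces.get("locationCode")
# ...and only a missing attribute yields None.
missing_value = attribute_missing.get("locationCode")

print(repr(empty_value), repr(spaces_value), repr(missing_value))
```

Other parsers and languages may behave differently, which is the concern raised elsewhere in this thread.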
Again: It's not about beauty. We all agree that current channel naming is
not particularly beautiful and has limitations. But our business is not to
try to solve that issue now and here.
Cheers
Joachim
_______________________________________________
webservices mailing list
webservices<at>iris.washington.edu
http://www.iris.washington.edu/mailman/listinfo/webservices
--
Sent from my iClayTablet
________________________________
Anthony Lomax
161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net web:
http://www.alomax.net
Twitter: @ALomaxNet
Science & Special Topics: http://www.alomax.net/science
Software: http://www.alomax.net/software - updates:
https://twitter.com/ALomaxNet
________________________________
-
Philip Crotwell 2014-07-31 14:18:39
Hi
Just another data point, Earthworm, which is widely used by regional
networks globally, has long had the "dash dash is the same as space
space" convention. So dash dash is not something pulled out of thin
air, it is how at least I do things already.
And this shows that it is fairly common (if not technically correct)
for users to regard space-space as the location id instead of
regarding it as null with 2 spaces for padding. My guess is that very
few users are aware of this, and even as someone who has been writing
seismic software for a couple of decades I still think of the location
id as space-space, not null.
http://www.isti2.com/ew/PROGRAMMER/location_codes/EW_Loc_policy.txt
Philip
On Thu, Jul 31, 2014 at 6:18 AM, Chad Trabant <chad<at>iris.washington.edu> wrote:
Thanks Philip, I think you have outlined the issues well.
Regarding issue #1, I strongly feel that we need to choose one
representation, the sooner we stop creating incompatible metadata the
better.
Regarding issue #2:
b) two spaces="  "
This is what IRIS currently does, not strictly SEED but avoids empty
identifiers.
c) two dashes="--".
This would require work and continued mapping, but the mapping is clear
between SEED-based holdings and StationXML. SEED headers and data records
could also be considered, but that is a bigger can of worms.
a) empty=""
This is possibly the most straightforward mapping of SEED information, but
leaves us with an empty string identifier.
Below are a few of the issues we note regarding empty identifiers
1) They are too similar to "unknown" (which results in potential ambiguity
where channels are only differentiated by location ID):
a) In many languages an empty string evaluates to false; if, for example,
a program is testing for and then extracting a value from an XML document
parsed into a structure/object, it could appear as if the value was not
present. Of course the coding in probably every language can be done to
avoid such a false negative, but it is a pitfall that we would be asking all
future users and coders to know about.
b) In XPath (the query language for XSLT), which is used to search or
translate XML, the matching of a string attribute usually uses the string()
function. Specifying the string attribute to match when the attribute has a
value is straightforward; when trying to match the empty string the query is
for NOT string. In the boolean functions of XPath "a string is true if and
only if its length is non-zero"
(http://www.w3.org/TR/xpath/#function-boolean). So in XPath, hardly a
fringe technology, an empty string is not just another kind of string but an
anomaly.
c) In JavaScript the getAttribute() method returns the same value whether
the attribute was an empty string or unspecified. The method is no longer
recommended but illustrates that such thinking is not limited to niche
projects.
2) Organizing data in structures such as a nested hash is pretty common:
%{net}{sta}{loc}{chan} = "some lvalue" (sorry for the Perl). The empty
identifier as a key works in some languages but it is obtuse and unclear.
I'm sure there are many other data structures that would use location by
itself as a key.
3) Empty identifiers are difficult to specify on the command line, in URLs,
etc. and non-obvious in many other places such as GUI fields. We have largely
addressed this issue for FDSN web services (at the DMC for other mechanisms
as well) by making "--" a synonym for the empty location ID. In other words
we are already mapping "--" into the empty location ID for requests and
users are learning this association. A further adoption of the synonym into
the metadata would solve many of these problems.
4) While it is certainly not the FDSN's task to define data formats outside
of its purview, the adoption or matching of the core channel naming fields
in other formats is certainly in the FDSN's best interest. This has been
happening for a long time already (ISF/IASPEI, GSE, etc.). The potentially
empty (optional?) location ID could make such adoption harder as it is a
wrinkle, especially for space delimited formats. I believe these broader
implications deserve some consideration.
I'm sure most developers could come up with solutions to the technical
problems, but an empty identifier leaves the unfortunate wrinkles for all
future users and coders.
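Points 1a and 2 above can be sketched in Python (rather than Perl). The dictionaries below are made-up stand-ins for a parsed channel record and a nested net/sta/loc/chan hash, not output from any real parser.

```python
# Hypothetical parsed channel record with an empty location code.
channel = {"network": "GE", "station": "UGM", "locationCode": "", "code": "BHZ"}

# Pitfall (1a): an empty string is falsy, so this common test reports the
# location code as absent even though the key is present.
looks_missing = not channel.get("locationCode")

# Explicit tests that avoid the false negative.
is_present = "locationCode" in channel
is_unknown = channel.get("locationCode") is None

# Point 2: a nested mapping keyed by net, sta, loc, chan. The empty key
# works, but the lookup is easy to misread.
inventory = {"GE": {"UGM": {"": {"BHZ": "some value"}}}}
value = inventory["GE"]["UGM"][""]["BHZ"]

print(looks_missing, is_present, is_unknown, value)
```

The code can be written correctly, as noted above, but the truthiness test is the form many users reach for first.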
Here is an example of someone that was confused by current metadata, I'll
bet if there was a value in the locationCode it would have been easier:
https://stackoverflow.com/questions/19348855/checking-for-empty-attributes-while-parsing-an-xml-file
There is a chance we will end up with the empty location identifier, but the
considerations should go beyond an assumption that an empty string is the
only choice.
Since an empty location field in SEED essentially means unset, perhaps we
should consider making the locationCode attribute optional and leaving it
out of the XML when it is empty in SEED. In this line of thinking, the
empty string is just a hack to include a required attribute when in fact
there is nothing to include. For me the "unset" aspect is unsettlingly
similar to "unknown", but it's an idea preferred by at least one engineer at
the DMC.
Chad
On Jul 28, 2014, at 6:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
Hi
Being on the cheap side of the Atlantic, I'll save us $0.00068 and
make a stab at the underlying issue. :)
Here, with lots of stuff cut out, is how a channel is "identified" in
stationXML via the fdsn station web service at the IRIS DMC,
http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
<Network code="GE" >
<Station code="UGM">
<Channel locationCode="  " code="BHZ">
Another implementation of the same web service (not sure of url) gives
back this:
<Network code="GE" >
<Station code="UGM">
<Channel locationCode="" code="BHZ">
with locationCode="" vs ="  " being the difference under consideration.
There are two basic issues being discussed (and yes, more beer would help!
:)
1) Should all valid stationXML documents be required to use the exact
same string of characters to represent the location id for this
channel? This would allow a comparison operation to be "simple" in
that it can compare the attribute values without additional
processing.
2) If we agree to 1), then what should those exact characters be? The
current top choices are
a) empty=""
b) two spaces="  "
c) two dashes="--".
1) seems less controversial than 2) in that greater compatibility is
generally seen as positive.
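To illustrate issue 1), here is a rough sketch comparing channel identifiers from the two flavors. The fragments are reduced to the three elements shown above (no namespaces, no other attributes), and channel_ids with its normalize flag is a hypothetical helper, not an existing API.

```python
import xml.etree.ElementTree as ET

# Reduced fragments modeled on the two responses quoted above.
TWO_SPACE_STYLE = (
    '<Network code="GE"><Station code="UGM">'
    '<Channel locationCode="  " code="BHZ"/></Station></Network>'
)
EMPTY_STYLE = (
    '<Network code="GE"><Station code="UGM">'
    '<Channel locationCode="" code="BHZ"/></Station></Network>'
)


def channel_ids(xml_text, normalize=False):
    """Return (net, sta, loc, chan) tuples for every channel in a fragment."""
    network = ET.fromstring(xml_text)
    ids = []
    for station in network.iter("Station"):
        for channel in station.iter("Channel"):
            loc = channel.get("locationCode")
            if normalize and loc is not None:
                # The "additional processing": treat "--" and blanks as empty.
                loc = "" if loc == "--" else loc.strip()
            ids.append(
                (network.get("code"), station.get("code"), loc, channel.get("code"))
            )
    return ids


# A "simple" comparison sees two different channels...
print(channel_ids(TWO_SPACE_STYLE) == channel_ids(EMPTY_STYLE))
# ...and they only match once every client normalizes the attribute.
print(channel_ids(TWO_SPACE_STYLE, True) == channel_ids(EMPTY_STYLE, True))
```

If all documents used the same characters, the normalize step would be unnecessary and the plain comparison would be enough.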
This is primarily a question about the form of the stationXML
documents, but obviously there are connections to the way requests are
formed, the relationship to miniseed/seed, the way things are coded in
software and how much detailed understanding we expect of end users.
Philip
On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:
Hello all,
Can someone give a concise statement of the original problem being
discussed? Is it only or primarily a concern about XML?
It seems to me that with modern languages a string that is empty or has 1-N
spaces is the same thing - there is often an implicit or explicit trim()
function hiding in a processing pipeline. A null string is not the same.
So an empty or blank string is the same, valid location code, and null is
an undefined or uninitialized location code.
With regard to the "--" pseudo-value for the location code, is this not
needed because it is sometimes impossible or difficult to represent an empty
string or even a string? For example on the command line or in a RESTful WS
URI? (Or a URI on the command line!) So it may be that the use of "--" for
intermediate processing and requests could be tolerated and somehow made
official, while empty or only-blank strings would be official for persistent
data.
Just my 0.02€ = $0.0268
Best regards to all,
Anthony
-
Philip Crotwell 2014-07-31 15:59:04
Yet another data point, going all the way back to vol 1 issue 1 of the
DMC newsletter introducing location ids:
"The Location Identifier is a two character code that, when used in
conjunction with the other data specifiers, uniquely identifies a data
stream."
and
"Historically, within a SEED volume, the Location Identifier was left
“blank” (consisted of two spaces)."
and
"GSN Use of Location Identifiers
Valid characters for location identifiers are [space, 0-9, A-Z][space,
0-9, A-Z]. (So space-space is a legitimate Location Identifier.) "
http://www.iris.edu/ds/newsletter/vol1/no1/specification-of-seismograms-the-location-identifier/
From this it seems that location id was intended to be exactly 2
characters, not zero or two. My feeling is that we have a long
tradition of the location id being "space-space" and not null or
empty. Personally I really dislike space-space, but the only thing I
dislike more than space-space is empty.
Philip
On Jul 28, 2014, at 6:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
Hi
Being on the cheap side of the Atlantic, I'll save us $0.00068 and
make a stab at the underlying issue. :)
Here, with lots of stuff cut out, is how a channel is "identified" in
stationXML via the fdsn station web service at the IRIS DMC,
http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
<Network code="GE" >
<Station code="UGM">
<Channel locationCode=" " code="BHZ">
Another implementation of the same web service (not sure of url) gives
back this:
<Network code="GE" >
<Station code="UGM">
<Channel locationCode="" code="BHZ">
with locationCode="" vs =" " being the difference under consideration.
There are two basic issues being discussed (and yes, more beer would help!
:)
1) Should all valid stationXML documents be required to use the exact
same string of characters to represent the location id for this
channel. This is would allow a comparison operation to be "simple" in
that it can compare the attribute values without additional
processing.
2) If we agree to 1), then what should those exact characters be? The
current top choices are
a) empty=""
b) two spaces=" "
c) two dashes="--".
1) seems less controversial than 2) in that greater compatibility is
generally seen as positive.
This is primarily a question about the form of the stationXML
documents, but obviously there are connections to the way requests are
formed, the relationship to miniseed/seed, the way things are coded in
software and how much detailed understanding we expect of end users.
Philip
On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:
Hello all,
Can someone give a concise statement of the original problem being
discussed, it only or primarily a concern about XML?
It seems to me that with modern languages a string that is empty or has 1-N
spaces is the same thing - there are often implicit or explicit trim()
function hiding in a processing pipeline. A null string is not the same.
So an empty or blank string is the same, valid location code, and null is
undefined or uninitialized location code.
With regards to the "--" pseudo for the location code, is this not needed
because sometimes it is not possible or difficult to represent an empty
string or even a string? For example on the command line or in a restful WS
URI? (Or a URI on the command line!) So it may be that the use of "--" for
intermediate processing and requests could be tolerated and somehow
official, while empty or only-blanks strings official and for persistent
data.
Just my 0.02€ = $0.0268
Best regards to all,
Anthony
On 27/07/2014 04:52, Chad Trabant wrote:
Hi Marcelo,
Thanks for your thoughts as well. Something that you and Joachim are not
addressing are the concerns about an empty ID that have been brought up by
more than one person. The answer that empty strings are technically
possible and it all works in Python/SeisComP is less than satisfying. The
observations from Python, ObsPy and SeisComP are a few of many that need to
be taken into account.
I agree that there is a long tail consideration for the "--" location ID
solution. Understand that some folks find an empty ID to be problematic
regardless of whether it is XML, SEED, text, whatever, then you might see
where this proposal comes from. Yes, we would need to treat empty location
IDs and "--" as synonyms for a very long time. Empty strings in XML mean
you will need to map empty IDs to empty strings, NULL and whatever an XML
parser might or might not produce for a long time as well (think beyond
Python and SeisComP). Either is possible, only one of them is a unique
mapping.
If the main considerations are for the least amount of disruption the the
answer is obvious to me: the FDSN can sanction that the two-space string is
the XML synonym for the empty SEED location ID and we adjust the schema to
make sure a string of whitespaces is preserved. Then SeisComP can change
its relatively new StationXML implementation and ALL existing clients will
be compatible with all metadata and, mostly importantly, we would have
consistent metadata.
If the empty string ID representation is adopted it would, in effect,
mean that the DMC would need to change its metadata service and (more
importantly) all users of the DMC's metadata service would need to
transition to a new metadata channel naming scheme. This is certainly not
out of the question, but it is not something we would do without careful
consideration. I do not find the two-space strings all that great, but they
are here and are something the DMC and users of the DMC have dealt with. Issues
have been identified with empty location IDs by us and our users. If the DMC is
going to change, and push the change on all users of the DMC's StationXML,
it would be much more compelling to have a solution that addresses the low
level issues.
regards,
Chad
----- Original Message -----
From: "Marcelo Bianchi" <m.tchelo<at>gmail.com>
To: "IRIS Web Services List" <webservices<at>iris.washington.edu>
Sent: Friday, July 25, 2014 7:38:17 PM
Subject: Re: [webservices] A question of location ID, how to represent empty
IDs in XML?
Hi Philip and All,
I totally agree with Joachim; I was planning to answer but he was much
faster. What you guys are proposing is not a solution. StationXML
supports the empty string nicely, and it is not null. There is a type
difference here in Python and in any other language, and it can be nicely
handled internally.
Also, the location id is not just a string, it is a key entry to link
miniseed to metadata, and making an exception at this level just
because a user interface cannot properly render it without ambiguity
does not sound like a proper proposal. I am not in favor of
creating an exception that will have to be carried along for
decades to come. Alternative solutions for this issue should be
sought in the end user interface.
with my best regards,
Marcelo Bianchi
--
2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell<at>seis.sc.edu>:
It sounds like you are saying "change is hard, so we shouldn't do it".
I would argue that change is hard and so if we don't do it now it will
never happen. StationXML is new enough that there is already a
disruption, we should seize the chance. If we do not do something now
about null loc ids, it will be a decade or two before we get another
chance.
It is time to drive the stake through the heart of null location ids.
Kill the evil while we have a chance.
Philip
On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
Hello Rob,
Rob Newman wrote on 24.07.2014 18:51:
For what it's worth, I would also vote for the "--" standard. To quote
from the Zen of Python
http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html
(my language of choice):
"Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced."
I'd add "Compatible is better than incompatible." :)
Number 2 is especially relevant here:
"Explicit is better than implicit."
My favorite would be:
"Special cases aren't special enough to break the rules."
Quoted whitespace and nulls are painful. Code what you mean, and mean what
you code. It's easier for everyone.
But what if we simply *mean* "empty string"?
The issue is not about beauty, pain or ease. It's about standard
conformance. We already have a channel naming standard. If a new data format
cannot accommodate existing channel naming, then the new format is flawed.
But that's not even the case here...
An XML document that contains
<Channel locationCode="" ...
is not malformed. There's an attribute that *explicitly* contains an empty
string and a parser has to produce it as such. Not as null, nil or none, but
as an empty string. Otherwise the parser is broken and needs to be fixed,
not the data!
Again: It's not about beauty. We all agree that current channel naming is
not particularly beautiful and has limitations. But our business is not to
try to solve that issue now and here.
Cheers
Joachim
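To make the parser point above concrete, here is a minimal sketch assuming Python's standard xml.etree.ElementTree parser. The two fragments are hypothetical, cut-down <Channel> elements rather than real service output, and location_code is an illustrative helper name. Other parsers and languages may behave differently, which is part of what is being debated in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments modeled on the <Channel> element discussed above.
explicit_empty = ET.fromstring('<Channel locationCode="" code="BHZ"/>')
attribute_missing = ET.fromstring('<Channel code="BHZ"/>')


def location_code(channel):
    """Return the locationCode attribute, or None when it is absent."""
    return channel.get("locationCode")


# An explicitly empty attribute comes back as an empty string ...
print(repr(location_code(explicit_empty)))
# ... while an attribute that is not present at all comes back as None.
print(repr(location_code(attribute_missing)))
```

With this parser the two cases stay distinguishable, as long as the calling code compares against None rather than testing truthiness.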
_______________________________________________
webservices mailing list
webservices<at>iris.washington.edu
http://www.iris.washington.edu/mailman/listinfo/webservices
--
Sent from my iClayTablet
________________________________
Anthony Lomax
161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net web:
http://www.alomax.net
Twitter: @ALomaxNet
Science & Special Topics: http://www.alomax.net/science
Software: http://www.alomax.net/software - updates:
https://twitter.com/ALomaxNet
________________________________
-
Hi,
I realize I am coming pretty late to the party here, but I'll chime in
anyway. At SCEDC (the archive for the Southern California Seismic
Network), we represent our empty location codes with two blank spaces as
well. I suspect the same is done at the Northern California Earthquake
Data Center.
There are great arguments here for each of the options presented, but I
think unless we decide to make location id optional we should not use an
empty string to denote an unset location id in StationXML. I think there
is enough variety in how an empty string is treated in different
programming languages and databases to be problematic. From some
databases' perspectives, you really might as well make it null at that
point.
So I would say, if location id is required, use a two character
substitution; my personal preference is two spaces as that seems to be the
convention (although it ain't pretty) - and we should consider in a future
version of stationXML making the location id optional.
Ellen
On Thu, Jul 31, 2014 at 5:59 AM, Philip Crotwell <crotwell<at>seis.sc.edu>
wrote:
Yet another data point, going all the way back to vol 1 issue 1 of the
DMC newsletter introducing location ids:
"The Location Identifier is a two character code that, when used in
conjunction with the other data specifiers, uniquely identifies a data
stream."
and
"Historically, within a SEED volume, the Location Identifier was left
"blank" (consisted of two spaces)."
and
"GSN Use of Location Identifiers
Valid characters for location identifiers are [space, 0-9, A-Z][space,
0-9, A-Z]. (So space-space is a legitimate Location Identifier.) "
http://www.iris.edu/ds/newsletter/vol1/no1/specification-of-seismograms-the-location-identifier/
From this it seems that location id was intended to be exactly 2
characters, not zero or two. My feeling is that we have a long
tradition of the location id being "space-space" and not null or
empty. Personally I really dislike space-space, but the only thing I
dislike more than space-space is empty.
Philip
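The character rule quoted from the newsletter can be written down as a short check. This is only a sketch of that one quoted rule, [space, 0-9, A-Z] in each of exactly two positions; the names LOCATION_ID and is_valid_location_id are made up for illustration, and the sketch does not claim to reflect what SEED or the FDSN ultimately require.

```python
import re

# Exactly two characters, each a space, a digit, or an upper-case letter,
# following the rule quoted from the newsletter above.
LOCATION_ID = re.compile(r"[ 0-9A-Z]{2}")


def is_valid_location_id(value):
    """True when value matches the two-character rule quoted above."""
    return LOCATION_ID.fullmatch(value) is not None


for candidate in ["  ", "00", "A1", "", "--"]:
    print(repr(candidate), is_valid_location_id(candidate))
```

Under this reading, space-space passes while both the empty string and "--" fail, which is why either of those would need an explicit rule of its own.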
On Thu, Jul 31, 2014 at 7:18 AM, Philip Crotwell <crotwell<at>seis.sc.edu>
wrote:
Hi
Just another data point, Earthworm, which is widely used by regional
networks globally, has long had the "dash dash is the same as space
space" convention. So dash dash is not something pulled out of thin
air, it is how at least I do things already.
And this shows that it is fairly common (if not technically correct)
for users to regard space-space as the location id instead of
regarding it as null with 2 spaces for padding. My guess is that very
few users are aware of this, and even as someone who has been writing
seismic software for a couple of decades I still think of the location
id as space-space, not null.
http://www.isti2.com/ew/PROGRAMMER/location_codes/EW_Loc_policy.txt
Philip
On Thu, Jul 31, 2014 at 6:18 AM, Chad Trabant <chad<at>iris.washington.edu> wrote:
Thanks Philip, I think you have outlined the issues well.
Regarding issue #1, I strongly feel that we need to choose one
representation, the sooner we stop creating incompatible metadata the
better.
Regarding issue #2:
a) empty=""
This is possibly the most straightforward mapping of SEED information,
leaves us with an empty string identifier.
b) two spaces="  "
This is what IRIS currently does, not strictly SEED but avoids empty
identifiers.
c) two dashes="--".
This would require work and continued mapping, the mapping is clear
SEED-based holdings and StationXML. SEED headers and data records could
also be considered, but is a bigger can of worms.
Below are a few of the issues we note regarding empty identifiers
1) They are too similar to "unknown" (which results in potential
where channels are only differentiated by location ID):
a) In many languages an empty string evaluates to false; if, for
when program is testing for and then extracting a value from an XML
parsed into a structure/object it could appear as if the value was not
present. Of course the coding in probably every language can be done to
avoid such a false negative, but it is a pitfall that we would be
future users and coders to know about.
b) In XPath (the query language for XSLT), which is used to search or
translate XML, the matching of a string attribute usually uses the
function. Specifying the string attribute to match when the attribute
value is straightforward, when trying to match the empty string the
for NOT string. In the boolean functions of XPath "a string is true if
and only if its length is non-zero"
(http://www.w3.org/TR/xpath/#function-boolean). So in XPath, hardly a
fringe technology, an empty string is not just another kind of string
anomaly.
c) In JavaScript the getAttribute() method returns the same value
the attribute was an empty string or unspecified. The method is no
recommended but illustrates that such thinking is not limited to niche
projects.
2) Organizing data in structures such as a nested hash is pretty common:
%{net}{sta}{loc}{chan} = "some lvalue" (sorry for the Perl). The empty
identifier as a key works in some languages but it is obtuse and
I'm sure there are many other data structures that would use location by
itself as a key.
3) Empty identifiers are difficult to specify on the command line, URLs,
etc. and non-obvious many other places such as GUI fields. We have
addressed this issue for FDSN web services (at the DMC for other
as well) by making "--" a synonym for the empty location ID. In other
we are already mapping "--" into the empty location ID for requests and
users are learning this association. A further adoption of the synonym
the metadata would solve many of these problems.
4) While it is certainly not the FDSN's task to define data formats
of its purview, the adoption or matching of the core channel naming
in other formats is certainly in the FDSN's best interest. This has
happening for a long time already (ISF/IASPEI, GSE, etc.). The
empty (optional?) location ID could make such adoption harder as it is
wrinkle, especially for space delimited formats. I believe these
implications deserve some consideration.
I'm sure most developers could come up with solutions to the technical
problems, but an empty identifier leaves the unfortunate wrinkles for
future users and coders.
Here is an example of someone that was confused by current metadata,
bet if there was a value in the locationCode it would have been easier:
There is a chance we will end up with the empty location identifier,
considerations should go beyond an assumption that an empty string is
only choice.
Since an empty location field in SEED essentially means unset, perhaps
should consider making the locationCode attribute optional and leaving
out of the XML when it is empty in SEED. In this line of thinking, the
empty string is just a hack to include a required attribute when in fact
there is nothing to include. For me the "unset" aspect is unsettlingly
similar to "unknown", but it's an idea preferred by at least one
the DMC.
Chad
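Points 1a and 2 above can be illustrated with a short Python sketch. The attribute dictionaries and the function names describe_naive and describe_careful are hypothetical, chosen only for this example; the point is the pitfall itself, not any particular client code.

```python
def describe_naive(attributes):
    """Truthiness test: cannot tell an empty value from a missing one."""
    loc = attributes.get("locationCode")
    if not loc:  # true for both "" and None
        return "missing"
    return "present"


def describe_careful(attributes):
    """Explicit None test: an empty string still counts as present."""
    loc = attributes.get("locationCode")
    if loc is None:
        return "missing"
    return "present"


empty_code = {"locationCode": "", "code": "BHZ"}
no_code = {"code": "BHZ"}

print(describe_naive(empty_code), describe_naive(no_code))
print(describe_careful(empty_code), describe_careful(no_code))

# Point 2: an empty string does work as a dictionary key in Python,
# although it is easy to overlook when reading the structure.
inventory = {"GE": {"UGM": {"": {"BHZ": "some value"}}}}
print(inventory["GE"]["UGM"][""]["BHZ"])
```

The naive version reports both inputs as missing, which is the false negative described in 1a; the careful version avoids it, but only because the author knew to look for it.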
On Jul 28, 2014, at 6:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
Hi
Being on the cheap side of the Atlantic, I'll save us $0.00068 and
make a stab at the underlying issue. :)
Here, with lots of stuff cut out, is how a channel is "identified" in
stationXML via the fdsn station web service at the IRIS DMC,
<Network code="GE" >
<Station code="UGM">
<Channel locationCode="  " code="BHZ">
Another implementation of the same web service (not sure of url) gives
back this:
<Network code="GE" >
<Station code="UGM">
<Channel locationCode="" code="BHZ">
with locationCode="" vs ="  " being the difference under consideration.
There are two basic issues being discussed (and yes, more beer would
:)
1) Should all valid stationXML documents be required to use the exact
same string of characters to represent the location id for this
channel? This would allow a comparison operation to be "simple" in
that it can compare the attribute values without additional
processing.
2) If we agree to 1), then what should those exact characters be? The
current top choices are
a) empty=""
b) two spaces="  "
c) two dashes="--".
1) seems less controversial than 2) in that greater compatibility is
generally seen as positive.
This is primarily a question about the form of the stationXML
documents, but obviously there are connections to the way requests are
formed, the relationship to miniseed/seed, the way things are coded in
software and how much detailed understanding we expect of end users.
Philip
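Issue 1) can be sketched in Python using the two fragments above, closed off so that they are well-formed. The set EMPTY_SYNONYMS, the choice of "" as the canonical form, and the function names are assumptions made for this illustration only; they are not a proposal for which representation should win. Real StationXML also carries an XML namespace, which is left out here for brevity.

```python
import xml.etree.ElementTree as ET

# Cut-down versions of the two fragments above, closed so they parse.
DOC_TWO_SPACES = ('<Network code="GE"><Station code="UGM">'
                  '<Channel locationCode="  " code="BHZ"/></Station></Network>')
DOC_EMPTY = ('<Network code="GE"><Station code="UGM">'
             '<Channel locationCode="" code="BHZ"/></Station></Network>')

# Assumption for this sketch: all three candidates denote "no location ID".
EMPTY_SYNONYMS = {"", "  ", "--"}


def canonical_location(code):
    """Map every empty-location synonym to one arbitrary canonical form."""
    return "" if code in EMPTY_SYNONYMS else code


def channel_ids(xml_text, normalize=True):
    """Collect (network, station, location, channel) tuples from a fragment."""
    network = ET.fromstring(xml_text)
    ids = set()
    for station in network.iter("Station"):
        for channel in station.iter("Channel"):
            loc = channel.get("locationCode")
            if normalize:
                loc = canonical_location(loc)
            ids.add((network.get("code"), station.get("code"),
                     loc, channel.get("code")))
    return ids


# Without extra processing the same SEED channel looks like two channels.
print(channel_ids(DOC_TWO_SPACES, normalize=False)
      == channel_ids(DOC_EMPTY, normalize=False))
# With the synonym mapping applied the identifiers agree.
print(channel_ids(DOC_TWO_SPACES) == channel_ids(DOC_EMPTY))
```

The extra mapping step is exactly the "additional processing" that a single agreed representation would make unnecessary.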
On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:
Hello all,
Can someone give a concise statement of the original problem being
discussed? Is it only or primarily a concern about XML?
It seems to me that with modern languages a string that is empty or has
spaces is the same thing - there are often implicit or explicit trim()
functions hiding in a processing pipeline. A null string is not the same.
So an empty or blank string is the same, valid location code, and null is an
undefined or uninitialized location code.
With regards to the "--" pseudo for the location code, is this not
because sometimes it is not possible or difficult to represent an empty
string or even a string? For example on the command line or in a
URI? (Or a URI on the command line!) So it may be that the use of
intermediate processing and requests could be tolerated and somehow
official, while empty or only-blanks strings official and for persistent
data.
Just my 0.02 EURO = $0.0268
Best regards to all,
Anthony
-
Hi Philip,
Philip Crotwell wrote on 07/31/2014 02:59 PM:
http://www.iris.edu/ds/newsletter/vol1/no1/specification-of-seismograms-the-location-identifier/
From this it seems that location id was intended to be exactly 2
characters, not zero or two. My feeling is that we have a long
tradition of the location id being "space-space" and not null or
empty. Personally I really dislike space-space, but the only thing I
dislike more than space-space is empty.

Now we have the above IRIS newsletter article vs. the FDSN standard.
Which one should be considered authoritative?

Philip Crotwell wrote on 07/31/2014 01:18 PM:
Just another data point, Earthworm, which is widely used by regional
networks globally, has long had the "dash dash is the same as space
space" convention. So dash dash is not something pulled out of thin
air, it is how at least I do things already.

OK, at least we know now where Chad and you got that idea from. ;)

And this shows that it is fairly common (if not technically correct)
for users to regard space-space as the location id instead of
regarding it as null with 2 spaces for padding. My guess is that very
few users are aware of this, and even as someone who has been writing
seismic software for a couple of decades I still think of the location
id as space-space, not null.
http://www.isti2.com/ew/PROGRAMMER/location_codes/EW_Loc_policy.txt

If you say Earthworm I say SAC. The location code in SAC is trimmed just
like in other software already mentioned. Does that convince you? Of
course not, as we are here discussing neither Earthworm nor SAC channel
naming convention.

This discussion is about FDSN standard channel naming. Obviously neither
Earthworm nor SAC count. Both use their own formats and within their
respective ecosystems they can of course represent the location code in
whatever way is considered appropriate, as long as the export to FDSN
formats is done properly.
Cheers
Joachim
-
Hi Chad
Chad Trabant wrote on 27.07.2014 04:52:
The answer that empty strings are technically possible and it all
works in Python/SeisComP is less than satisfying. The observations
from Python, ObsPy and SeisComP are a few of many that need to be
taken into account.

Please name a few. Not abstract claims or hearsay. Point us to client
code that cannot parse an empty location code; only then someone can
take a closer look at the matter and quite possibly provide help.

Yes, we would need to treat empty location IDs and "--" as synonyms
for a very long time. Empty strings in XML mean you will need to map
empty IDs to empty strings, NULL and whatever an XML parser might or
might not produce for a long time as well (think beyond Python and
SeisComP). Either is possible, only one of them is a unique
mapping.

I don't accept the parser issues unless you provide examples; see above.

In general mappings are not the problem and are widely used anyway. Can
you name a single software that when reading (Mini)SEED does *not* map
the location code from "  " to ""? Even libmseed does!

So why not be consistent and do the same when parsing XML? It would
solve the current issues. You can then keep your two spaces as long as
you like. ;)

If the main considerations are for the least amount of disruption then
the answer is obvious to me: the FDSN can sanction that the two-space
string is the XML synonym for the empty SEED location ID and we
adjust the schema to make sure a string of whitespaces is preserved.
Then SeisComP can change its relatively new StationXML implementation
and ALL existing clients will be compatible with all metadata and,
most importantly, we would have consistent metadata.

Chad, this whole discussion started back in early January with your
complaint about the SeisComP fdsnws server implementation. You were
alleging that 'The resulting StationXML includes empty location IDs
(locationCode=“”), this is not allowed in SEED and therefore not allowed
in StationXML.' If the SeisComP server were indeed producing wrong XML
it would have been corrected long ago. But that's not the case! It's
actually SeisComP that produces the more correct FDSN StationXML
compared to IRIS XML, not only w.r.t. locationCode.

Don't you think it is now time to roll up the sleeves and make your
client codes work with standard compliant FDSN StationXML rather than
doctoring an FDSN standard?

If the empty string ID representation is adopted it would, in
effect, mean that the DMC would need to change its metadata service
and (more importantly) all users of the DMC's metadata service would
need to transition to a new metadata channel naming scheme. This is
certainly not out of the question, but it is not something we would
do without careful consideration. I do not find the two-space
strings all that great, but they are here and something the DMC and
users of the DMC have dealt with. Issues have been identified with
empty location IDs by us and our users. If DMC is going to change,
and push the change on all users of the DMC's StationXML, it would be
much more compelling to have a solution that addresses the low level
issues.

Did you read my email of Thursday, 18:43 UTC? Following the ideas I
outlined there, you are technically *not* required to change any of your
servers. Only a few client codes are actually affected and even I was
able to make the changes in one of those in 10 minutes. Of course, in
total it will take longer, but if specific problematic cases related to
parsing are identified and discussed, I am sure solutions can be found
quickly. We have this list, we have skilled and enthusiastic people
working on this, so why not use this as a platform even for more
technical discussions? Or how about creating a "developer's corner"
webservices-devel or so?

Cheers
Joachim
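The mapping described above, where the two-character SEED field is space padded on writing and trimmed on reading, can be sketched as follows. This is a hedged illustration with made-up helper names pack_location and unpack_location; it is not the code of libmseed or of any other package mentioned in this thread.

```python
def pack_location(code):
    """Left-justify a location code and pad it to the 2-character field."""
    if len(code) > 2:
        raise ValueError("location code is longer than two characters")
    return code.ljust(2)


def unpack_location(field):
    """Trim the padding spaces from a 2-character location field."""
    return field.rstrip(" ")


# An empty location ID becomes two spaces in the fixed-width field ...
print(repr(pack_location("")))
# ... and two spaces read back as the empty string.
print(repr(unpack_location("  ")))
# Non-empty codes survive the round trip unchanged.
print(repr(unpack_location(pack_location("00"))))
```

Applying the same trim to a locationCode attribute read from XML is the consistency suggested above; whether the XML itself should carry two spaces or the empty string is the open question.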
-
On Jul 28, 2014, at 4:51 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
Hi Chad
OK, here are a few: IRIS-WS, IRIS Fetch scripts, irisFetch.m, JWEED and probably: SeisFile, EMERALD, Epicentral and all the other codes that users of the DMC have created to read the metadata we send them.
Chad Trabant wrote on 27.07.2014 04:52:
The answer that empty strings are technically possible and it all
Please name a few. Not abstract claims or hearsay. Point us to client
works in Python/SeisComP is less than satisfying. The observations
from Python, ObsPy and SeisComP are a few of many that need to be
taken into account.
code that cannot parse an empty location code; only then someone can
take a closer look at the matter and quite possibly provide help.
The statement that observations from Python, ObsPy and SeisComP alone are insufficient evidence for key changes to FDSN formats is not an abstract claim or hearsay, it is rather obvious since they are not the only (or even majority) systems handling these formats.
Yes, we would need to treat empty location IDs and "--" as synonyms
I don't accept the parser issues unless you provide examples; see above.
for a very long time. Empty strings in XML mean you will need to map
empty IDs to empty strings, NULL and whatever an XML parser might or
might not produce for a long time as well (think beyond Python and
SeisComP). Either is possible, only one of them is a unique
mapping.
In general mappings are not the problem and are widely used anyway. Can
you name a single software that when reading (Mini)SEED does *not* map
the location code from " " to ""? Even libmseed does!
Yes, collapsing the spaces is very common and in fact how SEED specifies that it be done, no one is arguing this that I have read.
So why not be consistent and do the same when parsing XML? It would
Yes, it totally makes sense to keep the same thing going in XML, except that there have been some issues identified in both SEED and XML and this is an opportunity to begin addressing the low level issue. In essence, the empty string solution is not ideal, even if it is the most appropriate mapping given the current rules. More on this later.
solve the current issues. You can then keep your two spaces as long as
you like. ;)
If the main considerations are for the least amount of disruption the
Chad, this whole discussion started back in early January with your
the answer is obvious to me: the FDSN can sanction that the two-space
string is the XML synonym for the empty SEED location ID and we
adjust the schema to make sure a string of whitespaces is preserved.
Then SeisComP can change its relatively new StationXML implementation
and ALL existing clients will be compatible with all metadata and,
mostly importantly, we would have consistent metadata.
complaint about the SeisComP fdsnws server implementation. You were
alleging that 'The resulting StationXML includes empty location IDs
(locationCode=“”), this is not allowed in SEED and therefore not allowed
in StationXML.'
The point I was making is that the least number of users would be effected if the FDSN decided to require two characters and allow spaces. I say this because I believe most of the users of StationXML get their metadata from the DMC at the moment and have already dealt with the metadata in some way.
If the SeisComP server were indeed producing wrong XML
This statement is heavy on hubris and naivety.
it would have been corrected long ago. But that's not the case! It's
actually SeisComP that produces the more correct FDSN StationXML
compared to IRIS XML, not only w.r.t. locationCode.
There in no easy way to determine if any given StationXML document is fully "correct". The schema does not have enough information to vet the contents of a StationXML document, it basically checks to make sure the layout is correct, so XML schema validity is not sufficient for "correct". Currently, the StationXML contents are supposed to follow the guidelines defined in SEED. I think many of us agree that we should work to put as many of the content rules as possible into future versions of the schema to clarify many of the gray areas of StationXML. The concept of "more correct" is qualitative when used generally and is rarely or never more important than "compatible with the consensus”.
Such gray areas exist even within SEED. Within the FDSN here is how we have traditionally dealt with the gray areas: when implementing a piece of software to produce something already in production at another center(s) you usually use the other(s) as a reference (or collaborate with then). If important differences are found they are brought up and discussed civilly and a plan is made to make things compatible, usually with user impact being a high priority. Unfortunately, this is not how this current situation unfolded and we are left with incompatible metadata.
Don't you think it is now time to roll up the sleeves and make your
client codes work with standard compliant FDSN StationXML rather than
doctoring an FDSN standard?
You do not unilaterally decide what compliant FDSN StationXML is. As you well know I have made a proposal to the FDSN and asked for clarity on this issue, seems worth knowing where we are going.
If the empty string ID representation is adopted it would, in
effect, mean that the DMC would need to change its metadata service
and (more importantly) all users of the DMC's metadata service would
need to transition to a new metadata channel naming scheme. This is
certainly not out of the question, but it is not something we would
do without careful consideration. I do not find the two-space
strings all that great, but they are here and something the DMC and
users of the DMC have dealt with. Issues have been identified with
empty location IDs by us and our users. If DMC is going to change,
and push the change on all users of the DMC's StationXML, it would be
much more compelling to have a solution that addresses the low level
issues.
Did you read my email of Thursday, 18:43 UTC? Following the ideas I
outlined there, you are technically *not* required to change any of your
servers. Only a few client codes are actually affected and even I was
able to make the changes in one of those in 10 minutes.
Of course, in total it will take longer, but if specific problematic cases related to
parsing are identified and discussed, I am sure solutions can be found
quickly. We have this list, we have skilled and enthusiastic people
working on this, so why not use this as a platform even for more
technical discussions? Or how about creating a "developer's corner"
webservices-devel or so?
Thanks for the suggestions. Technical discussions in other sub-threads.
Chad
Cheers
Joachim
_______________________________________________
webservices mailing list
webservices<at>iris.washington.edu
http://www.iris.washington.edu/mailman/listinfo/webservices
-
-
-
-
Philip Crotwell [07/25/14 15:35]:
It sounds like you are saying "change is hard, so we shouldn't do it".
That depends very much on the kind of change I would say. The change
that is currently being discussed is a hack that might help XML parser
developers, with hefty repercussions otherwise.
If that is the change, it indeed shouldn't be done.
What I would highly welcome and support is a mature, future-proof
channel naming concept (involving network codes, too!) with a clear
implementation roadmap. There have been attempts in this direction, led
by the USGS and the ISC, but they are not reflected in current FDSN
StationXML.
Cheers
Joachim
-
-
-
-
Hi Yazan,
(passing along our in-person conversation for the list)
I do not think allowing a null or optional location ID is a good idea. Here is why: in SEED there is always a location ID (the two-byte field cannot be left out), so it is always known; when it is empty it is still a specific location ID. Allowing an optional location ID in XML leaves the translation from StationXML to SEED a bit ambiguous. The spec would have to clarify that "not present" always means the empty location ID in SEED, and I find this translation not nearly as clear and obvious as having a real value present.
As you say, many parsers will have problems with "" or " ". It should not be up to every reader (e.g. converters) of StationXML to properly interpret the multiple possible results coming out of any parser; the formats should have a unique and unambiguous mapping.
Chad
On Jul 24, 2014, at 9:29 AM, Yazan Suleiman <yazan.suleiman<at>gmail.com> wrote:
Is modifying the StationXML schema (to allow a null location, required=false) a possibility? Example:
<Channel startDate="1992-09-23T00:00:00" restrictedStatus="open" endDate="1994-04-01T00:00:00" code="BHE">
vs
<Channel locationCode=" " startDate="1992-09-23T00:00:00" restrictedStatus="open" endDate="1994-04-01T00:00:00" code="BHE">
vs
<Channel locationCode="--" startDate="1992-09-23T00:00:00" restrictedStatus="open" endDate="1994-04-01T00:00:00" code="BHE">
It is very reasonable to have a null value for location in any object representation of the station schema. " " or "" is inaccurate and only introduces more trouble and complexity.
If changing the schema is not an option, then " " or "" is a very bad idea. Many parsers treat "" or " " as empty and will ignore them. If translating this into SEED is the issue, then it is the converter's responsibility to take care of the conversion.
Yazan
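To see what a parser actually hands back for the variants above, here is a minimal sketch using Python's standard xml.etree.ElementTree. The Channel fragments are trimmed-down, self-closed stand-ins for the examples (an assumption made for illustration; real StationXML carries more attributes and child elements), an explicit empty-string variant is added, and the helper name read_location is hypothetical. ElementTree does not validate against the XSD, so a missing attribute is simply reported as absent.

```python
import xml.etree.ElementTree as ET

# Trimmed-down Channel fragments (illustrative only, not full StationXML).
VARIANTS = {
    "absent": '<Channel code="BHE"/>',
    "empty": '<Channel locationCode="" code="BHE"/>',
    "two spaces": '<Channel locationCode="  " code="BHE"/>',
    "dashes": '<Channel locationCode="--" code="BHE"/>',
}


def read_location(xml_text):
    """Return the raw locationCode value, or None if the attribute is absent."""
    return ET.fromstring(xml_text).get("locationCode")


for name, xml_text in VARIANTS.items():
    # repr() makes empty and whitespace-only strings visible.
    print(f"{name:>10}: {read_location(xml_text)!r}")
```

With this parser the four variants come back as four distinct values, so any folding of them into a single meaning is left to the code that reads the document.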
-
-
Hi Chad/Philip,
thanks for reviving this discussion on the appropriate mailing list.
Chad Trabant [07/23/14 19:30]:
Some background: In the SEED channel naming scheme there is a
hierarchy of network, station, location and channel identifiers. Of
these, it is only the location ID that is commonly accepted to be
empty. In the SEED format the location ID is a two-character field,
where the value is left justified and padded with spaces if needed.
When the value is empty the field is simply two spaces of padding.
Historically, and presumably to avoid having an empty location ID,
the DMC has represented “empty” location IDs as a string of two
spaces.
Note that the padding spaces do not form part of the location code
string itself, according to the SEED specification, which only allows
alphanumeric characters.
Actually the location code is treated in the SEED specification not
differently than e.g. a station code, from which trailing spaces are
removed in every software that I know of.
BTW, I think the two spaces are not there to avoid having an empty
location ID, but are a relict from Fortran 77 days. :)
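The padding rule described here can be sketched in a few lines of Python. The field widths are those of the SEED fixed-width identifier fields (network 2, station 5, location 2, channel 3); the function names are hypothetical and chosen only for this illustration.

```python
# Widths, in characters, of the fixed-width SEED identifier fields.
SEED_WIDTHS = {"network": 2, "station": 5, "location": 2, "channel": 3}


def to_seed_field(value, kind):
    """Left-justify a trimmed code and pad it with spaces to the field width."""
    width = SEED_WIDTHS[kind]
    if len(value) > width:
        raise ValueError(f"{kind} code {value!r} exceeds {width} characters")
    return value.ljust(width)


def from_seed_field(field):
    """Strip the trailing padding spaces to recover the code itself."""
    return field.rstrip(" ")


print(repr(to_seed_field("", "location")))    # '  '
print(repr(to_seed_field("KIP", "station")))  # 'KIP  '
print(repr(from_seed_field("  ")))            # ''
```

Under this reading the location code is handled exactly like the station code: the padded form exists only inside the fixed-width record, and trimming it yields an empty string.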
Following this practice, we express this in StationXML by setting
the locationCode attribute to a string of two spaces. We have done
this so long we sometimes forget that it is not compliant with a
strict reading of SEED, at best it falls into the vagaries of SEED,
on the other hand we have been doing it for years with no apparent
problems (in fact it has helpfully avoided an empty core
identifier).
On the other hand, even in the IRIS ecosystem the empty location code is
prominently used as empty string. Not everywhere, but e.g. the
well-known rdseed program removes the trailing spaces when reading SEED,
resulting in an empty C string if there are two padding spaces in the
location code field. A very natural way of dealing with the trailing
spaces, especially in view of the clear specifications in the SEED
manual. Also in the IRIS BUD file name convention (e.g. [1]), empty
location codes become empty strings, with no apparent problems with
mapping or otherwise.
There now exists another fdsnws-station implementation that returns
StationXML with the locationCode attribute set to an empty string
when the SEED value is empty. The justification is that this
follows the SEED rules of trimming the padding spaces from the
values.
Unfortunately this means there are now flavors of StationXML that
are incompatible in the core channel name identifiers. In other
words, two StationXML documents for the same SEED channel appear,
without extra field translation, to be different channels.
This depends on how you evaluate the location code. If you simply follow
the SEED specification and always trim the location code, like e.g.
ObsPy and rdseed do, the problem you describe is avoided altogether.
Of course, the requirement for removing trailing white space doesn't
come without the cost of a few more CPU cycles. But if that were an
issue we wouldn't be using XML, would we? Also, this rule would need to
be written into the future specification of FDSN StationXML.
As most of you are users of SEED and StationXML metadata (at some
level) and some of you have written code to parse these formats and
manage the data returned by the DMC and other FDSN data centers, we
are asking for your input regarding the potential solutions.
Here are the options being considered for mapping an empty location
ID in SEED to StationXML:
1) Set locationCode to two spaces. While the DMC and users have
been using this for a long while, it is not precisely the SEED value
(but the mapping could be formalized). Also, whitespace in
attributes does have some theoretical challenges: the wonky rules for
XML attributes related to whitespace handling require removal of
spaces in some cases (we have never heard of problems though).
2) Set locationCode to an empty string. This would match the strict
value present in SEED, an empty identifier.
And would be easy to keep compatible with the two spaces.
This representation has also been widely used for a long time already, incl.
at IRIS (see above).
3) Set locationCode to “--” (two dashes). This avoids issues with
whitespace in XML attribute values and avoids issues with an empty
identifier. Also, this matches the request mechanisms where “--” is
accepted as a synonym for an empty location ID.
Let's not mix request mechanisms with the data format. Data formats are
a holy grail whereas request mechanisms change more frequently.
Suppose we could retrieve full SEED using the web services. Even then it
would be equally appropriate to use "--" on the request side. But there
is no justification for breaking data format compatibility just for
matching particular request mechanisms.
All of these solutions are viable in that we can make them work in
code, it is a matter of choosing one for future FDSN metadata, pick
your poison so to speak.
In my personal opinion, an empty location ID is an unfortunate quirk
of SEED that we should rectify in StationXML. An empty identifier
can be confused for “unknown” if the programmer is not careful,
which is semantically very different than “set to empty”. The
two-space strings that the DMC is currently using are also not ideal,
they are hard for humans to read and potentially weird with XML
rules. The dashed location ID avoids these issues but requires the
most change. I also think requiring all readers of StationXML to
translate (e.g. remove padding) is a bad idea, the values in SEED
should be uniquely mapped to values in StationXML.
I share your view that the empty location code is not optimal. However,
the world is not perfect and the empty location code is a fact we have
to live with and have been able to live with for decades. Seismologists
have learned how to handle it. Existing software libraries make the
empty location code as painless as possible. Technically it is a non-issue.
The solution to the empty location code is not to incompatibly break a
data format without a technical reason but only because of aesthetics.
Empty strings are represented in XML without problems, particularly if
used in XML attributes. In fact, it is an advantage of a modern XML
format that we don't need the padding spaces etc. any more.
Philip Crotwell [07/23/14 20:37]:
Years ago we had full SEED. Then because of keeping metadata updated,
we switched to a separation into dataless SEED + miniseed. Now,
because of the complexities and limitations of dataless SEED, the
future looks like StationXML + miniseed. I am all for this change,
but how the location id is resolved really needs to address not just
what do we do in StationXML, but what do we do in StationXML +
miniseed.
I also lean towards "--" for the simple reason that there are so many
instances where I have been bitten by spaces or nulls. Even though I
know about this, I still get caught. File names, urls, user gui
displays, etc all have problems with spaces or nulls and as a
practical matter it is harder to see something that isn't there than
something that is there. Furthermore, using null or space-space is
really hard as a command line argument in the shell. That said, "--"
already means "long option name" in many *nix programs, so if we
were starting from scratch, underscores like "__" might be a better
choice. The SEED manual already lists underscore as a separate item
in the flags section (p32), so maybe worth considering.
In all of the above cases it is the interfaces that have to deal with
the empty location code. I agree that an empty string is not always easy
to visualize, but we know how to deal with it. Nothing prevents us from
using "--" or "__" in GUIs or external formats or input to the fdsnws's.
I myself use "__" e.g. in pick lists for ease of visualization,
awk/grep'ing etc.; but that has nothing to do with the XML or SEED
representation. The same is true for the request formats; as long as the
user knows how to explicitly specify an empty location code, it's fine.
But if option 3 is chosen, would there be any possibility of
amending the SEED spec so that "--" is actually valid within the
location id field, with the caveat that it is synonymous with
space-space/null, but "--" is the preferred value?
This would mean that GE.UGM.--.BHZ and GE.UGM..BHZ are equivalent, in
fact: identical stream IDs. Technically this is feasible. But are the
downstream software repercussions, let alone the confusion among the
data users a price we are willing to pay? I don't think so.
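As an illustration of what treating "--" as a synonym would demand from downstream code, here is a minimal Python sketch. The helper names are hypothetical; the dotted NET.STA.LOC.CHA form follows the stream IDs quoted above.

```python
def normalize_request_location(loc):
    """Map the alias "--" to the empty location code; leave other codes alone."""
    return "" if loc == "--" else loc


def stream_id(net, sta, loc, cha):
    """Join the four codes into a dotted NET.STA.LOC.CHA identifier."""
    return ".".join([net, sta, normalize_request_location(loc), cha])


print(stream_id("GE", "UGM", "--", "BHZ"))  # GE.UGM..BHZ
print(stream_id("GE", "UGM", "", "BHZ"))    # GE.UGM..BHZ
```

Every place that compares, stores, or displays stream IDs would need an equivalent step if both spellings were allowed in the data itself.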
I realize that doing a global search and replace on a petabyte of
miniseed data is probably not going to happen, but it would be
really nice if whatever location id is in StationXML, it is exactly
2 characters and is the exact same 2 characters as in miniseed.
On the other hand, the use of XML is a chance to get rid of the fixed
field values with padding. This may not be relevant today, but it might
become so in the future.
Frankly the whole idea of making location ids "optional" was a real
mistake IMHO. I am sure that anyone that has ever written code to
deal with location ids has something that looks like: if (locid ==
null or locid == "" or locid == " " or locid == "--") then locid =
"--" which is just a painfully stupid thing to have to do over and
over and over again. Grumble grumble grumble.:(
But fortunately you do that only once and wrapping this into a library
function is a no-brainer.
On a side note, I am curious to know (technically) under what
circumstances locid==null would evaluate to true, considering
<xs:attribute name="locationCode" type="xs:string" use="required"/>
from the xsd[2].
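For illustration, the check quoted above could be wrapped once in a helper along the following lines. This is only a sketch in Python: the function name and the `empty` parameter are assumptions, and the caller chooses which representation the variants are folded into, since that choice is exactly what is under discussion.

```python
def canonical_location(locid, empty):
    """Fold None, "", space-only, and "--" into one caller-chosen value.

    `empty` is whatever the calling code uses for the empty location
    code, for example "" or "--". Non-empty codes are returned with
    trailing padding spaces removed.
    """
    if locid is None:
        return empty
    trimmed = locid.rstrip(" ")
    if trimmed == "" or trimmed == "--":
        return empty
    return trimmed


for raw in (None, "", "  ", "--", "00"):
    print(repr(raw), "->", repr(canonical_location(raw, empty="")))
```

The None branch would presumably only be reached by code that reads documents without schema validation, or that builds its objects from sources other than StationXML.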
Lastly, as far as I can tell the SEED spec doesn't disallow
null/empty station or channel codes, so addressing that at the same
time might be wise.
I haven't come across any of those but there it makes sense. Yet I don't
think we can or should prevent empty location codes. They are a very
common reality.
My $0.02, please pick one string, and only one string, and use it
everywhere.
If "only one string" is a requirement, it is probably the strongest
argument against a change.
"Only one string" will only work without deviation from the current use
of SEED location code. We can't recode the archives, let alone the local
archives users have built for their work over the years. Well,
technically it could be done, but I think we all agree that we don't
want to, as this would have to involve not only (Mini)SEED waveform data
but also meta data and parametric data. How about... QuakeML archives?
Datalogger firmware? We can't change all of that and if we add e.g. "--"
to the range of *possible* location codes, we still have to continue to
"forever" support the other representations in order to be backward
compatible.
Generally speaking, it is good to discuss future possibilities for
channel naming conventions, not only with respect to the location code.
But the naming should ideally be independent of the used data formats.
XML is a big step towards becoming less dependent on the limits imposed
by SEED, but we are not going to get rid of SEED for many years to come.
Actually, what we are currently seeking to solve is a particular
incompatibility between FDSN StationXML produced by different services,
but technically that is much, *much* easier to achieve than the
introduction of a new and incompatible channel naming. I would welcome
an intensified discussion on the latter, but not in the context of the
current FDSN StationXML or web services.
It's actually quite strange that already now, early after the
introduction of FDSN StationXML, we are not only choking over minor
incompatibilities, but are discussing "solutions" to problems that
apparently no one had noticed existed before StationXML... Looks
like shooting at sparrows with cannons, IMO.
There used to be an IASPEI working group on station codes that even came
up with a new channel naming "standard"[3], which, however, doesn't seem
to have gained much acceptance so far. Nevertheless this is the level at
which changes to channel naming need to be discussed, even though the
process may be frustratingly slow. But the impact of such a change is
just too big to be decided ad hoc.
To summarize:
We will not find a future-proof channel naming convention quickly.
Partial changes, especially if incompatible, should be absolutely avoided.
The particular problem we attempted (and still need) to solve in the
first place is a location code incompatibility due to differently strict
adherence to the SEED specification. Not surprisingly I prefer the
empty-string representation for the empty location code. To be
pragmatic, I propose the following time line:
* Accept that, at least for a transitional period, we have to live with
the existence of both space-space and empty location codes.
* During a transitional period, don't change the servers that now
produce space-space location codes, as that would break compatibility
with some clients. We want to keep compatibility rather than introducing
new incompatibility.
* Instead update the clients to accept both space-space and empty
location codes by trimming trailing spaces if present. This is a
relatively minor change and IIRC this is on IRIS's agenda already, which
is highly appreciated.
At this point in time, interoperability is restored, even without
server-side changes. This is important as it may take quite some time
for the users to actually upgrade their clients; but it doesn't hurt anyone.
* Finally the server upgrades where needed. The decision as to when to
upgrade the server side can be made once it is considered appropriate;
there is absolutely no hurry from the client side.
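The client-side step in this proposal can be sketched as follows, assuming a client that identifies channels by a (network, station, location, channel) tuple. The function and variable names are hypothetical and the snippet is written in Python only for illustration.

```python
def channel_key(net, sta, loc, cha, trim=True):
    """Build a comparison key; with trim=True trailing spaces in loc are dropped."""
    if trim:
        loc = loc.rstrip(" ")
    return (net, sta, loc, cha)


# The same SEED channel as reported by a server that emits two spaces
# and by a server that emits an empty string.
from_two_space_server = ("GE", "UGM", "  ", "BHZ")
from_empty_string_server = ("GE", "UGM", "", "BHZ")
reports = (from_two_space_server, from_empty_string_server)

strict = {channel_key(*r, trim=False) for r in reports}
lenient = {channel_key(*r) for r in reports}

print(len(strict))   # 2: the reports look like different channels
print(len(lenient))  # 1: after trimming they are the same channel
```

A client that compares keys this way keeps working whether or not a given server has been upgraded, which is what allows the server-side change to be deferred.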
The needed changes for the above proposal are very small compared to the
huge changes that would be required at every level to implement a new
channel naming convention. This may (and hopefully will) take place some
time in the future, but it requires a lot of preparation and
coordination. I am pretty sure that we will have a considerable number
of beers in the meantime. ;)
Besides the beers, we should focus on finalizing the specification of
FDSN StationXML. There are too many under-defined elements even in the
xsd and the risk of serious incompatibilities is very high.
Cheers
Joachim
[1] http://www.iris.edu/bud_stuff/bud_dir/GE/UGM/UGM.GE..BHZ.2014.205
[2] http://www.fdsn.org/xml/station/fdsn-station-1.0.xsd
[3] http://www.isc.ac.uk/registries/download/IR_implementation.pdf
-
Doug Neuhauser [2014-08-12 17:53:26]:
I've been following this thread, and thought it was time to chime in.
IMHO, the FDSN web services should follow the SEED convention.
The SEED convention states that station, network, channel, and location
are all blank-padded fields of fixed lengths.
To me, this means that we should either use the full blank-padded
fields for ALL of these identifiers, or for none of them.
eg:
<Network code="G " >
<Station code="KIP  ">
<Channel locationCode="  " code="BHZ">
or
<Network code="G" >
<Station code="KIP">
<Channel locationCode="" code="BHZ">
Personally I think the latter (blank trimmed) is better.
I agree that the blank location code is a pain when dealing with
Oracle, white-space delimited fields such as command lines, etc,
but unless we change the SEED convention, I don't see that making
an alias of "-" or "--" in FDSN station XML improves the situation.
AFAIK, the ONLY reason that we struggle with the two-blank issue is
that certain software (eg Oracle) cannot distinguish between the
empty string (string of length 0) and NULL. Therefore, the DMC,
NCEDC, AQMS, etc have been forced to not use a blank-trimmed string
for the location code.
Unless we propose to change the SEED standard, all of our data in
our archives, and all of our current acquisition systems, I think
that we have to live with "empty" location codes.
I have not seen any compelling argument for representing a blank (empty)
location code in FDSN station XML as anything but the empty string.
If you want to have "" and " " be equivalent in FDSN station XML,
you can simply change the schema definition of the field to be a "token"
rather than a "string", in which case any representation with blanks will
be reduced to the empty string. Problem solved?
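The effect of the "token" suggestion can be emulated in a few lines. This Python sketch mimics the whitespace collapsing that XML Schema applies to xs:token values (tabs, line feeds and carriage returns become spaces, runs of spaces are collapsed, leading and trailing spaces are removed); the function name is hypothetical, and a real deployment would rely on a schema-aware parser rather than this helper.

```python
import re

# The four characters XML treats as whitespace.
_XML_SPACE = re.compile(r"[ \t\r\n]+")


def collapse_whitespace(value):
    """Emulate the whitespace collapsing applied to xs:token values."""
    return _XML_SPACE.sub(" ", value).strip(" ")


for raw in ("", " ", "  ", "00", "0 "):
    print(repr(raw), "->", repr(collapse_whitespace(raw)))
```

Note that this normalization happens during schema processing; a client that reads the XML without a schema-aware parser would still see the raw attribute value and would have to trim it itself.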
I note that the NCEDC implementation currently uses 1 blank " "
for empty location code. I have no problem changing this if we can
agree on a convention.
I also note ironically that the TA network run by IRIS is one of the
largest networks in terms of stations, and uses blank location codes.
My 2 cents...
- Doug N
------------------------------------------------------------------------
Doug Neuhauser University of California, Berkeley
doug<at>seismo.berkeley.edu Berkeley Seismological Laboratory
Office: 510-642-0931 215 McCone Hall # 4760
Fax: 510-643-5811 Berkeley, CA 94720-4760
Remote: 530-752-5615 (Wed,Fri)
-
SEED is a fixed width record format, most likely the reason for blank
padded fields. I'd recommend not carrying that over into the XML format.
The primary purpose of the channel code is to be a unique identifier, and
an empty string is distinct from any non-empty value.
Jeremy
On 07/23/2014 10:30 AM, Chad Trabant wrote:
Hello WS users and developers,
A recent discussion between FDSN data centers is centered on
representation of empty location IDs in StationXML, the default
format returned by the fdsnws-station web service. The DMC may be
changing how it represents location ID in XML and text formats based
on these discussions. We are asking for input as any such change will
affect users of our metadata service.
Some background: In the SEED channel naming scheme there is a
hierarchy of network, station, location and channel identifiers. Of
these, it is only the location ID that is commonly accepted to be
empty. In the SEED format the location ID is a two-character field,
where the value is left justified and padded with spaces if needed.
When the value is empty the field is simply two spaces of padding.
Historically, and presumably to avoid having an empty location ID,
the DMC has represented “empty” location IDs as a string of two
spaces. Following this practice, we express this in StationXML by
setting the locationCode attribute to a string of two spaces. We have
done this for so long that we sometimes forget it is not compliant with
a strict reading of SEED; at best it falls into the vagaries of SEED.
On the other hand, we have been doing it for years with no apparent
problems (in fact, it has helpfully avoided an empty core
identifier).
There now exists another fdsnws-station implementation that returns
StationXML with the locationCode attribute set to an empty string
when the SEED value is empty. The justification is that this follows
the SEED rules of trimming the padding spaces from the values.
Unfortunately this means there are now flavors of StationXML that are
incompatible in the core channel name identifiers. In other words,
two StationXML documents for the same SEED channel appear, without
extra field translation, to be different channels.
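To make the incompatibility concrete, here is a small Python sketch. The ChannelId type and the normalized helper are hypothetical names for this example, and the G / KIP / BHZ codes are simply the ones used elsewhere in this thread. It shows the same SEED channel comparing as different under the two flavors, and comparing equal only after the "extra field translation" mentioned above.

```python
from typing import NamedTuple


class ChannelId(NamedTuple):
    """Hypothetical container for the four SEED name components."""
    network: str
    station: str
    location: str
    channel: str


def normalized(cid: ChannelId) -> ChannelId:
    """Strip SEED-style space padding from every component."""
    return ChannelId(*(part.strip(" ") for part in cid))


padded = ChannelId("G", "KIP", "  ", "BHZ")   # two-space location code
trimmed = ChannelId("G", "KIP", "", "BHZ")    # empty-string location code

print(padded == trimmed)                          # False
print(normalized(padded) == normalized(trimmed))  # True
```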
As most of you are users of SEED and StationXML metadata (at some
level) and some of you have written code to parse these formats and
manage the data returned by the DMC and other FDSN data centers, we
are asking for your input regarding the potential solutions.
Here are the options being considered for mapping an empty location
ID in SEED to StationXML:
1) Set locationCode to two spaces. While the DMC and users have been
using this for a long while, it is not precisely the SEED value (but
the mapping could be formalized). Also, whitespace in attributes does
have some theoretical challenges: the wonky rules for XML attributes
related to whitespace handling require removal of spaces in some
cases (we have never heard of problems though).
2) Set locationCode to an empty string. This would match the strict
value present in SEED, an empty identifier.
3) Set locationCode to “--” (two dashes). This avoids issues with
whitespace in XML attribute values and avoids issues with an empty
identifier. Also, this matches the request mechanisms where “--” is
accepted as a synonym for an empty location ID.
All of these solutions are viable in that we can make them work in
code; it is a matter of choosing one for future FDSN metadata. Pick
your poison, so to speak.
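The three options can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and the option numbering as a parameter are invented here, and the reverse mapping simply treats every blank form (and "--") as the empty SEED value.

```python
# Representation of an empty SEED location ID under each option above.
EMPTY_FORMS = {1: "  ", 2: "", 3: "--"}


def seed_location_to_xml(seed_field: str, option: int) -> str:
    """Map the 2-character SEED field to a locationCode value.

    Hypothetical helper: `option` is 1, 2 or 3 as numbered in the list.
    """
    if len(seed_field) != 2:
        raise ValueError("SEED location field must be exactly 2 characters")
    code = seed_field.rstrip(" ")          # drop the padding
    return code if code else EMPTY_FORMS[option]


def xml_location_to_seed(value: str) -> str:
    """Map any of the three representations back to the padded SEED field."""
    code = "" if value == "--" else value.strip(" ")
    if len(code) > 2:
        raise ValueError("location code longer than 2 characters")
    return code.ljust(2)                   # left justified, space padded


for option in (1, 2, 3):
    xml_value = seed_location_to_xml("  ", option)
    print(option, repr(xml_value), "->", repr(xml_location_to_seed(xml_value)))
```

Whichever option is chosen, the round trip back to SEED yields the same two-space field; the options differ only in what a StationXML reader sees.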
In my personal opinion, an empty location ID is an unfortunate quirk
of SEED that we should rectify in StationXML. An empty identifier can
be confused for “unknown” if the programmer is not careful, which is
semantically very different than “set to empty”. The two-space
strings that the DMC is currently using are also not ideal: they are
hard for humans to read and potentially weird with XML rules. The
dashed location ID avoids these issues but requires the most change.
I also think requiring all readers of StationXML to translate (e.g.
remove padding) is a bad idea; the values in SEED should be uniquely
mapped to values in StationXML.
Thanks for reading this far. Your opinion and input is appreciated.
regards,
Chad
_______________________________________________
webservices mailing list
webservices<at>iris.washington.edu
http://www.iris.washington.edu/mailman/listinfo/webservices
------------------------------------------------------------------------
Doug Neuhauser University of California, Berkeley
doug<at>seismo.berkeley.edu Berkeley Seismological Laboratory
Office: 510-642-0931 215 McCone Hall # 4760
Fax: 510-643-5811 Berkeley, CA 94720-4760
Remote: 530-752-5615 (Wed,Fri)
-
No argument that the padding spaces should be left behind.
Assuming we are unwilling to actually address the blank location IDs we should consider making the attribute optional. It has been suggested by a few folks already.
One could argue that from a purist's point of view a blank location in SEED is an unset location, so the purist mapping of this is to leave it unset in XML. This would be simply done by changing the schema to make the attribute optional. (This is not my favorite idea, and I've argued against it, but at least it cleanly follows SEED common practice.)
Allowing the value to be optional in SEED (within the limitations of a fixed width format) and required in StationXML is trying to eat your cake and keep it too. We do not force other optional string values of SEED to be required in the XML, so why make an exception for location?
Chad
On Aug 12, 2014, at 12:18 PM, Fee, Jeremy <jmfee<at>usgs.gov> wrote:
SEED is a fixed width record format, most likely the reason for blank padded fields. I'd recommend not carrying that over into the XML format.
The primary purpose of the channel code is to be a unique identifier, and an empty string is distinct from any non-empty value.
Jeremy
-
Hi Chad,
Chad Trabant wrote on 08/12/2014 11:41 PM:
> No argument that the padding spaces should be left behind.
Good.
> Assuming we are unwilling to actually address the blank location IDs we
> should consider making the attribute optional. It has been suggested by
> a few folks already.
The "empty" location code format does need to be addressed, because even
if an attribute is optional, it may be present and it may represent an
empty location code.
Making the location code optional is of course possible. Actually in
QuakeML it is optional, too. There the reason and semantics are special,
though, because in parametric data like picks the location code *may* be
unknown indeed (same for the channel code).
By contrast, in SEED or StationXML the location code is not unknown.
> One could argue that from a purists' point of view a blank location in
> SEED is an unset location, so the purist mapping of this is to leave it
> unset in XML.
A location code that is encoded in SEED as two spaces is not unset but
"empty" and still a string. The purist mapping is therefore the empty
string.
That said we can make the location code optional in XML, but is it
really any more than a cosmetic trick to hide an "ugly", empty
location code in the XML?
Conforming clients need to be prepared to receive an "empty" (not
unset!) value anyway. In other words, none of
<Channel locationCode="" ...
<Channel locationCode="  " ...
<Channel locationCode="--" ...
is forbidden by just making locationCode optional. You can leave it
unset but you don't have to. A parser still has to accept at least ""
and "  " (in fact " ", too, as we have just learned). And since
"explicit is better than implicit" it seems wise not to make the
location code optional. Technically it is feasible, though, provided there
is a clear default value, as otherwise a missing location code might (at
least in principle) be mistaken for unknown, as in QuakeML.
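A tolerant reader along these lines might look like the following Python sketch. The read_location function is a made-up name, the XML fragments are simplified (real StationXML elements carry a namespace and many more attributes), and returning None for a missing attribute is just one possible way to keep "unset" apart from "empty".

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_location(channel: ET.Element) -> Optional[str]:
    """Return the trimmed location code, or None if the attribute is absent.

    Hypothetical helper: "", " ", "  " and "--" are all read as the
    empty location code; a missing attribute stays distinguishable.
    """
    raw = channel.get("locationCode")   # None when the attribute is missing
    if raw is None:
        return None
    if raw == "--":
        return ""
    return raw.strip(" ")


samples = [
    '<Channel locationCode="" code="BHZ"/>',
    '<Channel locationCode=" " code="BHZ"/>',
    '<Channel locationCode="  " code="BHZ"/>',
    '<Channel locationCode="--" code="BHZ"/>',
    '<Channel locationCode="00" code="BHZ"/>',
    '<Channel code="BHZ"/>',
]
for xml in samples:
    print(xml, "->", repr(read_location(ET.fromstring(xml))))
```

The first four fragments yield the empty string, the fifth yields "00", and only the last one, where the attribute is absent, yields None.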
> This would be simply done by changing the schema to make
> the attribute optional. (this is not my favorite idea, I've argued
> against it, but at least it is cleanly follows SEED common practice).
I would prefer to make optional only those fields, for which information
may indeed be unknown, like a digitizer serial number.
> Allowing the value to be optional in SEED (within the limitations of a
> fixed width format) and required in StationXML is trying to eat your
> cake and keep it too. We do not force other optional string values of
> SEED to be required in the XML, so why make an exception for location?
In SEED, the location code is always present even if it is an empty
string. It's not optional.
But even "optional" still implies the option to explicitly specify an
"empty" location code. And the question of how to properly represent
this in XML is still an open one (even though opinions seem to converge
towards the empty string).
Yazan Suleiman wrote on 08/13/2014 06:17 AM:
> In my opinion StationXml shouldn’t force me to provide a value for
> an attribute that is unknown to me.
But empty is not the same as unknown. The "empty" location code in SEED
does carry specific information that needs to be represented in XML,
whether we like the value or not.
> While the purpose of StationXml schema is to map between SEED and
> XML, limitations of Seed shouldn’t be carried over.
+1
Cheers
Joachim
-
Hi Doug,
Thanks for your 2 cents.
Regarding only certain software being the problem with blank location, I guess you did not like any of the others pointed out here?
http://www.iris.washington.edu/pipermail/webservices/2014-July/000583.html
If you want a non-Oracle database example, the 'ltree' data type in Postgres is a natural fit for N.S.L.C hierarchical data and it cannot take a blank identifier either. I do not see how the number of pain points with empty identifiers will not grow over time.
The proposal for a "--" location ID was to change SEED by starting with StationXML as a transition. The first step could be done without changing all the miniSEED in all the archives; the next step could be done with a future revision of miniSEED. This would require mapping, which we are already doing for requests and will continue to do indefinitely. For sure this would be a non-trivial change over time; the question is whether it is worth it or not.
If we are going to continue to shoot ourselves in the foot with unset location IDs, let's do so with clear eyes: the problems are not limited to esoteric software or use cases. Also, a blank string is not the only choice, more on that next.
Chad
PS. The TA started 10 years ago and followed common conventions at the time; that network now has many non-blank IDs. The GSN has converted to having few or no blank IDs and, ironically?, the BK network appears to use many non-blank location IDs too. Not sure how important it is, but it does show the trend towards an increased use of non-blank location IDs.
-
Doug Neuhauser 2014-08-12 23:12:03
On 08/12/2014 02:31 PM, Chad Trabant wrote:
> Hi Doug,
> Thanks for your 2 cents.
> Regarding only certain software being the problem with blank
> location, I guess you did not like any of the others pointed out
> here?
> http://www.iris.washington.edu/pipermail/webservices/2014-July/000583.html
Most of these arguments are not related directly to stationxml, but to the
empty location code. However, those that are related to empty location code
appear to be the inability to distinguish between an attribute that is not
supplied vs an empty string attribute. If you make the LocationCode optional,
it seems like you are in the same boat. If it is not specified, what do
you use for location code? blank-blank? Then that is the same logic you
use if your query does not return a location code.
Your example of:
%{net}{sta}{loc}{chan} = "some lvalue"
is not a good one, because of no separation between components.
How do you distinguish between
net = G, sta = ABCD
and net = GA, sta = BCD?
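The ambiguity described above is easy to reproduce. In this short Python sketch the two key-building functions are hypothetical names for illustration; the point is only that plain concatenation loses the component boundaries, while a separator keeps them even when the location code is empty.

```python
def key_without_separator(net: str, sta: str, loc: str, chan: str) -> str:
    """Concatenate the components directly, as in %{net}{sta}{loc}{chan}."""
    return f"{net}{sta}{loc}{chan}"


def key_with_separator(net: str, sta: str, loc: str, chan: str) -> str:
    """Join the components with dots so their boundaries survive."""
    return ".".join((net, sta, loc, chan))


a = ("G", "ABCD", "", "BHZ")
b = ("GA", "BCD", "", "BHZ")

# Both tuples collapse to the same string without a separator.
print(key_without_separator(*a) == key_without_separator(*b))  # True

# With a separator they stay distinct, empty location code included.
print(key_with_separator(*a))  # G.ABCD..BHZ
print(key_with_separator(*b))  # GA.BCD..BHZ
```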
> If you want a non-Oracle database example, the 'ltree' data type in
> Postgres is a natural fit for N.S.L.C hierarchical data and it cannot
> take a blank identifier either. I do not see how the number of pain
> points with empty identifiers will not grow over time.
> The proposal for a "--" location ID was to change SEED by starting
> with StationXML as a transition. The first step could be done without
> changing all the miniSEED in all the archives, the next step could be
> done with a future revision in miniSEED. This would require mapping,
> which we are already doing for requests and will continue to do
> indefinitely. For sure this would be non-trivial change over time,
> the question is whether it is worth it or not.
I don't see anything in your original proposal about changing SEED.
I only see a proposal to change the SEED representation in StationXML.
> If we are going to continue to shoot ourselves in the foot with unset
> location IDs let's do so with clear eyes, the problems are not
> limited to esoteric software or use cases. Also, a blank string is
> not the only choice, more on that next.
I agree with the above statement. However, trying to address the issue
just within StationXML I think is just another bandaid, and I don't see
why the StationXML needs this bandaid.
Since StationXML does not appear to need this bandaid, I don't understand
the need.
IF YOU WANT TO CHANGE SEED, THEN PROPOSE TO CHANGE SEED.
- Doug N
Chad
--
PS. The TA started 10 years ago and followed common conventions at the time, that network now has many non-blank IDs. The GSN has converted to few to none blank IDs anymore and, ironically?, the BK network appears to use many non-blank location IDs too. Not sure how it's important but it does show the trend towards an increased use of non-blank location IDs.
On Aug 12, 2014, at 10:53 AM, Doug Neuhauser <doug<at>seismo.berkeley.edu> wrote:
I've been following this thread, and thought it was time to chime in.
_______________________________________________
IMHO, the FDSN web services should follow the SEED convention.
The SEED convention states that station, network, channel, and location
are all blank-padded fields of fixed lengths.
To me, this means that that we should either use the full blank-padded
fields for ALL of these identifiers, or for none of them.
eg:
<Network code="G " >
<Station code="KIP ">
<Channel locationCode=" " code="BHZ">
or
<Network code="G" >
<Station code="KIP">
<Channel locationCode="" code="BHZ">
Personally I think the latter (blank trimmed) is better.
I agree that the blank location code is a pain when dealing with
Oracle, white-space delimited fields such as command lines, etc,
but unless we change the SEED convention, I don't see that making
an aliases of "-" or "--" in FDSN station XML improves the situation.
AFAIK, the ONLY reason that we struggle with the two-blank issue is
that certain software (eg Oracle) cannot distinguish between the
the empty string (string of length 0) and NULL. Therefore, the DMC,
NCEDC, AQMS, etc have been forced to not use a blank-trimmed string
for the location code.
Unless we propose to change the SEED standard, all of our data in
our archives, and all of our current acquisition systems, I think
that we have to live with "emtpy" location codes.
I have not seen any compelling argument for representing a blank (empty)
location code in FDSN station XML as anything but the empty string.
If you want to have "" and " " be equivalent in FDSN station XML,
you can simply change the schema definition of the field to be a "token"
rather than a "string", in which case any representation with blanks will
be reduced to the empty string. Problem solved?
I note that the NCEDC implementation currently uses 1 blank " "
for empty location code. I have no problem changing this if we can
agree on a convention.
I also note ironically that the TA network run by IRIS is one of the
largest networks in terms of stations, and uses blank location codes.
My 2 cents...
- Doug N
On 07/23/2014 10:30 AM, Chad Trabant wrote:
Hello WS users and developers,
--
A recent discussion between FDSN data centers is centered on
representation of empty location IDs in StationXML, the default
format returned by the fdsnws-station web service. The DMC may be
changing how it represents location ID in XML and text formats based
on these discussions. We are asking for input as any such change will
effect users of our metadata service.
Some background: In the SEED channel naming scheme there is a
hierarchy of network, station, location and channel identifiers. Of
these, it is only the location ID that is commonly accepted to be
empty. In the SEED format the location ID is a two-character field,
where the value is left justified and padded with spaces if needed.
When the value is empty the field is simply two spaces of padding.
Historically, and presumably to avoid having an empty location ID,
the DMC has represented “empty” location IDs as a string of two
spaces. Following this practice, we express this in StationXML by
setting the locationCode attribute to a string of two spaces. We have
done this so long we sometimes forget that it is not compliant with a
strict reading of SEED, at best it falls into the vagaries of SEED,
on the other hand we have been doing it for years with no apparent
problems (in fact it has helpfully avoided an empty core
identifier).
There now exists another fdsnws-station implementation that returns
StationXML with the locationCode attribute set to an empty string
when the SEED value is empty. The justification is that this follows
the SEED rules of trimming the padding spaces from the values.
Unfortunately this means there are now flavors of StationXML that are
incompatible in the core channel name identifiers. In other words,
two StationXML documents for the same SEED channel appear, without
extra field translation, to be different channels.
As most of you are users of SEED and StationXML metadata (at some
level) and some of you have written code to parse these formats and
manage the data returned by the DMC and other FDSN data centers, we
are asking for your input regarding the potential solutions.
Here are the options being considered for mapping an empty location
ID in SEED to StationXML:
1) Set locationCode to two spaces. While the DMC and users have been
using this for a long while, it is not precisely the SEED value (but
the mapping could be formalized). Also, whitespace in attributes does
have some theoretical challenges: the wonky rules for XML attributes
related to whitespace handling require removal of spaces in some
cases (we have never heard of problems though).
2) Set locationCode to an empty string. This would match the strict
value present in SEED, an empty identifier.
3) Set locationCode to “--” (two dashes). This avoids issues with
whitespace in XML attribute values and avoids issues with an empty
identifier. Also, this matches the request mechanisms where “--” is
accepted as a synonym for an empty location ID.
All of these solutions are viable in that we can make them work in
code. It is a matter of choosing one for future FDSN metadata; pick
your poison, so to speak.
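For client code that has to cope with all three mappings in the meantime, a minimal Python sketch of the translation might look like the following. The function name `normalize_location` and the choice of the empty string as the internal canonical form are assumptions made for illustration only; nothing here is specified by the DMC or the FDSN.

```python
def normalize_location(raw):
    """Map the three proposed spellings of an empty location ID to one
    internal value (the empty string here, an arbitrary choice)."""
    if raw is None:
        # Attribute missing altogether: keep it distinct from "set to empty".
        return None
    trimmed = raw.strip(" ")
    if trimmed in ("", "--"):
        return ""
    return trimmed


# The same SEED channel as three different services might report it.
flavors = ["  ", "", "--"]
canonical = {normalize_location(value) for value in flavors}
```

With this kind of translation in place all three flavors compare equal, while a non-empty code such as "00" passes through unchanged.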
In my personal opinion, an empty location ID is an unfortunate quirk
of SEED that we should rectify in StationXML. An empty identifier can
be confused for “unknown” if the programmer is not careful, which is
semantically very different from “set to empty”. The two-space
strings that the DMC is currently using are also not ideal: they are
hard for humans to read and potentially weird with XML rules. The
dashed location ID avoids these issues but requires the most change.
I also think requiring all readers of StationXML to translate (e.g.
remove padding) is a bad idea; the values in SEED should be uniquely
mapped to values in StationXML.
Thanks for reading this far. Your opinion and input are appreciated.
regards,
Chad
_______________________________________________
webservices mailing list
webservices<at>iris.washington.edu
http://www.iris.washington.edu/mailman/listinfo/webservices
------------------------------------------------------------------------
Doug Neuhauser University of California, Berkeley
doug<at>seismo.berkeley.edu Berkeley Seismological Laboratory
Office: 510-642-0931 215 McCone Hall # 4760
Fax: 510-643-5811 Berkeley, CA 94720-4760
Remote: 530-752-5615 (Wed,Fri)
-
On Aug 12, 2014, at 4:12 PM, Doug Neuhauser <doug<at>seismo.berkeley.edu> wrote:
On 08/12/2014 02:31 PM, Chad Trabant wrote:
Hi Doug,
Thanks for your 2 cents.
Regarding only certain software being the problem with blank
location, I guess you did not like any of the others pointed out
here?
http://www.iris.washington.edu/pipermail/webservices/2014-July/000583.html
Most of these arguments are not related directly to stationxml, but to the
empty location code. However, those that are related to empty location code
appear to be the inability to distinguish between an attribute that is not
supplied vs an empty string attribute. If you make the LocationCode optional,
it seems like you are in the same boat. If it is not specified, what do
you use for location code? blank-blank? Then that is the same logic you
use if your query does not return a location code.
I completely agree, mostly the same boat. The points were: a) empty IDs have challenges that are not limited to any esoteric software and b) if a value is unset why represent it with anything at all? Playing devil's advocate: why is location special?
Your example of:
%{net}{sta}{loc}{chan} = "some lvalue"
is not a good one, because of no separation between components.
How do you distinguish between
net = G, sta = ABCD
and net = GA, sta = BCD?
Those are completely distinct values in a nested hash (they are not a concatenated string). {G}{ABCD} is a different path than {GA}{BCD}.
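The two positions in this exchange can be illustrated with a short Python sketch, using Python dictionaries as a stand-in for the hash in the example; the variable names are hypothetical. A key built by concatenating the codes with no separator is ambiguous, while a nested mapping keeps each level distinct, including an empty location level.

```python
# Concatenating codes without a separator loses the field boundaries.
flat_a = "G" + "ABCD" + "" + "BHZ"
flat_b = "GA" + "BCD" + "" + "BHZ"
collide = flat_a == flat_b  # both strings are "GABCDBHZ"

# A nested mapping keeps each level separate, so the two paths differ
# even though the location level is the empty string in both.
nested = {}
nested.setdefault("G", {}).setdefault("ABCD", {}).setdefault("", {})["BHZ"] = "first"
nested.setdefault("GA", {}).setdefault("BCD", {}).setdefault("", {})["BHZ"] = "second"
```

Note that the empty string works as a dictionary key in Python; whether an empty key is usable depends on the storage layer, which is the point being argued in this thread.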
If you want a non-Oracle database example, the 'ltree' data type in
Postgres is a natural fit for N.S.L.C hierarchical data and it cannot
take a blank identifier either. I do not see how the number of pain
points with empty identifiers will not grow over time.
The proposal for a "--" location ID was to change SEED by starting
with StationXML as a transition. The first step could be done without
changing all the miniSEED in all the archives, the next step could be
done with a future revision in miniSEED. This would require mapping,
which we are already doing for requests and will continue to do
indefinitely. For sure this would be a non-trivial change over time;
the question is whether it is worth it or not.
I don't see anything in your original proposal about changing SEED.
I only see a proposal to change the SEED representation in StationXML.
If we are going to continue to shoot ourselves in the foot with unset
location IDs let's do so with clear eyes, the problems are not
limited to esoteric software or use cases. Also, a blank string is
not the only choice, more on that next.
I agree with the above statement. However, trying to address the issue
just within StationXML I think is just another bandaid, and I don't see
why the StationXML needs this bandaid.
Since StationXML does not appear to need this bandaid, I don't understand
the need.
IF YOU WANT TO CHANGE SEED, THEN PROPOSE TO CHANGE SEED.
It depends on what you mean by SEED. StationXML IS SEED for most intents and purposes. Changing all aspects of SEED at once is a much larger can of worms, and this would be an opportune time to change just the StationXML representation of SEED. Over the next couple of months and years as folks convert from dataless SEED to StationXML, an opportunity exists to make such low level changes, which will get much harder once the adoption is farther along.
Chad
- Doug N
Chad
--
PS. The TA started 10 years ago and followed the common conventions of the time; that network now has many non-blank IDs. The GSN has converted and now has few to no blank IDs and, ironically?, the BK network appears to use many non-blank location IDs too. I am not sure how important that is, but it does show the trend towards increased use of non-blank location IDs.
On Aug 12, 2014, at 10:53 AM, Doug Neuhauser <doug<at>seismo.berkeley.edu> wrote:
I've been following this thread, and thought it was time to chime in.
IMHO, the FDSN web services should follow the SEED convention.
The SEED convention states that station, network, channel, and location
are all blank-padded fields of fixed lengths.
To me, this means that we should either use the full blank-padded
fields for ALL of these identifiers, or for none of them.
eg:
<Network code="G " >
<Station code="KIP ">
<Channel locationCode=" " code="BHZ">
or
<Network code="G" >
<Station code="KIP">
<Channel locationCode="" code="BHZ">
Personally I think the latter (blank trimmed) is better.
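A small Python sketch of the two conventions, assuming the SEED fixed-header field widths of 2 characters for network, 5 for station, 2 for location and 3 for channel. The helper names `pad` and `trim` are made up for illustration.

```python
# Field widths in the SEED fixed header, in characters.
WIDTHS = {"network": 2, "station": 5, "location": 2, "channel": 3}


def pad(field, value):
    """Left-justify a code and blank-pad it to its SEED field width."""
    return value.ljust(WIDTHS[field])


def trim(value):
    """Strip the blank padding to recover the bare code."""
    return value.rstrip(" ")


codes = [("network", "G"), ("station", "KIP"), ("location", ""), ("channel", "BHZ")]
padded = {field: pad(field, value) for field, value in codes}
trimmed = {field: trim(value) for field, value in padded.items()}
```

Under the padded convention the empty location becomes two spaces; under the trimmed convention it becomes the empty string, which is the form preferred in this message.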
I agree that the blank location code is a pain when dealing with
Oracle, white-space delimited fields such as command lines, etc,
but unless we change the SEED convention, I don't see that making
an alias of "-" or "--" in FDSN station XML improves the situation.
AFAIK, the ONLY reason that we struggle with the two-blank issue is
that certain software (eg Oracle) cannot distinguish between the
empty string (string of length 0) and NULL. Therefore, the DMC,
NCEDC, AQMS, etc have been forced to not use a blank-trimmed string
for the location code.
Unless we propose to change the SEED standard, all of our data in
our archives, and all of our current acquisition systems, I think
that we have to live with "empty" location codes.
I have not seen any compelling argument for representing a blank (empty)
location code in FDSN station XML as anything but the empty string.
If you want to have "" and " " be equivalent in FDSN station XML,
you can simply change the schema definition of the field to be a "token"
rather than a "string", in which case any representation with blanks will
be reduced to the empty string. Problem solved?
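To make the "token" suggestion concrete, here is a rough Python approximation of the whitespace "collapse" rule that XML Schema applies to the token type: tabs and line breaks become spaces, runs of spaces shrink to one, and leading and trailing spaces are dropped. The function name is hypothetical, and the sketch only imitates what a schema-aware parser would do; a parser that does not apply the schema would still hand back the raw attribute string.

```python
import re


def collapse_like_xs_token(value):
    """Approximate the XML Schema 'collapse' whitespace rule used by xs:token."""
    replaced = re.sub(r"[\t\n\r]", " ", value)  # tab, LF and CR become spaces
    collapsed = re.sub(r" +", " ", replaced)    # runs of spaces become one space
    return collapsed.strip(" ")                 # leading/trailing spaces are removed


results = [collapse_like_xs_token(v) for v in ("", " ", "  ", "00", " 00 ")]
```

Under this rule the empty string, one blank and two blanks all reduce to the same empty value, while "--" would remain a distinct string.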
I note that the NCEDC implementation currently uses 1 blank " "
for empty location code. I have no problem changing this if we can
agree on a convention.
I also note ironically that the TA network run by IRIS is one of the
largest networks in terms of stations, and uses blank location codes.
My 2 cents...
- Doug N
-
Philip Crotwell 2014-08-13 03:50:25
On Tue, Aug 12, 2014 at 7:12 PM, Doug Neuhauser
<doug<at>seismo.berkeley.edu> wrote:
IF YOU WANT TO CHANGE SEED, THEN PROPOSE TO CHANGE SEED.
- Doug N
OK, I would like to offer a proposal to change SEED to eliminate
blank location ids.
Wait, can I do that?
:)
Philip
-
Yazan Suleiman 2014-08-13 04:17:19
SEED represents unidentified locations as “  ” [2 empty spaces] (this is a
limitation of SEED), while most (if not all) modern languages represent
such a thing as NULL. NULL does not equal “” or “  ” or “--”. NULL is NULL
and should be represented as such. While SEED is limited, XML is not. Why
should we incorporate SEED limitations into any new XML schema?
Unidentified attributes (values=unidentified or not provided or null) are
omitted in XML. When an attribute does not appear in the document, then it
is NULL (no confusion there).
In my opinion StationXML shouldn’t force me to provide a value for an
attribute that is unknown to me. While the purpose of the StationXML schema is
to map between SEED and XML, limitations of SEED shouldn’t be carried over.
For historical data, software should take care of any conversion needed.
What you store in your database is outside the scope of StationXML.
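As a minimal sketch of how the omitted, empty and two-space cases look to a client, the following uses Python's standard `xml.etree.ElementTree`. The XML fragments are simplified, hypothetical stand-ins rather than complete StationXML documents (no namespace, no enclosing Network or Station elements).

```python
import xml.etree.ElementTree as ET

docs = {
    "omitted": '<Channel code="BHZ"/>',
    "empty": '<Channel code="BHZ" locationCode=""/>',
    "two spaces": '<Channel code="BHZ" locationCode="  "/>',
}

# Element.get returns None when the attribute is absent, so the parser
# itself distinguishes "not supplied" from "supplied but empty".
values = {name: ET.fromstring(xml).get("locationCode") for name, xml in docs.items()}
```

Whether that distinction survives further down the line depends on how the calling code and any database layer treat None versus the empty string, which is the concern raised elsewhere in this thread.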
-
Hear hear Yazan!
Ellen
-
-
Hi all,
On Tue, Aug 12, 2014 at 7:12 PM, Doug Neuhauser
<doug<at>seismo.berkeley.edu> wrote:
IF YOU WANT TO CHANGE SEED, THEN PROPOSE TO CHANGE SEED.
- Doug N
In fact changing SEED is the ultimate goal, and while the proposal was to start that process with StationXML, Doug is correct that the core issue is with the SEED rules themselves. With that, it's probably time to move any SEED-changing discussion to the FDSN mailing lists. Thank you to the users who voiced their opinions; you are welcome to continue to chime in with thoughts here if you would like.
The idea of changing the schema type for locationCode to a token is appealing if that will help make the now 3 flavors of locationCode more compatible for XML-parsing clients. We should discuss this more in the FDSN context; it might buy us more time to discuss and address the lower level issue. Unfortunately it does not address the text output used by many.
Chad
-
-
-
-