David Loring
2014-07-31 18:45:44
Dear All
To trim, or not to trim, that is the question—
Whether 'tis Nobler in the mind to suffer
the spaces and dashes of outrageous parsers,
or to take Arms against a Sea of spaces,
and by opposing end them? .............
This comment from Joachim says it all
Just, as another 'uropeen' said, my 2 euro cents worth. (€0.02)
Best regards
David
-----Original Message-----
From: webservices-bounces<at>iris.washington.edu [webservices-bounces<at>iris.washington.edu] On Behalf Of webservices-request<at>iris.washington.edu
Sent: 31 July 2014 11:19
To: webservices<at>iris.washington.edu
Subject: webservices Digest, Vol 40, Issue 20
Send webservices mailing list submissions to
webservices<at>iris.washington.edu
To subscribe or unsubscribe via the World Wide Web, visit
http://www.iris.washington.edu/mailman/listinfo/webservices
or, via email, send a message with subject or body 'help' to
webservices-request<at>iris.washington.edu
You can reach the person managing the list at
webservices-owner<at>iris.washington.edu
When replying, please edit your Subject line so it is more specific than "Re: Contents of webservices digest..."
Today's Topics:
1. Re: A question of location ID, how to represent empty IDs in
XML? (Joachim Saul)
2. Re: A question of location ID, how to represent empty IDs in
XML? (Chad Trabant)
3. Re: A question of location ID, how to represent empty IDs in
XML? (Chad Trabant)
4. Re: A question of location ID, how to represent empty IDs in
XML? (Joachim Saul)
5. Re: A question of location ID, how to represent empty IDs in
XML? (Joachim Saul)
6. Re: A question of location ID, how to represent empty IDs in
XML? (Chad Trabant)
7. Re: A question of location ID, how to represent empty IDs in
XML? (Chad Trabant)
----------------------------------------------------------------------
Message: 1
Date: Thu, 31 Jul 2014 09:33:08 +0200
From: Joachim Saul <saul<at>gfz-potsdam.de>
To: IRIS Web Services List <webservices<at>iris.washington.edu>
Subject: Re: [webservices] A question of location ID, how to represent
empty IDs in XML?
Message-ID: <53D9F134.5040404<at>gfz-potsdam.de>
Content-Type: text/plain; charset=UTF-8; format=flowed
Chad Trabant wrote on 31.07.2014 08:49:
In fact even the *already* empty string location codes from libmseed are trimmed again at ObsPy level, just in case. ;)
Joachim
------------------------------
Message: 2
Date: Thu, 31 Jul 2014 00:42:29 -0700
From: Chad Trabant <chad<at>iris.washington.edu>
To: IRIS Web Services List <webservices<at>iris.washington.edu>
Subject: Re: [webservices] A question of location ID, how to represent
empty IDs in XML?
Message-ID: <87A0DA88-3009-4F51-A9A6-22D341EA7524<at>iris.washington.edu>
Content-Type: text/plain; charset=iso-8859-1
On Jul 31, 2014, at 12:33 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
You keep coming back to this as if it is meaningful. libmseed and the parts of ObsPy getting information from libmseed are dealing with SEED data, where the current rules of parsing are clear. What is your point exactly?
Chad
------------------------------
Message: 3
Date: Thu, 31 Jul 2014 01:01:30 -0700
From: Chad Trabant <chad<at>iris.washington.edu>
To: IRIS Web Services List <webservices<at>iris.washington.edu>
Subject: Re: [webservices] A question of location ID, how to represent
empty IDs in XML?
Message-ID: <CF7E3807-048D-4AF7-BC9A-A7169276A245<at>iris.washington.edu>
Content-Type: text/plain; charset=windows-1252
On Jul 31, 2014, at 12:20 AM, Robert Barsch <barsch<at>egu.eu> wrote:
There is no rule in the SEED world preventing two channel names from differing only by location ID; in fact it happens often. Since location can be empty, we can have both XX.STA.00.LHZ and XX.STA..LHZ; if location were described as "unknown", these two would become ambiguous. I do not know offhand of any cases where the difference is between an empty location ID and a filled one, but it would be a weird case to eliminate (or even describe) in the specification.
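To make the point above concrete, here is a minimal Python sketch. The helper name split_channel_id is hypothetical and not part of any SEED or FDSN library; it only assumes the dotted NET.STA.LOC.CHA notation used in this message.

```python
def split_channel_id(channel_id):
    """Split a dotted NET.STA.LOC.CHA identifier into its four fields.

    Hypothetical helper for illustration only.
    """
    parts = channel_id.split(".")
    if len(parts) != 4:
        raise ValueError(
            f"expected 4 dot-separated fields, got {len(parts)}: {channel_id!r}"
        )
    return tuple(parts)


with_location = split_channel_id("XX.STA.00.LHZ")
empty_location = split_channel_id("XX.STA..LHZ")

# The two channels differ only in the location field.
print(with_location)   # ('XX', 'STA', '00', 'LHZ')
print(empty_location)  # ('XX', 'STA', '', 'LHZ')
```

As long as the empty location is kept as a distinct value, the two identifiers stay distinct; folding it into a generic "unknown" would lose that.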
Chad
Message: 4
Date: Thu, 31 Jul 2014 10:07:54 +0200
From: Joachim Saul <saul<at>gfz-potsdam.de>
To: IRIS Web Services List <webservices<at>iris.washington.edu>
Subject: Re: [webservices] A question of location ID, how to represent
empty IDs in XML?
Message-ID: <53D9F95A.7020209<at>gfz-potsdam.de>
Content-Type: text/plain; charset=windows-1252; format=flowed
Chad Trabant wrote on 31.07.2014 07:42:
Chad Trabant wrote on 31.07.2014 06:57:
The issue is *not* about other data formats. It is up to every developer to save empty location codes in whatever way they like in their formats, databases, bulletins etc. That is absolutely no problem and hence doesn't require a solution.
Here the issue is about representing data in XML. Since we have a well accepted and widely implemented channel naming standard *already*, and since users are working with StationXML *already*, what we need *now* is a clarification about the proper representation of *current* channel naming in StationXML.
Joachim
------------------------------
Message: 5
Date: Thu, 31 Jul 2014 10:13:37 +0200
From: Joachim Saul <saul<at>gfz-potsdam.de>
To: IRIS Web Services List <webservices<at>iris.washington.edu>
Subject: Re: [webservices] A question of location ID, how to represent
empty IDs in XML?
Message-ID: <53D9FAB1.1020104<at>gfz-potsdam.de>
Content-Type: text/plain; charset=windows-1252; format=flowed
Chad Trabant wrote on 31.07.2014 09:42:
Joachim
------------------------------
Message: 6
Date: Thu, 31 Jul 2014 01:37:36 -0700
From: Chad Trabant <chad<at>iris.washington.edu>
To: IRIS Web Services List <webservices<at>iris.washington.edu>
Subject: Re: [webservices] A question of location ID, how to represent
empty IDs in XML?
Message-ID: <125129E1-E864-408F-8EE5-246B7E41B5BE<at>iris.washington.edu>
Content-Type: text/plain; charset=us-ascii
On Jul 31, 2014, at 1:13 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
That is a strange transition from libmseed to web service clients that I do not understand. You appear fixated on updating the clients, but as I have said many times, that by itself will not solve the actual problem; the metadata remains inconsistent, and at any rate we do not control many of the most popular parsers of this information, such as user-created programs.
On Jul 28, 2014, at 4:51 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
I was certainly not against mapping to/from "--", after all it was my proposal! You have taken my words out of context. Please stick to the technical issues and leave your personal indignation off of this mailing list.
Chad
Message: 7
Date: Thu, 31 Jul 2014 03:18:47 -0700
From: Chad Trabant <chad<at>iris.washington.edu>
To: IRIS Web Services List <webservices<at>iris.washington.edu>
Subject: Re: [webservices] A question of location ID, how to represent
empty IDs in XML?
Message-ID: <F330F8A2-2D82-475C-919F-4FBA647087C7<at>iris.washington.edu>
Content-Type: text/plain; charset="windows-1252"
Thanks Philip, I think you have outlined the issues well.
Regarding issue #1, I strongly feel that we need to choose one representation, the sooner we stop creating incompatible metadata the better.
Regarding issue #2:
Below are a few of the issues we note regarding empty identifiers
1) They are too similar to "unknown" (which results in potential ambiguity where channels are only differentiated by location ID):
a) In many languages an empty string evaluates to false; if, for example, a program is testing for and then extracting a value from an XML document parsed into a structure/object, it could appear as if the value was not present. Of course the coding in probably every language can be done to avoid such a false negative, but it is a pitfall that we would be asking all future users and coders to know about.
b) In XPath (the query language for XSLT), which is used to search or translate XML, the matching of a string attribute usually uses the string() function. Specifying the string attribute to match when the attribute has a value is straightforward; when trying to match the empty string, the query is for NOT string. In the boolean functions of XPath "a string is true if and only if its length is non-zero" (http://www.w3.org/TR/xpath/#function-boolean). So in XPath, hardly a fringe technology, an empty string is not just another kind of string but an anomaly.
c) In JavaScript the getAttribute() method returns the same value whether the attribute was an empty string or unspecified. The method is no longer recommended but illustrates that such thinking is not limited to niche projects.
2) Organizing data in structures such as a nested hash is pretty common: %{net}{sta}{loc}{chan} = "some lvalue" (sorry for the Perl). The empty identifier as a key works in some languages but it is obtuse and unclear. I'm sure there are many other data structures that would use location by itself as a key.
3) Empty identifiers are difficult to specify on the command line, in URLs, etc. and are non-obvious in many other places such as GUI fields. We have largely addressed this issue for FDSN web services (at the DMC for other mechanisms as well) by making "--" a synonym for the empty location ID. In other words we are already mapping "--" into the empty location ID for requests and users are learning this association. A further adoption of the synonym into the metadata would solve many of these problems.
4) While it is certainly not the FDSN's task to define data formats outside of its purview, the adoption or matching of the core channel naming fields in other formats is certainly in the FDSN's best interest. This has been happening for a long time already (ISF/IASPEI, GSE, etc.). The potentially empty (optional?) location ID could make such adoption harder as it is a wrinkle, especially for space delimited formats. I believe these broader implications deserve some consideration.
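Points 1a and 2 can be illustrated with a short Python sketch. All names here (describe_location, index) are made up for the illustration, and Python stands in for "many languages"; it is not meant as anyone's actual parser code.

```python
def describe_location(loc):
    """Distinguish a missing location code (None) from an empty one ('').

    Hypothetical helper: explicit comparisons avoid the truthiness pitfall.
    """
    if loc is None:
        return "missing"
    if loc == "":
        return "empty"
    return "set to " + repr(loc)


loc = ""  # an empty location code, as read from metadata

# Pitfall from point 1a: a plain truth test treats the empty string
# as if no value were present.
naive_says_present = bool(loc)
print(naive_says_present)        # False
print(describe_location(loc))    # empty
print(describe_location(None))   # missing

# Point 2: a nested mapping keyed by net/sta/loc/chan. The empty string
# works as a key, but reads oddly next to "00".
index = {
    "XX": {
        "STA": {
            "": {"LHZ": "channel with empty location"},
            "00": {"LHZ": "channel with location 00"},
        }
    }
}
print(index["XX"]["STA"][""]["LHZ"])
```

The code works either way; the point is only that every author has to remember the explicit comparison.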
I'm sure most developers could come up with solutions to the technical problems, but an empty identifier leaves the unfortunate wrinkles for all future users and coders.
Here is an example of someone that was confused by current metadata, I'll bet if there was a value in the locationCode it would have been easier:
https://stackoverflow.com/questions/19348855/checking-for-empty-attributes-while-parsing-an-xml-file
There is a chance we will end up with the empty location identifier, but the considerations should go beyond an assumption that an empty string is the only choice.
Since an empty location field in SEED essentially means unset, perhaps we should consider making the locationCode attribute optional and leaving it out of the XML when it is empty in SEED. In this line of thinking, the empty string is just a hack to include a required attribute when in fact there is nothing to include. For me the "unset" aspect is unsettlingly similar to "unknown", but it's an idea preferred by at least one engineer at the DMC.
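For what the optional-attribute idea would look like to a client, here is a small sketch using Python's standard xml.etree.ElementTree. The XML fragment is a simplified, namespace-free stand-in made up for this example, not real StationXML output.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in document (assumption: no namespace, only the
# attributes needed for the illustration).
doc = ET.fromstring(
    '<Station code="STA">'
    '<Channel code="LHZ" locationCode=""/>'    # attribute present but empty
    '<Channel code="BHZ"/>'                    # attribute left out
    '<Channel code="HHZ" locationCode="00"/>'  # attribute with a value
    "</Station>"
)

# Element.get() returns None when the attribute is absent and the raw
# string (possibly empty) when it is present.
states = [ch.get("locationCode") for ch in doc.findall("Channel")]
print(states)  # ['', None, '00']
```

With this parser the empty and the omitted attribute are distinguishable, but a client that only tests the value for truth would treat both the same way.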
Chad
On Jul 28, 2014, at 6:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
An HTML attachment was scrubbed...
URL: http://www.iris.washington.edu/pipermail/webservices/attachments/20140731/98927e71/attachment.html
------------------------------
_______________________________________________
webservices mailing list
webservices<at>iris.washington.edu
http://www.iris.washington.edu/mailman/listinfo/webservices
End of webservices Digest, Vol 40, Issue 20
*******************************************
To trim, or not to trim, that is the question—
Whether 'tis Nobler in the mind to suffer
the spaces and dashes of outrageous parsers,
or to take Arms against a Sea of spaces,
and by opposing end them? .............
This comment from Joachim says it all
>>The software changes are just one aspect. In fact, software changes are trivial compared to the nightmare of changing the existing metadata in databases, decades of SEED data, parametric data and so on.<<
If it ain't broke, don't fix it and risk breaking it. To trim or not to trim is so trivial that this discussion is a mass of bits creating a storm in a teacup. I parse websites all the time, creating parsers for XML, tables and all sorts of weird formats and multiple languages. No programmer should be in the least bit concerned about this matter. I can't see that getting one's knickers in a twist, or as our American cousins say, getting one's panties in a bunch, over a couple of spaces is worth the mental effort!!
Just, as another 'uropeen' said, my 2 euro cents worth. (€0.02)
Best regards
David
----------------------------------------------------------------------
Message: 1
Date: Thu, 31 Jul 2014 09:33:08 +0200
From: Joachim Saul <saul<at>gfz-potsdam.de>
To: IRIS Web Services List <webservices<at>iris.washington.edu>
Subject: Re: [webservices] A question of location ID, how to represent
empty IDs in XML?
Message-ID: <53D9F134.5040404<at>gfz-potsdam.de>
Content-Type: text/plain; charset=UTF-8; format=flowed
Chad Trabant wrote on 31.07.2014 08:49:
In terms of existing StationXML parsers I assume most are just stripping whitespaces from the location code and thus "" and "  " should already work, resulting in minimal disruption in the users' workflows.
Actually, this does not appear to be happening; in the parsers I've used the whitespaces are not stripped. I have read through the XML specifications until my eyes were crossed to try and understand why this would be the case. Then I wrote some test cases and observed no trimming, see test data and code below. Perhaps this attribute is CDATA for some reason? I think we are stuck with the fact that empty string and two spaces are different.
There may be parsers that do strip whitespaces, but I also doubt that this is required by any standard.
Has anyone observed this automatic trimming on any system?
No, and I agree that a generic parser should return a raw string as it is in the XML without implicit trimming, nullifying etc. To obtain a trimmed string, it's trivial to trim() the input strings as needed. That's what's done in ObsPy, too.
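The "no implicit trimming" behaviour is easy to check with one generic parser. The sketch below uses Python's standard xml.etree.ElementTree on a made-up one-element document; it is not the test code referred to in the thread, and other parsers may behave differently.

```python
import xml.etree.ElementTree as ET

# Made-up element with a two-space location code.
elem = ET.fromstring('<Channel code="BHZ" locationCode="  "/>')

raw = elem.get("locationCode")
trimmed = raw.strip()

# The parser hands back the attribute value as written; trimming is an
# explicit, separate step taken by the application.
print(repr(raw))      # '  '
print(repr(trimmed))  # ''
```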
In fact even the *already* empty string location codes from libmseed are trimmed again at ObsPy level, just in case. ;)
"--" would require software to be updated and looks a little bit weird in my opinion and unsuspecting users might interpret it as an invalid location code.
Yes, it would require software changes, the question is would what we gain be worth it. Maybe it looks a little weird, but it is already becoming synonymous in the minds of many because "--" is used for *selecting* the empty SEED location IDs.
The software changes are just one aspect. In fact, software changes are trivial compared to the nightmare of changing the existing metadata in databases, decades of SEED data, parametric data and so on.
Joachim
------------------------------
Message: 2
Date: Thu, 31 Jul 2014 00:42:29 -0700
From: Chad Trabant <chad<at>iris.washington.edu>
To: IRIS Web Services List <webservices<at>iris.washington.edu>
Subject: Re: [webservices] A question of location ID, how to represent
empty IDs in XML?
Message-ID: <87A0DA88-3009-4F51-A9A6-22D341EA7524<at>iris.washington.edu>
Content-Type: text/plain; charset=iso-8859-1
Hi Joachim,
On Jul 31, 2014, at 12:33 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
Chad Trabant wrote on 31.07.2014 08:49:
In terms of existing StationXML parsers I assume most are just stripping whitespaces from the location code and thus "" and "  " should already work, resulting in minimal disruption in the users' workflows.
Actually, this does not appear to be happening; in the parsers I've used the whitespaces are not stripped. I have read through the XML specifications until my eyes were crossed to try and understand why this would be the case. Then I wrote some test cases and observed no trimming, see test data and code below. Perhaps this attribute is CDATA for some reason? I think we are stuck with the fact that empty string and two spaces are different.
There may be parsers that do strip whitespaces, but I also doubt that this is required by any standard.
Has anyone observed this automatic trimming on any system?
No, and I agree that a generic parser should return a raw string as it is in the XML without implicit trimming, nullifying etc. To obtain a trimmed string, it's trivial to trim() the input strings as needed. That's what's done in ObsPy, too.
In fact even the *already* empty string location codes from libmseed
are trimmed again at ObsPy level, just in case. ;)
You keep coming back to this as if it is meaningful. libmseed and the parts of ObsPy getting information from libmseed are dealing with SEED data, where the current rules of parsing are clear. What is your point exactly?
"--" would require software to be updated and looks a little bit weird in my opinion and unsuspecting users might interpret it as an invalid location code.
Yes, it would require software changes, the question is would what we gain be worth it. Maybe it looks a little weird, but it is already becoming synonymous in the minds of many because "--" is used for *selecting* the empty SEED location IDs.
The software changes are just one aspect. In fact, software changes are trivial compared to the nightmare of changing the existing metadata in databases, decades of SEED data, parametric data and so on.
Why do you think the existing metadata and decades of SEED would need to be changed? Please explain.
Chad
------------------------------
Message: 3
Date: Thu, 31 Jul 2014 01:01:30 -0700
From: Chad Trabant <chad<at>iris.washington.edu>
To: IRIS Web Services List <webservices<at>iris.washington.edu>
Subject: Re: [webservices] A question of location ID, how to represent
empty IDs in XML?
Message-ID: <CF7E3807-048D-4AF7-BC9A-A7169276A245<at>iris.washington.edu>
Content-Type: text/plain; charset=windows-1252
On Jul 31, 2014, at 12:20 AM, Robert Barsch <barsch<at>egu.eu> wrote:
Hi Robert,
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Dear all,
Maybe some stupid questions: Are there actually any valid use cases
for having to distinguish between empty and unknown location code within
the data? If so, does this then also apply for network, station,
channel codes? So if the community opts to go for unknown as well as
empty/unset markers for the location field, shouldn't the same
markers be used for unknown/unset network etc.?
There is no rule in the SEED world preventing two channel names from differing only by location ID; in fact it happens often. Since location can be empty, we can have both XX.STA.00.LHZ and XX.STA..LHZ; if location were described as "unknown", these two would become ambiguous. I do not know offhand of any cases where the difference is between an empty location ID and a filled one, but it would be a weird case to eliminate (or even describe) in the specification.
In terms of existing StationXML parsers I assume most are just
stripping whitespaces from the location code and thus "" and "  "
should already work resulting in minimal disruption in the users'
workflows.
Usually without DTD or XML schema definition, all whitespaces are
significant whitespaces and should be preserved by any XML parsers.
Ah. I think that is basically what I have been finding, thanks for the confirmation.
Chad
I guess Lion meant with StationXML parser more than the plain XML
parser. I don't know what other clients do, but ObsPy strips
internally all Net/Sta/Loc/Cha field values.
Cheers,
Robert
PS: for some reason my previous mail, sent last weekend, did not appear
on this list; also, I didn't receive all replies to this thread as
archived in
http://www.iris.washington.edu/pipermail/webservices/2014-July/thread.html
(e.g.
http://www.iris.washington.edu/pipermail/webservices/2014-July/000554.html
was missing). I didn't get any bounce or error message from
mailman either. Any idea?
- --
Dr. Robert Barsch
EGU Office Munich
Luisenstr. 37
80333 Munich
Germany
Phone: +49-89-21806565
Fax: +49-89-218017855
eMail: barsch<at>egu.eu
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iEYEARECAAYFAlPZ7jAACgkQIVowwEY4LjSiAgCgusUFqWH2KagflnXyxGzGcynz
duEAn3TfsXf7uPmQ99c4N4V6v/KxUNel
=vpwD
-----END PGP SIGNATURE-----
Message: 4
Date: Thu, 31 Jul 2014 10:07:54 +0200
From: Joachim Saul <saul<at>gfz-potsdam.de>
To: IRIS Web Services List <webservices<at>iris.washington.edu>
Subject: Re: [webservices] A question of location ID, how to represent
empty IDs in XML?
Message-ID: <53D9F95A.7020209<at>gfz-potsdam.de>
Content-Type: text/plain; charset=windows-1252; format=flowed
Chad Trabant wrote on 31.07.2014 07:42:
On Jul 28, 2014, at 9:03 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
On Mon, Jul 28, 2014 at 10:18 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
The most important consistency is with the SEED standard.
I would argue that consistency for end users is the only thing that
matters. Consistency with the SEED spec may be a means to that end,
but if the end users do not perceive it as being consistent, it isn't
consistent.
To me, that means we need to look at the bigger picture. Ideally we
would have location ids that could be represented by exactly the same
characters in:
stationXML
miniseed
URLS
client displays
databases
and even email
in a way that is explicit, consistent and natural for the end user.
I completely agree that this should be our ultimate goal. The idea of making this change in XML is to set us on just such a path.
Here are some others I would add to the list:
use in command lines
use in other data formats
That's a pretty ambitious list considering...
Chad Trabant wrote on 31.07.2014 06:57:
There are many more clients than there are servers, many clients
written by users and out of our direct control. Requiring every
client to know some post-parsing processing rules is a terrible idea,
[...]
We are still talking here about a metadata format, aren't we? And you want to prescribe how users shall display empty location codes in GUI displays? You must be kidding...
The issue is *not* about other data formats. It is up to every developer to save empty location codes in whatever way they like in their formats, databases, bulletins etc. That is absolutely no problem and hence doesn't require a solution.
Here the issue is about representing data in XML. Since we have a well accepted and widely implemented channel naming standard *already*, and since users are working with StationXML *already*, what we need *now* is a clarification about the proper representation of *current* channel naming in StationXML.
Joachim
------------------------------
Message: 5
Date: Thu, 31 Jul 2014 10:13:37 +0200
From: Joachim Saul <saul<at>gfz-potsdam.de>
To: IRIS Web Services List <webservices<at>iris.washington.edu>
Subject: Re: [webservices] A question of location ID, how to represent
empty IDs in XML?
Message-ID: <53D9FAB1.1020104<at>gfz-potsdam.de>
Content-Type: text/plain; charset=windows-1252; format=flowed
Chad Trabant wrote on 31.07.2014 09:42:
Hi Joachim,
On Jul 31, 2014, at 12:33 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
Has anyone observed this automatic trimming on any system?
No, and I agree that a generic parser should return a raw string as it is in the XML without implicit trimming, nullifying etc. To obtain a trimmed string, it's trivial to trim() the input strings as needed. That's what's done in ObsPy, too.
In fact even the *already* empty string location codes from libmseed
are trimmed again at ObsPy level, just in case. ;)
You keep coming back to this as if it is meaningful. libmseed and the parts of ObsPy getting information from libmseed are dealing with SEED data, where the current rules of parsing are clear. What is your point exactly?
The point is that in libmseed you use a different empty location code naming than in StationXML. As I said a number of times, for me that's not a problem at all. Many clients can handle this and those that cannot can be modified easily. In particular, if you applied the same naming rules as in libmseed also in e.g. FetchData (by making a trivial change in the code) they would become consistent at a very low cost. It would be a benefit for the user.
Why do you think the existing metadata and decades of SEED would need to be changed? Please explain.
Because otherwise a mapping would be required "forever". Until at least very recently you were strongly against any mapping, even calling the idea "rubbish" at one point.
Joachim
------------------------------
Message: 6
Date: Thu, 31 Jul 2014 01:37:36 -0700
From: Chad Trabant <chad<at>iris.washington.edu>
To: IRIS Web Services List <webservices<at>iris.washington.edu>
Subject: Re: [webservices] A question of location ID, how to represent
empty IDs in XML?
Message-ID: <125129E1-E864-408F-8EE5-246B7E41B5BE<at>iris.washington.edu>
Content-Type: text/plain; charset=us-ascii
Hi Joachim,
On Jul 31, 2014, at 1:13 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
Chad Trabant wrote on 31.07.2014 09:42:
Hi Joachim,
On Jul 31, 2014, at 12:33 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
Has anyone observed this automatic trimming on any system?
No, and I agree that a generic parser should return a raw string as it is in the XML without implicit trimming, nullifying etc. To obtain a trimmed string, it's trivial to trim() the input strings as needed. That's what's done in ObsPy, too.
In fact even the *already* empty string location codes from libmseed
are trimmed again at ObsPy level, just in case. ;)
You keep coming back to this as if it is meaningful. libmseed and the parts of ObsPy getting information from libmseed are dealing with SEED data, where the current rules of parsing are clear. What is your point exactly?
The point is that in libmseed you use a different empty location code naming than in StationXML. As I said a number of times, for me that's not a problem at all. Many clients can handle this and those that cannot can be modified easily. In particular, if you applied the same naming rules as in libmseed also in e.g. FetchData (by making a trivial change in the code) they would become consistent at a very low cost. It would be a benefit for the user.
That is a strange transition from libmseed to web service clients that I do not understand. You appear fixated on updating the clients, but as I have said many times, that by itself will not solve the actual problem; the metadata remains inconsistent, and at any rate we do not control many of the most popular parsers of this information, such as user-created programs.
Why do you think the existing metadata and decades of SEED would need to be changed? Please explain.
Because otherwise a mapping would be required "forever". Until at
least very recently you were strongly against any mapping, even
calling the idea "rubbish" at one point
Here is what you said about mapping:
On Jul 28, 2014, at 4:51 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
In general mappings are not the problem and are widely used anyway.
So what is the problem with mapping?
I was certainly not against mapping to/from "--", after all it was my proposal! You have taken my words out of context. Please stick to the technical issues and leave your personal indignation off of this mailing list.
Chad
Joachim
------------------------------
Message: 7
Date: Thu, 31 Jul 2014 03:18:47 -0700
From: Chad Trabant <chad<at>iris.washington.edu>
To: IRIS Web Services List <webservices<at>iris.washington.edu>
Subject: Re: [webservices] A question of location ID, how to represent
empty IDs in XML?
Message-ID: <F330F8A2-2D82-475C-919F-4FBA647087C7<at>iris.washington.edu>
Content-Type: text/plain; charset="windows-1252"
Thanks Philip, I think you have outlined the issues well.
Regarding issue #1, I strongly feel that we need to choose one representation, the sooner we stop creating incompatible metadata the better.
Regarding issue #2:
a) empty=""
This is possibly the most straightforward mapping of SEED information, but leaves us with an empty string identifier.
b) two spaces="  "
This is what IRIS currently does, not strictly SEED but avoids empty identifiers.
c) two dashes="--"
This would require work and continued mapping, the mapping is clear between SEED-based holdings and StationXML. SEED headers and data records could also be considered, but is a bigger can of worms.
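Whichever of the three representations is chosen, a client that has to read existing metadata may want to fold all of them into one internal form. A minimal Python sketch follows, assuming the empty string is picked as the internal form; the function name normalize_location is hypothetical and the choice of internal form is arbitrary.

```python
def normalize_location(code):
    """Map the empty, two-space and "--" spellings to one internal form.

    Hypothetical helper; the empty string is used internally here purely
    as an assumption for the example.
    """
    stripped = code.strip()
    if stripped in ("", "--"):
        return ""
    return stripped


for spelling in ("", "  ", "--", "00"):
    print(repr(spelling), "->", repr(normalize_location(spelling)))
```

This is the post-parsing mapping under discussion: simple to write, but something every client would need to know about.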
Below are a few of the issues we note regarding empty identifiers
1) They are too similar to "unknown" (which results in potential ambiguity where channels are only differentiated by location ID):
a) In many languages an empty string evaluates to false; if, for example, a program is testing for and then extracting a value from an XML document parsed into a structure/object, it could appear as if the value was not present. Of course the coding in probably every language can be done to avoid such a false negative, but it is a pitfall that we would be asking all future users and coders to know about.
b) In XPath (the query language for XSLT), which is used to search or translate XML, the matching of a string attribute usually uses the string() function. Specifying the string attribute to match when the attribute has a value is straightforward; when trying to match the empty string, the query is for NOT string. In the boolean functions of XPath "a string is true if and only if its length is non-zero" (http://www.w3.org/TR/xpath/#function-boolean). So in XPath, hardly a fringe technology, an empty string is not just another kind of string but an anomaly.
c) In JavaScript the getAttribute() method returns the same value whether the attribute was an empty string or unspecified. The method is no longer recommended but illustrates that such thinking is not limited to niche projects.
2) Organizing data in structures such as a nested hash is pretty common: %{net}{sta}{loc}{chan} = "some lvalue" (sorry for the Perl). The empty identifier as a key works in some languages but it is obtuse and unclear. I'm sure there are many other data structures that would use location by itself as a key.
3) Empty identifiers are difficult to specify on the command line, URLs, etc. and non-obvious many other places such as GUI fields. We have largely addressed this issue for FDSN web services (at the DMC for other mechanisms as well) by making "--" a synonym for the empty location ID. In other words we are already mapping "--" into the empty location ID for requests and users are learning this association. A further adoption of the synonym into the metadata would solve many of these problems.
4) While it is certainly not the FDSN's task to define data formats outside of its purview, the adoption or matching of the core channel naming fields in other formats is certainly in the FDSN's best interest. This has been happening for a long time already (ISF/IASPEI, GSE, etc.). The potentially empty (optional?) location ID could make such adoption harder as it is a wrinkle, especially for space-delimited formats. I believe these broader implications deserve some consideration.
I'm sure most developers could come up with solutions to the technical problems, but an empty identifier leaves the unfortunate wrinkles for all future users and coders.
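To make pitfall 1a concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The three Channel elements and the helper name describe_location are made up for illustration; only the locationCode attribute name comes from StationXML. It shows that a truthiness test cannot tell an empty attribute from a missing one, while an explicit comparison against None can.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one empty, one missing, one populated locationCode.
DOC = """
<Station code="UGM">
  <Channel locationCode="" code="BHZ"/>
  <Channel code="BHN"/>
  <Channel locationCode="00" code="BHE"/>
</Station>
"""

def describe_location(channel):
    """Classify the locationCode attribute of one Channel element."""
    loc = channel.get("locationCode")  # None if the attribute is absent
    if loc is None:
        return "missing"
    if loc == "":
        return "empty"
    return "set:" + loc

channels = ET.fromstring(DOC).findall("Channel")

# Naive truthiness test: empty and missing both look "not present".
naive = [bool(ch.get("locationCode")) for ch in channels]

# Explicit test: the three cases stay distinct.
explicit = [describe_location(ch) for ch in channels]

print(naive)
print(explicit)
```

The explicit version is not hard to write, but every consumer of the metadata has to know to write it.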
Here is an example of someone who was confused by the current metadata; I'll bet that if there were a value in the locationCode it would have been easier:
https://stackoverflow.com/questions/19348855/checking-for-empty-attributes-while-parsing-an-xml-file
There is a chance we will end up with the empty location identifier, but the considerations should go beyond an assumption that an empty string is the only choice.
Since an empty location field in SEED essentially means unset, perhaps we should consider making the locationCode attribute optional and leaving it out of the XML when it is empty in SEED. In this line of thinking, the empty string is just a hack to include a required attribute when in fact there is nothing to include. For me the "unset" aspect is unsettlingly similar to "unknown", but it's an idea preferred by at least one engineer at the DMC.
Chad
On Jul 28, 2014, at 6:37 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
Hi
Being on the cheap side of the Atlantic, I'll save us $0.00068 and
make a stab at the underlying issue. :)
Here, with lots of stuff cut out, is how a channel is "identified" in
stationXML via the fdsn station web service at the IRIS DMC,
http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&
level=channel&format=xml&nodata=404
<Network code="GE" >
<Station code="UGM">
<Channel locationCode="  " code="BHZ">
Another implementation of the same web service (not sure of url) gives
back this:
<Network code="GE" >
<Station code="UGM">
<Channel locationCode="" code="BHZ">
with locationCode="" vs ="  " being the difference under consideration.
There are two basic issues being discussed (and yes, more beer would
help! :)
1) Should all valid stationXML documents be required to use the exact
same string of characters to represent the location id for this
channel? This would allow a comparison operation to be "simple" in
that it can compare the attribute values without additional
processing.
2) If we agree to 1), then what should those exact characters be? The
current top choices are
a) empty=""
b) two spaces="  "
c) two dashes="--".
1) seems less controversial than 2) in that greater compatibility is
generally seen as positive.
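Point 1) can be illustrated with a small Python sketch. It assumes, purely for the sake of the example, that the empty string is the canonical form and that "", any all-space string, and "--" are treated as synonyms. The function name canonical_loc is hypothetical and none of this is an agreed FDSN rule.

```python
def canonical_loc(loc):
    """Map the three candidate spellings of 'no location ID' to one form.

    Assumption for this sketch: the canonical form is the empty string.
    """
    if loc is None:
        raise ValueError("location code is missing, not empty")
    if loc == "--" or loc.strip(" ") == "":
        return ""
    return loc

# Raw comparison: an empty and a two-space location code do not match.
raw_equal = ("" == "  ")

# Comparison after normalization: they do.
normalized_equal = canonical_loc("") == canonical_loc("  ")

print(raw_equal, normalized_equal)
print([canonical_loc(x) for x in ["", "  ", "--", "00", "10"]])
```

If every document used the same spelling, the canonical_loc step would be unnecessary and a plain == would do, which is the simplification point 1) is after.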
This is primarily a question about the form of the stationXML
documents, but obviously there are connections to the way requests are
formed, the relationship to miniseed/seed, the way things are coded in
software and how much detailed understanding we expect of end users.
Philip
On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax<at>free.fr> wrote:
Hello all,
Can someone give a concise statement of the original problem being
discussed? Is it only or primarily a concern about XML?
It seems to me that with modern languages a string that is empty or
has 1-N spaces is the same thing - there is often an implicit or
explicit trim() function hiding in a processing pipeline. A null
string is not the same.
So an empty or blank string is the same, valid location code, and
null is undefined or uninitialized location code.
With regard to the "--" pseudo-value for the location code, is this not
needed because it is sometimes impossible or difficult to represent
an empty string or even a string? For example on the command line or
in a RESTful WS URI? (Or a URI on the command line!) So it may be
that the use of "--" for intermediate processing and requests could
be tolerated and somehow made official, while empty or only-blanks
strings are the official form for persistent data.
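A sketch of that split in Python, under stated assumptions: the empty string is what gets stored, and "--" is substituted only when a request is built. The base URL and the net/sta/cha/level parameters are taken from the query quoted in Philip's message; loc is the FDSN station service's location parameter; the function name station_query_url is made up for illustration.

```python
from urllib.parse import urlencode

BASE = "http://service.iris.edu/fdsnws/station/1/query"

def station_query_url(net, sta, loc, cha):
    """Build a channel-level query URL; an empty or blank loc is sent as '--'."""
    loc_param = "--" if loc.strip() == "" else loc
    params = [
        ("net", net),
        ("sta", sta),
        ("loc", loc_param),
        ("cha", cha),
        ("level", "channel"),
    ]
    return BASE + "?" + urlencode(params)

# The stored (persistent) value is "", but the request carries "--".
print(station_query_url("GE", "UGM", "", "BHZ"))
print(station_query_url("GE", "UGM", "00", "BHZ"))
```

Nothing is sent over the network here; the sketch only builds the URL strings.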
Just my 0.02€ = $0.0268
Best regards to all,
Anthony
On 27/07/2014 04:52, Chad Trabant wrote:
Hi Marcelo,
Thanks for your thoughts as well. Something that you and Joachim are
not addressing are the concerns about an empty ID that have been
brought up by more than one person. The answer that empty strings
are technically possible and it all works in Python/SeisComP is less
than satisfying. The observations from Python, ObsPy and SeisComP
are a few of many that need to be taken into account.
I agree that there is a long tail consideration for the "--" location
ID solution. If you understand that some folks find an empty ID to be
problematic regardless of whether it is XML, SEED, text, whatever,
then you might see where this proposal comes from. Yes, we would
need to treat empty location IDs and "--" as synonyms for a very long
time. Empty strings in XML mean you will need to map empty IDs to
empty strings, NULL and whatever an XML parser might or might not
produce for a long time as well (think beyond Python and SeisComP).
Either is possible, only one of them is a unique mapping.
If the main consideration is the least amount of disruption, then
the answer is obvious to me: the FDSN can sanction that the two-space
string is the XML synonym for the empty SEED location ID and we
adjust the schema to make sure a string of whitespaces is preserved.
Then SeisComP can change its relatively new StationXML implementation
and ALL existing clients will be compatible with all metadata and,
most importantly, we would have consistent metadata.
If the empty string ID representation is adopted it would, in
effect, mean that the DMC would need to change its metadata service
and (more importantly) all users of the DMC's metadata service would
need to
transition to a new metadata channel naming scheme. This is
certainly not out of the question, but it is not something we would
do without careful consideration. I do not find the two-space
strings all that great, but they are here and something the DMC and
users of the DMC have dealt with. Issues have been identified with
empty location IDs by us and our users. If DMC is going to change,
and push the change on all users of the DMC's StationXML, it would be
much more compelling to have a solution that addresses the low level issues.
regards,
Chad
----- Original Message -----
From: "Marcelo Bianchi" <m.tchelo<at>gmail.com>
To: "IRIS Web Services List" <webservices<at>iris.washington.edu>
Sent: Friday, July 25, 2014 7:38:17 PM
Subject: Re: [webservices] A question of location ID, how to
represent empty IDs in XML?
Hi Philip and All,
I totally agree with Joachim; I was planning to answer but he was much
faster. What you guys are proposing is not a solution. StationXML
supports the empty string nicely, and it is not null. There is a
type difference here, in Python and in any other language, and it can
be handled nicely internally.
Also, the location id is not just a string; it is a key that links
miniseed to metadata, and making an exception at this level just
because a user interface cannot properly render it without ambiguity
does not sound like a proper proposal. I am not in favor of
creating an exception that will have to be carried along for
decades to come. Alternative solutions for this issue should be
sought in the end-user interface.
with my best regards,
Marcelo Bianchi
--
2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell<at>seis.sc.edu>:
It sounds like you are saying "change is hard, so we shouldn't do it".
I would argue that change is hard and so if we don't do it now it
will never happen. StationXML is new enough that there is already a
disruption, we should seize the chance. If we do not do something now
about null loc ids, it will be a decade or two before we get another
chance.
It is time to drive the stake through the heart of null location ids.
Kill the evil while we have a chance.
Philip
On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul<at>gfz-potsdam.de> wrote:
Hello Rob,
Rob Newman wrote on 24.07.2014 18:51:
For what it's worth, I would also vote for the "--" standard. To
quote from the Zen of Python
<http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html>
(my language of choice):
"Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced."
I'd add "Compatible is better than incompatible." :)
Number 2 is especially relevant here:
"Explicit is better than implicit."
My favorite would be:
"Special cases aren't special enough to break the rules."
Quoted whitespace and nulls are painful. Code what you mean, and mean
what you code. It's easier for everyone.
But what if we simply *mean* "empty string"?
The issue is not about beauty, pain or ease. It's about standard
conformance. We already have a channel naming standard. If a new data
format cannot accommodate existing channel naming, then the new format is flawed.
But that's not even the case here...
An XML document that contains
<Channel locationCode="" ...
is not malformed. There's an attribute that *explicitly* contains an
empty string and a parser has to produce it as such. Not as null, nil
or none, but as an empty string. Otherwise the parser is broken and
needs to be fixed, not the data!
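This is easy to check for any given parser. A minimal Python sketch, using two standard-library parsers as examples (the fragment strings and the helper name parsed_values are invented for this illustration, and it says nothing about other languages or about schema-validating parsers):

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical fragments: an explicitly empty and a two-space locationCode.
FRAGMENTS = {
    "empty": '<Channel locationCode="" code="BHZ"/>',
    "two_spaces": '<Channel locationCode="  " code="BHZ"/>',
}

def parsed_values(xml_text):
    """Return the locationCode as seen by ElementTree and by minidom."""
    et_value = ET.fromstring(xml_text).attrib["locationCode"]
    dom_elem = minidom.parseString(xml_text).documentElement
    # hasAttribute distinguishes "present but empty" from "absent".
    assert dom_elem.hasAttribute("locationCode")
    dom_value = dom_elem.getAttribute("locationCode")
    return et_value, dom_value

results = {name: parsed_values(text) for name, text in FRAGMENTS.items()}
print(results)
```

Both parsers report the attribute as present, and neither turns the empty value into None or collapses the two spaces.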
Again: It's not about beauty. We all agree that current channel
naming is not particularly beautiful and has limitations. But our
business is not to try to solve that issue now and here.
Cheers
Joachim
_______________________________________________
webservices mailing list
webservices<at>iris.washington.edu
http://www.iris.washington.edu/mailman/listinfo/webservices
--
Sent from my iClayTablet
________________________________
Anthony Lomax
161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
tel: +33 (0)4 93 75 25 02 e-mail: anthony<at>alomax.net web:
http://www.alomax.net
Twitter: @ALomaxNet
Science & Special Topics: http://www.alomax.net/science
Software: http://www.alomax.net/software - updates:
https://twitter.com/ALomaxNet
________________________________
End of webservices Digest, Vol 40, Issue 20
*******************************************