[webservices] A question of location ID, how to represent empty IDs in XML?

Philip Crotwell crotwell at seis.sc.edu
Thu Jul 31 05:59:04 PDT 2014


Yet another data point, going all the way back to vol 1 issue 1 of the
DMC newsletter introducing location ids:

"The Location Identifier is a two character code that, when used in
conjunction with the other data specifiers, uniquely identifies a data
stream."
and
"Historically, within a SEED volume, the Location Identifier was left
“blank” (consisted of two spaces)."
and
"GSN Use of Location Identifiers
Valid characters for location identifiers are [space, 0-9, A-Z][space,
0-9, A-Z]. (So space-space is a legitimate Location Identifier.) "

http://www.iris.edu/ds/newsletter/vol1/no1/specification-of-seismograms-the-location-identifier/
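Read literally, that rule is a fixed-width, two-character pattern. A minimal sketch in Python (the function name is made up for illustration and is not from any IRIS or FDSN tool) that checks exactly that rule accepts space-space, but rejects both the empty string and "--":

```python
import re

# Rule quoted from the newsletter: each of the two positions is
# a space, a digit 0-9, or an uppercase letter A-Z.
SEED_LOCATION_RE = re.compile(r"[ 0-9A-Z]{2}")


def is_valid_seed_location(code: str) -> bool:
    """Return True if code is exactly two characters from [space, 0-9, A-Z]."""
    return SEED_LOCATION_RE.fullmatch(code) is not None


for candidate in ["  ", "00", "1A", "", "--", "0"]:
    print(repr(candidate), is_valid_seed_location(candidate))
```

Under this strict reading, both "" and "--" would be mappings layered on top of the SEED identifier rather than SEED location IDs themselves.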

From this it seems that the location id was intended to be exactly 2
characters, not zero or two. My feeling is that we have a long
tradition of the location id being "space-space" and not null or
empty. Personally I really dislike space-space, but the only thing I
dislike more than space-space is empty.

Philip

On Thu, Jul 31, 2014 at 7:18 AM, Philip Crotwell <crotwell at seis.sc.edu> wrote:
> Hi
>
> Just another data point, Earthworm, which is widely used by regional
> networks globally, has long had the "dash dash is the same as space
> space" convention. So dash dash is not something pulled out of thin
> air, it is how at least I do things already.
>
> And this shows that it is fairly common (if not technically correct)
> for users to regard space-space as the location id instead of
> regarding it as null with 2 spaces for padding. My guess is that very
> few users are aware of this, and even as someone who has been writing
> seismic software for a couple of decades I still think of the location
> id as space-space, not null.
>
> http://www.isti2.com/ew/PROGRAMMER/location_codes/EW_Loc_policy.txt
>
> Philip
>
>
> On Thu, Jul 31, 2014 at 6:18 AM, Chad Trabant <chad at iris.washington.edu> wrote:
>>
>> Thanks Philip, I think you have outlined the issues well.
>>
>> Regarding issue #1, I strongly feel that we need to choose one
>> representation; the sooner we stop creating incompatible metadata, the
>> better.
>>
>> Regarding issue #2:
>>
>>  b) two spaces="  "
>>
>>
>> This is what IRIS currently does; it is not strictly SEED, but it avoids
>> empty identifiers.
>>
>>  c) two dashes="--".
>>
>>
>> This would require work and continued mapping, though the mapping between
>> SEED-based holdings and StationXML is clear.  SEED headers and data records
>> could also be considered, but that is a bigger can of worms.
>>
>>  a) empty=""
>>
>>
>> This is possibly the most straightforward mapping of SEED information, but
>> it leaves us with an empty string identifier.
>>
>> Below are a few of the issues we note regarding empty identifiers:
>>
>> 1) They are too similar to "unknown" (which results in potential ambiguity
>> where channels are only differentiated by location ID):
>>
>>   a) In many languages an empty string evaluates to false; if, for example,
>> a program tests for and then extracts a value from an XML document parsed
>> into a structure/object, it could appear as if the value was not present.
>> Of course, the code in probably every language can be written to avoid
>> such a false negative, but it is a pitfall that we would be asking all
>> future users and coders to know about.
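A minimal Python illustration of that pitfall (both function names are hypothetical): a truthiness test lumps the empty location ID together with a missing one, while an explicit `is not None` test keeps them apart.

```python
from typing import Optional


def describe_naive(loc: Optional[str]) -> str:
    # Pitfall: "" is falsy, so an empty location ID takes the same
    # branch as a missing one (None).
    if loc:
        return f"location {loc!r}"
    return "no location"


def describe_explicit(loc: Optional[str]) -> str:
    # Testing against None keeps "" as a real, present value.
    if loc is not None:
        return f"location {loc!r}"
    return "no location"


print(describe_naive(""), "|", describe_explicit(""))      # no location | location ''
print(describe_naive(None), "|", describe_explicit(None))  # no location | no location
```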
>>
>>   b) In XPath (the query language for XSLT), which is used to search or
>> translate XML, the matching of a string attribute usually uses the string()
>> function.  Specifying the string attribute to match when the attribute has a
>> value is straightforward; when trying to match the empty string, the query
>> is for NOT string.  In the boolean functions of XPath "a string is true if
>> and only if its length is non-zero"
>> (http://www.w3.org/TR/xpath/#function-boolean).  So in XPath, hardly a
>> fringe technology, an empty string is not just another kind of string but an
>> anomaly.
>>
>>   c) In JavaScript the getAttribute() method returns the same value whether
>> the attribute was an empty string or unspecified.  The method is no longer
>> recommended but illustrates that such thinking is not limited to niche
>> projects.
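Python's standard-library DOM, `xml.dom.minidom`, shows the same behavior as described for JavaScript: its `getAttribute()` returns an empty string both for an explicitly empty attribute and for a missing one, and only `hasAttribute()` tells them apart. The XML fragment below is made up for illustration (the BHN channel without a locationCode is hypothetical):

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<Station code="UGM">'
    '<Channel locationCode="" code="BHZ"/>'
    '<Channel code="BHN"/>'
    '</Station>'
)

results = []
for ch in doc.getElementsByTagName("Channel"):
    results.append((
        ch.getAttribute("code"),
        # "" for an explicit empty value AND for a missing attribute
        ch.getAttribute("locationCode"),
        # the only way to distinguish the two cases
        ch.hasAttribute("locationCode"),
    ))

print(results)  # [('BHZ', '', True), ('BHN', '', False)]
```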
>>
>> 2) Organizing data in structures such as a nested hash is pretty common:
>> %{net}{sta}{loc}{chan} = "some lvalue" (sorry for the Perl).  The empty
>> identifier as a key works in some languages but it is obtuse and unclear.
>> I'm sure there are many other data structures that would use location by
>> itself as a key.
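The same layout in Python, as a sketch with a hypothetical `add_channel` helper: the empty string is a perfectly legal dictionary key, but if data arrive with mixed spellings of the blank location, the same physical location ends up under three different keys.

```python
inventory = {}


def add_channel(net, sta, loc, chan, value):
    # Hypothetical helper mirroring %{net}{sta}{loc}{chan} = value
    inventory.setdefault(net, {}).setdefault(sta, {}).setdefault(loc, {})[chan] = value


# The same blank location, spelled three different ways by three sources
add_channel("GE", "UGM", "", "BHZ", "from source A")
add_channel("GE", "UGM", "  ", "BHZ", "from source B")
add_channel("GE", "UGM", "--", "BHZ", "from source C")

print(sorted(inventory["GE"]["UGM"]))  # ['', '  ', '--']
```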
>>
>> 3) Empty identifiers are difficult to specify on the command line, in URLs,
>> etc. and non-obvious in many other places such as GUI fields.  We have largely
>> addressed this issue for FDSN web services (at the DMC for other mechanisms
>> as well) by making "--" a synonym for the empty location ID.  In other words
>> we are already mapping "--" into the empty location ID for requests and
>> users are learning this association.  A further adoption of the synonym into
>> the metadata would solve many of these problems.
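A sketch of that request-side mapping in Python, using the standard `urllib.parse.parse_qs` (the helper name and the comma-separated list handling are assumptions for illustration, not taken from the DMC's implementation). It also shows one of the pitfalls described above: by default `parse_qs` silently drops a parameter with an empty value, such as `loc=`.

```python
from typing import List, Optional
from urllib.parse import parse_qs


def requested_locations(query: str) -> Optional[List[str]]:
    """Hypothetical helper: return the requested location IDs from a query
    string, mapping the "--" synonym to the empty location ID.
    Returns None when no 'loc' parameter was given."""
    # keep_blank_values=True is needed, otherwise "loc=" is silently dropped
    params = parse_qs(query, keep_blank_values=True)
    if "loc" not in params:
        return None
    # Assumes comma-separated lists in the value, e.g. loc=--,00
    codes = [part for value in params["loc"] for part in value.split(",")]
    return ["" if code == "--" else code for code in codes]


print(parse_qs("net=GE&sta=UGM&loc=&cha=BHZ"))  # default: the empty loc disappears
print(requested_locations("net=GE&loc=--,00"))  # ['', '00']
print(requested_locations("net=GE&loc="))       # ['']
print(requested_locations("net=GE&sta=UGM"))    # None
```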
>>
>> 4) While it is certainly not the FDSN's task to define data formats outside
>> of its purview, the adoption or matching of the core channel naming fields
>> in other formats is certainly in the FDSN's best interest.  This has been
>> happening for a long time already (ISF/IASPEI, GSE, etc.).  The potentially
>> empty (optional?) location ID could make such adoption harder as it is an
>> wrinkle, especially for space delimited formats.  I believe these broader
>> implications deserve some consideration.
>>
>> I'm sure most developers could come up with solutions to the technical
>> problems, but an empty identifier leaves the unfortunate wrinkles for all
>> future users and coders.
>>
>> Here is an example of someone who was confused by the current metadata; I'll
>> bet that if there were a value in the locationCode it would have been easier:
>> https://stackoverflow.com/questions/19348855/checking-for-empty-attributes-while-parsing-an-xml-file
>>
>> There is a chance we will end up with the empty location identifier, but the
>> considerations should go beyond an assumption that an empty string is the
>> only choice.
>>
>> Since an empty location field in SEED essentially means unset, perhaps we
>> should consider making the locationCode attribute optional and leaving it
>> out of the XML when it is empty in SEED.  In this line of thinking, the
>> empty string is just a hack to include a required attribute when in fact
>> there is nothing to include.  For me the "unset" aspect is unsettlingly
>> similar to "unknown", but it's an idea preferred by at least one engineer at
>> the DMC.
>>
>> Chad
>>
>>
>> On Jul 28, 2014, at 6:37 AM, Philip Crotwell <crotwell at seis.sc.edu> wrote:
>>
>> Hi
>>
>> Being on the cheap side of the Atlantic, I'll save us $0.00068 and
>> make a stab at the underlying issue. :)
>>
>> Here, with lots of stuff cut out, is how a channel is "identified" in
>> stationXML via the fdsn station web service at the IRIS DMC,
>> http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
>>
>> <Network code="GE" >
>> <Station code="UGM">
>> <Channel locationCode="  " code="BHZ">
>>
>> Another implementation of the same web service (not sure of url) gives
>> back this:
>>
>> <Network code="GE" >
>> <Station code="UGM">
>> <Channel locationCode="" code="BHZ">
>>
>> with locationCode="" vs ="  " being the difference under consideration.
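To make the difference concrete, here is a minimal Python sketch that parses trimmed-down fragments modelled on the two responses above (real StationXML also has an enclosing root element and a namespace, both omitted here). A non-validating parser keeps the two spaces, so the raw identifiers do not compare equal:

```python
import xml.etree.ElementTree as ET

# Trimmed-down fragments modelled on the two responses quoted above
iris_style = ('<Network code="GE"><Station code="UGM">'
              '<Channel locationCode="  " code="BHZ"/></Station></Network>')
other_style = ('<Network code="GE"><Station code="UGM">'
               '<Channel locationCode="" code="BHZ"/></Station></Network>')


def channel_ids(xml_text):
    """Collect (net, sta, loc, chan) tuples exactly as the attributes appear."""
    root = ET.fromstring(xml_text)
    ids = []
    for sta in root.iter("Station"):
        for ch in sta.iter("Channel"):
            ids.append((root.get("code"), sta.get("code"),
                        ch.get("locationCode"), ch.get("code")))
    return ids


a = channel_ids(iris_style)
b = channel_ids(other_style)
print(a)       # [('GE', 'UGM', '  ', 'BHZ')]
print(b)       # [('GE', 'UGM', '', 'BHZ')]
print(a == b)  # False
```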
>>
>> There are two basic issues being discussed (and yes, more beer would help! :)
>>
>> 1) Should all valid stationXML documents be required to use the exact
>> same string of characters to represent the location id for this
>> channel? This would allow a comparison operation to be "simple" in
>> that it can compare the attribute values without additional
>> processing.
>>
>> 2) If we agree to 1), then what should those exact characters be? The
>> current top choices are
>>  a) empty=""
>>  b) two spaces="  "
>>  c) two dashes="--".
>>
>> 1) seems less controversial than 2) in that greater compatibility is
>> generally seen as positive.
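Until 1) is settled, every consumer has to do that "additional processing" itself. A minimal sketch of the step (hypothetical helper; which spelling counts as canonical is left as a parameter, since that is exactly question 2):

```python
# The three spellings under discussion for the blank location ID
BLANK_SPELLINGS = ("", "  ", "--")


def canonical_location(raw: str, blank: str = "") -> str:
    """Map any blank spelling to one chosen form (`blank`), leaving
    non-blank codes untouched, so IDs compare with plain equality."""
    if blank not in BLANK_SPELLINGS:
        raise ValueError(f"unexpected canonical form {blank!r}")
    return blank if raw in BLANK_SPELLINGS else raw


# Without agreement on one form, a comparison needs this extra step:
print("  " == "")                                            # False
print(canonical_location("  ") == canonical_location(""))    # True
print(canonical_location("--", blank="--"))                  # prints --
print(canonical_location("00") == canonical_location("10"))  # False
```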
>>
>> This is primarily a question about the form of the stationXML
>> documents, but obviously there are connections to the way requests are
>> formed, the relationship to miniseed/seed, the way things are coded in
>> software and how much detailed understanding we expect of end users.
>>
>> Philip
>>
>>
>>
>> On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax at free.fr> wrote:
>>
>> Hello all,
>>
>> Can someone give a concise statement of the original problem being
>> discussed? Is it only or primarily a concern about XML?
>>
>> It seems to me that with modern languages a string that is empty or has 1-N
>> spaces is the same thing: there are often implicit or explicit trim()
>> functions hiding in a processing pipeline.  A null string is not the same.
>> So an empty or blank string is the same, valid location code, and null is an
>> undefined or uninitialized location code.
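The view described above, sketched in Python (the function and state names are illustrative only): `None` stands for an undefined location code, while the empty string and any run of spaces trim to the same blank code.

```python
from typing import Optional


def loc_state(loc: Optional[str]) -> str:
    # None: undefined/uninitialized location code
    if loc is None:
        return "undefined"
    # Empty and all-space strings trim to the same blank code
    stripped = loc.strip()
    return "blank" if stripped == "" else stripped


for value in [None, "", " ", "  ", "00"]:
    print(repr(value), "->", loc_state(value))
```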
>>
>> With regard to the "--" pseudo-value for the location code, is this not
>> needed because sometimes it is impossible or difficult to represent an empty
>> string, or even a string at all?  For example, on the command line or in a
>> RESTful WS URI?  (Or a URI on the command line!)  So it may be that the use
>> of "--" for intermediate processing and requests could be tolerated and
>> somehow made official, while empty or blank-only strings remain official
>> for persistent data.
>>
>> Just my 0.02€ = $0.0268
>>
>> Best regards to all,
>>
>> Anthony
>>
>>
>>
>> On 27/07/2014 04:52, Chad Trabant wrote:
>>
>> Hi Marcelo,
>>
>> Thanks for your thoughts as well.  What you and Joachim are not
>> addressing are the concerns about an empty ID that have been brought up by
>> more than one person.  The answer that empty strings are technically
>> possible and it all works in Python/SeisComP is less than satisfying.  The
>> observations from Python, ObsPy and SeisComP are a few of many that need to
>> be taken into account.
>>
>> I agree that there is a long tail consideration for the "--" location ID
>> solution.  If you understand that some folks find an empty ID problematic
>> regardless of whether it is XML, SEED, text, whatever, then you might see
>> where this proposal comes from.  Yes, we would need to treat empty location
>> IDs and "--" as synonyms for a very long time.  Empty strings in XML mean
>> you will need to map empty IDs to empty strings, NULL and whatever an XML
>> parser might or might not produce for a long time as well (think beyond
>> Python and SeisComP).  Either is possible, only one of them is a unique
>> mapping.
>>
>> If the main consideration is the least amount of disruption, then the
>> answer is obvious to me: the FDSN can sanction that the two-space string is
>> the XML synonym for the empty SEED location ID and we adjust the schema to
>> make sure a string of whitespaces is preserved.  Then SeisComP can change
>> its relatively new StationXML implementation and ALL existing clients will
>> be compatible with all metadata and, most importantly, we would have
>> consistent metadata.
>>
>> If the empty string ID representation is adopted it would, in effect,
>> mean that the DMC would need to change its metadata service and (more
>> importantly) all users of the DMC's metadata service would need to
>> transition to a new metadata channel naming scheme.  This is certainly not
>> out of the question, but it is not something we would do without careful
>> consideration.  I do not find the two-space strings all that great, but they
>> are here and something the DMC and users of the DMC have dealt with.  Issues
>> have been identified with empty location IDs by us and our users.  If DMC is
>> going to change, and push the change on all users of the DMC's StationXML,
>> it would be much more compelling to have a solution that addresses the low
>> level issues.
>>
>> regards,
>> Chad
>>
>>
>> ----- Original Message -----
>> From: "Marcelo Bianchi" <m.tchelo at gmail.com>
>> To: "IRIS Web Services List" <webservices at iris.washington.edu>
>> Sent: Friday, July 25, 2014 7:38:17 PM
>> Subject: Re: [webservices] A question of location ID, how to represent empty
>> IDs in XML?
>>
>> Hi Philip and All,
>>
>> I totally agree with Joachim; I was planning to answer but he was much
>> faster. What you guys are proposing is not a solution. StationXML
>> nicely supports the empty string, and it is not null. There is a type
>> difference here in Python and in any other language, and it can be nicely
>> handled internally.
>>
>> Also, the location id is not just a string; it is a key entry that links
>> miniseed to metadata, and making an exception at this level just
>> because a user interface cannot properly render it without ambiguity
>> does not sound like a proper proposal.  I am not in favor of
>> creating an exception that will have to be carried over for the
>> decades to come. Alternative solutions for this issue should be
>> sought in the end-user interface.
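For reference, the miniSEED side of that link is a fixed-width, space-padded field. A minimal Python sketch (the header bytes are made up) that unpacks the identifier fields from the first 20 bytes of a SEED fixed data header shows that the blank location arrives as two spaces and becomes the empty string only once the fields are trimmed; this is one plausible way both the "  " and "" spellings in this thread arise.

```python
import struct
from typing import Tuple


def seed_ids_from_header(header: bytes) -> Tuple[str, str, str, str]:
    """Return (network, station, location, channel) exactly as stored in the
    first 20 bytes of a SEED fixed data header: space-padded ASCII fields.

    Layout: bytes 0-5 sequence number, 6 quality indicator, 7 reserved,
    8-12 station, 13-14 location, 15-17 channel, 18-19 network.
    """
    _seq, _quality, _reserved, sta, loc, cha, net = struct.unpack(
        "6s1s1s5s2s3s2s", header[:20]
    )
    return (net.decode("ascii"), sta.decode("ascii"),
            loc.decode("ascii"), cha.decode("ascii"))


# A made-up header for GE.UGM..BHZ with a blank location field
header = b"000001D " + b"UGM  " + b"  " + b"BHZ" + b"GE"
raw = seed_ids_from_header(header)
trimmed = tuple(field.strip() for field in raw)
print(raw)      # ('GE', 'UGM  ', '  ', 'BHZ')
print(trimmed)  # ('GE', 'UGM', '', 'BHZ')
```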
>>
>> with my best regards,
>>
>> Marcelo Bianchi
>> --
>>
>>
>> 2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell at seis.sc.edu>:
>>
>> It sounds like you are saying "change is hard, so we shouldn't do it".
>> I would argue that change is hard and so if we don't do it now it will
>> never happen. StationXML is new enough that there is already a
>> disruption, we should seize the chance. If we do not do something now
>> about null loc ids, it will be a decade or two before we get another
>> chance.
>>
>> It is time to drive the stake through the heart of null location ids.
>> Kill the evil while we have a chance.
>>
>> Philip
>>
>>
>> On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul at gfz-potsdam.de> wrote:
>>
>> Hello Rob,
>>
>> Rob Newman wrote on 24.07.2014 18:51:
>>
>> For what it's worth, I would also vote for the "--" standard. To quote
>> from the Zen of Python
>> <http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html>
>> (my language of choice):
>>
>>
>> "Beautiful is better than ugly.
>> Explicit is better than implicit.
>> Simple is better than complex.
>> Complex is better than complicated.
>> Flat is better than nested.
>> Sparse is better than dense.
>> Readability counts.
>> Special cases aren't special enough to break the rules.
>> Although practicality beats purity.
>> Errors should never pass silently.
>> Unless explicitly silenced."
>>
>> I'd add "Compatible is better than incompatible." :)
>>
>>
>> Number 2 is especially relevant here:
>> "Explicit is better than implicit."
>>
>> My favorite would be:
>>
>> "Special cases aren't special enough to break the rules."
>>
>> Quoted whitespace and nulls are painful. Code what you mean, and mean what
>> you code. It's easier for everyone.
>>
>> But what if we simply *mean* "empty string"?
>>
>> The issue is not about beauty, pain or ease. It's about standard
>> conformance. We already have a channel naming standard. If a new data format
>> cannot accommodate existing channel naming, then the new format is flawed.
>> But that's not even the case here...
>>
>> An XML document that contains
>>
>> <Channel locationCode="" ...
>>
>> is not malformed. There's an attribute that *explicitly* contains an empty
>> string and a parser has to produce it as such. Not as null, nil or none, but
>> as an empty string. Otherwise the parser is broken and needs to be fixed,
>> not the data!
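Python's `xml.etree.ElementTree` behaves this way: `Element.get()` returns the empty string for an explicit `locationCode=""` and `None` when the attribute is absent. A small sketch with a made-up fragment (the channel without a locationCode is hypothetical):

```python
import xml.etree.ElementTree as ET

# Made-up fragment: one channel with an explicit empty locationCode,
# one with the attribute left out entirely
root = ET.fromstring(
    '<Station code="UGM">'
    '<Channel locationCode="" code="BHZ"/>'
    '<Channel code="BHN"/>'
    '</Station>'
)

# Element.get() returns "" for the explicit empty value and None when absent
found = {ch.get("code"): ch.get("locationCode") for ch in root.findall("Channel")}
print(found)  # {'BHZ': '', 'BHN': None}

# ElementTree's limited XPath subset can select the empty value directly
empty = [ch.get("code") for ch in root.findall("Channel[@locationCode='']")]
print(empty)  # ['BHZ']
```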
>>
>> Again: It's not about beauty. We all agree that current channel naming is
>> not particularly beautiful and has limitations. But our business is not to
>> try to solve that issue now and here.
>>
>> Cheers
>> Joachim
>>
>> _______________________________________________
>> webservices mailing list
>> webservices at iris.washington.edu
>> http://www.iris.washington.edu/mailman/listinfo/webservices
>>
>>
>>
>> --
>> Sent from my iClayTablet
>>
>> ________________________________
>>
>>   Anthony Lomax
>>   161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
>>   tel: +33 (0)4 93 75 25 02    e-mail: anthony at alomax.net    web:
>> http://www.alomax.net
>>
>>   Twitter: @ALomaxNet
>>   Science & Special Topics: http://www.alomax.net/science
>>   Software: http://www.alomax.net/software - updates:
>> https://twitter.com/ALomaxNet
>> ________________________________
>>