[webservices] A question of location ID, how to represent empty IDs in XML?

Philip Crotwell crotwell at seis.sc.edu
Thu Jul 31 04:18:39 PDT 2014


Hi

Just another data point: Earthworm, which is widely used by regional
networks globally, has long had the "dash dash is the same as space
space" convention. So dash dash is not something pulled out of thin
air, it is how at least I do things already.

And this shows that it is fairly common (if not technically correct)
for users to regard space-space as the location id instead of
regarding it as null with 2 spaces for padding. My guess is that very
few users are aware of this, and even as someone who has been writing
seismic software for a couple of decades I still think of the location
id as space-space, not null.

http://www.isti2.com/ew/PROGRAMMER/location_codes/EW_Loc_policy.txt
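
For what it is worth, the "all of these spellings mean the same thing"
convention is easy to sketch in code. The following is only a minimal
Python illustration, not Earthworm's or the DMC's actual code. The names
normalize_location and location_for_request are made up for this example,
and picking "" as the internal canonical form is an assumption, not a
recommendation.

```python
from urllib.parse import urlencode

# Spellings treated as "no location ID" in this sketch (after stripping spaces).
EMPTY_LOCATION_SPELLINGS = {"", "--"}


def normalize_location(loc):
    """Map "", any run of spaces, and "--" to one canonical form, "".

    None is deliberately rejected: it means "unknown", which is a
    different thing from an empty location ID.
    """
    if loc is None:
        raise ValueError("location ID is missing (unknown), not empty")
    stripped = loc.strip(" ")
    if stripped in EMPTY_LOCATION_SPELLINGS:
        return ""
    return stripped


def location_for_request(loc):
    """Spell the location ID for a URL or command line, using "--" for empty."""
    canonical = normalize_location(loc)
    return canonical if canonical else "--"


if __name__ == "__main__":
    for spelling in ("", "  ", "--", "00"):
        print(repr(spelling), "->", repr(normalize_location(spelling)))
    # Query parameters for a station request. "loc" is the short fdsnws
    # parameter name for the location ID; the other names are taken from
    # the query URL quoted further down in this thread.
    params = {"net": "GE", "sta": "UGM", "loc": location_for_request("  "),
              "cha": "BHZ", "level": "channel"}
    print(urlencode(params))
```

With something like this at the edges of a program, the comparison inside
can stay a plain string equality, whichever spelling ends up being chosen
for the metadata.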

Philip


On Thu, Jul 31, 2014 at 6:18 AM, Chad Trabant <chad at iris.washington.edu> wrote:
>
> Thanks Philip, I think you have outlined the issues well.
>
> Regarding issue #1, I strongly feel that we need to choose one
> representation, the sooner we stop creating incompatible metadata the
> better.
>
> Regarding issue #2:
>
>  b) two spaces="  "
>
>
> This is what IRIS currently does, not strictly SEED but avoids empty
> identifiers.
>
>  c) two dashes="--".
>
>
> This would require work and continued mapping, the mapping is clear between
> SEED-based holdings and StationXML.  SEED headers and data records could
> also be considered,  but is a bigger can of worms.
>
>  a) empty=""
>
>
> This is possibly the most straightforward mapping of SEED information, but
> leaves us with an empty string identifier.
>
> Below are a few of the issues we note regarding empty identifiers
>
> 1) They are too similar to "unknown" (which results in potential ambiguity
> where channels are only differentiated by location ID):
>
>   a) In many languages an empty string evaluates to false; for example, when
> a program tests for and then extracts a value from an XML document parsed
> into a structure/object, it could appear as if the value was not present.
> Of course the coding in probably every language can be done to avoid such a
> false negative, but it is a pitfall that we would be asking all future users
> and coders to know about.
>
>   b) In XPath (the query language for XSLT), which is used to search or
> translate XML, the matching of a string attribute usually uses the string()
> function.  Specifying the string attribute to match when the attribute has a
> value is straightforward; when trying to match the empty string, the query is
> for NOT string.  In the boolean functions of XPath "a string is true if and
> only if its length is non-zero"
> (http://www.w3.org/TR/xpath/#function-boolean).  So in XPath, hardly a
> fringe technology, an empty string is not just another kind of string but an
> anomaly.
>
>   c) In JavaScript the getAttribute() method returns the same value whether
> the attribute was an empty string or unspecified.  The method is no longer
> recommended but illustrates that such thinking is not limited to niche
> projects.
>
> 2) Organizing data in structures such as a nested hash is pretty common:
> %{net}{sta}{loc}{chan} = "some lvalue" (sorry for the Perl).  The empty
> identifier as a key works in some languages but it is obtuse and unclear.
> I'm sure there are many other data structures that would use location by
> itself as a key.
>
> 3) Empty identifiers are difficult to specify on the command line, in URLs,
> etc. and are non-obvious in many other places such as GUI fields.  We have largely
> addressed this issue for FDSN web services (at the DMC for other mechanisms
> as well) by making "--" a synonym for the empty location ID.  In other words
> we are already mapping "--" into the empty location ID for requests and
> users are learning this association.  A further adoption of the synonym into
> the metadata would solve many of these problems.
>
> 4) While it is certainly not the FDSN's task to define data formats outside
> of its purview, the adoption or matching of the core channel naming fields
> in other formats is certainly in the FDSN's best interest.  This has been
> happening for a long time already (ISF/IASPEI, GSE, etc.).  The potentially
> empty (optional?) location ID could make such adoption harder as it is a
> wrinkle, especially for space delimited formats.  I believe these broader
> implications deserve some consideration.
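
To make points 1a) and 2) above concrete, here is a small Python sketch.
It is only an illustration: the dictionaries stand in for whatever a
parser hands back, and the function names location_present_naive and
location_present are hypothetical.

```python
def location_present_naive(attrs):
    """Buggy check: an empty string is falsy, so "" looks like 'absent'."""
    return bool(attrs.get("locationCode"))


def location_present(attrs):
    """Safer check: only a missing key (None) counts as 'absent'."""
    return attrs.get("locationCode") is not None


empty_loc = {"code": "BHZ", "locationCode": ""}   # attribute present but empty
missing_loc = {"code": "BHZ"}                      # attribute not present at all

# Nested dictionary keyed net -> sta -> loc -> chan, as in the Perl example.
# The empty string is a legal key, but it is easy to misread.
index = {"GE": {"UGM": {"": {"BHZ": "some value"}}}}

if __name__ == "__main__":
    # The naive check cannot tell the two cases apart.
    print(location_present_naive(empty_loc), location_present_naive(missing_loc))
    # The explicit "is not None" check can.
    print(location_present(empty_loc), location_present(missing_loc))
    print(index["GE"]["UGM"][""]["BHZ"])
```

The correct version is a one-line change, which is the point: it is easy
to get right, and just as easy to get wrong without noticing.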
>
> I'm sure most developers could come up with solutions to the technical
> problems, but an empty identifier leaves the unfortunate wrinkles for all
> future users and coders.
>
> Here is an example of someone who was confused by current metadata; I'll
> bet if there was a value in the locationCode it would have been easier:
> https://stackoverflow.com/questions/19348855/checking-for-empty-attributes-while-parsing-an-xml-file
>
> There is a chance we will end up with the empty location identifier, but the
> considerations should go beyond an assumption that an empty string is the
> only choice.
>
> Since an empty location field in SEED essentially means unset, perhaps we
> should consider making the locationCode attribute optional and leaving it
> out of the XML when it is empty in SEED.  In this line of thinking, the
> empty string is just a hack to include a required attribute when in fact
> there is nothing to include.  For me the "unset" aspect is unsettlingly
> similar to "unknown", but it's an idea preferred by at least one engineer at
> the DMC.
>
> Chad
>
>
> On Jul 28, 2014, at 6:37 AM, Philip Crotwell <crotwell at seis.sc.edu> wrote:
>
> Hi
>
> Being on the cheap side of the Atlantic, I'll save us $0.00068 and
> make a stab at the underlying issue. :)
>
> Here, with lots of stuff cut out, is how a channel is "identified" in
> stationXML via the fdsn station web service at the IRIS DMC,
> http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
>
> <Network code="GE" >
> <Station code="UGM">
> <Channel locationCode="  " code="BHZ">
>
> Another implementation of the same web service (not sure of url) gives
> back this:
>
> <Network code="GE" >
> <Station code="UGM">
> <Channel locationCode="" code="BHZ">
>
> with locationCode="" vs ="  " being the difference under consideration.
>
> There are two basic issues being discussed (and yes, more beer would help!
> :)
>
> 1) Should all valid stationXML documents be required to use the exact
> same string of characters to represent the location id for this
> channel? This would allow a comparison operation to be "simple" in
> that it can compare the attribute values without additional
> processing.
>
> 2) If we agree to 1), then what should those exact characters be? The
> current top choices are
>  a) empty=""
>  b) two spaces="  "
>  c) two dashes="--".
>
> 1) seems less controversial than 2) in that greater compatibility is
> generally seen as positive.
>
> This is primarily a question about the form of the stationXML
> documents, but obviously there are connections to the way requests are
> formed, the relationship to miniseed/seed, the way things are coded in
> software and how much detailed understanding we expect of end users.
>
> Philip
>
>
>
> On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax at free.fr> wrote:
>
> Hello all,
>
> Can someone give a concise statement of the original problem being
> discussed? Is it only or primarily a concern about XML?
>
> It seems to me that with modern languages a string that is empty or has 1-N
> spaces is the same thing - there is often an implicit or explicit trim()
> function hiding in a processing pipeline.  A null string is not the same.
> So an empty or blank string is the same, valid location code, and null is
> an undefined or uninitialized location code.
>
> With regards to the "--" pseudo-value for the location code, is this not needed
> because sometimes it is difficult or impossible to represent an empty
> string or even a string?  For example on the command line or in a RESTful WS
> URI?  (Or a URI on the command line!)  So it may be that the use of "--" for
> intermediate processing and requests could be tolerated and somehow made
> official, while empty or blank-only strings remain official for persistent
> data.
>
> Just my 0.02€ = $0.0268
>
> Best regards to all,
>
> Anthony
>
>
>
> On 27/07/2014 04:52, Chad Trabant wrote:
>
> Hi Marcelo,
>
> Thanks for your thoughts as well.  Something that you and Joachim are not
> addressing are the concerns about an empty ID that have been brought up by
> more than one person.  The answer that empty strings are technically
> possible and it all works in Python/SeisComP is less than satisfying.  The
> observations from Python, ObsPy and SeisComP are a few of many that need to
> be taken into account.
>
> I agree that there is a long tail consideration for the "--" location ID
> solution.  If you understand that some folks find an empty ID to be problematic
> regardless of whether it is XML, SEED, text, whatever, then you might see
> where this proposal comes from.  Yes, we would need to treat empty location
> IDs and "--" as synonyms for a very long time.  Empty strings in XML mean
> you will need to map empty IDs to empty strings, NULL and whatever an XML
> parser might or might not produce for a long time as well (think beyond
> Python and SeisComP).  Either is possible, only one of them is a unique
> mapping.
>
> If the main considerations are for the least amount of disruption then the
> answer is obvious to me: the FDSN can sanction that the two-space string is
> the XML synonym for the empty SEED location ID and we adjust the schema to
> make sure a string of whitespaces is preserved.  Then SeisComP can change
> its relatively new StationXML implementation and ALL existing clients will
> be compatible with all metadata and, most importantly, we would have
> consistent metadata.
>
> If the empty string ID representation is adopted it would, in effect,
> mean that the DMC would need to change its metadata service and (more
> importantly) all users of the DMC's metadata service would need to
> transition to a new metadata channel naming scheme.  This is certainly not
> out of the question, but it is not something we would do without careful
> consideration.  I do not find the two-space strings all that great, but they
> are here and something the DMC and users of the DMC have dealt with.  Issues
> have been identified with empty location IDs by us and our users.  If DMC is
> going to change, and push the change on all users of the DMC's StationXML,
> it would be much more compelling to have a solution that addresses the low
> level issues.
>
> regards,
> Chad
>
>
> ----- Original Message -----
> From: "Marcelo Bianchi" <m.tchelo at gmail.com>
> To: "IRIS Web Services List" <webservices at iris.washington.edu>
> Sent: Friday, July 25, 2014 7:38:17 PM
> Subject: Re: [webservices] A question of location ID, how to represent empty
> IDs in XML?
>
> Hi Philip and All,
>
> I totally agree with Joachim, was planning to answer but he was much
> faster. What you guys are proposing is not a solution. StationXML
> supports the empty string nicely, and it is not null. There is a type
> difference here in Python and in any other language, and it can be nicely
> handled internally.
>
> Also the location id is not just a string, it is a key entry to link
> miniseed to metadata, and making an exception at this level just
> because a user interface cannot properly render it without ambiguity
> does not sound like a proper proposal.  I am not in favor of
> creating an exception that will have to be carried along for the
> decades to come. Alternative solutions for this issue should be
> sought in the end user interface.
>
> with my best regards,
>
> Marcelo Bianchi
> --
>
>
> 2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell at seis.sc.edu>:
>
> It sounds like you are saying "change is hard, so we shouldn't do it".
> I would argue that change is hard and so if we don't do it now it will
> never happen. StationXML is new enough that there is already a
> disruption, we should seize the chance. If we do not do something now
> about null loc ids, it will be a decade or two before we get another
> chance.
>
> It is time to drive the stake through the heart of null location ids.
> Kill the evil while we have a chance.
>
> Philip
>
>
> On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul at gfz-potsdam.de> wrote:
>
> Hello Rob,
>
> Rob Newman wrote on 24.07.2014 18:51:
>
> For what it's worth, I would also vote for the "--" standard. To quote
> from the Zen of Python
> <http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html>
> (my language of choice):
>
>
> "Beautiful is better than ugly.
> Explicit is better than implicit.
> Simple is better than complex.
> Complex is better than complicated.
> Flat is better than nested.
> Sparse is better than dense.
> Readability counts.
> Special cases aren't special enough to break the rules.
> Although practicality beats purity.
> Errors should never pass silently.
> Unless explicitly silenced."
>
> I'd add "Compatible is better than incompatible." :)
>
>
> Number 2 is especially relevant here:
> "Explicit is better than implicit."
>
> My favorite would be:
>
> "Special cases aren't special enough to break the rules."
>
> Quoted whitespace and nulls are painful. Code what you mean, and mean what
> you code. It's easier for everyone.
>
> But what if we simply *mean* "empty string"?
>
> The issue is not about beauty, pain or ease. It's about standard
> conformance. We already have a channel naming standard. If a new data format
> cannot accommodate existing channel naming, then the new format is flawed.
> But that's not even the case here...
>
> An XML document that contains
>
> <Channel locationCode="" ...
>
> is not malformed. There's an attribute that *explicitly* contains an empty
> string and a parser has to produce it as such. Not as null, nil or none, but
> as an empty string. Otherwise the parser is broken and needs to be fixed,
> not the data!
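
As a quick check of what one common parser does with these cases, here is
a short sketch using Python's standard xml.etree.ElementTree. The sample
document is made up for illustration (no namespaces, and the BHN and BHE
channels are invented), and the helper name location_codes is
hypothetical. Other parsers and language bindings may behave differently.

```python
import xml.etree.ElementTree as ET

# Made-up sample: one empty locationCode, one two-space locationCode,
# and one Channel with no locationCode attribute at all.
DOC = """
<Network code="GE">
  <Station code="UGM">
    <Channel locationCode="" code="BHZ"/>
    <Channel locationCode="  " code="BHN"/>
    <Channel code="BHE"/>
  </Station>
</Network>
""".strip()


def location_codes(xml_text):
    """Return {channel code: locationCode attribute, or None if absent}."""
    root = ET.fromstring(xml_text)
    return {ch.get("code"): ch.get("locationCode")
            for ch in root.iter("Channel")}


if __name__ == "__main__":
    for code, loc in location_codes(DOC).items():
        print(code, repr(loc))
```

Here the explicitly empty attribute comes back as an empty string, the
two spaces are preserved, and only the absent attribute comes back as
None, so the three cases stay distinguishable as long as the calling code
does not collapse them.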
>
> Again: It's not about beauty. We all agree that current channel naming is
> not particularly beautiful and has limitations. But our business is not to
> try to solve that issue now and here.
>
> Cheers
> Joachim
>
> _______________________________________________
> webservices mailing list
> webservices at iris.washington.edu
> http://www.iris.washington.edu/mailman/listinfo/webservices
>
>
>
> --
> Sent from my iClayTablet
>
> ________________________________
>
>   Anthony Lomax
>   161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
>   tel: +33 (0)4 93 75 25 02    e-mail: anthony at alomax.net    web:
> http://www.alomax.net
>
>   Twitter: @ALomaxNet
>   Science & Special Topics: http://www.alomax.net/science
>   Software: http://www.alomax.net/software - updates:
> https://twitter.com/ALomaxNet
> ________________________________
>


