[webservices] A question of location ID, how to represent empty IDs in XML?
Philip Crotwell
crotwell at seis.sc.edu
Mon Jul 28 06:53:09 PDT 2014
One more thing is that this is not something that we can resolve based
on the XML spec as all three variations are well-formed and can be
valid XML depending on the schema.
There is another issue in that whitespace in XML attributes can be
normalized by the parser, but this behavior is not consistent across
all parsers. So when dealing with attributes that are not limited to
non-whitespace characters, you likely have to consider
empty, one space, two spaces, and even N spaces as all being
equivalent. Depending on the parser, you may be able to have this
handled for you, or you may have to code explicitly for these cases.
I think per the xml spec, even these two are considered "the same" as well:
locationCode="
"
locationCode="
"
as newlines in attributes can be normalized to spaces on parsing.
But again, exactly how it is done depends on the parser.
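A minimal Python sketch of both points, using the standard library's `xml.etree.ElementTree` (backed by expat) as one example parser. Other parsers, or schema-aware validation with a `collapse` whitespace facet, may behave differently. The helper `canonical_location` is hypothetical and shows the "treat every blank variant as equivalent" approach.

```python
import xml.etree.ElementTree as ET

# Three spellings of a blank location code that differ only in whitespace.
variants = [
    '<Channel locationCode="" code="BHZ"/>',
    '<Channel locationCode="  " code="BHZ"/>',
    '<Channel locationCode="\n\n" code="BHZ"/>',  # literal newlines
]

def canonical_location(raw):
    """Hypothetical helper: treat empty, N spaces and newlines as one value."""
    return raw.strip()

for doc in variants:
    raw = ET.fromstring(doc).get("locationCode")
    # expat replaces each literal newline in an attribute value with a space,
    # so the third variant arrives as two spaces, not as "\n\n".
    print(repr(raw), "->", repr(canonical_location(raw)))
```

With this parser the newline-newline variant is indistinguishable from the two-space one after parsing, which is why explicit handling may be needed elsewhere.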
Philip
PS I am NOT advocating we choose newline-newline as the default
location id!!! :)
On Mon, Jul 28, 2014 at 9:37 AM, Philip Crotwell <crotwell at seis.sc.edu> wrote:
> Hi
>
> Being on the cheap side of the Atlantic, I'll save us $0.00068 and
> make a stab at the underlying issue. :)
>
> Here, with lots of stuff cut out, is how a channel is "identified" in
> stationXML via the fdsn station web service at the IRIS DMC,
> http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
>
> <Network code="GE" >
> <Station code="UGM">
> <Channel locationCode="  " code="BHZ">
>
> Another implementation of the same web service (not sure of url) gives
> back this:
>
> <Network code="GE" >
> <Station code="UGM">
> <Channel locationCode="" code="BHZ">
>
> with locationCode="" vs ="  " being the difference under consideration.
>
> There are two basic issues being discussed (and yes, more beer would help! :)
>
> 1) Should all valid stationXML documents be required to use the exact
> same string of characters to represent the location id for this
> channel? This would allow a comparison operation to be "simple" in
> that it can compare the attribute values without additional
> processing.
>
> 2) If we agree to 1), then what should those exact characters be? The
> current top choices are
> a) empty=""
> b) two spaces="  "
> c) two dashes="--".
>
> 1) seems less controversial than 2) in that greater compatibility is
> generally seen as positive.
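A sketch of what issue 1) buys: if every document used the same characters, comparing channel identifiers from two services would be plain equality. Without that agreement, a client has to fold the candidate spellings together first. The `ChannelId` tuple and the choice to fold `""`, whitespace-only strings and `"--"` into one value are assumptions for illustration, not an agreed convention.

```python
from typing import NamedTuple

class ChannelId(NamedTuple):
    net: str
    sta: str
    loc: str
    cha: str

def normalize_loc(loc: str) -> str:
    # Hypothetical folding: "", "  " and "--" all mean "no location ID".
    stripped = loc.strip()
    return "" if stripped == "--" else stripped

def same_channel(a: ChannelId, b: ChannelId) -> bool:
    """Compare two identifiers after folding the blank-location spellings."""
    return a._replace(loc=normalize_loc(a.loc)) == b._replace(loc=normalize_loc(b.loc))

iris = ChannelId("GE", "UGM", "  ", "BHZ")   # two-space form
other = ChannelId("GE", "UGM", "", "BHZ")    # empty-string form

print(iris == other)              # False: raw attribute values differ
print(same_channel(iris, other))  # True once the spellings are folded
```

If all documents agreed on one spelling, `normalize_loc` would be unnecessary and `==` alone would do.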
>
> This is primarily a question about the form of the stationXML
> documents, but obviously there are connections to the way requests are
> formed, the relationship to miniseed/seed, the way things are coded in
> software and how much detailed understanding we expect of end users.
>
> Philip
>
>
>
> On Mon, Jul 28, 2014 at 3:59 AM, Anthony Lomax <alomax at free.fr> wrote:
>> Hello all,
>>
>> Can someone give a concise statement of the original problem being
>> discussed? Is it only, or primarily, a concern about XML?
>>
>> It seems to me that with modern languages a string that is empty or has 1-N
>> spaces is the same thing - there are often implicit or explicit trim()
>> functions hiding in a processing pipeline. A null string is not the same.
>> So an empty or blank string is the same, valid location code, and null is
>> an undefined or uninitialized location code.
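In Python terms, the trim() step described here is `str.strip`, and `None` plays the role of null. A minimal sketch of such a pipeline step, with a hypothetical function name:

```python
from typing import Optional

def normalize_location(loc: Optional[str]) -> Optional[str]:
    """Hypothetical pipeline step: trim blanks, but keep null distinct."""
    if loc is None:
        return None          # undefined / uninitialized location code
    return loc.strip()       # "", " ", "  " all collapse to ""

print(normalize_location("") == normalize_location("   "))  # True
print(normalize_location(None) == normalize_location(""))   # False
```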
>>
>> With regard to the "--" pseudo location code, is it not needed
>> because it is sometimes difficult or impossible to represent an empty
>> string, or even any string? For example on the command line or in a RESTful WS
>> URI? (Or a URI on the command line!) So it may be that the use of "--" for
>> intermediate processing and requests could be tolerated, and perhaps even
>> made official, while empty or blank-only strings are official for persistent
>> data.
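A sketch of using "--" only on the request side, assuming the station service accepts `--` in its `loc` parameter as a stand-in for a blank location. The base URL comes from the query quoted earlier in the thread; the function names are hypothetical.

```python
from urllib.parse import urlencode

BASE = "http://service.iris.edu/fdsnws/station/1/query"

def request_loc(loc: str) -> str:
    # Assumed convention: a blank code travels as "--" so it stays visible
    # on a command line and in a URL; anything else is sent trimmed.
    return "--" if loc.strip() == "" else loc.strip()

def station_query(net: str, sta: str, loc: str, cha: str) -> str:
    params = {"net": net, "sta": sta, "loc": request_loc(loc),
              "cha": cha, "level": "channel", "format": "xml"}
    return BASE + "?" + urlencode(params)

print(station_query("GE", "UGM", "  ", "BHZ"))
```

Without the mapping, `urlencode` would send a two-space code as `loc=++` and an empty one as `loc=`, which is the kind of ambiguity "--" avoids in requests.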
>>
>> Just my 0.02€ = $0.0268
>>
>> Best regards to all,
>>
>> Anthony
>>
>>
>>
>> On 27/07/2014 04:52, Chad Trabant wrote:
>>
>> Hi Marcelo,
>>
>> Thanks for your thoughts as well. What you and Joachim are not
>> addressing are the concerns about an empty ID that have been raised by
>> more than one person. The answer that empty strings are technically
>> possible and that it all works in Python/SeisComP is less than satisfying. The
>> observations from Python, ObsPy and SeisComP are a few of many that need to
>> be taken into account.
>>
>> I agree that there is a long-tail consideration for the "--" location ID
>> solution. If you understand that some folks find an empty ID to be problematic
>> regardless of whether it is XML, SEED, text, whatever, then you might see
>> where this proposal comes from. Yes, we would need to treat empty location
>> IDs and "--" as synonyms for a very long time. Empty strings in XML mean
>> you will need to map empty IDs to empty strings, NULL and whatever an XML
>> parser might or might not produce for a long time as well (think beyond
>> Python and SeisComP). Either is possible; only one of them is a unique
>> mapping.
>>
>> If the main consideration is the least amount of disruption, then the
>> answer is obvious to me: the FDSN can sanction that the two-space string is
>> the XML synonym for the empty SEED location ID and we adjust the schema to
>> make sure a string of whitespace is preserved. Then SeisComP can change
>> its relatively new StationXML implementation and ALL existing clients will
>> be compatible with all metadata and, most importantly, we would have
>> consistent metadata.
>>
>> If the empty string ID representation is adopted, it would, in effect,
>> mean that the DMC would need to change its metadata service and (more
>> importantly) all users of the DMC's metadata service would need to
>> transition to a new metadata channel naming scheme. This is certainly not
>> out of the question, but it is not something we would do without careful
>> consideration. I do not find the two-space strings all that great, but they
>> are here and something the DMC and users of the DMC have dealt with. Issues
>> have been identified with empty location IDs by us and our users. If the DMC is
>> going to change, and push the change on all users of the DMC's StationXML,
>> it would be much more compelling to have a solution that addresses the low
>> level issues.
>>
>> regards,
>> Chad
>>
>>
>> ----- Original Message -----
>> From: "Marcelo Bianchi" <m.tchelo at gmail.com>
>> To: "IRIS Web Services List" <webservices at iris.washington.edu>
>> Sent: Friday, July 25, 2014 7:38:17 PM
>> Subject: Re: [webservices] A question of location ID, how to represent empty
>> IDs in XML?
>>
>> Hi Philip and All,
>>
>> I totaly agree with Joachim, was planning to answer but he was much
>> faster. What you guys are proposing is not a solution. the station XML
>> supports nicely the empty string and it is not null. There is a type
>> difference here in Python and in any other language and can be nicely
>> handled internally.
>>
>> Also, the location id is not just a string; it is a key entry linking
>> miniseed to metadata, and making an exception at this level just
>> because a user interface cannot properly render it without ambiguity
>> does not sound like a proper proposal. I am not in favor of
>> creating an exception that will have to be carried over for the
>> decades to come. Alternative solutions for this issue should be
>> sought in the end-user interface.
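The point that the location ID links miniSEED to metadata can be made concrete. In a SEED 2.x fixed data header the location identifier is a 2-byte, space-padded field (bytes 14-15, counting from 1), so an "empty" location is stored on disk as two spaces. The sketch below uses a hypothetical function name and does not validate the rest of the header; it pulls out the four codes and strips the padding so they can be matched against StationXML attributes.

```python
from typing import Tuple

def mseed_nslc(header: bytes) -> Tuple[str, str, str, str]:
    """Extract (net, sta, loc, cha) from a SEED 2.x fixed data header."""
    if len(header) < 20:
        raise ValueError("need at least the first 20 bytes of the header")
    text = header[:20].decode("ascii")
    sta = text[8:13].strip()    # station code, 5 bytes
    loc = text[13:15].strip()   # location ID, 2 bytes, "  " when blank
    cha = text[15:18].strip()   # channel code, 3 bytes
    net = text[18:20].strip()   # network code, 2 bytes
    return net, sta, loc, cha

# Sequence number, quality flag, reserved byte, then the identifiers.
header = b"000001D " + b"UGM  " + b"  " + b"BHZ" + b"GE"
print(mseed_nslc(header))   # ('GE', 'UGM', '', 'BHZ')
```

Whether the stripped value should then be compared with `""`, `"  "` or `"--"` on the metadata side is exactly the convention under discussion.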
>>
>> with my best regards,
>>
>> Marcelo Bianchi
>> --
>>
>>
>> 2014-07-25 10:35 GMT-03:00 Philip Crotwell <crotwell at seis.sc.edu>:
>>
>> It sounds like you are saying "change is hard, so we shouldn't do it".
>> I would argue that change is hard and so if we don't do it now it will
>> never happen. StationXML is new enough that there is already a
>> disruption, we should seize the chance. If we do not do something now
>> about null loc ids, it will be a decade or two before we get another
>> chance.
>>
>> It is time to drive the stake through the heart of null location ids.
>> Kill the evil while we have a chance.
>>
>> Philip
>>
>>
>> On Fri, Jul 25, 2014 at 9:26 AM, Joachim Saul <saul at gfz-potsdam.de> wrote:
>>
>> Hello Rob,
>>
>> Rob Newman wrote on 24.07.2014 18:51:
>>
>> For what it's worth, I would also vote for the "--" standard. To quote
>> from the Zen of Python
>> <http://python.net/%7Egoodger/projects/pycon/2007/idiomatic/handout.html>
>> (my language of choice):
>>
>>
>> "Beautiful is better than ugly.
>> Explicit is better than implicit.
>> Simple is better than complex.
>> Complex is better than complicated.
>> Flat is better than nested.
>> Sparse is better than dense.
>> Readability counts.
>> Special cases aren't special enough to break the rules.
>> Although practicality beats purity.
>> Errors should never pass silently.
>> Unless explicitly silenced."
>>
>> I'd add "Compatible is better than incompatible." :)
>>
>>
>> Number 2 is especially relevant here:
>> "Explicit is better than implicit."
>>
>> My favorite would be:
>>
>> "Special cases aren't special enough to break the rules."
>>
>> Quoted whitespace and nulls are painful. Code what you mean, and mean what
>> you code. It's easier for everyone.
>>
>> But what if we simply *mean* "empty string"?
>>
>> The issue is not about beauty, pain or ease. It's about standard
>> conformance. We already have a channel naming standard. If a new data format
>> cannot accommodate existing channel naming, then the new format is flawed.
>> But that's not even the case here...
>>
>> An XML document that contains
>>
>> <Channel locationCode="" ...
>>
>> is not malformed. There's an attribute that *explicitly* contains an empty
>> string and a parser has to produce it as such. Not as null, nil or none, but
>> as an empty string. Otherwise the parser is broken and needs to be fixed,
>> not the data!
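Taking Python's `xml.etree.ElementTree` as one example, a conforming parser keeps apart the two cases distinguished here: an attribute that is present but empty, and an attribute that is absent. Other parsers or data-binding layers may map these differently, which is the concern raised elsewhere in the thread.

```python
import xml.etree.ElementTree as ET

present = ET.fromstring('<Channel locationCode="" code="BHZ"/>')
absent = ET.fromstring('<Channel code="BHZ"/>')

# An explicitly empty attribute comes back as the empty string ...
print(repr(present.get("locationCode")))   # ''
# ... while a missing attribute is reported as None.
print(repr(absent.get("locationCode")))    # None
```

Note that `get` also accepts a default, e.g. `absent.get("locationCode", "")`, which would silently merge the two cases; whether that is desirable is a policy choice, not something the parser decides.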
>>
>> Again: It's not about beauty. We all agree that current channel naming is
>> not particularly beautiful and has limitations. But our business is not to
>> try to solve that issue now and here.
>>
>> Cheers
>> Joachim
>>
>> _______________________________________________
>> webservices mailing list
>> webservices at iris.washington.edu
>> http://www.iris.washington.edu/mailman/listinfo/webservices
>>
>>
>> --
>> Sent from my iClayTablet
>>
>> ________________________________
>>
>> Anthony Lomax
>> 161 Allée du Micocoulier, 06370 Mouans-Sartoux, France
>> tel: +33 (0)4 93 75 25 02 e-mail: anthony at alomax.net web:
>> http://www.alomax.net
>>
>> Twitter: @ALomaxNet
>> Science & Special Topics: http://www.alomax.net/science
>> Software: http://www.alomax.net/software - updates:
>> https://twitter.com/ALomaxNet
>> ________________________________
>>