[webservices] A question of location ID, how to represent empty IDs in XML?
Joachim Saul
saul at gfz-potsdam.de
Mon Jul 28 09:24:36 PDT 2014
Hi Philip
Philip Crotwell [07/28/2014 03:37 PM]:
> Being on the cheap side of the Atlantic, I'll save us $0.00068 and
> make a stab at the underlying issue.:)
>
> Here, with lots of stuff cut out, is how a channel is "identified" in
> stationXML via the fdsn station web service at the IRIS DMC,
> http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
>
> <Network code="GE" >
> <Station code="UGM">
> <Channel locationCode="  " code="BHZ">
>
> Another implementation of the same web service (not sure of url) gives
> back this:
>
> <Network code="GE" >
> <Station code="UGM">
> <Channel locationCode="" code="BHZ">
>
> with locationCode="" vs ="  " being the difference under consideration.
Exactly. Good that you provided this as example because we were already
getting lost so deep within the details that we may have forgotten that
this thread has just moved to this list and that it might not have been
clear to everybody what the issue actually is...
Even a few lines of XML can (sometimes) help make things clearer. ;)
> There are two basic issues being discussed (and yes, more beer would help!:)
>
> 1) Should all valid stationXML documents be required to use the exact
> same string of characters to represent the location id for this
> channel. This would allow a comparison operation to be "simple" in
> that it can compare the attribute values without additional
> processing.
This would be ideal, but I think it is not realistic:
If "--" were introduced, it would be impossible not to keep supporting
"  " and "" practically forever in order to maintain backward
compatibility.
If "" were to become the preferred empty location code, we would still
have probably billions of instances of "  " out in the wild that should
not be declared invalid.
The same holds in the other direction, with "  " preferred and "" out in
the wild.
In short, some mapping is required anyway. Fortunately, the mapping
between "" and "  " is trivial using methods like trim(), strip() or
similar (depending on the language). Most seismic data handling software
already does it anyway because it's so obvious. For XML, that is at
least ObsPy and SeisComP. SEED readers that trim the location code
include rdseed, libmseed and qlib2. All database engines provide a
trim() method, so database queries are not a problem either.
    if trim(loc1) == trim(loc2) ...
may be slightly more expensive in terms of CPU cycles than
    if loc1 == loc2 ...
but I presume that this is not a real issue anywhere. An added benefit
is that the " " location code, which is currently not strictly SEED
compliant, then falls within the valid range more or less automatically.
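
To make the trim-based comparison above concrete, here is a minimal
sketch in Python. The helper names normalize_location, same_location and
channel_ids are made up for illustration, and the two XML strings are
simplified stand-ins for the responses quoted earlier (no StationXML
namespace, most elements cut out), not real service output.

```python
import xml.etree.ElementTree as ET


def normalize_location(code):
    """Map None, "", " " and "  " to the same empty string."""
    return (code or "").strip()


def same_location(loc1, loc2):
    """Compare two location codes, ignoring padding whitespace."""
    return normalize_location(loc1) == normalize_location(loc2)


# Simplified stand-ins for the two variants under discussion
# (assumption: real StationXML carries a namespace and many more elements).
PADDED = ('<Network code="GE"><Station code="UGM">'
          '<Channel locationCode="  " code="BHZ"/></Station></Network>')
EMPTY = ('<Network code="GE"><Station code="UGM">'
         '<Channel locationCode="" code="BHZ"/></Station></Network>')


def channel_ids(xml_text):
    """Return (net, sta, loc, cha) tuples with the location code trimmed."""
    net = ET.fromstring(xml_text)
    ids = []
    for sta in net.iter("Station"):
        for cha in sta.iter("Channel"):
            ids.append((net.get("code"), sta.get("code"),
                        normalize_location(cha.get("locationCode")),
                        cha.get("code")))
    return ids


if __name__ == "__main__":
    print(same_location("", "  "))
    print(channel_ids(PADDED) == channel_ids(EMPTY))
```

Note that this sketch only covers the "" versus "  " mapping; a "--"
location code would need an explicit extra rule.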
> 2) If we agree to 1), then what should those exact characters be? The
> current top choices are
> a) empty=""
> b) two spaces="  "
> c) two dashes="--".
>
> 1) seems less controversial than 2) in that greater compatibility is
> generally seen as positive.
Compatibility is absolutely essential. This is probably the main reason
why, even after more than 10 years of discussion about new channel
naming, there hasn't been any real progress AFAICS. And despite all its
shortcomings, the current NSLC scheme is really remarkable in that it is
accepted and used nearly everywhere. Don't put that at stake.
Thanks btw for your other comments about potential issues related to
white space.
Cheers
Joachim