[webservices] A question of location ID, how to represent empty IDs in XML?

Mon Jul 28 09:24:36 PDT 2014

Hi Philip

Philip Crotwell [07/28/2014 03:37 PM]:
> Being on the cheap side of the Atlantic, I'll save us $0.00068 and
> make a stab at the underlying issue.:)
>
> Here, with lots of stuff cut out, is how a channel is "identified" in
> stationXML via the fdsn station web service at the IRIS DMC,
> http://service.iris.edu/fdsnws/station/1/query?net=GE&sta=UGM&cha=BHZ&level=channel&format=xml&nodata=404
>
> <Network code="GE" >
> <Station code="UGM">
> <Channel locationCode="  " code="BHZ">
>
> Another implementation of the same web service (not sure of url) gives
> back this:
>
> <Network code="GE" >
> <Station code="UGM">
> <Channel locationCode="" code="BHZ">
>
> with locationCode="" vs ="  " being the difference under consideration.

Exactly. Good that you provided this as example because we were already 
getting lost so deep within the details that we may have forgotten that 
this thread has just moved to this list and that it might not have been 
clear to everybody what the issue actually is...

Even a few lines of XML can (sometimes) help make things clearer. ;)

> There are two basic issues being discussed (and yes, more beer would help!:)
>
> 1) Should all valid stationXML documents be required to use the exact
> same string of characters to represent the location id for this
> channel. This would allow a comparison operation to be "simple" in
> that it can compare the attribute values without additional
> processing.

This would be ideal, but I think it is not realistic:

If "--" were introduced, it would be impossible not to keep supporting
"  " and "" practically forever in order to maintain backward compatibility.

If "" were to become the preferred empty location code, we still have 
probably billions of instances of "  " out in the wild that should not 
be declared invalid.

The same is true the other way round: if "  " were preferred, the
existing instances of "" would have to remain valid.

In short, some mapping is required anyway. Fortunately, the mapping
between "" and "  " is trivial using methods like trim() or strip()
(depending on the language). Most seismic data handling software
already does this anyway because it's so obvious. For XML, at least
ObsPy and SeisComP do. SEED readers that trim the location code include
rdseed, libmseed and qlib2. All database engines provide a trim()
method, so database queries are not a problem either.

if trim(loc1) == trim(loc2) ...

may be slightly more expensive in terms of CPU cycles than

if loc1 == loc2 ...

but I presume that this is not a real issue anywhere. An added benefit
is that the "  " location code, which is currently not strictly SEED
compliant, then falls within the valid range more or less automatically.
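
To make the trim-based comparison concrete, here is a minimal Python sketch.
It is only an illustration, not how any of the packages mentioned above
implement it: the helper names (normalize_location, same_location,
channel_location) are made up for this example, and the two XML snippets are
the cut-down fragments from above with namespaces and closing details filled
in by assumption so that they parse.

```python
import xml.etree.ElementTree as ET


def normalize_location(code):
    """Map "", "  " and a missing attribute (None) to the empty string."""
    return (code or "").strip()


def same_location(loc1, loc2):
    """Compare two location codes after trimming white space."""
    return normalize_location(loc1) == normalize_location(loc2)


def channel_location(xml_text):
    """Return the raw locationCode attribute of the first channel."""
    channel = ET.fromstring(xml_text).find("./Station/Channel")
    return channel.get("locationCode")


# Hypothetical, heavily reduced documents (no namespace, assumed structure).
doc_a = ('<Network code="GE"><Station code="UGM">'
         '<Channel locationCode="  " code="BHZ"/></Station></Network>')
doc_b = ('<Network code="GE"><Station code="UGM">'
         '<Channel locationCode="" code="BHZ"/></Station></Network>')

loc_a = channel_location(doc_a)
loc_b = channel_location(doc_b)

print(loc_a == loc_b)               # plain string comparison
print(same_location(loc_a, loc_b))  # comparison after trimming
```

The plain comparison treats the two channels as different, while the trimmed
comparison treats them as the same. Non-empty codes such as "00" pass through
unchanged. Note that this sketch deliberately does not map "--" to anything.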

> 2) If we agree to 1), then what should those exact characters be? The
> current top choices are
>    a) empty=""
>    b) two spaces="  "
>    c) two dashes="--".
>
> 1) seems less controversial than 2) in that greater compatibility is
> generally seen as positive.

Compatibility is absolutely essential. This is probably the main reason 
why even after more than 10 years of discussion about new channel 
naming, there hasn't been any real progress AFAICS. And despite all its
shortcomings, the current NSLC is really remarkable in that it is accepted
and used nearly everywhere. Don't put that at stake.

Thanks btw for your other comments about potential issues related to 
white space.

Cheers
Joachim