[webservices] A question of location ID, how to represent empty IDs in XML?

Thu Aug 7 08:44:07 PDT 2014

Hi Chad,

After a well-deserved creative break, a little more feedback from Potsdam
on our favorite topic. :)

Chad Trabant wrote on 31.07.2014 10:37:
> On Jul 31, 2014, at 1:13 AM, Joachim Saul <saul at gfz-potsdam.de> wrote:
>
>> Chad Trabant wrote on 31.07.2014 09:42:
>>> On Jul 31, 2014, at 12:33 AM, Joachim Saul <saul at gfz-potsdam.de> wrote:
>>>>> Has anyone observed this automatic trimming on any system?
>>>>
>>>> No, and I agree that a generic parser should return a raw string as it is in the XML without implicit trimming, nullifying etc. To obtain a trimmed string, it's trivial to trim() the input strings as needed. That's what's done in ObsPy, too.
>>>>
>>>> In fact even the *already* empty string location codes from libmseed are trimmed again at ObsPy level, just in case. ;)
>>>
>>> HI Joachim,
>>>
>>> You keep coming back to this as if it is meaningful.  libmseed and the parts of ObsPy getting information from libmseed are dealing with SEED data, where the current rules of parsing are clear.  What is your point exactly?
>>
>> The point is that in libmseed you use a different empty location code naming than in StationXML. As I said a number of times, for me that's not a problem at all. Many clients can handle this and those that cannot can be modified easily. In particular, if you applied the same naming rules as in libmseed also in e.g. FetchData (by making a trivial change in the code) they would become consistent at a very low cost. It would be a benefit for the user.
>
> Hi Joachim,
>
> That is a strange transition from libmseed to web service clients that I do not understand.

In libmseed you treat the two spaces differently than in some web service
client code. While in libmseed you trim the spaces, resulting in an
empty string, in web service clients (like FetchData) you keep the
spaces. By simply trimming them there, too, and then matching against an
empty string, you would not only maintain consistency in your
interpretation of waveform data and metadata, but also be more "accepting"
in what your clients are able to process. In particular this would
enable your clients to parse strictly SEED-compliant empty location
codes, which currently is not possible.
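
To illustrate what I mean by trimming on the client side, here is a
minimal sketch in Python. It is not taken from FetchData or ObsPy; the
XML fragment is a simplified, namespace-free stand-in for StationXML,
and the helper names normalize_location() and channels_matching() are
made up for this example.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment for illustration only;
# real FDSN StationXML documents carry more structure.
XML = """
<Station code="ABC">
  <Channel code="BHZ" locationCode="  "/>
  <Channel code="BHZ" locationCode=""/>
  <Channel code="BHZ" locationCode="00"/>
</Station>
"""


def normalize_location(code):
    """Hypothetical helper: map a missing, empty or all-space code to ''."""
    return (code or "").strip()


def channels_matching(xml_text, wanted_location):
    """Return (raw, normalized) location codes of the matching channels."""
    wanted = normalize_location(wanted_location)
    matches = []
    for channel in ET.fromstring(xml_text).iter("Channel"):
        raw = channel.get("locationCode")  # the parser returns the raw string
        if normalize_location(raw) == wanted:
            matches.append((raw, normalize_location(raw)))
    return matches


print(channels_matching(XML, ""))
```

With this, a request for the empty location code matches both the
space-space and the empty-string channel, whichever of the two spellings
the server happens to produce.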

> You appear fixated on updating the clients,

Yes, absolutely! Because that's where the current problem can be fixed 
most easily.

> but as I have said many times that, by itself, will not solve the actual problem;

That depends on what you consider as "the actual problem". If it is 
empty location codes, I would not view that as a problem at all. 
Cosmetics at worst, but it is as it is and we can live with it. Can't you?

> the metadata remains inconsistent and at any rate we do not control many of the most popular parsers of this information such user-created programs.

Once the issue is clarified, the clients will naturally be adapted to
the specification. The inconsistency is currently still at a
low/manageable level. In particular, there is no inconsistency at all
with (Mini)SEED headers; it's *currently* *only* a relatively minor
inconsistency at the XML level that is not too big to be handled.
Besides standard conformance, this is IMHO the main advantage of ""
compared to "--".

>>> Why do you think the existing metadata and decades of SEED would need to be changed?  Please explain.
>>
>> Because otherwise a mapping would be required "forever". Until at least very recently you were strongly against any mapping, even calling the idea "rubbish" at one point
>
> Here is what you said about mapping:
>
> On Jul 28, 2014, at 4:51 AM, Joachim Saul <saul at gfz-potsdam.de> wrote:
>> In general mappings are not the problem and are widely used anyway.

Yes, of course mappings are not a problem, especially not from a 
technical point of view. But it also depends on the kind of mapping.

BTW, I stated the above in the context of the mapping "" <-> "  ", which 
is very easy using trim() et al. and in particular does not require any 
change to the *current* channel naming conventions. And which is also 
why I wrote "widely used". Technically a mapping to/from "--" would be 
quite different, because the range of values that need to be tested 
against e.g. in a simple comparison makes this more complicated. In 
practice, of course, one can implement this once as a library function 
or by creating a location code class and overloading the == operator. 
This is still considerably more work than just calling trim().
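
For comparison, a rough sketch of the location code class variant in
Python. The class name LocationCode and its attributes are hypothetical
and only meant to show the extra moving parts; I am assuming here that
"", "  " and "--" would all have to be treated as the same empty code.

```python
class LocationCode:
    """Hypothetical wrapper that compares location codes by meaning."""

    # Spellings of the empty code that remain after stripping spaces.
    EMPTY_SPELLINGS = ("", "--")

    def __init__(self, raw):
        self.raw = raw
        stripped = (raw or "").strip()
        # "", "  " and "--" all collapse to the same canonical value.
        self.canonical = "" if stripped in self.EMPTY_SPELLINGS else stripped

    def __eq__(self, other):
        if not isinstance(other, LocationCode):
            return NotImplemented
        return self.canonical == other.canonical

    def __hash__(self):
        # Keep hashing consistent with __eq__ so instances work as dict keys.
        return hash(self.canonical)

    def __repr__(self):
        return f"LocationCode({self.raw!r})"


print(LocationCode("  ") == LocationCode("--"))  # True
print(LocationCode("") == LocationCode("--"))    # True
print(LocationCode("00") == LocationCode("--"))  # False
```

Every comparison, dictionary lookup and database key then has to go
through such a class, whereas for "" vs. "  " a single strip() on input
is enough.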

With a mapping to/from "--" we also have the "forever" issue. With "" 
vs. "  " this is not an issue at all, especially in view of the existing 
SEED headers. That's a big difference.

> So what is the problem with mapping?

Because as already said, the mapping would be required "forever" due to 
persistence of the data. In particular, you cannot declare existing 
metadata invalid. Hence you would have to keep supporting "" and "  ", 
too, to maintain backward compatibility.

> I was certainly not against mapping to/from "--", after all it was my proposal!  You have taken words of my out context.  Please stick to the technical issues and leave your personal indignation off of this mailing list.

This is already a very technical discussion and where you detect 
"personal indignation" is left as an exercise to the reader.

Here is the context: "StationXML is the new dataless SEED, as such it 
should be compatible between data centers for at least the core 
parameters.  Currently StationXML produced by SeisComP3 and other data 
centers for the same exact same channel can be documents that are 
semantically different channels (NSLC do not match).  We would not do 
this with dataless SEED, right?  Any notion that a reader of the XML 
must apply rules to the core name values is rubbish, no transformation 
should be needed at that point.  These are documents that are being 
stored as files, loaded into databases, and otherwise saved."

In my interpretation this is a statement against any mapping.

But I recognize we are all on a learning curve and opinions evolve and
sometimes change. Plus we are having two discussions, on the list and
off-list, each with several sub-threads. This probably creates
additional confusion and doesn't quite help to focus on what the real
issues are *currently*. Channel naming can and should be discussed, and
actually has been many times before, but can we not focus on what needs
to be solved in the very short term without introducing additional
incompatibilities?

All frustration about ugly empty location codes aside, I maintain that
technically there are hardly any issues with them. Nothing that cannot be
solved quickly with rather few modifications plus a clarification in the
FDSN StationXML specification. In fact I already proposed a clear
timeline you might want to comment on. What follows is a quote from my
email of July 24, 18:43 UTC to this list.

----------------------------------------------------------------------

Actually we are currently seeking to solve a particular incompatibility 
between FDSN StationXML produced by different services, but technically 
that is much, *much* easier to achieve than the introduction of a new 
and incompatible channel naming. I would welcome an intensified 
discussion on the latter, but not in the context of the current FDSN 
StationXML or web services.

It's actually quite strange that already now, early after the
introduction of FDSN StationXML, we are not only choking over minor
incompatibilities, but are discussing "solutions" to problems that
apparently no one had noticed before StationXML... Looks
like shooting at sparrows with cannons, IMO.

There used to be an IASPEI working group on station codes that even came
up with a new channel naming "standard"[*], which, however, doesn't seem 
to have gained much acceptance so far. Nevertheless this is the level at 
which changes to channel naming need to be discussed, even though the 
process may be frustratingly slow. But the impact of such a change is 
just too big to be decided ad hoc.

To summarize:

We will not find a future-proof channel naming convention quickly. 
Partial changes, especially if incompatible, should be absolutely avoided.

The particular problem we attempted (and still need) to solve in the 
first place is a location code incompatibility due to differently strict 
adherence to the SEED specification. Not surprisingly I prefer the 
empty-string representation for the empty location code. To be 
pragmatic, I propose the following timeline:

* Accept that, at least for a transitional period, space-space and empty
location codes will both exist.

* During a transitional period, don't change the servers that now 
produce space-space location codes, as that would break compatibility 
with some clients. We want to keep compatibility rather than introducing 
new incompatibility.

* Instead update the clients to accept both space-space and empty 
location codes by trimming trailing spaces if present. This is a 
relatively minor change and IIRC this is on IRIS's agenda already, which 
is highly appreciated.

At this point in time, interoperability is restored, even without 
server-side changes. This is important as it may take quite some time 
for the users to actually upgrade their clients; but it doesn't hurt anyone.

* Finally, upgrade the servers where needed. The decision as to when to
upgrade the server side can be made once it is considered appropriate; 
there is absolutely no hurry from the client side.

The needed changes for the above proposal are very small compared to the 
huge changes that would be required at every level to implement a new 
channel naming convention. This may (and hopefully will) take place some 
time in the future, but it requires a lot of preparation and 
coordination. I am pretty sure that we will have a considerable number 
of beers in the meantime.

Besides the beers, we should focus on finalizing the specification of 
FDSN StationXML. There are too many under-defined elements even in the 
xsd and the risk of serious incompatibilities is very high.

Cheers
Joachim

[*] http://www.isc.ac.uk/registries/download/IR_implementation.pdf