[webservices] A question of location ID, how to represent empty IDs in XML?
Joachim Saul
saul at gfz-potsdam.de
Thu Aug 7 08:44:07 PDT 2014
Hi Chad,
after a well-deserved creative break a little more feedback from Potsdam
on our favorite topic. :)
Chad Trabant wrote on 31.07.2014 10:37:
> On Jul 31, 2014, at 1:13 AM, Joachim Saul <saul at gfz-potsdam.de> wrote:
>
>> Chad Trabant wrote on 31.07.2014 09:42:
>>> On Jul 31, 2014, at 12:33 AM, Joachim Saul <saul at gfz-potsdam.de> wrote:
>>>>> Has anyone observed this automatic trimming on any system?
>>>>
>>>> No, and I agree that a generic parser should return a raw string as it is in the XML without implicit trimming, nullifying etc. To obtain a trimmed string, it's trivial to trim() the input strings as needed. That's what's done in ObsPy, too.
>>>>
>>>> In fact even the *already* empty string location codes from libmseed are trimmed again at ObsPy level, just in case. ;)
>>>
>>> Hi Joachim,
>>>
>>> You keep coming back to this as if it is meaningful. libmseed and the parts of ObsPy getting information from libmseed are dealing with SEED data, where the current rules of parsing are clear. What is your point exactly?
>>
>> The point is that in libmseed you use a different empty location code naming than in StationXML. As I said a number of times, for me that's not a problem at all. Many clients can handle this and those that cannot can be modified easily. In particular, if you applied the same naming rules as in libmseed also in e.g. FetchData (by making a trivial change in the code) they would become consistent at a very low cost. It would be a benefit for the user.
>
> Hi Joachim,
>
> That is a strange transition from libmseed to web service clients that I do not understand.
In libmseed you treat the two spaces differently than some web service
client code. While in libmseed you trim the spaces, resulting in an
empty string, in web service clients (like FetchData) you keep the
spaces. By simply trimming them there, too, and then matching against an
empty string, you would not only maintain consistency in your
interpretation of waveform and meta data, but also be more "accepting"
in what your clients are able to process. In particular this would
enable your clients to parse strictly SEED compliant empty location
codes, which currently is not possible.
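To make the trimming idea concrete, here is a minimal Python sketch. The function names are hypothetical and are not taken from libmseed, ObsPy or FetchData; treating a missing value (None) as empty is also just an assumption made for the example.

```python
def normalize_location(code):
    """Return a location code with surrounding spaces removed.

    Both "  " (space-space, as in SEED headers) and "" come out as "".
    A missing value (None) is treated as empty too; that is an
    assumption of this sketch, not something any specification states.
    """
    return (code or "").strip()


def same_location(a, b):
    """Compare two location codes after trimming."""
    return normalize_location(a) == normalize_location(b)
```

With this, a client that receives "  " from one server and "" from another ends up with the same key for both, while "00" stays distinct.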
> You appear fixated on updating the clients,
Yes, absolutely! Because that's where the current problem can be fixed
most easily.
> but as I have said many times that, by itself, will not solve the actual problem;
That depends on what you consider as "the actual problem". If it is
empty location codes, I would not view that as a problem at all.
Cosmetics at worst, but it is as it is and we can live with it. Can't you?
> the metadata remains inconsistent and at any rate we do not control many of the most popular parsers of this information such as user-created programs.
Once the issue is clarified, the clients will naturally be adapted to
the specification. The inconsistency is currently still at a
low/manageable level. In particular, there is absolutely nowhere an
inconsistency with (Mini)SEED headers, it's *currently* *only* a
relatively minor inconsistency at XML level that is not too big to be
handled. Besides the standard conformance this is IMHO the main
advantage of "" compared to "--".
>>> Why do you think the existing metadata and decades of SEED would need to be changed? Please explain.
>>
>> Because otherwise a mapping would be required "forever". Until at least very recently you were strongly against any mapping, even calling the idea "rubbish" at one point
>
> Here is what you said about mapping:
>
> On Jul 28, 2014, at 4:51 AM, Joachim Saul <saul at gfz-potsdam.de> wrote:
>> In general mappings are not the problem and are widely used anyway.
Yes, of course mappings are not a problem, especially not from a
technical point of view. But it also depends on the kind of mapping.
BTW, I stated the above in the context of the mapping "" <-> " ", which
is very easy using trim() et al. and in particular does not require any
change to the *current* channel naming conventions. And which is also
why I wrote "widely used". Technically a mapping to/from "--" would be
quite different, because the range of values that need to be tested
against e.g. in a simple comparison makes this more complicated. In
practice, of course, one can implement this once as a library function
or by creating a location code class and overloading the == operator.
This is still considerably more work than just calling trim().
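For comparison, a rough Python sketch of what such a location code class could look like if "--" had to be supported as well. The class name and the set of aliases are assumptions made for illustration only; this is not an existing library class.

```python
class LocationCode:
    """Hypothetical wrapper that treats "", "  " and "--" as the same
    (empty) location code when comparing."""

    # Spellings that all mean "empty" once spaces are trimmed (assumed set).
    _EMPTY_ALIASES = {"", "--"}

    def __init__(self, raw):
        self.raw = raw

    @property
    def canonical(self):
        trimmed = self.raw.strip()
        return "" if trimmed in self._EMPTY_ALIASES else trimmed

    def __eq__(self, other):
        # Allow comparison against plain strings as well.
        if isinstance(other, str):
            other = LocationCode(other)
        if not isinstance(other, LocationCode):
            return NotImplemented
        return self.canonical == other.canonical

    def __hash__(self):
        # Keep hashing consistent with the overloaded equality.
        return hash(self.canonical)

    def __repr__(self):
        return f"LocationCode({self.raw!r})"
```

Every place that compares, hashes or stores location codes would then have to go through such a class (or an equivalent library function), whereas the "" vs. "  " case only needs a single trim() on input.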
With a mapping to/from "--" we also have the "forever" issue. With ""
vs. " " this is not an issue at all, especially in view of the existing
SEED headers. That's a big difference.
> So what is the problem with mapping?
Because as already said, the mapping would be required "forever" due to
persistence of the data. In particular, you cannot declare existing
metadata invalid. Hence you would have to keep supporting "" and " ",
too, to maintain backward compatibility.
> I was certainly not against mapping to/from "--", after all it was my proposal! You have taken my words out of context. Please stick to the technical issues and leave your personal indignation off of this mailing list.
This is already a very technical discussion and where you detect
"personal indignation" is left as an exercise to the reader.
Here is the context: "StationXML is the new dataless SEED, as such it
should be compatible between data centers for at least the core
parameters. Currently StationXML produced by SeisComP3 and other data
centers for the exact same channel can be documents that are
semantically different channels (NSLC do not match). We would not do
this with dataless SEED, right? Any notion that a reader of the XML
must apply rules to the core name values is rubbish, no transformation
should be needed at that point. These are documents that are being
stored as files, loaded into databases, and otherwise saved."
In my interpretation this is a statement against any mapping.
But I recognize we are all on a learning curve and opinions evolve and
sometimes change. Plus we are having two discussions on the list and
off-list, each with several sub-threads. This probably creates
additional confusion and doesn't quite help to focus on what the real
issues are *currently*. Channel naming can be discussed and should,
actually has been many times before, but can we not focus on what needs
to be solved in very short time without introducing additional
incompatibilities?
All frustration about ugly empty location codes aside, I maintain that
there are hardly any technical issues with them. Nothing that cannot be
solved quickly with rather few modifications plus a clarification in the
FDSN StationXML specification. In fact I already proposed a clear
timeline you might want to comment on. What follows is a quote from my
email of July-24, 18:43 UTC to this list.
----------------------------------------------------------------------
Actually we are currently seeking to solve a particular incompatibility
between FDSN StationXML produced by different services, but technically
that is much, *much* easier to achieve than the introduction of a new
and incompatible channel naming. I would welcome an intensified
discussion on the latter, but not in the context of the current FDSN
StationXML or web services.
It's actually quite strange that already now, early after the
introduction of FDSN StationXML, we are not only choking over minor
incompatibilities, but are discussing "solutions" to problems that
apparently no one had noticed before StationXML... Looks
like shooting at sparrows with cannons, IMO.
There used to be an IASPEI working group on station codes that even came
up with a new channel naming "standard"[*], which, however, doesn't seem
to have gained much acceptance so far. Nevertheless this is the level at
which changes to channel naming need to be discussed, even though the
process may be frustratingly slow. But the impact of such a change is
just too big to be decided ad hoc.
To summarize:
We will not find a future-proof channel naming convention quickly.
Partial changes, especially if incompatible, should be absolutely avoided.
The particular problem we attempted (and still need) to solve in the
first place is a location code incompatibility due to differently strict
adherence to the SEED specification. Not surprisingly I prefer the
empty-string representation for the empty location code. To be
pragmatic, I propose the following timeline:
* Accept that at least for a transitional period both space-space and
empty location codes will exist.
* During a transitional period, don't change the servers that now
produce space-space location codes, as that would break compatibility
with some clients. We want to keep compatibility rather than introducing
new incompatibility.
* Instead update the clients to accept both space-space and empty
location codes by trimming trailing spaces if present. This is a
relatively minor change and IIRC this is on IRIS's agenda already, which
is highly appreciated.
At this point in time, interoperability is restored, even without
server-side changes. This is important as it may take quite some time
for the users to actually upgrade their clients; but it doesn't hurt anyone.
* Finally the server upgrades where needed. The decision as to when to
upgrade the server side can be made once it is considered appropriate;
there is absolutely no hurry from the client side.
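As a sketch of the client-side step in the list above, the following Python fragment reads channel entries from a simplified StationXML-like snippet and trims the location code on input. The sample document is made up for illustration and leaves out the XML namespace and most required elements of real FDSN StationXML; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free sample (an assumption for this sketch).
# One channel uses space-space, one the empty string, one a real code.
SAMPLE = """<Station code="ABC">
  <Channel code="BHZ" locationCode="  "/>
  <Channel code="BHZ" locationCode=""/>
  <Channel code="BHZ" locationCode="00"/>
</Station>"""


def channel_ids(xml_text):
    """Return (location, channel) pairs with the location code trimmed."""
    root = ET.fromstring(xml_text)
    ids = []
    for channel in root.iter("Channel"):
        # Accept both "  " and "" by trimming before any comparison.
        location = channel.get("locationCode", "").strip()
        ids.append((location, channel.get("code")))
    return ids
```

A client written like this yields the same identifier for the first two channels regardless of which server produced the document, so no server-side change is needed to restore interoperability.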
The needed changes for the above proposal are very small compared to the
huge changes that would be required at every level to implement a new
channel naming convention. This may (and hopefully will) take place some
time in the future, but it requires a lot of preparation and
coordination. I am pretty sure that we will have a considerable number
of beers in the meantime.
Besides the beers, we should focus on finalizing the specification of
FDSN StationXML. There are too many under-defined elements even in the
xsd and the risk of serious incompatibilities is very high.
Cheers
Joachim
[*] http://www.isc.ac.uk/registries/download/IR_implementation.pdf