[webservices] A question of location ID, how to represent empty IDs in XML?
Chad Trabant
chad at iris.washington.edu
Wed Jul 30 21:57:34 PDT 2014
On Jul 28, 2014, at 4:51 AM, Joachim Saul <saul at gfz-potsdam.de> wrote:
> Hi Chad
>
> Chad Trabant wrote on 27.07.2014 04:52:
>> The answer that empty strings are technically possible and it all
>> works in Python/SeisComP is less than satisfying. The observations
>> from Python, ObsPy and SeisComP are a few of many that need to be
>> taken into account.
>
> Please name a few. Not abstract claims or hearsay. Point us to client
> code that cannot parse an empty location code; only then someone can
> take a closer look at the matter and quite possibly provide help.
OK, here are a few: IRIS-WS, IRIS Fetch scripts, irisFetch.m, JWEED, and probably SeisFile, EMERALD, Epicentral and all the other codes that users of the DMC have created to read the metadata we send them.
The statement that observations from Python, ObsPy and SeisComP alone are insufficient evidence for key changes to FDSN formats is not an abstract claim or hearsay; it is rather obvious, since they are not the only (or even the majority of) systems handling these formats.
>> Yes, we would need to treat empty location IDs and "--" as synonyms
>> for a very long time. Empty strings in XML mean you will need to map
>> empty IDs to empty strings, NULL and whatever an XML parser might or
>> might not produce for a long time as well (think beyond Python and
>> SeisComP). Either is possible, only one of them is a unique
>> mapping.
>
> I don't accept the parser issues unless you provide examples; see above.
>
> In general mappings are not the problem and are widely used anyway. Can
> you name a single software that when reading (Mini)SEED does *not* map
> the location code from " " to ""? Even libmseed does!
The code that reads dataless SEED into the DMC's metadata tables. If you want two: the code that reads the values from the DMC's database and creates StationXML. But it doesn’t really matter.
Yes, collapsing the spaces is very common and is in fact how SEED specifies that it be done; as far as I have read, no one is arguing otherwise.
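The space-collapsing mapping discussed here can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and are not taken from libmseed, ObsPy, SeisComP or any DMC code, and the two-character field width is the SEED location field width assumed from the discussion above.

```python
def location_from_seed(field: str) -> str:
    """Collapse a fixed-width, space-padded SEED location field to its ID.

    A field of two spaces becomes the empty string; "00" stays "00".
    """
    return field.strip(" ")


def location_to_seed(location_id: str, width: int = 2) -> str:
    """Pad a location ID with spaces back to the fixed SEED field width."""
    return location_id.ljust(width)
```

The mapping is reversible for fixed-width fields, which is why it works well when reading SEED; the open question in this thread is what the corresponding value should look like in XML.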
> So why not be consistent and do the same when parsing XML? It would
> solve the current issues. You can then keep your two spaces as long as
> you like. ;)
Yes, it totally makes sense to keep the same thing going in XML, except that there have been some issues identified in both SEED and XML and this is an opportunity to begin addressing the low level issue. In essence, the empty string solution is not ideal, even if it is the most appropriate mapping given the current rules. More on this later.
>> If the main considerations are for the least amount of disruption the
>> the answer is obvious to me: the FDSN can sanction that the two-space
>> string is the XML synonym for the empty SEED location ID and we
>> adjust the schema to make sure a string of whitespaces is preserved.
>> Then SeisComP can change its relatively new StationXML implementation
>> and ALL existing clients will be compatible with all metadata and,
>> mostly importantly, we would have consistent metadata.
>
> Chad, this whole discussion started back in early January with your
> complaint about the SeisComP fdsnws server implementation. You were
> alleging that 'The resulting StationXML includes empty location IDs
> (locationCode=“”), this is not allowed in SEED and therefore not allowed
> in StationXML.'
And I have since written that my thoughts have changed: indeed, the location code in SEED does not contain spaces, nor is it required to be two characters.
The point I was making is that the fewest users would be affected if the FDSN decided to require two characters and allow spaces. I say this because I believe most of the users of StationXML get their metadata from the DMC at the moment and have already dealt with the metadata in some way.
> If the SeisComP server were indeed producing wrong XML
> it would have been corrected long ago. But that's not the case! It's
> actually SeisComP that produces the more correct FDSN StationXML
> compared to IRIS XML, not only w.r.t. locationCode.
This statement is heavy on hubris and naivety.
There is no easy way to determine if any given StationXML document is fully "correct". The schema does not have enough information to vet the contents of a StationXML document; it basically checks that the layout is correct, so XML schema validity is not sufficient for "correct". Currently, the StationXML contents are supposed to follow the guidelines defined in SEED. I think many of us agree that we should work to put as many of the content rules as possible into future versions of the schema to clarify many of the gray areas of StationXML. The concept of "more correct" is qualitative when used generally and is rarely or never more important than "compatible with the consensus".
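To illustrate why "correct" depends on content rules that the layout check alone does not carry, here is a small Python sketch. Both patterns below are assumptions made up for illustration, standing in for the two positions in this thread; neither is an official FDSN or SEED rule, and the names are hypothetical.

```python
import re

# Assumed candidate content rules (illustrative only, not official):
# 1) zero to two uppercase alphanumerics, so the empty string is valid
EMPTY_ALLOWED = re.compile(r"[A-Z0-9]{0,2}")
# 2) exactly two characters, with spaces permitted, so "  " is valid
TWO_CHARS_WITH_SPACES = re.compile(r"[A-Z0-9 ]{2}")


def location_is_valid(value: str, rule: re.Pattern) -> bool:
    """Return True if the whole location code matches the given rule."""
    return rule.fullmatch(value) is not None
```

Under the first rule an empty location code passes and a two-space code fails; under the second rule the outcome is reversed. Which document is "correct" therefore depends on which rule the consensus adopts.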
Such gray areas exist even within SEED. Within the FDSN, here is how we have traditionally dealt with the gray areas: when implementing a piece of software to produce something already in production at one or more other centers, you usually use the others as a reference (or collaborate with them). If important differences are found, they are brought up and discussed civilly and a plan is made to make things compatible, usually with user impact being a high priority. Unfortunately, this is not how the current situation unfolded and we are left with incompatible metadata.
> Don't you think it is now time to roll up the sleeves and make your
> client codes work with standard compliant FDSN StationXML rather than
> doctoring an FDSN standard?
You do not unilaterally decide what compliant FDSN StationXML is. As you well know, I have made a proposal to the FDSN and asked for clarity on this issue; it seems worth knowing where we are going.
>> If the empty string ID representation is adopted it would would, in
>> effect, mean that the DMC would need to change its metadata service
>> and (more importantly) all users of the DMC's metadata service would
>> need to transition to a new metadata channel naming scheme. This is
>> certainly not out of the question, but it is not something we would
>> do without careful consideration. I do not find the two-space
>> strings all that great, but they are here and something the DMC and
>> users of the DMC have dealt with. Issues have been identified with
>> empty location IDs by us and our users. If DMC is going to change,
>> and push the change on all users of the DMC's StationXML, it would be
>> much more compelling to have a solution that addresses the low level
>> issues.
>
> Did you read my email of Thursday, 18:43 UTC? Following the ideas I
> outlined there, you are technically *not* required to change any of your
> servers. Only a few client codes are actually affected and even I was
> able to make the changes in one of those in 10 minutes.
You miss the main issue: the metadata is incompatible, and some servers must change. There are many more clients than there are servers, with many clients written by users and out of our direct control. Requiring every client to know some post-parsing processing rules is a terrible idea; in fact, it is an artifact of the same “anachronism” that you claim to dislike, bringing us back to SEED-like parsing.
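For a sense of what such a post-parsing rule would look like in each client, here is a minimal Python sketch using the standard library's ElementTree. The XML snippet is a simplified, un-namespaced stand-in for StationXML, and `canonical_location` and `parse_locations` are hypothetical names, not part of any existing client.

```python
import xml.etree.ElementTree as ET

# Simplified sample (assumption): real StationXML uses a namespace and
# carries many more elements; only the locationCode attribute matters here.
SAMPLE = """<Station code="ANMO">
  <Channel code="BHZ" locationCode=""/>
  <Channel code="BHZ" locationCode="  "/>
  <Channel code="BHZ" locationCode="--"/>
  <Channel code="BHZ"/>
  <Channel code="BHZ" locationCode="00"/>
</Station>"""


def canonical_location(raw):
    """Map every 'empty' spelling (missing, "", spaces, "--") to ""."""
    if raw is None:
        return ""
    stripped = raw.strip()
    return "" if stripped == "--" else stripped


def parse_locations(xml_text):
    """Return (raw, canonical) location code pairs for each Channel."""
    root = ET.fromstring(xml_text)
    return [
        (ch.get("locationCode"), canonical_location(ch.get("locationCode")))
        for ch in root.iter("Channel")
    ]
```

The parser hands back four different raw values for the same "empty" location ID, and every client would have to carry a rule like `canonical_location` to reconcile them.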
> Of course, in
> total it will take longer, but if specific problematic cases related to
> parsing are identified and discussed, I am sure solutions can be found
> quickly. We have this list, we have skilled and enthusiastic people
> working on this, so why not use this as a platform even for more
> technical discussions? Or how about creating a "developer's corner"
> webservices-devel or so?
Thanks for the suggestions. Technical discussions in other sub-threads.
Chad
> Cheers
> Joachim