[webservices] A question of location ID, how to represent empty IDs in XML?
Chad Trabant
chad at iris.washington.edu
Thu Jul 24 10:48:52 PDT 2014
On Jul 23, 2014, at 11:37 AM, Philip Crotwell <crotwell at seis.sc.edu> wrote:
> Hi
>
> Years ago we had full SEED. Then because of keeping metadata updated,
> we switched to a separation into dataless SEED + miniseed. Now,
> because of the complexities and limitations of dataless SEED, the
> future looks like StationXML + miniseed. I am all for this change, but
> how the location id is resolved really needs to address not just what
> do we do in StationXML, but what do we do in StationXML + miniseed.
>
> I also lean towards "--" for the simple reason that there are so many
> instances where I have been bitten by spaces or nulls. Even though I
> know about this, I still get caught. File names, URLs, user GUI
> displays, etc. all have problems with spaces or nulls, and as a
> practical matter it is harder to see something that isn't there than
> something that is there. Furthermore, using null or space-space is
> really hard as a command line argument in the shell. That said, "--"
> already means "long option name" in many *nix programs, so if we were
> starting from scratch, underscores like "__" might be a better choice.
> The SEED manual already lists underscore as a separate item in the
> flags section (p32), so maybe worth considering.
Hi Philip, thanks for your thoughts.
The underscore character is certainly another option. What I do not like about it is its low readability; in URLs in particular, underscores can become completely lost.
> But if option 3 is chosen, would there be any possibility of amending
> the SEED spec so that "--" is actually valid within the location id
> field, with the caveat that it is synonymous with space-space/null,
> but "--" is the preferred value? I realize that doing a global search
> and replace on a petabyte of miniseed data is probably not going to
> happen, but it would be really nice if whatever location id is in
> StationXML, it is exactly 2 characters and is the exact same 2
> characters as in miniseed.
If the FDSN were to go the route of "--" in StationXML, it seems natural to extend the conversation to potential changes in SEED headers and data records. That is just a bigger can of worms and would take more time to address. The idea that the two should be treated synonymously is just what I have in mind, and it would allow us to transition over time.
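For what it is worth, treating the two spellings as synonyms could look something like the sketch below. It is only an illustration, not anything the DMC or FDSN has adopted: the function names and the EMPTY_LOC_XML constant are made up for this example, and it assumes "--" ends up as the StationXML spelling while SEED keeps the space-padded two-character field.

```python
EMPTY_LOC_XML = "--"   # assumed StationXML spelling of an empty location ID
SEED_LOC_WIDTH = 2     # the SEED location ID is a two-character field


def xml_to_seed_field(loc: str) -> str:
    """Return the left-justified, space-padded SEED field for an XML value."""
    # "--", the empty string, and all-space strings are treated as synonyms.
    if loc == EMPTY_LOC_XML or loc.strip() == "":
        return " " * SEED_LOC_WIDTH
    if len(loc) > SEED_LOC_WIDTH:
        raise ValueError(f"location ID too long for SEED: {loc!r}")
    return loc.ljust(SEED_LOC_WIDTH)


def seed_field_to_xml(field: str) -> str:
    """Return the XML value for a raw two-character SEED field."""
    # Drop trailing padding (spaces, and NULs as an extra assumption).
    trimmed = field.rstrip(" \0")
    return trimmed if trimmed else EMPTY_LOC_XML
```

Under these assumptions an empty location ID round-trips as "--" on the XML side and as two spaces on the SEED side, and non-empty IDs such as "00" pass through unchanged.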
> Frankly the whole idea of making location ids "optional" was a real
> mistake IMHO. I am sure that anyone that has ever written code to
> deal with location ids has something that looks like:
> if (locid == null or locid == "" or locid == " " or locid == "--")
> then locid = "--"
> which is just a painfully stupid thing to have to do over and over and
> over again. Grumble grumble grumble. :(
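One way to keep that check in a single place is a small helper like the sketch below. The names normalize_locid and channel_key, the dotted key layout, and the choice of "--" as the canonical value are assumptions for illustration only; the network and station codes in the note after the code are placeholders.

```python
from typing import Optional

EMPTY_LOC = "--"  # assumed canonical spelling; not settled in this thread


def normalize_locid(locid: Optional[str]) -> str:
    """Map None, "", all-space/NUL strings, and "--" to one canonical value."""
    if locid is None or locid == EMPTY_LOC or locid.strip(" \0") == "":
        return EMPTY_LOC
    # Non-empty IDs only lose their space padding.
    return locid.strip(" ")


def channel_key(net: str, sta: str, loc: Optional[str], cha: str) -> str:
    """Build a comparable channel identifier with a normalized location ID."""
    return ".".join((net, sta, normalize_locid(loc), cha))
```

With this, channel_key("XX", "TEST", "  ", "BHZ") and channel_key("XX", "TEST", "", "BHZ") produce the same key, so a two-space location ID and an empty one compare as the same channel.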
>
> Lastly, as far as I can tell the SEED spec doesn't disallow null/empty
> station or channel codes, so addressing that at the same time might be
> wise.
Indeed, this should be clarified in the SEED spec.
Chad
> My $0.02, please pick one string, and only one string, and use it everywhere.
>
> thanks
> Philip
>
>
> On Wed, Jul 23, 2014 at 1:30 PM, Chad Trabant <chad at iris.washington.edu> wrote:
>>
>> Hello WS users and developers,
>>
>> A recent discussion between FDSN data centers has centered on the representation of empty location IDs in StationXML, the default format returned by the fdsnws-station web service. The DMC may be changing how it represents location IDs in XML and text formats based on these discussions. We are asking for input, as any such change will affect users of our metadata service.
>>
>> Some background: In the SEED channel naming scheme there is a hierarchy of network, station, location and channel identifiers. Of these, it is only the location ID that is commonly accepted to be empty. In the SEED format the location ID is a two-character field, where the value is left justified and padded with spaces if needed. When the value is empty the field is simply two spaces of padding.
>>
>> Historically, and presumably to avoid having an empty location ID, the DMC has represented “empty” location IDs as a string of two spaces. Following this practice, we express this in StationXML by setting the locationCode attribute to a string of two spaces. We have done this for so long that we sometimes forget it is not compliant with a strict reading of SEED; at best it falls into the vagaries of SEED. On the other hand, we have been doing it for years with no apparent problems (in fact it has helpfully avoided an empty core identifier).
>>
>> There now exists another fdsnws-station implementation that returns StationXML with the locationCode attribute set to an empty string when the SEED value is empty. The justification is that this follows the SEED rules of trimming the padding spaces from the values.
>>
>> Unfortunately this means there are now flavors of StationXML that are incompatible in the core channel name identifiers. In other words, two StationXML documents for the same SEED channel appear, without extra field translation, to be different channels.
>>
>> As most of you are users of SEED and StationXML metadata (at some level) and some of you have written code to parse these formats and manage the data returned by the DMC and other FDSN data centers, we are asking for your input regarding the potential solutions.
>>
>> Here are the options being considered for mapping an empty location ID in SEED to StationXML:
>>
>> 1) Set locationCode to two spaces. While the DMC and users have been using this for a long while, it is not precisely the SEED value (but the mapping could be formalized). Also, whitespace in attributes does have some theoretical challenges: the wonky rules for XML attributes related to whitespace handling require removal of spaces in some cases (we have never heard of problems though).
>>
>> 2) Set locationCode to an empty string. This would match the strict value present in SEED, an empty identifier.
>>
>> 3) Set locationCode to “--” (two dashes). This avoids issues with whitespace in XML attribute values and avoids issues with an empty identifier. Also, this matches the request mechanisms where “--” is accepted as a synonym for an empty location ID.
>>
>> All of these solutions are viable in that we can make them work in code; it is a matter of choosing one for future FDSN metadata. Pick your poison, so to speak.
>>
>> In my personal opinion, an empty location ID is an unfortunate quirk of SEED that we should rectify in StationXML. An empty identifier can be confused for “unknown” if the programmer is not careful, which is semantically very different from “set to empty”. The two-space strings that the DMC is currently using are also not ideal; they are hard for humans to read and potentially weird with XML rules. The dashed location ID avoids these issues but requires the most change. I also think requiring all readers of StationXML to translate (e.g. remove padding) is a bad idea; the values in SEED should be uniquely mapped to values in StationXML.
>>
>> Thanks for reading this far. Your opinions and input are appreciated.
>>
>> regards,
>> Chad
>>
>>
>> _______________________________________________
>> webservices mailing list
>> webservices at iris.washington.edu
>> http://www.iris.washington.edu/mailman/listinfo/webservices