[webservices] A question of location ID, how to represent empty IDs in XML?

Wed Jul 30 23:49:09 PDT 2014

On Jul 28, 2014, at 6:54 AM, Lion Krischer <krischer at geophysik.uni-muenchen.de> wrote:

> Hi all,
> 
> leaving the greater issues aside: why not just force the location code to have a certain form with a regex in the schema?

Hi Lion,

We should definitely add the rules to the schema; we just need to decide what they are!

> The following regular expression will match any two-character uppercase alphanumeric code, or exactly two spaces:
> 
> ^([A-Z0-9]{2}|  )$
> 
> It matches "AA", "00", "10", "A1", "  ", ...
> but not "--", "", "-", "a1", ...
> 
> Everything else will be rejected. Then one can be sure that it is consistent everywhere (assuming people test their web services against the schema, which is a good idea in any case). Similar regexes should also be defined for the network, station, and channel codes to ensure compatibility with SEED. In general it would be a good idea to have the schema enforce as many things as possible and leave little to no room for interpretation.
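For what it's worth, the proposed pattern is easy to check against the examples above. A minimal sketch with Python's `re` module follows; `re.fullmatch` anchors at both ends, so the `^` and `$` are not needed there. One caveat if this goes into the schema: XML Schema `pattern` facets are implicitly anchored and do not treat `^` and `$` as anchors, so the facet value would presumably be just `[A-Z0-9]{2}|  `. The extra test strings " 0" and "000" are my own additions.

```python
import re

# Lion's proposed rule: exactly two uppercase letters/digits, or exactly two spaces.
# re.fullmatch requires the whole string to match, like wrapping it in ^...$.
LOCATION_RE = re.compile(r"[A-Z0-9]{2}|  ")

def is_valid_location(code):
    """Return True if code matches the proposed location-code pattern."""
    return LOCATION_RE.fullmatch(code) is not None

accepted = ["AA", "00", "10", "A1", "  "]
rejected = ["--", "", "-", "a1", " 0", "000"]

for code in accepted + rejected:
    print(repr(code), is_valid_location(code))
```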
> 
> 
> Now whether one uses two spaces, two dashes, an empty string, or whatnot for an empty location code does not really matter. All are syntactically valid XML, and thus any parser can be expected to deal with them. Consistency is by far the most important thing in my opinion, so it is best to choose one and enforce it with the schema. This will reduce errors and misinterpretations in the long run.
> 
> In terms of existing StationXML parsers, I assume most just strip whitespace from the location code, and thus "" and "  " should already work, resulting in minimal disruption to users' workflows.

Actually, this does not appear to be happening: in the parsers I've used, the whitespace is not stripped. I have read through the XML specifications until my eyes crossed, trying to understand why this would be the case. Then I wrote some test cases and observed no trimming; see the test data and code below. Perhaps the attribute is being treated as CDATA (the XML default for attributes not declared in a DTD), whose values are never trimmed? I think we are stuck with the fact that an empty string and two spaces are different.

Has anyone observed this automatic trimming on any system?
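The same thing can be reproduced without a file. This sketch (Python 3, standard library only) parses minimal `Channel` elements from strings and shows that the two-space and empty values come back unchanged and unequal. It also shows XML attribute-value normalization as done by expat, the parser behind `xml.etree`: literal tabs and newlines inside an attribute are turned into spaces, but nothing is trimmed, which is what the spec requires for CDATA attributes.

```python
from xml.etree import ElementTree

def location_of(xml_text):
    """Parse a single <Channel> element and return its raw locationCode."""
    return ElementTree.fromstring(xml_text).get("locationCode")

two_spaces = location_of('<Channel code="BHZ" locationCode="  "/>')
empty = location_of('<Channel code="BHZ" locationCode=""/>')
# The Python literal below puts a real tab and a real newline in the attribute.
tab_newline = location_of('<Channel code="BHZ" locationCode="\t\n"/>')

print(repr(two_spaces))     # '  '
print(repr(empty))          # ''
print(repr(tab_newline))    # '  ' (each whitespace char becomes a space; none removed)
print(two_spaces == empty)  # False
```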

> "--" would require software to be updated, looks a little weird in my opinion, and unsuspecting users might interpret it as an invalid location code.

Yes, it would require software changes; the question is whether what we would gain is worth it. Maybe it looks a little weird, but "--" is already becoming synonymous with an empty location ID in the minds of many, because it is used for selecting empty SEED location IDs in web service requests.
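Until one form is fixed in the schema, a client that wants to treat all of these spellings as the same empty location ID could normalize them on input. The helper below is hypothetical (not part of any FDSN specification or library); it maps "", "  ", and "--" to a single canonical value, and choosing the empty string as that value is an assumption made only for illustration.

```python
# Hypothetical helper: canonicalize the competing spellings of an empty
# SEED location ID. Using "" as the canonical form is an illustrative choice.
EMPTY_LOCATION_SPELLINGS = {"", "  ", "--"}

def canonical_location(code):
    """Return "" for any spelling of an empty location ID, else the code unchanged."""
    return "" if code in EMPTY_LOCATION_SPELLINGS else code

for raw in ["", "  ", "--", "00", "10"]:
    print(repr(raw), "->", repr(canonical_location(raw)))
```

This only helps on the reading side; anyone writing StationXML still has to pick one form, which is the argument for fixing it in the schema.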

Chad

PS. Here is my test data:

------- chan.xml
<FDSNStationXML schemaVersion="1.0">
 <Channel locationCode="  " startDate="2012-03-12T20:28:00" restrictedStatus="open" endDate="2599-12-31T23:59:59" code="BHZ">
 </Channel>
</FDSNStationXML>
-------

Here is a test with Python:
-------
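from xml.etree import ElementTree

# Parse the test file shown above.
with open('chan.xml', 'rt') as f:
    tree = ElementTree.parse(f)

# Find the <Channel> child of the root element.
node = tree.find('./Channel')
print(node.tag)
# Print every attribute exactly as the parser returned it.
for name, value in sorted(node.attrib.items()):
    print('  %-4s = "%s"' % (name, value))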
-------

which produces:
-------
Channel
  code = "BHZ"
  endDate = "2599-12-31T23:59:59"
  locationCode = "  "
  restrictedStatus = "open"
  startDate = "2012-03-12T20:28:00"
-------

No trimming.

Here is a test with Perl:
-------
use strict;
use warnings;
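use XML::Simple;     # simple XML-to-hash parser
use Data::Dumper;    # dumps the resulting data structure

my $file = 'chan.xml';

# Parse the file into a nested hash and print it, attribute values included.
my $test_data = XMLin($file);
print Dumper($test_data);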
-------

which produces:
-------
$VAR1 = {
          'schemaVersion' => '1.0',
          'Channel' => {
                       'locationCode' => '  ',
                       'endDate' => '2599-12-31T23:59:59',
                       'restrictedStatus' => 'open',
                       'startDate' => '2012-03-12T20:28:00',
                       'code' => 'BHZ'
                     }
        };
-------

No trimming.

There are many more parsing options for Perl, Python, and other languages, of course, but this is pretty basic stuff. It is how a user such as myself would go about parsing and using StationXML.
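The test file above leaves out the XML namespace and the Network/Station nesting for brevity. Real StationXML documents declare a default namespace (for schema version 1.0, `http://www.fdsn.org/xml/station/1`), so lookups in `xml.etree` need namespace-qualified tags. A minimal sketch, assuming that namespace and a made-up, trimmed sample document that is not schema-valid:

```python
from xml.etree import ElementTree

NS = "http://www.fdsn.org/xml/station/1"  # StationXML 1.x default namespace

# Illustrative sample only: required elements such as Latitude/Longitude are
# omitted, so this is not schema-valid; it just shows nesting and namespace.
SAMPLE = """<FDSNStationXML xmlns="http://www.fdsn.org/xml/station/1" schemaVersion="1.0">
  <Network code="IU">
    <Station code="ANMO">
      <Channel code="BHZ" locationCode="00"/>
      <Channel code="BHZ" locationCode="  "/>
      <Channel code="LHZ" locationCode=""/>
    </Station>
  </Network>
</FDSNStationXML>"""

root = ElementTree.fromstring(SAMPLE)

# iter() walks the whole tree; element tags must be given as {namespace}local.
# Attributes such as locationCode are unqualified, so get() needs no namespace.
channels = [
    (ch.get("code"), ch.get("locationCode"))
    for ch in root.iter("{%s}Channel" % NS)
]

for code, loc in channels:
    print(code, repr(loc))
```

An explicit path works too, e.g. `root.findall('s:Network/s:Station/s:Channel', {'s': NS})`; either way the location codes come back exactly as written.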