Thread: some comments on new StationXML

Started: 2011-10-27 19:20:15
Last activity: 2011-10-27 22:43:58
Topics: Web Services
Philip Crotwell
2011-10-27 19:20:15
Hi

Time has gotten away from me, but I will try to read the new schema
more carefully soon and send in any further comments I have. But I do
have three so far.

First, please consider some sort of versioning in the schema. I
believe I posted a comment some time back about this, but for an XML
schema to be useful, you need to be able to pair an instance XML
document with the schema that it uses. My understanding is that this
is usually accomplished by adding a version of some kind to the
namespace URL. Currently the namespace for StationXML is:
http://www.data.scec.org/xml/station/
and as far as I can tell has always been that in spite of several
revisions. This means that if I have two stationXML instances, one
from yesterday and one from the day after you release this new schema,
there will be no way for an application to decide which schema to use
to validate the document. You might want to change the namespace to be
something like:
http://www.data.scec.org/xml/station/2.0
or
http://www.data.scec.org/xml/station/2011
and then store the appropriate schema file within that directory on
the web server for easy access, i.e. at
http://www.data.scec.org/xml/station/2011/station.xsd

Along with that, old versions of the schema should be kept online so
that old xml instances can still be validated against their version of
the schema.

There are probably other ways of versioning, so perhaps look around,
but please, please use some type of version. There needs to be
something in both the StationXML schema and in a StationXML instance
document that gives the appropriate version of the schema for a
client to validate and parse against.
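
As a rough illustration of the point above, here is a minimal sketch of how a client could pick a schema once the namespace carries a version. The lookup table and the StaMessage root element name are assumptions made for the example; the 2011 namespace is only the one proposed in this message, not a published one.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: versioned namespace -> schema file kept on the web server.
# The single entry mirrors the layout proposed in this message.
SCHEMA_BY_NAMESPACE = {
    "http://www.data.scec.org/xml/station/2011":
        "http://www.data.scec.org/xml/station/2011/station.xsd",
}


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or "" if it has none."""
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return ""


def schema_for(xml_text):
    """Return the schema URL for an instance document, or None if unknown."""
    return SCHEMA_BY_NAMESPACE.get(root_namespace(xml_text))


# Root element name is assumed for illustration.
doc_versioned = '<StaMessage xmlns="http://www.data.scec.org/xml/station/2011"/>'
doc_unversioned = '<StaMessage xmlns="http://www.data.scec.org/xml/station/"/>'

print(schema_for(doc_versioned))
print(schema_for(doc_unversioned))
```

With the current unversioned namespace the lookup has nothing to key on, which is the problem described above.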

Second, this is just my opinion and I have not looked at what you have
done with <any> elements in the schema, but I would be very careful
about this. While the notion of an "any" is powerful and seems to
allow great flexibility, it has a real downside because the contents
of the "any" are no longer StationXML and hence are harder to parse and
extract without additional information. It appears you are using the
"any" in a very limited manner, so you have probably already considered
this, but I just wanted to sound a warning.
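
To make the concern concrete, here is a small sketch of what a client has to do when an element may carry "any" content: separate the children it understands from the foreign ones. The Lat and Vault element names and the example.org extension namespace are invented for the example and are not taken from the schema.

```python
import xml.etree.ElementTree as ET

STATION_NS = "http://www.data.scec.org/xml/station/"


def split_children(element, known_ns=STATION_NS):
    """Split direct children into (known namespace, foreign namespace) lists."""
    prefix = "{" + known_ns + "}"
    known, foreign = [], []
    for child in element:
        if child.tag.startswith(prefix):
            known.append(child)
        else:
            foreign.append(child)
    return known, foreign


# Hypothetical instance fragment with one extension element.
sample = """<Station xmlns="http://www.data.scec.org/xml/station/"
                     xmlns:x="http://example.org/ext">
  <Lat>34.0</Lat>
  <x:Vault>borehole</x:Vault>
</Station>"""

known, foreign = split_children(ET.fromstring(sample))
print([c.tag for c in known])
print([c.tag for c in foreign])
```

The foreign list is exactly the part a generic StationXML client cannot interpret without extra information about the extension.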

My third comment is not likely to happen, but I will put it
out there anyway: consider using RELAX NG instead of XML Schema. RELAX NG
is so much nicer to read and has really useful features that XML Schema
lacks. For example, RELAX NG has the notion of "interleave"
elements, so you can specify that the contents of a Network element,
for example, have to contain a <startDate>, an <endDate> and a
<description>, but that the order does not matter. This is a more
natural way to think of the concept of a "network" than an
XML Schema <sequence>, where order is required for validity. Of course
order does matter sometimes, but for the bulk of data-centric XML
the order requirement is simply irrelevant, and yet the schema
requires it to be matched for validation. This has come up recently
because the IRIS ws/station web service generates XML with
<SelectedNumberStations> coming after the <Station> elements. I can
deal with the elements being out of order, but it causes validation
errors that prevent me from validating the output of the station web
service to check for other, more serious validation issues. As I said,
I realize switching from XML Schema to RELAX NG would be a big change,
but I thought I might as well toss the idea into the ring. I would be
willing to help with this should you choose to make the translation.
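
The difference between the two content models can be sketched in a few lines. This is not a validator, just a toy comparison of an order-sensitive check (in the spirit of a <sequence>) with an order-insensitive one (in the spirit of interleave), using the three child names from the Network example above.

```python
import xml.etree.ElementTree as ET

REQUIRED = ["startDate", "endDate", "description"]


def matches_sequence(element, names=REQUIRED):
    """Order-sensitive: children must appear exactly in the listed order."""
    return [child.tag for child in element] == list(names)


def matches_interleave(element, names=REQUIRED):
    """Order-insensitive: each listed child exactly once, in any order."""
    return sorted(child.tag for child in element) == sorted(names)


in_order = ET.fromstring(
    "<Network><startDate/><endDate/><description/></Network>")
shuffled = ET.fromstring(
    "<Network><description/><startDate/><endDate/></Network>")

print(matches_sequence(in_order), matches_interleave(in_order))
print(matches_sequence(shuffled), matches_interleave(shuffled))
```

The shuffled document carries the same information but only passes the order-insensitive check, which is the situation described for <SelectedNumberStations>.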

More info on RELAX NG here:
relaxng.org
and
http://books.xmlschemata.org/relaxng/

thanks,
Philip

On Fri, Oct 21, 2011 at 11:55 AM, Ellen Yu <eyu<at>gps.caltech.edu> wrote:
Philip,

We are hoping that we will only need to make minor revisions.  We definitely
would like any feedback, as we realize there may be unforeseen issues that
will only be shaken out as people start to use the format.

We are going to release a new version that has <any> elements to allow
people to add additional information not included in StationXML.  You can
find it at
http://www.data.scec.org/xml/station/20111019/station.xsd


Regards,

Ellen



  • Shang-Lin Chen
    2011-10-27 22:43:58
    Hi Philip,

    We store versions of the schema in subdirectories named for the release
    date. However, we were also storing a copy of the current version of the
    schema at http://www.data.scec.org/xml/station/station.xsd, which was
    the copy linked by the StationXML web page. With the October 19, 2011,
    revision, we've switched to linking directly to the date directory,
    http://www.data.scec.org/xml/station/20111019/. Documents that specify
    http://www.data.scec.org/xml/station/20111019/station.xsd as the
    schemaLocation should not break with future releases.
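
A minimal sketch of how a client could read the dated schema URL back out of an instance document, assuming the document sets xsi:schemaLocation on its root element. The StaMessage root element name is an assumption for the example.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def schema_locations(xml_text):
    """Return {namespace: schema URL} from the root's xsi:schemaLocation."""
    root = ET.fromstring(xml_text)
    tokens = root.get("{" + XSI_NS + "}schemaLocation", "").split()
    # The attribute holds whitespace-separated (namespace, location) pairs.
    return dict(zip(tokens[0::2], tokens[1::2]))


# Root element name is assumed for illustration.
sample = (
    '<StaMessage xmlns="http://www.data.scec.org/xml/station/" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.data.scec.org/xml/station/ '
    'http://www.data.scec.org/xml/station/20111019/station.xsd"/>'
)

print(schema_locations(sample))
```

Note that schemaLocation is only a hint supplied by the document author, so this identifies the schema version only for documents that include it.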

    Past revisions of the schema are listed on
    http://www.data.scec.org/station/xml_old_versions.html (previously
    http://www.data.scec.org/xml/station/old_versions.html, which now
    forwards to the new page).

    Shang-Lin

    --
    Shang-Lin Chen
    Southern California Earthquake Data Center (SCEDC)
    System administrator/Programmer
    http://www.data.scec.org

