Thread: Re: Subroutine interface to SAC XML datasets

Started: 2008-01-31 17:59:10
Last activity: 2008-01-31 23:04:48
Topics: SAC Developers
George Helffrich
2008-01-31 17:59:10
Dear All -

The key idea here, which is a good one, is subgroupings of the
information in the header: 1) station information; 2) event
information; 3) data characteristics. A fourth item, not well-served
by the present SAC file structure, is more complete response
information.

Whether you express header information as <stel>500</stel> or as
<h name="stel">500</h> is a stylistic choice. The DTD description is
more concise in the latter case, since a single <h> element with a
name attribute covers every header field.
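A minimal sketch of why the choice is stylistic: both spellings carry the same field name and value, so a reader can normalize either one to the same name/value pairs. The <header> wrapper and the function name header_values are illustrative assumptions, not part of any published SAC XML DTD.

```python
import xml.etree.ElementTree as ET

# Two hypothetical <header> fragments holding the same value, one per spelling.
NAMED_ELEMENT = "<header><stel>500</stel></header>"
GENERIC_ELEMENT = '<header><h name="stel">500</h></header>'

def header_values(xml_text):
    """Return {field name: text} for either spelling of a header entry."""
    values = {}
    for child in ET.fromstring(xml_text):
        if child.tag == "h":
            # Generic form: the field name lives in the 'name' attribute.
            values[child.get("name").lower()] = child.text
        else:
            # Named form: the element tag itself is the field name.
            values[child.tag.lower()] = child.text
    return values

print(header_values(NAMED_ELEMENT) == header_values(GENERIC_ELEMENT))  # True
```

The difference shows up in the schema rather than the data: the named form needs one element declaration per header field, while the generic form needs a single `h` element plus a `name` attribute declaration.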

On 31 Jan 2008, at 09:34, James Wookey wrote:

Hi Rob, George;

I can see the point that the data format currently proposed is a terse
one, basically a minimalist description of a set of SAC traces. This
does have some significant advantages: it is efficient in file size,
and it provides a direct connection to the header variables which,
after all, SAC users (as well as programmers) still have to refer to
by their short names. I don't think we should go the KML route, which,
as George says, makes my eyes water with all the detail it requires.
As a 'consultation' format for SACML it is well designed, as
it is simple to understand and is conceptually very close to the
binary SAC format, and George has done sterling work implementing it.

However, in the longer term, I can also see the value in a limited
expansion of the structure of SACML, if it is going to represent a
large step forward in the SAC file format. If we are going to pay the
price of adopting a verbose format like XML (and I think we should) we
might as well try to reap some of the rewards, and also build enough
flexibility into the format to allow future additions (even if the
present input routines ignore them; being able to do that is one of
the advantages of XML). It seems to
me that one thing worth considering is structuring the header. So one
possible format might look like:

<sacdataset>
  <trace>
    <header>
      <station>
        <kstnm>TEST</kstnm>
        <stla>40</stla>
        ...
      </station>
      <event>
        <evla>-20</evla>
        ...
      </event>
      <trace_info>
        <delta>0.05</delta>
        ...
      </trace_info>
    </header>
    <data>
      ...
    </data>
  </trace>
</sacdataset>

This has the advantage of still being easy to read in one sweep with
event-driven parsers like SAX (because you simply ignore the container
elements), while also providing a more object-oriented format for use
with DOM parsers or XPath queries. We might also want to allow a
subgrouping of traces within the file: a <tracegroup> container
element, for example.
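A sketch of both reading styles applied to the grouped layout above, with the elided "..." entries left out. It assumes samples are stored in <data> as whitespace-separated text, and that the set of container element names is known in advance; both are assumptions for illustration. The SAX handler flattens the header in one pass by ignoring containers, while ElementTree's limited XPath support navigates the groups directly.

```python
import xml.sax
import xml.etree.ElementTree as ET

GROUPED = """<sacdataset>
  <trace>
    <header>
      <station><kstnm>TEST</kstnm><stla>40</stla></station>
      <event><evla>-20</evla></event>
      <trace_info><delta>0.05</delta></trace_info>
    </header>
    <data>0.0 1.0 0.5</data>
  </trace>
</sacdataset>"""

# Grouping elements that carry no value of their own (assumed list).
CONTAINERS = {"sacdataset", "trace", "header", "station", "event", "trace_info"}

class FlatHeaderHandler(xml.sax.ContentHandler):
    """Collect leaf header values in one pass, skipping container elements."""

    def __init__(self):
        super().__init__()
        self.header = {}
        self.samples = []
        self._current = None   # leaf element currently being read, if any
        self._text = []

    def startElement(self, name, attrs):
        if name not in CONTAINERS:
            self._current = name
            self._text = []

    def characters(self, content):
        # Whitespace between container tags arrives while _current is None.
        if self._current is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == self._current:
            text = "".join(self._text).strip()
            if name == "data":
                self.samples = [float(x) for x in text.split()]
            else:
                self.header[name] = text
            self._current = None

handler = FlatHeaderHandler()
xml.sax.parseString(GROUPED.encode(), handler)
print(handler.header)  # {'kstnm': 'TEST', 'stla': '40', 'evla': '-20', 'delta': '0.05'}

# Tree-based access: walk the grouping with a path expression.
root = ET.fromstring(GROUPED)
print(root.find("trace/header/station/stla").text)       # 40
print([e.tag for e in root.find("trace/header/event")])  # ['evla']
```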

Cheers,

James

On 31 Jan 2008, at 08:38, George Helffrich wrote:

Dear Rob -

The XML DTD is versioned, and one could imagine defining a new DTD
with alternative element groupings that would reflect the data
structure. One can obsess over describing data details, and one view
won't necessarily coincide with another person's or community's view.
Google Earth's KML comes to mind -- unbelievably baroque for putting
points on a map for a seismologist, but probably glibly expressive
for GISers.

Another view to take of the data is programming semantics, however.
Programmers see 1) header variables that are peeked and poked at; 2)
data. That was the view I took of the present DTD definition.
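The programmer's view described above (header variables that are peeked and poked, plus data) can be sketched as a small in-memory object that serializes to a flat h/d layout. The class name Trace, the methods peek and poke, and the exact serialized form are hypothetical illustrations, not the MacSAC subroutine interface or the actual DTD.

```python
import xml.etree.ElementTree as ET

class Trace:
    """Hypothetical trace: named header variables plus a data array."""

    def __init__(self):
        self.header = {}   # e.g. {"delta": 0.05, "kstnm": "TEST"}
        self.data = []

    def peek(self, name):
        """Read a header variable by its SAC short name (case-insensitive)."""
        return self.header[name.lower()]

    def poke(self, name, value):
        """Set a header variable by its SAC short name (case-insensitive)."""
        self.header[name.lower()] = value

    def to_xml(self):
        """Serialize as <trace> with <h name=...> entries and one <d> (assumed layout)."""
        trace = ET.Element("trace")
        for name, value in self.header.items():
            ET.SubElement(trace, "h", name=name).text = str(value)
        ET.SubElement(trace, "d").text = " ".join(str(x) for x in self.data)
        return ET.tostring(trace, encoding="unicode")

t = Trace()
t.poke("DELTA", 0.05)
t.data = [0.0, 1.0]
print(t.to_xml())  # <trace><h name="delta">0.05</h><d>0.0 1.0</d></trace>
```

In this view the XML is just a serialization of a name-to-value table, which is why a flat element list maps naturally onto it.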

On 31 Jan 2008, at 00:12, Robert Casey wrote:


Hi George-

An interesting effort with SAC XML and you've made a lot of
progress from the looks of it. I hate to comment too harshly on
something that may already be an established standard, so my
comments are only meant as an observation:

It seems to me that the SAC XML format only half-divorces itself
from fixed-format files due to the naming scheme for the header
elements you've provided. Essentially, you've got just two element
names inside of <trace> that have no semantic quality to them: 'h'
and 'd'.

In XML, elements tend to be grouped and to have self-descriptive
names. So instead of having <h name="STEL">, why could it not
instead be <STEL>? To indicate that this is a subcomponent of the
SAC header of a SAC trace, you'd form a hierarchy:

<sacdataset>
  <trace>
    <header>
      <stel>

Even better would be to just call it <elevation>, maybe with a
reference to its SAC field abbreviation as an attribute:
<elevation id="STEL">.

The reason for the comment is not just human readability but the
fact that many XML parsers treat such elements as objects that carry
the element name with them. Having your fields broken down into
meaningful names means that your objects will be more independent and
have stronger encapsulation properties.
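A sketch of the suggested <elevation id="STEL"> form, in which each element becomes an object that keeps both its descriptive name and its SAC abbreviation. The HeaderField dataclass and parse_fields function are hypothetical; the second field, <latitude id="STLA">, is added only for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class HeaderField:
    """One header value with its descriptive name and its SAC abbreviation."""
    name: str      # descriptive element name, e.g. "elevation"
    sac_id: str    # SAC short name from the id attribute, e.g. "STEL"
    value: float

def parse_fields(xml_text):
    """Turn each child of <header> into a HeaderField object."""
    return [HeaderField(elem.tag, elem.get("id"), float(elem.text))
            for elem in ET.fromstring(xml_text)]

HEADER = ('<header><elevation id="STEL">500</elevation>'
          '<latitude id="STLA">40</latitude></header>')

by_sac = {f.sac_id: f for f in parse_fields(HEADER)}
print(by_sac["STEL"].name)  # elevation
```

One caution if this form is adopted: if a DTD declared `id` with the XML attribute type ID, its values would have to be unique across the whole document, so a dataset with many traces each carrying STEL would be invalid. Declaring the attribute as CDATA (or giving it another name) avoids that.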

If your example format is already set in stone, then please
continue with what works. I just felt the floor was open to address
some naming aesthetics for consideration. I can imagine that
writing a parsing engine for XML in Fortran is headache enough, so I
don't want to cause you a migraine on top of it.

Cheers,

-Rob

On Jan 30, 2008, at 5:05 AM, George Helffrich wrote:

Dear All -

I designed and implemented a subroutine interface to SAC XML
datasets in the latest release of MacSAC. This message is to make
you aware of the design ideas, for architectural comment. I think
that it shows the way forward for how SAC can move away from a
purely binary data format to one that embraces current practice in
structuring and delivering data.

A test program illustrates the concepts. Here is the Fortran source
code of an actual program used for testing during development:

George Helffrich
george<at>geology.bristol.ac.uk


George Helffrich
george<at>geology.bristol.ac.uk


  • Brian Savage
    2008-01-31 17:46:08
    George

    I think moving away from the original header names might be a good idea.
    Header names such as stlo, stla and kstnm can be cryptic on first
    encounter. I would suggest more readable names:
    <station latitude="" longitude="" elevation="" name="" network=""></station>
    or
    <station>
      <latitude></latitude>
      <name></name>
      ....
    </station>
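A sketch showing that the two station layouts suggested above can be read into the same dictionary, so a reader need not care which one a file uses. The sample values (-120, XX, TEST) are placeholders, and station_dict is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ATTRIBUTE_FORM = ('<station latitude="40" longitude="-120" elevation="500" '
                  'name="TEST" network="XX"/>')
ELEMENT_FORM = """<station>
  <latitude>40</latitude><longitude>-120</longitude><elevation>500</elevation>
  <name>TEST</name><network>XX</network>
</station>"""

def station_dict(xml_text):
    """Read a <station> written with attributes, child elements, or a mix."""
    station = ET.fromstring(xml_text)
    values = dict(station.attrib)
    for child in station:
        values[child.tag] = (child.text or "").strip()
    return values

print(station_dict(ATTRIBUTE_FORM) == station_dict(ELEMENT_FORM))  # True
```

The attribute form is more compact, but an attribute cannot contain nested elements or appear twice on the same element, so child elements leave more room for richer content later (for example, fuller response information).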

    Also, would it be possible to include references to SAC binary
    files, as they are now, in the XML format?
    <sacdataset>
      <trace file="PAS.CI.BHZ.SAC" />
      <trace file="PAS.CI.BHE.SAC" />
      <trace file="PAS.CI.BHN.SAC" />
      <trace id="HRV.IU.BHZ.SAC">
        <header>
        ...
        </header>
        <data>
        ...
        </data>
      </trace>
    </sacdataset>
    This would allow the data to be stored either in the current
    (binary) format or in the XML file.
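A sketch of how a reader might separate the two kinds of <trace> in such a mixed dataset: references to external binary SAC files and traces stored inline. It assumes a trace is a reference exactly when it has a file attribute; classify_traces is a hypothetical helper and does not read the binary files themselves.

```python
import xml.etree.ElementTree as ET

DATASET = """<sacdataset>
  <trace file="PAS.CI.BHZ.SAC"/>
  <trace file="PAS.CI.BHE.SAC"/>
  <trace id="HRV.IU.BHZ.SAC"><header/><data>0.0 1.0</data></trace>
</sacdataset>"""

def classify_traces(xml_text):
    """Split traces into external binary-file references and inline traces."""
    external, inline = [], []
    for trace in ET.fromstring(xml_text).findall("trace"):
        if trace.get("file") is not None:
            external.append(trace.get("file"))  # data stays in the binary SAC file
        else:
            inline.append(trace.get("id"))      # header and data live in the XML
    return external, inline

print(classify_traces(DATASET))
# (['PAS.CI.BHZ.SAC', 'PAS.CI.BHE.SAC'], ['HRV.IU.BHZ.SAC'])
```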


    Cheers,
    Brian

    _______________________________________________
    sac-dev mailing list
    sac-dev<at>iris.washington.edu
    http://www.iris.washington.edu/mailman/listinfo/sac-dev



    • George Helffrich
      2008-01-31 23:04:48
      Dear Brian -

      Indeed, header field names are cryptic and could be improved on.
      With a clearer idea of the intended users and/or viewers of the
      elements, the naming scheme should become clearer.

      I'm not sure about <trace file="PAS.CI.BHN.SAC" />, but it is an
      interesting idea. It is analogous to a Unix symbolic link, with
      similar semantic confusion (stat/lstat). Here it raises the issues of
      1) whether a dataset is entirely self-contained; and 2) what it means
      to "write" a dataset that contains links as well as trace data.
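The symbolic-link analogy suggests two views of a dataset, sketched below under stated assumptions. The lstat-like view reports whether any trace is only a file reference; the stat-like view follows each reference and produces a self-contained copy. The load_trace callback stands in for a real binary SAC reader and is hypothetical, as are both function names. Only top-level traces are inlined; a nested <tracegroup> would need recursion.

```python
import xml.etree.ElementTree as ET

def is_self_contained(root):
    """lstat-style view: True if no <trace> anywhere refers to an external file."""
    return all(t.get("file") is None for t in root.iter("trace"))

def inline_links(root, load_trace):
    """stat-style view: return a copy in which each top-level file reference is
    replaced by the <trace> Element that load_trace builds from the file name."""
    resolved = ET.fromstring(ET.tostring(root))  # copy; the original is untouched
    for i, trace in enumerate(list(resolved)):
        if trace.tag == "trace" and trace.get("file") is not None:
            resolved[i] = load_trace(trace.get("file"))
    return resolved

def fake_loader(name):
    # Stand-in for a real binary SAC reader (hypothetical).
    trace = ET.Element("trace", id=name)
    ET.SubElement(trace, "data").text = "0.0"
    return trace

root = ET.fromstring('<sacdataset><trace file="PAS.CI.BHZ.SAC"/>'
                     '<trace id="X"><data>1.0</data></trace></sacdataset>')
print(is_self_contained(root))                             # False
print(is_self_contained(inline_links(root, fake_loader)))  # True
```

Writing a dataset then comes down to picking one of these views: writing the links unchanged (like copying a symlink) or inlining first (like following it). The thread leaves that choice open.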

      George Helffrich
      george<at>geology.bristol.ac.uk

