SAGE: Thread: Re: Subroutine interface to SAC XML datasets

Started: 2008-01-31 17:59:10

Last activity: 2008-01-31 23:04:48

Topics: SAC Developers

George Helffrich

Re: Subroutine interface to SAC XML datasets

2008-01-31 17:59:10

Dear All -

The key idea here, which is a good one, is subgroupings of the
information in the header: 1) station information; 2) event
information; 3) data characteristics. A fourth item, not well-served
by the present SAC file structure, is more complete response
information.

Whether you express header information by <stel>500</stel> or <h
name="stel">500</h> is a stylistic choice. The DTD description is more
concise in the latter case.
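To make the equivalence concrete, here is a minimal Python sketch. It is not part of MacSAC; the function name `header_to_dict` and the two sample snippets are illustrative assumptions. It reads both spellings into the same dictionary with the standard-library ElementTree parser.

```python
import xml.etree.ElementTree as ET

# Two hypothetical encodings of the same header values.
NAMED_ELEMENTS = "<header><stel>500</stel><stla>40</stla></header>"
GENERIC_ELEMENTS = (
    '<header><h name="stel">500</h><h name="stla">40</h></header>'
)


def header_to_dict(xml_text):
    """Return {field name: text} for either encoding."""
    fields = {}
    for child in ET.fromstring(xml_text):
        # Generic form: the field name lives in the 'name' attribute.
        # Named form: the field name is the element tag itself.
        key = child.get("name", child.tag)
        fields[key.lower()] = (child.text or "").strip()
    return fields


if __name__ == "__main__":
    print(header_to_dict(NAMED_ELEMENTS))
    print(header_to_dict(GENERIC_ELEMENTS))
```

With the `<h name="...">` form a DTD needs only one element declaration to cover every header field, which is presumably the conciseness referred to above; the named form needs one declaration per field.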

On 31 Jan 2008, at 09:34, James Wookey wrote:

Hi Rob, George;

I can see the point that the data format currently proposed is a terse
one, basically a minimalist description of a set of SAC traces. This
does have some significant advantages: it is efficient in file size,
and it provides a direct connection to the header variables which,
after all, SAC users (as well as programmers) still have to refer to
by their short name. I don't think we should go the KML route, which
as George says, makes my eyes water with all the detail that is
required. As a 'consultation' format for SACML it is well designed, as
it is simple to understand and is conceptually very close to the
binary SAC format, and George has done sterling work implementing it.

However, in the longer term, I can also see the value in a limited
expansion of the structure of SACML, if it is going to represent a
large step forward in the SAC file format. If we are going to pay the
price of adopting a verbose format like XML (and I think we should) we
might as well try to reap some of the rewards, and also build enough
flexibility into the format to allow incorporation of future things
(even if they are currently ignored by the current input routines -
the ability to do that is one of the advantages of XML). It seems to
me that one thing worth considering is structuring the header. So one
possible format might look like:

<sacdataset>
<trace>
<header>
<station>
<kstnm>TEST</kstnm>
<stla>40</stla>
...
</station>
<event>
<evla>-20</evla>
...
</event>
<trace_info>
<delta>0.05</delta>
...
</trace_info>
</header>
<data>
...
</data>
</trace>
</sacdataset>

This has the advantage of still being easy to 'one-sweep' read with
event-driven parsers like SAX (because you simply ignore the container
elements), plus providing a more object-oriented format for use with
parsers like DOM or xpath. We might also want to include/allow a
subgrouping of traces within the file: a <tracegroup> container
element for example.
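A rough Python sketch of the 'one-sweep' idea, using the standard-library `xml.sax` module. The handler class, the `read_headers` helper, and the sample document are assumptions made for illustration, not an existing SAC interface. Container elements are simply skipped, so the reader ends up with the same flat set of header fields it would get from the terse format.

```python
import xml.sax

# Container elements of the structured layout; the reader ignores them
# and treats every other element inside a <trace> as a header field.
CONTAINERS = {"sacdataset", "tracegroup", "trace", "header",
              "station", "event", "trace_info", "data"}

SAMPLE = """<sacdataset><trace><header>
<station><kstnm>TEST</kstnm><stla>40</stla></station>
<event><evla>-20</evla></event>
<trace_info><delta>0.05</delta></trace_info>
</header><data>1.0 2.0 3.0</data></trace></sacdataset>"""


class FlatHeaderHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.traces = []     # one dict of header fields per <trace>
        self._field = None   # name of the leaf element being read
        self._text = []

    def startElement(self, name, attrs):
        if name == "trace":
            self.traces.append({})
        elif name not in CONTAINERS and self.traces:
            self._field, self._text = name, []

    def characters(self, content):
        # Text outside a header field (including <data>) is ignored here.
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if self._field is not None and name == self._field:
            self.traces[-1][name] = "".join(self._text).strip()
            self._field = None


def read_headers(xml_text):
    """Parse the document in a single pass and return the header dicts."""
    handler = FlatHeaderHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.traces


if __name__ == "__main__":
    print(read_headers(SAMPLE))
```

Because the handler never inspects the grouping elements, adding or renaming a container only requires extending `CONTAINERS`.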

Cheers,

James

On 31 Jan 2008, at 08:38, George Helffrich wrote:

Dear Rob -

The XML DTD is versioned, and one could imagine defining a new DTD
with alternative element groupings that would reflect the data
structure. One can obsess in describing data details, and one view
won't necessarily coincide with another person or community's view.
Google Earth's KML comes to mind -- unbelievably baroque for putting
points on a map for a seismologist, but probably glibly expressive
for GISers.

Another view to take of the data is programming semantics, however.
Programmers see 1) header variables that are peeked and poked at; 2)
data. That was the view I took of the present DTD definition.

On 31 Jan 2008, at 00:12, Robert Casey wrote:

Hi George-

An interesting effort with SAC XML and you've made a lot of
progress from the looks of it. I hate to comment too harshly on
something that may already be an established standard, so my
comments are only meant as an observation:

It seems to me that the SAC XML format only half-divorces itself
from fixed-format files due to the naming scheme for the header
elements you've provided. Essentially, you've got just two entity
names inside of <trace> that have no semantic quality to them: 'h'
and 'd'.

The nature of XML is such that the entities tend to be more grouped
and self-descriptive as far as their names go. So instead of having
<h name="STEL">, why could it not instead be <STEL> ? To indicate
this as a subcomponent of a SAC header of a SAC trace, you'd form a
hierarchy:

<sacdataset>
<trace>
<header>
<stel>

Even better would be to just call it <elevation>, maybe with a
reference to its SAC field abbreviation as an attribute: <elevation
id="STEL">.

The reason for the comment is not just for human readability, but
for the notion that many XML parsers will treat such elements as
objects, carrying the element name with them. Having your fields
broken down into meaningful names means that your objects will be
more independent and have stronger encapsulation properties.
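As an illustration of elements arriving as named objects, the following Python sketch uses the standard-library ElementTree parser. The sample header and the helper `by_sac_name` are hypothetical; only the `<elevation id="STEL">` spelling comes from the suggestion above. It looks a field up by its SAC abbreviation while the returned object still carries the descriptive name.

```python
import xml.etree.ElementTree as ET

# Hypothetical header: descriptive element names, with the SAC
# abbreviation kept in an 'id' attribute.
SAMPLE = """<header>
  <elevation id="STEL">500</elevation>
  <latitude id="STLA">40</latitude>
</header>"""


def by_sac_name(header, sac_name):
    """Find a header element by its SAC abbreviation (the id attribute)."""
    return header.find(f".//*[@id='{sac_name}']")


if __name__ == "__main__":
    header = ET.fromstring(SAMPLE)
    elem = by_sac_name(header, "STEL")
    # The element object carries its own name, attributes, and value.
    print(elem.tag, elem.get("id"), float(elem.text))
```

Existing SAC users can keep addressing fields by their short names, while new readers see the descriptive tag.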

If your example format is already set in stone, then please
continue with what works. I just felt the floor was open to address
some naming aesthetics for consideration. I can imagine that
writing a parsing engine for XML in Fortran is headache enough, so I
don't want to cause you a migraine on top of it.

Cheers,

-Rob

On Jan 30, 2008, at 5:05 AM, George Helffrich wrote:

Dear All -

I designed and implemented a subroutine interface to SAC XML
datasets in the latest release of MacSAC. This message is to make
you aware of the design ideas for architectural comment. I think
that it shows the way forward for how SAC can move away from a purely
binary data format to one that embraces current practice in
structuring and delivering data.

A test program illustrates the concepts. Here is Fortran source code
of an actual program used for testing during development:

George Helffrich
george<at>geology.bristol.ac.uk

George Helffrich
george<at>geology.bristol.ac.uk

Brian Savage

Re: Subroutine interface to SAC XML datasets

2008-01-31 17:46:08

George

I think moving away from the original header names might be a good idea.
The header names such as stlo, stla and kstnm can be cryptic upon
first encounter.
I would suggest more readable names:
<station latitude="" longitude="" elevation="" name="" network=""></station>
or
<station>
<latitude></latitude>
<name></name>
....
</station>

Also, would it be possible to include references to the SAC binary
files, as they are now, in the XML format?
<sacdataset>
<trace file="PAS.CI.BHZ.SAC" />
<trace file="PAS.CI.BHE.SAC" />
<trace file="PAS.CI.BHN.SAC" />
<trace id="HRV.IU.BHZ.SAC">
<header >
...
</header>
<trace>
...
</trace>
</trace>
</sacdataset>
This would allow for storing the data either in the current (binary)
format or in the XML file.
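A minimal Python sketch of how a reader might tell the two kinds of trace apart. The function `classify_traces` and the sample document are assumptions for illustration; in particular the inline trace here stores its samples in a `<data>` element, borrowed from the layout discussed elsewhere in this thread.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<sacdataset>
  <trace file="PAS.CI.BHZ.SAC" />
  <trace id="HRV.IU.BHZ.SAC">
    <header><kstnm>HRV</kstnm></header>
    <data>1.0 2.0 3.0</data>
  </trace>
</sacdataset>"""


def classify_traces(xml_text):
    """Return (kind, name) pairs: 'file' for references, else 'inline'."""
    result = []
    # findall("trace") visits only direct children of <sacdataset>,
    # so elements nested inside a trace are not counted as traces.
    for trace in ET.fromstring(xml_text).findall("trace"):
        if "file" in trace.attrib:
            result.append(("file", trace.get("file")))
        else:
            result.append(("inline", trace.get("id")))
    return result


if __name__ == "__main__":
    for kind, name in classify_traces(SAMPLE):
        print(kind, name)
```

A real reader would then open the referenced binary file for the `file` entries and parse the embedded header and data for the `inline` ones.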

Cheers,
Brian

_______________________________________________
sac-dev mailing list
sac-dev<at>iris.washington.edu
http://www.iris.washington.edu/mailman/listinfo/sac-dev

George Helffrich

Re: Subroutine interface to SAC XML datasets

2008-01-31 23:04:48

Dear Brian -

Indeed, header field names are cryptic and they could be improved on.
With a clearer idea of the intended users and/or viewers of the
elements, the naming scheme should become clearer.

I'm not sure about <trace file="PAS.CI.BHN.SAC" />, but it is an
interesting idea. It is analogous to a Unix symbolic link, with
similar semantic confusion (stat/lstat). Here it raises the issues of
1) whether a dataset is entirely self-contained; and 2) what it means
to "write" a dataset that contains links as well as trace data.
  
George Helffrich
george<at>geology.bristol.ac.uk
