Thread: Subroutine interface to SAC XML datasets

Started: 2008-01-30 21:05:41
Last activity: 2008-01-31 16:38:25
Topics: SAC Developers
George Helffrich
2008-01-30 21:05:41
Dear All -

I designed and implemented a subroutine interface to SAC XML datasets
in the latest release of MacSAC. This message is to make you aware of
the design ideas for architectural comment. I think that it shows a way
forward: how SAC can move away from a purely binary data format to one
that embraces current practice in structuring and delivering data.

A test program illustrates the concepts. Here is the Fortran source code
of an actual program used for testing during development:
---
C     Test program to try out XML query of datasets.
      parameter (nmax=20, ndat=4096)
      character fn*128, ftype(nmax)*8, fname(nmax)*32, tn*16, name*64
      integer qsacxml, rsac1xml, rsac2xml
      real data(ndat)

C     Test qsacxml: number of traces, plus the type and name of each

      call getarg(1,fn)
      if (fn.eq.' ') stop '**No file name given.'

      ntmx = qsacxml(fn,nmax,ftype,fname)
      write(*,*) 'Got result of ',ntmx,' from qsacxml.'

      if (ntmx.gt.0) then
C        Only the first nmax types and names fit in the arrays
         do i=1,min(ntmx,nmax)
            ix = index(ftype(i),' ')-1
            iy = index(fname(i),' ')-1
            write(*,*) 'File ',i,' type ',ftype(i)(1:ix),' ',
     &         fname(i)(1:iy)
         enddo
      endif

C     Test rsacxml: read one time-series trace by its number

      call getarg(2,tn)
      if (tn.ne.' ') then
         read(tn,*,iostat=ios) nt
         if (ios.ne.0) stop '**Bad trace number.'
C        Keep nt inside both the file's traces and the ftype array
         if (nt.lt.1 .or. nt.gt.min(ntmx,nmax))
     &      stop '**Off end of traces in file.'
         if (ftype(nt) .ne. 'ITIME') stop '**Trace not time series'
         call rsacxml(fn,'ITIME',nt,name,ndat,data,data,nerr)
         if (nerr.eq.0) then
C           Header values of the trace just read come from GETNHV etc.
            call getnhv('npts',npts,nerr)
            if (nerr.ne.0) stop '**Bad NPTS return from GETNHV.'
            write(*,*) 'File ',nt,', named ',name(1:index(name,' ')),
     &         'NPTS ',npts
            write(*,*) 'First 4 points: ',(data(i),i=1,4)
         else
            write(*,*) 'NERR ',nerr,' from rsacxml.'
         endif
      endif
      end
-----
Two routines are used here. The first, qsacxml, queries the content of
a SAC XML dataset. It returns three items: 1) the number of traces;
2) the type of each trace; 3) the name of each trace. Returning the
types and names can be suppressed by passing zero-length arrays for
them.

rsacxml reads a particular trace from the dataset. It checks that the
type of the trace is appropriate and, if so, returns its data. Once a
trace is read, its header information is obtained with the standard
header access functions (getnhv in the test program, for example).

There is an analogous function, akin to wsac0, to write a trace into a
dataset. The data can be written either as base-64-encoded binary (more
compact than decimal text) or as plain text.
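To make the base-64 option concrete: base-64 maps each group of three
bytes to four characters from a 64-character alphabet, padding a short
final group with '='. The sketch below is not MacSAC's encoder; the
module name b64_sketch and function b64encode are hypothetical, and it
assumes a Fortran 2003 compiler (for the deferred-length character
result).

```fortran
! Hypothetical sketch of base-64 encoding (RFC 4648 alphabet);
! not the MacSAC implementation.
module b64_sketch
  implicit none
  character(len=64), parameter :: alphabet = &
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
contains

  ! Encode a string of raw bytes (one byte per character) as base-64 text.
  function b64encode(bytes) result(text)
    character(len=*), intent(in) :: bytes
    character(len=:), allocatable :: text
    integer :: n, i, j, k, b2, b3, triple

    n = len(bytes)
    ! Every started group of 3 bytes becomes 4 output characters
    allocate(character(len=4*((n+2)/3)) :: text)
    j = 1
    do i = 1, n, 3
      ! Missing bytes in a short final group are treated as zero
      b2 = 0
      b3 = 0
      if (i+1 <= n) b2 = ichar(bytes(i+1:i+1))
      if (i+2 <= n) b3 = ichar(bytes(i+2:i+2))
      ! Pack three 8-bit bytes into one 24-bit integer
      triple = ichar(bytes(i:i))*65536 + b2*256 + b3
      ! Emit four 6-bit groups, most significant first
      k = ishft(triple, -18)
      text(j:j) = alphabet(k+1:k+1)
      k = iand(ishft(triple, -12), 63)
      text(j+1:j+1) = alphabet(k+1:k+1)
      k = iand(ishft(triple, -6), 63)
      text(j+2:j+2) = alphabet(k+1:k+1)
      k = iand(triple, 63)
      text(j+3:j+3) = alphabet(k+1:k+1)
      ! Pad with '=' when the final group has fewer than three bytes
      if (i+1 > n) text(j+2:j+2) = '='
      if (i+2 > n) text(j+3:j+3) = '='
      j = j + 4
    end do
  end function b64encode

end module b64_sketch
```

A trace's samples could first be turned into bytes with, for example,
transfer(data(1:npts), repeat(' ', 4*npts)), assuming 4-byte default
reals. Those bytes are in host order, so a format carrying base-64
binary also has to state which byte order was used. The encoded form
takes about 16/3 characters per 4-byte sample, against a dozen or more
for a decimal text value.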

This replicates present SAC rsac1/rsac2/wsac0/wsac1 functionality. One
can imagine more sophisticated interfaces, possibly Fortran-90 based,
that would return allocated structures bearing an entire dataset's worth
of data.

I'm reporting this to stimulate some discussion on where people see
SAC's data format going. It also gives you a test bed for learning where
the XML method's strengths and weaknesses lie. A key advantage of an
XML-based structure is that it is extensible, both in the expressiveness
of header information and in the types of data one can put into it. The
interface is also indifferent to where the data live: it can read XML
from a URL as well as from a local file. The structure could solve Tim
Ahern's problem of long IEEE real data, as well as the lack of time
precision that Bob Herrmann noted in SAC's present header variable
structure. And, of course, it solves the endianness problems that are
our sad legacy to the next generation of seismic software developers.
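The endianness problem comes from writing raw binary words: the same
32-bit value is stored with its bytes in opposite orders on little- and
big-endian machines, so a binary reader must detect the order and swap
bytes when it differs. A minimal Fortran 2003 sketch of that check and
swap, which a text-based format avoids; the module and function names
(endian_sketch, host_is_little_endian, swap32) are hypothetical and not
part of SAC.

```fortran
! Hypothetical sketch: detecting host byte order and swapping a 32-bit
! word, the chore a binary SAC reader faces and a text format avoids.
module endian_sketch
  use iso_fortran_env, only: int32
  implicit none
  ! Type template for TRANSFER: a 4-character (4-byte) string
  character(len=4), parameter :: mold4 = '    '
contains

  ! True if the host stores the least significant byte first.
  logical function host_is_little_endian()
    character(len=4) :: bytes
    bytes = transfer(1_int32, mold4)
    host_is_little_endian = (ichar(bytes(1:1)) == 1)
  end function host_is_little_endian

  ! Reverse the byte order of a 32-bit integer, as a reader must do when
  ! the file was written on a machine of the opposite endianness.
  integer(int32) function swap32(x)
    integer(int32), intent(in) :: x
    character(len=4) :: b
    b = transfer(x, mold4)
    swap32 = transfer(b(4:4)//b(3:3)//b(2:2)//b(1:1), 0_int32)
  end function swap32

end module endian_sketch
```

For example, swap32(1) yields 16777216 (2**24) on either kind of host,
which is exactly the garbage a reader sees if it skips the swap.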

An example XML dataset is available at
http://www1.gly.bris.ac.uk/MacSAC/korea.sds
if you wish to try it out.

George Helffrich


  • Robert Casey
    2008-01-31 00:12:22

    Hi George-

    An interesting effort with SAC XML and you've made a lot of progress
    from the looks of it. I hate to comment too harshly on something that
    may already be an established standard, so my comments are only meant
    as an observation:

    It seems to me that the SAC XML format only half-divorces itself from
    fixed-format files because of the naming scheme you've provided for
    the header elements. Essentially, you've got just two element names
    inside of <trace> that have no semantic quality to them: 'h' and 'd'.

    The nature of XML is such that elements tend to be more grouped
    and self-descriptive as far as their names go. So instead of having
    <h name="STEL">, why could it not instead be <STEL>? To indicate
    this as a subcomponent of a SAC header of a SAC trace, you'd form a
    hierarchy:

    <sacdataset>
      <trace>
        <header>
          <stel>

    Even better would be to just call it <elevation>, maybe with a
    reference to its SAC field abbreviation as an attribute: <elevation
    id="STEL">.

    The reason for the comment is not just human readability, but the
    notion that many XML parsers will treat such elements as objects,
    carrying the element name with them. Having your fields broken down
    into meaningful names means that your objects will be more independent
    and have stronger encapsulation properties.

    If your example format is already set in stone, then please continue
    with what works. I just felt the floor was open to address some
    naming aesthetics for consideration. I can imagine that writing a
    parsing engine for XML in Fortran is headache enough, so I don't want
    to cause you a migraine on top of it.

    Cheers,

    -Rob

    • George Helffrich
      2008-01-31 16:38:25
      Dear Rob -

      The XML DTD is versioned, and one could imagine defining a new DTD
      with alternative element groupings that would reflect the data
      structure. One can obsess over describing data details, and one view
      won't necessarily coincide with another person's or community's view.
      Google Earth's KML comes to mind: unbelievably baroque for putting
      points on a map for a seismologist, but probably glibly expressive for
      GISers.

      Another view to take of the data, however, is that of programming
      semantics. Programmers see 1) header variables that are peeked and
      poked at; 2) data. That was the view I took in defining the present
      DTD.

      George Helffrich
      george<at>geology.bristol.ac.uk

