SAGE: Thread: Subroutine interface to SAC XML datasets

Started: 2008-01-30 21:05:41

Last activity: 2008-01-31 16:38:25

Topics: SAC Developers

George Helffrich

Subroutine interface to SAC XML datasets

2008-01-30 21:05:41

Dear All -

I designed and implemented a subroutine interface to SAC XML datasets
in the latest release of MacSAC. This message is to make you aware of
the design ideas for architectural comment. I think that it shows the
way forward to how SAC can move away from a purely binary data format
to one that embraces current practice in structuring and delivering
data.

A test program illustrates the concepts. Here is Fortran source code
of an actual program used for testing during development:
---
C Test program to try out XML query of datasets.
      parameter (nmax=20, ndat=4096)
      character fn*128, ftype(nmax)*8, fname(nmax)*32, tn*16, name*64
      integer qsacxml, rsac1xml, rsac2xml
      real data(ndat)

C Test qsacxml

      call getarg(1,fn)
      if (fn.eq.' ') stop '**No file name given.'

      ntmx = qsacxml(fn,nmax,ftype,fname)
      write(*,*) 'Got result of ',ntmx,' from qsacxml.'

      if (ntmx.gt.0) then
         do i=1,min(ntmx,nmax)
C           len_trim also works when a name fills its whole field
            ix = len_trim(ftype(i))
            iy = len_trim(fname(i))
            write(*,*) 'File ',i,' type ',ftype(i)(1:ix),' ',
     &         fname(i)(1:iy)
         enddo
      endif

C Test rsacxml

      call getarg(2,tn)
      if (tn.ne.' ') then
         read(tn,*,iostat=ios) nt
         if (ios.ne.0) stop '**Bad trace number.'
C        ftype(nt) only exists for 1 <= nt <= nmax
         if (nt.lt.1 .or. nt.gt.min(ntmx,nmax))
     &      stop '**Off end of traces in file.'
         if (ftype(nt) .ne. 'ITIME') stop '**Trace not time series'
         call rsacxml(fn,'ITIME',nt,name,ndat,data,data,nerr)
         if (nerr.eq.0) then
            call getnhv('npts',npts,nerr)
            if (nerr.ne.0) stop '**Bad NPTS return from GETNHV.'
            write(*,*) 'File ',nt,', named ',name(1:len_trim(name)),
     &         ' NPTS ',npts
            write(*,*) 'First 4 points: ',(data(i),i=1,min(4,npts))
         else
            write(*,*) 'NERR ',nerr,' from rsacxml.'
         endif
      endif
      end
-----
There are two routines used here. One, qsacxml, queries the content of
a SAC XML dataset. It returns three items: 1) the number of traces;
2) the type of each trace; 3) the name of each trace. The return of the
type and name can be suppressed by providing zero-length arrays to
receive the information.

The other, rsacxml, reads a particular trace from the dataset. It checks
that the type of the trace is appropriate and, if so, returns its data.
Once a trace is read, one gets the header information using the
standard header access functions (getnhv in the test program above).

There is an analogous function, akin to wsac0, to write a trace into a
dataset. The data can be either compressed (base-64 encoded) or plain
text.
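To make the base-64 option concrete, here is a minimal sketch of standard base-64 encoding applied to a trace's samples. The module and routine names (`b64_sketch`, `real_bytes`, `b64_encode`) are hypothetical and not MacSAC's. The sketch assumes that default REAL is IEEE 32-bit and that the raw bytes are taken in host byte order. The post does not say which byte order or layout MacSAC actually writes.

```fortran
module b64_sketch
  ! Hypothetical helper (not MacSAC code): base-64 encodes the raw
  ! bytes of a real sample array. Assumes default REAL is IEEE
  ! 32-bit; bytes are taken in host byte order.
  use iso_fortran_env, only: int8
  implicit none
  private
  public :: b64_encode, real_bytes

  character(len=64), parameter :: alphabet = &
       'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

contains

  ! Raw bytes of x as integers in 0..255 (host byte order).
  function real_bytes(x) result(bytes)
    real, intent(in) :: x(:)
    integer, allocatable :: bytes(:)
    bytes = iand(int(transfer(x, [0_int8])), 255)
  end function real_bytes

  ! Standard base-64 encoding of byte values 0..255, '=' padded.
  function b64_encode(bytes) result(out)
    integer, intent(in) :: bytes(:)
    character(len=:), allocatable :: out
    integer :: n, i, j, m, b1, b2, triple, k(4)

    n = size(bytes)
    allocate(character(len=4*((n + 2)/3)) :: out)
    j = 1
    do i = 1, n, 3
      b1 = 0
      b2 = 0
      if (i + 1 <= n) b1 = bytes(i + 1)
      if (i + 2 <= n) b2 = bytes(i + 2)
      ! Pack three bytes into 24 bits, then split into four 6-bit indices
      triple = bytes(i)*65536 + b1*256 + b2
      k = [ishft(triple, -18), iand(ishft(triple, -12), 63), &
           iand(ishft(triple, -6), 63), iand(triple, 63)]
      do m = 1, 4
        out(j+m-1:j+m-1) = alphabet(k(m)+1:k(m)+1)
      end do
      ! Pad the final group when only one or two bytes remain
      if (n - i == 0) then
        out(j+2:j+3) = '=='
      else if (n - i == 1) then
        out(j+3:j+3) = '='
      end if
      j = j + 4
    end do
  end function b64_encode

end module b64_sketch
```

For example, `b64_encode(real_bytes(data(1:npts)))` would give the text for one trace. Base-64 by itself makes binary data about 4/3 larger. Any size saving would have to come from a separate compression step.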

This replicates present SAC rsac1/rsac2/wsac0/wsac1 functionality. One
can imagine more sophisticated interfaces, possibly Fortran-90 based,
that would return allocated structures bearing an entire dataset's
worth of data.
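A Fortran-90 style interface that returns a whole dataset could use a container like the one sketched below. It is a hypothetical design: the names `sac_trace`, `sac_dataset`, `add_trace` and `ntraces` are invented for illustration. The string lengths 8 and 32 simply mirror `ftype` and `fname` in the test program. Filling the container from an XML file, for instance by looping over traces with qsacxml and rsacxml, is not shown.

```fortran
module sac_dataset_sketch
  ! Hypothetical Fortran-90-style container for a whole SAC XML
  ! dataset. Type and routine names are illustrative, not MacSAC's.
  implicit none

  type :: sac_trace
    character(len=8)  :: ftype = ' '   ! trace type, e.g. 'ITIME'
    character(len=32) :: fname = ' '   ! trace name
    real, allocatable :: data(:)       ! samples, sized to fit
  end type sac_trace

  type :: sac_dataset
    type(sac_trace), allocatable :: traces(:)
  end type sac_dataset

contains

  ! Number of traces currently held (zero if none allocated yet).
  integer function ntraces(ds)
    type(sac_dataset), intent(in) :: ds
    ntraces = 0
    if (allocated(ds%traces)) ntraces = size(ds%traces)
  end function ntraces

  ! Append one trace, growing the traces array by one element.
  subroutine add_trace(ds, ftype, fname, data)
    type(sac_dataset), intent(inout) :: ds
    character(len=*), intent(in) :: ftype, fname
    real, intent(in) :: data(:)
    type(sac_trace), allocatable :: grown(:)
    integer :: n

    n = ntraces(ds)
    allocate(grown(n + 1))
    if (n > 0) grown(1:n) = ds%traces   ! deep copy, including data
    grown(n + 1)%ftype = ftype
    grown(n + 1)%fname = fname
    grown(n + 1)%data = data            ! allocated on assignment
    call move_alloc(grown, ds%traces)
  end subroutine add_trace

end module sac_dataset_sketch
```

Because each trace carries an allocatable `data` component, every trace is sized to its own NPTS. The fixed `ndat` buffer in the test program becomes unnecessary.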

I'm reporting this to stimulate some discussion on where people see
SAC's data format going. You also have a test-bed to learn where the
XML method's strengths and weaknesses lie. A key advantage of an
XML-based structure is that it is extensible, both in the
expressiveness of header information and in the types of data one can
put into it. It is also indifferent to the location of the data: it can
read XML from a URL as well as from a local file. The structure could
solve Tim Ahern's problem of long IEEE real data, as well as the lack
of time precision that Bob Herrmann noted in SAC's present header
variable structure. And, of course, it solves the endianness problems
that are our sad legacy to the next generation of seismic software
developers.
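The endianness problem is easiest to see at the byte level. The same 4-byte real is stored as different byte sequences on big- and little-endian machines. A reader of binary SAC data must therefore know the writer's byte order and swap bytes when the orders differ, whereas a value written as text reads back the same on any host. The sketch below is hypothetical (the names `little_endian` and `swapped_reals` are invented) and assumes that default REAL occupies 4 bytes.

```fortran
module endian_sketch
  ! Hypothetical helpers (not MacSAC code) showing why binary SAC
  ! data is byte-order dependent. Assumes default REAL is 4 bytes.
  use iso_fortran_env, only: int8, int32
  implicit none

contains

  ! True if the host stores the least significant byte first.
  logical function little_endian()
    integer(int8) :: b(4)
    b = transfer(1_int32, b)
    little_endian = (b(1) == 1_int8)
  end function little_endian

  ! Convert bytes of 4-byte reals written in the opposite byte order
  ! into host reals by reversing each 4-byte group.
  ! size(raw) must be a multiple of 4.
  function swapped_reals(raw) result(x)
    integer(int8), intent(in) :: raw(:)
    real, allocatable :: x(:)
    integer(int8) :: tmp(size(raw))
    integer :: i

    do i = 1, size(raw) - 3, 4
      tmp(i:i+3) = raw(i+3:i:-1)
    end do
    x = transfer(tmp, [0.0])
  end function swapped_reals

end module endian_sketch
```

For example, 1.0 is the bit pattern 0x3F800000. On a little-endian host, the big-endian bytes 3F 80 00 00 only read back as 1.0 after `swapped_reals` reverses them.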

An XML dataset is available for tryout, if you wish, at
http://www1.gly.bris.ac.uk/MacSAC/korea.sds

George Helffrich

Robert Casey

Re: Subroutine interface to SAC XML datasets

2008-01-31 00:12:22

Hi George-

An interesting effort with SAC XML and you've made a lot of progress
from the looks of it. I hate to comment too harshly on something that
may already be an established standard, so my comments are only meant
as an observation:

It seems to me that the SAC XML format only half-divorces itself from
fixed-format files due to the naming scheme for the header elements
you've provided. Essentially, you've got just two entity names inside
of <trace> that have no semantic quality to them: 'h' and 'd'.

The nature of XML is such that the entities tend to be more grouped
and self-descriptive as far as their names go. So instead of having
<h name="STEL">, why could it not instead be <STEL> ? To indicate
this as a subcomponent of a SAC header of a SAC trace, you'd form a
hierarchy:

<sacdataset>
  <trace>
    <header>
      <stel>

Even better would be to just call it <elevation>, maybe with a
reference to its SAC field abbreviation as an attribute: <elevation
id="STEL">.

The reason for the comment is not just for human readability, but for
the notion that many XML parsers will treat such elements as objects,
carrying the element name with them. Having your fields broken down
into meaningful names means that your objects will be more independent
and have stronger encapsulation properties.

If your example format is already set in stone, then please continue
with what works. I just felt the floor was open to address some
naming aesthetics for consideration. I can imagine that writing a
parsing engine for XML in Fortran is headache enough, so I don't want
to cause you a migraine on top of it.

Cheers,

-Rob

George Helffrich

Re: Subroutine interface to SAC XML datasets

2008-01-31 16:38:25

Dear Rob -

The XML DTD is versioned, and one could imagine defining a new DTD
with alternative element groupings that would reflect the data
structure. One can obsess over describing data details, and one view
won't necessarily coincide with another person's or community's view.
Google Earth's KML comes to mind: unbelievably baroque for putting
points on a map for a seismologist, but probably glibly expressive for
GISers.

Another view to take of the data, however, is programming semantics.
Programmers see 1) header variables that are peeked and poked at; 2)
data. That was the view I took of the present DTD definition.

  
George Helffrich
george<at>geology.bristol.ac.uk
