Dear All -
I designed and implemented a subroutine interface to SAC XML datasets
in the latest release of MacSAC. This message is to make you aware of
the design ideas for architectural comment. I think that it shows the
way forward: how SAC can move away from a purely binary data format to
one that embraces current practice in structuring and delivering data.
A test program illustrates the concepts. Here is the Fortran source
code of an actual program used for testing during development:
---
C Test program to try out XML query of datasets.
      parameter (nmax=20, ndat=4096)
      character fn*128, ftype(nmax)*8, fname(nmax)*32, tn*16, name*64
C qsacxml is a function; rsacxml is called as a subroutine.
      integer qsacxml
      real data(ndat)
C Test qsacxml: list the type and name of every trace returned.
      call getarg(1,fn)
      if (fn.eq.' ') stop '**No file name given.'
      ntmx = qsacxml(fn,nmax,ftype,fname)
      write(*,*) 'Got result of ',ntmx,' from qsacxml.'
      if (ntmx.gt.0) then
         do i=1,min(ntmx,nmax)
C Length up to the first blank (whole string if there is none).
            ix = index(ftype(i),' ')-1
            if (ix.lt.0) ix = len(ftype(i))
            iy = index(fname(i),' ')-1
            if (iy.lt.0) iy = len(fname(i))
            write(*,*) 'File ',i,' type ',ftype(i)(1:ix),' ',
     &         fname(i)(1:iy)
         enddo
      endif
C Test rsacxml on the trace number given as the second argument.
      call getarg(2,tn)
      if (tn.ne.' ') then
         read(tn,*,iostat=ios) nt
         if (ios.ne.0) stop '**Bad trace number.'
C ftype only holds the first min(ntmx,nmax) trace types.
         if (nt.lt.1 .or. nt.gt.min(ntmx,nmax))
     &      stop '**Off end of traces in file.'
         if (ftype(nt) .ne. 'ITIME') stop '**Trace not time series'
         call rsacxml(fn,'ITIME',nt,name,ndat,data,data,nerr)
         if (nerr.eq.0) then
C Header values come from the standard header access routines.
            call getnhv('npts',npts,nerr)
            if (nerr.ne.0) stop '**Bad NPTS return from GETNHV.'
            iz = index(name,' ')-1
            if (iz.lt.0) iz = len(name)
            write(*,*) 'File ',nt,', named ',name(1:iz),', NPTS ',npts
C Print at most the first 4 points (fewer if the trace is short).
            write(*,*) 'First points: ',(data(i),i=1,min(4,npts))
         else
            write(*,*) 'NERR ',nerr,' from rsacxml.'
         endif
      endif
      end
-----
There are two routines used here. One, qsacxml, queries the content of
a SAC XML dataset. It returns three items: 1) the number of traces;
2) the type of each trace; 3) the name of each trace. Returning the
types and names can be suppressed by passing zero-length arrays for
them.
The other, rsacxml, reads a particular trace from the dataset. It
checks that the trace is of the requested type and, if so, returns its
data. Once a trace is read, the header information is available through
the standard header access functions (getnhv in the test program).
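To make the query/read split concrete outside Fortran, here is a minimal Python sketch of what qsacxml and rsacxml do, assuming a much-simplified dataset layout. The sacdataset, trace, h and d element names follow the discussion in this thread; the type and name attributes on trace, the whitespace-separated numbers in d, and the helper names query_dataset and read_trace are hypothetical, not MacSAC's actual format or API.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature dataset. Element names follow this thread; the
# "type"/"name" attributes and the plain-text <d> payload are assumptions.
SAMPLE = """<sacdataset>
  <trace type="ITIME" name="trace01">
    <h name="NPTS">4</h>
    <h name="STEL">0.0</h>
    <d>1.0 2.0 3.0 4.0</d>
  </trace>
  <trace type="IXY" name="trace02">
    <h name="NPTS">2</h>
    <d>0.5 0.25</d>
  </trace>
</sacdataset>"""


def query_dataset(xml_text):
    """Like qsacxml: return (trace count, trace types, trace names)."""
    traces = ET.fromstring(xml_text).findall("trace")
    types = [t.get("type", "") for t in traces]
    names = [t.get("name", "") for t in traces]
    return len(traces), types, names


def read_trace(xml_text, number, want_type):
    """Like rsacxml: return (header dict, data) for 1-based trace `number`.

    Refuses a trace whose type is not `want_type`.
    """
    traces = ET.fromstring(xml_text).findall("trace")
    if not 1 <= number <= len(traces):
        raise IndexError("trace number off end of dataset")
    trace = traces[number - 1]
    if trace.get("type") != want_type:
        raise ValueError("trace is not of type " + want_type)
    # Header variables stay as text; callers convert them as needed.
    header = {h.get("name"): h.text for h in trace.findall("h")}
    data = [float(v) for v in trace.findtext("d", "").split()]
    return header, data


if __name__ == "__main__":
    print(query_dataset(SAMPLE))
    header, data = read_trace(SAMPLE, 1, "ITIME")
    print(header["NPTS"], data)
```

For brevity the sketch re-parses the text on every call; a real interface would parse the dataset once and keep the tree.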
There is also an analogous function, akin to wsac0, that writes a trace
into a dataset. The data can be written either in compact, base-64
encoded form or as plain text.
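A minimal Python sketch of the two data encodings, assuming 32-bit IEEE reals written in an explicit little-endian byte order; the byte order and word size MacSAC actually uses are not stated here, and the function names are hypothetical. Fixing the byte order inside the encoding is what makes the base-64 text portable between big- and little-endian machines. Base-64 expands the raw binary by a factor of 4/3, but that is typically still shorter than full-precision decimal text.

```python
import base64
import struct


def encode_base64(samples):
    """Pack samples as little-endian 32-bit IEEE reals, then base-64 encode."""
    raw = struct.pack("<%df" % len(samples), *samples)
    return base64.b64encode(raw).decode("ascii")


def decode_base64(text):
    """Undo encode_base64: base-64 decode, then unpack little-endian reals."""
    raw = base64.b64decode(text)
    return list(struct.unpack("<%df" % (len(raw) // 4), raw))


def encode_plain(samples):
    """Plain-text alternative; 9 significant digits round-trip a 32-bit real."""
    return " ".join("%.9g" % s for s in samples)


def decode_plain(text):
    """Parse whitespace-separated decimal samples."""
    return [float(v) for v in text.split()]


if __name__ == "__main__":
    samples = [1.0, -2.5, 3.25, 0.5]
    packed = encode_base64(samples)
    print(packed, decode_base64(packed) == samples)
    print(encode_plain(samples))
```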
This replicates present SAC rsac1/rsac2/wsac0/wsac1 functionality. One
can imagine more sophisticated interfaces, possibly Fortran-90 based,
that would return allocated structures bearing an entire dataset's
worth of data.
I'm reporting this to stimulate some discussion on where people see
SAC's data format going. It also gives you a test-bed for learning
where the XML method's strengths and weaknesses lie. A key advantage of
an XML-based structure is that it is extensible, both in the
expressiveness of header information and in the types of data one can
put into it. It is also indifferent to the location of the data: it can
read XML from a URL as well as from a local file. The structure could
solve Tim Ahern's problem of long IEEE real data, as well as the lack of
time precision that Bob Herrmann noted in SAC's present header variable
structure. And, of course, it solves the endianness problems that are
our sad legacy to the next generation of seismic software developers.
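Reading from a URL or from a local file can share one code path, because an XML parser only needs a byte stream. A minimal Python sketch using only the standard library; open_dataset is a hypothetical name, and deciding "URL or file" from the scheme is a simplification.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def open_dataset(source):
    """Parse an XML dataset from a URL or a local path; return its root."""
    if urlparse(source).scheme in ("http", "https", "ftp", "file"):
        # URL: hand the response stream straight to the parser.
        with urllib.request.urlopen(source) as stream:
            return ET.parse(stream).getroot()
    # Anything else is treated as a local file name.
    return ET.parse(source).getroot()
```

For example, open_dataset("http://www1.gly.bris.ac.uk/MacSAC/korea.sds") would fetch the sample dataset mentioned in this message, if that URL is still served.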
An XML dataset is available for you to try out, if you wish, at
http://www1.gly.bris.ac.uk/MacSAC/korea.sds
George Helffrich
-
Hi George-
An interesting effort with SAC XML and you've made a lot of progress
from the looks of it. I hate to comment too harshly on something that
may already be an established standard, so my comments are only meant
as an observation:
It seems to me that the SAC XML format only half-divorces itself from
fixed-format files because of the naming scheme you've provided for the
header elements. Essentially, you've got just two element names inside
<trace>, 'h' and 'd', and neither has any semantic quality to it.
The nature of XML is such that elements tend to be grouped and
self-descriptive as far as their names go. So instead of having
<h name="STEL">, why could it not be <STEL>? To indicate that this is a
subcomponent of the header of a SAC trace, you'd form a hierarchy:
<sacdataset>
  <trace>
    <header>
      <stel>...</stel>
    </header>
  </trace>
</sacdataset>
Even better would be to just call it <elevation>, perhaps with a
reference to its SAC field abbreviation as an attribute:
<elevation id="STEL">.
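The suggested layout can be produced mechanically from the flat one. A minimal Python sketch, assuming the flat layout <trace><h name="...">value</h>...<d>...</d></trace>; only the STEL to elevation pairing comes from the suggestion above, fields without an agreed descriptive name fall back to the lower-cased abbreviation (e.g. <npts>), and the table and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical table of descriptive names; only STEL -> elevation comes
# from the proposal, the rest would need to be agreed on.
DESCRIPTIVE_NAMES = {"STEL": "elevation"}


def to_descriptive(flat_trace):
    """Rewrite <h name="X">v</h> children as <header><tag id="X">v</tag>."""
    nested = ET.Element("trace", dict(flat_trace.attrib))
    header = ET.SubElement(nested, "header")
    for h in flat_trace.findall("h"):
        abbrev = h.get("name")
        # Fall back to the lower-cased SAC abbreviation, e.g. <npts>.
        tag = DESCRIPTIVE_NAMES.get(abbrev, abbrev.lower())
        field = ET.SubElement(header, tag, id=abbrev)
        field.text = h.text
    # Data elements are carried over unchanged after the header.
    for d in flat_trace.findall("d"):
        nested.append(d)
    return nested


if __name__ == "__main__":
    flat = ET.fromstring(
        '<trace><h name="STEL">12.5</h><h name="NPTS">4</h>'
        "<d>1 2 3 4</d></trace>"
    )
    print(ET.tostring(to_descriptive(flat), encoding="unicode"))
```

Keeping the SAC abbreviation in the id attribute means a program written against the flat names can still find every field.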
The reason for the comment is not just human readability but also that
many XML parsers treat such elements as objects, carrying the element
name with them. Having your fields broken down into meaningful names
means that your objects will be more independent and have stronger
encapsulation properties.
If your example format is already set in stone, then please continue
with what works. I just felt the floor was open to address some
naming aesthetics for consideration. I can imagine that writing a
parsing engine for XML in Fortran is headache enough, so I don't want
to cause you a migraine on top of it.
Cheers,
-Rob
On Jan 30, 2008, at 5:05 AM, George Helffrich wrote:
-
Dear Rob -
The XML DTD is versioned, and one could imagine defining a new DTD
with alternative element groupings that would reflect the data
structure. One can obsess over describing data details, and one view
won't necessarily coincide with another person's or community's view.
Google Earth's KML comes to mind: unbelievably baroque for a
seismologist who just wants to put points on a map, but probably
glibly expressive for GISers.
Another way to view the data, however, is through programming semantics.
Programmers see 1) header variables that are peeked and poked at; 2)
data. That was the view I took of the present DTD definition.
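The flat h/d layout maps directly onto that peek-and-poke view: one pair of accessors serves every header variable by name, with no per-field element types. A minimal Python sketch, assuming the flat <trace><h name="...">...</h><d>...</d></trace> layout discussed in this thread; the class and method names are hypothetical, loosely in the spirit of SAC's getnhv-style header access rather than taken from MacSAC.

```python
import xml.etree.ElementTree as ET


class FlatTrace:
    """Programmer's view of one flat <trace>: named header variables + data."""

    def __init__(self, element):
        self.element = element

    def _find(self, name):
        for h in self.element.findall("h"):
            if h.get("name") == name:
                return h
        return None

    def peek(self, name):
        """Return header variable `name` as text, or None if it is absent."""
        h = self._find(name)
        return None if h is None else h.text

    def poke(self, name, value):
        """Set header variable `name`, creating it ahead of the data if new."""
        h = self._find(name)
        if h is None:
            h = ET.Element("h", name=name)
            children = list(self.element)
            # Insert before the first <d> so headers stay ahead of the data.
            pos = next((i for i, c in enumerate(children) if c.tag == "d"),
                       len(children))
            self.element.insert(pos, h)
        h.text = str(value)

    @property
    def data(self):
        """Data samples from the <d> element as a list of floats."""
        return [float(v) for v in self.element.findtext("d", "").split()]


if __name__ == "__main__":
    xml = '<trace><h name="NPTS">4</h><d>1 2 3 4</d></trace>'
    trace = FlatTrace(ET.fromstring(xml))
    trace.poke("STEL", 12.5)
    print(trace.peek("NPTS"), trace.peek("STEL"), trace.data)
```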
George Helffrich
george<at>geology.bristol.ac.uk
On 31 Jan 2008, at 00:12, Robert Casey wrote:
-