Dear All -
The key idea here, which is a good one, is subgroupings of the
information in the header: 1) station information; 2) event
information; 3) data characteristics. A fourth item, not well-served
by the present SAC file structure, is more complete response
information.
Whether you express header information by <stel>500</stel> or <h
name="stel">500</h> is a stylistic choice. The DTD description is more
concise in the latter case.
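A minimal sketch of why the two spellings are interchangeable for a reader, assuming a hypothetical <header> wrapper around a single field. With the generic form the DTD declares one element (h) plus a name attribute, rather than one element per header variable. The function and variable names here are illustrative and are not part of the MacSAC interface.

```python
import xml.etree.ElementTree as ET

# Two hypothetical header fragments carrying the same field.
ELEMENT_STYLE = "<header><stel>500</stel></header>"
ATTRIBUTE_STYLE = '<header><h name="stel">500</h></header>'


def read_header(xml_text):
    """Return {field_name: text} for either header style."""
    fields = {}
    for child in ET.fromstring(xml_text):
        # A generic <h name="..."> carries the field name as an attribute;
        # otherwise the tag itself is taken as the field name.
        name = child.get("name", child.tag) if child.tag == "h" else child.tag
        fields[name.lower()] = (child.text or "").strip()
    return fields


element_fields = read_header(ELEMENT_STYLE)
attribute_fields = read_header(ATTRIBUTE_STYLE)
print(element_fields == attribute_fields)
```

Either way the program ends up with the same name-to-value table, which is the "header variables that are peeked and poked at" view of the data.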
On 31 Jan 2008, at 09:34, James Wookey wrote:
george<at>geology.bristol.ac.uk
Hi Rob, George;
I can see the point that the data format currently proposed is a terse
one, basically a minimalist description of a set of SAC traces. This
does have some significant advantages: it is efficient in file size,
and it provides a direct connection to the header variables which,
after all, SAC users (as well as programmers) still have to refer to
by their short name. I don't think we should go the KML route, which
as George says, makes my eyes water with all the detail that is
required. As a 'consultation' format for SACML it is well designed, as
it is simple to understand and is conceptually very close to the
binary SAC format, and George has done sterling work implementing it.
However, in the longer term, I can also see the value in a limited
expansion of the structure of SACML, if it is going to represent a
large step forward in the SAC file format. If we are going to pay the
price of adopting a verbose format like XML (and I think we should) we
might as well try to reap some of the rewards, and also build enough
flexibility into the format to allow incorporation of future things
(even if they are currently ignored by the current input routines -
the ability to do that is one of the advantages of XML). It seems to
me that one thing worth considering is structuring the header. So one
possible format might look like:
<sacdataset>
  <trace>
    <header>
      <station>
        <kstnm>TEST</kstnm>
        <stla>40</stla>
        ...
      </station>
      <event>
        <evla>-20</evla>
        ...
      </event>
      <trace_info>
        <delta>0.05</delta>
        ...
      </trace_info>
    </header>
    <data>
      ...
    </data>
  </trace>
</sacdataset>
This has the advantage of still being easy to 'one-sweep' read with
event-driven parsers like SAX (because you simply ignore the container
elements), plus providing a more object-oriented format for use with
parsers like DOM or xpath. We might also want to include/allow a
subgrouping of traces within the file: a <tracegroup> container
element for example.
Cheers,
James
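A rough sketch of the 'one-sweep' read described in the message above, using Python's xml.sax. The sample document, the handler class name, and the whitespace-separated encoding of <data> are assumptions made for illustration; the thread does not specify how samples are stored.

```python
import xml.sax

# Hypothetical document following the structure proposed above; values are
# illustrative only, and the <data> encoding is an assumption.
SAMPLE = """<sacdataset>
  <trace>
    <header>
      <station><kstnm>TEST</kstnm><stla>40</stla></station>
      <event><evla>-20</evla></event>
      <trace_info><delta>0.05</delta></trace_info>
    </header>
    <data>1.0 2.0 3.0</data>
  </trace>
</sacdataset>"""

# Grouping elements carry no values of their own, so the handler skips them.
CONTAINERS = {"sacdataset", "trace", "header", "station", "event", "trace_info"}


class FlatHeaderHandler(xml.sax.ContentHandler):
    """Collect each trace's header fields into a flat dict in one pass."""

    def __init__(self):
        super().__init__()
        self.traces = []
        self._text = []

    def startElement(self, name, attrs):
        if name == "trace":
            self.traces.append({"header": {}, "data": []})
        self._text = []  # start collecting text for this element

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        if name in CONTAINERS:
            return  # container closing tag: nothing to store
        value = "".join(self._text).strip()
        if name == "data":
            self.traces[-1]["data"] = [float(v) for v in value.split()]
        else:
            self.traces[-1]["header"][name] = value
        self._text = []


handler = FlatHeaderHandler()
xml.sax.parseString(SAMPLE.encode("utf-8"), handler)
traces = handler.traces
print(traces[0]["header"])
```

Because the handler never inspects which container a field sits in, the same code would read a flat header unchanged, which is the backward-compatibility point made in the message.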
On 31 Jan 2008, at 08:38, George Helffrich wrote:
Dear Rob -
The XML DTD is versioned, and one could imagine defining a new DTD
with alternative element groupings that would reflect the data
structure. One can obsess in describing data details, and one view
won't necessarily coincide with another person or community's view.
Google Earth's KML comes to mind -- unbelievably baroque for putting
points on a map for a seismologist, but probably glibly expressive
for GISers.
Another view to take of the data is programming semantics, however.
Programmers see 1) header variables that are peeked and poked at; 2)
data. That was the view I took of the present DTD definition.
On 31 Jan 2008, at 00:12, Robert Casey wrote:
Hi George-
An interesting effort with SAC XML and you've made a lot of
progress from the looks of it. I hate to comment too harshly on
something that may already be an established standard, so my
comments are only meant as an observation:
It seems to me that the SAC XML format only half-divorces itself
from fixed-format files due to the naming scheme for the header
elements you've provided. Essentially, you've got just two entity
names inside of <trace> that have no semantic quality to them: 'h'
and 'd'.
The nature of XML is such that the entities tend to be more grouped
and self-descriptive as far as their names go. So instead of having
<h name="STEL">, why could it not instead be <STEL>? To indicate
this as a subcomponent of a SAC header of a SAC trace, you'd form a
hierarchy:
<sacdataset>
  <trace>
    <header>
      <stel>
Even better would be to just call it <elevation>, maybe with a
reference to its SAC field abbreviation as an attribute: <elevation
id="STEL">.
The reason for the comment is not just for human readability, but
for the notion that many XML parsers will treat such elements as
objects, carrying the element name with them. Having your fields
broken down into meaningful names means that your objects will be
more independent and have stronger encapsulation properties.
If your example format is already set in stone, then please
continue with what works. I just felt the floor was open to address
some naming aesthetics for consideration. I can imagine that
writing a parsing engine for XML in Fortran is headache enough, so I
don't want to cause you a migraine on top of it.
Cheers,
-Rob
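A small sketch of the <elevation id="STEL"> idea from the message above, assuming a hypothetical document that uses descriptive element names and keeps the SAC abbreviation in an id attribute. The helper name get_by_sac_name and the <latitude> element are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: descriptive element names, with the SAC field
# abbreviation kept in an "id" attribute as suggested above.
SAMPLE = """<sacdataset>
  <trace>
    <header>
      <elevation id="STEL">500</elevation>
      <latitude id="STLA">40</latitude>
    </header>
  </trace>
</sacdataset>"""


def get_by_sac_name(root, sac_name):
    """Find a header value by its SAC abbreviation, whatever the tag is called."""
    node = root.find(f".//header/*[@id='{sac_name}']")
    return None if node is None else node.text


root = ET.fromstring(SAMPLE)
stel = get_by_sac_name(root, "STEL")                 # lookup by SAC short name
by_tag = root.findtext("./trace/header/elevation")   # lookup by readable name
missing = get_by_sac_name(root, "EVLA")              # field absent from the sample
print(stel, by_tag, missing)
```

Existing SAC users can keep addressing fields by short name while new readers see the descriptive tag. One caveat: if a DTD declared this attribute with type ID, its values would have to be unique within a document, so a file holding several traces would likely need a differently typed or differently named attribute.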
On Jan 30, 2008, at 5:05 AM, George Helffrich wrote:
Dear All -
I designed and implemented a subroutine interface to SAC XML
datasets in the latest release of MacSAC. This message is to make
you aware of the design ideas for architectural comment. I think
that it shows the way forward for SAC to move away from a purely
binary data format to one that embraces current practice in
structuring and delivering data.
A test program illustrates the concepts. Here is Fortran source
code of an actual program used for testing during development:
george<at>geology.bristol.ac.uk
-
George
I think moving away from the original header names might be a good idea.
The header names such as stlo, stla and kstnm can be cryptic upon
first encounter.
I would suggest more readable names
<station latitude="" longitude="" elevation="" name="" network=""></station>
or
<station>
  <latitude></latitude>
  <name></name>
  ....
</station>
Also, would it be possible to include references to the SAC binary
files, as they are now, in the XML format?
<sacdataset>
  <trace file="PAS.CI.BHZ.SAC" />
  <trace file="PAS.CI.BHE.SAC" />
  <trace file="PAS.CI.BHN.SAC" />
  <trace id="HRV.IU.BHZ.SAC">
    <header>
      ...
    </header>
    <trace>
      ...
    </trace>
  </trace>
</sacdataset>
This would allow the data to be stored either in the current (binary)
format or in the XML file itself.
Cheers,
Brian
_______________________________________________
sac-dev mailing list
sac-dev<at>iris.washington.edu
http://www.iris.washington.edu/mailman/listinfo/sac-dev
-
Dear Brian -
Indeed, header field names are cryptic and they could be improved on.
With a clearer idea of the intended users and/or viewers of the
elements, the naming scheme should become clearer.
I'm not sure about <trace file="PAS.CI.BHN.SAC" />, but it is an
interesting idea. It is analogous to a Unix symbolic link, with
similar semantic confusion (stat/lstat). Here it raises the issues of
1) whether a dataset is entirely self-contained; and 2) what it means
to "write" a dataset that contains links as well as trace data.
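A minimal sketch of the two questions raised above, assuming a hypothetical dataset in which a <trace> either points at a binary file through a file attribute or embeds its header and data. The element layout, the <data> element, and both function names are assumptions for illustration. As with lstat, the code inspects the link itself and never opens the file it refers to.

```python
import xml.etree.ElementTree as ET

# Hypothetical dataset mixing linked and embedded traces, after the
# example in this thread. Element and attribute names are assumptions.
SAMPLE = """<sacdataset>
  <trace file="PAS.CI.BHZ.SAC" />
  <trace id="HRV.IU.BHZ.SAC">
    <header><kstnm>HRV</kstnm></header>
    <data>1.0 2.0</data>
  </trace>
</sacdataset>"""


def classify_traces(xml_text):
    """Return (linked_files, embedded_ids) without opening any linked file."""
    linked, embedded = [], []
    for trace in ET.fromstring(xml_text).findall("trace"):
        if trace.get("file") is not None:
            linked.append(trace.get("file"))   # reference to a binary SAC file
        else:
            embedded.append(trace.get("id"))   # trace stored in the XML itself
    return linked, embedded


def is_self_contained(xml_text):
    """A dataset is self-contained only if it has no linked traces."""
    linked, _ = classify_traces(xml_text)
    return not linked


linked, embedded = classify_traces(SAMPLE)
print(linked, embedded, is_self_contained(SAMPLE))
```

A writer would face the matching choice: emit the links as they are, or follow each one and embed the referenced trace so that the output is self-contained.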
-