SAGE: Thread: sacswap's byte swapping bug

Started: 2008-08-31 04:33:29

Last activity: 2008-09-01 02:07:21

Topics: SAC Developers

Kuang He

2008-08-31 04:33:29

Hi,

I found a problem with sacswap. Although nowadays with sac's ability
to read both big- and little-endian sac files, sacswap must be less
often used, some people I know are still using it since some of the
legacy software such as pssac [3] can only deal with big-endian sac
files.

PROBLEM:

I used sacswap on my linux system to convert a random sac file vel.sac
(little-endian) to vel.sac.swap (big-endian), and then from
vel.sac.swap (big-endian) to vel.sac.swap.swap (little-endian). By
doing a binary file comparison, it turns out that files vel.sac and
vel.sac.swap.swap are not the same. I read files vel.sac and
vel.sac.swap into matlab and found that 8 or so data points from them
were not the same, i.e. the differences were first introduced in the
first sacswap conversion. I've confirmed that all the file headers
from the two files are the same, and NPTS of vel.sac is more than
10,000, so the error rate is pretty small.

The sac file I used is here, in case you want to try it:
http://maxwell.phys.uconn.edu/~icrazy/sac/vel.sac

To narrow the problem down, I extracted one of the numbers that was
changed after doing sacswap conversions and made a small program to
repeat the problem. The program is here:
http://maxwell.phys.uconn.edu/~icrazy/sac/endian1.c

I found that the culprit is this function float_swap() below, which
according to [1] is not a good way to implement the byte swapping
feature.

float float_swap(char cbuf[]) {
union {
char cval[4];
float fval;
} f_union;

f_union.cval[3] = cbuf[0];
f_union.cval[2] = cbuf[1];
f_union.cval[1] = cbuf[2];
f_union.cval[0] = cbuf[3];
return(f_union.fval);
}

This program endian1.c takes four bytes (0xff, 0xae, 0x47, and 0xc5)
as input and then swap the bytes (1st with 4th, 2nd with 3rd). The
correct output should be:

ff ae 47 c5 --> c5 47 ae ff

However, in my system (Pentium 4 CPU + GCC 4.2.3), using different
optimization parameters with gcc, I got different results:

$ gcc endian1.c -o endian1 && ./endian1
ff ae 47 c5 --> c5 47 ee ff
(wrong, "ae" was mysteriously changed to "ee")

$ gcc -O2 endian1.c -o endian1 && ./endian1
ff ae 47 c5 --> c5 47 ee ff
(wrong, "ae" was mysteriously changed to "ee")

$ gcc -O3 endian1.c -o endian1 && ./endian1
ff ae 47 c5 --> c5 47 ae ff
(correct)

I also found that this behavior is system dependent. I think it might
only be dependent on the gcc version, but I'm listing the CPU as well.

Systems which only give correct byte swapping results when using gcc -O3:
- Intel(R) Pentium(R) 4 CPU 2.40GHz
gcc (GCC) 4.2.3 (Ubuntu 4.2.3-2ubuntu7)
- AMD Athlon(tm) MP 2800+ (2133MHz)
gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-52)

Systems which give correct byte swapping results not matter what:
- Intel(R) Xeon(R) CPU E5320 @ 1.86GHz
gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33)
- Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33)

FIX:

Ref [2] (page 9) gives some example byte swapping macros, which in my
opinion are equivalent to the byteswap() function in sac's source code
src/ucf/byteswap.c . I took this function and adapted my program
endian1.c to endian_byteswap.c below:

http://maxwell.phys.uconn.edu/~icrazy/sac/endian_byteswap.c

and everything works!

I also did a complete update of sacswap:

http://maxwell.phys.uconn.edu/~icrazy/sac/sacswap.c

using byteswap(), while trying to keep as much original code intact as possible.

P.S. I don't know about the attachment policy of this mailing list. If
needed (maybe for archiving purposes), I can attach all the program
files mentioned here in a reply post.

[1] http://www.power.org/devcon/07/Session_Downloads/PADC07_McQuaid_PA_Wrobel_Heinz_20070910.pdf

[2] Endianness White Paper
http://www.intel.com/design/intarch/papers/endian.pdf

[3] pssac
http://www.eas.slu.edu/People/LZhu/downloads/pssac.tar

Best regards,

--
Kuang He
Department of Physics
University of Connecticut
Storrs, CT 06269-3046

Tel: +1.860.486.4919
Web: http://www.phys.uconn.edu/~he/

George Helffrich

Re: sacswap's byte swapping bug

2008-08-31 20:06:23

Dear Huang -

Good work tracking this down. What it shows is that you are running
into compiler problems! Even though your reference [1] says that using
unions to swap is bad, there is actually nothing wrong with it. It
might be inefficient on hardware, and, though [1] does not say why,
there are two possible problems that could arise: 1) loss of
efficiency in swapping due to sign extension; 2) alignment problems in
the union. I suspect that the problem you are having is due to botched
optimization and/or sign extension problems.

Can you make the following test? Change your routine like so, and run
your tests:

float float_swap(unsigned char cbuf[]) {
union {
unsigned char cval[4];
float fval;
} f_union;

f_union.cval[3] = cbuf[0];
f_union.cval[2] = cbuf[1];
f_union.cval[1] = cbuf[2];
f_union.cval[0] = cbuf[3];
return(f_union.fval);
}

That rules out sign extension as a contributor to the problems.

On 31 Aug 2008, at 02:33, Kuang He wrote:

Hi,

I found a problem with sacswap. Although nowadays with sac's ability
to read both big- and little-endian sac files, sacswap must be less
often used, some people I know are still using it since some of the
legacy software such as pssac [3] can only deal with big-endian sac
files.

PROBLEM:

I used sacswap on my linux system to convert a random sac file vel.sac
(little-endian) to vel.sac.swap (big-endian), and then from
vel.sac.swap (big-endian) to vel.sac.swap.swap (little-endian). By
doing a binary file comparison, it turns out that files vel.sac and
vel.sac.swap.swap are not the same. I read files vel.sac and
vel.sac.swap into matlab and found that 8 or so data points from them
were not the same, i.e. the differences were first introduced in the
first sacswap conversion. I've confirmed that all the file headers
from the two files are the same, and NPTS of vel.sac is more than
10,000, so the error rate is pretty small.

The sac file I used is here, in case you want to try it:
http://maxwell.phys.uconn.edu/~icrazy/sac/vel.sac

To narrow the problem down, I extracted one of the numbers that was
changed after doing sacswap conversions and made a small program to
repeat the problem. The program is here:
http://maxwell.phys.uconn.edu/~icrazy/sac/endian1.c

I found that the culprit is this function float_swap() below, which
according to [1] is not a good way to implement the byte swapping
feature.

float float_swap(char cbuf[]) {
union {
char cval[4];
float fval;
} f_union;

f_union.cval[3] = cbuf[0];
f_union.cval[2] = cbuf[1];
f_union.cval[1] = cbuf[2];
f_union.cval[0] = cbuf[3];
return(f_union.fval);
}

This program endian1.c takes four bytes (0xff, 0xae, 0x47, and 0xc5)
as input and then swap the bytes (1st with 4th, 2nd with 3rd). The
correct output should be:

ff ae 47 c5 --> c5 47 ae ff

However, in my system (Pentium 4 CPU + GCC 4.2.3), using different
optimization parameters with gcc, I got different results:

$ gcc endian1.c -o endian1 && ./endian1
ff ae 47 c5 --> c5 47 ee ff
(wrong, "ae" was mysteriously changed to "ee")

$ gcc -O2 endian1.c -o endian1 && ./endian1
ff ae 47 c5 --> c5 47 ee ff
(wrong, "ae" was mysteriously changed to "ee")

$ gcc -O3 endian1.c -o endian1 && ./endian1
ff ae 47 c5 --> c5 47 ae ff
(correct)

I also found that this behavior is system dependent. I think it might
only be dependent on the gcc version, but I'm listing the CPU as well.

Systems which only give correct byte swapping results when using gcc
-O3:
- Intel(R) Pentium(R) 4 CPU 2.40GHz
gcc (GCC) 4.2.3 (Ubuntu 4.2.3-2ubuntu7)
- AMD Athlon(tm) MP 2800+ (2133MHz)
gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-52)

Systems which give correct byte swapping results not matter what:
- Intel(R) Xeon(R) CPU E5320 @ 1.86GHz
gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33)
- Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33)

FIX:

Ref [2] (page 9) gives some example byte swapping macros, which in my
opinion are equivalent to the byteswap() function in sac's source code
src/ucf/byteswap.c . I took this function and adapted my program
endian1.c to endian_byteswap.c below:

http://maxwell.phys.uconn.edu/~icrazy/sac/endian_byteswap.c

and everything works!

I also did a complete update of sacswap:

http://maxwell.phys.uconn.edu/~icrazy/sac/sacswap.c

using byteswap(), while trying to keep as much original code intact as
possible.

P.S. I don't know about the attachment policy of this mailing list. If
needed (maybe for archiving purposes), I can attach all the program
files mentioned here in a reply post.

[1]
http://www.power.org/devcon/07/Session_Downloads/
PADC07_McQuaid_PA_Wrobel_Heinz_20070910.pdf

[2] Endianness White Paper
http://www.intel.com/design/intarch/papers/endian.pdf

[3] pssac
http://www.eas.slu.edu/People/LZhu/downloads/pssac.tar

Best regards,

--
Kuang He
Department of Physics
University of Connecticut
Storrs, CT 06269-3046

Tel: +1.860.486.4919
Web: http://www.phys.uconn.edu/~he/
_______________________________________________
sac-dev mailing list
sac-dev<at>iris.washington.edu
http://www.iris.washington.edu/mailman/listinfo/sac-dev

George Helffrich
george<at>geology.bristol.ac.uk
- Kuang He
  
  Re: sacswap's byte swapping bug
  
  2008-08-31 17:45:21
  
  On Sun, Aug 31, 2008 at 8:06 AM, George Helffrich
  <george<at>geology.bristol.ac.uk> wrote:
  
  Dear Huang -
  
  Good work tracking this down. What it shows is that you are running
  into compiler problems! Even though your reference [1] says that using
  unions to swap is bad, there is actually nothing wrong with it. It might be
  
  Dear George,
  
  I also suspected that it could be a compiler problem. However, I
  tested endian1.c on another Pentium 4 machine with Visual C 6.0, and
  it always gave wrong answer. I know that maybe both gcc and VC6 have
  problems with union implementations, but chances are small.
  
  inefficient on hardware, and, though [1] does not say why, there are two
  possible problems that could arise: 1) loss of efficiency in swapping due
  to sign extension; 2) alignment problems in the union. I suspect that the
  problem you are having is due to botched optimization and/or sign extension
  problems.
  
  Can you make the following test? Change your routine like so, and
  run your tests:
  
  float float_swap(unsigned char cbuf[]) {
  union {
  unsigned char cval[4];
  float fval;
  } f_union;
  
  f_union.cval[3] = cbuf[0];
  f_union.cval[2] = cbuf[1];
  f_union.cval[1] = cbuf[2];
  f_union.cval[0] = cbuf[3];
  return(f_union.fval);
  }
  
  That rules out sign extension as a contributor to the problems.
  
  So I made the test, and nothing changed.
  
  $ diff -u endian1.c endian1-unsigned.c
  --- endian1.c 2008-08-30 19:10:29.000000000 -0400
  +++ endian1-unsigned.c 2008-08-31 10:00:45.000000000 -0400
  @@ -1,9 +1,9 @@
  #include <stdio.h>
  #include <stdlib.h>
  
  -float float_swap(char cbuf[]) {
  +float float_swap(unsigned char cbuf[]) {
  union {
  - char cval[4];
  + unsigned char cval[4];
  float fval;
  } f_union;
  
  @@ -17,7 +17,7 @@
  int main()
  {
  float f;
  - char cbuf[4];
  + unsigned char cbuf[4];
  unsigned char *p;
  
  cbuf[0] = 0xff;
  
  $ gcc -Wall ./endian1-unsigned.c -o endian1-unsigned && ./endian1-unsigned
  ff ae 47 c5 --> c5 47 ee ff
  (wrong result)
  
  $ gcc -O3 -Wall ./endian1-unsigned.c -o endian1-unsigned && ./endian1-unsigned
  ff ae 47 c5 --> c5 47 ae ff
  (correct result)
  
  I also did a sizeof() of the above union, and it is 4 bytes on my system.
  
  Best regards,
  
  --
  Kuang He
  Department of Physics
  University of Connecticut
  Storrs, CT 06269-3046
  
  Tel: +1.860.486.4919
  Web: http://www.phys.uconn.edu/~he/
  - George Helffrich
    
    Re: sacswap's byte swapping bug
    
    2008-08-31 23:56:03
    
    Dear Huang -
    
    Thanks for making the check. Your test program is quite simple and
    straightforward.
    
    I was able to reproduce the problem on an Intel Mac mini (gcc 4.0.1).
    It turns out that the code to swap is fine. If you change the floats
    to ints, then the byte swapping works OK.
    
    Here's what the problem is, however: your number is an IEEE NaN!
    When it is returned by the function and stored, then the bit pattern
    changes. The reason the pattern changes is that your number is a
    signaling NaN. A processor is allowed to clear the flag when a
    floating point number is stored. That's the reason that the bit
    changes. Depending on whether a floating point store or a fixed point
    store or move is used, the result can differ depending on processor
    and/or code optimization level.
    
    So there's no problem with the byte swapping code.
    
    On 31 Aug 2008, at 15:45, Kuang He wrote:
    
    On Sun, Aug 31, 2008 at 8:06 AM, George Helffrich
    <george<at>geology.bristol.ac.uk> wrote:
    
    Dear Huang -
    
    Good work tracking this down. What it shows is that you are
    running
    into compiler problems! Even though your reference [1] says that
    using
    unions to swap is bad, there is actually nothing wrong with it. It
    might be
    
    Dear George,
    
    I also suspected that it could be a compiler problem. However, I
    tested endian1.c on another Pentium 4 machine with Visual C 6.0, and
    it always gave wrong answer. I know that maybe both gcc and VC6 have
    problems with union implementations, but chances are small.
    
    inefficient on hardware, and, though [1] does not say why, there are
    two
    possible problems that could arise: 1) loss of efficiency in
    swapping due
    to sign extension; 2) alignment problems in the union. I suspect
    that the
    problem you are having is due to botched optimization and/or sign
    extension
    problems.
    
    Can you make the following test? Change your routine like so,
    and
    run your tests:
    
    float float_swap(unsigned char cbuf[]) {
    union {
    unsigned char cval[4];
    float fval;
    } f_union;
    
    f_union.cval[3] = cbuf[0];
    f_union.cval[2] = cbuf[1];
    f_union.cval[1] = cbuf[2];
    f_union.cval[0] = cbuf[3];
    return(f_union.fval);
    }
    
    That rules out sign extension as a contributor to the problems.
    
    So I made the test, and nothing changed.
    
    $ diff -u endian1.c endian1-unsigned.c
    --- endian1.c 2008-08-30 19:10:29.000000000 -0400
    +++ endian1-unsigned.c 2008-08-31 10:00:45.000000000 -0400
    @@ -1,9 +1,9 @@
    #include <stdio.h>
    #include <stdlib.h>
    
    -float float_swap(char cbuf[]) {
    +float float_swap(unsigned char cbuf[]) {
    union {
    - char cval[4];
    + unsigned char cval[4];
    float fval;
    } f_union;
    
    @@ -17,7 +17,7 @@
    int main()
    {
    float f;
    - char cbuf[4];
    + unsigned char cbuf[4];
    unsigned char *p;
    
    cbuf[0] = 0xff;
    
    $ gcc -Wall ./endian1-unsigned.c -o endian1-unsigned &&
    ./endian1-unsigned
    ff ae 47 c5 --> c5 47 ee ff
    (wrong result)
    
    $ gcc -O3 -Wall ./endian1-unsigned.c -o endian1-unsigned &&
    ./endian1-unsigned
    ff ae 47 c5 --> c5 47 ae ff
    (correct result)
    
    I also did a sizeof() of the above union, and it is 4 bytes on my
    system.
    
    Best regards,
    
    --
    Kuang He
    Department of Physics
    University of Connecticut
    Storrs, CT 06269-3046
    
    Tel: +1.860.486.4919
    Web: http://www.phys.uconn.edu/~he/
    _______________________________________________
    sac-dev mailing list
    sac-dev<at>iris.washington.edu
    http://www.iris.washington.edu/mailman/listinfo/sac-dev
    
    George Helffrich
    george<at>geology.bristol.ac.uk
    
    Kuang He
    
    Re: sacswap's byte swapping bug
    
    2008-08-31 20:41:24
    
    Dear George,
    
    Thanks for the enlightenment, I did not think of NaN before and was
    not aware of fact that the number NaN is allowed to be changed by the
    processor.
    
    Just to clarify, the my input number specified in endian1.c (listed
    below) under little-endian systems is not NaN, and it actually
    represents the number 0xC547AEFF. After byte swapping, this number
    will become 0xFFAE7A54, and appear to be signalling NaN to a little
    endian system.
    
    cbuf[0] = 0xff;
    cbuf[1] = 0xae;
    cbuf[2] = 0x47;
    cbuf[3] = 0xc5;
    
    Still, I think the original sacswap program can lead to a very
    undesirable effect, that is, after using it, the values from sac files
    may change. A simple way to demonstrate it is again to use the sac
    file I've provided:
    
    http://maxwell.phys.uconn.edu/~icrazy/sac/vel.sac
    
    First, we byte swap it:
    
    $ sacswap vel.sac
    sacswap: Writing vel.sac.swap with npts = 14054 native => non-native
    
    Then we convert the sac files to ascii files in order to make a text comparison:
    
    $ cat sac2ascii
    CONVERT FROM SAC vel.sac TO ALPHA vel.sac.txt
    CONVERT FROM SAC vel.sac.swap TO ALPHA vel.sac.swap.txt
    quit
    
    $ sac sac2ascii
    
    If we have used the original version of sacswap in the above byte
    swapping step, files vel.sac.txt and vel.sac.swap.txt would differ on
    my system (and on some other systems as well, but not all of them).
    
    $ diff -u vel.sac.txt vel.sac.swap.txt
    --- vel.sac.txt 2008-08-31 13:06:19.000000000 -0400
    +++ vel.sac.swap.txt 2008-08-31 13:06:19.000000000 -0400
    @@ -9,7 +9,7 @@
    -12345 -12345 -12345 -12345 -12345
    -12345 -12345 -12345 -12345 -12345
    2848.977 304.7868 107.2769 25.62435 -12345
    - -12345 -711.2866 0 0 -12345
    + -12345 -711.2802 0 0 -12345
    -12345 -12345 -12345 -12345 -12345
    -12345 -12345 -12345 -12345 -12345
    2008 133 6 30 54
    @@ -1767,7 +1767,7 @@
    -442.2065 -627.9093 -794.7451 -949.0807 -1086.678
    -1217.899 -1345.818 -1483.826 -1629.344 -1787.957
    -1955.329 -2136.53 -2321.403 -2509.167 -2689.645
    - -2865.094 -3031.295 -3194.937 -3348.609 -3491.529
    + -2865.094 -3031.295 -3198.937 -3348.609 -3491.529
    -3615.532 -3724.585 -3811.694 -3886.186 -3950.19
    -4018.139 -4088.735 -4161.889 -4223.314 -4275.219
    -4315.033 -4355.229 -4397.055 -4444.013 -4481.068
    @@ -1781,7 +1781,7 @@
    -2900.382 -2733.463 -2557.34 -2380.768 -2199.411
    -2018.65 -1835.886 -1663.406 -1504.461 -1375.213
    -1279.549 -1231.151 -1227.254 -1269.1 -1342.068
    - -1444.266 -1564.187 -1702.966 -1845.73 -1985.935
    + -1444.266 -1566.187 -1702.966 -1845.73 -1985.935
    -2108.854 -2215.427 -2302.428 -2382.116 -2451.169
    -2512.34 -2555.602 -2583.942 -2590.573 -2583.463
    -2562.272 -2535.475 -2494.733 -2445.612 -2387.887
    @@ -1982,7 +1982,7 @@
    -80449.42 -80341.84 -80221.92 -80085.12 -79943.02
    -79792.08 -79642 -79486.1 -79331.84 -79174.38
    -79021.8 -78862.32 -78700.26 -78527.55 -78352.4
    - -78166.28 -77976.15 -77775.91 -77575.99 -77370.91
    + -78166.28 -77976.15 -77775.91 -77703.99 -77370.91
    -77165.37 -76944.98 -76712.09 -76458.49 -76193.1
    -75908.02 -75610.84 -75292.95 -74960.52 -74601.54
    -74220.37 -73809.9 -73374.59 -72901.31 -72395.22
    @@ -2074,8 +2074,8 @@
    49460.56 49606.18 49739.93 49870.56 49992.12
    50113.37 50231.81 50355.56 50468.37 50569.93
    50647.43 50709.93 50754 50788.06 50804
    - 50807.43 50786.81 50750.87 50695.25 50621.5
    - 50518.37 50394.31 50243.06 50076.5 49892.75
    + 50807.43 50786.81 50750.87 50695.25 50685.5
    + 50518.37 50394.31 50243.06 50140.5 49892.75
    49698.37 49488.37 49277.12 49060.56 48854.31
    48656.5 48478.37 48318.68 48187.75 48076.5
    47991.19 47920.56 47864 47810.25 47769.94
    @@ -2084,12 +2084,12 @@
    47841.5 47901.19 47964.31 48041.5 48120.25
    48206.5 48290.25 48380.87 48463.53 48546.96
    48624.78 48707.28 48787.9 48879.15 48974
    - 49080.4 49190.4 49311.5 49432.28 49558.37
    + 49080.4 49190.4 49375.5 49432.28 49558.37
    49682.12 49812.43 49940.4 50073.37 50203.21
    50335.71 50457.28 50570.56 50662.43 50738.21
    50786.5 50814.46 50813.06 50793.68 50747.2
    50680.64 50579.23 50446.34 50269.93 50061.26
    - 49816.5 49547.98 49246.26 48914.78 48539.07
    + 49880.5 49547.98 49246.26 48914.78 48539.07
    48130.72 47686.66 47220.05 46724.55 46213.8
    45684.16 45148.57 44598.93 44042.48 43467.42
    42878.2 42263.68 41629.63 40966.04 40282.76
    @@ -2655,7 +2655,7 @@
    -14297.84 -14457.91 -14600.05 -14759.44 -14899.54
    -15057.7 -15197.83 -15354.5 -15491.26 -15648.15
    -15789.54 -15950.98 -16095.28 -16259.3 -16402.67
    - -16560.03 -16695.9 -16844.77 -16969.5 -17109.72
    + -16560.03 -16695.9 -16844.77 -17001.5 -17109.72
    -17225.52 -17350.89 -17453.12 -17570.5 -17664.18
    -17769.41 -17850.37 -17944.39 -18012.98 -18092.75
    -18143.65 -18201.23 -18230.68 -18269.39 -18278.61
    
    One can of course argue that the difference in values of the changed
    data are not great, but it is certainly not desirable for users to
    have their data changed without telling them. However, if we have used
    the updated version of sacswap I've provided before in the above byte
    swapping step, files vel.sac.txt and vel.sac.swap.txt will always be
    the same (at least it is like this according to my own tests).
    
    Therefore, to eliminate the possible unwitting data changing effects,
    I think we should update the original sacswap program to one that uses
    byte_swap(). One can still argue that it might be easier to just add a
    "-O3" parameter when compiling the original sacswap program, which
    seems to eliminate the problem withing touching the source code, but
    this trick can not be guaranteed to work on all the platforms.
    
    Best regards,
    
    --
    Kuang He
    Department of Physics
    University of Connecticut
    Storrs, CT 06269-3046
    
    Tel: +1.860.486.4919
    Web: http://www.phys.uconn.edu/~he/
    
    On Sun, Aug 31, 2008 at 11:56 AM, George Helffrich
    <george<at>gly.bris.ac.uk> wrote:
    
    Dear Huang -
    
    Thanks for making the check. Your test program is quite simple and
    straightforward.
    
    I was able to reproduce the problem on an Intel Mac mini (gcc 4.0.1).
    It turns out that the code to swap is fine. If you change the floats to
    ints, then the byte swapping works OK.
    
    Here's what the problem is, however: your number is an IEEE NaN!
    When it is returned by the function and stored, then the bit pattern
    changes. The reason the pattern changes is that your number is a signaling
    NaN. A processor is allowed to clear the flag when a floating point number
    is stored. That's the reason that the bit changes. Depending on whether a
    floating point store or a fixed point store or move is used, the result can
    differ depending on processor and/or code optimization level.
    
    So there's no problem with the byte swapping code.
    
    George Helffrich
    
    Re: sacswap's byte swapping bug
    
    2008-09-01 02:07:21
    
    Dear Kuang -
    
    If you demand that type of compatibility, then the straightforward way
    to change sacswap would be to change the type of float_swap (and the
    type in the union) to int, and store the swapped value as an int (with
    a suitable cast). That would ensure against any bit changes due to NaN
    behavior on intermediate machine architectures. Can you provide a diff
    to sacswap.c do that?
    
    On 31 Aug 2008, at 18:41, Kuang He wrote:
    
    Dear George,
    
    Thanks for the enlightenment, I did not think of NaN before and was
    not aware of fact that the number NaN is allowed to be changed by the
    processor.
    
    Just to clarify, the my input number specified in endian1.c (listed
    below) under little-endian systems is not NaN, and it actually
    represents the number 0xC547AEFF. After byte swapping, this number
    will become 0xFFAE7A54, and appear to be signalling NaN to a little
    endian system.
    
    cbuf[0] = 0xff;
    cbuf[1] = 0xae;
    cbuf[2] = 0x47;
    cbuf[3] = 0xc5;
    
    Still, I think the original sacswap program can lead to a very
    undesirable effect, that is, after using it, the values from sac files
    may change. A simple way to demonstrate it is again to use the sac
    file I've provided:
    
    http://maxwell.phys.uconn.edu/~icrazy/sac/vel.sac
    
    First, we byte swap it:
    
    $ sacswap vel.sac
    sacswap: Writing vel.sac.swap with npts = 14054 native => non-native
    
    Then we convert the sac files to ascii files in order to make a text
    comparison:
    
    $ cat sac2ascii
    CONVERT FROM SAC vel.sac TO ALPHA vel.sac.txt
    CONVERT FROM SAC vel.sac.swap TO ALPHA vel.sac.swap.txt
    quit
    
    $ sac sac2ascii
    
    If we have used the original version of sacswap in the above byte
    swapping step, files vel.sac.txt and vel.sac.swap.txt would differ on
    my system (and on some other systems as well, but not all of them).
    
    $ diff -u vel.sac.txt vel.sac.swap.txt
    --- vel.sac.txt 2008-08-31 13:06:19.000000000 -0400
    +++ vel.sac.swap.txt 2008-08-31 13:06:19.000000000 -0400
    @@ -9,7 +9,7 @@
    -12345 -12345 -12345 -12345
    -12345
    -12345 -12345 -12345 -12345
    -12345
    2848.977 304.7868 107.2769 25.62435
    -12345
    - -12345 -711.2866 0 0
    -12345
    + -12345 -711.2802 0 0
    -12345
    -12345 -12345 -12345 -12345
    -12345
    -12345 -12345 -12345 -12345
    -12345
    2008 133 6 30 54
    @@ -1767,7 +1767,7 @@
    -442.2065 -627.9093 -794.7451 -949.0807
    -1086.678
    -1217.899 -1345.818 -1483.826 -1629.344
    -1787.957
    -1955.329 -2136.53 -2321.403 -2509.167
    -2689.645
    - -2865.094 -3031.295 -3194.937 -3348.609
    -3491.529
    + -2865.094 -3031.295 -3198.937 -3348.609
    -3491.529
    -3615.532 -3724.585 -3811.694 -3886.186
    -3950.19
    -4018.139 -4088.735 -4161.889 -4223.314
    -4275.219
    -4315.033 -4355.229 -4397.055 -4444.013
    -4481.068
    @@ -1781,7 +1781,7 @@
    -2900.382 -2733.463 -2557.34 -2380.768
    -2199.411
    -2018.65 -1835.886 -1663.406 -1504.461
    -1375.213
    -1279.549 -1231.151 -1227.254 -1269.1
    -1342.068
    - -1444.266 -1564.187 -1702.966 -1845.73
    -1985.935
    + -1444.266 -1566.187 -1702.966 -1845.73
    -1985.935
    -2108.854 -2215.427 -2302.428 -2382.116
    -2451.169
    -2512.34 -2555.602 -2583.942 -2590.573
    -2583.463
    -2562.272 -2535.475 -2494.733 -2445.612
    -2387.887
    @@ -1982,7 +1982,7 @@
    -80449.42 -80341.84 -80221.92 -80085.12
    -79943.02
    -79792.08 -79642 -79486.1 -79331.84
    -79174.38
    -79021.8 -78862.32 -78700.26 -78527.55
    -78352.4
    - -78166.28 -77976.15 -77775.91 -77575.99
    -77370.91
    + -78166.28 -77976.15 -77775.91 -77703.99
    -77370.91
    -77165.37 -76944.98 -76712.09 -76458.49
    -76193.1
    -75908.02 -75610.84 -75292.95 -74960.52
    -74601.54
    -74220.37 -73809.9 -73374.59 -72901.31
    -72395.22
    @@ -2074,8 +2074,8 @@
    49460.56 49606.18 49739.93 49870.56
    49992.12
    50113.37 50231.81 50355.56 50468.37
    50569.93
    50647.43 50709.93 50754 50788.06
    50804
    - 50807.43 50786.81 50750.87 50695.25
    50621.5
    - 50518.37 50394.31 50243.06 50076.5
    49892.75
    + 50807.43 50786.81 50750.87 50695.25
    50685.5
    + 50518.37 50394.31 50243.06 50140.5
    49892.75
    49698.37 49488.37 49277.12 49060.56
    48854.31
    48656.5 48478.37 48318.68 48187.75
    48076.5
    47991.19 47920.56 47864 47810.25
    47769.94
    @@ -2084,12 +2084,12 @@
    47841.5 47901.19 47964.31 48041.5
    48120.25
    48206.5 48290.25 48380.87 48463.53
    48546.96
    48624.78 48707.28 48787.9 48879.15
    48974
    - 49080.4 49190.4 49311.5 49432.28
    49558.37
    + 49080.4 49190.4 49375.5 49432.28
    49558.37
    49682.12 49812.43 49940.4 50073.37
    50203.21
    50335.71 50457.28 50570.56 50662.43
    50738.21
    50786.5 50814.46 50813.06 50793.68
    50747.2
    50680.64 50579.23 50446.34 50269.93
    50061.26
    - 49816.5 49547.98 49246.26 48914.78
    48539.07
    + 49880.5 49547.98 49246.26 48914.78
    48539.07
    48130.72 47686.66 47220.05 46724.55
    46213.8
    45684.16 45148.57 44598.93 44042.48
    43467.42
    42878.2 42263.68 41629.63 40966.04
    40282.76
    @@ -2655,7 +2655,7 @@
    -14297.84 -14457.91 -14600.05 -14759.44
    -14899.54
    -15057.7 -15197.83 -15354.5 -15491.26
    -15648.15
    -15789.54 -15950.98 -16095.28 -16259.3
    -16402.67
    - -16560.03 -16695.9 -16844.77 -16969.5
    -17109.72
    + -16560.03 -16695.9 -16844.77 -17001.5
    -17109.72
    -17225.52 -17350.89 -17453.12 -17570.5
    -17664.18
    -17769.41 -17850.37 -17944.39 -18012.98
    -18092.75
    -18143.65 -18201.23 -18230.68 -18269.39
    -18278.61
    
    One can of course argue that the difference in values of the changed
    data are not great, but it is certainly not desirable for users to
    have their data changed without telling them. However, if we have used
    the updated version of sacswap I've provided before in the above byte
    swapping step, files vel.sac.txt and vel.sac.swap.txt will always be
    the same (at least it is like this according to my own tests).
    
    Therefore, to eliminate the possible unwitting data changing effects,
    I think we should update the original sacswap program to one that uses
    byte_swap(). One can still argue that it might be easier to just add a
    "-O3" parameter when compiling the original sacswap program, which
    seems to eliminate the problem withing touching the source code, but
    this trick can not be guaranteed to work on all the platforms.
    
    Best regards,
    
    --
    Kuang He
    Department of Physics
    University of Connecticut
    Storrs, CT 06269-3046
    
    Tel: +1.860.486.4919
    Web: http://www.phys.uconn.edu/~he/
    
    On Sun, Aug 31, 2008 at 11:56 AM, George Helffrich
    <george<at>gly.bris.ac.uk> wrote:
    
    Dear Huang -
    
    Thanks for making the check. Your test program is quite
    simple and
    straightforward.
    
    I was able to reproduce the problem on an Intel Mac mini (gcc
    4.0.1).
    It turns out that the code to swap is fine. If you change the
    floats to
    ints, then the byte swapping works OK.
    
    Here's what the problem is, however: your number is an IEEE
    NaN!
    When it is returned by the function and stored, then the bit pattern
    changes. The reason the pattern changes is that your number is a
    signaling
    NaN. A processor is allowed to clear the flag when a floating point
    number
    is stored. That's the reason that the bit changes. Depending on
    whether a
    floating point store or a fixed point store or move is used, the
    result can
    differ depending on processor and/or code optimization level.
    
    So there's no problem with the byte swapping code.
    
    _______________________________________________
    sac-dev mailing list
    sac-dev<at>iris.washington.edu
    http://www.iris.washington.edu/mailman/listinfo/sac-dev
    
    George Helffrich
    george<at>geology.bristol.ac.uk
    
    Kuang He
    
    Re: sacswap's byte swapping bug
    
    2008-08-31 21:25:26
    
    Dear George,
    
    Hmm, while I still think the previous way of doing it using
    byte_swap() is correct, using your way certainly would generate a much
    smaller diff file.
    
    The new diff file is attached, and also can be found at:
    
    http://maxwell.phys.uconn.edu/~icrazy/sac/sacswap.c.diff
    
    The original diff file using byte_swap() is also put here for comparison:
    
    http://maxwell.phys.uconn.edu/~icrazy/sac/sacswap.c.diff2
    
    Best regards,
    
    --
    Kuang He
    Department of Physics
    University of Connecticut
    Storrs, CT 06269-3046
    
    Tel: +1.860.486.4919
    Web: http://www.phys.uconn.edu/~he/
    
    On Sun, Aug 31, 2008 at 2:07 PM, George Helffrich <george<at>gly.bris.ac.uk> wrote:
    
    Dear Kuang -
    
    If you demand that type of compatibility, then the straightforward
    way to change sacswap would be to change the type of float_swap (and the
    type in the union) to int, and store the swapped value as an int (with a
    suitable cast). That would ensure against any bit changes due to NaN
    behavior on intermediate machine architectures. Can you provide a diff to
    sacswap.c do that?
    
    On 31 Aug 2008, at 18:41, Kuang He wrote:
    
    Dear George,
    
    Thanks for the enlightenment, I did not think of NaN before and was
    not aware of fact that the number NaN is allowed to be changed by the
    processor.
    
    Just to clarify, the my input number specified in endian1.c (listed
    below) under little-endian systems is not NaN, and it actually
    represents the number 0xC547AEFF. After byte swapping, this number
    will become 0xFFAE7A54, and appear to be signalling NaN to a little
    endian system.
    
    cbuf[0] = 0xff;
    cbuf[1] = 0xae;
    cbuf[2] = 0x47;
    cbuf[3] = 0xc5;
    
    Still, I think the original sacswap program can lead to a very
    undesirable effect, that is, after using it, the values from sac files
    may change. A simple way to demonstrate it is again to use the sac
    file I've provided:
    [ ... snipped ...]
    Therefore, to eliminate the possible unwitting data changing effects,
    I think we should update the original sacswap program to one that uses
    byte_swap(). One can still argue that it might be easier to just add a
    "-O3" parameter when compiling the original sacswap program, which
    seems to eliminate the problem withing touching the source code, but
    this trick can not be guaranteed to work on all the platforms.
    
    Best regards,
    
    --
    Kuang He
    
    Attachments
    
    sacswap.c.diff

SAGE ingests, curates, and distributes geoscience data

DATA AT SAGE

DATA INGESTION

DATA ANALYTICS

DATA ACCESS

DOWNLOADABLE SOFTWARE

SUPPORT

SAGE operates, provides, and maintains geoscience instrumentation

Permanent Networks

Portable Instrumentation

USArray

Community Engagement

Collaborations

New Directions

Past Projects

SAGE provides a wide range of education, workforce, and outreach resources

LEARNING & TEACHING RESOURCES

LEARNING OPPORTUNITIES

PUBLIC OUTREACH

EXPLORE EARTHQUAKE DATA

ABOUT SAGE

COMMUNITY

EVENTS

PUBLICATIONS

SAGE ORGANIZATION

Thread: sacswap's byte swapping bug

Attachments

Connect