double -> text -> double

  • Ole Nielsby

    double -> text -> double

    First, sorry if this is off-topic, not strictly being a C++ issue.
    I could not find a ng on numerics or serialization and I figure
    this ng is the closest I can get.

    Now the question:

    I want to serialize doubles in human-readable decimal form
    and be sure I get the exact same binary values when I read
    them back. (Right now, I don't care about NaN, infinities etc.)

    In essence, this boils down to converting a double to a
    (large) integer mantissa and a decimal exponent, and back,
    so that 3.1416 would be represented as {31416, -4}.

    I wrote a converter that always calculates the scaling
    separately and does it exactly the same way when reading
    and writing, using up to 17-digit mantissa which should
    be sufficient precision. I then tested it with "dirty" numbers
    generated by trigonometry and exp functions, and found
    that it seems to work OK for exponents in the range +/-20
    approximately, but outside of that range, a few percent of
    the numbers come out different.

    Does anybody know of an algorithm that is known to
    work?

    I tried an algorithm that, when converting double->text,
    would convert back to double and try to adjust the mantissa
    if this double was different from the original, but this only
    made things worse.

    Regards/Ole Nielsby


  • Victor Bazarov

    #2
    Re: double -> text -> double

    Ole Nielsby wrote:
    > First, sorry if this is off-topic, not strictly being a C++ issue.
    > I could not find a ng on numerics or serialization and I figure
    > this ng is the closest I can get.

    It's good enough. Every language will have a solution, I am guessing,
    but it would be language-specific.

    > Now the question:
    >
    > I want to serialize doubles in human-readable decimal form
    > and be sure I get the exact same binary values when I read
    > them back. (Right now, I don't care about NaN, infinities etc.)

    Output more digits than the precision of the 'double'. See the
    'std::numeric_limits' template.
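
A minimal sketch of this suggestion, assuming a C++11 or later compiler (newer than this thread): `std::numeric_limits<double>::max_digits10` reports how many decimal digits are needed for a round trip, which is 17 for an IEEE 754 double. The helper names `double_to_text` and `text_to_double` are made up for illustration, and the round trip only holds if the standard library rounds correctly in both directions.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <sstream>
#include <string>

// Write a finite double with max_digits10 significant digits
// (17 for IEEE 754 binary64), enough to identify the value uniquely.
std::string double_to_text(double value) {
    assert(std::isfinite(value));  // NaN and infinities are out of scope here
    std::ostringstream out;
    out.precision(std::numeric_limits<double>::max_digits10);
    out << value;
    return out.str();
}

// Read the text back; this relies on the library rounding correctly.
double text_to_double(const std::string& text) {
    std::istringstream in(text);
    double value = 0.0;
    in >> value;
    assert(!in.fail());
    return value;
}
```

With these helpers, `text_to_double(double_to_text(x)) == x` is the property to check for each finite `x` of interest.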

    V
    --
    Please remove capital 'A's when replying by e-mail
    I do not respond to top-posted replies, please don't ask



    • Noah Roberts

      #3
      Re: double -> text -> double


      Ole Nielsby wrote:
      > I wrote a converter that always calculates the scaling
      > separately and does it exactly the same way when reading
      > and writing, using up to 17-digit mantissa which should
      > be sufficient precision. I then tested it with "dirty" numbers
      > generated by trigonometry and exp functions, and found
      > that it seems to work OK for exponents in the range +/-20
      > approximately, but outside of that range, a few percent of
      > the numbers come out different.

      How are you comparing results?

      How are you doing the conversion from two ints into a double?

      Maybe there is room in those two things for some minor errors.


      • Geo

        #4
        Re: double -> text -> double


        Ole Nielsby wrote:
        > First, sorry if this is off-topic, not strictly being a C++ issue.
        > I could not find a ng on numerics or serialization and I figure
        > this ng is the closest I can get.
        >
        > Now the question:
        >
        > I want to serialize doubles in human-readable decimal form
        > and be sure I get the exact same binary values when I read
        > them back. (Right now, I don't care about NaN, infinities etc.)
        >
        > In essence, this boils down to converting a double to a
        > (large) integer mantissa and a decimal exponent, and back,
        > so that 3.1416 would be represented as {31416, -4}.
        >
        > I wrote a converter that always calculates the scaling
        > separately and does it exactly the same way when reading
        > and writing, using up to 17-digit mantissa which should
        > be sufficient precision. I then tested it with "dirty" numbers
        > generated by trigonometry and exp functions, and found
        > that it seems to work OK for exponents in the range +/-20
        > approximately, but outside of that range, a few percent of
        > the numbers come out different.
        >
        > Does anybody know of an algorithm that is known to
        > work?
        >
        > I tried an algorithm that, when converting double->text,
        > would convert back to double and try to adjust the mantissa
        > if this double was different from the original, but this only
        > made things worse.
        >
        > Regards/Ole Nielsby

        Do you mean on the same platform? Are you reading and writing from the
        same application?
        If so then it probably is possible, though you would need to output
        the full precision for the number. Saving in binary would be better.

        If you want to transfer the data between different platforms then it
        may not be possible. The two sets of hardware may not be able to
        exactly represent an identical set of doubles, you will have to live
        with some loss of accuracy. Anyway you should be prepared for this in
        all your double calculations, 'exact' with floating point numbers is
        not a very meaningful concept.


        • Tommi Höynälänmaa

          #5
          Re: double -> text -> double

          Floating point numbers that are not in the BCD format are usually stored
          so that the mantissa and the exponent are binary numbers and the
          exponent is a power of 2 instead of a power of 10.

          AFAIK the mantissa is not usually binary integer but a fractional binary
          number. I.e. if you have a 64-bit mantissa with binary representation B
          the real value of the mantissa is something like B/(2^64).

          Suppose that we have a floating point number R, the binary
          representation of the mantissa of R is B (an unsigned integer) and the
          exponent of R is E. Suppose that the width of the mantissa is W bits.
          Then R = S * B/(2^W) * (2^E) where S=+-1 is the sign of the number.

          We can write R = S * B * 2^(E-W).

          If E-W >= 0 then R is an integer. We may find the largest nonnegative
          integer E' so that R is divisible by 10^E' and represent R as
          R = S * M' * 10^E', where M' = B * 2^(E-W) / 10^E' is an integer.

          If E-W < 0 we can write

          R = S * (B * 5^(W-E)) * 10^(E-W)

          where W-E > 0 and 5^(W-E) is an integer.
          If it is necessary we may check if B * 5^(W-E) is divisible by some
          power of 10 and write

          R = S * (B * 5^(W-E) * 10^(-E')) * 10^(E-W+E')

          where B * 5^(W-E) is divisible by 10^E' and E' is a nonnegative integer.

          You should check the details of the floating point format that you are
          using. AFAIK the exponent is not always represented in 2's complement
          representation and it is possible that you have to add 1 in front of the
          binary representation B of the mantissa (the real value of the
          mantissa would be S*(1 + B/(2^W))).
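
The decomposition R = S * B * 2^(E-W) can be sketched in C++ with `std::frexp` and `std::ldexp`, which sidesteps the bit layout of the format. The names `BinaryParts`, `split_binary` and `join_binary` are hypothetical, chosen only for this illustration. Here W is taken as `std::numeric_limits<double>::digits` (53 for IEEE doubles), so B already includes the leading 1 bit.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper type: value == sign * mantissa * 2^exponent, all integers.
struct BinaryParts {
    int sign;                // +1 or -1
    std::uint64_t mantissa;  // B, with the leading 1 bit already included
    int exponent;            // E - W in the notation above
};

BinaryParts split_binary(double value) {
    assert(std::isfinite(value));  // NaN and infinities are out of scope
    const int W = std::numeric_limits<double>::digits;  // 53 for IEEE double
    int e = 0;
    // |value| == fraction * 2^e with fraction in [0.5, 1), or 0 for zero.
    double fraction = std::frexp(std::fabs(value), &e);
    // Scaling by 2^W is exact and turns the fraction into an integer.
    std::uint64_t b = static_cast<std::uint64_t>(std::ldexp(fraction, W));
    return { std::signbit(value) ? -1 : 1, b, e - W };
}

double join_binary(const BinaryParts& p) {
    // The mantissa is below 2^53, so the conversion to double is exact.
    return p.sign * std::ldexp(static_cast<double>(p.mantissa), p.exponent);
}
```

For example, `split_binary(1.0)` yields mantissa 2^52 and exponent -52, and `join_binary(split_binary(x))` gives back `x` for finite `x`. Turning B * 2^(E-W) into exact decimal digits still needs integers wider than 64 bits when the exponent is far from zero.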

          See also



          --
          Tommi Höynälänmaa
          sähköposti / e-mail: tommi.hoynalanmaa@iki.fi
          kotisivu / homepage: http://www.iki.fi/tohoyn/


          • Tommi Höynälänmaa

            #6
            Re: double -> text -> double

            If you do not need to have the serialized number in a human readable
            format (and you use floating point numbers whose exponent has base of 2,
            such as IEEE) it is far easier and more efficient to serialize the
            number so that the base of the exponent is not converted from 2 to 10.

            So if you have R = S * (1 + B / (2^W)) * E you only need to print the
            integers S, B, and E (and W if it is not assumed to be a constant).
            These integers can also be printed in hexadecimal format.

            --
            Tommi Höynälänmaa
            sähköposti / e-mail: tommi.hoynalanmaa@iki.fi
            kotisivu / homepage: http://www.iki.fi/tohoyn/


            • Tommi Höynälänmaa

              #7
              Re: double -> text -> double

              If you do not need to have the serialized number in a human readable
              format (and you use floating point numbers whose exponent has base of 2,
              such as IEEE) it is far easier and more efficient to serialize the
              number so that the base of the exponent is not converted from 2 to 10.

              So if you have R = S * (1 + B / (2^W)) * 2^E you only need to print the
              integers S, B, and E (and W if it is not assumed to be a constant).
              These integers can also be printed in hexadecimal format.
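
One ready-made form of this idea is the hexadecimal floating-point notation produced by the C99 `printf` conversion `%a`, which `std::strtod` can read back in C99 and C++11. The sketch below assumes a library that supports both, which should be checked for older compilers. The names `double_to_hex_text` and `hex_text_to_double` are invented for the example. Since the text is base 2 throughout, no decimal rounding is involved.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>

// Format a finite double as hexadecimal floating-point text
// (sign, hex significand, binary exponent).
std::string double_to_hex_text(double value) {
    assert(std::isfinite(value));  // NaN and infinities are out of scope
    char buffer[64];
    int n = std::snprintf(buffer, sizeof buffer, "%a", value);
    assert(n > 0 && n < static_cast<int>(sizeof buffer));
    return std::string(buffer, static_cast<std::size_t>(n));
}

// Parse the hexadecimal text back into a double.
double hex_text_to_double(const std::string& text) {
    return std::strtod(text.c_str(), nullptr);
}
```

The exact spelling of the text (for instance the leading hex digit) is implementation-defined, so only the round trip should be relied on, not a particular string.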

              --
              Tommi Höynälänmaa
              sähköposti / e-mail: tommi.hoynalanmaa@iki.fi
              kotisivu / homepage: http://www.iki.fi/tohoyn/


              • Andrew Koenig

                #8
                Re: double -> text -> double

                "Ole Nielsby" <ole.nielsby@snailmail.dk> wrote in message
                news:456db67f$0$49204$14726298@news.sunsite.dk...
                > I want to serialize doubles in human-readable decimal form
                > and be sure I get the exact same binary values when I read
                > them back. (Right now, I don't care about NaN, infinities etc.)
                >
                > In essence, this boils down to converting a double to a
                > (large) integer mantissa and a decimal exponent, and back,
                > so that 3.1416 would be represented as {31416, -4}.
                >
                > I wrote a converter that always calculates the scaling
                > separately and does it exactly the same way when reading
                > and writing, using up to 17-digit mantissa which should
                > be sufficient precision. I then tested it with "dirty" numbers
                > generated by trigonometry and exp functions, and found
                > that it seems to work OK for exponents in the range +/-20
                > approximately, but outside of that range, a few percent of
                > the numbers come out different.
                >
                > Does anybody know of an algorithm that is known to
                > work?

                Such an algorithm exists, but it's not easy.

                If I remember correctly, the IEEE 754 floating-point standard requires that
                when you convert a character string to floating-point, the result must be
                equal to what you would get if you correctly rounded the infinite-precision
                representation of that character string. When you convert a floating-point
                value to a string with enough digits, the result must be within 0.47 LSB of
                the exact binary value. This latter constraint guarantees that converting a
                floating-point number to character and back to floating-point will give you
                exactly the same result, provided that there are enough digits in the
                character version. Proving that the constraint was sufficient was Jerome
                Coonen's PhD thesis, which suggests how difficult the problem is.

                So if your implementation meets the IEEE 754 standard, the problem is easy
                to solve :-)

                If it doesn't meet the standard, you have to figure it out yourself. Either
                you have to implement something that's as good as the standard, which isn't
                easy, or you're going to have to come up with another way of doing it that
                you can prove is as good, which is even harder.
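
Whether a given implementation behaves this way can be probed empirically. The sketch below is a test harness, not a proof: it prints each value with 17 significant digits via `std::snprintf`, reads it back with `std::strtod`, and compares the bits. The function names `round_trips` and `count_failures` are hypothetical, and the test values are built from exp and sin as the original post describes.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// True if the value survives double -> 17-digit decimal text -> double
// with every bit intact.
bool round_trips(double value) {
    assert(std::isfinite(value));  // NaN and infinities are out of scope
    char text[40];
    std::snprintf(text, sizeof text, "%.17g", value);
    double back = std::strtod(text, nullptr);
    return std::memcmp(&back, &value, sizeof value) == 0;
}

// Count failures over "dirty" values generated from exp and sin.
int count_failures() {
    int failures = 0;
    for (int i = -300; i <= 300; ++i) {
        if (!round_trips(std::exp(i) * std::sin(i))) ++failures;
    }
    return failures;
}
```

On a library whose conversions are correctly rounded, `count_failures()` should return 0. A nonzero result points at the library's conversions rather than at the 17-digit rule.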



                • Erik Wikström

                  #9
                  Re: double -> text -> double

                  On 2006-11-29 17:34, Ole Nielsby wrote:
                  > First, sorry if this is off-topic, not strictly being a C++ issue.
                  > I could not find a ng on numerics or serialization and I figure
                  > this ng is the closest I can get.
                  >
                  > Now the question:
                  >
                  > I want to serialize doubles in human-readable decimal form
                  > and be sure I get the exact same binary values when I read
                  > them back. (Right now, I don't care about NaN, infinities etc.)

                  Perhaps I'm missing something here but for each value of a double there
                  exists a real number, so step 1 would be to output all of the double as
                  text (base 10 is nice but any would do). Step 2 would then be to read
                  the double in again. If you have written the exact value of the double
                  then when parsing the text into a double there should exist only one
                  possible representation of that value which is the one that ought to be
                  chosen.

                  You could run into trouble when reading in a double if there exists no
                  exact representation for it (as others have pointed out) but since the
                  value was a double from the beginning an exact representation must exist.

                  I've thrown together a small program that does just this using
                  stringstreams to convert to and from strings which seems to work. It's
                  not something I'm proud of (put together from pieces of code from other
                  projects and some found on the net) but it should give you an idea of
                  how to do it. Of course, this depends on the stringstreams to correctly
                  translate from double to text and back again, if they don't you have a
                  problem, but I expect that any compliant implementation can do this
                  correctly.

                  Code here: http://www.chalmers.it/~eriwik/main.cpp

                  --
                  Erik Wikström


                  • Ole Nielsby

                    #10
                    Re: double -> text -> double

                    Andrew Koenig <ark@acm.org> wrote:
                    > "Ole Nielsby" <ole.nielsby@snailmail.dk> wrote in message
                    > news:456db67f$0$49204$14726298@news.sunsite.dk...
                    >
                    > > I want to serialize doubles in human-readable decimal form
                    > > and be sure I get the exact same binary values when I read
                    > > them back. (Right now, I don't care about NaN, infinities etc.)
                    > >
                    > > In essence, this boils down to converting a double to a
                    > > (large) integer mantissa and a decimal exponent, and back,
                    > > so that 3.1416 would be represented as {31416, -4}.
                    > >
                    > > I wrote a converter [...] but [...] a few percent of
                    > > the numbers come out different.
                    > >
                    > > Does anybody know of an algorithm that is known to
                    > > work?
                    >
                    > Such an algorithm exists, but it's not easy.
                    >
                    > If I remember correctly, the IEEE 754 floating-point standard requires
                    > that when you convert a character string to floating-point, the result
                    > must be equal to what you would get if you correctly rounded the
                    > infinite-precision representation of that character string. When you
                    > convert a floating-point value to a string with enough digits, the result
                    > must be within 0.47 LSB of the exact binary value. This latter constraint
                    > guarantees that converting a floating-point number to character and back
                    > to floating-point will give you exactly the same result, provided that
                    > there are enough digits in the character version. Proving that the
                    > constraint was sufficient was Jerome Coonen's PhD thesis, which suggests
                    > how difficult the problem is.
                    >
                    > So if your implementation meets the IEEE 754 standard, the problem is easy
                    > to solve :-)
                    >
                    > If it doesn't meet the standard, you have to figure it out yourself.
                    > Either you have to implement something that's as good as the standard,
                    > which isn't easy, or you're going to have to come up with another way of
                    > doing it that you can prove is as good, which is even harder.

                    The setting is this: I am implementing a homebrew fp language (PILS)
                    by writing an interpreter in C++. Like Lisp, simple data can be serialized
                    by outputting them in the syntax of the language. It is important that
                    this doesn't change numbers, i.e. if a number is printed and re-read by
                    the same process, it must be the same.

                    The current implementation is in VC8/Win32 and stores numbers as
                    double, i.e. 64 bit fpu format. The precision model is set to "high"
                    which means the FPU uses 64 bit mantissa internally, but the mantissa
                    is rounded to 52 bits when stored in a double variable. I use up to 18
                    digit integers, which should be a few digits more than required for a
                    52 bit mantissa.

                    To isolate the precision issues from formatting details, I wrote
                    a small class that does the conversion to/from a long long
                    mantissa and a decimal exponent.

                    My conversion class looks as follows (please bear with my
                    less-than-perfect C++ habits, I took up C++ to implement
                    PILS because it seems next to impossible to interface asm
                    to .NET...). Note: the power of 10 to multiply or divide is
                    constructed naively by multiplying tens; this is not the
                    optimal solution for large exponents, but this shouldn't
                    make the numbers differ - the scale is constructed in
                    the same way for reading/writing.


                    class FloatSplit {
                    public:
                        long long mantissa;
                        long exponent;
                        double get(); // FloatSplit -> double
                        void set(double value); // double -> FloatSplit
                    };

                    double FloatSplit::get() // after reading a number, convert to double
                    {
                        double scale = 1;
                        double value = (double)mantissa;
                        if (exponent > 0) {
                            // Naive scale computation
                            for (long e = 0; e < exponent; e++) scale *= 10;
                            value = mantissa * scale;
                        }
                        else if (exponent < 0) {
                            // Naive scale computation
                            for (long e = 0; e > exponent; e--) scale *= 10;
                            value = mantissa / scale;
                        }
                        return value;
                    }

                    void FloatSplit::set(double value) // Split the value for writing
                    {
                        mantissa = (long long)value;
                        exponent = 0;
                        if ((double)mantissa == value) return; /* integral values */
                        double absValue = value < 0 ? -value : value;
                        double scale = 1;
                        if (absValue >= 1e18) {
                            exponent++;
                            // Naive scale computation
                            scale *= 10;
                            while (absValue / scale >= 1e18 && exponent < 1000) {
                                exponent++;
                                // Naive scale computation
                                scale *= 10;
                            }
                            mantissa = (long long)(absValue / scale);
                            /* try to adjust mantissa - disabled, made things worse */
                            // if (absValue > (double)mantissa * scale) mantissa++;
                            // if (absValue < (double)mantissa * scale) mantissa--;
                        }
                        else if (absValue < 1e17) {
                            while (absValue * scale < 1e17
                                   && absValue != (double)mantissa / scale
                                   && exponent > -1000)
                            {
                                // Naive scale computation
                                scale *= 10;
                                exponent--;
                                mantissa = (long long)(absValue * scale);
                                /* try to adjust mantissa - disabled, made things worse */
                                // if (absValue > (double)mantissa / scale) mantissa++;
                                // if (absValue < (double)mantissa / scale) mantissa--;
                            }
                        }
                        if (value < 0) mantissa = -mantissa;
                    }

                    I tested like this:
                        for (int i = -300; i <= 300; i++) testConvert(exp(i) * sin(i));
                    and it failed for i = -278, -210, 61, 109, 129, 144, 160, 161,
                    167, 172, 187, 200, 209, 220, 223, 245, 249, 253, 259, 262, 269,
                    280, 299, 300.

                    ---end---



                    • Kai-Uwe Bux

                      #11
                      Re: double -> text -> double

                      Ole Nielsby wrote:
                      > First, sorry if this is off-topic, not strictly being a C++ issue.
                      > I could not find a ng on numerics or serialization and I figure
                      > this ng is the closest I can get.
                      >
                      > Now the question:
                      >
                      > I want to serialize doubles in human-readable decimal form
                      > and be sure I get the exact same binary values when I read
                      > them back. (Right now, I don't care about NaN, infinities etc.)

                      Try something like this:

                      #include <limits>
                      #include <sstream>
                      #include <string>
                      #include <stdexcept>
                      #include <cmath>
                      #include <iomanip>

                      template < typename Float >
                      std::string to_string ( Float f ) {
                        std::stringstream in;
                        unsigned long const digits =
                          static_cast< unsigned long >
                          ( - std::log( std::numeric_limits<Float>::epsilon() )
                            / std::log( 10.0 ) );
                        if ( in << std::dec << std::setprecision(2+digits) << f ) {
                          return ( in.str() );
                        } else {
                          throw ( std::invalid_argument( "conversion float to string failed" ) );
                        }
                      }

                      template < typename Float >
                      Float to_float ( std::string const & str ) {
                        std::stringstream out ( str );
                        Float result;
                        if ( out >> result ) {
                          return ( result );
                        } else {
                          throw ( std::invalid_argument( "conversion string to float failed" ) );
                        }
                      }

                      #include <iostream>

                      int main ( void ) {
                        volatile double pi = 3.141592658539793234;
                        std::string rep = to_string( pi );
                        std::cout << rep << '\n';
                        volatile double x = to_float<double>( rep );
                        std::cout << ( x == pi ) << '\n';
                      }


                      Best

                      Kai-Uwe Bux


                      • Tommi Höynälänmaa

                        #12
                        Re: double -> text -> double

                        The errors you get may occur because you use floating point arithmetic
                        for handling the mantissa and the exponent. Try to extract the mantissa
                        and exponent as integer numbers out of the binary representation of the
                        floating point number and use integer arithmetic for them in
                        FloatSplit::set. You should also use integer arithmetic in
                        FloatSplit::get.
                        If you divide an integer number with some power of 10 you may get a
                        number whose representation as binary number has an infinitely long
                        fractional part (i.e. something like 0.33333... with decimal numbers).
                        This causes small rounding errors when the result is represented as a
                        floating point number.
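
A related effect can be checked with integers alone. Since 10^n = 2^n * 5^n and the factor 2^n only shifts the binary exponent, 10^n is exactly representable in a 53-bit significand only while 5^n stays below 2^53, which holds up to n = 22. Beyond that, a scale built by repeated multiplication by 10 is itself rounded, which is probably one reason the converter starts failing somewhat outside the +/-20 exponent range. The function name `power_of_ten_is_exact` is made up for this sketch, and it assumes an IEEE 754 double.

```cpp
#include <cassert>
#include <cstdint>

// 10^n = 2^n * 5^n. The factor 2^n only changes the binary exponent, so
// 10^n fits exactly in a 53-bit significand when 5^n < 2^53.
bool power_of_ten_is_exact(int n) {
    assert(n >= 0);
    std::uint64_t five_to_n = 1;
    const std::uint64_t limit = std::uint64_t(1) << 53;
    for (int i = 0; i < n; ++i) {
        five_to_n *= 5;
        if (five_to_n >= limit) return false;  // needs more than 53 bits
    }
    return true;
}
```

The function returns true for n from 0 to 22 and false from 23 onwards.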

                        Note that 64-bit integers may not be sufficient for this. OTOH, the
                        width of the mantissa for IEEE double is less than 64 bits so it may be
                        possible to handle that with 64-bit integers.
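
Extracting the fields as integers can be sketched as follows, assuming the platform stores `double` as IEEE 754 binary64 (the `static_assert`s check this). `RawFields`, `raw_fields` and `significand` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper type for the three raw fields of an IEEE 754 binary64.
struct RawFields {
    bool negative;           // sign bit
    int biased_exponent;     // 11 bits, bias 1023
    std::uint64_t fraction;  // 52 stored bits, without the hidden leading 1
};

RawFields raw_fields(double value) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    static_assert(std::numeric_limits<double>::is_iec559, "expects IEEE 754 doubles");
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // copy the bytes without aliasing issues
    RawFields f;
    f.negative = (bits >> 63) != 0;
    f.biased_exponent = static_cast<int>((bits >> 52) & 0x7FF);
    f.fraction = bits & ((std::uint64_t(1) << 52) - 1);
    return f;
}

// Integer significand including the hidden bit; for normal numbers
// |value| == significand * 2^(biased_exponent - 1023 - 52).
std::uint64_t significand(const RawFields& f) {
    assert(f.biased_exponent != 0x7FF);  // NaN and infinities are out of scope
    return f.biased_exponent == 0 ? f.fraction  // zero or subnormal: no hidden bit
                                  : f.fraction | (std::uint64_t(1) << 52);
}
```

The significand needs 53 bits, so it fits in a 64-bit integer with room to spare. The exact decimal digits of significand * 2^k for large |k| do not, which is where multi-word integer arithmetic becomes necessary.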

                        See also




                        --
                        Tommi Höynälänmaa
                        sähköposti / e-mail: tommi.hoynalanmaa@iki.fi
                        kotisivu / homepage: http://www.iki.fi/tohoyn/
