Hash function for float.

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • shaanxxx

    Hash function for float.

    I wanted to write hash function float or double. Any suggestion would
    be appreciated.

  • Eric Sosman

    #2
    Re: Hash function for float.



    shaanxxx wrote On 08/23/06 11:32,:
    I wanted to write hash function float or double.
    Why?

    Have you ever heard the advice "Never compare floating-
    point values for equality?" I'll admit that "never" is too
    strong, but it's still true that you should very seldom
    compare floating-point values for equality. The problem is
    not with the values themselves, but with their sources: two
    computations that "ought" to produce the same result may in
    fact produce results that differ in a few low-order bits.

    Consequently, if you are using floating-point values as
    keys in a hash table -- a data structure that relies on the
    notion of exact equality -- you're in trouble. Calculate
    the square root of two as the key of some item and add it
    to the table. Later on, calculate the square root of eight
    and divide it by two (it "ought" to be sqrt(2), right?) and
    search the hash table for an item with that key -- there's
    an excellent chance you won't find the item.

    Floating-point values make terrible hash table keys.
    Any suggestion would
    be appreciated.
    Imperfect but easy to do:

    double d = sqrt(2.0);
    unsigned char *p = (unsigned char*)&d;
    /* Now compute a hash code for the array of
    * values p[0] through p[sizeof d - 1].
    */

    It's imperfect because there may be floating-point values
    that compare equal but have different bit patterns. Many
    systems provide "plus zero" and "minus zero," which look
    different but are equal. Some systems support "unnormaliz ed"
    numbers (not to be confused with "denormal" numbers), which
    allows some values to be represented in several forms (just
    as the square root of nine can be written as 3.0, 0.3e1,
    0.03e2, ...). Finally, many systems provide "NaN" values
    which never compare equal to anything, not even themselves,
    so you may have two bit-for-bit identical values that still
    aren't "==".

    --
    Eric.Sosman@sun .com

    Comment

    • shaanxxx

      #3
      Re: Hash function for float.


      Eric Sosman wrote:
      shaanxxx wrote On 08/23/06 11:32,:
      I wanted to write hash function float or double.
      >
      Why?
      >
      Have you ever heard the advice "Never compare floating-
      point values for equality?" I'll admit that "never" is too
      strong, but it's still true that you should very seldom
      compare floating-point values for equality. The problem is
      not with the values themselves, but with their sources: two
      computations that "ought" to produce the same result may in
      fact produce results that differ in a few low-order bits.
      >
      Consequently, if you are using floating-point values as
      keys in a hash table -- a data structure that relies on the
      notion of exact equality -- you're in trouble. Calculate
      the square root of two as the key of some item and add it
      to the table. Later on, calculate the square root of eight
      and divide it by two (it "ought" to be sqrt(2), right?) and
      search the hash table for an item with that key -- there's
      an excellent chance you won't find the item.
      >
      Floating-point values make terrible hash table keys.
      >
      Any suggestion would
      be appreciated.
      >
      Imperfect but easy to do:
      >
      double d = sqrt(2.0);
      unsigned char *p = (unsigned char*)&d;
      /* Now compute a hash code for the array of
      * values p[0] through p[sizeof d - 1].
      */
      >
      It's imperfect because there may be floating-point values
      that compare equal but have different bit patterns. Many
      systems provide "plus zero" and "minus zero," which look
      different but are equal. Some systems support "unnormaliz ed"
      numbers (not to be confused with "denormal" numbers), which
      allows some values to be represented in several forms (just
      as the square root of nine can be written as 3.0, 0.3e1,
      0.03e2, ...). Finally, many systems provide "NaN" values
      which never compare equal to anything, not even themselves,
      so you may have two bit-for-bit identical values that still
      aren't "==".
      >
      --
      Eric.Sosman@sun .com
      I have developed following macro for float comparisions based on
      limit.h

      #define FLOAT_EPSILON(a ,b) ( a b ? fabs(a) * FLT_EPSILON : fabs(b) *
      FLT_EPSILON)
      #define FLT_EQUAL(a, b) (fabs((a)-(b)) <= FLOAT_EPSILON(a ,b))

      Its working fine float comparision. I am wondering how long it will
      work fine.
      any suggestion ?

      Comment

      • Michael Mair

        #4
        Re: Hash function for float.

        shaanxxx schrieb:
        Eric Sosman wrote:
        >>shaanxxx wrote On 08/23/06 11:32,:
        >>
        >>>I wanted to write hash function float or double.
        >>
        > Why?
        >>
        > Have you ever heard the advice "Never compare floating-
        >>point values for equality?" I'll admit that "never" is too
        >>strong, but it's still true that you should very seldom
        >>compare floating-point values for equality. The problem is
        >>not with the values themselves, but with their sources: two
        >>computation s that "ought" to produce the same result may in
        >>fact produce results that differ in a few low-order bits.
        >>
        > Consequently, if you are using floating-point values as
        >>keys in a hash table -- a data structure that relies on the
        >>notion of exact equality -- you're in trouble. Calculate
        >>the square root of two as the key of some item and add it
        >>to the table. Later on, calculate the square root of eight
        >>and divide it by two (it "ought" to be sqrt(2), right?) and
        >>search the hash table for an item with that key -- there's
        >>an excellent chance you won't find the item.
        >>
        > Floating-point values make terrible hash table keys.
        >>
        >>>Any suggestion would
        >>>be appreciated.
        >>
        > Imperfect but easy to do:
        >>
        >> double d = sqrt(2.0);
        >> unsigned char *p = (unsigned char*)&d;
        >> /* Now compute a hash code for the array of
        >> * values p[0] through p[sizeof d - 1].
        >> */
        >>
        >>It's imperfect because there may be floating-point values
        >>that compare equal but have different bit patterns. Many
        >>systems provide "plus zero" and "minus zero," which look
        >>different but are equal. Some systems support "unnormaliz ed"
        >>numbers (not to be confused with "denormal" numbers), which
        >>allows some values to be represented in several forms (just
        >>as the square root of nine can be written as 3.0, 0.3e1,
        >>0.03e2, ...). Finally, many systems provide "NaN" values
        >>which never compare equal to anything, not even themselves,
        >>so you may have two bit-for-bit identical values that still
        >>aren't "==".
        >
        I have developed following macro for float comparisions based on
        limit.h
        >
        #define FLOAT_EPSILON(a ,b) ( a b ? fabs(a) * FLT_EPSILON : fabs(b) *
        FLT_EPSILON)
        Forgets to take care of the sign. Try a=0.0F, b=-1e10...
        #define FLT_EQUAL(a, b) (fabs((a)-(b)) <= FLOAT_EPSILON(a ,b))
        Consider
        #define FLT_APPROX_EQUA L(a,b, eps) \
        (2 * fabs((a) - (b)) <= (eps)*(fabs(a)+ fabs(b)))
        instead, with
        #define MY_FLT_EQUAL(a, b) \
        FLT_APPROX_EQUA L(a,b,MY_EPSILO N)
        where you explicitly define MY_EPSILON.
        Its working fine float comparision. I am wondering how long it will
        work fine.
        any suggestion ?
        Yes: Be careful. FLT_EQUAL(a, b) and FLT_EQUAL(b, c) do
        not imply FLT_EQUAL(a, c); this can be detrimental for the
        efficiency of hashing as you become dependent on the order
        in which you enter a, b, c in the table as you can either have
        separate entries for a and c or the same.
        If you know more about how the floating point numbers you are
        trying to hash, there may be reliable equivalence classes.


        Cheers
        Michael
        --
        E-Mail: Mine is an /at/ gmx /dot/ de address.

        Comment

        • Eric Sosman

          #5
          Re: Hash function for float.

          shaanxxx wrote:
          Eric Sosman wrote:
          >
          >
          >>shaanxxx wrote On 08/23/06 11:32,:
          >>
          >>>I wanted to write hash function float or double.
          >>
          > Why?
          >
          I have developed following macro for float comparisions based on
          limit.h
          [...]
          You still haven't explained why you want to hash
          floating-point values. Using "approximat e equality"
          instead of "strict equality" is *not* going to make
          F-P keys work satisfactorily with hash tables.

          What are you trying to do?

          --
          Eric Sosman
          esosman@acm-dot-org.invalid

          Comment

          • Michel Bardiaux

            #6
            Re: Hash function for float.

            Eric Sosman wrote:
            shaanxxx wrote:
            >
            >Eric Sosman wrote:
            >>
            >>
            >>shaanxxx wrote On 08/23/06 11:32,:
            >>>
            >>>I wanted to write hash function float or double.
            >>>
            >> Why?
            >>
            >I have developed following macro for float comparisions based on
            >limit.h
            >[...]
            >
            You still haven't explained why you want to hash
            floating-point values. Using "approximat e equality"
            instead of "strict equality" is *not* going to make
            F-P keys work satisfactorily with hash tables.
            >
            What are you trying to do?
            >
            A long time ago (1986!) I had a perfectly good case of hashing floating
            point numbers: in a PROLOG implementation that used hash-consing, a
            technique in which there is only *ONE* copy of any list. Hence we needed
            say to find all lists beginning with 1.33.

            Greetings,
            --
            Michel Bardiaux
            R&D Director
            T +32 [0] 2 790 29 41
            F +32 [0] 2 790 29 02
            E mailto:mbardiau x@mediaxim.be

            Mediaxim NV/SA
            Vorstlaan 191 Boulevard du Souverain
            Brussel 1160 Bruxelles

            Comment

            • Eric Sosman

              #7
              Re: Hash function for float.

              Michel Bardiaux wrote:
              Eric Sosman wrote:
              >>
              > You still haven't explained why you want to hash
              >floating-point values. Using "approximat e equality"
              >instead of "strict equality" is *not* going to make
              >F-P keys work satisfactorily with hash tables.
              >>
              > What are you trying to do?
              >>
              A long time ago (1986!) I had a perfectly good case of hashing floating
              point numbers: in a PROLOG implementation that used hash-consing, a
              technique in which there is only *ONE* copy of any list. Hence we needed
              say to find all lists beginning with 1.33.
              The usual difficulty is that you may find all the lists
              beginning with 1.33, but miss the lists beginning with 1.2+.13
              and so on. That may be all right in some situations, but it's
              usually a drawback -- which is why I hope the O.P. will describe
              his intentions more fully.

              --
              Eric Sosman
              esosman@acm-dot-org.invalid

              Comment

              • shaanxxx

                #8
                Re: Hash function for float.

                You still haven't explained why you want to hash
                floating-point values. Using "approximat e equality"
                instead of "strict equality" is *not* going to make
                F-P keys work satisfactorily with hash tables.
                >
                What are you trying to do?

                you can implement logical join operator using
                Nested loop join
                Merge join
                Hash join.

                When you have floating point as join predicate, you cant hash join for
                that.
                I was thinking, if we have some good hash function, we can even use
                hash join for floating
                point.


                Thanks
                Shaan.

                Comment

                • shaanxxx

                  #9
                  Re: Hash function for float.


                  Eric Sosman wrote:
                  shaanxxx wrote:
                  >
                  Eric Sosman wrote:

                  >shaanxxx wrote On 08/23/06 11:32,:
                  >
                  >>I wanted to write hash function float or double.
                  >
                  Why?
                  I have developed following macro for float comparisions based on
                  limit.h
                  [...]
                  >
                  You still haven't explained why you want to hash
                  floating-point values. Using "approximat e equality"
                  instead of "strict equality" is *not* going to make
                  F-P keys work satisfactorily with hash tables.
                  >
                  What are you trying to do?
                  You still haven't explained why you want to hash
                  floating-point values. Using "approximat e equality"
                  instead of "strict equality" is *not* going to make
                  F-P keys work satisfactorily with hash tables.
                  What are you trying to do?
                  you can implement logical join operator using
                  Nested loop join
                  Merge join
                  Hash join.

                  When you have floating point as join predicate, you cant use hash join
                  for
                  that.
                  I was thinking, if we can have some good hash function, we can even use

                  hash join for floating point.

                  Thanks
                  Shaan.

                  Comment

                  • websnarf@gmail.com

                    #10
                    Re: Hash function for float.

                    Eric Sosman wrote:
                    shaanxxx wrote On 08/23/06 11:32,:
                    I wanted to write hash function float or double.
                    >
                    Why?
                    A hash is used to generate a map. Why can't the OP want a map from
                    floats?
                    Have you ever heard the advice "Never compare floating-
                    point values for equality?" I'll admit that "never" is too
                    strong, but it's still true that you should very seldom
                    compare floating-point values for equality.
                    That's fine if you are computationally heavy and if you require certain
                    numerical properties. But that might easily not be the case.
                    Any suggestion would
                    be appreciated.
                    >
                    Imperfect but easy to do:
                    >
                    double d = sqrt(2.0);
                    unsigned char *p = (unsigned char*)&d;
                    /* Now compute a hash code for the array of
                    * values p[0] through p[sizeof d - 1].
                    */
                    >
                    It's imperfect because there may be floating-point values
                    that compare equal but have different bit patterns.
                    That's right -- it can't do 0 so its wrong. Let's try this:

                    double d = ...;

                    int64_t m = INT64_C(0);
                    int e = 0, s;
                    figure out the Nans, and two infs and set s to 2, 3, and 4
                    respectively, otherwise:
                    s = d < 0;
                    if (s) d = -d;
                    e = ceil (log (d));
                    m = (INT64_MAX + 1.0) * d * exp (-e);
                    Now incrementally hash the 3 bits of s, the (16?) bits of e, and the 64
                    bits of m.

                    Figuring out the NaNs, and Infs are platform specific, though C99
                    apparently has some support for deducing them. In any event, equality
                    of keys will imply equality of the hash, and assumes you have accuracy
                    problems only in the last few bits (on most systems.) But you can
                    ignore the infs and Nans in most cases.

                    That solution is not very fast, but its meant to be somewhat portable.
                    If you are going to go with platform specific solutions, then you can
                    use the pointer hacking method suggested above, except you would
                    predetect all the aliased representations , and hash from those static
                    constants instead.

                    --
                    Paul Hsieh
                    Pobox has been discontinued as a separate service, and all existing customers moved to the Fastmail platform.



                    Comment

                    • Walter Roberson

                      #11
                      Re: Hash function for float.

                      In article <1156429602.899 984.44690@m79g2 000cwm.googlegr oups.com>,
                      shaanxxx <shaanxxx@yahoo .com(the original poster) wrote:
                      >you can implement logical join operator using
                      >Nested loop join
                      >Merge join
                      >Hash join.
                      >When you have floating point as join predicate, you cant hash join for
                      >that.
                      >I was thinking, if we have some good hash function, we can even use
                      >hash join for floating
                      >point.
                      I'm not certain what you mean by "logical join operator". Is that
                      the same as the set union operator?
                      --
                      All is vanity. -- Ecclesiastes

                      Comment

                      • Eric Sosman

                        #12
                        Re: Hash function for float.



                        websnarf@gmail. com wrote On 08/24/06 11:15,:
                        >
                        That's right -- it can't do 0 so its wrong. Let's try this:
                        >
                        double d = ...;
                        >
                        int64_t m = INT64_C(0);
                        int e = 0, s;
                        figure out the Nans, and two infs and set s to 2, 3, and 4
                        respectively, otherwise:
                        s = d < 0;
                        if (s) d = -d;
                        e = ceil (log (d));
                        m = (INT64_MAX + 1.0) * d * exp (-e);
                        Now incrementally hash the 3 bits of s, the (16?) bits of e, and the 64
                        bits of m.
                        >
                        Figuring out the NaNs, and Infs are platform specific, though C99
                        apparently has some support for deducing them. In any event, equality
                        of keys will imply equality of the hash, and assumes you have accuracy
                        problems only in the last few bits (on most systems.) But you can
                        ignore the infs and Nans in most cases.
                        Don't you need a special case for zero in the above?

                        Also, it makes me uneasy to use so many inexact calculations
                        in developing a hash code -- both the transcendental functions
                        are necessarily inexact, and I don't think you can guarantee
                        that `INT64_MAX+1.0' will be accurate. (INT64_MAX probably has
                        no exact representation in double, so you need to rely on the
                        convert-and-add being carried out at higher precision.)

                        But the overall approach seems promising. You could use
                        frexp() instead of all that horsing around with logarithms
                        and exponentials, and you could use ldexp() to scale the fraction
                        without worrying about inaccuracy in the multiplier. You'd get
                        something like

                        double d = ...;
                        int e;
                        double f = frexp(d, &e);
                        int64_t m = ldexp(f, DBL_MANT_DIG);
                        /* now hash e and m */

                        (This assumes FLT_RADIX==2; maybe replacing DBL_MANT_DIG with
                        63 would be a simple way to duck the issue.)

                        As before, NaNs and Infs need special handling -- but
                        I think you've got a reasonably portable hash otherwise.

                        --
                        Eric.Sosman@sun .com

                        Comment

                        • shaanxxx

                          #13
                          Re: Hash function for float.


                          Walter Roberson wrote:
                          In article <1156429602.899 984.44690@m79g2 000cwm.googlegr oups.com>,
                          shaanxxx <shaanxxx@yahoo .com(the original poster) wrote:
                          >
                          you can implement logical join operator using
                          Nested loop join
                          Merge join
                          Hash join.
                          >
                          When you have floating point as join predicate, you cant hash join for
                          that.
                          I was thinking, if we have some good hash function, we can even use
                          hash join for floating
                          point.
                          >
                          I'm not certain what you mean by "logical join operator". Is that
                          the same as the set union operator?
                          join relational operator == selection(cross product)
                          That is not union operator.

                          Comment

                          Working...