Have you ever heard the advice "Never compare floating-
point values for equality?" I'll admit that "never" is too
strong, but it's still true that you should very seldom
compare floating-point values for equality. The problem is
not with the values themselves, but with their sources: two
computations that "ought" to produce the same result may in
fact produce results that differ in a few low-order bits.
Consequently, if you are using floating-point values as
keys in a hash table -- a data structure that relies on the
notion of exact equality -- you're in trouble. Calculate
the square root of two as the key of some item and add it
to the table. Later on, calculate the square root of eight
and divide it by two (it "ought" to be sqrt(2), right?) and
search the hash table for an item with that key -- there's
an excellent chance you won't find the item.
Floating-point values make terrible hash table keys.
Any suggestion would
be appreciated.
Imperfect but easy to do:
double d = sqrt(2.0);
unsigned char *p = (unsigned char*)&d;
/* Now compute a hash code for the array of
* values p[0] through p[sizeof d - 1].
*/
It's imperfect because there may be floating-point values
that compare equal but have different bit patterns. Many
systems provide "plus zero" and "minus zero," which look
different but are equal. Some systems support "unnormaliz ed"
numbers (not to be confused with "denormal" numbers), which
allows some values to be represented in several forms (just
as the square root of nine can be written as 3.0, 0.3e1,
0.03e2, ...). Finally, many systems provide "NaN" values
which never compare equal to anything, not even themselves,
so you may have two bit-for-bit identical values that still
aren't "==".
>
Why?
>
Have you ever heard the advice "Never compare floating-
point values for equality?" I'll admit that "never" is too
strong, but it's still true that you should very seldom
compare floating-point values for equality. The problem is
not with the values themselves, but with their sources: two
computations that "ought" to produce the same result may in
fact produce results that differ in a few low-order bits.
>
Consequently, if you are using floating-point values as
keys in a hash table -- a data structure that relies on the
notion of exact equality -- you're in trouble. Calculate
the square root of two as the key of some item and add it
to the table. Later on, calculate the square root of eight
and divide it by two (it "ought" to be sqrt(2), right?) and
search the hash table for an item with that key -- there's
an excellent chance you won't find the item.
>
Floating-point values make terrible hash table keys.
>
Any suggestion would
be appreciated.
>
Imperfect but easy to do:
>
double d = sqrt(2.0);
unsigned char *p = (unsigned char*)&d;
/* Now compute a hash code for the array of
* values p[0] through p[sizeof d - 1].
*/
>
It's imperfect because there may be floating-point values
that compare equal but have different bit patterns. Many
systems provide "plus zero" and "minus zero," which look
different but are equal. Some systems support "unnormaliz ed"
numbers (not to be confused with "denormal" numbers), which
allows some values to be represented in several forms (just
as the square root of nine can be written as 3.0, 0.3e1,
0.03e2, ...). Finally, many systems provide "NaN" values
which never compare equal to anything, not even themselves,
so you may have two bit-for-bit identical values that still
aren't "==".
>
-- Eric.Sosman@sun .com
I have developed following macro for float comparisions based on
limit.h
#define FLOAT_EPSILON(a ,b) ( a b ? fabs(a) * FLT_EPSILON : fabs(b) *
FLT_EPSILON)
#define FLT_EQUAL(a, b) (fabs((a)-(b)) <= FLOAT_EPSILON(a ,b))
Its working fine float comparision. I am wondering how long it will
work fine.
any suggestion ?
>>>I wanted to write hash function float or double.
>>
> Why?
>>
> Have you ever heard the advice "Never compare floating-
>>point values for equality?" I'll admit that "never" is too
>>strong, but it's still true that you should very seldom
>>compare floating-point values for equality. The problem is
>>not with the values themselves, but with their sources: two
>>computation s that "ought" to produce the same result may in
>>fact produce results that differ in a few low-order bits.
>>
> Consequently, if you are using floating-point values as
>>keys in a hash table -- a data structure that relies on the
>>notion of exact equality -- you're in trouble. Calculate
>>the square root of two as the key of some item and add it
>>to the table. Later on, calculate the square root of eight
>>and divide it by two (it "ought" to be sqrt(2), right?) and
>>search the hash table for an item with that key -- there's
>>an excellent chance you won't find the item.
>>
> Floating-point values make terrible hash table keys.
>>
>>>Any suggestion would
>>>be appreciated.
>>
> Imperfect but easy to do:
>>
>> double d = sqrt(2.0);
>> unsigned char *p = (unsigned char*)&d;
>> /* Now compute a hash code for the array of
>> * values p[0] through p[sizeof d - 1].
>> */
>>
>>It's imperfect because there may be floating-point values
>>that compare equal but have different bit patterns. Many
>>systems provide "plus zero" and "minus zero," which look
>>different but are equal. Some systems support "unnormaliz ed"
>>numbers (not to be confused with "denormal" numbers), which
>>allows some values to be represented in several forms (just
>>as the square root of nine can be written as 3.0, 0.3e1,
>>0.03e2, ...). Finally, many systems provide "NaN" values
>>which never compare equal to anything, not even themselves,
>>so you may have two bit-for-bit identical values that still
>>aren't "==".
>
I have developed following macro for float comparisions based on
limit.h
>
#define FLOAT_EPSILON(a ,b) ( a b ? fabs(a) * FLT_EPSILON : fabs(b) *
FLT_EPSILON)
Forgets to take care of the sign. Try a=0.0F, b=-1e10...
#define FLT_EQUAL(a, b) (fabs((a)-(b)) <= FLOAT_EPSILON(a ,b))
Consider
#define FLT_APPROX_EQUA L(a,b, eps) \
(2 * fabs((a) - (b)) <= (eps)*(fabs(a)+ fabs(b)))
instead, with
#define MY_FLT_EQUAL(a, b) \
FLT_APPROX_EQUA L(a,b,MY_EPSILO N)
where you explicitly define MY_EPSILON.
Its working fine float comparision. I am wondering how long it will
work fine.
any suggestion ?
Yes: Be careful. FLT_EQUAL(a, b) and FLT_EQUAL(b, c) do
not imply FLT_EQUAL(a, c); this can be detrimental for the
efficiency of hashing as you become dependent on the order
in which you enter a, b, c in the table as you can either have
separate entries for a and c or the same.
If you know more about how the floating point numbers you are
trying to hash, there may be reliable equivalence classes.
Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.
>>>I wanted to write hash function float or double.
>>
> Why?
>
I have developed following macro for float comparisions based on
limit.h
[...]
You still haven't explained why you want to hash
floating-point values. Using "approximat e equality"
instead of "strict equality" is *not* going to make
F-P keys work satisfactorily with hash tables.
>>shaanxxx wrote On 08/23/06 11:32,:
>>>
>>>I wanted to write hash function float or double.
>>>
>> Why?
>>
>I have developed following macro for float comparisions based on
>limit.h
>[...]
>
You still haven't explained why you want to hash
floating-point values. Using "approximat e equality"
instead of "strict equality" is *not* going to make
F-P keys work satisfactorily with hash tables.
>
What are you trying to do?
>
A long time ago (1986!) I had a perfectly good case of hashing floating
point numbers: in a PROLOG implementation that used hash-consing, a
technique in which there is only *ONE* copy of any list. Hence we needed
say to find all lists beginning with 1.33.
Greetings,
--
Michel Bardiaux
R&D Director
T +32 [0] 2 790 29 41
F +32 [0] 2 790 29 02
E mailto:mbardiau x@mediaxim.be
Mediaxim NV/SA
Vorstlaan 191 Boulevard du Souverain
Brussel 1160 Bruxelles
>>
> You still haven't explained why you want to hash
>floating-point values. Using "approximat e equality"
>instead of "strict equality" is *not* going to make
>F-P keys work satisfactorily with hash tables.
>>
> What are you trying to do?
>>
A long time ago (1986!) I had a perfectly good case of hashing floating
point numbers: in a PROLOG implementation that used hash-consing, a
technique in which there is only *ONE* copy of any list. Hence we needed
say to find all lists beginning with 1.33.
The usual difficulty is that you may find all the lists
beginning with 1.33, but miss the lists beginning with 1.2+.13
and so on. That may be all right in some situations, but it's
usually a drawback -- which is why I hope the O.P. will describe
his intentions more fully.
floating-point values. Using "approximat e equality"
instead of "strict equality" is *not* going to make
F-P keys work satisfactorily with hash tables.
>
What are you trying to do?
you can implement logical join operator using
Nested loop join
Merge join
Hash join.
When you have floating point as join predicate, you cant hash join for
that.
I was thinking, if we have some good hash function, we can even use
hash join for floating
point.
>shaanxxx wrote On 08/23/06 11:32,:
>
>>I wanted to write hash function float or double.
>
Why?
I have developed following macro for float comparisions based on
limit.h
[...]
>
You still haven't explained why you want to hash
floating-point values. Using "approximat e equality"
instead of "strict equality" is *not* going to make
F-P keys work satisfactorily with hash tables.
>
What are you trying to do?
You still haven't explained why you want to hash
floating-point values. Using "approximat e equality"
instead of "strict equality" is *not* going to make
F-P keys work satisfactorily with hash tables.
What are you trying to do?
you can implement logical join operator using
Nested loop join
Merge join
Hash join.
When you have floating point as join predicate, you cant use hash join
for
that.
I was thinking, if we can have some good hash function, we can even use
A hash is used to generate a map. Why can't the OP want a map from
floats?
Have you ever heard the advice "Never compare floating-
point values for equality?" I'll admit that "never" is too
strong, but it's still true that you should very seldom
compare floating-point values for equality.
That's fine if you are computationally heavy and if you require certain
numerical properties. But that might easily not be the case.
Any suggestion would
be appreciated.
>
Imperfect but easy to do:
>
double d = sqrt(2.0);
unsigned char *p = (unsigned char*)&d;
/* Now compute a hash code for the array of
* values p[0] through p[sizeof d - 1].
*/
>
It's imperfect because there may be floating-point values
that compare equal but have different bit patterns.
That's right -- it can't do 0 so its wrong. Let's try this:
double d = ...;
int64_t m = INT64_C(0);
int e = 0, s;
figure out the Nans, and two infs and set s to 2, 3, and 4
respectively, otherwise:
s = d < 0;
if (s) d = -d;
e = ceil (log (d));
m = (INT64_MAX + 1.0) * d * exp (-e);
Now incrementally hash the 3 bits of s, the (16?) bits of e, and the 64
bits of m.
Figuring out the NaNs, and Infs are platform specific, though C99
apparently has some support for deducing them. In any event, equality
of keys will imply equality of the hash, and assumes you have accuracy
problems only in the last few bits (on most systems.) But you can
ignore the infs and Nans in most cases.
That solution is not very fast, but its meant to be somewhat portable.
If you are going to go with platform specific solutions, then you can
use the pointer hacking method suggested above, except you would
predetect all the aliased representations , and hash from those static
constants instead.
In article <1156429602.899 984.44690@m79g2 000cwm.googlegr oups.com>,
shaanxxx <shaanxxx@yahoo .com(the original poster) wrote:
>you can implement logical join operator using
>Nested loop join
>Merge join
>Hash join.
>When you have floating point as join predicate, you cant hash join for
>that.
>I was thinking, if we have some good hash function, we can even use
>hash join for floating
>point.
I'm not certain what you mean by "logical join operator". Is that
the same as the set union operator?
--
All is vanity. -- Ecclesiastes
>
That's right -- it can't do 0 so its wrong. Let's try this:
>
double d = ...;
>
int64_t m = INT64_C(0);
int e = 0, s;
figure out the Nans, and two infs and set s to 2, 3, and 4
respectively, otherwise:
s = d < 0;
if (s) d = -d;
e = ceil (log (d));
m = (INT64_MAX + 1.0) * d * exp (-e);
Now incrementally hash the 3 bits of s, the (16?) bits of e, and the 64
bits of m.
>
Figuring out the NaNs, and Infs are platform specific, though C99
apparently has some support for deducing them. In any event, equality
of keys will imply equality of the hash, and assumes you have accuracy
problems only in the last few bits (on most systems.) But you can
ignore the infs and Nans in most cases.
Don't you need a special case for zero in the above?
Also, it makes me uneasy to use so many inexact calculations
in developing a hash code -- both the transcendental functions
are necessarily inexact, and I don't think you can guarantee
that `INT64_MAX+1.0' will be accurate. (INT64_MAX probably has
no exact representation in double, so you need to rely on the
convert-and-add being carried out at higher precision.)
But the overall approach seems promising. You could use
frexp() instead of all that horsing around with logarithms
and exponentials, and you could use ldexp() to scale the fraction
without worrying about inaccuracy in the multiplier. You'd get
something like
double d = ...;
int e;
double f = frexp(d, &e);
int64_t m = ldexp(f, DBL_MANT_DIG);
/* now hash e and m */
(This assumes FLT_RADIX==2; maybe replacing DBL_MANT_DIG with
63 would be a simple way to duck the issue.)
As before, NaNs and Infs need special handling -- but
I think you've got a reasonably portable hash otherwise.
In article <1156429602.899 984.44690@m79g2 000cwm.googlegr oups.com>,
shaanxxx <shaanxxx@yahoo .com(the original poster) wrote:
>
you can implement logical join operator using
Nested loop join
Merge join
Hash join.
>
When you have floating point as join predicate, you cant hash join for
that.
I was thinking, if we have some good hash function, we can even use
hash join for floating
point.
>
I'm not certain what you mean by "logical join operator". Is that
the same as the set union operator?
join relational operator == selection(cross product)
That is not union operator.
Comment