On Sun, 03 Aug 2008 14:54:26 -0700, Bernard.Mangay <fcmisc@googlemail.com> wrote:
The remainder is non-zero due to rounding errors. How can I remove the
rounding errors?
The same way you deal with rounding errors with any floating point
calculation: don't expect exact results. Instead, any time you want to do
a comparison, you need to decide how close is "good enough" and
incorporate that into your comparison.
For example:
double result = 1.2 % 0.01;
if (result < 0.000001)
{
// might as well be zero!
}
On Aug 3, 11:01 pm, "Peter Duniho" <NpOeStPe...@nnowslpianmk.com> wrote:
On Sun, 03 Aug 2008 14:54:26 -0700, Bernard.Mangay <fcm...@googlemail.com> wrote:
>
The remainder is non-zero due to rounding errors. How can I remove the
rounding errors?
>
The same way you deal with rounding errors with any floating point
calculation: don't expect exact results. Instead, any time you want to do
a comparison, you need to decide how close is "good enough" and
incorporate that into your comparison.
>
For example:
>
double result = 1.2 % 0.01;
>
if (result < 0.000001)
{
// might as well be zero!
}
I thought that might work, but sometimes it gives me a remainder of
0.099999999 which fails the fix you suggested.
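One way to sketch a fix for that failure mode (the helper name and tolerances here are illustrative, not from the thread): a floating-point remainder can land near zero *or* near the divisor, so a divisibility check should accept either case.

```csharp
using System;

// Sketch: treat a remainder near 0 OR near the divisor as "divisible".
// 1.2 % 0.01 actually comes out near 0.00999..., i.e. close to the divisor.
static bool IsNearlyDivisible(double value, double divisor, double epsilon)
{
    double r = Math.Abs(value % divisor);
    return r < epsilon || Math.Abs(divisor) - r < epsilon;
}

Console.WriteLine(IsNearlyDivisible(1.2, 0.01, 1e-9));   // True: remainder is ~0.00999999...
Console.WriteLine(IsNearlyDivisible(1.205, 0.01, 1e-9)); // False: remainder is ~0.005
```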
int mult = 100000;
int remainder = (Int32)(1.2 * mult) % (Int32)(0.01 * mult);
I like nice clean integer zeros for control purposes ... but then the
Int32.MaxValue issue might enter onto the scene. Next up is Int64?
Another benefit: for speed of calculation, integer math wins the race
by a *large* margin on calculation-intensive operations. Perhaps most
noticeable when doing summations over large data sets.
Skeet's link is a very good read. The quantification and deep
understanding of error propagation is certainly not a simple topic.
Makes my head hurt in a hurry ... but if you have good math skills and
a love for extreme precision you will find the calculus and
statistical exploration of this topic perhaps of great personal
interest. Advanced engineering and physics courses often delve deeply
into this area.
On Mon, 04 Aug 2008 23:57:25 -0700, Tom <Thomas-911@earthlink.net> wrote:
[...]
Another benefit: for speed of calculation, integer math wins the race
by a *large* margin on calculation-intensive operations. Perhaps most
noticeable when doing summations over large data sets.
I guess that depends on what one means by "large margin" and "large data".
It's true that the floating point version winds up about half as fast as
the integer version (takes 90% longer). But on my computer, it can
perform 100,000,000 such calculations in about a second (plus a little for
floating point, minus a little for integer). That's 100 _million_.
And of course, you will only realize that sort of speed-up if the _only_
thing you are doing is calculating remainders. In a real-world scenario,
your i/o is going to account for the bulk of your processing time (even
memory i/o). You'd be hard-pressed to come up with a production example
where the difference is significant.
That's all not even counting the difficulty in making reasonable
judgements on performance differences using simple test programs. For
example, the program I wrote actually does the "is zero" calculation.
When I commented that part of the calculation out, the loop got _slower_,
not faster as one would expect. I checked the IL, and confirmed that
removing the check had _only_ the direct effect of eliminating a portion
of the generated code. But something about the change in the method
causes it to run significantly slower _without_ the extra line of code in
it.
Performance tuning is very tricky, and often counter-intuitive.
The "transform into integer space" trick is good to know, but most code
should not be rewritten to use it.
(Note also, on top of all the other problems, that it only enjoys that
large an improvement when you only care about "equal to zero". If you
have to convert back to floating point, a lot of the speed is lost again
("only" about 40% faster in that case)).
My integer method is sort of a back door cheat. It avoids the real
issue of imprecise binary representation of rational numbers. If the
question was posed by a student who needs to convince their instructor
of sufficient understanding ... I much prefer your approach. If on the
other hand it is part of a control/decision block of code ... I
prefer the simplification my method represents and it is one I have
used several times. I consider double % double a trap of sorts that
introduces results often forgotten by those who have not used it
recently. Neither approach is bullet proof; however, the inconsistent
results of the % operator on floats or doubles is so insidious ... I
have a hard time convincing myself that it is a good feature within
C#.
What happens when one has:
1.20000001 % 0.01
In the above case the 0.00000001 remainder might be very important?
So checks against the RHS value of 0.01 are not a good practice. In
this case the multiplier would need to be 100,000,000 and
Int32.MaxValue = 2,147,483,647 issues start to emerge.
Eventually even the 128-bit decimal representation fails to provide
sufficient accuracy. Non-terminating values such as 1/3, and irrational
numbers such as Pi, are not accurately represented even using decimal.
I think it false to make statements such as: "Usage of the decimal type
assures _sufficient_ accuracy."
1) A thorough understanding of the entire mathematical methodology
used and its inherent precision is necessary. A deep understanding of
the problem domain is mandatory and this is not a trivial task. It
requires the researcher(s) (physicist, engineer, mathematician, etc.)
and the computer scientist to work together.
2) If you see float % float or double % double in code ... become
very concerned.
-----------------------------------------------
My observed speed enhancement converting to integers was associated
with huge stock data files where the data was fully loaded into ram to
avoid hard drive i/o speed issues. In this specific case there was a
lot of numerical summation work being done (integrations, averages,
statistics, etc.) in a dynamic tuning algorithm with lots of looping
within the optimization routine.
Once values such as 987.125 were converted to 987125 the rewritten
program ran in approximately 1/5th the time of the program using
doubles for data storage. I was thrilled to have results in 12 minutes
vs 1 hour. Situations where such improvements can be had are rare ...
but in my case it involves code in continuous operation and is one of
the best performance enhancements I have stumbled onto. In the texts
and courses I have been exposed to ... the speed differences between
integer math and doubles (or floats) have been ignored. (I have never
learned assembly nor been exposed to the very low-level compiler
optimization methods.)
I encourage those where performance is critical to evaluate the
possibility of using integers. The conversion is not a simple process.
It is tricky enough where I suggest side-by-side testing between
integer code and doubles code until all ambiguities are understood and
removed.
On Tue, 05 Aug 2008 17:27:54 -0700, Tom <Thomas-911@earthlink.net> wrote:
Hey Pete --
Hey Tom! :)
Seriously though...I don't disagree with the _philosophical_ points you've
made. However, I see the implications differently, at least to some
extent, and I do think your conclusions are at least a little off. I'll
try to touch on just a few points:
[...]
What happens when one has:
>
1.20000001 % 0.01
>
In the above case the 0.00000001 remainder might be very important?
Nope. If you are using a value such as 0.00000001 as your threshold for
"essentially zero", then if you started with a value of 1.20000001,
that's considered, by definition, equal to 1.2.
You do have to pick the correct epsilon for your application, but once
you've done so, there are no worries such as the above. If the difference
between 1.20000001 and 1.2 is significant, then 0.00000001 is too large an
epsilon for your application, by definition.
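A sketch of that "pick the right epsilon" point (helper name and values are illustrative, not from the thread): an absolute epsilon that works at one magnitude fails at another, so one common approach is to scale the tolerance to the operands.

```csharp
using System;

// Sketch: a relative tolerance, so the same relEps works across magnitudes.
// Whether 1.2 and 1.20000001 are "equal" depends entirely on the epsilon chosen.
static bool NearlyEqual(double a, double b, double relEps)
{
    double scale = Math.Max(Math.Abs(a), Math.Abs(b));
    return Math.Abs(a - b) <= relEps * scale;
}

Console.WriteLine(NearlyEqual(1.2, 1.20000001, 1e-6)); // True: difference below this tolerance
Console.WriteLine(NearlyEqual(1.2, 1.20000001, 1e-9)); // False: the difference now matters
```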
So checks against the RHS value of 0.01 are not a good practice. In
this case the multiplier would need to be 100,000,000 and
Int32.MaxValue = 2,147,483,647 issues start to emerge.
>
Eventually even the 128-bit decimal representation fails to provide
sufficient accuracy. Non-terminating values such as 1/3, and irrational
numbers such as Pi, are not accurately represented even using decimal.
I think it false to make statements such as: "Usage of the decimal type
assures _sufficient_ accuracy."
All of that is true. But it's also true if you are converting floating
point to integer for the purpose of the calculation. Doing that doesn't
avoid those problems.
[...]
2) If you see float % float or double % double in code ... become
very concerned.
No question. I'm just saying that the answer isn't necessarily to change
the code to convert to ints and back again. Such code could be less
readable, and/or more bug-prone (different kinds of overflow
possibilities, for example).
The fact is, I think if one finds oneself doing the remainder operation on
floating point values, it's possible that there's a more fundamental
problem. For example, one is probably using floating point when a
different type would be more appropriate (e.g. decimal, as Jon suggested
in this case).
Fixing a theoretical performance problem is always a bad idea anyway.
Make sure it's a _real_ performance problem before you fix it. But beyond
that, one can become distracted by the performance problem (the trees),
and fail to see a more significant design/correctness problem (the forest).
Yes, if you see a floating point remainder calculation, that's cause for
concern. But I don't think that rewriting the code to improve the
computational performance is really what one ought to be focusing on.
-----------------------------------------------
>
My observed speed enhancement converting to integers was associated
with huge stock data files where the data was fully loaded into ram to
avoid hard drive i/o speed issues.
Did that data fit entirely into the CPU cache?
If not, then you're still dealing with serious memory i/o problems (as I
mentioned before), and any _computational_ speedup (being just part of the
total performance cost) converting to integer calculations is unlikely to
be nearly as significant as the simple test I did might suggest.
[...]
Once values such as 987.125 were converted to 987125 the rewritten
program ran in approximately 1/5'th the time as the program using
doubles for data storage.
Assuming the _only_ change was from using floating point to integers, a 5x
improvement doesn't sound possible given my test.
Now, if you were converting to Int32 from double, per your example, _that_
would be significant. After all, it would have halved the size of your
data, resulting in much less memory i/o and better cache performance.
That _in conjunction with_ the less-than-2x computational improvement
might have accounted for a 5x improvement.
But note that converting from double to int is actually wrong, as those
are not the same sized storage or precision. You run the risk of
overflowing your ints. Or conversely, you could have just been using
floats and enjoyed the same memory bandwidth improvements that switching
from a 64-bit to 32-bit storage would have regardless.
I was thrilled to have results in 12 minutes
vs 1 hour. Situations where such improvements can be had are rare ...
Indeed they are. Like I said, your suggestion isn't completely useless.
I just think in the vast majority of cases, it's a premature optimization.
How about forcing integer % implementation?
>
int mult = 100000;
int remainder = (Int32)(1.2 * mult) % (Int32)(0.01 * mult);
Because it _doesn't solve the problem_.
The binary representation of 0.01 may be a little more or less than 0.01.
When you multiply by 100000, the result may be a little more or less than
1000. If it is less, casting to integer without any rounding gives you
999. The same goes for the other operand.
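A minimal demonstration of that truncation problem (my example values, not from the thread):

```csharp
using System;

// 0.29 is stored as slightly *less* than 0.29, so 0.29 * 100 comes out as
// 28.999999999999996; a bare cast truncates to 28, while rounding first
// recovers the intended 29.
int mult = 100;
Console.WriteLine((int)(0.29 * mult));            // 28 -- truncation bites
Console.WriteLine((int)Math.Round(0.29 * mult));  // 29 -- round before casting
```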
Furthermore, you are still doing more operations than necessary. You've
eliminated the FP division, but you have two FP multiplies. You're better
off (at least in the case of constants) multiplying by the reciprocal,
rounding, subtracting -- cheap FP operations, probably cheaper than integer
modulo -- and performing an equal-with-tolerance test.
I.e.
double x = 1.2;
const double divisor = 0.01;
const double divisor_recip = 1.0 / divisor; // evaluated at compile-time
double divided = x * divisor_recip;
double remainderoverdivisor = divided - Math.Round(divided); // may replace with Math.Floor(divided + .5) if faster
if (Math.Abs(remainderoverdivisor) < (.0001 * divisor_recip)) ... // compile-time expression again; introduce another "const" variable as needed to force compile-time evaluation
Even when divisor is not known at compile-time, this method may still yield
considerable savings if the reciprocal operation can be moved outside a loop
for example.
>
I like nice clean integer zeros for control purposes ... but then the
Int32.MaxValue issue might enter onto the scene. Next up is Int64?
>
Another benefit: for speed of calculation, integer math wins the race
by a *large* margin on calculation-intensive operations. Perhaps most
noticeable when doing summations over large data sets.
>
Skeet's link is a very good read. The quantification and deep
understanding of error propagation is certainly not a simple topic.
Makes my head hurt in a hurry ... but if you have good math skills and
a love for extreme precision you will find the calculus and
statistical exploration of this topic perhaps of great personal
interest. Advanced engineering and physics courses often delve deeply
into this area.
>
-- Tom
Each of you brought out several points and I now see my flawed logic.
Even though I tested the code I suggested before submitting it ...
Ben's comments made me recollect using a nasty bit of home brewed
_illogic_ that picked the smallest delta between the multiplied value
and its floor and ceiling values to round up or down.
For hard coded rational numbers this floor and ceiling comparison is
not needed. At least I have not found a case that breaks it. But hard
coded rational numbers are fairly useless! As Ben points out ... the
usage of a variable within the logic with a slightly less than
representation breaks the method and some sort of rounding algorithm
is required. During my revisit of the logic and the problem domain I
stumbled upon the following Math.Round function that does a much
better job than my previous floor and ceiling hack:
double a = 1.2;
double b = 0.01;
int mult = 100000;
int remainder = (int)Math.Round(a * mult) % (int)Math.Round(b * mult);
I liked the above _before_ exploring the decimal method Jon suggested.
I had read that decimal is simply a 128-bit, restricted-range,
double-like placeholder. Not so!! The decimal type does appear to
precisely represent terminating decimal fractions within its restricted
range. I had not explored decimals before and find this very useful!
Thus a much better solution is:
decimal remainder = 1.2m % 0.01m;
The above is not bullet proof ... but it is very good. Breaking it
involves exceeding the significant digits within the decimal struct.
For example:
decimal a = decimal.MaxValue - decimal.One;
decimal b = 0.12345678901234567890123456789m;
decimal c = a + b;
Outputting a produces:
79228162514264337593543950334
The above b value is rounded up and produces:
0.123456789012345678901234569
Also, c == a due to exceeding the significant digit capability of the
decimal struct. No warning is generated. The least significant digits
simply disappear.
Of course the non-terminating-representation issues still exist with
decimal values. For example:
decimal d = 1.0m / 3.0m;
decimal e = d + d + d;
The above produces an e value of:
9.999999999999999999999999999e-001
Thus decimal far exceeds the capacity of my Int32 hack and for all but
the most extreme cases it appears to work.
What remains bothersome for me are the issues associated with double %
double. If something as simple as the originally proposed question
produces unexpected results ... what use is there for having the
modulus operator perform on double or float values?
Pete's discussion of epsilon is spot on. But still one must choose an
epsilon! And this is no simple task. Without the choice of an
appropriate epsilon and the subsequent conditional testing Pete
details ... the thought of modulus applied to double and float values
makes me cringe.
My previous comments about speed enhancement using Int32 in place of
double were inappropriate. They distract from the main issue of the
original question. As thrilled as I am with the performance increase
of my specific application ... it is a rarity and I should not have
interjected performance into this topic. I overreact when less
appears to be more! In this case less being more was my false
perception, and more is more. decimal wins !!!!! ;)
--------------------------
P.S. Here's a nasty way to break the decimal modulus technique:
decimal dA = decimal.MaxValue - decimal.One;
decimal dB = 0.12345678901234567890123456789m;
decimal remainder = dA % dB;
The above produces the following error message that provides some
insight into the decimal modulus method:
Unhandled Exception: System.OverflowException:
Value was either too large or too small for a Decimal.
   at System.Decimal.FCallDivide(Decimal& result, Decimal d1, Decimal d2)
   at System.Decimal.Divide(Decimal d1, Decimal d2)
   at System.Decimal.op_Division(Decimal d1, Decimal d2)
   at System.Decimal.Remainder(Decimal d1, Decimal d2)
   at System.Decimal.op_Modulus(Decimal d1, Decimal d2)
   at Decimal_Test_Program.Program.Main(String[] args)
   in G:\...........
Perhaps all modulus operations should reside within try & catch logic?
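A sketch of that try/catch idea (the helper name is mine, not from the thread): decimal's remainder divides internally, and on the runtime the poster used it could throw OverflowException for extreme operands, so a guard like this keeps the failure contained. Newer runtimes may compute such remainders without throwing, so the sketch only shows the guard pattern, not the overflow itself.

```csharp
using System;

// Sketch: contain a possible OverflowException from decimal %, returning
// null on failure so the caller decides how to handle it.
static decimal? SafeRemainder(decimal a, decimal b)
{
    try
    {
        return a % b;
    }
    catch (OverflowException)
    {
        return null;
    }
}

Console.WriteLine(SafeRemainder(1.2m, 0.01m) == 0m); // True: 1.2 is an exact multiple of 0.01 in decimal
```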
On Thu, 07 Aug 2008 01:19:52 -0700, Tom <Thomas-911@earthlink.net> wrote:
[...]
What remains bothersome for me are the issues associated with double %
double. If something as simple as the originally proposed question
produces unexpected results ... what use is there for having the
modulus operator perform on double or float values?
A fair question, but I think the answer is simply along the lines of
"knowing the remainder of a division is just as useful in floating point
calculations as in others".
In particular, while the remainder operation is sometimes used to test for
divisibility, that's not its only purpose. You might simply use it to
accumulate error estimates, or track "left-overs" for some type of
calculation where the divisor is fixed, etc. In these contexts, the
rounding errors endemic to floating point aren't a problem, at least not
if they aren't a problem with respect to using floating point generally.
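One concrete example of such a legitimate use (my example, not from the thread): wrapping a value into a fixed period, such as normalizing an angle into [0, 2*pi), where a tiny rounding error in the remainder is entirely acceptable.

```csharp
using System;

// Sketch: floating-point % used for periodic wrapping, not for an
// exact divisibility test. C#'s % keeps the sign of the dividend,
// so negative inputs need one corrective addition.
static double WrapAngle(double radians)
{
    double twoPi = 2 * Math.PI;
    double wrapped = radians % twoPi;
    return wrapped < 0 ? wrapped + twoPi : wrapped;
}

Console.WriteLine(WrapAngle(7 * Math.PI));  // ~pi (3.14159...)
Console.WriteLine(WrapAngle(-Math.PI / 2)); // ~3*pi/2 (4.71238...)
```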
Testing for equality in the floating point domain is problematic, but
that's true whether one is using the remainder operator or not. It's true
that lots of people use floating point types without fully understanding
these issues, but I don't think that the remainder operator makes those
issues any more problematic than they already are. It's just that for
some reason, some people forget that the remainder operator applied to
floating point values yields a floating point value, with all of the same
behaviors that any floating point value has.
IMHO, one ought to be careful when using floating point, _always_.
Whether they are using the remainder operator or not. When misapplied,
the floating point data type can cause unexpected or even incorrect
results. This just happens to be one of many examples of where that might
happen.