problem with output of the program on different OS

  • Ben Bacarisse

    #61
    Re: problem with output of the program on different OS

    pereges <Broli00@gmail.com> writes:
    Bart wrote:
    >Been trying to investigate further, but it's getting more complex
    >and Pelles' IDE is a right pain to use.
    >>
    >The figures for radar->E_scattered start to diverge at ray_count=463
    >(they are the same at ray_count=462).
    >>
    >The immediate culprit is r.pathlength, but this is traced to
    >intersect_triangle(). I stored the last det value calculated in
    >intersect_triangle, and the results were for ray_count=462, 463, 464:
    >>
    >0.550692 0.550692 0.329331 on main compilers
    >0.550692 0.329331 0.329331 on Pelles C
    >>
    >Why is the middle one different? Might be a different pattern of
    >calling intersect_triangle(), which is where it starts to get
    >complicated.
    >>
    >Perhaps you should forget Pelles C. You might do a lot of work just to
    >discover some obscure bug in the compiler (on the other hand, you
    >might well find some undefined behaviour in your program and maybe all
    >the results were wrong).
    >
    I really don't know, and now it's not even a question of incorrect
    output on different operating systems. Now, when I run the code on
    Linux, I get the same output every time for the same set of input (this
    was a problem previously).
    This is not a good way to debug a program. Did you fix (or disprove)
    the apparent bug that I pointed out? Using uninitialised data would
    explain all the variation in results.

    The probability of finding a compiler bug in language X is roughly

    (epsilon + no. of years you've been programming in X)/N

    (for some small epsilon and large N) :-)

    --
    Ben.


    • Bart

      #62
      Re: problem with output of the program on different OS

      On May 12, 6:27 pm, pereges <Brol...@gmail.com> wrote:
      On May 12, 8:06 pm, Bart <b...@freeuk.com> wrote:
      >
      I had a further look at Pelles C version (now that I can run it from
      the command line, I can make progress..)
      >
      With 3 compilers, ray 463 hits only triangle 654 of the 1200
      triangles.
      With Pelles C, ray 463 hits triangles 654 and 745, at equal distances.
      >
      I just added a raycount member to the ray data structure and kept track of
      rays hitting triangles. I got somewhat different results: ray 463 is
      hitting triangle 943 at distance 1006.67112.
      >
      A ray hitting two triangles at the same distance is nothing but a ray
      hitting the edge shared by the two triangles, or a shared vertex.
      >
      Varying your EPSILON made no difference.
      >
      Well, I varied EPSILON and there was a little difference in output, but
      still nowhere near the results obtained with other compilers.
      >
      So it starts to look more like some strange numeric problem; perhaps
      Pelles C does some calculations a little differently, or perhaps
      something more serious; this requires more investigation by someone
      with a lot more time!
      It could still be a bug in /your/ calculations which may depend too
      heavily on some very small value which is treated differently between
      different C systems.
      >
      maybe you are right but isn't it strange that PellesC is the only
      compiler giving a result like this. If there was a serious bug in the
      program, then there would have been at least one compiler which could
      have given a strange result?
      I've narrowed the problem down, it can be illustrated by this code:

      double x=-10;
      double y=2.0/3.0;
      double z;
      unsigned int a=15;

      z=x+a*y;

      printf("%g + %u * %g = %g\n", x, a, y, z);

      The result (-10+15*(2/3)) should be inexact, and gives something like
      1e-15 on most compilers. But Pelles, strangely, gives the exact result
      of 0.0!

      I'm not going to investigate this further (except out of curiosity to
      see what machine code is generated).

      In your program, it gives funny near-zero values to some elements of
      pointinarray[], which is used to set up .origin of a ray among other
      things.

      And in one routine, intersect_triangle(), you have *v = vector_dot.
      This *v is 0.0 in Pelles but something like -1e-15 in the others. You
      then reject the intersection when (*v<0.0), which then behaves
      differently between the compilers.

      (And the same pattern is probably occurring in lots of places. So like I
      said, you are relying too much on values near zero.)

      To fix: perhaps clamp those values in pointinarray to 0.0. And in
      if(*v<0.0), perhaps use
      if(*v<-EPSILON).
      What other languages in your opinion could have been used ? I was told
      C because speed was important to my application.
      I'm not the right person to ask; I tend to use my own rapid-
      development language. But perhaps Python or similar, anything that is
      comfortable with all these points and vectors and with not so many
      damn pointers!

      --
      Bartc


      • Ben Bacarisse

        #63
        Re: problem with output of the program on different OS

        Bart <bc@freeuk.com> writes:
        On May 12, 6:27 pm, pereges <Brol...@gmail.com> wrote:
        <snip>
        >maybe you are right but isn't it strange that PellesC is the only
        >compiler giving a result like this. If there was a serious bug in the
        >program, then there would have been at least one compiler which could
        >have given a strange result?
        No, you can't reason like that. It seems to me that there is a clear
        and obvious bug in the program (unless you are now working from a
        newer fixed version). Since I seem to be shouting from the sidelines
        here, let me rephrase it: the automatic variable (radar_detector radar)
        seems to be only partially initialised by the function
        'initialize_radar'. This is quite sufficient to explain what you see.
        You don't need to blame a compiler.
        I've narrowed the problem down, it can be illustrated by this code:
        >
        double x=-10;
        double y=2.0/3.0;
        double z;
        unsigned int a=15;
        >
        z=x+a*y;
        >
        printf("%g + %u * %g = %g\n", x, a, y, z);
        >
        The result (-10+15*(2/3)) should be inexact, and gives something like
        1e-15 on most compilers. But Pelles, strangely, gives the exact result
        of 0.0!
        What is odd about that? My gcc does exactly the same. It
        seems entirely correct to me.

        Either way, it should not be the explanation. If the physics of the
        program are reasonable (and correct) the results will be roughly the
        same. Only the most ill-conditioned problems will diverge due to such
        problems. If this is such a case (and I am pretty sure it is not) the
        solution lies not in the compiler that gives z = 1e-15 rather than 0
        but rather in a new algorithm that is more stable.

        --
        Ben.


        • Bart

          #64
          Re: problem with output of the program on different OS

          On May 13, 1:39 am, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
          Bart <b...@freeuk.com> writes:
          On May 12, 6:27 pm, pereges <Brol...@gmail.com> wrote:
          <snip>
          maybe you are right but isn't it strange that PellesC is the only
          compiler giving a result like this. If there was a serious bug in the
          program, then there would have been at least one compiler which could
          have given a strange result?
          >
          No, you can't reason like that.  It seems to me that there is a clear
          and obvious bug in the program (unless you are now working from a
          newer fixed version).  Since I seem to be shouting from the sidelines
          here, let me rephrase it: the automatic variable (radar_detector radar)
          seems to be only partially initialised by the function
          'initialize_radar'.  This is quite sufficient to explain what you see.
          You don't need to blame a compiler.
          One specific initialisation problem was mentioned upthread; there
          seemed to be others, but not directly affecting the Pelles C result I
          was tracing.
          >
          I've narrowed the problem down, it can be illustrated by this code:
          >
          double x=-10;
          double y=2.0/3.0;
          double z;
          unsigned int a=15;
          >
          z=x+a*y;
          >
          printf("%g + %u * %g = %g\n", x, a, y, z);
          >
          The result (-10+15*(2/3)) should be inexact, and gives something like
          1e-15 on most compilers. But Pelles, strangely, gives the exact result
          of 0.0!
          >
          What is odd about that?  My gcc does exactly the same.  It
          seems entirely correct to me.
          I would expect the answer to be wrong by one bit or so. My gcc/3.4.5
          gives -5e-16.

          I was intrigued as to why 1 compiler out of 4 gave a different result,
          and tracked it down to this behaviour (it was also more interesting
          than what I was supposed to be doing..)
          >
          Either way, it should not be the explanation.  If the physics of the
          program are reasonable (and correct) the results will be roughly the
          same.  Only the most ill-conditioned problems will diverge due to such
          problems.  If this is such a case (and I am pretty sure it is not) the
          solution lies not in the compiler that gives z = 1e-15 rather than 0
          but rather in a new algorithm that is more stable.
          Yes the program is unstable if it can give different results depending
          on whether one value is just one side of zero or the other. As it is I
          wouldn't now trust any of the results to be correct even if many
          concur.

          This is up to the OP now to fix the problems.

          --
          Bartc


          • Ben Bacarisse

            #65
            Re: problem with output of the program on different OS

            Bart <bc@freeuk.com> writes:
            On May 13, 1:39 am, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
            >Bart <b...@freeuk.com> writes:
            On May 12, 6:27 pm, pereges <Brol...@gmail.com> wrote:
            ><snip>
            >maybe you are right but isn't it strange that PellesC is the only
            >compiler giving a result like this. If there was a serious bug in the
            >program, then there would have been at least one compiler which could
            >have given a strange result?
            >>
            >No, you can't reason like that.  It seems to me that there is a clear
            >and obvious bug in the program (unless you are now working from a
            >newer fixed version).  Since I seem to be shouting from the sidelines
            >here, let me rephrase it: the automatic variable (radar_detector radar)
            >seems to be only partially initialised by the function
            >'initialize_radar'.  This is quite sufficient to explain what you see.
            >You don't need to blame a compiler.
            >
            One specific initialisation problem was mentioned upthread; there
            seemed to be others, but not directly affecting the Pelles C result I
            was tracing.
            I can't see how leaving parts of a complex number (that is used)
            uninitialised could not be affecting what you were tracing. Maybe I
            am missing what you are tracing, but the program seems to use
            uninitialised data, specifically the real part of the E_scattered
            member and the imaginary part of the E_incident member of the radar
            structure (as posted /three days ago/[1]).

            Now, the OP may have corrected that, and you may be using that
            corrected source, but there was no "OK, fixed" message from him so I
            suspect not. It is also possible that my reasoning is wrong, but then
            I'd expect a message saying "it's OK, I set it here" or "I never use
            the real part of E_scattered".
            I've narrowed the problem down, it can be illustrated by this code:
            >>
            double x=-10;
            double y=2.0/3.0;
            double z;
            unsigned int a=15;
            >>
            z=x+a*y;
            >>
            printf("%g + %u * %g = %g\n", x, a, y, z);
            >>
            The result (-10+15*(2/3)) should be inexact, and gives something like
            1e-15 on most compilers. But Pelles, strangely, gives the exact result
            of 0.0!
            >>
            >What is odd about that?  My gcc does exactly the same.  It
            >seems entirely correct to me.
            >
            I would expect the answer to be wrong by one bit or so. My gcc/3.4.5
            gives -5e-16.
            Mine is 4.2.3. The point is I think 0 is a correct and permitted
            answer. It is not at all strange (to me).

            This has got so surreal that I have just downloaded Pelles C. If I
            leave the bug in I get this output:

            Es: inf Ei: 1.112194e-04 RCS: inf

            If I correct it by setting the missing parts of the complex numbers to
            0 I get this:

            Es: 7.391785e-11 Ei: 1.112194e-04 RCS: 8.351773e+00

            which is exactly what gcc 4.2.3 and lcc-win32 give me. It seems to be
            a bug of the most ordinary nature.
            I was intrigued as to why 1 compiler out of 4 gave a different result,
            and tracked it down to this behaviour
            I am far from sure that you have. Does the version you traced have
            uninitialised data, and does the problem remain when you add the two
            extra zero initialisations? If so, I will agree it looks odd, but so
            far it seems to be a common-or-garden bug in the code.

            [1] Message-ID: <87wsm3eeqh.fsf@bsb.me.uk>

            --
            Ben.


            • pereges

              #66
              Re: problem with output of the program on different OS

              On May 12, 10:44 pm, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
              pereges <Brol...@gmail.com> writes:
              Bart wrote:
              Been trying to investigate further, but it's getting more complex
              and Pelles' IDE is a right pain to use.
              >
              The figures for radar->E_scattered start to diverge at ray_count=463
              (they are the same at ray_count=462).
              >
              The immediate culprit is r.pathlength, but this is traced to
              intersect_triangle(). I stored the last det value calculated in
              intersect_triangle, and the results were for ray_count=462, 463, 464:
              >
              0.550692 0.550692 0.329331 on main compilers
              0.550692 0.329331 0.329331 on Pelles C
              >
              Why is the middle one different? Might be a different pattern of
              calling intersect_triangle(), which is where it starts to get
              complicated.
              >
              Perhaps you should forget Pelles C. You might do a lot of work just to
              discover some obscure bug in the compiler (on the other hand, you
              might well find some undefined behaviour in your program and maybe all
              the results were wrong).
              >
              I really don't know, and now it's not even a question of incorrect
              output on different operating systems. Now, when I run the code on
              Linux, I get the same output every time for the same set of input (this
              was a problem previously).
              >
              This is not a good way to debug a program. Did you fix (or disprove)
              the apparent bug that I pointed out? Using uninitialised data would
              explain all the variation in results.
              >
              The probability of finding a compiler bug in language X is roughly
              >
              (epsilon + no. of years you've been programming in X)/N
              >
              (for some small epsilon and large N) :-)

              Yes, I fixed the bug that you had pointed out (uninitialized members
              of E_incident and E_scattered) and other bugs that many people have
              pointed out. Because of these changes, I'm now getting a different and
              consistent result on most compilers. Other people are also getting the
              same result as me now. Strangely, only Pelles C is reporting a
              different result, which is similar to the result I had before the bugs
              were fixed. It doesn't seem to be affected by the changes.


              • pereges

                #67
                Re: problem with output of the program on different OS

                On May 13, 8:46 am, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
                If I correct it by setting the missing parts of the complex numbers to
                0 I get this:
                >
                Es: 7.391785e-11 Ei: 1.112194e-04 RCS: 8.351773e+00
                >
                which is exactly what gcc 4.2.3 and lcc-win32 give me. It seems to be
                a bug of the most ordinary nature.
                >
                I was intrigued as to why 1 compiler out of 4 gave a different result,
                and tracked it down to this behaviour
                >
                I am far from sure that you have. Does the version you traced have
                uninitialised data, and does the problem remain when you add the two
                extra zero initialisations? If so, I will agree it looks odd, but so
                far it seems to be a common-or-garden bug in the code.
                Are you really getting this output on PellesC after fixing the bug?
                What is the version ? I'm using PellesC 5.00.4 and it reports the same
                value 1.11e+01


                • Bart

                  #68
                  Re: problem with output of the program on different OS

                  On May 13, 4:46 am, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
                  Bart <b...@freeuk.com> writes:
                  On May 13, 1:39 am, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
                  Bart <b...@freeuk.com> writes:
                  On May 12, 6:27 pm, pereges <Brol...@gmail.com> wrote:
                  I would expect the answer to be wrong by one bit or so. My gcc/3.4.5
                  gives -5e-16.
                  >
                  Mine is 4.2.3.  The point is I think 0 is a correct and permitted
                  answer.  It is not at all strange (to me).
                  >
                  This has got so surreal that I have just downloaded Pelles C.  If I
                  leave the bug in I get this output:
                  >
                  Es: inf Ei: 1.112194e-04 RCS: inf
                  I was also getting these weird results. I independently made the
                  initialisation change and got the results of 7.39... except for Pelles
                  which gave 7.08...;
                  >
                  If I correct it by setting the missing parts of the complex numbers to
                  0 I get this:
                  >
                  Es: 7.391785e-11 Ei: 1.112194e-04 RCS: 8.351773e+00
                  >
                  which is exactly what gcc 4.2.3 and lcc-win32 give me.  It seems to be
                  a bug of the most ordinary nature.
                  My Pelles and the OP's gave 7.08....

                  In fact my Pelles was an old V2.9, I just downloaded the new version
                  and it gave the same results, namely 7.08.
                  I was intrigued as to why 1 compiler out of 4 gave a different result,
                  and tracked it down to this behaviour
                  >
                  I am far from sure that you have.  Does the version you traced have
                  uninitialised data, and does the problem remain when you add the two
                  extra zero initialisations?  If so, I will agree it looks odd, but so
                  far it seems to be a common-or-garden bug in the code.
                  There's lots of other uninitialised data which may or may not be
                  affecting anything else. However the 7.08/7.39 discrepancy *was*
                  traced to these zero/near zero values.

                  But I've now changed a couple of things: "< 0.0" became
                  "< (-EPSILON)" and "> 1.0" became "> (1+EPSILON)".

                  *Now*, all my 4 compilers (5 including new Pelles) give the 7.08...
                  result.

                  (Whether that is right or not, I've still no idea.)


                  --
                  Bartc


                  • Chris Dollin

                    #69
                    Re: problem with output of the program on different OS

                    Richard wrote:
                    Richard Heathfield <rjh@see.sig.invalid> writes:
                    >Debuggers are over-rated by some people, and can be a huge time sink if
                    >used flailingly.
                    >
                    What total and utter nonsense.
                    You're claiming that no-one over-rates debuggers, and that even if
                    used flailingly they are never time-sinks. Both claims seem to be
                    implausible.

                    I think your knee is jerking.

                    --
                    /Questions? Answers! Answers? Questions!/ - Focus

                    Hewlett-Packard Limited registered office: Cain Road, Bracknell,
                    registered no: 690597 England Berks RG12 1HN


                    • Ben Bacarisse

                      #70
                      Re: problem with output of the program on different OS

                      pereges <Broli00@gmail.com> writes:
                      Yes, I fixed the bug that you had pointed out (uninitialized members
                      of E_incident and E_scattered) and other bugs that many people have
                      pointed out.
                      Did I miss the message where you explained that?

                      --
                      Ben.


                      • Ben Bacarisse

                        #71
                        Re: problem with output of the program on different OS

                        pereges <Broli00@gmail.com> writes:
                        On May 13, 8:46 am, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
                        >
                        >If I correct it by setting the missing parts of the complex numbers to
                        >0 I get this:
                        >>
                        >Es: 7.391785e-11 Ei: 1.112194e-04 RCS: 8.351773e+00
                        >>
                        >which is exactly what gcc 4.2.3 and lcc-win32 give me. It seems to be
                        >a bug of the most ordinary nature.
                        >>
                        I was intrigued as to why 1 compiler out of 4 gave a different result,
                        and tracked it down to this behaviour
                        >>
                        >I am far from sure that you have. Does the version you traced have
                        >uninitialised data, and does the problem remain when you add the two
                        >extra zero initialisations? If so, I will agree it looks odd, but so
                        >far it seems to be a common-or-garden bug in the code.
                        >
                        Are you really getting this output on PellesC after fixing the bug?
                        Yes.
                        What is the version ?
                        4.50.113
                        I'm using PellesC 5.00.4 and it reports the same value 1.11e+01
                        --
                        Ben.


                        • pereges

                          #72
                          Re: problem with output of the program on different OS

                          On May 13, 5:12 pm, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
                          pereges <Brol...@gmail.com> writes:
                          Yes, I fixed the bug that you had pointed out (uninitialized members
                          of E_incident and E_scattered) and other bugs that many people have
                          pointed out.
                          >
                          Did I miss the message where you explained that?
                          >

                          I had missed your post on the 2nd page, but I think BartC also pointed
                          out that bug. I replied to him on page 2.


                          • Richard Heathfield

                            #73
                            Re: problem with output of the program on different OS

                            Chris Dollin said:
                            Richard wrote:
                            >
                            >Richard Heathfield <rjh@see.sig.invalid> writes:
                            >
                            >>Debuggers are over-rated by some people, and can be a huge time sink if
                            >>used flailingly.
                            >>
                            >What total and utter nonsense.
                            >
                            You're claiming that no-one over-rates debuggers, and that even if
                            used flailingly they are never time-sinks. Both claims seem to be
                            implausible.
                            Good Lord, that was months ago, wasn't it? Well, days ago, anyway. Is he
                            the tardy one, or are you? :-)
                            I think your knee is jerking.
                            <shrug> You expected something different? Remember that the above idiocy
                            hails from the same stable that can't distinguish between "I rarely do
                            such-and-such" and "I never do such-and-such", or between "I have on a few
                            rare occasions been able to do so-and-so" and "I can always do so-and-so".
                            Time spent arguing with such people is generally time wasted (which
                            doesn't necessarily mean it can't be fun).

                            --
                            Richard Heathfield <http://www.cpax.org.uk>
                            Email: -http://www. +rjh@
                            Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
                            "Usenet is a strange place" - dmr 29 July 1999


                            • Ben Bacarisse

                              #74
                              Re: problem with output of the program on different OS

                              Bart <bc@freeuk.com> writes:
                              On May 13, 4:46 am, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
                              >Bart <b...@freeuk.com> writes:
                              On May 13, 1:39 am, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
                              >Bart <b...@freeuk.com> writes:
                              On May 12, 6:27 pm, pereges <Brol...@gmail.com> wrote:
                              >
                              I would expect the answer to be wrong by one bit or so. My gcc/3.4.5
                              gives -5e-16.
                              >>
                              >Mine is 4.2.3.  The point is I think 0 is a correct and permitted
                              >answer.  It is not at all strange (to me).
                              >>
                              >This has got so surreal that I have just downloaded Pelles C.  If I
                              >leave the bug in I get this output:
                              >>
                              >Es: inf Ei: 1.112194e-04 RCS: inf
                              >
                              I was also getting these weird results. I independently made the
                              initialisation change
                              Ah. Vital info. Without that, I thought you and the OP were
                              discussing the behaviour of an undefined program!
                              and got the results of 7.39... except for Pelles
                              which gave 7.08...;
                              >
                              >>
                              >If I correct it by setting the missing parts of the complex numbers to
                              >0 I get this:
                              >>
                              >Es: 7.391785e-11 Ei: 1.112194e-04 RCS: 8.351773e+00
                              >>
                              >which is exactly what gcc 4.2.3 and lcc-win32 give me.  It seems to be
                              >a bug of the most ordinary nature.
                              >
                              My Pelles and the OP's gave 7.08....
                              >
                              In fact my Pelles was an old V2.9, I just downloaded the new version
                              and it gave the same results, namely 7.08.
                              >
                              I was intrigued as to why 1 compiler out of 4 gave a different result,
                              and tracked it down to this behaviour
                              >>
                              >I am far from sure that you have.  Does the version you traced have
                              >uninitialised data, and does the problem remain when you add the two
                              >extra zero initialisations?  If so, I will agree it looks odd, but so
                              >far it seems to be a common-or-garden bug in the code.
                              >
                              There's lots of other uninitialised data which may or may not be
                              affecting anything else. However the 7.08/7.39 discrepancy *was*
                              traced to these zero/near zero values.
                              Then the problem (or the algorithm) may be unstable, but I am not
                              persuaded that the compiler was wrong to produce 0 for the arithmetic
                              test case you posted a while back.

                              --
                              Ben.


                              • Bart

                                #75
                                Re: problem with output of the program on different OS

                                On May 13, 1:58 pm, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
                                Bart <b...@freeuk.com> writes:
                                There's lots of other uninitialised data which may or may not be
                                affecting anything else. However the 7.08/7.39 discrepancy *was*
                                traced to these zero/near zero values.
                                >
                                Then the problem (or the algorithm) may be unstable, but I am not
                                persuaded that the compiler was wrong to produce 0 for the arithmetic
                                test case you posted a while back.
                                There was a /possibility/ of a compiler bug, but until I found one
                                reason for the discrepancy, we didn't know that for sure. Now it seems
                                Pelles was correct and the others wrong! (Because by chance Pelles
                                avoided the near-zero values that caused the indeterminacy later on.)

                                A proper fix is now up to the OP; I've already suggested taking care
                                when testing values against /exactly/ 0.0 or 1.0.

                                --
                                Bartc
