Computation slow with float than double.

  • Michele Guidolin

    Computation slow with float than double.

    Hello to everybody.

    I'm benchmarking a red-black Gauss-Seidel algorithm on 2-dimensional
    grids of different sizes and types, and I get some strange results
    when I change the computation from double to float.

    Here are the test times for different grid SIZEs and types:

    SIZE      128    256    512
    float   2.20s  2.76s  7.86s
    double  2.30s  2.47s  2.59s

    As you can see, when the grid has a size of 256 nodes the time for the
    float code increases drastically.

    What could the problem be? Could it be the cache? Should float
    computation always be faster than double?

    Hope to receive an answer as soon as possible,
    Thanks

    Michele Guidolin.
    P.S.

    Here is some more information about the test:

    The code that I'm testing is below; the double version is the same
    (except that the constants are 0.25 rather than 0.25f).

    ------------- CODE -------------

    float u[SIZE][SIZE];
    float rhs[SIZE][SIZE];

    inline void gs_relax(int i,int j)
    {
        u[i][j] = ( rhs[i][j] +
                    0.0f * u[i][j] +
                    0.25f* u[i+1][j]+
                    0.25f* u[i-1][j]+
                    0.25f* u[i][j+1]+
                    0.25f* u[i][j-1]);
    }

    void gs_step_fusion( )
    {
        int i,j;

        /* update the red points:
        */

        for(j=1; j<SIZE-1; j=j+2)
        {
            gs_relax(1,j);
        }
        for(i=2; i<SIZE-1; i++)
        {
            for(j=1+(i+1)%2 ; j<SIZE-1; j=j+2)
            {
                gs_relax(i,j);
                gs_relax(i-1,j);
            }
        }
        for(j=1; j<SIZE-1; j=j+2)
        {
            gs_relax(SIZE-2,j);
        }
    }
    ---------------CODE--------------

    I'm testing this code on this machine:

    processor : 0
    vendor_id : GenuineIntel
    cpu family : 15
    model : 4
    model name : Intel(R) Pentium(R) 4 CPU 3.20GHz
    stepping : 1
    cpu MHz : 3192.311
    cache size : 1024 KB
    physical id : 0
    siblings : 2
    fdiv_bug : no
    hlt_bug : no
    f00f_bug : no
    coma_bug : no
    fpu : yes
    fpu_exception : yes
    cpuid level : 3
    wp : yes
    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
    mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni
    monitor ds_cpl cid
    bogomips : 6324.22

    with Hyper-Threading enabled, on Linux 2.6.8.

    The compiler is gcc 3.4.4 and the flags are:
    CFLAGS = -g -O2 -funroll-loops -msse2 -march=pentium4 -Wall
  • those who know me have no need of my name

    #2
    Re: Computation slow with float than double.

    in comp.lang.c i read:

    >I'm doing some benchmark about a red black Gauss Seidel algorithm with 2
    >dimensional grid

    >As you can see when the grid has a size of 256 node the code with float
    >type increase the time drastically.
    >
    >What could be the problem? could be the cache? Should the float
    >computation always fastest than double?

    most likely your system does all floating point computations using a
    precision greater than float, then reduces the result when the value must
    be stored, which happens more often as you increase the size of the table.

    --
    a signature


    • Eric Sosman

      #3
      Re: Computation slow with float than double.



      Michele Guidolin wrote:
      > Hello to everybody.
      >
      > I'm doing some benchmark about a red black Gauss Seidel algorithm with 2
      > dimensional grid of different size and type, I have some strange result
      > when I change the computation from double to float.
      >
      > Here are the time of test with different grid SIZE and type:
      >
      > SIZE 128 256 512
      >
      > float 2.20s 2.76s 7.86s
      >
      > double 2.30s 2.47s 2.59s
      >
      > As you can see when the grid has a size of 256 node the code with float
      > type increase the time drastically.

      I see a modest increase at 256 and a huge increase at 512.
      Have there been any transcription errors?

      I also see that the code you didn't show probably accounts
      for the lion's share of the running time, which casts suspicion
      on drawing too many conclusions from a couple of experiments.
      The running time of the posted code should increase (roughly)
      as the square of SIZE, so changing SIZE from 128 to 512 should
      inflate its running time by a factor of (about) sixteen. Yet
      this supposed sixteen-fold increase added only 0.29 seconds to
      the running time for "double;" a straightforward calculation
      (based on data of unknown accuracy, to be sure) suggests that
      the rest of the program accounts for 89% or more of the time
      in that case, and even more in the other two.

      ... and if such a large portion of the total time resides
      "elsewhere," it would be unwise to draw too many conclusions
      until the contributions of "elsewhere" are better characterized,
      or better controlled for (e.g., by repeated experiment and
      statistical analysis).
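
For illustration, the arithmetic behind that estimate could be sketched as follows. It assumes, as the post does, that the total time is a fixed part plus a part that grows with the square of SIZE; the function names `fixed_fraction` and `report_fixed_fraction` are made up for this sketch, and the inputs are the "double" timings from the original post.

```c
#include <stdio.h>

/* Assumed model: t(SIZE) = fixed + k * SIZE^2.
 * Given timings at two sizes, solve for the fixed part and return the
 * fraction of the larger run that the fixed part accounts for. */
double fixed_fraction(double size_small, double t_small,
                      double size_large, double t_large)
{
    /* How much the SIZE^2 part grows between the two runs (16 for 128 -> 512). */
    double ratio = (size_large * size_large) / (size_small * size_small);
    /* The SIZE^2 part of the smaller run: (t_large - t_small) / (ratio - 1). */
    double scaled_small = (t_large - t_small) / (ratio - 1.0);
    double fixed = t_small - scaled_small;
    return fixed / t_large;
}

/* Prints the fixed share for the posted "double" timings. */
void report_fixed_fraction(void)
{
    double f = fixed_fraction(128.0, 2.30, 512.0, 2.59);
    printf("fixed share of the SIZE=512 double run: %.1f%%\n", f * 100.0);
}
```

Under this model the fixed part comes out at roughly 88% of the SIZE=512 run, which is close to the figure quoted above and larger still for the smaller sizes.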

      > What could be the problem? could be the cache? Should the float
      > computation always fastest than double?

      Cache might be a problem. So might alignment, or other
      competing processes on the machine. If you're reading the
      initial data from a file, perhaps one test paid the penalty of
      actually reading from the disk while the others benefitted from
      the file system's cache. Or maybe the disk is just beginning
      to go sour, and the O/S relocated an entire track of data in
      the middle of one test. Or maybe the phase of the moon wasn't
      propitious.

      Should float always be faster than double? No, the C language
      Standard is silent on matters of speed (which makes the entire
      discussion off-topic here, or at least slightly so). You've shown
      some puzzling data, but you need more data and more analysis to
      draw good conclusions, and the results you eventually get will
      most likely be relevant only to the system you got them on, and
      not to the C language. I'd suggest further experimentation, and
      a change to a newsgroup devoted to your system, where the experts
      on your system's quirks hang out.

      --
      Eric.Sosman@sun.com


      • CBFalconer

        #4
        Re: Computation slow with float than double.

        Michele Guidolin wrote:
        >
        .... snip ...
        >
        > Here are the time of test with different grid SIZE and type:
        >
        > SIZE 128 256 512
        > float 2.20s 2.76s 7.86s
        > double 2.30s 2.47s 2.59s
        >
        > As you can see when the grid has a size of 256 node the code with
        > float type increase the time drastically.
        >
        > What could be the problem? could be the cache? Should the float
        > computation always fastest than double?

        C real computations are always done as doubles by default. When
        you specify floats you are primarily constricting the storage, and
        are causing float->double->float conversions to be done. These are
        eating up the time.

        --
        "If you want to post a followup via groups.google.com, don't use
        the broken "Reply" link at the bottom of the article. Click on
        "show options" at the top of the article, then click on the
        "Reply" at the bottom of the article headers." - Keith Thompson



        • Lawrence Kirby

          #5
          Re: Computation slow with float than double.

          On Mon, 06 Jun 2005 20:54:56 +0000, CBFalconer wrote:

          > Michele Guidolin wrote:
          >>
          > ... snip ...
          >>
          >> Here are the time of test with different grid SIZE and type:
          >>
          >> SIZE 128 256 512
          >> float 2.20s 2.76s 7.86s
          >> double 2.30s 2.47s 2.59s
          >>
          >> As you can see when the grid has a size of 256 node the code with
          >> float type increase the time drastically.
          >>
          >> What could be the problem? could be the cache? Should the float
          >> computation always fastest than double?
          >
          > C real computations are always done as doubles by default.

          That was true in K&R C, but not in standard C. An implementation CAN
          perform calculations in greater precision than the representation of the
          type but it is not required to.

          > When
          > you specify floats you are primarily constricting the storage, and
          > are causing float->double->float conversions to be done. These are
          > eating up the time.

          Perhaps. But on common architectures it is typically the case that either
          float operations are performed using float precision, or else loading/saving
          a float sized object in memory to/from a wider register is no more expensive
          than doing the same for a double sized object in memory.

          Lawrence

          • Eric Sosman

            #6
            Re: Computation slow with float than double.



            CBFalconer wrote:
            > Michele Guidolin wrote:
            >
            > ... snip ...
            >
            >>Here are the time of test with different grid SIZE and type:
            >>
            >>SIZE 128 256 512
            >>float 2.20s 2.76s 7.86s
            >>double 2.30s 2.47s 2.59s
            >>
            >>As you can see when the grid has a size of 256 node the code with
            >>float type increase the time drastically.
            >>
            >>What could be the problem? could be the cache? Should the float
            >>computation always fastest than double?
            >
            >
            > C real computations are always done as doubles by default. When
            > you specify floats you are primarily constricting the storage, and
            > are causing float->double->float conversions to be done. These are
            > eating up the time.

            That was true in pre-Standard days, but ever since
            C89 the implementation has been allowed to use `float'
            arithmetic when only `float' operands are involved. Not
            all implementations do so (and I don't know whether the
            O.P.'s does), but it's no longer a certainty that the
            conversions are occurring. C99 6.3.1.8 or C89 3.2.1.5;
            I don't have the section number for C90.
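
One way to find out what a particular implementation does is the C99 macro `FLT_EVAL_METHOD` from `<float.h>`. The sketch below is only illustrative: the function names are made up here, and it assumes a C99-or-later compiler (on an older `<float.h>` the macro is simply absent, which the code reports as -1).

```c
#include <float.h>
#include <stdio.h>

/* Returns the evaluation method reported by <float.h>:
 *   0  operations are evaluated in the range and precision of their type
 *   1  float and double operations are evaluated as double
 *   2  all operations are evaluated as long double
 *  -1  indeterminable (also returned here when the macro is missing) */
int eval_method(void)
{
#ifdef FLT_EVAL_METHOD
    return FLT_EVAL_METHOD;
#else
    return -1; /* pre-C99 <float.h>: the macro is not available */
#endif
}

/* Prints the value so it can be compared between compiler flag sets. */
void report_eval_method(void)
{
    printf("FLT_EVAL_METHOD = %d\n", eval_method());
}
```

Compiling this with and without options such as -mfpmath=sse should show whether float operands are being widened on the system in question; the value is implementation-specific, so no particular result is implied here.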

            --
            Eric.Sosman@sun.com


            • Christian Bau

              #7
              Re: Computation slow with float than double.

              In article <newscache$0viphi$xhe$1@weblab.ucd.ie>,
              Michele Guidolin <"michele dot guidolin at ucd dot ie"> wrote:

              > Hello to everybody.
              >
              > I'm doing some benchmark about a red black Gauss Seidel algorithm with 2
              > dimensional grid of different size and type, I have some strange result
              > when I change the computation from double to float.
              >
              > Here are the time of test with different grid SIZE and type:
              >
              > SIZE 128 256 512
              >
              > float 2.20s 2.76s 7.86s
              >
              > double 2.30s 2.47s 2.59s

              As a rule of thumb: Accessing array elements at a distance that is a
              large power of two is asking for trouble (performance wise).

              Any reason why you choose powers of two? Why not SIZE = 50, 100, 200,
              500?
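
The rule of thumb can be illustrated with a toy model of a set-associative cache. Everything numeric below is an assumption chosen for the sketch (64-byte lines, 128 sets), not a description of the Pentium 4 in question, and `distinct_sets` is a made-up helper name. It counts how many different cache sets are touched when walking down one column of a 2-D array.

```c
#include <stddef.h>
#include <stdio.h>

#define LINE_BYTES 64u   /* hypothetical cache line size */
#define NUM_SETS   128u  /* hypothetical number of cache sets */

/* Count the distinct cache sets touched by the first byte of each of
 * `rows` consecutive rows, where rows start `row_bytes` apart. */
unsigned distinct_sets(size_t row_bytes, unsigned rows)
{
    unsigned char seen[NUM_SETS] = {0};
    unsigned count = 0;
    unsigned r;

    for (r = 0; r < rows; r++) {
        /* Line number of this row's start, folded onto the available sets. */
        size_t set = (r * row_bytes / LINE_BYTES) % NUM_SETS;
        if (!seen[set]) {
            seen[set] = 1;
            count++;
        }
    }
    return count;
}

/* Compares a power-of-two row length with one padded by a single element. */
void report_sets(void)
{
    printf("512 floats per row: %u sets\n",
           distinct_sets(512 * sizeof(float), 64));
    printf("513 floats per row: %u sets\n",
           distinct_sets(513 * sizeof(float), 64));
}
```

With these assumed parameters, 64 rows of 512 floats land in only 4 sets, while rows of 513 floats spread over 16, so a column walk over the power-of-two layout competes for far fewer cache slots.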


              • Michele Guidolin

                #8
                Re: Computation slow with float than double.

                Christian Bau wrote:
                >>
                >>SIZE 128 256 512
                >>
                >>float 2.20s 2.76s 7.86s
                >>
                >>double 2.30s 2.47s 2.59s
                >
                >
                > As a rule of thumb: Accessing array elements at a distance that is a
                > large power of two is asking for trouble (performance wise).
                >
                > Any reason why you choose powers of two? Why not SIZE = 50, 100, 200,
                > 500?


                OK! I tried some more tests with different grid SIZEs. In the previous
                message I forgot to say that the number of loop iterations depends on the
                SIZE of the grid (see the code below), so the times for two different
                SIZEs should not be expected to be really proportional to the grid size.

                -------code ----
                ITERATIONS = ((int)(pow(2.0, 28.0))/(pow((double)SIZE,2.0)));

                gettimeofday(&submit_time, 0);

                for(iter=0; iter<ITERATIONS; iter++)
                    gs_step_fusion();

                gettimeofday(&complete_time, 0);

                -------code -----

                Moreover, the timing covers only the loop itself and not other things,
                like data initialization and printing the results.

                The new test times are:

                SIZE      100    200    300    400    500    513
                Float   2.17s  2.44s  3.35s  5.82s  8.37s  7.98s
                Double  2.32s  2.34s  2.57s  2.63s  2.63s  2.65s

                When I use a profiler, it shows me that 95% of the time is spent in
                these two calls:

                for(j=1+(i+1)%2 ; j<SIZE-1; j=j+2)
                {
                    gs_relax(i,j); // 45%
                    gs_relax(i-1,j); // 45%
                }

                So I still don't understand why the float version runs so slowly.
                Any help?

                Thanks for any answers.

                Michele Guidolin


                • Lawrence Kirby

                  #9
                  Re: Computation slow with float than double.

                  On Tue, 07 Jun 2005 11:46:54 +0100, Michele Guidolin wrote:

                  > Christian Bau wrote:
                  >>>
                  >>>SIZE 128 256 512
                  >>>
                  >>>float 2.20s 2.76s 7.86s
                  >>>
                  >>>double 2.30s 2.47s 2.59s
                  >>
                  >>
                  >> As a rule of thumb: Accessing array elements at a distance that is a
                  >> large power of two is asking for trouble (performance wise).
                  >>
                  >> Any reason why you choose powers of two? Why not SIZE = 50, 100, 200,
                  >> 500?
                  >
                  >
                  > OK! I tried some more test with different SIZE of grid, in the precedent
                  > message I forgot to say that the number of loop is proportional of SIZE
                  > of grid, but the different time between two different SIZE shouldn't be
                  > considerate realy proportonial.
                  >
                  > -------code ----
                  > ITERATIONS = ((int)(pow(2.0, 28.0))/(pow((double)SIZE,2.0)));

                  It is better to do integer calculations in integer arithmetic if you can.
                  Also consider that C only requires int to be able to represent numbers in
                  the range -32767 to 32767. So you might use something like

                  ITERATIONS = (1L << 28) / ((long)SIZE * SIZE);

                  > gettimeofday(&submit_time, 0);

                  gettimeofday() isn't standard C. You can use the standard clock() function
                  to measure CPU time used.
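
Put together, the two suggestions might look roughly like the sketch below. The helper names `iterations_for`, `time_steps` and `empty_step` are invented for this example; in the real benchmark the function passed to `time_steps` would be `gs_step_fusion`.

```c
#include <stdio.h>
#include <time.h>

/* Number of sweeps for a given grid size, computed in long arithmetic
 * because int is only guaranteed to reach 32767. */
long iterations_for(long size)
{
    return (1L << 28) / (size * size);
}

/* Runs `step` `iterations` times and returns the CPU time used, in
 * seconds, as measured by the standard clock() function. */
double time_steps(void (*step)(void), long iterations)
{
    clock_t start, end;
    long it;

    start = clock();
    for (it = 0; it < iterations; it++)
        step();
    end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Placeholder workload for this sketch only; stands in for gs_step_fusion. */
void empty_step(void)
{
}
```

A call such as `time_steps(gs_step_fusion, iterations_for(SIZE))` would then replace the pair of gettimeofday() calls. Note that clock() reports processor time rather than wall-clock time, so the numbers are not directly comparable with the earlier measurements.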

                  > for(iter=0; iter<ITERATIONS; iter++)
                  > gs_step_fusion();
                  >
                  > gettimeofday(&complete_time, 0);
                  >
                  >
                  > -------code -----
                  >
                  > Moreover the time considerer only the loop itself and not other things,
                  > like data initialization and print of result.
                  >
                  > The new time test are:
                  >
                  > SIZE 100 200 300 400 500 513
                  > Float 2.17s 2.44s 3.35s 5.82s 8.37s 7.98s
                  > Double 2.32s 2.34s 2.57s 2.63s 2.63s 2.65s
                  >
                  > When I use a profiler it show me that the 95% of time is on this two
                  > function:
                  >
                  > for(j=1+(i+1)%2 ; j<SIZE-1; j=j+2)

                  What is i? Is this an inner loop?

                  > {
                  > gs_relax(i,j); // 45%
                  > gs_relax(i-1,j); // 45%

                  This suggests that you need to look in gs_relax to see what is happening.

                  > }
                  > }
                  > So I still doesn't understand why the float version is going so slowy.
                  > Any help?

                  You have yet to show any code that accesses float or double data.

                  Lawrence


                  • Tim Prince

                    #10
                    Re: Computation slow with float than double.


                    "Michele Guidolin" <"michele dot guidolin at ucd dot ie"> wrote in message
                    news:newscache$fsqqhi$o85$1@weblab.ucd.ie...
                    > Christian Bau wrote:
                    >>>
                    >>>SIZE 128 256 512
                    >>>
                    >>>float 2.20s 2.76s 7.86s
                    >>>
                    >>>double 2.30s 2.47s 2.59s
                    >>
                    >>
                    >> As a rule of thumb: Accessing array elements at a distance that is a
                    >> large power of two is asking for trouble (performance wise).
                    >>
                    >> Any reason why you choose powers of two? Why not SIZE = 50, 100, 200,
                    >> 500?
                    >
                    >
                    > OK! I tried some more test with different SIZE of grid, in the precedent
                    > message I forgot to say that the number of loop is proportional of SIZE
                    > of grid, but the different time between two different SIZE shouldn't be
                    > considerate realy proportonial.
                    >
                    > -------code ----
                    > ITERATIONS = ((int)(pow(2.0, 28.0))/(pow((double)SIZE,2.0)));
                    >
                    > gettimeofday(&submit_time, 0);
                    >
                    > for(iter=0; iter<ITERATIONS; iter++)
                    > gs_step_fusion();
                    >
                    > gettimeofday(&complete_time, 0);
                    >
                    >
                    > -------code -----
                    >
                    > Moreover the time considerer only the loop itself and not other things,
                    > like data initialization and print of result.
                    >
                    > The new time test are:
                    >
                    > SIZE 100 200 300 400 500 513
                    > Float 2.17s 2.44s 3.35s 5.82s 8.37s 7.98s
                    > Double 2.32s 2.34s 2.57s 2.63s 2.63s 2.65s
                    >
                    > When I use a profiler it show me that the 95% of time is on this two
                    > function:
                    >
                    > for(j=1+(i+1)%2 ; j<SIZE-1; j=j+2)
                    > {
                    > gs_relax(i,j); // 45%
                    > gs_relax(i-1,j); // 45%
                    > }
                    >
                    > So I still doesn't understand why the float version is going so slowy.
                    > Any help?
                    I didn't like to attempt an answer, as I wasn't certain whether your options
                    invoke SSE code generation. Several other answers seemed to imply that
                    people thought so, but weren't certain. Maybe attacking the problem more
                    directly makes it off topic for c.l.c, but I've already seen plenty of
                    answers which don't look like pure Standard C information.
                    When you divide your grid more finely, are you running into gradual
                    underflow? If so, what happens when you invoke abrupt underflow, as
                    gcc -O2 -funroll-loops -march=pentium4 -mfpmath=sse -ffast-math
                    might do? Most compilers have gradual underflow on as a default, since it
                    is required according to IEEE standard, and turn it off either by a specific
                    option or as a part of some "fast" package.
                    Gradual underflow is quite slow on early P4 steppings, in case you didn't
                    believe this question could go far OFF TOPIC.
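
To get a feel for how much sooner float reaches the gradual-underflow range than double, here is a small sketch. It is only an illustration: it repeatedly scales 1.0 by 0.25 (the weight used in gs_relax) and counts the steps until the value drops below the smallest normal number; the function names are invented, and whether the real grid values decay like this depends on the data, which the thread does not show.

```c
#include <float.h>
#include <stdio.h>

/* Count multiplications by 0.25f until a float that started at 1.0f is
 * no longer a normal number (it is then subnormal, or zero if the
 * hardware flushes to zero). */
int steps_to_subnormal_float(void)
{
    volatile float x = 1.0f; /* volatile: force a store to float each step */
    int steps = 0;

    while (x >= FLT_MIN) {
        x = x * 0.25f;
        steps++;
    }
    return steps;
}

/* The same experiment in double precision. */
int steps_to_subnormal_double(void)
{
    volatile double x = 1.0;
    int steps = 0;

    while (x >= DBL_MIN) {
        x = x * 0.25;
        steps++;
    }
    return steps;
}

/* Prints both counts side by side. */
void report_underflow_steps(void)
{
    printf("float:  %d steps\n", steps_to_subnormal_float());
    printf("double: %d steps\n", steps_to_subnormal_double());
}
```

With IEEE 754 single and double formats, float leaves the normal range after 64 such steps and double only after 512, so data that decays towards zero could hit the slow path in the float build long before the double build does.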



                    • Michele Guidolin

                      #11
                      Re: Computation slow with float than double.

                      Lawrence Kirby wrote:
                      >>Moreover the time considerer only the loop itself and not other things,
                      >>like data initialization and print of result.
                      >>
                      >>The new time test are:
                      >>
                      >>SIZE 100 200 300 400 500 513
                      >>Float 2.17s 2.44s 3.35s 5.82s 8.37s 7.98s
                      >>Double 2.32s 2.34s 2.57s 2.63s 2.63s 2.65s
                      >>
                      >>When I use a profiler it show me that the 95% of time is on this two
                      >>function:
                      >>
                      >> for(j=1+(i+1)%2 ; j<SIZE-1; j=j+2)
                      >
                      >
                      > What is i? Is this an inner loop?
                      >
                      >
                      >> {
                      >> gs_relax(i,j); // 45%
                      >> gs_relax(i-1,j); // 45%
                      >
                      >
                      > This suggests that you need to look in gs_relax to see what is happening.
                      >
                      >
                      >> }
                      >> }
                      >>So I still doesn't understand why the float version is going so slowy.
                      >>Any help?
                      >
                      >
                      > You have yet to show any code that accesses float or double data.
                      >
                      > Lawrence

                      gs_relax simply does a red-black Gauss-Seidel relaxation.
                      I already posted the code in the first message, but I'll post it again.
                      The double version is exactly the same (with the constant 0.25 and not
                      0.25f).

                      I really don't understand why the float version runs so slowly with
                      SIZE > 300. Maybe a gcc bug?
                      If someone has an idea, it would be much appreciated.
                      Thanks
                      Michele.

                      ------------- CODE -------------

                      float u[SIZE][SIZE];
                      float rhs[SIZE][SIZE];

                      inline void gs_relax(int i,int j)
                      {
                          u[i][j] = ( rhs[i][j] +
                                      0.0f * u[i][j] +
                                      0.25f* u[i+1][j]+
                                      0.25f* u[i-1][j]+
                                      0.25f* u[i][j+1]+
                                      0.25f* u[i][j-1]);
                      }

                      void gs_step_fusion( )
                      {
                          int i,j;

                          /* update the red points:
                          */

                          for(j=1; j<SIZE-1; j=j+2)
                          {
                              gs_relax(1,j);
                          }
                          for(i=2; i<SIZE-1; i++)
                          {
                              for(j=1+(i+1)%2 ; j<SIZE-1; j=j+2)
                              {
                                  gs_relax(i,j);
                                  gs_relax(i-1,j);
                              }
                          }
                          for(j=1; j<SIZE-1; j=j+2)
                          {
                              gs_relax(SIZE-2,j);
                          }
                      }
                      ---------------CODE--------------


                      • Christian Bau

                        #12
                        Re: Computation slow with float than double.

                        In article <newscache$r3vqhi$er6$1@weblab.ucd.ie>,
                        Michele Guidolin <"michele dot guidolin at ucd dot ie"> wrote:

                        > I realy don't understand why the float version is going so slowly whit a
                        > SIZE > 300. Maybe gcc bug?

                        Is the "double" version at SIZE > 300 slow as well?

                        Slowness would be expected when things exceed cache size. 300x300 floats
                        would be 360,000 bytes. The "double" version should slow down a bit
                        earlier.
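
The footprint arithmetic can be tabulated with a few lines of C. This is only a sketch: it counts both posted arrays (`u` and `rhs`), takes the 1024 KB figure from the cpuinfo listing in the first post, and uses invented function names.

```c
#include <stddef.h>
#include <stdio.h>

/* Bytes occupied by the two size x size grids (u and rhs) of the posted
 * code, for a given element width. */
size_t working_set_bytes(size_t size, size_t elem_bytes)
{
    return 2 * size * size * elem_bytes;
}

/* Prints the footprint for each tested SIZE and flags those that exceed
 * the cache size reported in the posted cpuinfo. */
void report_working_sets(void)
{
    const size_t cache_bytes = 1024u * 1024u; /* "cache size : 1024 KB" */
    const size_t sizes[] = {100, 200, 300, 400, 500, 513};
    size_t k;

    for (k = 0; k < sizeof sizes / sizeof sizes[0]; k++) {
        size_t f = working_set_bytes(sizes[k], sizeof(float));
        size_t d = working_set_bytes(sizes[k], sizeof(double));
        printf("SIZE %3lu: float %8lu bytes%s, double %8lu bytes%s\n",
               (unsigned long)sizes[k],
               (unsigned long)f, f > cache_bytes ? " (> cache)" : "",
               (unsigned long)d, d > cache_bytes ? " (> cache)" : "");
    }
}
```

A single 300x300 float array is the 360,000 bytes mentioned above; with both arrays the float build outgrows 1024 KB somewhere between SIZE 300 and 400, and the double build before SIZE 300, which is why a pure cache effect would be expected to show up in the double timings first.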


                        • Dik T. Winter

                          #13
                          Re: Computation slow with float than double.

                          In article <Qjgpe.928$Z44.503@newssvr13.news.prodigy.com> "Tim Prince" <tprince@nospamcomputer.org> writes:
                          ....
                          > When you divide your grid more finely, are you running into gradual
                          > underflow?

                          That might very well be the case. In float that happens much earlier than
                          in double.

                          > If so, what happens when you invoke abrupt underflow, as
                          > gcc -O2 -funroll-loops -march=pentium4 -mfpmath=sse -ffast-math
                          > might do?

                          Another option would be to shift the origin of the coordinate system.

                          > Most compilers have gradual underflow on as a default, since it
                          > is required according to IEEE standard, and turn it off either by a specific
                          > option or as a part of some "fast" package.
                          > Gradual underflow is quite slow on early P4 steppings, in case you didn't
                          > believe this question could go far OFF TOPIC.

                          The main reason is that gradual underflow on most systems is not handled
                          by the processor, but by software. And that requires interrupts.
                          --
                          dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
                          home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/


                          • robert.thorpe@antenova.com

                            #14
                            Re: Computation slow with float than double.

                            Michele Guidolin wrote:
                            > Lawrence Kirby wrote:
                            > >>Moreover the time considerer only the loop itself and not other things,
                            > >>like data initialization and print of result.
                            > >>
                            > >>The new time test are:
                            > >>
                            > >>SIZE 100 200 300 400 500 513
                            > >>Float 2.17s 2.44s 3.35s 5.82s 8.37s 7.98s
                            > >>Double 2.32s 2.34s 2.57s 2.63s 2.63s 2.65s
                            > >>
                            > >>When I use a profiler it show me that the 95% of time is on this two
                            > >>function:
                            > >>
                            > >> for(j=1+(i+1)%2 ; j<SIZE-1; j=j+2)
                            > >
                            > >
                            > > What is i? Is this an inner loop?
                            > >
                            > >
                            > >> {
                            > >> gs_relax(i,j); // 45%
                            > >> gs_relax(i-1,j); // 45%
                            > >
                            > >
                            > > This suggests that you need to look in gs_relax to see what is happening.
                            > >
                            > >
                            > >> }
                            > >> }
                            > >>So I still doesn't understand why the float version is going so slowy.
                            > >>Any help?
                            > >
                            > >
                            > > You have yet to show any code that accesses float or double data.
                            > >
                            > > Lawrence
                            >
                            > The gs_relax simply do a Gauss Seidel red black relaxion.
                            > I already posted the code in the first message, but I post it again.
                            > The double version is exactly the same (with the constant 0.25 and not
                            > 0.25f).
                            >
                            > I realy don't understand why the float version is going so slowly whit a
                            > SIZE > 300. Maybe gcc bug?
                            > If someone has an idea will be very appreciate.
                            > Thanks
                            > Michele.
                            >
                            > ------------- CODE -------------
                            >
                            > float u[SIZE][SIZE];
                            > float rhs[SIZE][SIZE];
                            >
                            > inline void gs_relax(int i,int j)
                            > {
                            >
                            > u[i][j] = ( rhs[i][j] +
                            > 0.0f * u[i][j] +
                            > 0.25f* u[i+1][j]+
                            > 0.25f* u[i-1][j]+
                            > 0.25f* u[i][j+1]+
                            > 0.25f* u[i][j-1]);
                            > }
                            >
                            > void gs_step_fusion( )
                            > {
                            > int i,j;
                            >
                            > /* update the red points:
                            > */
                            >
                            > for(j=1; j<SIZE-1; j=j+2)
                            > {
                            > gs_relax(1,j);
                            > }
                            > for(i=2; i<SIZE-1; i++)
                            > {
                            > for(j=1+(i+1)%2 ; j<SIZE-1; j=j+2)
                            > {
                            > gs_relax(i,j);
                            > gs_relax(i-1,j);
                            > }
                            >
                            > }
                            > for(j=1; j<SIZE-1; j=j+2)
                            > {
                            > gs_relax(SIZE-2,j);
                            > }
                            >
                            > }
                            > ---------------CODE--------------

                            You may be much better asking this question on a gcc specific group
                            such as gnu.gcc.help. It may be an eccentricity of a specific GCC
                            version.

                            Also, do not test this function using all zeros in the arrays.
                            Floating point units often treat zero specially.
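
A minimal sketch of one way to follow that advice is shown below. The fill pattern, the value of `SIZE` and the helper names are arbitrary choices made for this example; the thread does not say how the original arrays were initialized.

```c
#define SIZE 128 /* hypothetical grid size for this sketch */

float u[SIZE][SIZE];
float rhs[SIZE][SIZE];

/* Fill both grids with a deterministic pattern in which no element is
 * zero, so the benchmark does not run on all-zero data. */
void init_grids(void)
{
    int i, j;

    for (i = 0; i < SIZE; i++) {
        for (j = 0; j < SIZE; j++) {
            u[i][j]   = 1.0f + 0.125f  * (float)((i * 31 + j * 17) % 7);
            rhs[i][j] = 0.5f + 0.0625f * (float)((i * 13 + j * 29) % 5);
        }
    }
}

/* Returns how many elements of the two grids are exactly zero. */
int count_zeros(void)
{
    int i, j, zeros = 0;

    for (i = 0; i < SIZE; i++)
        for (j = 0; j < SIZE; j++)
            zeros += (u[i][j] == 0.0f) + (rhs[i][j] == 0.0f);
    return zeros;
}
```

Calling `init_grids()` once before the timed loop, in both the float and the double build, keeps the two runs on comparable data; `count_zeros()` is just a sanity check that the pattern really is non-zero everywhere.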
