ultra-fast loop unrolling with g++ -O3

  • mark

    Why does the following excerpt of trivial code execute so quickly?

    #include <stdio.h>
    #include <stdlib.h>
    int main(int argc, char *argv[]){
    static const int SIZE = 1000000;
    long nops = 0;
    int i, j;
    long int outer = atol(argv[1]);
    for(i=0; i < outer; i++){
    for(j=0; j < SIZE; j++){
    ++nops;
    // arr[j] = arr[j] + 1;
    } //for j
    } //for i
    printf("ran %ld ops\n", nops);
    } //main

    I compiled this with g++ -O3.
    When run with 50000000000 as an argument, the nops variable is updated
    50000000000000000 times. Including loop logic this should take
    forever on my 2 GHz computer. Yet it runs instantly. I used input from
    the command line so that nops simply isn't pre-calculated.

    This came about when trying to speed-test C arrays with C++ vectors;
    originally the code had an array-update line in the center of the
    loops. The vector version was crawling versus the C array (both
    compiled with -O3).

    What compile/hardware magic is going on, and is it possible to speed
    up the vector with it?

  • Johannes Bauer

    #2
    Re: ultra-fast loop unrolling with g++ -O3

    mark wrote:
    > What compile/hardware magic is going on, and is it possible to speed
    > up the vector with it?
    1. Your loop is optimized away.
    2. Probably not, unless you use vectors that don't do anything useful.
    Then yes.

    Regards,
    Johannes

    --
    "Someone who criticizes something does not necessarily have to be able
    to do it better himself. It is enough to know that others can do it
    better, and do, in order to draw a comparison." - Wolfgang Gerber
    in de.sci.electronics <47fa8447$0$11545$9b622d9e@news.freenet.de>


    • Richard Heathfield

      #3
      Re: ultra-fast loop unrolling with g++ -O3

      mark said:
      > Why does the following excerpt of trivial code execute so quickly?
      >
      > #include <stdio.h>
      > #include <stdlib.h>
      > int main(int argc, char *argv[]){
      > static const int SIZE = 1000000;
      > long nops = 0;
      > int i, j;
      > long int outer = atol(argv[1]);
      > for(i=0; i < outer; i++){
      > for(j=0; j < SIZE; j++){
      > ++nops;
      > // arr[j] = arr[j] + 1;
      > } //for j
      > } //for i
      > printf("ran %ld ops\n", nops);
      > } //main
      >
      > I compiled this with g++ -O3.
      > When run with 50000000000 as an argument, the nops variable is updated
      > 50000000000000000 times.
      Check the result of your atol call, using

      printf("%ld\n", outer);

      On my system, for your given input the output of that printf is: 2147483647

      That's still a pretty big number, but 50000000000 it ain't.
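As a rough illustration of why the result is worth checking: atol has no way to report an argument that does not fit in a long, whereas strtol signals it through errno. The sketch below is one way to validate the loop count before using it. The name parse_count is hypothetical and not part of the original program, and the caller is assumed to have checked argc before passing argv[1].

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original post): parse a non-negative
   decimal loop count. Returns 0 and stores the value in *out on success,
   or returns -1 on any failure, leaving *out untouched. */
int parse_count(const char *text, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);

    if (end == text || *end != '\0')  /* no digits at all, or trailing junk */
        return -1;
    if (errno == ERANGE)              /* value does not fit in a long */
        return -1;
    if (value < 0)                    /* a negative count makes no sense here */
        return -1;

    *out = value;
    return 0;
}
```

With this in place, an argument such as 50000000000 would be rejected on a system where long is 32 bits instead of being silently replaced by some other value.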
      > I used input from
      > the command line so that nops simply isn't pre-calculated.
      If your program is correct according to the language rules, the compiler is
      allowed to produce any object code at all, provided that it has the same
      effect on output. (If your program is *not* correct, the compiler doesn't
      even have /that/ restriction.) So, for example, the compiler is allowed to
      reason thusly:

      "Okay, so we start nops at 0. It's in a loop nest that is executed outer *
      SIZE times, and it's incremented once per iteration, and that's the only
      thing the loop nest does, so we can replace the whole loop nest with:

      nops = outer * SIZE;
      i = outer;
      j = SIZE;
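The replacement described above can be sketched as two functions that return the same count. This is only an illustration of the reasoning, not a claim about what any particular compiler emits. The function names are hypothetical, and the closed form assumes the product fits in a long.

```c
/* Hypothetical names: the loop nest as written in the original program,
   with the bounds passed in as parameters. */
long count_with_loops(long outer, long size)
{
    long nops = 0;
    long i, j;

    for (i = 0; i < outer; i++)
        for (j = 0; j < size; j++)
            ++nops;
    return nops;
}

/* The same result without looping. */
long count_closed_form(long outer, long size)
{
    /* The loop body never runs when either bound is not positive. */
    if (outer <= 0 || size <= 0)
        return 0;
    return outer * size;  /* assumes the product fits in a long */
}
```

Because nothing else observable happens inside the loops, a compiler is free to turn the first form into something like the second.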
      > What compile/hardware magic is going on, and is it possible to speed
      > up the vector with it?
      You might want to ask the vector thing in comp.lang.c++, where they know about
      things like that.

      --
      Richard Heathfield <http://www.cpax.org.uk>
      Email: -http://www. +rjh@
      Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
      "Usenet is a strange place" - dmr 29 July 1999


      • santosh

        #4
        Re: ultra-fast loop unrolling with g++ -O3

        mark wrote:
        > Why does the following excerpt of trivial code execute so quickly?
        >
        > #include <stdio.h>
        > #include <stdlib.h>
        > int main(int argc, char *argv[]){
        > static const int SIZE = 1000000;
        Better to use long to be safe.
        > long nops = 0;
        unsigned long will give you even more range.
        > int i, j;
        > long int outer = atol(argv[1]);
        > for(i=0; i < outer; i++){
        What if 'i' overflows? Make 'i' and 'j' unsigned long or long.
        > for(j=0; j < SIZE; j++){
        > ++nops;
        > // arr[j] = arr[j] + 1;
        > } //for j
        > } //for i
        > printf("ran %ld ops\n", nops);
        > } //main
        >
        > I compiled this with g++ -O3.
        > When run with 50000000000 as an argument, the nops variable is updated
        > 50000000000000000 times. Including loop logic this should take
        > forever on my 2 GHz computer. Yet it runs instantly. I used input from
        > the command line so that nops simply isn't pre-calculated.
        The compiler has optimised the loop away. It simply computes SIZE *
        outer and assigns the product to nops. The loop will be left untouched
        if you qualify nops with volatile.
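A minimal sketch of the volatile variant, for anyone who wants to time the loops as written. The function name count_volatile is hypothetical, and the loop indices are long rather than int, as suggested above. Every access to a volatile object has to be carried out, so the increments cannot be folded into one multiplication.

```c
/* Hypothetical name: counts outer * size increments one at a time. */
long count_volatile(long outer, long size)
{
    volatile long nops = 0;  /* each ++nops must really read and write nops */
    long i, j;

    for (i = 0; i < outer; i++)
        for (j = 0; j < size; j++)
            ++nops;
    return nops;
}
```

Expect this version to take time proportional to outer * size even at -O3, so start with small arguments.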
        > What compile/hardware magic is going on, and is it possible to speed
        > up the vector with it?
        You'll have to ask in comp.lang.c++.


        • CBFalconer

          #5
          Re: ultra-fast loop unrolling with g++ -O3

          mark wrote:
          >
          > Why does the following excerpt of trivial code execute so quickly?
          >
          > #include <stdio.h>
          > #include <stdlib.h>
          > int main(int argc, char *argv[]){
          > static const int SIZE = 1000000;
          > long nops = 0;
          > int i, j;
          > long int outer = atol(argv[1]);
          > for(i=0; i < outer; i++){
          > for(j=0; j < SIZE; j++){
          > ++nops;
          > // arr[j] = arr[j] + 1;
          > } //for j
          > } //for i
          > printf("ran %ld ops\n", nops);
          > } //main
          >
          > I compiled this with g++ -O3.
          The optimizer can see that nothing done within the loops is
          preserved, other than the counter. It can also easily see that the
          inner loop is performed (SIZE * outer) times, so all it has to do
          is multiply SIZE by outer and add that to nops. If nothing
          overflows this doesn't take very long. Try using gcc rather than
          g++ to avoid C++ problems, and use -O0 (that's the letter O followed
          by the digit 0) to avoid any optimization.

          BTW, don't use // comments in Usenet, regardless of your compiler's
          capabilities. They don't survive line wraps very well. Standard
          comments do much better.

          You should also return a value from an int function. Zero will do
          nicely.

          --
          [mail]: Chuck F (cbfalconer at maineline dot net)
          [page]: <http://cbfalconer.home.att.net>
          Try the download section.

          ** Posted from http://www.teranews.com **
