Cannot optimize 64bit Linux code

  • legrape@gmail.com

    Cannot optimize 64bit Linux code

    I am porting a piece of C code to 64bit on Linux. I am using 64bit
    integers. It is a floating point intensive code and when I compile
    (gcc) on 64 bit machine, I don't see any runtime improvement when
    optimizing -O3. If I construct a small program I can get significant
    (>4x) speed improvement using -O3 versus -g. If I compile on a 32 bit
    machine, it runs 5x faster on the 64 bit machine than does the 64bit
    compiled code.

    It seems like something is inhibiting the optimization. Someone on
    comp.lang.fortran suggested it might be an alignment problem. I am
    trying to go through and eliminate all 32 bit integers right now (this
    is a pretty large hunk of code). But I thought I would survey this
    group, in case it is something naive I am missing.

    Any opinion is welcomed. I really need this to run up to speed, and I
    need the big address space. Thanks in advance.

    Dick
  • Dick Dowell

    #2
    Re: Cannot optimize 64bit Linux code

    Thanks for all the hints and thoughts.

    My small program is:

    main()
    {
    struct timespec ts;
    double x,y;
    int i;
    long long n;
    n=15000000;
    n *= 10000;
    fprintf(stderr, "LONG %Ld\n",n);
    /*
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    */
    printf(" _POSIX_THREAD_CPUTIME _POSIX_CPUTIME %d %d\n",
    _POSIX_THREAD_CPUTIME
    ,_POSIX_CPUTIME );
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    n = ts.tv_nsec;
    fprintf(stderr, "Before %d sec %d nsec\n",ts.tv_sec,ts.tv_nsec);
    fprintf(stderr, "Before %d sec %Ld nsec\n",ts.tv_sec,ts.tv_nsec);
    y=3.3;
    for(i=0;i<111100000;i++) {
    x=sqrt(y);
    y += 1.0;
    }
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    fprintf(stderr, "After %d sec %d nsec\n",ts.tv_sec,ts.tv_nsec);
    fprintf(stderr, "After %d sec %Ld nsec\n",ts.tv_sec,ts.tv_nsec-n);
    }

    It shows considerable improvement with -O3.
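
A note on the timing in the program above: it subtracts only tv_nsec, which wraps every second, and it prints the timespec fields with %d although they are wider than int on 64bit Linux. A minimal sketch of helpers that avoid both problems, assuming POSIX clock_gettime is available. The names thread_cpu_now, elapsed_ns and print_timespec are hypothetical and not part of the posted code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: CPU time consumed so far by the calling thread. */
static struct timespec thread_cpu_now(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    assert(rc == 0);  /* fails only if this clock is unsupported */
    (void)rc;         /* avoid an unused-variable warning under NDEBUG */
    return ts;
}

/* Hypothetical helper: nanoseconds from start to end, using both fields
 * so the result stays correct when tv_sec changes between the readings. */
static long long elapsed_ns(struct timespec start, struct timespec end)
{
    long long sec = (long long)end.tv_sec - (long long)start.tv_sec;
    long long nsec = (long long)end.tv_nsec - (long long)start.tv_nsec;
    return sec * 1000000000LL + nsec;
}

/* Hypothetical helper: print a timespec with conversions that match the
 * argument types (tv_sec is cast to long long, tv_nsec is a long). */
static void print_timespec(const char *label, struct timespec ts)
{
    fprintf(stderr, "%s %lld sec %ld nsec\n",
            label, (long long)ts.tv_sec, ts.tv_nsec);
}
```

Typical use would be to take one reading with thread_cpu_now() before the loop and one after it, then pass both to elapsed_ns().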

    I think the problem is something less esoteric than the cache,
    wordsize, etc. One thing I didn't say, I have multi threading loaded,
    though no new threads are created by these runs. I have tried a newer
    redhat, have not tried Intel compilers.

    Dick


    • Dick Dowell

      #3
      Re: Cannot optimize 64bit Linux code

      I think I misspoke on my timer program. That one was used to attempt
      to measure thread time. You can remove the references to the timers
      and run it. It only shows about a 2x improvement on optimization.

      The large difference I have actually seen is 32bit compile on another
      machine, run on 64bit machine (12sec) versus 64bit code compiled on
      64bit machine (70sec).

      Sorry for the confusion.

      Dick


      • Walter Roberson

        #4
        Re: Cannot optimize 64bit Linux code

        In article <ee716b06-2f24-487b-a22c-2128a12605da@s19g2000prg.googlegroups.com>,
        Dick Dowell <dick.dowell@avagotech.com> wrote:
        >Thanks for all the hints and thoughts.
        >My small program is:
        >main()
        >{
        >struct timespec ts;
        >double x,y;
        >int i;
        >long long n;
        >n=15000000;
        >n *= 10000;
        >fprintf(stderr, "LONG %Ld\n",n);
        >/*
        >clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
        >*/
        >printf(" _POSIX_THREAD_CPUTIME _POSIX_CPUTIME %d %d\n",
        >_POSIX_THREAD_CPUTIME
        >,_POSIX_CPUTIME );
        >clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
        >n = ts.tv_nsec;
        >fprintf(stderr, "Before %d sec %d nsec\n",ts.tv_sec,ts.tv_nsec);
        >fprintf(stderr, "Before %d sec %Ld nsec\n",ts.tv_sec,ts.tv_nsec);
        >y=3.3;
        >for(i=0;i<111100000;i++) {
        >x=sqrt(y);
        >y += 1.0;
        >}
        >clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
        >fprintf(stderr, "After %d sec %d nsec\n",ts.tv_sec,ts.tv_nsec);
        >fprintf(stderr, "After %d sec %Ld nsec\n",ts.tv_sec,ts.tv_nsec-n);
        >}
        >It shows considerable improvement with -O3.
        You do not do anything with x after you compute it. Any good
        optimizer would optimize away the x=sqrt(y) statement. Once that
        is done, the optimizer could even eliminate the loop completely
        and replace it by y += 111100000. Compilers that did the one or
        both of these optimizations would result in much faster code than
        compilers that did not. Your problem might have nothing to do
        with 64 bit integers and everything to do with which optimizations
        the compiler performs.
        --
        "The human mind is so strangely capricious, that, when freed from
        the pressure of real misery, it becomes open and sensitive to the
        ideal apprehension of ideal calamities." -- Sir Walter Scott


        • Dick Dowell

          #5
          Re: Cannot optimize 64bit Linux code

          Thanks for all the suggestions. I've discovered the ineffectiveness
          of optimization is data dependent. I managed to profile the code and
          78% of the runtime is spent in something called

          __mul [1] (from gprof output, the [1] just means #1 cpu user)

          Here's another line from gprof report
          granularity: each sample hit covers 4 byte(s) for 0.01% of 109.71 seconds

          index % time    self  children    called     name
                                                           <spontaneous>
          [1]     78.0   85.55    0.00                 __mul [1]
          -----------------------------------------------

          Dick
