How to make it faster?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • istillshine@gmail.com

    How to make it faster?

    When I used gprof to see which function consumed most running time, I
    identified the following one. sz was less than 5000 on average, but
    foo had been called about 1,000,000 times. I have tried using
    "register sum = 0.0" and saw some improvement. My question is how to
    improve foo further to make it faster.

    double foo(double *a, double *b, int sz)
    {

    double sum = 0.0;
    int i;

    for (i = 0; i < sz; i++)
    sum += a[i]*b[i];

    return sum;
    }
  • istillshine@gmail.com

    #2
    Re: How to make it faster?



    istillsh...@gma il.com wrote:
    When I used gprof to see which function consumed most running time, I
    identified the following one. sz was less than 5000 on average, but
    foo had been called about 1,000,000 times. I have tried using
    "register sum = 0.0" and saw some improvement. My question is how to
    improve foo further to make it faster.
    >
    double foo(double *a, double *b, int sz)
    {
    >
    double sum = 0.0;
    int i;
    >
    for (i = 0; i < sz; i++)
    sum += a[i]*b[i];
    >
    return sum;
    }
    Twice faster would help a lot.

    Comment

    • Malcolm McLean

      #3
      Re: How to make it faster?


      <istillshine@gm ail.comwrote in message
      double foo(double *a, double *b, int sz)
      {
      >
      double sum = 0.0;
      int i;
      >
      for (i = 0; i < sz; i++)
      sum += a[i]*b[i];
      >
      return sum;
      }
      Make a and b float arrays, and sum a float too. On some architectures that
      will give about a 2 * speedup. Of course you'll have to use single precision
      throughout the program.
      The code is so simple that there is unlikely to be any sort of other
      micro-optimisiation you can do. However it may be possible to call foo()
      less often by rearranging the algorithm.

      --
      Free games and programming goodies.


      Comment

      • Bartc

        #4
        Re: How to make it faster?


        <istillshine@gm ail.comwrote in message
        news:6db166af-feab-40ee-8ef9-c75d70128b24@k1 0g2000prm.googl egroups.com...
        >
        >
        istillsh...@gma il.com wrote:
        >When I used gprof to see which function consumed most running time, I
        >identified the following one. sz was less than 5000 on average, but
        >foo had been called about 1,000,000 times. I have tried using
        >"register sum = 0.0" and saw some improvement. My question is how to
        >improve foo further to make it faster.
        >>
        >double foo(double *a, double *b, int sz)
        >{
        >>
        > double sum = 0.0;
        > int i;
        >>
        > for (i = 0; i < sz; i++)
        > sum += a[i]*b[i];
        >>
        > return sum;
        >}
        >
        Twice faster would help a lot.
        Suggestions have already been made for improving the loop.

        Reducing the calls from 1000000 will help a lot more, for that the overall
        strategy needs to be looked at.

        Then there's the data: does sz need to be that high? Is there anything
        special about the a and b data (like mostly zeros or ones)?

        How accurate does the sum have to be? If a,b change slowly, perhaps sum only
        every other pair (then double the result)?

        Have you thought about using integers (fixed point)? This might need a
        global change.

        --
        Bart


        Comment

        • Eric Sosman

          #5
          Re: How to make it faster?

          istillshine@gma il.com wrote:
          When I used gprof to see which function consumed most running time, I
          identified the following one. sz was less than 5000 on average, but
          foo had been called about 1,000,000 times. I have tried using
          "register sum = 0.0" and saw some improvement. My question is how to
          improve foo further to make it faster.
          >
          double foo(double *a, double *b, int sz)
          {
          >
          double sum = 0.0;
          int i;
          >
          for (i = 0; i < sz; i++)
          sum += a[i]*b[i];
          >
          return sum;
          }
          With very old compilers you might get some improvement
          by "unrolling" the loop, but most compilers have been able
          to do that on their own for the past couple decades. As
          Willem suggests, examine the generated code to see whether
          the loop is unrolled already -- if it isn't, do so manually,
          but it probably will have been.

          Similarly, you could try rearrangements like

          double foo(double *a, double *b, int sz) {
          double sum = 0.0;
          while (--sz >= 0)
          sum += *a++ * *b++;
          return sum;
          }

          .... but compilers have been doing *that* for even longer
          than they've been unrolling loops. Adding `register' to
          the declarations of a,b,sz and possibly sum might help,
          but again: today's compilers are unlikely to overlook that
          sort of optimization.

          The big payoff -- or potential payoff, because it might
          not be feasible -- would come not from making foo() faster,
          but from calling it less often. You mention a million
          calls to compute dot-products of 5000-place vectors; are
          all those ten billion numbers really independent? If one
          pair is very similar to the next except perhaps for a small
          number of elements, maybe you could just make adjustments
          to the original dot-product instead of recomputing the whole
          thing from scratch:

          double dot = foo(a, b, sz);
          /* Make a change in the k'th place: */
          dot -= a[k] * b[k];
          a[k] = newa;
          b[k] = newb;
          dot += a[k] * b[k];

          If you do this a million times you'll need to worry about
          the propagation of small errors; consider using something
          like Kahan's summation formula (GIYF) for the adjustments.
          (You might also use Kahan's formula for the original sum;
          it's slower, but that might not be an issue if you can get
          a substantial reduction in foo() calls.)

          Or maybe there are many changes to a[] and b[] between
          foo() calls, but they tend to be clustered. In that case
          you might decompose both arrays into 100 or so segments,
          calculate the sum of each, make the changes, and then
          recalculate only the segments that changed. Taken to the
          extreme, this suggests turning the paired vectors into
          trees, with the a[i],b[i] at the leaves, the pairwise sums
          one level up, the four-way sums a level higher still, and
          so on until the grand total is at the root: Changing the
          k'th pair of numbers would then need only about thirteen
          additions for a 5000-element pair. (You could do the
          same thing with higher-degree trees, too.)

          Whatever the situation, I think it unlikely you'll get
          a twofold speedup by tweaking foo() itself; stepping back
          and looking at the wider problem may have more promise.

          --
          Eric.Sosman@sun .com

          Comment

          • Fei Liu

            #6
            Re: How to make it faster?

            istillshine@gma il.com wrote:
            When I used gprof to see which function consumed most running time, I
            identified the following one. sz was less than 5000 on average, but
            foo had been called about 1,000,000 times. I have tried using
            "register sum = 0.0" and saw some improvement. My question is how to
            improve foo further to make it faster.
            >
            double foo(double *a, double *b, int sz)
            {
            >
            double sum = 0.0;
            int i;
            >
            for (i = 0; i < sz; i++)
            sum += a[i]*b[i];
            >
            return sum;
            }
            Nobody has mentioned "loop unrolling"? Intel compilers also support
            embedded prefetch directives if your target platform is Intel based.

            Fei

            Comment

            • Walter Roberson

              #7
              Re: How to make it faster?

              In article <47FA92B5.60207 @gmail.com>, Fei Liu <fei.liu@gmail. comwrote:
              >Nobody has mentioned "loop unrolling"?
              Eric, Willem, and I all mentioned loop unrolling, with Eric and I
              mentioning it directly under that name and Willem mentioning the
              technique without giving it that name.
              --
              "Eightly percent of the people in the world are fools and the
              rest of us are in danger of contamination." -- Walter Matthau

              Comment

              • istillshine@gmail.com

                #8
                Re: How to make it faster?

                On Apr 7, 1:39 pm, istillsh...@gma il.com wrote:
                When I used gprof to see which function consumed most running time, I
                identified the following one. sz was less than 5000 on average, but
                foo had been called about 1,000,000 times. I have tried using
                "register sum = 0.0" and saw some improvement. My question is how to
                improve foo further to make it faster.
                >
                double foo(double *a, double *b, int sz)
                {
                >
                double sum = 0.0;
                int i;
                >
                for (i = 0; i < sz; i++)
                sum += a[i]*b[i];
                >
                return sum;
                >
                }
                Suppose in my application I know that a[i]*b[i] can only take a
                limited set of values. For examples, a[i] can only be 0, 0.25, 0.5,
                or 0.75; b[i] can only be 0, 0.2, 0.4, 0.6, or 0.8.
                I thought I could pre-compute their products and fetch them given
                certain a[i] and b[i]. For example,

                if (a[i] == 0.25 && b[i] == 0.2)
                sum += 0.05;
                else if (a[i] == 0.25 && b[i] == 0.4)
                sum += 0.1;
                ....

                But this solution made running time longer. I speculated that
                conditional statements were more expensive than multiplication.

                Do you have any better solution?

                Comment

                • istillshine@gmail.com

                  #9
                  Re: How to make it faster?

                  On Apr 7, 1:39 pm, istillsh...@gma il.com wrote:
                  When I used gprof to see which function consumed most running time, I
                  identified the following one. sz was less than 5000 on average, but
                  foo had been called about 1,000,000 times. I have tried using
                  "register sum = 0.0" and saw some improvement. My question is how to
                  improve foo further to make it faster.
                  >
                  double foo(double *a, double *b, int sz)
                  {
                  >
                  double sum = 0.0;
                  int i;
                  >
                  for (i = 0; i < sz; i++)
                  sum += a[i]*b[i];
                  >
                  return sum;
                  >
                  }
                  Suppose in my application I know that a[i]*b[i] can only take a
                  limited set of values. For example, a[i] can only be 0, 0.25, 0.5, or
                  0.75; b[i] can only be 0, 0.2, 0.4, 0.6, or 0.8.
                  I thought I could pre-compute their products and fetch them given
                  certain a[i] and b[i]. For example,

                  if (a[i] == 0.25 && b[i] == 0.2)
                  sum += 0.05;
                  else if (a[i] == 0.25 && b[i] == 0.4)
                  sum += 0.1;
                  ....

                  But the above solution made the running time considerably longer. I
                  speculated that conditional statements were more expensive than
                  multiplication.

                  Are better solutions possible?

                  Comment

                  • Richard Tobin

                    #10
                    Re: How to make it faster?

                    In article <8996f960-c348-43d7-9893-c7284975aaf4@k1 g2000prb.google groups.com>,
                    <istillshine@gm ail.comwrote:
                    >Suppose in my application I know that a[i]*b[i] can only take a
                    >limited set of values. For example, a[i] can only be 0, 0.25, 0.5, or
                    >0.75; b[i] can only be 0, 0.2, 0.4, 0.6, or 0.8.
                    >I thought I could pre-compute their products and fetch them given
                    >certain a[i] and b[i]. For example,
                    >
                    >if (a[i] == 0.25 && b[i] == 0.2)
                    sum += 0.05;
                    >else if (a[i] == 0.25 && b[i] == 0.4)
                    sum += 0.1;
                    >...
                    >
                    >But the above solution made the running time considerably longer. I
                    >speculated that conditional statements were more expensive than
                    >multiplication .
                    Yes, that's quite likely. I would expect a modern compiler to generate
                    good code for a simple loop of multiplications .

                    If the values are from some small set, perhaps it would be better not
                    to store them as floating point numbers. Can you use integers instead
                    by multiplying by a constant?

                    Or you could store the values as small integers even if they are not
                    simple multiples, and have a 2-dimensional array in which you look up
                    the answers.

                    Or perhaps you could split your problem into multiple threads.

                    Of course, if your sequence of multiplications is really so huge that
                    it is important to speed it up, you should also reconsider your
                    algorithm.

                    -- Richard
                    --
                    :wq

                    Comment

                    • Richard Tobin

                      #11
                      Re: How to make it faster?

                      In article <8996f960-c348-43d7-9893-c7284975aaf4@k1 g2000prb.google groups.com>,
                      <istillshine@gm ail.comwrote:
                      >I have tried using "register sum = 0.0" and saw some improvement.
                      Presumably you mean "register double sum = 0.0".

                      I'm surprised that register makes a difference. Are you telling
                      the compiler to optimise?

                      -- Richard
                      --
                      :wq

                      Comment

                      • istillshine@gmail.com

                        #12
                        Re: How to make it faster?

                        On Apr 7, 7:02 pm, rich...@cogsci. ed.ac.uk (Richard Tobin) wrote:
                        In article <8996f960-c348-43d7-9893-c7284975a...@k1 g2000prb.google groups.com>,
                        >
                        <istillsh...@gm ail.comwrote:
                        I have tried using "register sum = 0.0" and saw some improvement.
                        >
                        Presumably you mean "register double sum = 0.0".
                        Yes.
                        >
                        I'm surprised that register makes a difference. Are you telling
                        the compiler to optimise?
                        >
                        -- Richard
                        I did not set -O2 flag.


                        Comment

                        • Keith Thompson

                          #13
                          Re: How to make it faster?

                          istillshine@gma il.com writes:
                          On Apr 7, 7:02 pm, rich...@cogsci. ed.ac.uk (Richard Tobin) wrote:
                          >In article <8996f960-c348-43d7-9893-c7284975a...@k1 g2000prb.google groups.com>,
                          >>
                          > <istillsh...@gm ail.comwrote:
                          >I have tried using "register sum = 0.0" and saw some improvement.
                          >>
                          >Presumably you mean "register double sum = 0.0".
                          >
                          Yes.
                          >
                          >>
                          >I'm surprised that register makes a difference. Are you telling
                          >the compiler to optimise?
                          >
                          I did not set -O2 flag.
                          Or any other optimization flags?

                          You should use the maximum optimization level offered by your compiler
                          and measure the resulting performance before you even *think* about
                          optimizing the source code.

                          --
                          Keith Thompson (The_Other_Keit h) <kst-u@mib.org>
                          Nokia
                          "We must do something. This is something. Therefore, we must do this."
                          -- Antony Jay and Jonathan Lynn, "Yes Minister"

                          Comment

                          • user923005

                            #14
                            Re: How to make it faster?

                            On Apr 7, 9:30 pm, Keith Thompson <ks...@mib.orgw rote:
                            istillsh...@gma il.com writes:
                            On Apr 7, 7:02 pm, rich...@cogsci. ed.ac.uk (Richard Tobin) wrote:
                            In article <8996f960-c348-43d7-9893-c7284975a...@k1 g2000prb.google groups.com>,
                            >
                             <istillsh...@gm ail.comwrote:
                            I have tried using "register sum = 0.0" and saw some improvement.
                            >
                            Presumably you mean "register double sum = 0.0".
                            >
                            Yes.
                            >
                            I'm surprised that register makes a difference.  Are you telling
                            the compiler to optimise?
                            >
                            I did not set -O2 flag.
                            >
                            Or any other optimization flags?
                            >
                            You should use the maximum optimization level offered by your compiler
                            and measure the resulting performance before you even *think* about
                            optimizing the source code.
                            And you should run a profile after you decided that there is definite
                            necessity to speed things up.

                            If we do not know where the program is slow, we might speed up a
                            routine by a factor of 1000 and yet the speedup for the entire
                            application might not even be discernable with careful measurement.

                            Comment

                            • Richard Heathfield

                              #15
                              Re: How to make it faster?

                              user923005 said:

                              <snip>
                              And you should run a profile after you decided that there is definite
                              necessity to speed things up.
                              OP says he is using gprof, so I think he knows about this bit.

                              --
                              Richard Heathfield <http://www.cpax.org.uk >
                              Email: -http://www. +rjh@
                              Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
                              "Usenet is a strange place" - dmr 29 July 1999

                              Comment

                              Working...