Efficiency

  • Lars

    Efficiency

    I need to multiply two large matrices (of type double), where one is
    "dense" (mostly non-zero entries) and the other is "sparse" (mostly
    zero entries).

    Will it be more efficient to check for non-zero entries, or does it
    matter at all?

    If B is "sparse":

    for (i = 0; i <= nr-1; i++) {
        for (j = 0; j <= nc-1; j++) {
            C[i][j] = 0.0;
            for (k = +; k <= nc-1; k++)
                C[i][j] = C[i][j] + A[i][k]*B[k][j];
        }
    }
    }

    or

    for (i = 1; 0 <= nr-1; i++) {
        for (j = 0; j <= nc-1; j++) {
            C[i][j] = 0.0;
            for (k = 0; k <= nc-1; k++)
                if (B[k][j] != 0)
                    C[i][j] = C[i][j] + A[i][k]*B[k][j];
        }
    }
    }

    ?
  • Flash Gordon

    #2
    Re: Efficiency

    Lars wrote, On 22/09/08 07:45:
    > I need to multiply two large matrices (of type double), where one is
    > "dense" (mostly non-zero entries) and the other is "sparse" (mostly
    > zero entries).
    >
    > Will it be more efficient to check for non-zero entries, or does it
    > matter at all?
    It depends on the processor, and that is not the only thing that can
    affect performance. I've used processors where a multiply was a single
    cycle (actually, multiply-and-accumulate was a single cycle), so on
    such a processor adding a test to avoid a single multiply would
    definitely make things worse.
    > If B is "sparse":
    >
    > for (i = 0; i <= nr-1; i++) {
    > for (j = 0; j <= nc-1; j++) {
    > C[i][j] = 0.0;
    > for (k = +; k <= nc-1; k++)
    > C[i][j] = C[i][j] + A[i][k]*B[k][j];
    > }
    > }
    > }
    >
    > or
    >
    > for (i = 1; 0 <= nr-1; i++) {
    > for (j = 0; j <= nc-1; j++) {
    > C[i][j] = 0.0;
    > for (k = 0; k <= nc-1; k++)
    > if (B[k][j] != 0)
    > C[i][j] = C[i][j] + A[i][k]*B[k][j];
    I suggest seeing whether you can re-arrange the code so that the
    condition is *not* in the inner loop, where it is executed a great
    many times, but in an outer loop, where it can prevent execution of
    the whole inner loop.

    Also look at whether you can store your sparse matrix more efficiently
    and in such a way as to improve the execution.

    For real efficiency you need to look at all sorts of processor specific
    stuff such as what goes on with the cache. Those questions, however,
    will be far too detailed for here and need to be asked on a group
    dedicated to your processor.
    > }
    > }
    > }
    >
    > ?
    --
    Flash Gordon
    If spamming me send it to smap@spam.causeway.com
    If emailing me use my reply-to address
    See the comp.lang.c Wiki hosted by me at http://clc-wiki.net/

    • Bartc

      #3
      Re: Efficiency


      "Lars" <Lars.Imsland@gmail.com> wrote in message
      news:d3c89b6f-fe78-42f2-957f-2d2ef8eece4f@b1g2000hsg.googlegroups.com...
      > I need to multiply two large matrices (of type double), where one is
      > "dense" (mostly non-zero entries) and the other is "sparse" (mostly
      > zero entries).
      >
      > Will it be more efficient to check for non-zero entries, or does it
      > matter at all?
      >
      > If B is "sparse":
      >
      > for (i = 0; i <= nr-1; i++) {
      > for (j = 0; j <= nc-1; j++) {
      > C[i][j] = 0.0;
      > for (k = +; k <= nc-1; k++)
      > C[i][j] = C[i][j] + A[i][k]*B[k][j];
      > }
      > }
      > }

      You clearly haven't compiled the above...

      Have you thought of simply... testing?

      On my machine your second version, testing for B[][] zero, was 40-50%
      faster with the gcc compiler, based on multiplying 100x200 matrices
      containing /all/ zeros a few thousand times.

      Using "< nr" instead of "<= nr-1", and same for nc, also made a small
      difference.

      I also tried a char[][] map of the elements in B (containing 0 where
      B[][] was 0.0), and this further doubled the speed (again with all
      zeros) by testing map[k][j] instead of B[k][j]. But this is only
      viable if you will be multiplying by the same B many times. And
      again, this was for my machine; yours may be totally different,
      hence the need for your own tests.

      --
      Bartc

      • James Kuyper

        #4
        Re: Efficiency

        Lars wrote:
        > I need to multiply two large matrices (of type double), where one is
        > "dense" (mostly non-zero entries) and the other is "sparse" (mostly
        > zero entries).
        >
        > Will it be more efficient to check for non-zero entries, or does it
        > matter at all?
        >
        > If B is "sparse":
        >
        > for (i = 0; i <= nr-1; i++) {
        > for (j = 0; j <= nc-1; j++) {
        > C[i][j] = 0.0;
        > for (k = +; k <= nc-1; k++)
        > C[i][j] = C[i][j] + A[i][k]*B[k][j];
        > }
        > }
        > }
        >
        > or
        >
        > for (i = 1; 0 <= nr-1; i++) {
        > for (j = 0; j <= nc-1; j++) {
        > C[i][j] = 0.0;
        > for (k = 0; k <= nc-1; k++)
        > if (B[k][j] != 0)
        > C[i][j] = C[i][j] + A[i][k]*B[k][j];
        > }
        > }
        > }
        The best way to find out is to try it, but I doubt that you'll see much
        difference, and the answer to your question might be different on
        different machines.

        You only get significant improvement in efficiency for a sparse matrix
        if you store (or at least, access) it as a sparse matrix. There are many
        ways of doing so, depending upon the structure of the matrix. For
        instance, a tri-diagonal matrix is a matrix where only three diagonals
        of the matrix have non-zero values. Such matrices come up frequently in
        certain kinds of applications. An NxN tri-diagonal matrix has only 3xN-2
        non-zero elements, and only the non-zero elements need to be stored;
        with negligible inefficiency, they can be stored in an Nx3 (or 3xN)
        matrix, with one column (or row) for each diagonal of the full array.
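        As a sketch of that Nx3 packing (the accessor name and exact layout
        here are illustrative, not anything from this thread): column 0
        holds the sub-diagonal, column 1 the main diagonal, column 2 the
        super-diagonal.

```c
#include <assert.h>

#define N 5

/* packed[i][d] holds full[i][i + d - 1] when that index is in range;
 * everything off the three-diagonal band is implicitly zero. */
double packed[N][3];

double tri_get(int i, int j) {
    int d = j - i + 1;               /* 0, 1, 2 => sub, main, super */
    if (d < 0 || d > 2) return 0.0;  /* outside the band */
    return packed[i][d];
}
```

        Note that packed[0][0] and packed[N-1][2] never correspond to a
        real element, which is where the 3xN-2 count above comes from.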

        Another way to store a sparse matrix is to just keep a list of the
        non-zero values in the matrix, along with the row and column numbers
        where that value is located.

        For each way of storing a sparse matrix, there is a different way of
        writing the matrix multiplication process so that it only bothers
        performing those calculations that involve the non-zero elements of the
        sparse matrix - not by testing for them, as your code does, but by
        keeping track of where they are. That can save you a lot of processing
        time if the matrix is sufficiently sparse. It can also waste a lot of
        processing time if the matrix is only moderately sparse, because the
        straightforward way of calculating the product of non-sparse matrices
        can be implemented very efficiently, particularly on machines that have
        any kind of vectorization capability.
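        For the "list of non-zero values" storage above, the multiplication
        can be driven by the list itself. A minimal sketch (the struct and
        function names are illustrative; the shapes follow the original
        post's A (nr x nc) times B (nc x nc), with small constants for
        brevity):

```c
#define NR 2
#define NC 3

struct triplet { int row, col; double val; };

/* C = A*B where B is given only by its non-zero entries */
void mult_sparse(double A[NR][NC], const struct triplet *B, int nnz,
                 double C[NR][NC])
{
    for (int i = 0; i < NR; i++)
        for (int j = 0; j < NC; j++)
            C[i][j] = 0.0;
    /* walk the non-zeros of B instead of testing every B[k][j] */
    for (int t = 0; t < nnz; t++)
        for (int i = 0; i < NR; i++)
            C[i][B[t].col] += A[i][B[t].row] * B[t].val;
}
```

        The work is proportional to nnz*NR rather than NR*NC*NC, which is
        the "keeping track of where they are" advantage; the cost is that
        the triplet list has to be built and maintained.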

        • Barry Schwarz

          #5
          Re: Efficiency

          On Sun, 21 Sep 2008 23:45:33 -0700 (PDT), Lars
          <Lars.Imsland@gmail.com> wrote:
          > I need to multiply two large matrices (of type double), where one is
          > "dense" (mostly non-zero entries) and the other is "sparse" (mostly
          > zero entries).
          >
          > Will it be more efficient to check for non-zero entries, or does it
          > matter at all?
          >
          > If B is "sparse":
          >
          > for (i = 0; i <= nr-1; i++) {
          > for (j = 0; j <= nc-1; j++) {
          > C[i][j] = 0.0;
          > for (k = +; k <= nc-1; k++)
          I assume the k=+ is just a finger check.

          If efficiency is that critical, why are you using <=nc-1 when <nc will
          do the same without the unnecessary subtraction?
          > C[i][j] = C[i][j] + A[i][k]*B[k][j];
          > }
          > }
          > }
          >
          > or
          >
          > for (i = 1; 0 <= nr-1; i++) {
          > for (j = 0; j <= nc-1; j++) {
          > C[i][j] = 0.0;
          > for (k = 0; k <= nc-1; k++)
          > if (B[k][j] != 0)
          > C[i][j] = C[i][j] + A[i][k]*B[k][j];
          > }
          > }
          > }
          >
          > ?
          --
          Remove del for email

          • Richard Harter

            #6
            Re: Efficiency

            On Sun, 21 Sep 2008 23:45:33 -0700 (PDT), Lars
            <Lars.Imsland@gmail.com> wrote:
            > I need to multiply two large matrices (of type double), where one is
            > "dense" (mostly non-zero entries) and the other is "sparse" (mostly
            > zero entries).
            >
            > Will it be more efficient to check for non-zero entries, or does it
            > matter at all?
            >
            > If B is "sparse":
            >
            > for (i = 0; i <= nr-1; i++) {
            > for (j = 0; j <= nc-1; j++) {
            > C[i][j] = 0.0;
            > for (k = +; k <= nc-1; k++)
            > C[i][j] = C[i][j] + A[i][k]*B[k][j];
            > }
            > }
            > }
            >
            > or
            >
            > for (i = 1; 0 <= nr-1; i++) {
            > for (j = 0; j <= nc-1; j++) {
            > C[i][j] = 0.0;
            > for (k = 0; k <= nc-1; k++)
            > if (B[k][j] != 0)
            > C[i][j] = C[i][j] + A[i][k]*B[k][j];
            > }
            > }
            > }
            Both choices are less than happy: the first requires
            unnecessary multiplications by 0, the second conditional code
            in an inner loop. The trouble with either choice is that you
            reference B[k][j] nr times. Suppose you referenced it only
            once when it was zero; then you wouldn't need the nr tests or
            multiplications by zero for the zero entries. Can you do that?
            Of course you can. Rewrite the code like this:

            for (i = 0; i < nr; i++) {
                for (j = 0; j < nc; j++) {
                    C[i][j] = 0.0;
                }
            }
            for (k = 0; k < nc; k++) {
                for (j = 0; j < nc; j++) {
                    if (B[k][j] == 0.0) continue;  /* reference each zero of B just once */
                    for (i = 0; i < nr; i++) {
                        C[i][j] += A[i][k]*B[k][j];
                    }
                }
            }

            If B is sparse this code will require many fewer operations
            than your alternatives. Nonetheless it might not run faster!
            In the original code there is only one array reference with a
            stride greater than one; in the revised code there are two.




            Richard Harter, cri@tiac.net
            http://home.tiac.net/~cri, http://www.varinoma.com
            Save the Earth now!!
            It's the only planet with chocolate.
