Efficiency

  • Lars

    Efficiency

    I need to multiply two large matrices (of type double), where one is
    "dense" (mostly non-zero entries) and the other is "sparse" (mostly
    zero entries).

    Will it be more efficient to check for non-zero entries, or does it
    matter at all?

    If B is "sparse":

    for (i = 0; i <= nr-1; i++) {
        for (j = 0; j <= nc-1; j++) {
            C[i][j] = 0.0;
            for (k = +; k <= nc-1; k++)
                C[i][j] = C[i][j] + A[i][k]*B[k][j];
        }
    }
    }

    or

    for (i = 1; 0 <= nr-1; i++) {
        for (j = 0; j <= nc-1; j++) {
            C[i][j] = 0.0;
            for (k = 0; k <= nc-1; k++)
                if (B[k][j] != 0)
                    C[i][j] = C[i][j] + A[i][k]*B[k][j];
        }
    }
    }

    ?
  • Flash Gordon

    #2
    Re: Efficiency

    Lars wrote, On 22/09/08 07:45:
    > I need to multiply two large matrices (of type double), where one is
    > "dense" (mostly non-zero entries) and the other is "sparse" (mostly
    > zero entries).
    >
    > Will it be more efficient to check for non-zero entries, or does it
    > matter at all?
    It depends on the processor, and that is not the only thing that can
    affect performance. I've used processors where a multiply was a single
    cycle (actually, multiply-and-accumulate was a single cycle), so on
    such a processor adding a test to avoid a single multiply would
    definitely make things worse.
    > If B is "sparse":
    >
    > for (i = 0; i <= nr-1; i++) {
    > for (j = 0; j <= nc-1; j++) {
    > C[i][j] = 0.0;
    > for (k = +; k <= nc-1; k++)
    > C[i][j] = C[i][j] + A[i][k]*B[k][j];
    > }
    > }
    > }
    >
    > or
    >
    > for (i = 1; 0 <= nr-1; i++) {
    > for (j = 0; j <= nc-1; j++) {
    > C[i][j] = 0.0;
    > for (k = 0; k <= nc-1; k++)
    > if (B[k][j] != 0)
    > C[i][j] = C[i][j] + A[i][k]*B[k][j];
    I suggest seeing whether you can re-arrange the code so that the
    condition is *not* in the inner loop, where it is executed a great
    many times, but in an outer loop, where it can prevent execution of
    the whole inner loop.

    Also look at whether you can store your sparse matrix more efficiently
    and in such a way as to improve the execution.

    For real efficiency you need to look at all sorts of processor specific
    stuff such as what goes on with the cache. Those questions, however,
    will be far too detailed for here and need to be asked on a group
    dedicated to your processor.
    > }
    > }
    > }
    >
    > ?
    --
    Flash Gordon
    If spamming me send it to smap@spam.causeway.com
    If emailing me use my reply-to address
    See the comp.lang.c Wiki hosted by me at http://clc-wiki.net/

    • Bartc

      #3
      Re: Efficiency


      "Lars" <Lars.Imsland@gmail.com> wrote in message
      news:d3c89b6f-fe78-42f2-957f-2d2ef8eece4f@b1g2000hsg.googlegroups.com...
      > I need to multiply two large matrices (of type double), where one is
      > "dense" (mostly non-zero entries) and the other is "sparse" (mostly
      > zero entries).
      >
      > Will it be more efficient to check for non-zero entries, or does it
      > matter at all?
      >
      > If B is "sparse":
      >
      > for (i = 0; i <= nr-1; i++) {
      > for (j = 0; j <= nc-1; j++) {
      > C[i][j] = 0.0;
      > for (k = +; k <= nc-1; k++)
      > C[i][j] = C[i][j] + A[i][k]*B[k][j];
      > }
      > }
      > }

      You clearly haven't compiled the above...

      Have you thought of simply... testing?

      On my machine your second version, testing for B[][] zero, was 40-50%
      faster with the gcc compiler, based on multiplying 100x200 matrices
      containing /all/ zeros a few thousand times.

      Using "< nr" instead of "<= nr-1", and same for nc, also made a small
      difference.

      I also tried a char[][] map of the elements in B (containing 0 where
      B[][] was 0.0), and this further doubled the speed (again with all
      zeros) by testing map[k][j] instead of B[k][j]. But this is only
      viable if you will be multiplying by the same B many times. And
      again, this was for my machine; yours may be totally different,
      hence the need for your own tests.

      --
      Bartc

      • James Kuyper

        #4
        Re: Efficiency

        Lars wrote:
        > I need to multiply two large matrices (of type double), where one is
        > "dense" (mostly non-zero entries) and the other is "sparse" (mostly
        > zero entries).
        >
        > Will it be more efficient to check for non-zero entries, or does it
        > matter at all?
        >
        > If B is "sparse":
        >
        > for (i = 0; i <= nr-1; i++) {
        > for (j = 0; j <= nc-1; j++) {
        > C[i][j] = 0.0;
        > for (k = +; k <= nc-1; k++)
        > C[i][j] = C[i][j] + A[i][k]*B[k][j];
        > }
        > }
        > }
        >
        > or
        >
        > for (i = 1; 0 <= nr-1; i++) {
        > for (j = 0; j <= nc-1; j++) {
        > C[i][j] = 0.0;
        > for (k = 0; k <= nc-1; k++)
        > if (B[k][j] != 0)
        > C[i][j] = C[i][j] + A[i][k]*B[k][j];
        > }
        > }
        > }
        The best way to find out is to try it, but I doubt that you'll see much
        difference, and the answer to your question might be different on
        different machines.

        You only get significant improvement in efficiency for a sparse matrix
        if you store (or at least, access) it as a sparse matrix. There are many
        ways of doing so, depending upon the structure of the matrix. For
        instance, a tri-diagonal matrix is a matrix where only three diagonals
        of the matrix have non-zero values. Such matrices come up frequently in
        certain kinds of applications. An NxN tri-diagonal matrix has only 3xN-2
        non-zero elements, and only the non-zero elements need to be stored;
        with negligible inefficiency, they can be stored in an Nx3 (or 3xN)
        matrix, with one column (or row) for each diagonal of the full array.
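        As a sketch of that Nx3 packing (the accessor name and exact layout
        here are illustrative, not anything from this thread): column 0
        holds the sub-diagonal, column 1 the main diagonal, column 2 the
        super-diagonal.

```c
#include <assert.h>

#define N 5

/* packed[i][d] holds full[i][i + d - 1] when that index is in range;
 * everything off the three-diagonal band is implicitly zero. */
double packed[N][3];

double tri_get(int i, int j) {
    int d = j - i + 1;               /* 0, 1, 2 => sub, main, super */
    if (d < 0 || d > 2) return 0.0;  /* outside the band */
    return packed[i][d];
}
```

        Note that packed[0][0] and packed[N-1][2] never correspond to a
        real element, which is where the 3xN-2 count above comes from.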

        Another way to store a sparse matrix is to just keep a list of the
        non-zero values in the matrix, along with the row and column numbers
        where that value is located.

        For each way of storing a sparse matrix, there is a different way of
        writing the matrix multiplication process so that it only bothers
        performing those calculations that involve the non-zero elements of the
        sparse matrix - not by testing for them, as your code does, but by
        keeping track of where they are. That can save you a lot of processing
        time if the matrix is sufficiently sparse. It can also waste a lot of
        processing time if the matrix is only moderately sparse, because the
        straightforward way of calculating the product of non-sparse matrices
        can be implemented very efficiently, particularly on machines that have
        any kind of vectorization capability.
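        For the "list of non-zero values" storage above, the multiplication
        can be driven by the list itself. A minimal sketch (the struct and
        function names are illustrative; the shapes follow the original
        post's A (nr x nc) times B (nc x nc), with small constants for
        brevity):

```c
#define NR 2
#define NC 3

struct triplet { int row, col; double val; };

/* C = A*B where B is given only by its non-zero entries */
void mult_sparse(double A[NR][NC], const struct triplet *B, int nnz,
                 double C[NR][NC])
{
    for (int i = 0; i < NR; i++)
        for (int j = 0; j < NC; j++)
            C[i][j] = 0.0;
    /* walk the non-zeros of B instead of testing every B[k][j] */
    for (int t = 0; t < nnz; t++)
        for (int i = 0; i < NR; i++)
            C[i][B[t].col] += A[i][B[t].row] * B[t].val;
}
```

        The work is proportional to nnz*NR rather than NR*NC*NC, which is
        the "keeping track of where they are" advantage; the cost is that
        the triplet list has to be built and maintained.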

        • Barry Schwarz

          #5
          Re: Efficiency

          On Sun, 21 Sep 2008 23:45:33 -0700 (PDT), Lars
          <Lars.Imsland@gmail.com> wrote:
          > I need to multiply two large matrices (of type double), where one is
          > "dense" (mostly non-zero entries) and the other is "sparse" (mostly
          > zero entries).
          >
          > Will it be more efficient to check for non-zero entries, or does it
          > matter at all?
          >
          > If B is "sparse":
          >
          > for (i = 0; i <= nr-1; i++) {
          > for (j = 0; j <= nc-1; j++) {
          > C[i][j] = 0.0;
          > for (k = +; k <= nc-1; k++)
          I assume the k=+ is just a finger check.

          If efficiency is that critical, why are you using <=nc-1 when <nc will
          do the same without the unnecessary subtraction?
          > C[i][j] = C[i][j] + A[i][k]*B[k][j];
          > }
          > }
          > }
          >
          > or
          >
          > for (i = 1; 0 <= nr-1; i++) {
          > for (j = 0; j <= nc-1; j++) {
          > C[i][j] = 0.0;
          > for (k = 0; k <= nc-1; k++)
          > if (B[k][j] != 0)
          > C[i][j] = C[i][j] + A[i][k]*B[k][j];
          > }
          > }
          > }
          >
          > ?
          --
          Remove del for email

          • Richard Harter

            #6
            Re: Efficiency

            On Sun, 21 Sep 2008 23:45:33 -0700 (PDT), Lars
            <Lars.Imsland@gmail.com> wrote:
            > I need to multiply two large matrices (of type double), where one is
            > "dense" (mostly non-zero entries) and the other is "sparse" (mostly
            > zero entries).
            >
            > Will it be more efficient to check for non-zero entries, or does it
            > matter at all?
            >
            > If B is "sparse":
            >
            > for (i = 0; i <= nr-1; i++) {
            > for (j = 0; j <= nc-1; j++) {
            > C[i][j] = 0.0;
            > for (k = +; k <= nc-1; k++)
            > C[i][j] = C[i][j] + A[i][k]*B[k][j];
            > }
            > }
            > }
            >
            > or
            >
            > for (i = 1; 0 <= nr-1; i++) {
            > for (j = 0; j <= nc-1; j++) {
            > C[i][j] = 0.0;
            > for (k = 0; k <= nc-1; k++)
            > if (B[k][j] != 0)
            > C[i][j] = C[i][j] + A[i][k]*B[k][j];
            > }
            > }
            > }
            Both choices are less than happy: the first requires
            unnecessary multiplications by 0, the second conditional code
            in an inner loop. The trouble with either choice is that you
            reference B[k][j] nr times. Suppose you referenced it only
            once when it was zero; then you wouldn't need the nr tests or
            multiplications by zero for the zero entries. Can you do that?
            Of course you can. Rewrite the code like this:

            for (i = 0; i < nr; i++) {
                for (j = 0; j < nc; j++) {
                    C[i][j] = 0.0;
                }
            }
            for (k = 0; k < nc; k++) {
                for (j = 0; j < nc; j++) {
                    if (B[k][j] == 0.0) continue;  /* reference each zero of B just once */
                    for (i = 0; i < nr; i++) {
                        C[i][j] += A[i][k]*B[k][j];
                    }
                }
            }

            If B is sparse this code will require many fewer operations
            than your alternatives. Nonetheless it might not run faster!
            In the original code there is only one array reference with a
            stride greater than one; in the revised code there are two.




            Richard Harter, cri@tiac.net
            http://home.tiac.net/~cri, http://www.varinoma.com
            Save the Earth now!!
            It's the only planet with chocolate.
