Array class and pointer aliasing problems

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Mathieu Benoit

    Array class and pointer aliasing problems

    (I'm having trouble to post this, sorry if this mail comes
    several times)

    I'm trying to optimize the use of an integer array class,
    and I found that one major problem is pointer aliasing.

    I consider the following example:

    --------------------------------------------------------
    class IntTab
    {
    public:
    IntTab(int n, int m) :
    data(new int[n*m]), dim0(n), dim1(m) {
    }
    ~IntTab() {
    delete data;
    }
    int ni() const {
    return dim0;
    }
    int nj() const {
    return dim1;
    }
    int & operator()(int i, int j) {
    return data[i*dim1+j];
    }
    const int & operator()(int i, int j) const {
    return data[i*dim1+j];
    }
    void myjob(IntTab & a) const;
    private:
    int * const data;
    const int dim0;
    const int dim1;
    };

    void myjob(const IntTab & a, IntTab & b)
    {
    int i, j, k;
    int ni = b.ni();
    int nj = b.nj();
    for (i=0, k=0; i<ni; i++)
    for (j=0; j<nj; j++, k++)
    b(k,0) = a(i,j);
    }

    void IntTab::myjob(I ntTab & a) const
    {
    int a_j = a.dim1;
    int ni = dim0;
    int nj = dim1;
    int i, j, k;
    int * a_data = a.data;
    int * my_data = data;
    for (i=0, k=0; i<ni; i++)
    for (j=0; j<nj; j++, k++)
    a_data[k*a_j] = my_data[i*nj+j];
    }
    ---------------------------------------------------------

    "myjob" is poorly optimized by the compiler (g++ 3.2)
    because it assumes that in the process of writing b, I could
    silently update the pointers (a.data and b.data) and the
    index counters (a.dim1 and b.dim1) which are used in the
    loop. Indeed, making these variables local as in
    IntTab::myjob is much better but of course I prefer to write
    my algorithms with the operator() which allows to easily put
    some asserts for debugging.

    Strangely enough, changing my array for an array of double
    leads to the same problem, even with "-fstrict-aliasing".

    Moreover, the "restrict" keyword might help here but it is
    not supported by g++. Is there another way to tell the
    compiler that caching the pointers is safe while preserving
    readability ?

    Is there something special with g++3.2 ?

    Thanks,
    Benoit MATHIEU

  • Jumbo

    #2
    Re: Array class and pointer aliasing problems


    "Mathieu Benoit" <benoit.mathieu 2@free.fr> wrote in message
    news:400a7ccb$0 $22298$626a54ce @news.free.fr.. .[color=blue]
    > (I'm having trouble to post this, sorry if this mail comes
    > several times)
    >
    > I'm trying to optimize the use of an integer array class,
    > and I found that one major problem is pointer aliasing.
    >
    > I consider the following example:
    >
    > --------------------------------------------------------
    > class IntTab
    > {
    > public:
    > IntTab(int n, int m) :
    > data(new int[n*m]), dim0(n), dim1(m) {
    > }
    > ~IntTab() {
    > delete data;
    > }
    > int ni() const {
    > return dim0;
    > }
    > int nj() const {
    > return dim1;
    > }
    > int & operator()(int i, int j) {
    > return data[i*dim1+j];
    > }
    > const int & operator()(int i, int j) const {
    > return data[i*dim1+j];
    > }
    > void myjob(IntTab & a) const;
    > private:
    > int * const data;
    > const int dim0;
    > const int dim1;
    > };
    >
    > void myjob(const IntTab & a, IntTab & b)
    > {
    > int i, j, k;
    > int ni = b.ni();
    > int nj = b.nj();
    > for (i=0, k=0; i<ni; i++)
    > for (j=0; j<nj; j++, k++)
    > b(k,0) = a(i,j);
    > }
    >
    > void IntTab::myjob(I ntTab & a) const
    > {
    > int a_j = a.dim1;
    > int ni = dim0;
    > int nj = dim1;
    > int i, j, k;
    > int * a_data = a.data;
    > int * my_data = data;
    > for (i=0, k=0; i<ni; i++)
    > for (j=0; j<nj; j++, k++)
    > a_data[k*a_j] = my_data[i*nj+j];
    > }
    > ---------------------------------------------------------
    >
    > "myjob" is poorly optimized by the compiler (g++ 3.2)
    > because it assumes that in the process of writing b, I could
    > silently update the pointers (a.data and b.data) and the
    > index counters (a.dim1 and b.dim1) which are used in the
    > loop. Indeed, making these variables local as in
    > IntTab::myjob is much better but of course I prefer to write
    > my algorithms with the operator() which allows to easily put
    > some asserts for debugging.
    >
    > Strangely enough, changing my array for an array of double
    > leads to the same problem, even with "-fstrict-aliasing".
    >
    > Moreover, the "restrict" keyword might help here but it is
    > not supported by g++. Is there another way to tell the
    > compiler that caching the pointers is safe while preserving
    > readability ?
    >
    > Is there something special with g++3.2 ?
    >
    > Thanks,
    > Benoit MATHIEU
    >[/color]
    Why don't you pass the dimensions into the function and call it like this:
    myjob(IntTab1, IntTab2, IntTab2.ni(), IntTab2.nj())


    Comment

    • Benoit Mathieu

      #3
      Re: Array class and pointer aliasing problems (well, just performanceissu es in fact)

      >>I'm trying to optimize an integer array class,[color=blue][color=green]
      >>and I found that one major problem is pointer aliasing.
      >>--------------------------------------------------------
      >>class IntTab
      >>{
      >>public:
      >> IntTab(int n, int m) :
      >> data(new int[n*m]), dim0(n), dim1(m) {
      >> }
      >> ~IntTab() {
      >> delete data;
      >> }
      >> int ni() const {
      >> return dim0;
      >> }
      >> int nj() const {
      >> return dim1;
      >> }
      >> int & operator()(int i, int j) {
      >> return data[i*dim1+j];
      >> }
      >> const int & operator()(int i, int j) const {
      >> return data[i*dim1+j];
      >> }
      >> void myjob(IntTab & a) const;
      >>private:
      >> int * const data;
      >> const int dim0;
      >> const int dim1;
      >>};
      >>
      >>void myjob(const IntTab & a, IntTab & b)
      >>{
      >> int i, j, k;
      >> int ni = b.ni();
      >> int nj = b.nj();
      >> for (i=0, k=0; i<ni; i++)
      >> for (j=0; j<nj; j++, k++)
      >> b(k,0) = a(i,j);[/color][/color]
      <snip>[color=blue][color=green]
      >>"myjob" is poorly optimized by the compiler (g++ 3.2)
      >>because it assumes that in the process of writing b, I could
      >>silently update the pointers (a.data and b.data) and the
      >>index counters (a.dim1 and b.dim1) which are used in the
      >>loop.[/color][/color]
      <snip>
      [color=blue]
      > Why don't you pass the dimensions into the function and call it like this:
      > myjob(IntTab1, IntTab2, IntTab2.ni(), IntTab2.nj())[/color]

      Would this really help ? I already made a local copy of ni
      and nj... I can see in the assembly output that ni and nj
      and handled efficiently, only data and dim1 are reloaded
      every time...

      Maybe this issue is strongly related to the implementation
      details in g++ (which might be particularly conservative
      about aliasing) and I should ask in another group. But if
      this is a g++ issue and the answer is g++ specific, then I
      will probably not use it...

      I would be more interested if somebody knows about a C++
      construct which clearly tells the compiler that the actual
      arrays will never alias these indexes and pointers... Does
      the standard say something about that ?

      Comment

      • Jumbo

        #4
        Re: Array class and pointer aliasing problems (well, just performance issues in fact)


        "Benoit Mathieu" <benoit.mathieu 2@free.fr> wrote in message
        news:400c51cb$0 $1167$636a55ce@ news.free.fr...[color=blue][color=green][color=darkred]
        > >>I'm trying to optimize an integer array class,
        > >>and I found that one major problem is pointer aliasing.
        > >>--------------------------------------------------------
        > >>class IntTab
        > >>{
        > >>public:
        > >> IntTab(int n, int m) :
        > >> data(new int[n*m]), dim0(n), dim1(m) {
        > >> }
        > >> ~IntTab() {
        > >> delete data;
        > >> }
        > >> int ni() const {
        > >> return dim0;
        > >> }
        > >> int nj() const {
        > >> return dim1;
        > >> }
        > >> int & operator()(int i, int j) {
        > >> return data[i*dim1+j];
        > >> }
        > >> const int & operator()(int i, int j) const {
        > >> return data[i*dim1+j];
        > >> }
        > >> void myjob(IntTab & a) const;
        > >>private:
        > >> int * const data;
        > >> const int dim0;
        > >> const int dim1;
        > >>};
        > >>
        > >>void myjob(const IntTab & a, IntTab & b)
        > >>{
        > >> int i, j, k;
        > >> int ni = b.ni();
        > >> int nj = b.nj();
        > >> for (i=0, k=0; i<ni; i++)
        > >> for (j=0; j<nj; j++, k++)
        > >> b(k,0) = a(i,j);[/color][/color]
        > <snip>[color=green][color=darkred]
        > >>"myjob" is poorly optimized by the compiler (g++ 3.2)
        > >>because it assumes that in the process of writing b, I could
        > >>silently update the pointers (a.data and b.data) and the
        > >>index counters (a.dim1 and b.dim1) which are used in the
        > >>loop.[/color][/color]
        > <snip>
        >[color=green]
        > > Why don't you pass the dimensions into the function and call it like[/color][/color]
        this:[color=blue][color=green]
        > > myjob(IntTab1, IntTab2, IntTab2.ni(), IntTab2.nj())[/color]
        >
        > Would this really help ? I already made a local copy of ni
        > and nj... I can see in the assembly output that ni and nj
        > and handled efficiently, only data and dim1 are reloaded
        > every time...
        >
        > Maybe this issue is strongly related to the implementation
        > details in g++ (which might be particularly conservative
        > about aliasing) and I should ask in another group. But if
        > this is a g++ issue and the answer is g++ specific, then I
        > will probably not use it...
        >
        > I would be more interested if somebody knows about a C++
        > construct which clearly tells the compiler that the actual
        > arrays will never alias these indexes and pointers... Does
        > the standard say something about that ?
        >[/color]
        Sorry I don't know.
        What about doing it an an asm block, or something like that?


        Comment

        • Karl Heinz Buchegger

          #5
          Re: Array class and pointer aliasing problems (well, justperformance issues in fact)

          Benoit Mathieu wrote:[color=blue]
          >
          > Maybe this issue is strongly related to the implementation
          > details in g++ (which might be particularly conservative
          > about aliasing) and I should ask in another group. But if
          > this is a g++ issue and the answer is g++ specific, then I
          > will probably not use it...
          >
          > I would be more interested if somebody knows about a C++
          > construct which clearly tells the compiler that the actual
          > arrays will never alias these indexes and pointers... Does
          > the standard say something about that ?[/color]

          What do you want to achieve?
          As I see it, you want the compiler to replace the index
          calculation in

          for (j=0; j<nj; j++, k++)
          b(k,0) = a(i,j);

          with some sort of pointer magic. The same thing it can
          do when you write

          for( i = 0; i < size; ++i )
          m[i] = ...

          where the compiler can replace the whole thing with

          basePtr = m;
          for( i = 0; i < size; ++i, basePtr++ )
          *basePtr = ...

          If your compiler is unable to do this transformation in
          your current case, then clearly it is a quality of implementation
          issue (meaning: g++ specific).

          But what can you do?
          One thing you can do in any case is to look how others solve this
          (or a similar) problem. What immediatly comes to my mind is the
          STL. How do they do it? The STL has introduced a concept called
          iterators. There are functions for getting the first iterator
          and incrementing such an iterator. Well, in the above, transformed
          example, basePtr acts as an iterator! It is assigned a starting value
          and after that the pointer (aehh: iterator) is incremented and evaluated.
          Hmm.

          So you could add functions to your class IntTab:

          class intTab
          {
          int* GetFirstColumn( int Row ) { return &data[ Row*dim1 ]; }
          int GetNextColumnVa lue( int*& Pos ) { return *(Pos++); }

          ...
          };

          And your looping construct modifies to:

          for( i = 0, k = 0; i < n1; ++i ) {
          int* PosA = a.GetFirstColum n( i );

          for( j = 0; j < nj; j++, k++ )
          b(k,0) = a.GetNextColumn Value( PosA );
          }

          Try it and see, if the compiler is able to optimize this in a better
          way.

          Of course one could polish up the above by introducing a seperate
          iterator class etc., but for a quick test to see if your compiler can
          do a better job the above should be sufficient.

          Yes I am aware, that this could be seen as premature oprimization
          and as we all know this is the root of all evil :-)

          --
          Karl Heinz Buchegger
          kbuchegg@gascad .at

          Comment

          • Benoit Mathieu

            #6
            Re: Array class and pointer aliasing problems (well, just performanceissu es in fact)

            Karl Heinz Buchegger wrote:
            <snip>[color=blue]
            > So you could add functions to your class IntTab:
            >
            > class intTab
            > {
            > int* GetFirstColumn( int Row ) { return &data[ Row*dim1 ]; }
            > int GetNextColumnVa lue( int*& Pos ) { return *(Pos++); }
            >
            > ...
            > };
            >
            > And your looping construct modifies to:
            >
            > for( i = 0, k = 0; i < n1; ++i ) {
            > int* PosA = a.GetFirstColum n( i );
            >
            > for( j = 0; j < nj; j++, k++ )
            > b(k,0) = a.GetNextColumn Value( PosA );
            > }
            >
            > Try it and see, if the compiler is able to optimize this in a better
            > way.
            >
            > Of course one could polish up the above by introducing a seperate
            > iterator class etc., but for a quick test to see if your compiler can
            > do a better job the above should be sufficient.
            >
            > Yes I am aware, that this could be seen as premature oprimization
            > and as we all know this is the root of all evil :-)[/color]

            (why is it that I so often look at the assembly output,
            that's certainly evil... :) )

            Using an iterator is cool in the example that I showed :
            seen from the compiler, it "is" a pointer, and from my point
            of view it is also quite interesting since I can hide some
            sanity checking in the iterator to prevent the user from
            going out of bounds. Things get worse when I need to access
            the data randomly (and this is in fact the general case, and
            here it is much more intersting to have some bound
            checking). An iterator won't do the job in this case... I
            also wonder if preserving logical constness is easy with
            iterators (I'll have a look in the STL, the answer probably
            sits there).

            I feel that there is no easy solution to my
            performance/clarity/safety tradeoff problem (I mean, other
            than "write the code in fortran" (uh!)). I will take it this
            way : code every thing as clearly as possible with all the
            safe bound checking, run the code and eventually perhaps
            concentrate on the few pieces of code where I really spend a
            lot of time (creating some "friend" functions where I can
            hack with pointers and other evil things...)

            Anyway this is interesting: I should have a look at the STL
            more often. Thank you ...

            BenoƮt Mathieu

            Comment

            Working...