Vectorization of template functions

  • vectorizor

    Vectorization of template functions

    Hello all,

    I am attempting to vectorize a few template functions with the Intel
    compiler, but without much success so far. OK, granted, this question
    is not 100% C++, but it is related enough that I felt I could post it
    here. Also, I did ask in the Intel forums, without much success. And
    maybe there are some C++ coders in here who are familiar with the
    Intel compiler.

    The code below highlights the problem I have. Essentially, I have a
    template class that handles images (i.e. large matrices). The template
    parameter represents the data type held by the image (e.g. uint8,
    float, ...).

    I have many functions that apply certain filters to these images, and
    I would like to vectorize them. Obviously, since the argument of the
    functions is an instance of a template class (the image), the function
    itself is a template. The name of the problematic function is 'test',
    defined towards the bottom of the code. 'test' is called in 2
    different ways.

    First, in the main, I define 2 images, 'gray' and 'tmp', and call
    'test' using them as arguments. Now when I compile with -QxN, the inner
    loop within 'test' is vectorized. Good.

    However, I cannot always have the definition of the template function
    in the same file as the main, because there are simply too many
    template functions defined. Hence I define the template in another
    source file, but I have to instantiate it explicitly with the actual
    parameters it is going to be used with, otherwise the linker will not
    find the code. I tried to reproduce this organization in a single file.
    Just below the definition of 'test', I inserted a line that instantiates
    'test' for the parameters with which it is going to be used. The compiler
    understands that correctly, but now says that it cannot vectorize the
    inner loop! The message is: "loop was not vectorized: dereference too
    complex". Why is the dereference now too complex, when the compiler
    handled it just fine for the other instantiation in the main?!
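
    A minimal sketch of that organization, reduced to a single listing. The
    names Buffer and add_one, and the file names filters.h and filters.cpp
    mentioned in the comments, are hypothetical stand-ins and not part of my
    real code:

```cpp
#include <cstddef>
#include <vector>

// ---- would live in a header, e.g. a hypothetical filters.h ----

// Hypothetical stand-in for the image class.
template <typename T>
struct Buffer {
    std::vector<T> data;
};

// Declaration only: callers see the signature but not the body.
template <typename T>
void add_one(const Buffer<T>& input, Buffer<T>& output);

// ---- would live in a source file, e.g. a hypothetical filters.cpp ----

// Definition of the template, hidden from the callers.
template <typename T>
void add_one(const Buffer<T>& input, Buffer<T>& output)
{
    output.data.resize(input.data.size());
    for (std::size_t i = 0; i < input.data.size(); ++i)
        output.data[i] = input.data[i] + 1;
}

// Explicit instantiation: forces the compiler to emit code for T = float
// in this translation unit, so that the linker can resolve calls made
// from other source files.
template void add_one<float>(const Buffer<float>&, Buffer<float>&);
```

    With this layout, every element type that is used elsewhere needs its own
    explicit instantiation line in the source file.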

    Any help would be much appreciated.

    Alex



    ##### CUT #####

    #include <windows.h>

    #include <stdio.h>

    #include <math.h>

    typedef unsigned char u8;

    typedef float f32;







    ////////////////////

    ///// MEMORY ROUTINES

    ////////////////////

    enum { MemoryAlignment =64};

    void* AllocateMemory( size_t size)

    {

    return _aligned_malloc (size, MemoryAlignment );

    }

    void ReleaseMemory(void *memblock)

    {

    _aligned_free(memblock);

    }

    int ComputeAlignedWidth(int width)

    {

    int alignment_needed = MemoryAlignment / sizeof(float);

    return (int)ceil((float)width/(float)alignment_needed) *
    alignment_needed;

    }







    ////////////////////

    ///// CLASS DECLARATION

    ////////////////////



    template <typename T>

    struct Image

    {

    public: // members

    // std information

    int width, height, depth;

    // actual width of the buffer

    // buffer holding image data is padded to be a multiple

    // of MemoryAlignment for optimisation purposes

    int width_padded;

    // dimensions helper

    int firstRow, lastRow, firstCol, lastCol;

    // pointer to the image data

    T* data;

    public: // methods

    // ctor

    Image():

    width(0),height(0),depth(0),

    width_padded(0),

    firstRow(0), lastRow(0), firstCol(0), lastCol(0),

    data(NULL)

    {

    }

    // dtor

    ~Image()

    {

    }

    // memory management

    void Allocate() { data =
    static_cast<T*>(AllocateMemory(width_padded*height*depth*sizeof(T)));}

    void Release () { ReleaseMemory(data);}

    // pixel access

    // virtual T& operator() (int row, int col)

    // dimensions management

    void SetDimensions(int h, int w, int d){

    height = h;

    width = w;

    depth = d;

    width_padded = ComputeAlignedWidth(width);

    firstRow = 0;

    firstCol = 0;

    lastRow = height-1;

    lastCol = width-1;

    }

    // size information

    int GetTotalSize(bool padded=false){

    if (padded) return width_padded*height*depth*sizeof(T);

    else return width*height*depth*sizeof(T);

    }

    int GetImageSize(bool padded=false){

    if (padded) return width_padded*height*depth;

    else return width*height*depth;

    }

    int GetPlaneSize(bool padded=false){

    if (padded) return width_padded*height;

    else return width*height;

    }

    };







    template <typename T>

    struct GrayImage : public Image<T>

    {

    public: // methods

    // ctor

    GrayImage():

    Image<T>()

    {

    this->depth=1;

    }

    // pixel access

    T& operator() (int row, int col)

    {

    return this->data[row*this->width_padded + col];

    }

    };

    template <typename T>

    void test(GrayImage<T>& input, GrayImage<T>& output)

    {

    int lastR = input.lastRow, firstR = input.firstRow;

    int lastC = input.lastCol, firstC = input.firstCol;

    for(int row=firstR ; row<=lastR ; ++row){

    #pragma ivdep

    for(int col=firstC ; col<=lastC ; ++col){


    //for(int row=input.firstRow ; row<=input.lastRow ; ++row){

    // for(int col=input.firstCol ; col<input.lastCol; ++col){

    output(row, col) = input(row, col) + 1;

    }

    }

    }

    template void test<f32>(GrayImage<f32>& input, GrayImage<f32>& output);



    int main(int argc, char* argv[])

    {

    GrayImage<f32> gray, tmp;


    gray.SetDimensions(2000, 2000, 1); gray.Allocate();

    tmp.SetDimensions(gray.height, gray.width, 1); tmp.Allocate();

    test(gray, tmp);

    gray.Release(); tmp.Release();

    return 0;

    }

    ##### CUT #####

  • Victor Bazarov

    #2
    Re: Vectorization of template functions

    vectorizor wrote:
    [..] The compiler
    understands that correctly, but now says that it cannot vectorize the
    inner loop! The message is: "loop was not vectorized: dereference too
    complex". Why is the dereference now too complex, when the compiler
    handled it just fine for the other instantiation in the main?!
    [..]
    The vectorization of loops is (AFAIUI) an optimization technique your
    compiler has and employs if possible. Apparently, if not possible, it
    doesn't employ it. But all this really has nothing to do with the C++
    language; you need to talk to Intel technical support to learn more
    about the availability of different optimization methods and in what
    circumstances the compiler can or cannot use them.

    V
    --
    Please remove capital 'A's when replying by e-mail
    I do not respond to top-posted replies, please don't ask



    • jg

      #3
      Re: Vectorization of template functions

      On Jun 22, 8:22 am, vectorizor <vectori...@googlemail.com> wrote:
      Hello all,
      >
      the parameters with which it is going to be used. The compiler
      understands that correctly, but now says that it cannot vectorize the
      inner loop! The message is: "loop was not vectorized: dereference too
      complex". Why is the dereference now too complex, when the compiler
      handled it just fine for the other instantiation in the main?!
      >
      Any help would be much appreciated.
      >
      I guess it is related to inlining of your function template.
      If the call site and the template function definition are within the
      same file, a compiler can inline it. After inlining, it may
      do a better alias analysis, and your code can be vectorized.
      Without inlining, the reference parameters of your template function
      are internally implemented as pointers, which requires more
      pointer analysis to see if the loop can be vectorized. Your best bet
      is to ask Intel.

      JG
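
      One way to reduce the pointer analysis described above is to compute
      plain row pointers once per row and run the inner loop over them. The
      sketch below is an assumption for illustration, not a confirmed fix:
      GrayImageF32, add_one_row and test_hoisted are hypothetical names, and
      __restrict is a compiler extension (accepted by the Intel, Microsoft,
      GCC and Clang compilers) rather than standard C++. Whether the Intel
      compiler then vectorizes the loop would still have to be checked in
      its vectorization report.

```cpp
// Hypothetical, simplified stand-in for GrayImage<f32> from the thread.
struct GrayImageF32 {
    int firstRow, lastRow, firstCol, lastCol, width_padded;
    float* data;
};

// Inner loop over two raw pointers. __restrict is a promise from the
// programmer that the two ranges do not overlap.
inline void add_one_row(const float* __restrict in,
                        float* __restrict out,
                        int count)
{
    for (int col = 0; col < count; ++col)
        out[col] = in[col] + 1.0f;
}

// Same computation as 'test' in the thread, but the member accesses and
// the index arithmetic are hoisted out of the inner loop.
void test_hoisted(const GrayImageF32& input, GrayImageF32& output)
{
    const int count = input.lastCol - input.firstCol + 1;
    for (int row = input.firstRow; row <= input.lastRow; ++row) {
        const float* in  = input.data  + row * input.width_padded  + input.firstCol;
        float*       out = output.data + row * output.width_padded + input.firstCol;
        add_one_row(in, out, count);
    }
}
```

      The promise made by __restrict is only valid if input and output never
      share a buffer; calling this with the same image for both arguments
      would be undefined behaviour.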
