heap object question, and RAII advice request

  • presencia
    New Member
    • Sep 2009
    • 14

    Hi all,
    I am still in the process of learning how to write decent C++ code. I would appreciate any good advice or corrections. I have two questions: a technical one, and one asking for advice on how to design my code.

    1. I have read somewhere (can't remember, must have been some tutorial page) that objects allocated on the heap can only be accessed through pointers. Thus, for example
    Code:
    vector<int> * v_ptr = new vector<int>(100);
    Here v_ptr points to a vector of 100 integers living on the heap.
    I can access the entries of the vector via "v_ptr->at(4)" or "(*v_ptr)[4]" . (Is there another way to access the entries?)
    I could try to fill a local vector variable with heap contents like
    Code:
    vector<int> v = *(new vector<int>(100));
    but if I understand correctly, this will first create a vector object with 100 integers on the heap, take its pointer, dereference it, and then construct ANOTHER vector object on the stack (named v) from the one on the heap using the copy constructor. Thus, this would lead to an immediate memory leak, since there is no way to call delete on that pointer any more. (Am I understanding this correctly?)
    Now, my question: What will happen if I type
    Code:
    vector<int>& v = *(new vector<int>(100));
    Is there any copying going on here that I can't see? Or have I really constructed only one vector on the heap, which I can now access like "v[4]" , and which I can delete later on with "delete &v;" ?

    2. While the first question was just about understanding the code I have another one concerning my way of code design. I have read the following advice:
    • Don't allocate variables on the heap unless you have to (in case you need the data outside of the scope it was created in, or in case there is not enough space on the stack).
    • If you allocate memory using the "new" operator, then follow the RAII principle, make use of some wrapper objects (like std::auto_ptr) on the stack that will do the deleting for you, so that there will be no memory leaks.

    Now, I am in the situation that I have to write functions that receive a number of vectors (normally of complex numbers) and produce several of them. And these vectors can have a pretty large size. I need my code to work with vectors with 1,000,000 entries.
    Now my questions:
    • Is it the right choice to use the heap (create vectors using "new") in this case, even if I need the objects just for the scope I am in?
    • If I want a function to return more than one vector object, I would create one before calling the function and then call it, providing it with a reference (or pointer) to that vector so that it can write to it. How do I do this best when using "std::auto_ptr"s? I can't just give the auto_ptr as a parameter to the function because then the vector will be deleted on return.


    I would greatly appreciate anything useful you would have to say.
    ~<><~~~~~~~~~ presencia
  • weaknessforcats
    Recognized Expert Expert
    • Mar 2007
    • 9214

    #2
    1. I have read somewhere (can't remember, must have been some tutorial page) that objects allocated on the heap can only be accessed through pointers. Thus, for example
    Code:
    vector<int> * v_ptr = new vector<int>(100);
    Here v_ptr points to a vector of 100 integers living on the heap.
    I can access the entries of the vector via "v_ptr->at(4)" or "(*v_ptr)[4]" . (Is there another way to access the entries?)
    The name of an array is the address of element 0. Therefore, v_ptr is really &v_ptr[0]. So, you can access your array elements as v_ptr[4] exactly as you would for any other array. Remember, a vector must be implemented as an array.
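
    One caution before relying on that last form, offered as a hedged sketch rather than a definitive statement: for a pointer declared as vector<int> *, the expression v_ptr[4] indexes the pointer itself, so it names a fifth vector object (which does not exist when a single vector was created with new) rather than element 4 of the vector. The forms below do reach the elements. The function names read_element and read_via_pointer_index are hypothetical, chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reads element i of the vector that v_ptr points to, in three equivalent ways.
int read_element(const std::vector<int>* v_ptr, std::size_t i)
{
    int a = (*v_ptr)[i];            // dereference the pointer, then index (unchecked)
    int b = v_ptr->at(i);           // member call through the pointer (bounds-checked)
    int c = v_ptr->operator[](i);   // explicit operator call, same as (*v_ptr)[i]
    assert(a == b && b == c);
    return a;
}

// Indexing the pointer yields a vector object, not an int:
// v_ptr[0] is the same object as *v_ptr, and only index 0 is valid
// when a single vector was allocated.
int read_via_pointer_index(const std::vector<int>* v_ptr, std::size_t i)
{
    const std::vector<int>& same_vector = v_ptr[0];
    return same_vector[i];
}
```

    Calling read_element(v_ptr, 4) on the vector from the question gives the same value as (*v_ptr)[4].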


    I could try to fill a local vector variable with heap contents like
    Code:
    vector<int> v = *(new vector<int>(100));
    but if I understand correctly, this will first create a vector object with 100 integers on the heap, take its pointer, dereference it, and then construct ANOTHER vector object on the stack (named v) from the one on the heap using the copy constructor. Thus, this would lead to an immediate memory leak, since there is no way to call delete on that pointer any more. (Am I understanding this correctly?)
    Look at this from the compiler's point of view. First, create a heap vector of 100 ints by a call to a vector constructor. Second, create a second vector (v) as a copy of the heap vector. The compiler is done.

    However, you failed to capture the address of the heap vector. It is now lost and you can never delete it. Do not ever do this.

    Now, my question: What will happen if I type
    Code:
    vector<int>& v = *(new vector<int>(100));
    Is there any copying going on here that I can't see? Or have I really constructed only one vector on the heap, which I can now access like "v[4]" , and which I can delete later on with "delete &v;" ?
    No. Unfortunately, v is now a reference to a vector rather than a vector pointer. Question: How do you know this particular reference refers to a heap object? That is, how will you ever delete the heap object? Again, do not ever do this.

    I suggest you look in the C insights on this web site for an article on the Handle Class, or smart pointer. I think you will be very pleased with what you read there.

    2. While the first question was just about understanding the code I have another one concerning my way of code design. I have read the following advice:
    Don't allocate variables on the heap unless you have to (in case you need the data outside of the scope it was created in, or in case there is not enough space on the stack).
    This is just backwards. The rule is to NEVER use the stack unless you have to. Think about it. On the stack the compiler controls the life of the object rather than you. That means the compiler can delete the object without your consent, and when it does, how does the compiler know it is deleting the last instance of the object and that there aren't pointers elsewhere in the program that point to this object? Answer: It doesn't.

    The single most dangerous thing you can do in a C++ program is pass pointers around. BTW: Have you read that C insights article on the Handle Class yet? This question is answered there.

    If you allocate memory using the "new" operator, then follow the RAII principle, make use of some wrapper objects (like std::auto_ptr) on the stack that will do the deleting for you, so that there will be no memory leaks.
    This is silly. Read that Handle Class article.

    Now, I am in the situation that I have to write functions that receive a number of vectors (normally of complex numbers) and produce several of them. And these vectors can have a pretty large size. I need my code to work with vectors with 1,000,000 entries.
    Now my questions:
    Is it the right choice to use the heap (create vectors using "new") in this case, even if I need the objects just for the scope I am in?
    Yes, in fact you must use the heap. You can't afford to crash your program due to stack overflow. You have to maintain control of your data and not leave it to the compiler.

    If I want a function to return more than one vector object, I would create one before calling the function and then call it, providing it with a reference (or pointer) to that vector so that it can write to it. How do I do this best when using "std::auto_ptr"s? I can't just give the auto_ptr as a parameter to the function because then the vector will be deleted on return.
    You don't want to return a vector because a copy will be made to return. You want to return a pointer to that vector, or a handle class object. Please read that article.

    Next, NEVER use auto_ptr. It doesn't work the way you think. Refer to Scott Meyers' book "Effective STL".

    I would greatly appreciate anything useful you would have to say.
    I hope this helps.

    • Banfa
      Recognized Expert Expert
      • Feb 2006
      • 9067

      #3
      Think about it. On the stack the compiler controls the life of the object rather than you. That means the compiler can delete the object without your consent, and when it does, how does the compiler know it is deleting the last instance of the object and that there aren't pointers elsewhere in the program that point to this object? Answer: It doesn't.
      That is not strictly true. The compiler does not control the lifetime of variables on the stack; the C++ standard does (assuming you are using a standard-compliant compiler, and if you are not, then get a new compiler). Those variables have the automatic (auto) storage class, and the standard sets out clearly how variables with that storage class should operate.

      You can rely on when a variable declared on the stack will exist and when it will be deleted (or at least when you should stop using it because it may be deleted).

      And the thing about the compiler deleting a stack object without knowing if all pointers to it have stopped being used is nonsense, because the same is true for the heap; it is just that the user has control. The programmer writing code that stores the pointer of a stack-based object after the object has been deleted is in the same class of error as the programmer storing a pointer to a heap-based object after the user has deleted the object. In both cases the object in question had a known lifetime and the programmer attempted to use it after it was deleted.

      presencia, please understand that I don't actually disagree with what weaknessforcats has said, only the manner in which he says it, which for my liking is a little too cut and dried. Programming is not like that. Everything weaknessforcats has written is a reasonable rule of thumb which, if followed, will give you the right thing 90-95% of the time. But it is important to understand the underlying mechanisms that result in those rules of thumb so that you can know when they don't apply.

      Also, those rules of thumb don't apply to every platform, and you need to be able to operate when they don't apply. For instance, it is not uncommon for embedded projects to forbid the use of the heap, or at the very least to allow only very limited use, but many C++ programmers treat the heap as if it were an unlimited resource. Get into the habit of following these rules of thumb and you run the risk of ending up thinking the same way, because those rules of thumb assume a relatively large heap and a relatively small stack, and that is just not true of every platform.

      Definitely read the Handle Class/smart pointer article in the C insights. It is a very smart way of dealing with a lot of the issues related to knowing when to delete a pointer.

      Interestingly I have used auto_ptr recently but only in a very limited way. It is certainly no replacement for a handle class.

      • weaknessforcats
        Recognized Expert Expert
        • Mar 2007
        • 9214

        #4
        Originally posted by Banfa
        You can rely on when a variable declared on the stack will exist and when it will be deleted (or at least when you should stop using it because it may be deleted).
        This is not true.

        You pass a pointer to a stack variable to a function. That function updates a global pointer with the pointer it received on the call and returns. Then the function with the stack variable returns. Now the global pointer points to deleted memory. Then you crash using the global pointer.

        You can never rely on the target of a pointer to exist at the time you use the pointer.

        This scenario also works where multiple threads are involved.

        Originally posted by Banfa
        And the thing about the compiler deleting a stack object without knowing if all pointers to it have stopped being used is nonsense because the same is true for a heap it its just that the user has control.
        The above example disproves this statement. And the same thing is not true for a heap object. In the example, only the pointer to the heap object is deleted (since it is on the stack), but the object itself still exists and the global pointer is still valid.

        Again, I say the compiler deletes stack variables when they go out of scope regardless of consequences. You really need to implement reference counting, and this is where the Handle object comes into play. Just passing naked pointers around a program is the most dangerous thing you can do.

        • Banfa
          Recognized Expert Expert
          • Feb 2006
          • 9067

          #5
          I think you misunderstand me a little. My point is that both stack objects and heap objects have known life times and trying to access either type beyond the end of their life will lead to serious trouble for the program.

          You say the compiler deletes stack variables when they go out of scope regardless of consequences but in response to that I say that exactly the same is true of heap variables when they are deleted, the variable is deleted regardless of the consequences in response to the issue of a delete statement. In both those cases it is up to the programmer/designer to ensure there are no consequences of the object deletion.

          You say
          You can never rely on the target of a pointer to exist at the time you use the pointer.
          Well I agree but with the caveat that that statement applies equally well to both stack and heap objects because either one could have been deleted if it has been used incorrectly.

          Because both heap and stack objects have a known life cycle the major events in which are the same
          Code:
          Stack Object Life Cycle              Heap Object Life Cycle
          Compiler Creates Object              User Creates Object
          Object In Scope                      Object In Scope
          Compiler Deletes Object              User Deletes Object
          Object Out Of Scope                  Object Out Of Scope
          any fault you can pick with a stack object I can equally well pick with a heap object because their life-cycles are so similar.

          The key, as you state, is having a managed pointer or Handle object, that manages the allocation/deallocation life-cycle of some other object preventing deallocation while that other object is in use, using for example reference counting.
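
          The reference-counting idea can be sketched with std::shared_ptr. This assumes a C++11 compiler and uses std::shared_ptr as a stand-in for the Handle class from the article, whose actual interface is not shown in this thread. The names g_handle, remember and producer are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// A global handle, standing in for the "global pointer" from the earlier scenario.
std::shared_ptr<std::vector<int>> g_handle;

// Stores a copy of the handle; copying raises the reference count by one.
void remember(const std::shared_ptr<std::vector<int>>& h)
{
    g_handle = h;
}

// Creates a vector, shares it, and returns. The local handle is destroyed
// on return, but the vector survives because g_handle still owns it.
void producer()
{
    std::shared_ptr<std::vector<int>> local =
        std::make_shared<std::vector<int>>(100, 5);
    assert(local.use_count() == 1);   // only the local handle so far
    remember(local);
    assert(local.use_count() == 2);   // local handle plus the global one
}
```

          After producer() returns, g_handle is the sole owner, and the vector is deleted only when g_handle is reset or destroyed.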

          • weaknessforcats
            Recognized Expert Expert
            • Mar 2007
            • 9214

            #6
            Originally posted by Banfa
            I think you misunderstand me a little. My point is that both stack objects and heap objects have known life times and trying to access either type beyond the end of their life will lead to serious trouble for the program.
            I agree.

            Originally posted by Banfa
            You say the compiler deletes stack variables when they go out of scope regardless of consequences but in response to that I say that exactly the same is true of heap variables when they are deleted, the variable is deleted regardless of the consequences in response to the issue of a delete statement. In both those cases it is up to the programmer/designer to ensure there are no consequences of the object deletion.
            I agree, with the proviso that with a heap object you blew your own foot off rather than the compiler doing it for you from ambush. It's a matter of control: He who allocates is he who deletes. I use the heap to keep only one person in control: me.

            Originally posted by Banfa
            You say

            Quote:
            You can never rely on the target of a pointer to exist at the time you use the pointer.
            Well I agree but with the caveat that that statement applies equally well to both stack and heap objects because either one could have been deleted if it has been used incorrectly.
            I agree. Do not use pointers. Use handles. Handles are always valid.

            • presencia
              New Member
              • Sep 2009
              • 14

              #7
              Thanks to both of you for all the discussion.
              I have learned some stuff (for example, that you can access vector elements from a vector pointer with "v_ptr[i]"; I thought that was only true for C arrays) and got a better feeling about things (about use of stack and heap). I have now read the Handle Class article and I liked it; it was quite easy to understand.

              But all those concepts are not really new to me. I did think about writing some handle class for my vectors (I didn't know it was called a handle class), just like the ones you introduce in the article (this concept was introduced to me as RAII). I also thought about using auto_ptr, and yes, I do know how they work and that they don't really provide the handle class's features. I just wanted advice about how to write those functions that take a number of large vectors (I don't mean that technically; they may just as well get pointers or handle classes) and calculate other large amounts of data from them, which I want to be put in vectors again. I think I will just write my own vector handle class. What do you think? Or is there anything already in the std or boost libraries that would serve me well?

              And - I am sorry - I still have not understood an answer to my first question. Is it true that you cannot have references to heap objects? Or did I give a counterexample. What exactly does the compiler do with
              Code:
              vector<int>& v = *(new vector<int>(100));
              Thanks in advance
              ~<><~~~~~~~~~~ presencia

              • Banfa
                Recognized Expert Expert
                • Feb 2006
                • 9067

                #8
                The point about taking references to heap objects is not that you can't do it; your code snippet will compile. It is just a very bad idea because you hide the underlying nature of the object (allocated on the heap), so even though you could free the vector later using

                delete &v;

                it would be better not to hide the nature of the object. You also risk breaking one of the normal tenets of references, which is that you can't have a bad reference. As soon as you take a reference to a heap object, you run the risk that the object gets deleted while someone still has a reference to it, breaking the basic always-good tenet.

                So it is better to use a pointer rather than a reference because everyone knows pointers can't be relied upon. Except, of course, with reference to the rest of this thread rather than a pointer you should use a handle class.

                This is true of many things in C++ (and C). You can do them; it just isn't a good idea, normally for program maintainability, sometimes for design or just for writing the rest of the program. Global data is another one of these: you can have global data, but best practice is not to.
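
                What the compiler does in the two cases from the original question can be checked with a short test. This is a minimal sketch, assuming the reference is declared as vector<int>& v; the function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Binds a reference to a heap vector and reports whether it aliases the heap object.
bool reference_aliases_heap_object()
{
    std::vector<int>* p = new std::vector<int>(100);
    std::vector<int>& v = *p;   // no copy: v is another name for *p
    v[4] = 42;                  // writes through to the heap object
    bool same_object = (&v == p) && ((*p)[4] == 42);
    delete &v;                  // legal, same effect as delete p; v and p both dangle now
    return same_object;
}

// Copy-initialising from the dereferenced pointer creates a second, distinct vector.
bool copy_is_distinct_object()
{
    std::vector<int>* p = new std::vector<int>(100);
    std::vector<int> v = *p;    // the copy constructor runs here
    v[4] = 42;                  // changes only the copy
    bool distinct = (&v != p) && ((*p)[4] == 0);
    delete p;                   // possible only because the pointer was kept
    return distinct;
}
```

                Both functions return true: the reference form makes no copy, while the plain variable form does.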

                • presencia
                  New Member
                  • Sep 2009
                  • 14

                  #9
                  Thanks, very useful.

                  • presencia
                    New Member
                    • Sep 2009
                    • 14

                    #10
                    I have found the boost::shared_ptr template to be quite similar to the handle classes proposed by weaknessforcats. Is that true? (I just read some of its documentation)
                    Would you think it is sensible for me to write
                    Code:
                    boost::shared_ptr<vector<double> > v (new vector<double>(100000));
                    for large vectors that I want to live on the heap? And how could I then access the elements?
                    Code:
                    v[5] //does that work? I don't expect that although I would like that
                    (*v)[5] //that would probably work. Is this a common way to go?
                    I must admit, I am still not quite sure how to go on. I could also implement my own handle class for vectors overloading the [] operator for element access. But I like using the templates already implemented by other people so I won't make heavy mistakes there. What would a pro do here? I read a lot about people using vectors containing shared_ptr s. But rarely anybody seems to have their container objects on the heap. Or am I missing something?

                    • Banfa
                      Recognized Expert Expert
                      • Feb 2006
                      • 9067

                      #11
                      OK I think you are making a mistake; vectors (and this is true for all STL containers) always store the data they contain on the heap regardless of where the vector is located. This has to be because the heap is the only location available where the size of the data can be altered at run time. For the stack and the data segment the size is fixed at compile time.

                      So when you say "for large vectors that I want to live on the heap" I am forced to ask do you really mean that you want the vector to live on the heap or do you really mean that you want the large amount of data that the vector contains to live on the heap? If it is the latter case then you don't necessarily have to new the vector, although you may still want to because you certainly want to avoid copying a vector with such large contents.

                      I suspect you are confusing the vector object and the vector contents. The vector object itself has a fixed and relatively small size, it is its contents that change size and that can get large. Look at this test program

                      Code:
                      #include <iostream>
                      #include <vector>
                      
                      using namespace std;
                      
                      int main()
                      {
                      	vector<int> v;			// Empty vector
                      	
                      	cout << "Size of vector object: " << sizeof v << " Size of contained data: " << v.capacity() * sizeof(int) << endl;
                      	
                      	v.reserve(100);			// Reserve space for 100 integers
                      
                      	cout << "Size of vector object: " << sizeof v << " Size of contained data: " << v.capacity() * sizeof(int) << endl;
                      	
                      	v.resize(100, 5); 		// Put 5 into the first 100 locations
                      
                      	cout << "Size of vector object: " << sizeof v << " Size of contained data: " << v.capacity() * sizeof(int) << endl;
                      	
                      	v.push_back(10);		// Add 1 more entry
                      	
                      	cout << "Size of vector object: " << sizeof v << " Size of contained data: " << v.capacity() * sizeof(int) << endl;
                      	
                      	v.resize(100000,20);	// Increase size to 100,000 filling in new entries with 20
                      
                      	cout << "Size of vector object: " << sizeof v << " Size of contained data: " << v.capacity() * sizeof(int) << endl;
                      
                      	v.clear();				// Clear the vector
                      	
                      	cout << "Size of vector object: " << sizeof v << " Size of contained data: " << v.capacity() * sizeof(int) << endl;
                      
                      	vector<int> v2;
                      	v.swap(v2);				// Swap with an empty vector
                      	
                      	cout << "Size of vector object: " << sizeof v << " Size of contained data: " << v.capacity() * sizeof(int) << endl;
                      	
                      	return 0;
                      }
                      It creates a vector and slowly increases the size of its contents before clearing it again, printing the size of the vector and the size of the contained data each time. Here is the output (using MinGW 3.4.5 on Windows XP 32 bit)

                      Size of vector object: 12 Size of contained data: 0
                      Size of vector object: 12 Size of contained data: 400
                      Size of vector object: 12 Size of contained data: 400
                      Size of vector object: 12 Size of contained data: 800
                      Size of vector object: 12 Size of contained data: 400000
                      Size of vector object: 12 Size of contained data: 400000
                      Size of vector object: 12 Size of contained data: 0
                      You can see the size of the object never changes from 12 bytes; only the size of the data it contains changes. 12 bytes makes sense (on a 32 bit system) because a vector has to keep track of 3 pieces of information: the start of the allocated memory, the current size of the vector and the current capacity of the vector. 32 bits is 4 bytes, so 3 pieces of information * 4 bytes = 12 bytes total.

                      My point is in this simple program my vector is on the stack, something you may wish to avoid for reasons already discussed. But only the 12 bytes of the actual vector object appear on the stack, the vector contents are allocated on the heap. Personally I consider the size of a vector object small enough that if I am only using it locally I will happily place it on the stack regardless of the amount of data it contains.


                      Now having said all that, yes, I believe boost::shared_ptr is a handle object, and also that it is due to become part of the C++ standard at the next release of the standard. That seems a sensible thing to use if you plan to pass the vector out of functions a lot. On the other hand, if you are only passing the vector into functions, you could possibly get away with just declaring the function parameters as references to vectors

                      Code:
                      void fun1(vector<int>& vec)
                      {
                          // Do something with vec
                      }
                      
                      void fun2()
                      {
                          vector<int> vec;
                      
                          // Do something
                      
                          fun1(vec);
                      }
                      In my opinion this may be an acceptable construct, depending on the overall design of your program and what fun2 actually does with vec.
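
                      The same construct extends to the case from the original question, a function that reads several large vectors of complex numbers and produces several. A minimal sketch, with the hypothetical names cvec and sum_and_difference: inputs are passed by const reference and outputs by non-const reference, so no vector is copied and the caller owns all of them.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::vector<std::complex<double> > cvec;

// Reads a and b, and writes two results into vectors owned by the caller.
void sum_and_difference(const cvec& a, const cvec& b, cvec& sum, cvec& diff)
{
    assert(a.size() == b.size());
    sum.resize(a.size());    // element storage is allocated on the heap by the vector
    diff.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum[i] = a[i] + b[i];
        diff[i] = a[i] - b[i];
    }
}
```

                      The caller declares all four vectors locally, for example with 1,000,000 entries each, and only the small vector objects themselves sit on the stack.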

                      As to your proposed ways to access such a variable, it would take you 5 minutes to write a test program to see which works, rather than waiting possibly hours for a reply here.
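
                      Such a test program might look like the sketch below. It assumes a C++11 compiler and uses std::shared_ptr in place of boost::shared_ptr, on the assumption that the two behave the same for these operations; access_forms is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Builds a shared vector of n doubles and exercises the element-access forms.
double access_forms(std::size_t n, std::size_t i)
{
    std::shared_ptr<std::vector<double> > v(new std::vector<double>(n));
    (*v)[i] = 1.5;                 // dereference the handle, then index
    double a = (*v)[i];
    double b = v->at(i);           // bounds-checked member call through the handle
    // double c = v[i];            // not expected to compile for a handle to a vector
    std::vector<double>& r = *v;   // a local reference gives plain r[i] syntax
    assert(a == b && b == r[i]);
    return r[i];
}
```

                      Binding a local reference to *v inside a function keeps the handle in charge of the lifetime while allowing ordinary indexing.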

                      • presencia
                        New Member
                        • Sep 2009
                        • 14

                        #12
                        Wow, you opened my eyes.
                        I really was mistakenly thinking the vector contents were allocated on the stack. Thanks for pointing that out. - And for all your time answering me so elaborately. I will use references of vectors as parameters to my functions, just as you were proposing.

                        And you are right about writing short test programs rather than posting.

                        You guys helped me a lot. I think I am fit for the task now (at least for the next steps).
                        ~<><~~~~~~~ presencia

                        • weaknessforcats
                          Recognized Expert Expert
                          • Mar 2007
                          • 9214

                          #13
                          Let us know how it goes.
