read a file into a memory block

  • mickey0
    New Member
    • Jan 2008
    • 142

    Hello,
    I'm trying to read a text file under Windows:
    Code:
    long size;
    ifstream file (name, ios::ate);	//open and goes at the end	
    size = file.tellg(); //it takes 181
    _buffer = new char [size];	
    file.seekg (0, ios::beg);	
    file.read (_buffer, size);
    size = file.gcount(); //it takes 168
    // _buffer[size] = '\0';
    I know there are other ways to do this (stringstream), but as an exercise I'd like to solve it this way. I know that Windows uses CR + LF at the end of each line and that the read (maybe) drops one of the two characters; this would explain the 168 vs. 181. I also know that read doesn't put a terminator at the end of the memory block.
    My problem started when I did "cout << _buffer;" and saw some strange characters at the end that weren't in the file, so I thought they were leftover ("dirty") memory.
    So how should I proceed, please? Do I have to set '\0'? But at which position? Do I have to call memset() to clear the memory?
    Any hints, please?
  • RRick
    Recognized Expert Contributor
    • Feb 2007
    • 463

    #2
    Since your buffer is a char*, you will have to add a '\000' at the end. This is the marker that tells cout that it has reached the end of data. Cout will continue until it finds one of these markers. I suspect this caused the problem you described.

    When you create the buffer you need to add an extra character for the '\000'. What you need to do is: buffer = new char[size+1];

    Another technique is to use a fixed local buffer (i.e. char buffer[2000]). Since you pass the buffer size to read, 'read' will read at most that many characters, and gcount() tells you how many characters were actually read. You loop until gcount() reports fewer than the buffer size. What to do with the buffer contents? Append them to a std::string. Now you don't have to seek or do the other io tricks. Nor do you need to worry about appending a '\000'.
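
The fixed-buffer loop described above could be sketched roughly as follows. This is only an illustration under assumptions: the function name read_whole_file is hypothetical, and opening the file in binary mode is a choice made here, not something the post specifies.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: reads a whole file through a fixed-size chunk buffer.
// Returns an empty string if the file cannot be opened.
std::string read_whole_file(const char* name)
{
    std::ifstream file(name, std::ios::binary); // assumption: no CR+LF translation wanted
    std::string contents;
    char chunk[2000];

    // read() returns the stream itself; gcount() reports how many characters
    // the most recent read() stored. The last, partial chunk makes read() fail
    // while gcount() is still greater than zero, so it is appended as well.
    while (file.read(chunk, sizeof chunk) || file.gcount() > 0) {
        contents.append(chunk, static_cast<std::size_t>(file.gcount()));
    }
    return contents;
}
```

Because std::string tracks its own length, nothing has to be terminated by hand; printing the result with cout stops exactly at the last character read.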

    • mickey0
      New Member
      • Jan 2008
      • 142

      #3
      Hello,
      the problem is also what I explained above about CR+LF. So where do I have to put the '\0'? In which position? If I set it at _buffer[size+1] there are still strange characters in _buffer.

      Could you help once more?

      (I already know the string solution.)

      • JosAH
        Recognized Expert MVP
        • Mar 2007
        • 11453

        #4
        You can read the file in 'binary' mode instead of 'text' mode. That way no folding
        of '\r\n' to '\n' is done and you'll know exactly how large the buffer will be. Optionally
        you can do the folding yourself afterwards and realloc the buffer to get rid of the
        tail of the original buffer.
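
A minimal sketch of the binary-mode approach, keeping the tellg/seekg/read sequence from the original question. The function name read_file_binary and its length out-parameter are hypothetical; the terminator is written at the position reported by gcount(), which is the number of characters actually stored, so it is also the right position when fewer characters arrive than tellg() suggested.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>

// Hypothetical helper: returns a new[]-allocated, '\0'-terminated copy of the
// file, or nullptr if it cannot be opened. The caller must delete[] the result.
char* read_file_binary(const char* name, std::streamsize& length)
{
    length = 0;
    // binary: no "\r\n" -> "\n" folding; ate: start positioned at the end
    std::ifstream file(name, std::ios::binary | std::ios::ate);
    if (!file) return nullptr;

    const std::streamoff size = file.tellg();   // file size in bytes
    if (size < 0) return nullptr;

    char* buffer = new char[static_cast<std::size_t>(size) + 1]; // +1 for '\0'
    file.seekg(0, std::ios::beg);
    file.read(buffer, static_cast<std::streamsize>(size));

    length = file.gcount();   // characters actually stored by read()
    buffer[length] = '\0';    // length <= size, so this index is inside the buffer
    return buffer;
}
```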

        kind regards,

        Jos

        • mickey0
          New Member
          • Jan 2008
          • 142

          #5
          In binary mode, size = file.tellg(); and size = file.gcount(); return the same number, but when I print _buffer there are still strange characters at the end. Doing the following, however, the strange characters don't appear:

          Code:
                  ........................................
          	_buffer = new char [size+1];	
          	//memset( _buffer, 0, size );
          	file.seekg (0, ios::beg);	
          	file.read (_buffer, size);
          	_buffer[size] = '\0';
          	file.close();
          What does that mean?
          BUT it isn't clear to me why, after allocating one extra char (size+1), I have to set '\0' on the char at position 'size' (if I try _buffer[size+1] = '\0'; the program crashes at runtime).

          Thanks.

          • mickey0
            New Member
            • Jan 2008
            • 142

            #6
            Hello,
            I assume that the code above is the right solution (even without memset).
            Thanks

            • RRick
              Recognized Expert Contributor
              • Feb 2007
              • 463

              #7
              The '\000' character at the end of an array of characters is the standard C way of marking the end of a string. When you create a string like "Hello", you only see 5 characters, but the real storage requirement is 6 (5 chars + '\000'). That's why you want your buffer to be size + 1.

              If your buffer is not large enough, you'll start overwriting other parts of your program. Sometimes you crash and burn and other times you don't.
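
The "Hello" arithmetic can be checked with a small sketch. The helper name copy_c_string is hypothetical and only illustrates which index receives the terminator: an array of length + 1 elements has valid indices 0 through length, so the '\000' goes at index length, and index length + 1 is already outside the allocation.

```cpp
#include <cstddef>
#include <cstring>

// The literal "Hello" occupies 6 chars: 5 visible characters plus '\0'.
static_assert(sizeof("Hello") == 6, "string literal includes its terminator");

// Hypothetical helper: makes a new[]-allocated copy of a C string.
// The caller must delete[] the result.
char* copy_c_string(const char* text)
{
    const std::size_t length = std::strlen(text); // 5 for "Hello"
    char* copy = new char[length + 1];            // valid indices: 0 .. length
    std::memcpy(copy, text, length);
    copy[length] = '\0';                          // last valid element
    // copy[length + 1] would be past the end of the array: undefined
    // behaviour, which may or may not show up as a crash.
    return copy;
}
```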
