File Pointers - getting a logical problem..

  • souravmallik
    New Member
    • May 2007
    • 11

    File Pointers - getting a logical problem..

    Hello,

    I'm facing a big logical problem while writing a parser in VC++ using C.

    I have to parse files in chunks of bytes in a round-robin fashion.
    That is, when I select a file, the parser reads the first 512 KB (IBUFFSIZE) of data, then moves to the next file and parses it the same way. This way I can parse a number of files spread over different directories uniformly.

    I'm keeping metadata in a file where I track which files have been parsed and how much of each file has been parsed so far.
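
    (As an illustration only, the per-file record in that metadata file might look something like the struct below; the field names mirror the variables used in the code further down, and the path length is an assumption.)

    [code=c]
    /* Illustrative metadata record, one per file being parsed. */
    struct FileMeta {
        char filename[260];          /* path of the file */
        long TotalFileSize;          /* total size of the file in bytes */
        long AlreadyParsedFileSize;  /* bytes parsed so far; the next pass resumes here */
    };
    [/code]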

    Now, I'm moving the file pointer using the fseek() function.

    [CODE=cpp] // if the file is being parsed for the first time
    if ( TotalFileSize > IBUFFSIZE){
        fseek(in_file_pointer, IBUFFSIZE, SEEK_SET);
        FileSizeToParse = ftell(in_file_pointer);
    }

    // if the file is being parsed for the second time
    FileSize = TotalFileSize - AlreadyParsedFileSize;

    if ( FileSize > IBUFFSIZE){
        fseek(in_file_pointer, (AlreadyParsedFileSize + IBUFFSIZE), SEEK_SET);
        FileSizeToParse = ftell(in_file_pointer) - AlreadyParsedFileSize;
    }

    /* setting the file to the position from where to parse [for the first time it's 0, in the second pass it will be the amount that has already been parsed] */

    fseek(in_file_pointer, AlreadyParsedFileSize, SEEK_SET);

    // the loop to read data into the buffer and parse the data in memory
    while ((EOFFLAG = fgets(ibuff, FileSizeToParse, in_file_pointer)) != NULL) {
        // parsing logic comes here
    }
    [/CODE]

    The PROBLEM with this logic is: the first time it parses the chunk of data, it parses OK.
    But when the file pointer is moved to the already-parsed position and data is then fetched from the file with fgets(), it retrieves the entire chunk of data from the beginning of the file up to the specified location. I.e. instead of reading from AlreadyParsedFileSize for IBUFFSIZE bytes, it's taking everything from the beginning of the file to AlreadyParsedFileSize + IBUFFSIZE.

    Is there any method of specifying a From byte and a To byte in the fgets() function? Because of this bug the parser is parsing data that has already been parsed. I'm getting duplicate data, and the number of duplications is the number of times the file has been read.
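
    (For reference, fgets() has no from/to arguments; its second argument is only the maximum number of characters it may store. Reading an exact byte range is normally done with fseek() plus fread(); a minimal sketch, with placeholder names, would be:)

    [code=c]
    #include <stdio.h>

    /* Read 'count' bytes starting at byte offset 'from' into 'buf'.
       Returns the number of bytes actually read. */
    size_t read_range(FILE *fp, long from, size_t count, char *buf)
    {
        if (fseek(fp, from, SEEK_SET) != 0)
            return 0;
        return fread(buf, 1, count, fp);
    }
    [/code]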

    Can anyone suggest/advise how to get this done? As I'm using the Windows OS (VC++), I can't use many built-in C functions for file operations.

    I have got a lot of solutions from this site that helped me build this parser, so I hope this time too I'll get a solution to this nagging bug.




    Thanks
    SouravM
    Last edited by r035198x; Jul 6 '07, 12:28 PM. Reason: added code tags
  • weaknessforcats
    Recognized Expert Expert
    • Mar 2007
    • 9214

    #2
    Why are you not just reading the file in 512 byte chunks??
    [code=c]
    char str[IBUFFSIZE];
    char* rval = fgets(str, IBUFFSIZE, in_file_pointer);
    if (rval)
    {
        /* parse file here */
    }
    [/code]

    All you need is a FILE* for each of your files; put this inside a function that returns rval. When 0 is returned, you are at the end of that file.
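
    (A minimal sketch of that shape, assuming the FILE*s have already been opened elsewhere and a buffer size of 512 bytes:)

    [code=c]
    #include <stdio.h>

    #define IBUFFSIZE 512   /* buffer size used in this example; adjust as needed */

    /* Read the next line (at most IBUFFSIZE-1 characters) from one file.
       Returns ibuff on success, or NULL at the end of that file. */
    char *read_next(FILE *in_file_pointer, char *ibuff)
    {
        return fgets(ibuff, IBUFFSIZE, in_file_pointer);
    }

    /* One round-robin pass over several already-open files. */
    void parse_pass(FILE *files[], int nfiles)
    {
        char ibuff[IBUFFSIZE];
        int i;

        for (i = 0; i < nfiles; ++i) {
            if (files[i] == NULL)
                continue;                 /* this file is already finished */
            if (read_next(files[i], ibuff) == NULL) {
                fclose(files[i]);         /* 0 (NULL) returned: end of this file */
                files[i] = NULL;
            } else {
                /* parse ibuff here */
            }
        }
    }
    [/code]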


    • souravmallik
      New Member
      • May 2007
      • 11

      #3
      Originally posted by weaknessforcats
      Why are you not just reading the file in 512 byte chunks??
      [code=c]
      char str[IBUFFSIZE];
      char* rval = fgets(str, IBUFFSIZE, in_file_pointer);
      if (rval)
      {
          /* parse file here */
      }
      [/code]

      All you need is a FILE* for each of your files; put this inside a function that returns rval. When 0 is returned, you are at the end of that file.
      Thanks for pointing out the problem!!

      Actually I'm parsing data that is delimited by the newline character, so I'm using the while loop with the fgets() function.

      The fgets function reads characters from the stream and stores them as a C string into str until (num-1) characters have been read, or a newline or the end-of-file is reached, whichever comes first.

      I had the impression that the IBUFFSIZE I'm using for the chunk read would read that much data from the file, but now I realize that the fgets function only fills the ibuff variable up to the newline character.

      So the code you suggested won't work for me. I need something that will get a chunk of data into a buffer, and then within the buffer I can use fgets() to scan each newline-terminated record.

      Can you advise how to do that?

      Thanks for understanding my problem...
      Regards
      Sourav


      • weaknessforcats
        Recognized Expert Expert
        • Mar 2007
        • 9214

        #4
        Originally posted by souravmallik
        So the code you suggested won't work for me. I need something that will get a chunk of data into a buffer, and then within the buffer I can use fgets() to scan each newline-terminated record.
        fgets() will fetch characters until a \n. This may overrun your buffer. You can't have it both ways. Either you read 512 bytes or you read a string of any length.

        If in fact these are strings and the length can be anything, I would use a nominal buffer that is guaranteed to hold 85% of the strings. Then I would call getch() until the buffer was full or I encountered a \n. Then I would call strcat() and refill the buffer until I reached the \n.

        fgets() will require your buffer to be larger than the string by at least 2 bytes. It puts the \n in the buffer followed by a \0.
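
        (As a rough sketch of that idea, reading single characters, here with standard fgetc() in place of getch(), and appending each filled piece with strcat(); the buffer sizes are assumptions:)

        [code=c]
        #include <stdio.h>
        #include <string.h>

        #define NOMINAL     128   /* nominal buffer: assumed to hold most records */
        #define RECORD_MAX 4096   /* assumed upper bound for one full record */

        /* Read one newline-terminated record into 'record'.
           Returns 1 if anything was read, 0 at end of file. */
        int read_record(FILE *fp, char record[RECORD_MAX])
        {
            char piece[NOMINAL];
            size_t n = 0;
            int ch, got_any = 0;

            record[0] = '\0';

            while ((ch = fgetc(fp)) != EOF) {
                got_any = 1;
                piece[n++] = (char)ch;

                /* piece is full or the record ended: append it and refill */
                if (n == NOMINAL - 1 || ch == '\n') {
                    piece[n] = '\0';
                    strcat(record, piece);
                    n = 0;
                    if (ch == '\n')
                        break;
                }
            }

            if (n > 0) {              /* file ended without a final newline */
                piece[n] = '\0';
                strcat(record, piece);
            }
            return got_any;
        }
        [/code]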


        • souravmallik
          New Member
          • May 2007
          • 11

          #5
          Originally posted by weaknessforcats
          fgets() will fetch characters until a \n. This may overrun your buffer. You can't have it both ways. Either you read 512 bytes or you read a string of any length.

          If in fact these are strings and the length can be anything, I would use a nominal buffer that is guaranteed to hold 85% of the strings. Then I would call getch() until the buffer was full or I encountered a \n. Then I would call strcat() and refill the buffer until I reached the \n.

          fgets() will require your buffer to be larger than the string by at least 2 bytes. It puts the \n in the buffer followed by a \0.

          Thanks for answering... I have coped with my problem somehow.

          What I'm doing is:
          I'm scanning the whole file with fgets() using a minimal buffer. Then, to implement the 'chunk read', I'm using the ftell() function to get the file position after every fgets(); when the position shows that 512 KB (or whatever chunk value was set beforehand) has been read, it breaks out of the loop and continues parsing the next file.
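
          (Roughly, the loop looks like the sketch below, where AlreadyParsedFileSize is the offset stored in the metadata file and the buffer/chunk sizes are assumptions:)

          [code=c]
          #include <stdio.h>

          #define IBUFFSIZE (512 * 1024)   /* chunk to parse per pass */
          #define LINEBUF   1024           /* small buffer for one record */

          /* Parse at most IBUFFSIZE bytes of one file, resuming at AlreadyParsedFileSize.
             Returns the new offset to write back to the metadata file. */
          long parse_chunk(FILE *in_file_pointer, long AlreadyParsedFileSize)
          {
              char ibuff[LINEBUF];
              long pos = AlreadyParsedFileSize;

              fseek(in_file_pointer, AlreadyParsedFileSize, SEEK_SET);

              while (fgets(ibuff, LINEBUF, in_file_pointer) != NULL) {
                  /* parsing logic comes here */

                  pos = ftell(in_file_pointer);   /* position after this record */
                  if (pos - AlreadyParsedFileSize >= IBUFFSIZE)
                      break;                      /* chunk done; move on to the next file */
              }
              return pos;
          }
          [/code]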

          Will ftell() make my program slower, as it's called every time through the loop (theoretically)?

          This logic is working for me... can you advise anything else?

          Thanks again..

          Sourav


          • weaknessforcats
            Recognized Expert Expert
            • Mar 2007
            • 9214

            #6
            Originally posted by souravmallik
            Will ftell() make my program slower, as it's called every time through the loop (theoretically)?
            Forget issues of fast or slow until the program is working. And keep in mind that an issue at 1 MHz is 30% of an issue at 3 MHz.

            Concentrate on bug-free operation. Leave the optimization until it is absolutely needed.


            • r035198x
              MVP
              • Sep 2006
              • 13225

              #7
              Originally posted by weaknessforcats
              Forget issues of fast or slow until the program is working. And keep in mind that an issue at 1 MHz is 30% of an issue at 3 MHz.

              Concentrate on bug-free operation. Leave the optimization until it is absolutely needed.
              I don't wish to scare you, weakness, but you may want to know that there are some cats looming close by.


              • souravmallik
                New Member
                • May 2007
                • 11

                #8
                Originally posted by weaknessforcats
                Forget issues of fast or slow until the program is working. And keep in mind that an issue at 1 MHz is 30% of an issue at 3 MHz.

                Concentrate on bug-free operation. Leave the optimization until it is absolutely needed.

                Yes, my program is working fine... and I'm getting what I was looking for.
                Thanks.

