Reading in data from very large flat files

  • booksnore

    Reading in data from very large flat files

    I have to read data from a flat file with millions of records, and I
    wanted to find the most efficient way of doing this. I was just going
    to use a StreamReader and then break up each input line using
    Substring, as there are no delimiters; I do, however, have a spec for
    the format of the file. Is using Substring the only way to do this, or
    is there a more efficient way?


    while ((line = sr.ReadLine()) != null)
    {
        string param1 = line.Substring(0, 5);
        string param2 = line.Substring(5, 2);
        // etc., etc.
    }
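For completeness, a rough self-contained version of the loop above might look like this (the file path and field widths here are placeholders, not from the real spec):

```csharp
using System;
using System.IO;

class FixedWidthReader
{
    static void Main()
    {
        // "records.dat" is a placeholder path; the widths below are examples.
        using (StreamReader sr = new StreamReader("records.dat"))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                string param1 = line.Substring(0, 5);
                string param2 = line.Substring(5, 2);
                // ...validate/process param1 and param2 here...
            }
        }
    }
}
```

The `using` block ensures the reader (and the underlying file handle) is disposed even if processing throws.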

    regards,

    Joe

    *** Sent via Developersdex http://www.developersdex.com ***
  • Jon Skeet [C# MVP]

    #2
    Re: Reading in data from very large flat files

    booksnore <booksnore@netscape.net> wrote:
    > I have to read data from a flat file with millions of records. I wanted
    > to find the most efficient way of doing this. I was just going to use a
    > StreamReader and then break up the input line using Substring as there
    > are no delimiters however I have a spec for the format of the file. Is
    > using Substring the only way to do this or is there a more efficient
    > way?
    >
    >
    > while ((line = sr.ReadLine()) != null)
    > {
    >     string param1 = line.Substring(0, 5);
    >     string param2 = line.Substring(5, 2);
    >     // etc., etc.
    > }

    That's a pretty efficient way of reading it. Are you then storing the
    data in memory, or just processing each line in turn? If you're storing
    them and there are lots of little fields, you might consider storing
    just the whole line, and breaking it into bits when it's used. Each
    string has a certain overhead, and if you have lots of strings with
    just a few characters, that overhead could become significant.
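Jon's suggestion of storing just the whole line and breaking it into bits on demand could be sketched roughly like this (the field names, offsets, and widths are illustrative, not from the real spec):

```csharp
// Stores only the raw fixed-width line; fields are materialised on
// demand, so no per-field string objects exist while records sit
// in memory.
class Record
{
    private readonly string line;

    public Record(string line)
    {
        this.line = line;
    }

    // Offsets/widths below are illustrative placeholders.
    public string Param1 { get { return line.Substring(0, 5); } }
    public string Param2 { get { return line.Substring(5, 2); } }
}
```

Each `Record` costs one string plus a small object header, instead of one string per field.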

    --
    Jon Skeet - <skeet@pobox.com>
    http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
    If replying to the group, please do not mail me too


    • David Pendrey

      #3
      Re: Reading in data from very large flat files

      It depends on what you're doing with the data once you've read it. If
      you only need to read the data sequentially, then that's a good
      method. If you need to jump frequently from one record to another at
      random, you should be able to exploit each line being a fixed width
      to seek forwards/backwards in the file and read only the desired
      information as required. But as Jon said, if you are reading the file
      sequentially, that's probably the best way (there are other ways, but
      the gain wouldn't be worth the coding, trust me).
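The fixed-width seek idea might look roughly like this, assuming every record (including its line terminator) occupies the same number of bytes; the record length and encoding here are assumptions:

```csharp
using System.IO;
using System.Text;

class RandomRecordReader
{
    // 80 data bytes + 2 for CRLF; illustrative, not from the real spec.
    const int RecordLength = 82;

    // Reads record number 'index' (0-based) without scanning
    // through the preceding records.
    static string ReadRecord(FileStream fs, long index)
    {
        fs.Seek(index * RecordLength, SeekOrigin.Begin);
        byte[] buffer = new byte[RecordLength];
        int bytesRead = fs.Read(buffer, 0, RecordLength);
        return Encoding.ASCII.GetString(buffer, 0, bytesRead);
    }
}
```

This only works because the records are fixed width; with variable-length lines there is no way to compute a record's byte offset directly.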

      "Jon Skeet [C# MVP]" <skeet@pobox.com> wrote in message
      news:MPG.1dc14a325cc6dde398c966@msnews.microsoft.com...
      > booksnore <booksnore@netscape.net> wrote:
      >> I have to read data from a flat file with millions of records. I wanted
      >> to find the most efficient way of doing this. I was just going to use a
      >> StreamReader and then break up the input line using Substring as there
      >> are no delimiters however I have a spec for the format of the file. Is
      >> using Substring the only way to do this or is there a more efficient
      >> way?
      >>
      >>
      >> while ((line = sr.ReadLine()) != null)
      >> {
      >>     string param1 = line.Substring(0, 5);
      >>     string param2 = line.Substring(5, 2);
      >>     // etc., etc.
      >> }
      >
      > That's a pretty efficient way of reading it. Are you then storing the
      > data in memory, or just processing each line in turn? If you're storing
      > them and there are lots of little fields, you might consider storing
      > just the whole line, and breaking it into bits when it's used. Each
      > string has a certain overhead, and if you have lots of strings with
      > just a few characters, that overhead could become significant.
      >
      > --
      > Jon Skeet - <skeet@pobox.com>
      > http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
      > If replying to the group, please do not mail me too



      • booksnore

        #4
        Re: Reading in data from very large flat files


        Thanks for the replies. There will be some validation checks on the
        values of the resulting variable assignments, but those will be on
        a line-by-line basis (so, for example, I won't have to jump from
        one record and check something against the last 10 records). After
        the validation checks, the data is loaded into SQL Server; I was
        going to batch-insert by creating an XML document and feeding it to
        a stored procedure using OPENXML. I am also going to
        performance-test that method against a DTS package load, although I
        am not sure to what degree I can perform effective validation
        checks using DTS.
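The OPENXML batching step might be invoked from C# along these lines; the stored-procedure name, parameter name, and connection string below are hypothetical (the proc would call sp_xml_preparedocument / OPENXML internally):

```csharp
using System.Data;
using System.Data.SqlClient;

class BulkLoader
{
    // "usp_LoadRecords" and "@xmlDoc" are hypothetical names,
    // not from the thread or any real schema.
    static void LoadBatch(string connectionString, string xmlBatch)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            SqlCommand cmd = new SqlCommand("usp_LoadRecords", conn);
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@xmlDoc", SqlDbType.NText).Value = xmlBatch;
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```

Sending the batch as one XML document means one round trip per batch rather than one INSERT per record.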

        Joe



        *** Sent via Developersdex http://www.developersdex.com ***
