reading large file

  • guillaume

    reading large file

    I have to read and process a large ASCII file containing a mesh: a
    list of points and triangles.
    The file is 100 MBytes.

    I first tried to do it in memory, but I think I was running out of
    memory, so I decided to use the shelve
    module to store my points and elements on disk.
    Despite this, it is slow... Any hints? I think I still have the same
    memory problem, but I don't understand why,
    since my aPoint should be removed by the gc.

    Do you have any ideas?

    Thanks

    Guillaume

    PS: here is the code, for your information:

    import string
    import os
    import sys
    import time
    import resource
    import shelve
    import psyco

    psyco.full()

    class point:
        def __init__(self, x, y, z):
            self.x = x
            self.y = y
            self.z = z


    def SFMImport(filename):
        print 'UNV Import ("%s")' % filename

        db = shelve.open('points.db')

        file = open(filename, "r")

        faceList = []

        # header line: "<number of triangles> <number of points>"
        line = file.readline()
        words = string.split(line)
        nbpoints = string.atoi(words[1])
        nbtrias = string.atoi(words[0])

        print "found %s points and %s triangles" % (nbpoints, nbtrias)

        # point lines: "<index> <x> <y> <z>", with Fortran-style 'D' exponents
        t1 = time.time()
        t2 = time.time()
        for i in range(nbpoints):
            line = file.readline()
            words = string.split(line)

            x = string.atof(words[1].replace("D", "E"))
            y = string.atof(words[2].replace("D", "E"))
            z = string.atof(words[3].replace("D", "E"))

            aPoint = point(x, y, z)

            # 'as' is a reserved word from Python 2.6 on, so use another name
            key = "point%s" % i

            if (i % 250000 == 0):
                print "%7d points <%s>" % (i, time.time() - t1)
                t1 = time.time()

            db[key] = aPoint

        # t2 times the whole loop; t1 is reset at every progress message
        print "%s points read in %s seconds" % (nbpoints, time.time() - t2)
        db.close()

        # triangle lines: "<i1> <i2> <i3>", kept in memory in faceList
        t1 = time.time()
        t2 = time.time()
        for i in range(nbtrias):
            line = file.readline()
            words = string.split(line)

            i1 = string.atoi(words[0])
            i2 = string.atoi(words[1])
            i3 = string.atoi(words[2])

            faceList.append((i1, i2, i3))

            if (i % 100000 == 0):
                print "%s faces <%s>" % (i, time.time() - t1)
                t1 = time.time()

        print "%s triangles read in %s seconds" % (nbtrias, time.time() - t2)

        file.close()

    def callback(fs):
        filename = fs.filename
        SFMImport(filename)


    if __name__ == "__main__":
        # try:
        #     import GUI
        # except:
        #     print "This script is only working with the new GUI module ...."
        # else:
        #     fs = GUI.FileSelector()
        #     fs.activate(callback, fs)
        print sys.argv[0]
        SFMImport(sys.argv[1])
  • Michael Peuser

    #2
    Re: reading large file


    "guillaume" <g_alleon@yahoo.fr> wrote in message
    news:9bd0dd3f.0309030400.7fa2dc0f@posting.google.com...
    > I have to read and process a large ASCII file containing a mesh: a
    > list of points and triangles.
    > The file is 100 MBytes.
    >
    > I first tried to do it in memory, but I think I was running out of
    > memory, so I decided to use the shelve
    > module to store my points and elements on disk.
    > Despite this, it is slow... Any hints? I think I still have the same
    > memory problem, but I don't understand why,
    > since my aPoint should be removed by the gc.

    What do you expect from shelve? I would recommend converting your data
    into a binary format in a first pass (doing all the atoi() calls in this
    pre-pass), then using memory-mapped file access to read it in your work pass.
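A minimal sketch of this two-pass idea for the points, in Python 3. It assumes the layout the original script reads (a header line `<nbtrias> <nbpoints>`, then one `<index> <x> <y> <z>` line per point with Fortran-style `D` exponents); `convert_points` and `load_points` are hypothetical names. The first pass writes raw doubles with `array.tofile`; the work pass maps the binary file with `mmap` and views it through `memoryview.cast("d")`, so coordinates are paged in from disk on demand instead of being rebuilt as Python objects.

```python
import array
import mmap


def convert_points(text_path, bin_path):
    """First pass: parse the ASCII point lines once and write raw doubles."""
    with open(text_path) as src, open(bin_path, "wb") as dst:
        # Assumed header layout (from the original script): "<nbtrias> <nbpoints>"
        _nbtrias, nbpoints = (int(w) for w in src.readline().split()[:2])
        coords = array.array("d")
        for _ in range(nbpoints):
            words = src.readline().split()
            # Fortran-style exponents such as 1.5D+02 need 'D' -> 'E' for float()
            coords.extend(float(w.replace("D", "E")) for w in words[1:4])
        coords.tofile(dst)  # native-endian 8-byte doubles: x y z per point
    return nbpoints


def load_points(bin_path):
    """Work pass: map the binary file and view it as doubles, without copying."""
    with open(bin_path, "rb") as f:
        # The map stays valid after the file object is closed.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    view = memoryview(mm).cast("d")
    return mm, view
```

Point `i` is then `view[3*i:3*i+3].tolist()`. Release the view before closing the map (`view.release()` then `mm.close()`). The doubles are written in the machine's native byte order, so the binary file is meant to be read back on the same kind of machine.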

    But maybe you need a lot of memory for your internal structures as well. If
    you have a small amount of RAM (< 512 MB), the system could be doing a lot of
    swapping. You will notice that when the processor load goes down! The
    cheapest solution is generally to double your RAM.

    Kindly
    Michael P



    • Paul Rubin

      #3
      Re: reading large file

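      g_alleon@yahoo.fr (guillaume) writes:
      > print "found %s points and %s triangles" % (nbpoints, nbtrias)
      >
      > t1 = time.time()
      > for i in range(nbpoints) :

      For another thing, use xrange instead of range here: in Python 2, range
      builds the whole list of nbpoints integers in memory before the loop
      starts, while xrange produces them one at a time.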
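In Python 3, `range` is already lazy (it took over the role of `xrange`), so the loop above no longer builds a list there. A further step in the same spirit is to let the file object itself drive the loop. A minimal Python 3 sketch follows; `read_faces` is a hypothetical helper, and each face line is assumed to start with three integer vertex indices, as in the original script.

```python
import itertools


def read_faces(f, nbtrias):
    """Read exactly nbtrias "i1 i2 i3" lines from an open text file."""
    faces = []
    # islice pulls one line at a time from the file; neither a list of
    # loop indices nor a list of lines is ever built.
    for line in itertools.islice(f, nbtrias):
        i1, i2, i3 = (int(w) for w in line.split()[:3])
        faces.append((i1, i2, i3))
    return faces
```

Because `itertools.islice` stops after exactly `nbtrias` lines, anything that follows in the file is left unread for the next step.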


      • Bengt Richter

        #4
        Re: reading large file

        On 3 Sep 2003 05:00:39 -0700, g_alleon@yahoo.fr (guillaume) wrote:

        >I have to read and process a large ASCII file containing a mesh: a
        >list of points and triangles.
        >The file is 100 MBytes.
        >
        >I first tried to do it in memory, but I think I was running out of
        >memory, so I decided to use the shelve
        >module to store my points and elements on disk.
        >Despite this, it is slow... Any hints? I think I still have the same
        >memory problem, but I don't understand why,
        >since my aPoint should be removed by the gc.
        >
        >Do you have any ideas?
        >
        Since your data is very homogeneous, why don't you store it in a couple of
        homogeneous arrays? You could easily create a class to give you convenient
        access via indices or iterators etc. Also you could write load and store
        methods that could write both arrays in binary to a file. You could
        consider doing this as a separate conversion from your source file, and
        then run your app using the binary files and wrapper class.

        Arrays are described in the array module docs ;-)
        I imagine you'd want to use the 'd' type for points and 'l' for faces.
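A minimal sketch of the wrapper class described above, in Python 3, using the 'd' and 'l' typecodes suggested. The class name `Mesh`, its methods, and the on-disk layout (two counts followed by the raw arrays) are hypothetical choices for illustration, not a standard format.

```python
from array import array


class Mesh:
    """Points and triangles stored in two flat, homogeneous arrays.

    Point i occupies coords[3*i:3*i+3]; face j occupies faces[3*j:3*j+3].
    """

    def __init__(self):
        self.coords = array("d")  # x0, y0, z0, x1, y1, z1, ...
        self.faces = array("l")   # a0, b0, c0, a1, b1, c1, ...

    def add_point(self, x, y, z):
        self.coords.extend((x, y, z))

    def add_face(self, i1, i2, i3):
        self.faces.extend((i1, i2, i3))

    def point(self, i):
        return tuple(self.coords[3 * i:3 * i + 3])

    def face(self, j):
        return tuple(self.faces[3 * j:3 * j + 3])

    def iter_points(self):
        c = self.coords
        for k in range(0, len(c), 3):
            yield c[k], c[k + 1], c[k + 2]

    def store(self, path):
        # Hypothetical layout: two counts, then the raw coordinate and face arrays.
        with open(path, "wb") as f:
            array("l", [len(self.coords) // 3, len(self.faces) // 3]).tofile(f)
            self.coords.tofile(f)
            self.faces.tofile(f)

    @classmethod
    def load(cls, path):
        mesh = cls()
        with open(path, "rb") as f:
            counts = array("l")
            counts.fromfile(f, 2)
            npoints, nfaces = counts
            # fromfile reads each block in one call (EOFError if the file is short)
            mesh.coords.fromfile(f, 3 * npoints)
            mesh.faces.fromfile(f, 3 * nfaces)
        return mesh
```

One caveat on the design: 'l' is a C long, whose size differs between platforms (4 bytes on Windows, 8 on most 64-bit Unix systems), so a file written by `store` is only guaranteed to load on the same kind of platform.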

        Regards,
        Bengt Richter


        • Sophie Alléon

          #5
          Re: reading large file

          Thanks to your comments, it is now possible to read my large file in a
          couple of minutes on my machine.

          Guillaume
          "Bengt Richter" <bokr@oz.net> wrote in message news:
          bj5e61$pjr$0@216.39.172.122...
          > On 3 Sep 2003 05:00:39 -0700, g_alleon@yahoo.fr (guillaume) wrote:
          >
          > >I have to read and process a large ASCII file containing a mesh: a
          > >list of points and triangles.
          > >The file is 100 MBytes.
          > >
          > >I first tried to do it in memory, but I think I was running out of
          > >memory, so I decided to use the shelve
          > >module to store my points and elements on disk.
          > >Despite this, it is slow... Any hints? I think I still have the same
          > >memory problem, but I don't understand why,
          > >since my aPoint should be removed by the gc.
          > >
          > >Do you have any ideas?
          > >
          > Since your data is very homogeneous, why don't you store it in a couple of
          > homogeneous arrays? You could easily create a class to give you convenient
          > access via indices or iterators etc. Also you could write load and store
          > methods that could write both arrays in binary to a file. You could
          > consider doing this as a separate conversion from your source file, and
          > then run your app using the binary files and wrapper class.
          >
          > Arrays are described in the array module docs ;-)
          > I imagine you'd want to use the 'd' type for points and 'l' for faces.
          >
          > Regards,
          > Bengt Richter



          • Bengt Richter

            #6
            Re: reading large file

            On Fri, 5 Sep 2003 08:26:12 +0200, "Sophie Alléon" <alleon@club-internet.fr> wrote:

            <toppost moved to preferred location below ;-) />

            >"Bengt Richter" <bokr@oz.net> wrote in message news:
            >bj5e61$pjr$0@216.39.172.122...
            >> On 3 Sep 2003 05:00:39 -0700, g_alleon@yahoo.fr (guillaume) wrote:
            >>
            >> >I have to read and process a large ASCII file containing a mesh: a
            >> >list of points and triangles.
            >> >The file is 100 MBytes.
            >> >
            >> >I first tried to do it in memory, but I think I was running out of
            >> >memory, so I decided to use the shelve
            >> >module to store my points and elements on disk.
            >> >Despite this, it is slow... Any hints? I think I still have the same
            >> >memory problem, but I don't understand why,
            >> >since my aPoint should be removed by the gc.
            >> >
            >> >Do you have any ideas?
            >> >
            >> Since your data is very homogeneous, why don't you store it in a couple of
            >> homogeneous arrays? You could easily create a class to give you convenient
            >> access via indices or iterators etc. Also you could write load and store
            >> methods that could write both arrays in binary to a file. You could
            >> consider doing this as a separate conversion from your source file, and
            >> then run your app using the binary files and wrapper class.
            >>
            >> Arrays are described in the array module docs ;-)
            >> I imagine you'd want to use the 'd' type for points and 'l' for faces.
            >>
            >> Regards,
            >> Bengt Richter
            >
            >
            <topPostText>
            >Thanks to your comments, it is now possible to read my large file in a
            >couple of minutes on my machine.
            >
            >Guillaume
            </topPostText>

            Well, so long as you're happy, glad to have played a role ;-)

            But I would think that time could still be cut a fair amount. E.g., I imagine just copying
            your file at the command line might take 20-25 seconds, depending on your system,
            and if you have a fast processor you should be largely I/O bound, so much of
            the conversion work should be able to happen while waiting for the disk.

            There doesn't seem to be any way to tell the array module the estimated (over-estimated or exact)
            capacity of an array that is yet to be populated, but I would think such a feature in the array
            module would be good for your kind of application. (Of course, hopefully the fromfile method grows
            the array with a single memory allocation, but you can't use that if your data requires conversion
            or filtering. Per-line scanf/printf-style conversion from/to ASCII files might be another useful feature?)
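Python 3's `array` module still has no way to reserve capacity. One workaround, sketched below under the same file-layout assumptions as the original script (`<index> <x> <y> <z>` lines with Fortran-style `D` exponents), is to create the array at its final size with a single repetition (`array("d", [0.0]) * n`) and then fill it by index. `parse_points_prealloc` is a hypothetical name. Since `extend()` already over-allocates as it grows, whether this is measurably faster is something to time on your own data rather than assume.

```python
from array import array
from itertools import islice


def parse_points_prealloc(lines, nbpoints):
    """Parse nbpoints "<index> <x> <y> <z>" lines into a pre-sized array('d')."""
    coords = array("d", [0.0]) * (3 * nbpoints)  # created at its final size
    k = 0
    for line in islice(lines, nbpoints):
        _, x, y, z = line.split()[:4]
        # Fortran-style 'D' exponents are converted to 'E' for float()
        coords[k] = float(x.replace("D", "E"))
        coords[k + 1] = float(y.replace("D", "E"))
        coords[k + 2] = float(z.replace("D", "E"))
        k += 3
    if k != len(coords):
        raise ValueError("expected %d points, got %d" % (nbpoints, k // 3))
    return coords
```

Passing an open file as `lines` leaves it positioned at the first triangle line, because `islice` reads exactly `nbpoints` lines.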

            Anyway, even as is, I'd bet we could get the time down to under a minute, if it was important.
            Of course, a couple of minutes is not bad if you're not going to do it over and over.

            Regards,
            Bengt Richter


            • Adam Przybyla

              #7
              Re: reading large file

              guillaume <g_alleon@yahoo.fr> wrote:
              > I have to read and process a large ASCII file containing a mesh: a
              > list of points and triangles.
              > The file is 100 MBytes.
              >
              > I first tried to do it in memory, but I think I was running out of
              > memory, so I decided to use the shelve
              > module to store my points and elements on disk.
              > Despite this, it is slow... Any hints? I think I still have the same
              > memory problem, but I don't understand why,
              > since my aPoint should be removed by the gc.
              >
              > Do you have any ideas?
              ... try PyTables ;-)
              Regards
              Adam Przybyla
