Cache a large list to disk

  • Chris

    Cache a large list to disk

    I have a set of routines, the first of which reads lots and lots of
    data from disparate regions of disk. This read routine takes 40
    minutes on a P3-866 (with IDE drives). This routine populates an
    array with a number of dictionaries, e.g.,

    [{'el2': 0, 'el3': 0, 'el1': 0, 'el4': 0, 'el5': 0},
    {'el2': 15, 'el3': 21, 'el1': 9, 'el4': 33, 'el5': 51},
    {'el2': 35, 'el3': 49, 'el1': 21, 'el4': 77, 'el5': 119},
    {'el2': 45, 'el3': 63, 'el1': 27, 'el4': 99, 'el5': 153}]
    (not actually the data i'm reading)

    This information is acted upon by subsequent routines. These routines
    change very often, but the data changes very infrequently (the
    opposite pattern of what I'm used to). This data changes once per
    week, so I can safely cache this data to a big file on disk, and read
    out of this big file -- rather than having to read about 10,000 files
    -- when the program is loaded.

    Now, if this were C I'd know how to do this in a pretty
    straightforward manner. But being new to Python, I don't know how I
    can (hopefully easily) write this data to a file, and then read it out
    into memory on subsequent launches.

    If anyone can provide some pointers, or even some sample code on how
    to accomplish this, it would be greatly appreciated.

    Thanks in advance for any help.
    -cjl
  • Peter Otten

    #2
    Re: Cache a large list to disk

    Chris wrote:
    > week, so I can safely cache this data to a big file on disk, and read
    > out of this big file -- rather than having to read about 10,000 files
    > -- when the program is loaded.
    >
    > Now, if this were C I'd know how to do this in a pretty
    > straightforward manner. But being new to Python, I don't know how I
    > can (hopefully easily) write this data to a file, and then read it out
    > into memory on subsequent launches.

    Have a look at pickle:
    >>> data = [{'el2': 0, 'el3': 0, 'el1': 0, 'el4': 0, 'el5': 0},
    ... {'el2': 15, 'el3': 21, 'el1': 9, 'el4': 33, 'el5': 51},
    ... {'el2': 35, 'el3': 49, 'el1': 21, 'el4': 77, 'el5': 119},
    ... {'el2': 45, 'el3': 63, 'el1': 27, 'el4': 99, 'el5': 153}]
    >>>
    >>> import cPickle as pickle # cPickle is pickle implemented in C
    >>> pickle.dump(data, file("tmp.pickle", "w"))
    >>> data_reloaded = pickle.load(file("tmp.pickle"))
    >>> data_reloaded == data, data_reloaded is data
    (True, False)
    >>> data_reloaded
    [{'el2': 0, 'el3': 0, 'el1': 0, 'el4': 0, 'el5': 0},
    {'el2': 15, 'el3': 21, 'el1': 9, 'el4': 33, 'el5': 51},
    {'el2': 35, 'el3': 49, 'el1': 21, 'el4': 77, 'el5': 119},
    {'el2': 45, 'el3': 63, 'el1': 27, 'el4': 99, 'el5': 153}]
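
The session above is Python 2 (cPickle and file() no longer exist in Python 3, where the plain pickle module uses its C implementation automatically). A rough Python 3 sketch of the same idea, wrapped in a load-or-rebuild helper, might look like the following. The names load_cached, slow_read and max_age_seconds, the one-week default and the cache path are all assumptions made for illustration, not anything from the thread. Only unpickle files you wrote yourself, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile
import time


def load_cached(path, build, max_age_seconds=7 * 24 * 3600):
    """Return the object cached at `path`, rebuilding it when missing or stale.

    Hypothetical helper: `build` is any zero-argument function that
    produces the data the slow way.
    """
    try:
        fresh = time.time() - os.path.getmtime(path) < max_age_seconds
    except OSError:  # the cache file does not exist yet
        fresh = False

    if fresh:
        with open(path, "rb") as f:  # pickles are bytes: use binary mode
            return pickle.load(f)

    data = build()  # the slow read would happen here
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return data


def slow_read():
    # Stand-in for the real routine that scans about 10,000 files.
    return [{"el1": 9 * i, "el2": 15 * i} for i in range(4)]


cache_path = os.path.join(tempfile.mkdtemp(), "data.pickle")
first = load_cached(cache_path, slow_read)   # builds the data and writes the cache
second = load_cached(cache_path, slow_read)  # reads the cache back from disk
```

The second call returns an equal but distinct list, which matches the (True, False) result in the session above.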

    Peter


    • Karl Chen

      #3
      Re: Cache a large list to disk


      Use the pickle or shelve modules.
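
For the shelve option, a minimal Python 3 sketch could look like this. A shelf behaves like a dictionary stored on disk whose values are pickled individually, so a single entry can be read without loading everything. The key names "row0" and "row1" and the file location are made up for the example.

```python
import os
import shelve
import tempfile

# Hypothetical location for the shelf; the backend may add its own file suffix.
path = os.path.join(tempfile.mkdtemp(), "cache_shelf")

# Write: each value is pickled and stored under a string key.
with shelve.open(path) as shelf:
    shelf["row0"] = {"el1": 0, "el2": 0}
    shelf["row1"] = {"el1": 9, "el2": 15}

# Read back on a later run: only the requested entry is unpickled.
with shelve.open(path) as shelf:
    row1 = shelf["row1"]
    keys = sorted(shelf.keys())
```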




      >>>>> "Chris" == Chris <iamlevis3@hotmail.com> writes:
      Chris> straightforward manner. But being new to Python, I
      Chris> don't know how I can (hopefully easily) write this data
      Chris> to a file, and then read it out into memory on
      Chris> subsequent launches.

      --
      Karl 2004-05-17 13:44



      • Svein Ove Aas

        #4
        Re: Cache a large list to disk

        Peter Otten wrote:
        > Chris wrote:
        >
        >> week, so I can safely cache this data to a big file on disk, and read
        >> out of this big file -- rather than having to read about 10,000 files
        >> -- when the program is loaded.
        >>
        >> Now, if this were C I'd know how to do this in a pretty
        >> straightforward manner. But being new to Python, I don't know how I
        >> can (hopefully easily) write this data to a file, and then read it out
        >> into memory on subsequent launches.
        >
        > Have a look at pickle:
        >
        > >>> data = [{'el2': 0, 'el3': 0, 'el1': 0, 'el4': 0, 'el5': 0},
        > ... {'el2': 15, 'el3': 21, 'el1': 9, 'el4': 33, 'el5': 51},
        > ... {'el2': 35, 'el3': 49, 'el1': 21, 'el4': 77, 'el5': 119},
        > ... {'el2': 45, 'el3': 63, 'el1': 27, 'el4': 99, 'el5': 153}]
        > >>>
        > >>> import cPickle as pickle # cPickle is pickle implemented in C

        And, yes, cPickle is faster. A lot faster.

        There are switches you can throw to have it use binary instead of sticking
        to readable characters for some savings, too.
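
The "switch" is the protocol argument to dump and dumps. As a small sketch in Python 3 syntax, this compares the text-oriented protocol 0 with the newest binary protocol on made-up data; the exact sizes depend on the data and the Python version.

```python
import pickle

# Made-up data in the same shape as the sample in the question.
data = [{"el1": 9 * i, "el2": 15 * i, "el3": 21 * i} for i in range(1000)]

# Protocol 0 is the original human-readable format.
text_blob = pickle.dumps(data, protocol=0)

# HIGHEST_PROTOCOL selects the most compact binary format available.
binary_blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

print(len(text_blob), len(binary_blob))
```

A binary pickle must be written to and read from a file opened in binary mode ("wb" and "rb").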


        • Paul Rubin

          #5
          Re: Cache a large list to disk

          iamlevis3@hotmail.com (Chris) writes:
          > I have a set of routines, the first of which reads lots and lots of
          > data from disparate regions of disk. This read routine takes 40
          > minutes on a P3-866 (with IDE drives). This routine populates an
          > array with a number of dictionaries, e.g.,
          >
          > [{'el2': 0, 'el3': 0, 'el1': 0, 'el4': 0, 'el5': 0},
          > {'el2': 15, 'el3': 21, 'el1': 9, 'el4': 33, 'el5': 51},
          > {'el2': 35, 'el3': 49, 'el1': 21, 'el4': 77, 'el5': 119},
          > {'el2': 45, 'el3': 63, 'el1': 27, 'el4': 99, 'el5': 153}]
          > (not actually the data i'm reading)

          The dict entries are the same for each list item?

          > Now, if this were C I'd know how to do this in a pretty
          > straightforward manner. But being new to Python, I don't know how I
          > can (hopefully easily) write this data to a file, and then read it out
          > into memory on subsequent launches.
          >
          > If anyone can provide some pointers, or even some sample code on how
          > to accomplish this, it would be greatly appreciated.

          I dunno what the question is. You can open files, seek on them, etc.
          in Python just like in C. You can use the mmap module to map a file
          into memory. If you want to lose some efficiency, you can write out
          the Python objects (dicts, lists, etc) with the pickle or cPickle modules.
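
If every dict really has the same five keys, the C-style approach could be sketched as fixed-size binary records read with seek. This is a Python 3 sketch under that assumption; the record layout (five little-endian 32-bit integers in the order el1..el5), the file name and the read_row helper are invented for illustration.

```python
import os
import struct
import tempfile

# Assumed layout: five little-endian 32-bit ints per record, el1..el5.
RECORD = struct.Struct("<5i")

rows = [
    (0, 0, 0, 0, 0),
    (9, 15, 21, 33, 51),
    (21, 35, 49, 77, 119),
    (27, 45, 63, 99, 153),
]

path = os.path.join(tempfile.mkdtemp(), "rows.bin")

# Write all records back to back.
with open(path, "wb") as f:
    for row in rows:
        f.write(RECORD.pack(*row))


def read_row(path, index):
    """Read one record by seeking straight to its byte offset."""
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))


third = read_row(path, 2)  # fetches only the 20 bytes of the third record
```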


          • Chris

            #6
            Re: Cache a large list to disk

            Ah, poifect =) This'll do just fine. Thanks, folks.


            Karl Chen <quarl@nospam.quarl.org> wrote in message news:<mailman.26.1084827059.6949.python-list@python.org>...
            > Use the pickle or shelve modules.
            >
            > http://www.python.org/doc/current/li...le-pickle.html
            >
            > http://www.python.org/doc/current/li...le-shelve.html
            >
            > >>>>> "Chris" == Chris <iamlevis3@hotmail.com> writes:
            > Chris> straightforward manner. But being new to Python, I
            > Chris> don't know how I can (hopefully easily) write this data
            > Chris> to a file, and then read it out into memory on
            > Chris> subsequent launches.


            • Radovan Garabik

              #7
              Re: Cache a large list to disk

              Chris <iamlevis3@hotmail.com> wrote:
              > I have a set of routines, the first of which reads lots and lots of
              > data from disparate regions of disk. This read routine takes 40
              > minutes on a P3-866 (with IDE drives). This routine populates an
              > array with a number of dictionaries, e.g.,
              >
              > [{'el2': 0, 'el3': 0, 'el1': 0, 'el4': 0, 'el5': 0},
              > {'el2': 15, 'el3': 21, 'el1': 9, 'el4': 33, 'el5': 51},
              > {'el2': 35, 'el3': 49, 'el1': 21, 'el4': 77, 'el5': 119},
              > {'el2': 45, 'el3': 63, 'el1': 27, 'el4': 99, 'el5': 153}]
              > (not actually the data i'm reading)
              >
              > This information is acted upon by subsequent routines. These routines
              > change very often, but the data changes very infrequently (the
              > opposite pattern of what I'm used to). This data changes once per
              > week, so I can safely cache this data to a big file on disk, and read
              > out of this big file -- rather than having to read about 10,000 files
              > -- when the program is loaded.
              >
              > Now, if this were C I'd know how to do this in a pretty
              > straightforward manner. But being new to Python, I don't know how I
              > can (hopefully easily) write this data to a file, and then read it out
              > into memory on subsequent launches.
              >
              > If anyone can provide some pointers, or even some sample code on how
              > to accomplish this, it would be greatly appreciated.

              As already mentioned, use cPickle or shelve.
              However, depending on how big and how numerous your dictionaries are,
              you can use *dbm databases instead of dictionaries, with the numbers
              packed using the struct module (I found this is sometimes much more
              efficient than using shelve).
              Looking at your sample, you could even reorganize the data as:
              {'el2': [0, 15, 35, 45],
              'el3': [0, 21, 49, 63],
              ...
              }
              and use one big dbm database, with the lists represented as array objects;
              that is going to give you a major memory efficiency boost.
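
A small sketch of that column-oriented layout, in Python 3 syntax: one dbm entry per key, each holding the raw bytes of an array of C ints. The file name, the "i" typecode and the el1 values (taken from the sample in the question) are choices made for the example. The bytes use the machine's native byte order and int size, so such a file is not portable between architectures.

```python
import dbm
import os
import tempfile
from array import array

# The sample data reorganized by key instead of by row.
columns = {
    "el1": [0, 9, 21, 27],
    "el2": [0, 15, 35, 45],
    "el3": [0, 21, 49, 63],
}

path = os.path.join(tempfile.mkdtemp(), "columns")

# Write: dbm stores bytes, so each list is packed as an array of C ints.
with dbm.open(path, "c") as db:
    for name, values in columns.items():
        db[name] = array("i", values).tobytes()

# Read: fetch one column and unpack it without touching the others.
with dbm.open(path, "r") as db:
    el2 = array("i")
    el2.frombytes(db["el2"])
```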

              If the arrays are going to be big (really BIG, on the order of tens
              of megabytes), you can store them one per file and use mmap
              to access them. I am doing something similar now.
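
The one-array-per-file idea might be sketched like this in Python 3, with a tiny array standing in for a really big one. The file name and the "i" typecode are assumptions; the memoryview cast lets the mapped bytes be indexed as integers without copying the file into memory.

```python
import mmap
import os
import tempfile
from array import array

path = os.path.join(tempfile.mkdtemp(), "el4.bin")

# Write one column to its own file as raw C ints (native byte order).
with open(path, "wb") as f:
    array("i", [0, 33, 77, 99]).tofile(f)

# Map the file read-only; pages are loaded lazily by the operating system.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        # The views must be released before the map can be closed,
        # which the with statement takes care of.
        with memoryview(mapped) as raw, raw.cast("i") as view:
            count = len(view)   # number of ints in the file
            third = view[2]     # reads only the bytes it needs
```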


              --
              -----------------------------------------------------------
              | Radovan Garabík http://melkor.dnp.fmph.uniba.sk/~garabik/ |
              | __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
              -----------------------------------------------------------
              Antivirus alert: file .signature infected by signature virus.
              Hi! I'm a signature virus! Copy me into your signature file to help me spread!
