Shelve operations are very slow and create huge files

  • Eric Wichterich

    Shelve operations are very slow and create huge files

    Hello Pythonistas,

    I use Python shelves to store results from MySQL queries (using Python
    for web scripting).
    One script searches the MySQL database and stores the result, the next
    script reads the shelf again and processes the result. But there is a
    problem: if the second script is called too early, the error "(11,
    'Resource temporarily unavailable')" occurs.
    So I took a closer look at the file that is generated by the shelf: the
    result list from the MySQL query contains 14,600 rows with 7 columns. But
    the saved file is over 3 MB in size and contains over 230,000 lines (!),
    which seems way too much!

    The following statements are used:
    dbase = shelve.open(filename)
    if dbase.has_key(key):  # overwrite objects stored with same key
        del dbase[key]
    dbase[key] = object
    dbase.close()
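
    The second script then does little more than this (simplified, with the
    file and key names changed):

    # second script, simplified: read the stored result back and process it
    # (this is the part that fails with "(11, 'Resource temporarily
    # unavailable')" when it is called too early)
    import shelve

    dbase = shelve.open("results_shelf")
    result = dbase["query1"]
    dbase.close()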

    Any ideas?

    Thanks,
    Eric


  • Peter Otten

    #2
    Re: Shelve operations are very slow and create huge files

    Eric Wichterich wrote:
    > Hello Pythonistas,
    >
    > I use Python shelves to store results from MySQL queries (using Python
    > for web scripting).
    > One script searches the MySQL database and stores the result, the next
    > script reads the shelf again and processes the result. But there is a
    > problem: if the second script is called too early, the error "(11,
    > 'Resource temporarily unavailable')" occurs.
    > So I took a closer look at the file that is generated by the shelf: the
    > result list from the MySQL query contains 14,600 rows with 7 columns. But
    > the saved file is over 3 MB in size and contains over 230,000 lines (!),
    > which seems way too much!

    Let's see:

    >>> 3*2**20/14600/7
    30.780117416829746
    >>>
    Are thirty bytes per field, including administrative data, that much?
    By the way, don't bother counting the lines in a file containing pickled
    data; the pickle protocol inserts a newline after each attribute, unless
    you specify the binary mode, e.g.:

    shelve.open(filename, binary=True)
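
    Untested, but a quick comparison along these lines should show the
    difference (the file names are made up, and the file that actually
    appears on disk may get an extension, depending on the dbm backend):

    import os
    import shelve

    # some stand-in data, roughly the size Eric mentions
    rows = [(i, "aaaa", "bbbb", "cccc", "dddd", "eeee", "ffff")
            for i in range(14600)]

    for fname, binary in (("shelf_text", False), ("shelf_binary", True)):
        db = shelve.open(fname, binary=binary)
        db["result"] = rows
        db.close()
        if os.path.exists(fname):
            print fname, os.path.getsize(fname), "bytes"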

    > The following statements are used:
    > dbase = shelve.open(filename)
    > if dbase.has_key(key):  # overwrite objects stored with same key
    >     del dbase[key]
    > dbase[key] = object
    > dbase.close()

    I've never used the shelve module so far, but the rule of least surprise
    would suggest that

    if dbase.has_key(key):
        del dbase[key]
    dbase[key] = data

    is the same as

    dbase[key] = data

    > Any ideas?

    Try to omit the shelf completely, preferably by moving the second script's
    operations into the first. If you want to keep two scripts, don't invoke
    them independently; make a little batch file or shell script instead.
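
    Or, staying in Python, a tiny driver that runs the two scripts strictly
    one after the other would do the same job (script names invented):

    # run_both.py -- never lets the two scripts overlap
    import os

    os.system("python query_script.py")    # first script: query MySQL, store result
    os.system("python process_script.py")  # second script: read and process it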

    If you need an intermediate step with a preprocessed snapshot of the MySQL
    table, and you have sufficient rights, use a MySQL table for the temporary
    data.
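
    Schematically, with MySQLdb and invented table and column names, the
    first script would do something like the following, and the second one
    would simply SELECT from snapshot again:

    # sketch only: park the preprocessed rows in their own MySQL table
    import MySQLdb

    conn = MySQLdb.connect(db="mydb", user="scraper", passwd="secret")
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS snapshot")
    cur.execute("CREATE TABLE snapshot (id INT, name VARCHAR(100), price DECIMAL(10,2))")
    cur.execute("""INSERT INTO snapshot (id, name, price)
                   SELECT id, name, price FROM items WHERE price > 10""")
    conn.commit()
    conn.close()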

    Peter
