Re: Optimizing size of very large dictionaries

  • Miles

    Re: Optimizing size of very large dictionaries

    On Wed, Jul 30, 2008 at 8:29 PM, <python@bdurham.com> wrote:
    Background: I'm trying to identify duplicate records in very large text
    based transaction logs. I'm detecting duplicate records by creating a SHA1
    checksum of each record and using this checksum as a dictionary key. This
    works great except for several files whose size is such that their
    associated checksum dictionaries are too big for my workstation's 2G of RAM.

    What are the values of this dictionary?

    You can save memory by representing the checksums as long integers, if
    you're currently using strings.
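
    For example, a minimal sketch of the change (assuming the keys are
    currently hexdigest strings from hashlib; the file name and loop are
    illustrative, not from the original post):

        import hashlib

        def record_key(record):
            # SHA-1 digest folded into a single integer -- more compact
            # than the 40-character hex string when millions of keys
            # are held in memory.
            return int(hashlib.sha1(record).hexdigest(), 16)

        seen = {}
        duplicates = []
        for line in open("transactions.log", "rb"):  # illustrative file name
            key = record_key(line)
            if key in seen:
                duplicates.append(line)
            else:
                seen[key] = True

    If you only need membership tests and no per-record value, storing the
    integers in a set rather than a dictionary should shave off a bit more.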

    -Miles