On Wed, Jul 30, 2008 at 8:29 PM, <python@bdurham .comwrote:
What are the values of this dictionary?
You can save memory by representing the checksums as long integers, if
you're currently using strings.
-Miles
Background: I'm trying to identify duplicate records in very large text
based transaction logs. I'm detecting duplicate records by creating a SHA1
checksum of each record and using this checksum as a dictionary key. This
works great except for several files whose size is such that their
associated checksum dictionaries are too big for my workstation's 2G of RAM.
based transaction logs. I'm detecting duplicate records by creating a SHA1
checksum of each record and using this checksum as a dictionary key. This
works great except for several files whose size is such that their
associated checksum dictionaries are too big for my workstation's 2G of RAM.
You can save memory by representing the checksums as long integers, if
you're currently using strings.
-Miles