Re: Question about optimization

  • Jean-Paul Calderone

    On Thu, 24 Jul 2008 17:19:41 -0400, Wei Hao <weihao89@gmail.com> wrote:
    >Hi:
    >
    >I'm pretty new to Python and I have some optimization issues. I'll show you
    >the piece of code which is causing it, with pseudo-code before it and
    >comments. I'm accessing a gigantic table (like 15 million rows) in SQL.
    >
    >d is some dictionary, r is a precompiled regex string
    >Big loop, so I search through the table in chunks given by delta
    >SQL query ("select * from table where rowID >= n and rowID < (n +
    >delta)"), result of query stored in a. Each individual row is a[n1], columns
    >of rows are a[n1][n2].
    >
    [snip]
    >
    >I am 100% sure it's this code snippet that's the cause of my problems.
    >Here's what I can tell you. Each chunk of rows that I grab is essentially
    >equal in size (rowID skips over stuff, but rather arbitrarily). The time it
    >takes to fetch the SQL query doesn't change. But as the program progresses,
    >this snippet gets slower. Here's the output:
    >
    >2500 0.441551299341
    >5000 1.26162739664
    >7500 2.35092688403
    >10000 3.48417469666
    >12500 4.59031305491
    >15000 5.78972588775
    >17500 6.28305527139
    >20000 6.73344570903
    >22500 8.31732146487
    >25000 9.65322872159
    >27500 8.98186042757
    >30000 11.8042818095
    >32500 12.1965593712
    >35000 13.2735763291
    >37500 14.0282617344
    >
    >What is it in the code snippet that slows down as n increases? Is there
    >something about the way low-level Python functions work that I don't
    >understand, and which is slowing me down?
    Perhaps you need an index on rowID.
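
    One way to check this is to compare the query plan for the range query
    before and after creating an index. The sketch below is only an
    illustration: the original post does not say which database is in use, so
    it uses Python's built-in sqlite3 module as a stand-in, and the names
    big_table, row_id, payload and idx_row_id are hypothetical. Other databases
    have their own EXPLAIN syntax and may behave differently.

```python
import sqlite3


def query_plan(conn, lo, hi):
    """Return SQLite's plan for the chunked range query as one string."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT * FROM big_table WHERE row_id >= ? AND row_id < ?",
        (lo, hi),
    ).fetchall()
    # The human-readable description is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Stand-in for the real table (assumption: an in-memory SQLite database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (row_id INTEGER, payload TEXT)")
# Sparse IDs, loosely mimicking a rowID that "skips over stuff".
conn.executemany(
    "INSERT INTO big_table VALUES (?, ?)",
    ((i * 3, "x") for i in range(10000)),
)

# Without an index, the range query has to scan the table.
plan_before = query_plan(conn, 2500, 5000)

# With an index on the column, the planner can seek to the range.
conn.execute("CREATE INDEX idx_row_id ON big_table (row_id)")
plan_after = query_plan(conn, 2500, 5000)

print("before:", plan_before)
print("after: ", plan_after)
```

    If the plan already shows an index being used on the real table, the
    slowdown is more likely to come from somewhere else, such as the snipped
    code itself.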

    Jean-Paul