Can this code be further optimized?

  • kaushik1221
    New Member
    • May 2010
    • 10


    I understand that the code below may not be completely clear without the surrounding lines of my script, but this is the part of the code that is causing most of the delay in my project, and I want to optimize it.
    I want to know which part of the code is at fault and how it could be replaced.
    I suspect some of you will say that one of the functions used here is heavy and that lighter methods are available to do the same work.

    Please help,

    thanks in advance

    Code:
        import os
        import linecache

        for i in range(len(lists)):
            save = database_index[lists[i]]
            # if save[1] != 'text0194' and save[1] != 'text0526':
            using_data[save[0]] = save
            # Build the .pm path from the phone label
            p = os.path.join("c:/begpython/wavnk/", str(save[1]).replace('phone', 'text') + '.pm')
            x1 = open(p, 'r')
            x2 = open(p, 'r')
            # Skip the 6-line header; use a throwaway name so the outer
            # loop variable i is not clobbered
            for _ in range(6):
                x1.readline()
                x2.readline()
            # Find the line whose first field is closest to save[4]
            gen = (float(line.partition(' ')[0]) for line in x1)
            r = min(enumerate(gen), key=lambda x: abs(x[1] - float(save[4])))
            a1 = linecache.getline(p.replace('.pm', '.mcep'), r[0] + 1)
            join_cost_index_end[save[0]] = a1.rstrip('\n').split(' ')

            # Find the line whose first field is closest to save[3]
            gen = (float(line.partition(' ')[0]) for line in x2)
            r = min(enumerate(gen), key=lambda x: abs(x[1] - float(save[3])))
            a2 = linecache.getline(p.replace('.pm', '.mcep'), r[0] + 1)
            join_cost_index_strt[save[0]] = a2.rstrip('\n').split(' ')

            x1.close()
            x2.close()
            j = j + 1  # j is initialised earlier in the full script


    Here my database_index has about 2,50,000 entries.
  • Glenton
    Recognized Expert Contributor
    • Nov 2008
    • 391

    #2
    Can you explain in words what you're trying to do? 2,500,000 (is this right?) lines in a database shouldn't be such a big problem. Opening the file p twice is unnecessary, though, and never closing the handles can leak file descriptors; something like the sketch below avoids both.
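
    A minimal sketch of what I mean, reusing the names from your snippet (the 6-line header skip is taken from your code; this is illustrative, not drop-in):

    Code:
        with open(p) as f:               # one open, and the file is closed automatically
            lines = f.readlines()[6:]    # skip the 6-line header once

        # Both nearest-value searches can now reuse the same parsed column
        first_fields = [float(line.partition(' ')[0]) for line in lines]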

    Mostly it depends on what data structures you're using. A good way to increase speed is to change a list into a numpy array and then use array functions to act on the whole array at once (in other words, use array operations instead of "for i in whatever:" loops), as in the sketch below.
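
    A minimal sketch of that idea applied here, assuming numpy is available and that each .pm file really does hold a float as the first field of every line after the 6-line header (load_pm_column and nearest_line_number are names I've made up for illustration):

    Code:
        import numpy as np

        def load_pm_column(path, header_lines=6):
            # Hypothetical helper: parse the first field of every data line
            # into one numpy array, reading the file only once.
            with open(path) as f:
                for _ in range(header_lines):
                    f.readline()
                return np.array([float(line.partition(' ')[0]) for line in f])

        def nearest_line_number(values, target):
            # One vectorised operation replaces the enumerate/min loop:
            # distance to every value at once, then index of the smallest.
            return int(np.abs(values - float(target)).argmin()) + 1  # 1-based, for linecache

    If many of your 2,50,000 entries point at the same .pm file, it would also be worth caching these arrays in a dict keyed by path, so each file is parsed only once.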
