How to access the individual elements of a matrix in Python

  • aboxylica
    New Member
    • Jul 2007
    • 111

    #46
    Thanks for all the help. I wouldn't have come this far without the help of all of you.
    There is a new problem.
    Code:
    f=open("deeps1.txt","r")
    line=f.next()
    while not line.startswith('PO'):
        line=f.next()
    
    headerlist=line.strip().split()[1:]
    linelist=[]
    
    
    line=f.next().strip()
    while not line.startswith('/'):
        if line != '':
            linelist.append(line.strip().split())
        line=f.next().strip()
        
    keys=[i[0] for i in linelist]
    values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]
    array={}
    linedict=dict(zip(keys,values))
    keys = linedict.keys()
    keys.sort()
    for key in keys:
        array=[key,linedict[key]]
    
    datadict={}
    datadict1={}
    for i,item in enumerate(headerlist):
        datadict[item]={}
        for key_ in linedict:
            datadict[item][key_]=linedict[key_][i]
    
    for keymain in datadict:
        for keysub in datadict[keymain]:
            datadict[keymain][keysub]+=1.0
    #print datadict['T']['16']
    seq="ATA"
    res=1
    for i in range(1,len(seq)):
        key=seq[i]
        for keymain in datadict:
            if keymain==key:
                print key,i
      #print datadict[key]
    #print res
    This is the code. As I already posted, I want to compute something like
    A[01]*T[02]*A[03]
    The problem I am facing is that datadict has keys like "01", "02", but in the loop over seq I have the integers 1, 2, 3, 4, and I can't start from zero. How can I make the loop over my seq produce 01, 02, etc.? If I write
    for i in range('01',len(seq)):
    it takes '01' as a string!
    Waiting for your reply,
    cheers!!

    • aboxylica
      New Member
      • Jul 2007
      • 111

      #47
      Oops, I am sorry!
      The one you suggested is working.
      This is the code:
      Code:
      f=open("deeps1.txt","r")
      line=f.next()
      while not line.startswith('PO'):
          line=f.next()
      
      headerlist=line.strip().split()[1:]
      linelist=[]
      
      
      line=f.next().strip()
      while not line.startswith('/'):
          if line != '':
              linelist.append(line.strip().split())
          line=f.next().strip()
          
      keys=[i[0] for i in linelist]
      values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]
      array={}
      linedict=dict(zip(keys,values))
      keys = linedict.keys()
      keys.sort()
      for key in keys:
          array=[key,linedict[key]]
      
      datadict={}
      datadict1={}
      for i,item in enumerate(headerlist):
          datadict[item]={}
          for key_ in linedict:
              datadict[item][key_]=linedict[key_][i]
      
      for keymain in datadict:
          for keysub in datadict[keymain]:
              datadict[keymain][keysub]+=1.0
      #print datadict['T']['16']
      seq="ATATT"
      res=1
      for i,key in enumerate (seq):
          res*=datadict[key]["%02d"%(i+1)]  # I don't understand this line, especially the formatting
      print res
      This seems to work for the first letter "A" only.
      Waiting for your reply,
      cheers!
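For anyone puzzled by the same line, here is a minimal sketch of what `"%02d" % (i+1)` does. It is written for Python 3 (`print` as a function), whereas the code in this thread is Python 2, and the small `datadict` below is made up purely for illustration; it is not the real data from deeps1.txt.

```python
# Position keys in the matrix file are zero-padded strings: "01", "02", ...
# "%02d" % n formats the integer n as a decimal at least 2 characters wide,
# padded on the left with zeros.
seq = "ATATT"

# enumerate() counts from 0, so add 1 to get the 1-based position.
position_keys = ["%02d" % (i + 1) for i, base in enumerate(seq)]
print(position_keys)  # ['01', '02', '03', '04', '05']

# Hypothetical miniature of datadict: base -> {position key -> value}.
datadict = {
    "A": {"01": 2.0, "02": 1.0, "03": 3.0, "04": 1.0, "05": 1.0},
    "T": {"01": 1.0, "02": 4.0, "03": 1.0, "04": 5.0, "05": 6.0},
}

# Multiply the entry for each base at its own position:
# A[01] * T[02] * A[03] * T[04] * T[05]
res = 1.0
for i, base in enumerate(seq):
    res *= datadict[base]["%02d" % (i + 1)]
print(res)  # 720.0
```

The loop index stays an ordinary integer; only the dictionary lookup uses the formatted string, so there is no need to make `range` start from `'01'`.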

      • aboxylica
        New Member
        • Jul 2007
        • 111

        #48
        Sorry!!
        It's working :)
        cheers!!

        • aboxylica
          New Member
          • Jul 2007
          • 111

          #49
          I am doing a simple task here and I am getting an error. I can't understand why it is happening!
          I am trying to find a score for a sequence after creating all the dictionaries:
          if seq="ACGT"
          the value of A and T is 0.3
          the value of C and G is 0.2
          score=val(A)*val(C)*val(G)*val(T)
          so it should be score=0.3*0.2*0.2*0.3=0.0036
          My error is mentioned in a comment in the last lines of the code.
          This is my code:
          Code:
          f=open("deeps1.txt","r")
          line=f.next()
          while not line.startswith('PO'):
              line=f.next()
          
          headerlist=line.strip().split()[1:]
          linelist=[]
          
          
          line=f.next().strip()
          while not line.startswith('/'):
              if line != '':
                  linelist.append(line.strip().split())
              line=f.next().strip()
              
          keys=[i[0] for i in linelist]
          values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]
          array={}
          linedict=dict(zip(keys,values))
          keys = linedict.keys()
          keys.sort()
          for key in keys:
              array=[key,linedict[key]]
          
          datadict={}
          datadict1={}
          for i,item in enumerate(headerlist):
              datadict[item]={}
              for key_ in linedict:
                  datadict[item][key_]=linedict[key_][i]
          
          for keymain in datadict:
              for keysub in datadict[keymain]:
                  datadict[keymain][keysub]+=1.0
          #print datadict['T']['16']
          seq="CGTCAG"
          
          res=1
          for i in range(0,len(seq)):
              key=seq[i]
              res*=datadict[key]["%02d"%(i+1)]
              #print res
              score=1
              value={"A":"0.3","T":"0.3","C":"0.2","G":"0.2"}
              for it in value:
                  for item in seq:
                      if it==key:
                          score=score*value[it]
                          print score# I get an error that says TypeError: can't multiply sequence by non-int of type 'str'
                          
                  
          print res
          Waiting for your reply,
          cheers!

          • bvdet
            Recognized Expert Specialist
            • Oct 2006
            • 2851

            #50
            You are receiving the error because the values in dictionary 'value' are strings. Either define them as numbers (e.g. "A":0.3,"T":0.3,...) or convert to float in the calculation:[code=Python]score=score*float(value[it])[/code]
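To make the cause concrete, here is a small sketch (Python 3 syntax, whereas the thread uses Python 2) that reproduces the error and then applies the `float()` conversion suggested above. Looking each base up directly with `value[base]`, instead of the nested loops in the question, is my own simplification and not something from the original post.

```python
# Background values stored as strings, as in the question.
value = {"A": "0.3", "T": "0.3", "C": "0.2", "G": "0.2"}
seq = "ACGT"

# Reproduce the error: int * str repeats the string, then str * str fails.
score = 1
score = score * value["A"]          # 1 * "0.3" gives "0.3", still a string
try:
    score = score * value["C"]      # "0.3" * "0.2" is not defined
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Fix: convert each value to float and look each base up directly.
score = 1.0
for base in seq:
    score *= float(value[base])
print(score)  # approximately 0.0036, i.e. 0.3 * 0.2 * 0.2 * 0.3
```

Storing the numbers as floats in the dictionary in the first place avoids the conversion entirely.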

            • aboxylica
              New Member
              • Jul 2007
              • 111

              #51
              Thanks for that. Now, instead of the fixed sequence I had, I am generating a random sequence and calculating the score for that.
              What should happen is this:
              suppose my sequence contains 50 letters. Each iteration should consider 16 letters: the first iteration takes the first 16 letters, the next takes the sixteen starting from the second letter (leaving out the first), and so on, until the remaining sequence has length sixteen (windows shorter than sixteen are omitted).
              So I have the program that calculates the score (the same thing I kept calculating using my input file). This score is called "res" in my code.
              For the same sequence I am calculating another "score" by giving specific values for each letter, and then I am calculating log(res/score). I get an error which doesn't make any sense to me! Please tell me what change I should make.
              Here is my code:
              Code:
              from math import *
              import random
              f=open("deeps1.txt","r")
              line=f.next()
              while not line.startswith('PO'):
                  line=f.next()
              
              headerlist=line.strip().split()[1:]
              linelist=[]
              
              
              line=f.next().strip()
              while not line.startswith('/'):
                  if line != '':
                      linelist.append(line.strip().split())
                  line=f.next().strip()
                  
              keys=[i[0] for i in linelist]
              values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]
              array={}
              linedict=dict(zip(keys,values))
              keys = linedict.keys()
              keys.sort()
              for key in keys:
                  array=[key,linedict[key]]
              
              datadict={}
              datadict1={}
              for i,item in enumerate(headerlist):
                  datadict[item]={}
                  for key_ in linedict:
                      datadict[item][key_]=linedict[key_][i]
              
              for keymain in datadict:
                  for keysub in datadict[keymain]:
                      datadict[keymain][keysub]+=1.0
              
              def random_seq():
                  seq=""
                  ch=""
                  for i in range(0,1000):
                      ch=random.choice(("ATGC"))
                      seq=seq+ch
                  return seq
              
              
              p=random_seq()
              
              #def my_rand():
               #   
                  #print p
                #  part=""
                 # q=len(p)
                 # seqq=""
              
                 # for i in range(0,q):
                  #    part= p[i:i+16]
                  #    if len(part)==16:
                   #       seqq=part
                    #      return seqq
              
              
              
              #my_seq=my_rand()
              #print len(my_seq)
              
              
              
              
              res=1
              part=""
              q=len(p)
              seqq=""
              for i in range(0,q):
                  part=p[i:i+16]
                  if len(part)==16:
                      seqq=part
                      for i in range(0,16):
                          key=p[i]
                          print p[i]
                          res*=datadict[key]["%02d"%(i+1)]
                      print res,"&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&"
                  #score=1
                  #value={"A":"0.3","T":"0.3","C":"0.2","G":"0.2"}
                #  for it in value:
                 #     for key in p:
                 #         if it==key:
                  #            score=score*float(value[it])
              #log_ratio=(res/score)
              #print log(log_ratio)
              Instead of printing some value of res, it prints something like
              inf &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
              I think there is an error in this line:
              res*=datadict[key]["%02d"%(i+1)]
              Please help.
              Waiting for your reply,
              cheers!
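A likely reason for the `inf`, judging only from the code as posted: `res` is never reset, so it keeps growing across every window until the float overflows. The inner loop also reuses `i` and reads `p[i]` rather than the current window. The sketch below shows one way to slide a 16-character window and score each one separately. It uses Python 3 syntax, the helper names `windows` and `window_score` are hypothetical, and `matrix` is a made-up stand-in for `datadict`.

```python
import random

WINDOW = 16

def windows(seq, width=WINDOW):
    """Yield every full-length window of seq, shifting by one character."""
    for start in range(len(seq) - width + 1):
        yield seq[start:start + width]

def window_score(window, matrix):
    """Multiply the matrix entry for each base at its 1-based position."""
    res = 1.0  # reset for every window, so scores do not accumulate
    for j, base in enumerate(window):
        res *= matrix[base]["%02d" % (j + 1)]
    return res

# Hypothetical stand-in for datadict: all entries 1.0, except "A" at "01".
matrix = {b: {"%02d" % k: 1.0 for k in range(1, WINDOW + 1)} for b in "ACGT"}
matrix["A"]["01"] = 2.0

random.seed(0)
p = "".join(random.choice("ATGC") for _ in range(50))

scores = [window_score(w, matrix) for w in windows(p)]
print(len(scores))  # 35 windows for a 50-character sequence
```

Because `range(len(seq) - width + 1)` stops at the last full window, no explicit `len(part)==16` check is needed.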

              • aboxylica
                New Member
                • Jul 2007
                • 111

                #52
                Hey,
                sorry, I found the mistake and got the output! :)
                cheers!!

                • aboxylica
                  New Member
                  • Jul 2007
                  • 111

                  #53
                  There is one more thing I have to do in my code and I don't know how to do it.
                  After adding one to every element in the input file (which I have already done), I have to normalize the rows: every element should be divided by the sum of the elements of its row.
                  My input file is:
                  NA bap
                  PO A C G T
                  01 0.00 3.67 0.00 0.00
                  02 0.00 0.00 3.67 0.00
                  03 0.00 0.00 0.00 3.67
                  04 0.00 3.67 0.00 0.00
                  05 3.67 0.00 0.00 0.00
                  06 3.46 0.00 0.22 0.00
                  07 0.00 0.00 3.67 0.00
                  08 0.00 0.00 0.00 3.67
                  09 0.00 0.00 0.00 3.67
                  10 0.00 3.67 0.00 0.00
                  11 3.67 0.00 0.00 0.00
                  12 3.67 0.00 0.00 0.00
                  13 0.00 0.00 3.67 0.00
                  14 0.00 0.00 0.00 3.67
                  15 0.00 0.00 3.67 0.00
                  16 0.00 3.67 0.00 0.00
                  //
                  //
                  A[01]=1.0/(1.0+4.67+1.0+1.0) (the numerator is 1.0 because I have already added one to the element)
                  The same has to be done for every element in every row.

                  The basic formula:
                  formula=element/(sum of the elements of that row)
                  My code with one already added is:
                  Code:
                  from math import *
                  import random
                  f=open("deeps1.txt","r")
                  line=f.next()
                  while not line.startswith('PO'):
                      line=f.next()
                  
                  headerlist=line.strip().split()[1:]
                  linelist=[]
                  
                  
                  line=f.next().strip()
                  while not line.startswith('/'):
                      if line != '':
                          linelist.append(line.strip().split())
                      line=f.next().strip()
                      
                  keys=[i[0] for i in linelist]
                  values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]
                  array={}
                  linedict=dict(zip(keys,values))
                  keys = linedict.keys()
                  keys.sort()
                  for key in keys:
                      array=[key,linedict[key]]
                  
                  datadict={}
                  datadict1={}
                  for i,item in enumerate(headerlist):
                      datadict[item]={}
                      for key_ in linedict:
                          datadict[item][key_]=linedict[key_][i]
                  
                  for keymain in datadict:
                      for keysub in datadict[keymain]:
                          datadict[keymain][keysub]+=1.0
                  # Here one has been added to all elements. Now how do I normalize it?
                  Waiting for your reply,
                  cheers!

                  • bvdet
                    Recognized Expert Specialist
                    • Oct 2006
                    • 2851

                    #54
                    Create a new list of the sums of the items on each row of the original data:[code=Python]valueSums = [sum(item)+4 for item in values][/code]Since there are 16 lines in the first data set, there should be 16 elements. Keep in mind lists are ordered and dictionaries are not. Iterate on each subdictionary of dataDict, create a sorted list of subdictionary keys, iterate (use enumerate) on the sorted list of keys, and update each element using the indexing operator.
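The steps described above can be sketched roughly as follows. This is Python 3 syntax, and the two-row data set is a made-up excerpt standing in for the 16 rows of the real file. It assumes the rows of `values` are in ascending key order, as they are in the posted input, so that the sorted subdictionary keys line up with `valueSums`. Sorting the keys explicitly works regardless of whether the Python version in use preserves dictionary order.

```python
# Hypothetical two-row excerpt of the parsed file, values in A, C, G, T order.
headerlist = ["A", "C", "G", "T"]
keys = ["01", "06"]
values = [[0.00, 3.67, 0.00, 0.00],
          [3.46, 0.00, 0.22, 0.00]]

# Build datadict (base -> {position key -> value}) and add one to each element.
datadict = {}
for col, base in enumerate(headerlist):
    datadict[base] = {}
    for key, row in zip(keys, values):
        datadict[base][key] = row[col] + 1.0

# Row sums after the +1: four elements per row, hence the "+4".
valueSums = [sum(item) + 4 for item in values]

# Normalize once: sorted keys line up with the row order of valueSums.
for base in datadict:
    for idx, key in enumerate(sorted(datadict[base])):
        datadict[base][key] /= valueSums[idx]

for key in keys:
    print(key, sum(datadict[base][key] for base in headerlist))  # about 1.0 each
```

After this single pass every row sums to 1, for example A[01] = 1.0/7.67 and C[01] = 4.67/7.67.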

                    • elbin
                      New Member
                      • Jul 2007
                      • 27

                      #55
                      Or much easier:

                      Code:
                      datadict1 = datadict.copy()
                      for keymain in datadict:
                          for keysub in datadict[keymain]:
                              datadict1[keymain][keysub] = datadict[keymain][keysub] / (sum(values[int(keysub) - 1]) + 4)
                      I take it the second dictionary is for the normalized values, but this does not change anything. You use the old keys and subkeys from datadict directly and fill out the copy. If you fill it in the same dictionary you don't need the copy.
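One caveat worth knowing with nested dictionaries: `dict.copy()` is a shallow copy, so the inner dictionaries are shared between the original and the copy, and writing to `datadict1[keymain][keysub]` also changes `datadict`. A small sketch of the difference, using a made-up one-position dictionary and Python 3 syntax; `copy.deepcopy` is the standard-library way to get independent inner dictionaries.

```python
import copy

# Hypothetical miniature of datadict: base -> {position key -> value}.
datadict = {"A": {"01": 1.0}, "C": {"01": 4.67}}

# Shallow copy: a new outer dict, but the same inner dict objects.
shallow = datadict.copy()
shallow["A"]["01"] = 99.0
print(datadict["A"]["01"])  # 99.0, the original changed too

datadict["A"]["01"] = 1.0   # restore the original value

# Deep copy: the inner dicts are copied as well.
deep = copy.deepcopy(datadict)
deep["A"]["01"] = 99.0
print(datadict["A"]["01"])  # 1.0, the original is untouched
```

If the unnormalized values are still needed afterwards, the deep copy is the one to fill.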

                      • aboxylica
                        New Member
                        • Jul 2007
                        • 111

                        #56
                        Code:
                        from math import *
                        import random
                        f=open("deeps1.txt","r")
                        line=f.next()
                        while not line.startswith('PO'):
                            line=f.next()
                        
                        headerlist=line.strip().split()[1:]
                        linelist=[]
                        
                        
                        line=f.next().strip()
                        while not line.startswith('/'):
                            if line != '':
                                linelist.append(line.strip().split())
                            line=f.next().strip()
                            
                        keys=[i[0] for i in linelist]
                        values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]
                        
                        array={}
                        linedict=dict(zip(keys,values))
                        keys = linedict.keys()
                        keys.sort()
                        for key in keys:
                            array=[key,linedict[key]]
                        
                        datadict={}
                        datadict1={}
                        for i,item in enumerate(headerlist):
                            datadict[item]={}
                            for key_ in linedict:
                                datadict[item][key_]=linedict[key_][i]
                                
                        
                        for keymain in datadict:
                            for keysub in datadict[keymain]:
                                datadict[keymain][keysub]+=1.0
                                datadict1=datadict.copy()
                                for keysub in datadict:
                                    for keysub in datadict[keymain]:
                                        datadict1[keymain][keysub]=datadict[keymain][keysub]/(sum(values[int(keysub)-1])+4)
                                
                        def random_seq(nchars,insertat,astring):
                            seq=""
                        
                            for i in range(nchars):
                              if i== insertat:
                                  seq+=astring
                              ch=random.choice(("ATGC"))
                              seq+=ch
                            print seq
                            return seq
                        thestring="CGTCAAGTTCAAGTGCAAAA"
                        count=50-len(thestring)
                        p=random_seq(count,15,thestring)
                        file=open("temp.txt",'w')
                        #consensus="CGTCAAGTTCAAGTGCAAAA"
                        #file.write(consensus)
                        file.write(str(p))
                        file.close()
                        
                        def file_chk():
                            file=open("temp.txt","r")
                            file_content=file.read()
                            return file_content
                            
                        
                        
                        
                        
                        #p=file_chk()
                        
                        
                        #def my_rand():
                         #   
                            #print p
                          #  part=""
                           # q=len(p)
                           # seqq=""
                        
                           # for i in range(0,q):
                            #    part= p[i:i+16]
                            #    if len(part)==16:
                             #       seqq=part
                              #      return seqq
                        
                        
                        
                        #my_seq=my_rand()
                        #print len(my_seq)
                        
                        
                        
                        
                        res=1
                        part=""
                        q=len(p)
                        seqq=""
                        for i in range(0,q):
                            part=p[i:i+16]
                            if len(part)==16:
                                seqq=part
                                res=1
                                for j in range(0,16):
                                    key=seqq[j]
                                    res=res*datadict[key]["%02d"%(j+1)]
                                    print res
                                    score=1
                                    value={"A":"0.3","T":"0.3","C":"0.2","G":"0.2"}
                                    for it in value:
                                        for key in seqq:
                                            if it==key:
                                                score=score*float(value[it])
                                #print score,"*******************",res
                                log_ratio=log10(res/score)
                                #print i,log_ratio
                        This is my full code, where I am calculating the scores, dividing by another background value, and finally taking a log. Because of this normalisation some values are becoming zero:
                        when I print the normalised values, some of them are zero.
                        Sorry, but I am not able to paste my output file,
                        and I don't know why this is happening.
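A plausible cause, going only by the code as posted: the normalization loops sit inside the loop that adds 1.0, so every element is divided by its row sum many times instead of once (and because `datadict.copy()` is shallow, the divisions land in `datadict` as well). The sketch below contrasts normalizing once with normalizing inside an enclosing loop. It uses Python 3 syntax, the one-position matrix is made up, and the repeat count of 64 is only illustrative.

```python
import copy

# Hypothetical one-position matrix after one has been added to each element.
base_matrix = {"A": {"01": 1.0}, "C": {"01": 4.67},
               "G": {"01": 1.0}, "T": {"01": 1.0}}
row_sums = {"01": 7.67}

# Normalizing once, in its own loop after the +1 loop has finished.
once = copy.deepcopy(base_matrix)
for base in once:
    for key in once[base]:
        once[base][key] /= row_sums[key]

# Normalizing inside an enclosing loop: every pass divides again.
nested = copy.deepcopy(base_matrix)
for _outer in range(64):  # stands in for the enclosing for-loops
    for base in nested:
        for key in nested[base]:
            nested[base][key] /= row_sums[key]

print(sum(once[b]["01"] for b in once))  # about 1.0
print(nested["C"]["01"])                 # vanishingly small
```

A product of sixteen numbers that small underflows to 0.0, which would then also break the `log10` call. Dedenting the normalization so it runs once, after the +1 loop, should avoid this.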

                        • bvdet
                          Recognized Expert Specialist
                          • Oct 2006
                          • 2851

                          #57
                          Originally posted by elbin
                          Or much easier:

                          Code:
                          datadict1 = datadict.copy()
                          for keymain in datadict:
                              for keysub in datadict[keymain]:
                                  datadict1[keymain][keysub] = datadict[keymain][keysub] / (sum(values[int(keysub) - 1]) + 4)
                          I take it the second dictionary is for the normalized values, but this does not change anything. You use the old keys and subkeys from datadict directly and fill out the copy. If you fill it in the same dictionary you don't need the copy.
                          Yep, you can index on int(keySub)-1:[code=Python]valueSums = [sum(item)+4 for item in values]

                          for keyMain in dataDict:
                              for keySub in dataDict[keyMain]:
                                  dataDict[keyMain][keySub] /= valueSums[int(keySub)-1][/code]

                          • aboxylica
                            New Member
                            • Jul 2007
                            • 111

                            #58
                            Code:
                            from math import *
                            import random
                            f=open("deeps1.txt","r")
                            line=f.next()
                            while not line.startswith('PO'):
                                line=f.next()
                            
                            headerlist=line.strip().split()[1:]
                            linelist=[]
                            
                            
                            line=f.next().strip()
                            while not line.startswith('/'):
                                if line != '':
                                    linelist.append(line.strip().split())
                                line=f.next().strip()
                                
                            keys=[i[0] for i in linelist]
                            values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]
                            valueSums = [sum(item)+4 for item in values]
                            
                            array={}
                            linedict=dict(zip(keys,values))
                            keys = linedict.keys()
                            keys.sort()
                            for key in keys:
                                array=[key,linedict[key]]
                            
                            datadict={}
                            datadict1={}
                            for i,item in enumerate(headerlist):
                                datadict[item]={}
                                for key_ in linedict:
                                    datadict[item][key_]=linedict[key_][i]
                                    
                            
                            for keymain in datadict:
                                 for keysub in datadict[keymain]:
                                    datadict[keymain][keysub]+=1.0
                                    for keyMain in datadict:
                                        for keySub in datadict[keyMain]:
                                            datadict[keyMain][keySub] /= valueSums[int(keySub)-1]
                            
                                    
                                    
                            def random_seq(nchars,insertat,astring):
                                seq=""
                            
                                for i in range(nchars):
                                  if i== insertat:
                                      seq+=astring
                                  ch=random.choice(("ATGC"))
                                  seq+=ch
                                print seq
                                return seq
                            thestring="CGTCAAGTTCAAGTGCAAAA"
                            count=50-len(thestring)
                            p=random_seq(count,15,thestring)
                            file=open("temp.txt",'w')
                            #consensus="CGTCAAGTTCAAGTGCAAAA"
                            #file.write(consensus)
                            file.write(str(p))
                            file.close()
                            
                            def file_chk():
                                file=open("temp.txt","r")
                                file_content=file.read()
                                return file_content
                                
                            
                            
                            
                            
                            #p=file_chk()
                            
                            
                            #def my_rand():
                             #   
                                #print p
                              #  part=""
                               # q=len(p)
                               # seqq=""
                            
                               # for i in range(0,q):
                                #    part= p[i:i+16]
                                #    if len(part)==16:
                                 #       seqq=part
                                  #      return seqq
                            
                            
                            
                            #my_seq=my_rand()
                            #print len(my_seq)
                            
                            
                            
                            
                            res=1
                            part=""
                            q=len(p)
                            seqq=""
                            for i in range(0,q):
                                part=p[i:i+16]
                                if len(part)==16:
                                    seqq=part
                                    res=1
                                    for j in range(0,16):
                                        key=seqq[j]
                                        res=res*datadict[key]["%02d"%(j+1)]
                                        print res
                                        score=1
                                        value={"A":"0.3","T":"0.3","C":"0.2","G":"0.2"}
                                        for it in value:
                                            for key in seqq:
                                                if it==key:
                                                    score=score*float(value[it])
                                    #print score,"*******************",res
                                    log_ratio=log(res/score)
                                    #print i,log_ratio
                            Since we are adding one to each element, I don't think my res value could be zero. Do you see any mistake? Or is it because it is going negative? Please help.
                            Waiting for your reply,
                            Cheers!
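One possible reading of the zeros, assuming the indentation shown in the code above is what actually ran: the division loop sits inside the `+=1.0` loop, so every entry is divided by its column total once per increment instead of once overall. The sketch below uses a hypothetical 4-base, 2-position table (not the data from deeps1.txt) and two hypothetical helper functions to compare the nested layout with a flat one.

```python
import copy

# Hypothetical count table: 4 bases, 2 positions (not the poster's data)
counts = {
    "A": {"01": 3.0, "02": 0.0},
    "C": {"01": 0.0, "02": 0.0},
    "G": {"01": 0.0, "02": 0.0},
    "T": {"01": 1.0, "02": 4.0},
}
# Column totals plus the pseudocount of 4, built the same way as valueSums
sums = [4.0 + 4, 4.0 + 4]


def normalise_nested(table, sums):
    """Mimic the nested layout: a full division pass after every single += 1.0."""
    table = copy.deepcopy(table)
    for base in table:
        for pos in table[base]:
            table[base][pos] += 1.0
            for b in table:
                for p in table[b]:
                    table[b][p] /= sums[int(p) - 1]
    return table


def normalise_flat(table, sums):
    """Add the pseudocount everywhere first, then divide each entry exactly once."""
    table = copy.deepcopy(table)
    for base in table:
        for pos in table[base]:
            table[base][pos] += 1.0
    for base in table:
        for pos in table[base]:
            table[base][pos] /= sums[int(pos) - 1]
    return table


nested = normalise_nested(counts, sums)
flat = normalise_flat(counts, sums)
nested_col = sum(nested[b]["01"] for b in nested)  # far below 1.0
flat_col = sum(flat[b]["01"] for b in flat)        # a proper probability column
```

With the flat layout each column sums to 1.0, while the nested layout leaves much smaller values. Over 16 positions, a product of such values could become small enough to underflow to 0.0, and `log` of 0.0 raises a math domain error.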

                            Comment

                            • aboxylica
                              New Member
                              • Jul 2007
                              • 111

                              #59
                              When I print valueSums, many of those values are zero. But this is not possible, right?

                              Comment

                              • elbin
                                New Member
                                • Jul 2007
                                • 27

                                #60
                                Originally posted by aboxylica
                                This is my full code, where I am calculating the scores, dividing by another background value, and ultimately taking a log. Because of this normalisation, some values are becoming zero: when I print the normalised values, some of them are zero.
                                Sorry, but I am not able to paste my output file, and I don't know why this is happening.
                                Code:
                                from math import *
                                import random
                                f=open("deeps1.txt","r")
                                line=f.next()
                                while not line.startswith('PO'):
                                    line=f.next()
                                
                                headerlist=line.strip().split()[1:]
                                linelist=[]
                                
                                
                                line=f.next().strip()
                                while not line.startswith('/'):
                                    if line != '':
                                        linelist.append(line.strip().split())
                                    line=f.next().strip()
                                    
                                keys=[i[0] for i in linelist]
                                values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]
                                
                                array={}
                                linedict=dict(zip(keys,values))
                                keys = linedict.keys()
                                keys.sort()
                                for key in keys:
                                    array=[key,linedict[key]]
                                
                                datadict={}
                                datadict1={}
                                for i,item in enumerate(headerlist):
                                    datadict[item]={}
                                    for key_ in linedict:
                                        datadict[item][key_]=linedict[key_][i]
                                        
                                
                                for keymain in datadict:
                                    for keysub in datadict[keymain]:
                                        datadict[keymain][keysub]+=1.0
                                
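                                # Normalise every entry by its column total plus the pseudocount of 4.
                                # copy() is shallow, so datadict1 and datadict share the same inner dicts.
                                datadict1=datadict.copy()
                                for keymain in datadict:
                                    for keysub in datadict[keymain]:
                                        datadict1[keymain][keysub]=datadict[keymain][keysub]/(sum(values[int(keysub)-1])+4)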
                                   
                                
                                def random_seq(nchars,insertat,astring):
                                    seq=""
                                    for i in range(nchars):
                                      if i== insertat:
                                          seq+=astring
                                      ch=random.choice(("ATGC"))
                                      seq+=ch
                                    print seq
                                    return seq
                                
                                thestring="CGTCAAGTTCAAGTGCAAAA"
                                count=50-len(thestring)
                                p=random_seq(count,15,thestring)
                                file=open("temp.txt",'w')
                                ##consensus="CGTCAAGTTCAAGTGCAAAA"
                                ##file.write(consensus)
                                file.write(str(p))
                                file.close()
                                
                                def file_chk():
                                    f=open("temp.txt","r")
                                    file_content=f.read()
                                    return file_content
                                    
                                #p=file_chk()
                                
                                
                                #def my_rand():
                                 #   
                                    #print p
                                  #  part=""
                                   # q=len(p)
                                   # seqq=""
                                
                                   # for i in range(0,q):
                                    #    part= p[i:i+16]
                                    #    if len(part)==16:
                                     #       seqq=part
                                      #      return seqq
                                
                                
                                
                                #my_seq=my_rand()
                                #print len(my_seq)
                                
                                res=1
                                part=""
                                q=len(p)
                                seqq=""
                                
                                value={"A":0.3,"T":0.3,"C":0.2,"G":0.2}
                                for i in range(q-16+1):  # +1 so the final 16-base window is scored too
                                    part=p[i:i+16]
                                    seqq=part
                                    res=1
                                    score=1
                                    for j in range(16):
                                        key=seqq[j]
                                        res=res*datadict1[key]["%02d"%(j+1)]
                                        #print res
                                    for key in seqq:
                                        score=score * value[key]
                                    #print score,"*******************",res
                                    log_ratio=log10(res/score)
                                    print i,log_ratio
                                I think you had some problems with indentation, and I simplified a lot of the last part with the score and log. I don't know why you got zeros, but I think I know what you want to do, and this version should do it.
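The scoring loop at the end of the code can also be written as a small function, which makes it easier to try on a toy example. This is only a sketch: `window_scores`, the 3-position `probs` matrix and the test sequence are hypothetical and are not taken from deeps1.txt. The background frequencies are the ones used in the thread.

```python
from math import log10


def window_scores(seq, probs, background, width):
    """Return (start, log10 odds) for every window of `width` bases in `seq`."""
    scores = []
    for start in range(len(seq) - width + 1):  # + 1 keeps the final window
        window = seq[start:start + width]
        res = 1.0
        score = 1.0
        for j, base in enumerate(window):
            res *= probs[base]["%02d" % (j + 1)]  # position keys are "01", "02", ...
            score *= background[base]
        scores.append((start, log10(res / score)))
    return scores


# Hypothetical normalised matrix: each position's column sums to 1.0
probs = {
    "A": {"01": 0.7, "02": 0.1, "03": 0.7},
    "T": {"01": 0.1, "02": 0.7, "03": 0.1},
    "C": {"01": 0.1, "02": 0.1, "03": 0.1},
    "G": {"01": 0.1, "02": 0.1, "03": 0.1},
}
background = {"A": 0.3, "T": 0.3, "C": 0.2, "G": 0.2}

results = window_scores("GATAC", probs, background, 3)
best_start = max(results, key=lambda r: r[1])[0]
```

Here the window starting at index 1 ("ATA") matches the toy matrix best, so it gets the only positive log ratio of the three windows.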

                                Comment
