looping through a big file containing a set of files.

  • aboxylica
    New Member
    • Jul 2007
    • 111

    #16
    What I need to do is this: for every sequence (from the sequence file) and every matrix (the matrices are also stored in a file), I should calculate the scores.
    The output should be in the form: sequence header (the heading that follows the > symbol in the file), position in the sequence, log value.
    For example:
    matrix1:
    >ref1
    01,2.0012
    02,3.0047
    .
    .
    .
    >ref2
    01,3.0047
    02,8.0067
    matrix 2
    >ref1
    01,2.0012
    02,3.0047
    .
    .
    .
    >ref2
    01,3.0047
    02,8.0067

    and so on.

    The problem is: for every sequence in the sequence file and for every position in that sequence, I am going to calculate the log value. Is it clear now?
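A minimal sketch of the report layout described above, assuming the scores for one matrix have already been computed. The function name `format_scores` and the four-decimal formatting are illustrative assumptions, not something specified in the post.

```python
def format_scores(matrix_name, scores_by_header):
    """Return the report text for one matrix.

    scores_by_header: list of (header, [log values]) pairs, in file order.
    """
    lines = [matrix_name + ":"]
    for header, scores in scores_by_header:
        lines.append(">" + header)
        # Positions are 1-based and zero-padded, as in the example output.
        for position, value in enumerate(scores, start=1):
            lines.append("%02d,%.4f" % (position, value))
    return "\n".join(lines)


report = format_scores(
    "matrix1",
    [("ref1", [2.0012, 3.0047]), ("ref2", [3.0047, 8.0067])],
)
print(report)
```

Calling it once per matrix and printing the results one after another would give the "matrix1, matrix2, and so on" layout.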


    • bvdet
      Recognized Expert Specialist
      • Oct 2006
      • 2851

      #17
      Originally posted by elbin
      Then I suppose the window-size changes for each matrix... It was just some thoughts on the task, because I am into genetics as well... Thanks for the clarification :), I overlooked that.
      Thank you for your comments and suggestions. I hope we have helped the OP find a solution.


      • aboxylica
        New Member
        • Jul 2007
        • 111

        #18
        Sorry, I am not sure I got what you meant, but I have posted exactly how my output should look, so please advise accordingly.


        • elbin
          New Member
          • Jul 2007
          • 27

          #19
          Originally posted by aboxylica
          The problem is: for every sequence in the sequence file and for every position in that sequence, I am going to calculate the log value. Is it clear now?
          The partial sequence for which you are calculating the log has the length of the matrix, doesn't it? So you need to combine the code from this thread and the matrix thread, so you have both the sequences and matrices, and the scoring, and then run it over all matrices and over all sequences.
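The "run it over all matrices and over all sequences" step could be sketched as below. This is only an outline under assumptions: `score_all` and the `score_sequence` hook are hypothetical names, and the toy scorer simply records the matrix width and sequence length so the loop structure can be checked without real data.

```python
def score_all(matrices, sequences, score_sequence):
    """Apply score_sequence to every (matrix, sequence) pair.

    matrices:       dict of matrix name -> matrix object
    sequences:      list of (header, sequence) pairs
    score_sequence: hypothetical callable(matrix, sequence) -> list of values
    Returns a dict of matrix name -> list of (header, values).
    """
    results = {}
    for name, matrix in matrices.items():
        results[name] = [(header, score_sequence(matrix, seq))
                         for header, seq in sequences]
    return results


# Toy stand-in for the real scorer: reports matrix width and sequence length.
def describe(matrix, seq):
    return [len(matrix["A"]), len(seq)]


toy_matrices = {"m1": {"A": [0.0, 0.0]}, "m2": {"A": [0.0, 0.0, 0.0]}}
toy_sequences = [("ref1", "ATGC"), ("ref2", "ATG")]
results = score_all(toy_matrices, toy_sequences, describe)
print(results["m1"])
```

Keeping the scorer as a separate function means the parsing code from the two threads can be combined without touching the scoring logic.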


          • aboxylica
            New Member
            • Jul 2007
            • 111

            #20
            After that, for each sequence in the file and for every A, T, G, C, I am going to calculate the values: I give a specific score to each letter, then divide the score from the matrix by this score (at each position), take the log of the result, and give the output in the form I mentioned.
            q=len(d)
            print q
            seqq=""

            value={"A":0.3, "T":0.3,"C":0.2 ,"G":0.2}
            for i in range(q-16):
                part=d[i:i+16]
                seqq=part
                res=1
                score=1
                for j in range(16):
                    key=seqq[j]
                    res=res*datadict1[key]["%02d"%(j+1)]
                print res
                for key in seqq:
                    score=score * value[key]
                log_ratio=log10(res/score)
                print i,log_ratio
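One thing worth noting about the calculation above: the matrices contain many 0.00 entries, so the product can become 0, and `log10(0)` raises a `ValueError`. Below is a small Python 3 sketch of the same per-window calculation, assuming the matrix is laid out as base -> position string -> weight (the way `datadict1` is indexed in the post). Returning `None` for a zero product is an assumption made here for illustration; the thread does not say how zeros should be handled, and adding pseudocounts would be another option.

```python
from math import log10

# Background values taken from the post.
BACKGROUND = {"A": 0.3, "T": 0.3, "C": 0.2, "G": 0.2}


def log_odds(matrix, window, background=BACKGROUND):
    """Return log10(matrix product / background product) for one window.

    matrix: dict base -> dict "01".."NN" -> weight.
    Returns None when any matrix weight is 0 (log10(0) is undefined).
    """
    res = 1.0
    score = 1.0
    for j, base in enumerate(window):
        res *= matrix[base]["%02d" % (j + 1)]
        score *= background[base]
    if res == 0:
        return None
    return log10(res / score)


# Two-position toy matrix, not real data.
toy = {
    "A": {"01": 3.0, "02": 0.0},
    "T": {"01": 1.0, "02": 6.0},
    "C": {"01": 0.0, "02": 0.0},
    "G": {"01": 0.0, "02": 0.0},
}
print(log_odds(toy, "AT"))
print(log_odds(toy, "AA"))
```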


            • aboxylica
              New Member
              • Jul 2007
              • 111

              #21
              Originally posted by elbin
              The partial sequence for which you are calculating the log has the length of the matrix, doesn't it? So you need to combine the code from this thread and the matrix thread, so you have both the sequences and matrices, and the scoring, and then run it over all matrices and over all sequences.
              But my matrix file looks something like this. I am not going to be specific about the datasets; it is a huge file. This is a part of it:
              NA Abd-B
              PO A C G T
              01 10.19 0.00 10.65 6.24
              02 5.79 0.67 10.50 10.11
              03 4.50 0.00 0.00 22.57
              04 0.00 0.00 0.00 27.08
              05 0.00 0.00 0.00 27.08
              06 0.00 0.00 0.00 27.08
              07 27.08 0.00 0.00 0.00
              08 0.00 2.83 0.00 24.25
              09 0.00 0.00 24.45 2.62
              10 19.33 0.00 4.34 3.41
              11 0.31 12.28 3.39 11.09
              //
              //
              NA Adf1//

              PO A C G T
              01 0.71 0.08 26.02 1.55
              02 3.03 23.00 1.24 1.09
              03 0.26 10.50 3.29 14.31
              04 0.00 0.06 28.23 0.07
              05 0.12 27.27 0.06 0.91
              06 1.44 20.36 0.37 6.19
              07 5.35 0.28 21.49 1.24
              08 7.81 16.10 3.81 0.63
              09 0.51 17.77 0.45 9.63
              10 0.00 0.14 28.21 0.00
              11 0.00 25.69 0.20 2.46
              12 0.48 9.98 0.07 17.82
              13 1.27 0.00 27.01 0.07
              14 15.59 7.98 2.92 1.87
              15 4.28 22.37 0.00 1.70
              16 0.18 0.77 22.70 4.70
              //
              //
              NA Aef1
              PO A C G T
              01 0.00 0.06 12.49 0.00
              02 3.80 0.17 0.00 8.57
              03 0.87 0.06 0.00 11.62
              04 0.06 9.76 2.32 0.41
              05 9.82 0.00 2.73 0.00
              06 9.76 0.00 0.00 2.78
              07 3.80 0.31 0.00 8.43
              08 0.00 0.00 0.00 12.54
              09 0.00 6.53 5.85 0.17
              10 0.00 12.38 0.17 0.00
              11 2.73 1.02 8.80 0.00
              12 5.85 0.00 6.70 0.00
              13 1.02 5.96 0.00 5.57
              14 0.00 5.16 4.66 2.73
              15 1.03 7.55 3.97 0.00
              16 4.82 5.00 2.73 0.00
              //
              //
              NA Antp
              PO A C G T
              01 5.52 14.49 27.56 0.49
              02 8.17 14.02 11.42 14.47
              03 18.18 27.29 1.31 1.29
              04 40.26 5.66 1.83 0.32
              05 19.05 12.67 0.43 15.91
              06 9.94 0.07 0.20 37.86
              07 26.63 15.17 0.00 6.27
              08 47.45 0.06 0.00 0.56
              09 0.81 0.48 0.00 46.79
              10 26.46 19.05 1.81 0.75
              11 48.07 0.00 0.00 0.00
              12 30.51 0.00 0.00 17.56
              13 43.45 0.00 0.00 4.62
              14 30.06 5.98 0.00 12.03
              15 0.38 0.64 0.00 47.05
              16 22.14 0.29 7.15 18.49
              //
              //
              NA BEAF-32
              PO A C G T
              01 16.78 0.91 0.00 3.45
              02 0.62 0.92 11.18 8.41
              03 0.07 20.94 0.00 0.14
              04 0.45 0.47 19.97 0.25
              05 11.06 2.12 4.95 3.01
              06 0.90 0.00 9.47 10.77
              07 12.46 3.27 0.00 5.41
              08 0.45 6.88 13.48 0.33
              09 0.10 1.02 0.00 20.03
              10 9.15 1.11 5.14 5.75
              11 2.37 0.29 0.00 18.48
              12 0.00 8.76 8.01 4.37
              13 0.42 8.63 11.09 1.00
              14 7.27 1.53 12.08 0.26
              15 1.82 0.05 3.23 16.04
              //
              //
              NA BEAF-32A
              PO A C G T
              01 1.00 0.00 0.24 1.30
              02 0.93 0.00 1.53 0.08
              03 1.53 0.00 1.00 0.00
              04 1.53 1.00 0.00 0.00
              05 0.00 0.00 2.54 0.00
              06 0.00 1.69 0.77 0.08
              07 0.00 0.00 2.46 0.08
              08 0.00 0.64 1.30 0.60
              09 0.00 0.08 2.46 0.00
              10 0.00 1.05 0.00 1.49
              11 0.24 0.00 2.30 0.00
              12 0.08 0.11 0.00 2.35
              13 0.24 0.00 2.30 0.00
              14 0.00 0.93 0.00 1.61
              15 0.00 0.00 2.54 0.00
              16 0.08 1.53 0.00 0.93
              //
              //
              NA BEAF-32B
              PO A C G T
              01 0.00 7.91 0.00 0.00
              02 0.00 0.00 7.91 0.00
              03 7.91 0.00 0.00 0.00
              04 0.00 0.00 0.00 7.91
              05 7.91 0.00 0.00 0.00
              06 0.00 1.67 3.51 2.73
              07 0.00 0.00 0.00 7.91
              08 3.49 0.16 0.00 4.27
              09 0.00 0.00 0.00 7.91
              10 0.00 5.11 0.91 1.89
              11 0.00 4.31 3.60 0.00
              12 0.16 7.64 0.00 0.11
              13 7.00 0.00 0.91 0.00
              14 0.00 6.18 0.00 1.73
              15 4.27 2.80 0.00 0.84
              16 1.84 5.11 0.84 0.11
              //
              //
              NA Cf2-II
              PO A C G T
              01 0.00 0.00 0.43 12.03
              02 0.00 10.74 0.00 1.72
              03 6.27 0.00 6.19 0.00
              04 0.00 11.76 0.00 0.70
              05 0.78 0.00 11.25 0.43
              06 0.00 0.00 0.00 12.46
              07 11.91 0.00 0.12 0.43
              08 6.27 0.00 0.00 6.19
              09 11.56 0.12 0.78 0.00
              10 5.88 0.00 0.00 6.58
              11 8.86 0.00 3.60 0.00
              12 5.77 0.12 0.00 6.58
              13 0.00 6.27 6.19 0.00
              14 0.00 12.46 0.00 0.00
              15 6.69 0.00 5.77 0.00
              16 3.52 0.00 8.94 0.00
              //
              //
              NA Deaf1
              PO A C G T
              01 5.42 5.98 1.71 0.42
              02 7.31 4.33 0.25 1.64
              03 12.16 1.24 0.13 0.00
              04 13.04 0.13 0.00 0.36
              05 7.25 1.66 4.62 0.00
              06 0.37 1.29 11.76 0.11
              07 0.00 13.47 0.00 0.05
              08 0.75 1.71 11.07 0.00
              09 11.53 0.13 0.05 1.81
              10 0.37 0.00 0.00 13.16
              11 0.00 12.82 0.00 0.71
              12 0.00 0.00 12.84 0.68
              13 8.00 0.25 4.24 1.04
              14 0.00 6.03 0.00 7.50
              15 0.42 0.13 4.38 8.60
              16 0.05 0.98 7.93 4.57
              //
              //
              NA Dfd
              PO A C G T
              01 0.50 1.66 0.07 68.40
              02 52.59 9.34 8.31 0.39
              03 69.57 0.66 0.00 0.40
              04 2.22 0.14 0.41 67.86
              05 0.44 0.18 23.53 46.49
              06 36.75 5.44 26.74 1.70
              07 16.27 4.86 18.49 31.01
              08 8.79 3.43 17.07 41.35
              09 1.40 3.62 29.62 36.00
              10 1.89 20.88 10.86 37.00
              11 30.75 25.66 13.32 0.91
              //
              //
              NA Dref
              PO A C G T
              01 1.28 2.13 14.78 18.90
              02 5.33 12.15 12.68 6.92
              03 4.99 8.72 21.15 2.22
              04 10.42 6.71 18.00 1.95
              05 22.25 0.51 10.62 3.70
              06 15.72 3.00 0.00 18.36
              07 26.44 3.26 4.01 3.38
              08 10.33 5.61 9.50 11.64
              09 8.67 18.41 0.18 9.83
              10 2.83 0.84 0.24 33.17
              11 35.50 0.91 0.60 0.08
              12 0.35 0.08 1.05 35.60
              13 0.22 34.76 0.79 1.31
              14 4.00 0.88 31.28 0.93
              15 23.50 6.09 0.33 7.16
              16 4.83 1.79 1.77 28.69
              //
              //
              NA E-spl-
              PO A C G T
              01 0.26 16.93 0.00 0.00
              02 16.31 0.88 0.00 0.00
              03 0.00 11.13 0.00 6.05
              04 0.00 0.00 10.52 6.67
              05 8.95 2.38 0.00 5.86
              06 0.21 0.07 16.91 0.00
              07 8.38 8.81 0.00 0.00
              08 0.00 17.07 0.12 0.00
              09 17.13 0.05 0.00 0.00
              10 0.81 13.88 2.38 0.12
              11 8.89 0.00 8.22 0.07
              12 8.08 0.07 2.31 6.72
              13 0.21 2.38 14.60 0.00
              14 0.00 8.45 8.74 0.00
              15 11.34 0.00 0.00 5.85
              16 0.00 0.00 2.58 14.60
              //
              //
              NA Eip74EF
              PO A C G T
              01 26.64 2.84 3.15 0.66
              02 28.55 3.74 0.18 0.82
              03 13.77 1.02 15.46 3.05
              04 5.12 14.05 4.35 9.78
              05 14.07 17.06 1.63 0.52
              06 31.86 0.47 0.07 0.89
              07 15.27 1.33 14.82 1.88
              08 9.00 10.66 5.19 8.44
              09 16.44 0.08 3.58 13.18
              10 8.17 0.00 14.74 10.38
              11 7.69 7.01 16.85 1.75
              12 13.60 6.89 2.36 10.44
              13 2.20 0.34 26.98 3.77
              14 4.30 0.43 2.94 25.63
              15 4.05 2.54 3.78 22.93
              //
              //
              NA HLHm5
              PO A C G T
              01 6.96 4.69 0.00 0.00
              02 0.00 3.00 0.00 8.65
              03 0.00 6.96 4.69 0.00
              04 0.00 11.65 0.00 0.00
              05 4.69 0.00 0.00 6.96
              06 0.00 0.00 0.00 11.65
              07 0.00 0.00 6.96 4.69
              08 0.00 0.00 0.00 11.65
              09 0.00 0.00 11.65 0.00
              10 4.69 0.00 6.96 0.00
              11 0.00 11.65 0.00 0.00
              12 4.69 0.00 0.00 6.96
              13 0.00 11.65 0.00 0.00
              14 0.00 0.00 11.65 0.00
              15 0.00 0.00 0.00 11.65
              16 0.00 0.00 11.65 0.00
              //
              //
              NA His2B
              PO A C G T
              01 0.41 0.61 0.53 21.43
              02 0.00 0.00 0.00 22.97
              03 22.97 0.00 0.00 0.00
              04 0.00 22.97 0.00 0.00
              05 0.00 22.97 0.00 0.00
              06 0.00 0.00 0.00 22.97
              07 22.97 0.00 0.00 0.00
              08 22.97 0.00 0.00 0.00
              //
              //
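A possible way to read the matrix file shown above into nested dictionaries, sketched in Python 3. The function name `parse_matrices` and the returned layout (name -> base -> position -> weight) are assumptions chosen to match how `datadict1[key]["%02d"%(j+1)]` is indexed elsewhere in the thread. The sketch also tolerates the blank line and the trailing `//` on one of the `NA` lines in the excerpt.

```python
def parse_matrices(lines):
    """Return dict: matrix name -> base -> position string -> weight."""
    matrices = {}
    current = None
    columns = []
    for raw in lines:
        fields = raw.split()
        if not fields or fields[0] == "//":
            continue  # blank line or record separator
        if fields[0] == "NA":
            # Strip a stray trailing "//" such as in "NA Adf1//".
            name = fields[1].rstrip("/")
            current = matrices.setdefault(name, {})
        elif fields[0] == "PO":
            columns = fields[1:]  # e.g. ["A", "C", "G", "T"]
            for base in columns:
                current[base] = {}
        else:
            position = fields[0]  # e.g. "01"
            for base, text in zip(columns, fields[1:]):
                current[base][position] = float(text)
    return matrices


sample = """NA Abd-B
PO A C G T
01 10.19 0.00 10.65 6.24
02 5.79 0.67 10.50 10.11
//
//
NA Adf1//

PO A C G T
01 0.71 0.08 26.02 1.55
//
""".splitlines()

matrices = parse_matrices(sample)
print(sorted(matrices))
print(len(matrices["Abd-B"]["A"]))
```

With this layout, `len(matrices[name]["A"])` gives the number of rows in a matrix, which is the window width to use for that matrix.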


              • aboxylica
                New Member
                • Jul 2007
                • 111

                #22
                I don't know how to incorporate the length of the sequence each time; it is going to change. This is a part of my sequence file:
                >CG9571_O-E|Drosophila melanogaster|CG9571|FBgn0031086|X:19926374..19927133
                CCAGTCCACCGGCCG CCGATCTATTTATAC GAGAGGAAGAGGCTG AACTCGAGGATTACC CGTGTATCCTGGGAC GCG
                GATTAGCGATCCATT CCCCTTTTAATCGCC GCGCAAACAGATTCA TGAAAGCCTTCGGAT TCATTCATTGATCCA CAT
                CTACGGGAACGGGAG TCGCAAACGTTTTCG GATTAGCGCTGGACT AGCGGTTTCTAAATT GGATTATTTCTACCT GAC
                CCTGGAGCCATCGTC CTCGTCCTCCGTCCC TTAGCGCCTCCTGCA TGGATGTCGTTTTTG GGTTTCATACCTTTT CAC
                ACTGGAAAAATACGG AATTTGTTGTAAGCC CTTTCAAGACGAATG GGATTTAGCTTCGGA TGTCAACGTCACCAT AAT
                CATATTAGGAATATT TCTACTCAATTGCAA TATTGGTACTTTTCT GACTGTAAACGCGAT GATAATTACAAATAT GCC
                TAATTTGCTGTCTTT ATAATCAAATGGAGT TCTTTATATTTCCAA AATATTGAAATTCCG ATTCCCTAGAAAATA ATA
                CGTTTTTCTGTTATT AATAAAAAACCAATA GGAAAGTTCTCAAAA ATTACTCTGTTGTAT TTGATCATTTCTTTT CCG
                GTATAATCTTTTATT TTAAGCATTCCCATG TGAATAAATTTCAGA CTAATGTATTAATAA GATGTCGTGTTTTTC CAC
                TTACAAATTTCTCAT ACAGCTGGATATATA CTACGAGTACTATAC ACATGCTCTGGG
                >Cp36_DRR|Drosophila melanogaster|Cp36|FBgn0000359|X:8323349..8324136
                AGTCGACCAGCACGA GATCTCACCTACCTT CTTTATAAGCGGGGT CTCTAGAAGCTAAAT CCATGTCCACGTCAA ACC
                AAAGACTTGCGGTCT CCAGACCATTGAGTT CTATAAATGGGACTG AGCCACACCATACAC CACACACCACACATA CAC
                ACACGCCAACACATT ACACACAACACGAAC TACACAAACACTGAG ATTAAGGAAATTATT AAAAAAAATAATAAA ATT
                AATACAAAAAAAATA TATATATATACAAAA ATTTGTTGTGTTTGA ATTGAATTAAGAGCT TATCAAGAAAAAAAT TTC
                AGTGACTCATAATAC ACTACTCTACAAGTT TAAATTGAATCAACA ATTTAACTTTCATTG CTCAGGTTTTTAGTA ACA
                ATGTTTATATAAGTT TAGGTATAACAAATG ATTTAAATATAAGAT ACTGTATTTCACATT GAGACGAAACAATCC ACC
                GAAAATCATAAAATA TAAGAATGTTGCATT TTATTTTTAAAAATA AAGATGCCTTTTAAG AGGAATAACTTAAAT GTC
                TTTAATACCTTTGAA TTTAATTATATGGCT AATAAACACAAACTT AAAGCTTAAAACTGC ATCGAATTGAATGCG GTT
                ATAAATGTACTTATA TATCTAATATAATCT GCTAATATGGTTTAC ATGGTATATCTTTCT CGGAAATTTTTACAA AAA
                TTATCTATTCATATA TCTCGAGCGTAAGAT ATTTATCAGTTTATA GATAACATCTTTAAA TTTGGGTGATTAAAA AAA
                AACATTG
                >Cp36_PRR|Drosophila melanogaster|Cp36|FBgn0000359|X:8324430..8324513
                TCTAGAGATCTGGGC ACGATGGCGAGACAA AGATGCGGCGCAAAA TCGGAAATGGAGATG GATCACGTAGCCGGC CAT
                GGCGG
                >Him_distal|Drosophila melanogaster|Him|FBgn0030900|X:18039896..18043470
                GGTTTTCTGCGATGG CTTCCGCGCCAGCTG AAGTATCTGATTTGC TGCCTTGTTTTTGTT GATATTTCTGCGAAG GGA
                CTTGTGCTTTTCAAA TGGCCTTTTTTTGGG ATTACGGCAAGGGCG CGTTTCCCACGCTCG ATCCCCACTTACCAT TGG
                TGCACGCGATTGCGG CAAGCTGCTGAGGCA AGCTATTAAACGCCA CACTGGGCCGGGGGG CGGTACCGGTGGGCG TGG
                CAGGGGAGTCGACAC ATGTTGTGTGCCAGA GAACTTTGCTCCGAT CCCCAGATCATCAAA TAGTTGTCGCTGTCT GCT
                CGTGCGCAAATTGCA ATACTTTGCATACCC TTACTGCAGGGTATC TGAGCTTGGACTTTA AATAAGGGGGTATAA CAT
                AGCTTATACTCTCTA TCTCTGTTATAAAGT CAATTTTCCTTAGAT CTTTAGTACAGTGGG TAGTTAAGGAGACAT AAC
                TTCCAAAAAAAAAAA CTATAAAATTGCAAT AATTTATGCAAAATA TGTATTTTATTGAAT GGGATGAATAATTTA CCT
                TATACGACTGTAAAA CATTTCTAACGATTA AATGCACTTCTAAAA GTTTTCCCACAAGTA GGTGAGCTATTATGC TAA
                GCGTTCCATGACTTG GAATCTAAGATCTTG TTTTGATCTTCGCTG ATCTTTGAGAACTCG GGGATTACTTACACA TTT
                CTGGGCAGGCACAAG TGGGCCGAGGCAGTG TAGATTCATCACGTT TTCACTCAACACACG CAGCTCATTAACAGC CCC
                GCTGACAACTTGTCA GGACTTCCCCCTCGT GAATCCCCCTGCTAC GCAACCCCCATTCCC CGCCCATTCCAACAC TTC
                CCGCCGGGAGCGTGG GAAATTATGCGTGTT GGTGGGACGTCGGGC GGTGAAAATTGGCGC GCTCTTCGGGGGGCC ACA
                CCGCGTGGCATTGAC AACTCTTCCACATTT CGCGCCCAACGATGC GTTGGCATCAGTGGG TCACAGGGATTACGG CTG
                GCTGGGATTCCAGAG CCAGATCTTTTTCAG CCAAAACTTTCAGCT TTCGAAGACCTCAAG CGATAGGAGAGTGTC GGA
                AGTCCAGAAATAGAC GCGTAGCACATAAAT TATGGATCGTATCGA GTATCGATTAGCCCG GGACAAGCGAAGCGA TAG
                GGAGACATATTTTTA TTACCCTCTCGGGGA CCTGCACTTGTTGGC TTCGCTTCTATGAAA GATCCCTCTACCATA TCA
                CGTATGTGGGCTCCC CCAATCGAACCGAGT TGTGGGAAATGTTTT CCCAGGCCAACAGCT AATTGTCACTCCAAG GGT
                TGTCCCCGCAGCCCA GACGACAGATAAGCG GGCAAGTGAAGCCCA GCGATCTGAGTCAAG TGAAGGGCTTCAATT TCT
                TTCCCGAGTGGAACT GGGATATCGAAATTA CATTTGTAACAGACG TTTTAGTCCGCAATC CTCAGCTAATGGGAC TTA
                CGAACATATATTCAT CTGAAATTCAAGAAC ATGCGCACTTAAAGA GCAGGGAAGTCGCAC ACGCGCAAGTCAGGC GCT
                CAAAAAGGGATCTTC GGAGGTACAGTGGGC AAAAGACTGTAAATA AATAATATAAATAAA ATAATATTTAGCTCT ATG
                TGTTTATATAATCTA CAAAGTAGTTAACAA AAAATATAAAATGGA TATAAAAATACATCT TATATATCCCTATAA TAA
                GAAATAAATAATAAT TTTAGTAAATTAATT TTGTTACACAAAGTA CCTGTATTATTACCT CTTTTTTGTTGGTTG GTT
                CTTTTTTGATGTGGC CCCACTGTGCTCTCT TATCAGTGCGACAAT CAGGCATTGCCTTTC CCCATCGGGGGATTC TAA
                TTCCGTGGACGATGG GCCGAAACGCCTATA AAGTCGCTCATTAAA AATGTTTAATTATGG CCCATCTTGCATCTT GCA
                CCGATGTGGATGGGG TTTGTCGGCAATGAT TTACATTATAAAAAT GCCCGTTATCTGAGC ATTTTGTACGCTCCA CTC
                CCTCTTCCCCCCTCC AAAAAAAAAAAAAAC AGATATGTATATTCC CCGAGATATTCCCAA GCGGCCAAAAATAGA CGC
                AAATTGTAACGCACT TGAAGTGCACTCTGA AACATCTTGAAGTCC AAATAAAATAGCAGA GAGACCCACAATAAT ATA
                CGTTGATATACACAT GTATATATGTATGTA TGTACATAAAGGGCC AGGAGCAGGAACGTT AGGCATGCGGTGGTA CGA
                GCACCGTGGTGCGAG CGAGAGCGCTGTGCT GCCTGAGGGAGAGGT AGCGAGTGGGTTGCA TTGCGCACACAGAAC ATG
                TGAATGCAGAGTTCA AGTGCATGCCGTGAC ACAGACACGCACACA CACACACGCACACAC AGATGAGTAGCCGCT GCA
                AAGTGTTTTTTCCCA GGCGCTATTTATAAT ATGCATCCCGTCGCC GATCCGATCCGATCC AATCCAATCCGATTG GAT
                CCCATCTTGCGGCAC TACGATTATGACGCT CGACACGATGATGCA TTCGCAGAGTTTCCC GATCGCAGAGTACCC TGT
                ACTCGAGTAGTTTTT AGATGCAGTATTATT AAGTAGAAAATTGTA ACCGTATAATATTCC ATTATATTAAATATT TTT
                ATAGCACTAAAGAAA TAAAAGCCCATTTTA TAATTTATATTACAA AAATACTTAACCATA GAAACTTATGATATG ATA
                CCAATATTTAAGTTC CAAAAAATGTAGAAC ATTTTTAAGTATATA CTCGAAAATATTAAT TTTCAAAATTGATAT TCA
                AGAGATATTATAAAA AGATCCCCATTCTAA ATATCTAACATCATG CCATGCTTTCTAATG AGTATAGTATACCCC TGC
                TACCCTGTCAATCCG CAAAACAGGCGCCGA AACATGCGGTTTCTC GCAGCAGACTGCCAC GGGAAAAATTCGGTT CGA
                GATTTGGGAATGGAT GTATGACGGAGCAGA AGGAGCAGGACCCGG ATTTCGGATTTCGGA ATGGATATGGAAATG AAG
                ATGGAAATGGGACTT TGACTGCGCGACGGC CACATGCGCCGCTGG CGATGCCGCTGGATG TTGCATGTGGCAGCG GTC
                GGTGCAGCAGCGAAA GTGTTGCAGCTGTAT GAGAGGGTCTATTTT TGGGGCGATTGTGCG GCGCTGGTGCTGCCA CAT
                GTGTTCTGTGTTGGG CTGCTAAAAGGCATT GTAATGAGAGCAGAA AATAGAATTGACTCC ACTTGAGCAATGTCC CAT
                AAAGCGGGAGTTTCG AGTTTGGCGCGCAAT GTGCCGCACCAGCAA ACGAACAAAAGAAAA AAAAAAAAAAAAAAC ACA
                GCCAGTAACACATGG GCCCACGAGTTATGT TTTATTTTTAATCCC ACAAAGAGTCGATCT CCAAAACAAACCCGC AGA
                GAGCACATATAAAGA GACTCGGTGGACGAG TGGTTCGAAACAGTC TTCCGCCGCAGCTCG ACGCGCTCGCATATC GGG
                AATATATAGATCGGA GATATCGCAGGACCC ACAGCAGAGCAGAGC CGCAGAGCCACCAAC CTCG
                >Him_proximal|Drosophila melanogaster|Him|FBgn0030900|X:18041232..18043470
                GCCCAGACGACAGAT AAGCGGGCAAGTGAA GCCCAGCGATCTGAG TCAAGTGAAGGGCTT CAATTTCTTTCCCGA GTG
                GAACTGGGATATCGA AATTACATTTGTAAC AGACGTTTTAGTCCG CAATCCTCAGCTAAT GGGACTTACGAACAT ATA
                TTCATCTGAAATTCA AGAACATGCGCACTT AAAGAGCAGGGAAGT CGCACACGCGCAAGT CAGGCGCTCAAAAAG GGA
                TCTTCGGAGGTACAG TGGGCAAAAGACTGT AAATAAATAATATAA ATAAAATAATATTTA GCTCTATGTGTTTAT ATA
                ATCTACAAAGTAGTT AACAAAAAATATAAA ATGGATATAAAAATA CATCTTATATATCCC TATAATAAGAAATAA ATA
                ATAATTTTAGTAAAT TAATTTTGTTACACA AAGTACCTGTATTAT TACCTCTTTTTTGTT GGTTGGTTCTTTTTT GAT
                GTGGCCCCACTGTGC TCTCTTATCAGTGCG ACAATCAGGCATTGC CTTTCCCCATCGGGG GATTCTAATTCCGTG GAC
                GATGGGCCGAAACGC CTATAAAGTCGCTCA TTAAAAATGTTTAAT TATGGCCCATCTTGC ATCTTGCACCGATGT GGA
                TGGGGTTTGTCGGCA ATGATTTACATTATA AAAATGCCCGTTATC TGAGCATTTTGTACG CTCCACTCCCTCTTC CCC
                CCTCCAAAAAAAAAA AAAACAGATATGTAT ATTCCCCGAGATATT CCCAAGCGGCCAAAA ATAGACGCAAATTGT AAC
                GCACTTGAAGTGCAC TCTGAAACATCTTGA AGTCCAAATAAAATA GCAGAGAGACCCACA ATAATATACGTTGAT ATA
                CACATGTATATATGT ATGTATGTACATAAA GGGCCAGGAGCAGGA ACGTTAGGCATGCGG TGGTACGAGCACCGT GGT
                GCGAGCGAGAGCGCT GTGCTGCCTGAGGGA GAGGTAGCGAGTGGG TTGCATTGCGCACAC AGAACATGTGAATGC AGA
                GTTCAAGTGCATGCC GTGACACAGACACGC ACACACACACACGCA CACACAGATGAGTAG CCGCTGCAAAGTGTT TTT
                TCCCAGGCGCTATTT ATAATATGCATCCCG TCGCCGATCCGATCC GATCCAATCCAATCC GATTGGATCCCATCT TGC
                GGCACTACGATTATG ACGCTCGACACGATG ATGCATTCGCAGAGT TTCCCGATCGCAGAG TACCCTGTACTCGAG TAG
                TTTTTAGATGCAGTA TTATTAAGTAGAAAA TTGTAACCGTATAAT ATTCCATTATATTAA ATATTTTTATAGCAC TAA
                AGAAATAAAAGCCCA TTTTATAATTTATAT TACAAAAATACTTAA CCATAGAAACTTATG ATATGATACCAATAT TTA
                AGTTCCAAAAAATGT AGAACATTTTTAAGT ATATACTCGAAAATA TTAATTTTCAAAATT GATATTCAAGAGATA TTA
                TAAAAAGATCCCCAT TCTAAATATCTAACA TCATGCCATGCTTTC TAATGAGTATAGTAT ACCCCTGCTACCCTG TCA
                ATCCGCAAAACAGGC GCCGAAACATGCGGT TTCTCGCAGCAGACT GCCACGGGAAAAATT CGGTTCGAGATTTGG GAA
                TGGATGTATGACGGA GCAGAAGGAGCAGGA CCCGGATTTCGGATT TCGGAATGGATATGG AAATGAAGATGGAAA TGG
                GACTTTGACTGCGCG ACGGCCACATGCGCC GCTGGCGATGCCGCT GGATGTTGCATGTGG CAGCGGTCGGTGCAG CAG
                CGAAAGTGTTGCAGC TGTATGAGAGGGTCT ATTTTTGGGGCGATT GTGCGGCGCTGGTGC TGCCACATGTGTTCT GTG
                TTGGGCTGCTAAAAG GCATTGTAATGAGAG CAGAAAATAGAATTG ACTCCACTTGAGCAA TGTCCCATAAAGCGG GAG
                TTTCGAGTTTGGCGC GCAATGTGCCGCACC AGCAAACGAACAAAA GAAAAAAAAAAAAAA AAAACACAGCCAGTA ACA
                CATGGGCCCACGAGT TATGTTTTATTTTTA ATCCCACAAAGAGTC GATCTCCAAAACAAA CCCGCAGAGAGCACA TAT
                AAAGAGACTCGGTGG ACGAGTGGTTCGAAA CAGTCTTCCGCCGCA GCTCGACGCGCTCGC ATATCGGGAATATAT AGA
                TCGGAGATATCGCAG GACCCACAGCAGAGC AGAGCCGCAGAGCCA CCAACCTCG
                >Obp18a_prom|Drosophila melanogaster|Obp18a|FBgn0030985|X:18969778..18972746
                ATGGCGAAAATCTGT TTCCCAACTAACAAT GAGCGCATCATCACA GCTCTATATATATAA CCCATCGATTTGCTA ATT
                CAGCTCAAAAGTAGA CAGGAGATTTTAATT AAATAATTGGATGCT ACTTTACATTCGCCA CACACCAACAAATAA AGT
                CTATAATTGAAATTT TAAGCGCAGTTCCCG ATTATGAGCTACACG TATGTCGTATGCGCA ATATCTGCATTACAA TTG
                CCAATAGTAAATTAC CAACTTGGTTTTCTT CATATTTATTAAGAT AGAAAACATACAATT TTTGGCTTTTACACT CCA
                AGCATCTCTGAAGTT TAAACAAAAAACATA TGTGTAGCCTATCTA CTGTATTGGACTTTA TTCGTATATTTTATA TGG
                TTCATTAATATAGGT ATAAATACAAATTAT ATTCACGCTTTGCGA TTTGCAGCGAATATC ACATCTTATACACGA TGT
                AAAAAAAAAAAAAAT ATTTCGTCATGTTTT TAGGTTGGCCGCAGG CAGTGCTCACTGTAC CGCCACAATGTTTAT CGT
                TTTGCATTTTTTTTT TCTTTGTTTTCTTGC GGTTTCCCCTAATTA TCTTTAGTATAAACT TAGTCTACTGTCTTT TTT
                GGTAAGTATTTTCGT GATGGGCTCGTCTAT GCGAATTCCCATTTC CAATGAATAAATAAA GTAATTAGAACATTA AAA
                TTAGCAATAAAACAC GTACATTTAAAGCTG ACAACAAAAAAAAAA AGTATTCTTATGTTA AACTGTAGTATGTGC CTA
                TGCAATATTAAGAAC AATTAAATAAAATAG CATATTAACTTATGG CAGCACTTTGTTGCT ATGTTTATGTTTATG TTT
                ATGCACGCAGTTAGG CCAGGGCGGATGTAA CATGATCACCCACTC GAAGGCAAAAAGTAT AAGTGCATGGTCAGC ATT
                CACACGCCGACCAAA TACATATTACATACG TACATACATATCTCG CTCTCCCGATAAGCC TAGATATATAAGATA TAC
                ATAAGAACGCCGCTC CGCTGCTGGCGTACC CGGCAGCGCAGCTAC GCGGATTAGCCTAAG TCCAAATATATTAAA AAC
                TGTAAAATCAGAGAG ACTCTGTAGACGTTG AGCTGACAGAACCAT TTCTGCCTACTCTAA AATCAAAAGAAGAAA TTG
                AATAAATATATGTCA GCCCGACGGCTGCCT TCAACTTAAAACGGA CTTGTGTTCTGAATT GGAGTTCATCATTAC ATG
                GCGACCGTGACAGTC GTCCAACGCTGGACG AATTGACCAAAGCTG GTGAAAACAAAGGAA CAAAGGAACACTGGA CTG
                GAAGAAGACTGGACT AATTAAATGGAACTG CAAAAACCAAGGAAA AATCTGAGTGAGTAG AGTTCTATTGAGTAT GGG
                CAAACACCGTGGCGG TTTGAAAACTAAGCT GAATAAACGTATAGC CCACGTAAGGTGGCT AATATACGGTCAGCA AAC
                GCCACCGGTTTGGTC GAAAGCTCTAAAGCT ACATGCAGAGCTAGA CCACTTGTTGCAATA TCAGCAAGAATTAAA GAC
                CCATAAGCTCGAGAA AACTCACTCAGATAA TATTAAAAATATACC CACAATTAATGAAGT TCCAAAATACCAGGC ATG
                TCCAGCACCAGCACC AGCATTAACAAAACC AAAGAAGTCCTGCCC CCCTGGCTGCGAAGG AATCTGGAGTCCCCA CTG
                CCTGGGGACTTGTGA GCGACCATCGACGTC TTCAGCGGCGAAGAA ATAGACAGCAGCGAG GGAGTGTCAGCGTGC CAC
                CCCCGGCGACGCCCA GCTGACACCTGATGA GCATCATCAACAGCA GAATATAATAATAAA TATATATAAATATAA AGT
                AAATATAAAATATAT ATAGATAAGAAAAAT TGTAAGAAATATTGT AAAACGGAGCATATA CTATTATGCCCTGTT AAC
                CCAATATGGCCCGTG AAGCCATAGCTAGAA TCAGGCAGGCAACAA TGTAAAATACAATTT TTTTTTACTCTTGCG AAC
                ATTGAAAGATTTTAT AAATAGATAATTCCA AACATAAATGTCTAT AGAGACAAATGAAAT AAGTAAAACTGAAAA TAA
                AAGTATATACAAAGG AAATTTTCTATTCTA TTCTCCAAAATATAA AATTAGTATACCCAA AATGGGTCTAATAGA CAC
                TAAAACTGTGGACTC TACAGCCAATGTAAT AAATAAAGTAGAAGT CCAAAATGCAGACTT GTTCTGGATAACCAT AAT
                ACTAATTGTAATTGC ATTAATTATGGTATC CAATGCATTAATAAA AATATACAAACTGCA TAACAAGTGTCTTAA GAA
                ACGATACCGTAGCAC TGCTAACGGTATAGA TAATATTTAAGGAAG ATCTTTAATAAAGTC AATTATGAATGAAAA TAT
                GAGAAAAATTATATG AAAAAAAAAAAATAA TAAATAAAAAAAAAA ATATAAAACGTAATA TTGAATTTATCTACG TTA
                AAAAAAAAAATATAT ACAAATGAATAAATT TGAAGTTATGAGTAT ACCACAGCATGGACT GGGAAAAGCTTGTTG ATC
                AGATAAAAGATCAAA ATGAAAATTTCAGAA AATCCTATAAGTGCT TAACGCAAAACAGAT CAACACAAGCTGTAA CAA
                TCAATAGGAATGCCC AAGTCTTGGTAAATA GTTATAATGAAATCA GAGAGTTGATCCAAC AAAATAGAAAGAATT TGG
                AACGCAAACAGTGTG CTAAGGCTTTGAACC TACTGGTGACATTAA GAGAAAAATTAATAT TTATAAAAAATAAAT TCA
                GTCTCCAGATAGAAA TTCCAACCATAGTAA ACACCCCACTAAGAA TAAATTTGAATGAAG ACAGCACTAACTCTG ACG
                AGGAAGATAGGACTA TAGTCAAGGAAGACA TTAAAGAGGAAGATC TTCACGATCTAACTA TACCAGCAAAATTAA TGC
                TGAA
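A minimal Python 3 sketch for reading a file like the one above into (header, sequence) pairs. The name `read_fasta` is an assumption for illustration. Because the sequence lines as displayed here contain spaces (most likely added by the forum when wrapping long lines), the sketch removes all whitespace inside sequence lines before joining them.

```python
def read_fasta(lines):
    """Return a list of (header, sequence) pairs from FASTA-style lines."""
    records = []
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Drop any whitespace inside the sequence line.
            chunks.append("".join(line.split()).upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


sample_lines = [">seq1", "CCAGT CCACC", "GATTA", ">seq2", "AGTCG"]
records = read_fasta(sample_lines)
print(records)
```

Each sequence then has its own length via `len(sequence)`, so nothing about the length needs to be fixed in advance.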


                • bvdet
                  Recognized Expert Specialist
                  • Oct 2006
                  • 2851

                  #23
                  Originally posted by aboxylica
                  What I need to do is this: for every sequence (from the sequence file) and every matrix (the matrices are also stored in a file), I should calculate the scores.
                  The output should be in the form: sequence header (the heading that follows the > symbol in the file), position in the sequence, log value.
                  For example:
                  matrix1:
                  >ref1
                  01,2.0012
                  02,3.0047
                  .
                  .
                  .
                  >ref2
                  01,3.0047
                  02,8.0067
                  matrix 2
                  >ref1
                  01,2.0012
                  02,3.0047
                  .
                  .
                  .
                  >ref2
                  01,3.0047
                  02,8.0067

                  and so on.

                  The problem is: for every sequence in the sequence file and for every position in that sequence, I am going to calculate the log value. Is it clear now?
                  Not to me. I can't seem to follow the logic in your code:
                  [code=Python]
                  def readfasta():
                      file1= open("chr011.py",'r')
                      file_content=file1.readlines()
                      first=1
                      list1=""
                      for line in file_content:
                          if line[0]==">":
                              if first==0:
                                  print "***********"
                                  list1+=sequence
                                  print "***********"
                              else:
                                  first=0
                              sequence=""
                              seq=""
                              for i in range(0,len(line)-1):
                                  seq+=line[i]
                          else:
                              for i in range(0,len(line)-1):
                                  sequence+=line[i]
                      list1+=sequence
                      return list1

                  p=readfasta()

                  res=1
                  part=""
                  q=len(p)
                  seqq=""

                  value={"A":0.3, "T":0.3,"C":0.2 ,"G":0.2}
                  for i in range(q-16):
                      part=p[i:i+16]
                      seqq=part
                      res=1
                      score=1
                      for j in range(16):
                          key=seqq[j]
                          res=res*datadict1[key]["%02d"%(j+1)]
                      #print res
                      for key in seqq:
                          score=score * value[key]
                      #print score,"*******************",res
                      log_ratio=log10(res/score)
                      print i,log_ratio
                  [/code]
                  I have modified function parseData() to include the header:
                  [code=Python]
                  def parseData(fn, dataset=1, key='>'):
                      '''
                      Read a formatted data file of alpha sequences
                      Return a list of sequences
                      The first element in the list is the header
                      '''
                      # initialize output list
                      dataList = []

                      # open file for reading
                      f = open(fn)

                      # skip to required data set
                      for _ in range(dataset):
                          try:
                              s = f.next()
                              while not s.startswith(key):
                                  s = f.next()
                          except StopIteration, e:
                              print 'We have reached the end of the file!'
                              f.close()
                              return False

                      # initialize output list
                      dataList = [s,]

                      for line in f:
                          if not line.startswith(key):
                              dataList.append(line.strip())
                          else:
                              break

                      f.close()
                      return dataList

                  dataSeq = parseData(fnSeq, dataset)
                  print dataSeq[0]
                  [/code]
                  Output:
                  >>> >Cp36_PRR|Drosophila melanogaster|Cp36|FBgn0000359|X:8324430..8324513

                  I have given you the code to parse the matrix data and sequence data so it can easily be manipulated. I don't understand the log calculation you want. Maybe someone smarter than me can figure it out.

                  Comment

                  • elbin
                    New Member
                    • Jul 2007
                    • 27

                    #24
                    Originally posted by aboxylica
                    Code:
                    for i in range(q-16):
                        part=d[i:i+16]
                        seqq=part
                        res=1
                        score=1
                        for j in range(16):
                            key=seqq[j]
                            res=res*datadict1[key]["%02d"%(j+1)]
                    This assumes that every subsequence you examine is 16 characters long and that the matrix you are using has 16 lines too. But not all of the matrices have 16 lines. So instead of hard-coding 16, take the window length from the matrix itself, with
                    Code:
                    len(datadict['A'])
                    for example.
                    And what do you mean by "integrate the length of the sequence"?

                    To bvdet: For the log see http://www.thescripts.com/forum/thread672978.html
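As a rough illustration of taking the window length from the matrix, here is a small Python 3 sketch. The helper name `iter_windows` and the toy matrix are made up for the example; it assumes the matrix is a dictionary keyed by base, with one entry per position, like `datadict` in the quoted code.

```python
def iter_windows(seq, matrix):
    """Yield (start, window) for each full-length window of seq."""
    width = len(matrix['A'])  # number of positions in this matrix
    # + 1 so that the last full window is included
    for start in range(len(seq) - width + 1):
        yield start, seq[start:start + width]


# toy matrix with three positions (values are placeholders)
short_matrix = {base: {'01': 0.25, '02': 0.25, '03': 0.25} for base in 'ACGT'}

for start, window in iter_windows('ATTAT', short_matrix):
    print(start, window)
```

A matrix with a different number of positions gives a different window length without any change to the loop.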

                    Comment

                    • aboxylica
                      New Member
                      • Jul 2007
                      • 111

                      #25
                      Yes, sixteen is not fixed; it is going to vary all through. These aspects confuse me :(
                      as to how my code should be.
                      What I exactly have to work with is this:
                      >seq1
                      atattatatat
                      >seq2
                      atatattatatata
                      >seq3
                      attattatatatatat
                      ...and so on...
                      weightmat1
                      po
                      values
                      weightmat2
                      values
                      weightmat3
                      values
                      Now,
                      I am actually calculating the log-odds ratio (you must know this, since you are into this field).
                      I calculate the log value for each position.
                      You told me previously how to do it for one sequence and one weight matrix. Now there are multiple matrices and multiple sequences, and I have to calculate the log-odds ratio for each position.
                      Am I clear now?
                      Waiting for your reply.
                      cheers

                      Comment

                      • elbin
                        New Member
                        • Jul 2007
                        • 27

                        #26
                        I think you already have all the needed code for this task, please make an effort and combine it, and you will get the result.
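To give an idea of what the combined version might look like, here is a Python 3 sketch that loops over every matrix and every sequence and builds the output in the "matrix name, >header, position,log value" layout asked for earlier in the thread. The names `log_odds` and `report` and the toy data are assumptions for illustration; the matrices are expected to hold normalised probabilities keyed by base and by two-digit position, and the sequences to be upper case.

```python
from math import log10

# background frequencies used throughout the thread
BACKGROUND = {'A': 0.3, 'T': 0.3, 'C': 0.2, 'G': 0.2}


def log_odds(window, matrix, background):
    """log10 odds of one window against the background."""
    return sum(log10(matrix[base]['%02d' % (j + 1)] / background[base])
               for j, base in enumerate(window))


def report(matrices, sequences, background=BACKGROUND):
    """Return output lines for every matrix/sequence combination."""
    lines = []
    for name, matrix in matrices:
        width = len(matrix['A'])  # window length follows the matrix
        lines.append(name)
        for header, seq in sequences:
            lines.append('>' + header)
            for start in range(len(seq) - width + 1):
                window = seq[start:start + width]
                score = log_odds(window, matrix, background)
                lines.append('%02d,%.4f' % (start + 1, score))
    return lines


# toy data: one uniform two-position matrix and two short sequences
uniform = {base: {'01': 0.25, '02': 0.25} for base in 'ACGT'}
matrices = [('matrix1', uniform)]
sequences = [('ref1', 'ATAT'), ('ref2', 'GGC')]

lines = report(matrices, sequences)
print('\n'.join(lines))
```

Summing the per-position logs gives the same number as taking the log of the two products, and it avoids very small intermediate values for long matrices.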

                        Comment

                        • aboxylica
                          New Member
                          • Jul 2007
                          • 111

                          #27
                          Okay, I'll try that. I am not very confident, though, but I'll try.
                          thanks!

                          Comment

                          • bvdet
                            Recognized Expert Specialist
                            • Oct 2006
                            • 2851

                            #28
                            See if this is what you need:[code=Python]if __name__ == '__main__':

                                value={"A":0.3, "T":0.3,"C":0.2,"G":0.2}

                                fnArray = 'arraydata.txt'
                                fnSeq = 'seqdata.txt'
                                dataset = 3
                                dataArray = parseArray(fnArray, dataset)
                                dataSeq = parseData(fnSeq, dataset)

                                seq = ''.join(dataSeq[1:])
                                subKeys = dataArray['A'].keys()
                                subKeys.sort()

                                i,j = divmod(len(seq), len(subKeys))
                                keys = subKeys*i + subKeys[:j]

                                print dataSeq[0],
                                outList = ['%s[%s]*%s = %0.4f' % (s, keys[i], s, dataArray[s][keys[i]]*value[s]) for i, s in enumerate(seq)]
                                print '\n'.join(outList)
                                print sum([float(s.split(' =')[1]) for s in outList])[/code]Output:
                            Code:
                            >>> >Cp36_PRR|Drosophila melanogaster|Cp36|FBgn0000359|X:8324430..8324513
                            T[01]*T = 0.0131
                            C[02]*C = 0.0015
                            T[03]*T = 0.0019
                            A[04]*A = 0.0017
                            G[05]*G = 0.0014
                            A[06]*A = 0.2515
                            G[07]*G = 0.0969
                            A[01]*A = 0.0624
                            T[02]*T = 0.0014
                            C[03]*C = 0.0755
                            T[04]*T = 0.2952
                            G[05]*G = 0.0014
                            G[06]*G = 0.0022
                            G[07]*G = 0.0969
                            C[01]*C = 0.0093
                            A[02]*A = 0.0016
                            C[03]*C = 0.0755
                            G[04]*G = 0.0010
                            A[05]*A = 0.0014
                            T[06]*T = 0.0424
                            G[07]*G = 0.0969
                            G[01]*G = 0.1403
                            C[02]*C = 0.0015
                            G[03]*G = 0.0011
                            A[04]*A = 0.0017
                            G[05]*G = 0.0014
                            A[06]*A = 0.2515
                            C[07]*C = 0.0054
                            A[01]*A = 0.0624
                            A[02]*A = 0.0016
                            A[03]*A = 0.1832
                            G[04]*G = 0.0010
                            A[05]*A = 0.0014
                            T[06]*T = 0.0424
                            G[07]*G = 0.0969
                            C[01]*C = 0.0093
                            G[02]*G = 0.1965
                            G[03]*G = 0.0011
                            C[04]*C = 0.0011
                            G[05]*G = 0.0014
                            C[06]*C = 0.0019
                            A[07]*A = 0.1154
                            A[01]*A = 0.0624
                            A[02]*A = 0.0016
                            A[03]*A = 0.1832
                            T[04]*T = 0.2952
                            C[05]*C = 0.0128
                            G[06]*G = 0.0022
                            G[07]*G = 0.0969
                            A[01]*A = 0.0624
                            A[02]*A = 0.0016
                            A[03]*A = 0.1832
                            T[04]*T = 0.2952
                            G[05]*G = 0.0014
                            G[06]*G = 0.0022
                            A[07]*A = 0.1154
                            G[01]*G = 0.1403
                            A[02]*A = 0.0016
                            T[03]*T = 0.0019
                            G[04]*G = 0.0010
                            G[05]*G = 0.0014
                            A[06]*A = 0.2515
                            T[07]*T = 0.0310
                            C[01]*C = 0.0093
                            A[02]*A = 0.0016
                            C[03]*C = 0.0755
                            G[04]*G = 0.0010
                            T[05]*T = 0.2773
                            A[06]*A = 0.2515
                            G[07]*G = 0.0969
                            C[01]*C = 0.0093
                            C[02]*C = 0.0015
                            G[03]*G = 0.0011
                            G[04]*G = 0.0010
                            C[05]*C = 0.0128
                            C[06]*C = 0.0019
                            A[07]*A = 0.1154
                            T[01]*T = 0.0131
                            G[02]*G = 0.1965
                            G[03]*G = 0.0011
                            C[04]*C = 0.0011
                            G[05]*G = 0.0014
                            G[06]*G = 0.0022
                            5.0655
                            >>> seq
                            'TCTAGAGATCTGGGCACGATGGCGAGACAAAGATGCGGCGCAAAATCGGAAATGGAGATGGATCACGTAGCCGGCCATGGCGG'

                            Comment

                            • aboxylica
                              New Member
                              • Jul 2007
                              • 111

                              #29
                              Okay. One thing I am doubtful about: what does dataset refer to?
                              And in the calculation in the last code you sent me, is it something like
                              A[01]*A, which means you are multiplying the normalised value of A at position one and dividing it by the standard A value? Please tell me. And which statement of the code does that?
                              What I should be doing is this:
                              if I have a sequence like
                              >header
                              ATTTATTATATATATATTATTATAATTAAATAT
                              then, using the matrix,
                              calculate A[01]*T[02]*T[03]*... divided by the standard values, which are
                              A=0.3, T=0.3, C=0.2, G=0.2.
                              So it should be done like (A[01]*T[02]*T[03]*...)/(0.3*0.3*0.3*...)
                              for the sequence,
                              then take the log of this value. Then move to the next window of the sequence,
                              TTTATTATATATATATTATTATAATTAAATAT (I am just leaving out the A), and calculate the same way with T in the first position.
                              I have to do it this way for all the sequences.
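Written out as a formula, the calculation described in this post might look as follows. The symbols are notation introduced here, not from the thread: $s$ is the sequence, $w$ the number of positions in the matrix, $p_j(b)$ the normalised matrix value of base $b$ at position $j$, and $q(b)$ the standard (background) value of base $b$, that is 0.3 for A and T and 0.2 for C and G.

```latex
S_i \;=\; \log_{10}
      \frac{\prod_{j=1}^{w} p_j\!\left(s_{i+j-1}\right)}
           {\prod_{j=1}^{w} q\!\left(s_{i+j-1}\right)}
    \;=\; \sum_{j=1}^{w} \log_{10}
      \frac{p_j\!\left(s_{i+j-1}\right)}{q\!\left(s_{i+j-1}\right)},
\qquad i = 1, \dots, \operatorname{len}(s) - w + 1 .
```

Moving to the next window corresponds to increasing $i$ by one, so a sequence of length $n$ gives $n - w + 1$ values per matrix.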

                              Comment

                              • aboxylica
                                New Member
                                • Jul 2007
                                • 111

                                #30
                                Code:
                                from math import *
                                import random
                                f=open("deeps1.txt","r")
                                line=f.next()
                                while not line.startswith('PO'):
                                    line=f.next()
                                 
                                headerlist=line.strip().split()[1:]
                                linelist=[]
                                 
                                 
                                line=f.next().strip()
                                while not line.startswith('/'):
                                    if line != '':
                                        linelist.append(line.strip().split())
                                    line=f.next().strip()
                                    
                                keys=[i[0] for i in linelist]
                                values=[[float(s) for s in item] for item in [j[1:] for j in linelist]]
                                 
                                array={}
                                linedict=dict(zip(keys,values))
                                keys = linedict.keys()
                                keys.sort()
                                for key in keys:
                                    array=[key,linedict[key]]
                                 
                                datadict={}
                                datadict1={}
                                for i,item in enumerate(headerlist):
                                    datadict[item]={}
                                    for key_ in linedict:
                                        datadict[item][key_]=linedict[key_][i]
                                        
                                 
                                for keymain in datadict:
                                    for keysub in datadict[keymain]:
                                        datadict[keymain][keysub]+=1.0
                                 
                                # note: copy() is shallow, so datadict1 shares its inner dicts with datadict
                                datadict1=datadict.copy()
                                for keymain in datadict:
                                    for keysub in datadict[keymain]:
                                        datadict1[keymain][keysub]=datadict[keymain][keysub]/(sum(values[int(keysub)-1])+4)
                                   
                                 
                                def random_seq(nchars,insertat,astring):
                                    seq=""
                                    for i in range(nchars):
                                      if i== insertat:
                                          seq+=astring
                                      ch=random.choice(("ATGC"))
                                      seq+=ch
                                    print seq
                                    return seq
                                 
                                thestring="CGTCAAGTTCAAGTGCAAAA"
                                count=50-len(thestring)
                                p=random_seq(count,15,thestring)
                                file=open("temp.txt",'w')
                                file.write(str(p))
                                file.close()
                                 
                                
                                
                                 
                                res=1
                                part=""
                                q=len(p)
                                seqq=""
                                 
                                value={"A":0.3,"T":0.3,"C":0.2,"G":0.2}
                                for i in range(q-16+1):  # +1 so the final 16-character window is scored too
                                    part=p[i:i+16]
                                    seqq=part
                                    res=1
                                    score=1
                                    for j in range(16):
                                        key=seqq[j]
                                        res=res*datadict1[key]["%02d"%(j+1)]
                                        #print res
                                    for key in seqq:
                                        score=score * value[key]
                                    #print score,"*******************",res
                                    log_ratio=log10(res/score)
                                    print i,log_ratio
                                This is the code that works: it calculates the scores for a single sequence and a single matrix (containing 16 positions). I want to do it for many sequences and many matrices. I have shown what my sequences and matrices look like; I just need to generalize it. Am I clearer now?
                                Waiting for your reply.
                                cheers!
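One possible first step towards generalizing the matrix part is to read every matrix in the file instead of only the first. The Python 3 sketch below is an assumption-laden illustration: the names `parse_matrices` and `normalise` are hypothetical, and the file layout is inferred from the code above (a header line starting with `PO` that lists the bases, one row of counts per position, and a line starting with `/` closing each matrix). It applies the same pseudocount of 1 per base as the posted code.

```python
def normalise(bases, rows):
    """Add a pseudocount of 1 to each count and divide by the row total."""
    matrix = {base: {} for base in bases}
    for pos, counts in rows:
        total = sum(counts) + len(counts)  # e.g. sum + 4 for A, C, G, T
        for base, count in zip(bases, counts):
            matrix[base][pos] = (count + 1.0) / total
    return matrix


def parse_matrices(lines):
    """Return a list of matrices, each mapping base -> {position: probability}."""
    matrices, bases, rows = [], None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith('PO'):
            # header row: start a new matrix
            bases, rows = line.split()[1:], []
        elif line.startswith('/') and bases is not None:
            # terminator: finish the current matrix
            matrices.append(normalise(bases, rows))
            bases = None
        elif bases is not None and line:
            fields = line.split()
            rows.append((fields[0], [float(x) for x in fields[1:]]))
    return matrices


# made-up sample with two matrices of different lengths
sample = """PO A C G T
01 8 0 0 0
02 0 0 0 8
//
PO A C G T
01 2 2 2 2
//
""".splitlines()

mats = parse_matrices(sample)
print(len(mats), [len(m['A']) for m in mats])
```

Each matrix in the returned list can then be scored against every sequence, with the window length taken from `len(matrix['A'])`.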

                                Comment
