How to replace multiple integers at once

  • lel7lel7
    New Member
    • Mar 2010
    • 12

    How to replace multiple integers at once

    Hi, I am new to Python and having some problems...

    I am trying to figure out how to change multiple characters at once so I can produce the complement of this DNA strand...
    >>> dna = """tgaattctatgaatggactgtccccaaagaagtaggacccactaatgcagatcctgga
    tccctagctaagatgtattattctgctgtgaattcgatcccactaaagat"""
    (i.e. acttaagatactt...)

    I have learnt how to remove the \n character using
    >>> dna = replace(dna, '\n', "")
    but I cannot figure out how to write a script that will change t to a, a to t, g to c and c to g.

    If I change them individually
    >>> dna = replace(dna, 'a', "t")
    then all the a's do become t's, but then the true t's cannot be distinguished from them. I then thought of changing them to w, x, y, z first and then changing those to the complement, but that also takes too much time.

    Is there some way to write
    >>> dna = replace(dna, 'a', "t") and replace(dna, 't', "a")...and so on, so it will change all 4 at once?

    Any help would be greatly appreciated :)
    Thanks, Lel
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
    You meant "characters" instead of "integers"? Define a conversion dictionary, then use the string join() method and a list comprehension to convert each letter in succession. A string is a type of sequence, so we can loop over it one character at a time. We can also skip the newline and any other invalid character.
    Code:
    dd = {"t": "a", "a": "t", "g": "c", "c": "g"}
    dna = """
    tgaattctatgaatggactgtccccaaagaagtaggacccactaatgcagatcctggatccc
    tagctaagatgtattattctgctgtgaattcgatcccactaaagat
    """
    dnacomp = "".join([dd[s] for s in dna.strip() if s in "atcg"])
    print dnacomp
    The output:
    Code:
    >>> acttaagatacttacctgacaggggtttcttcatcctgggtgattacgtctaggacctagggatcgattctacataataagacgacacttaagctagggtgatttcta
    >>>
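For anyone reading this on Python 3, here is a minimal sketch of the same dictionary-plus-join idea wrapped in a function. The names `COMPLEMENT` and `complement` are made up for illustration and are not part of the code above; `print` is called as a function because that is what Python 3 requires.

```python
# Sketch for Python 3. COMPLEMENT and complement() are hypothetical names.
COMPLEMENT = {"t": "a", "a": "t", "g": "c", "c": "g"}

def complement(dna):
    """Return the complement of dna, skipping anything that is not a, t, g or c."""
    # Looping over a string yields one character at a time; characters that
    # are not keys of the dictionary (newlines, spaces, ...) are dropped.
    return "".join(COMPLEMENT[base] for base in dna if base in COMPLEMENT)

dna = """
tgaattctatgaatggactgtccccaaagaagtaggacccactaatgcagatcctggatccc
tagctaagatgtattattctgctgtgaattcgatcccactaaagat
"""
dnacomp = complement(dna)
print(dnacomp)
```

Because the filter tests membership in the dictionary itself, there is no separate "atcg" string to keep in sync with the keys. Complementing twice should give back the original sequence, which is a handy sanity check.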

    • lel7lel7
      New Member
      • Mar 2010
      • 12

      #3
      OK thanks, I don't really understand the dnacomp command, but I'll look back at it when I'm further through the training manual.
      As a simple way, I've figured out that it can be done using

      >>> replace(replace(replace(replace(dna, 'a', "T"), 't', "A"), 'g', "C"), 'c', "G")

      but I'm assuming this format will not be sufficient when I'm trying to write harder scripts

      Thanks again :)

      • bvdet
        Recognized Expert Specialist
        • Oct 2006
        • 2851

        #4
        It is not a command, but an assignment. I assigned a str object to an identifier named dnacomp using string method join(), a dictionary, and a list comprehension. Here's another way to accomplish the same thing:
        Code:
        dd = {"t": "a", "a": "t", "g": "c", "c": "g"}
        temp = []
        for s in dna.strip():
            if s in "atcg":
                temp.append(dd[s])
        dnacomp = "".join(temp)
        This uses the str method replace(), similar to your code.
        Code:
        dnacomp = dna.replace('a', "T").replace('t', "A").replace('g', "C").replace('c', "G").replace("\n", "").lower()
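To see why the upper-case intermediate letters matter in the chained replace(), here is a small sketch on a short made-up sequence (the first seven letters of the strand), written for Python 3. It assumes the input is all lower case; if the input already contained upper-case letters, the trick would not be safe.

```python
dna = "tgaattc"

# Naive chain: once every "a" has become "t", the second call converts them
# together with the original "t"s, so the a/t distinction is lost.
naive = dna.replace("a", "t").replace("t", "a")
print(naive)

# Upper-case placeholders keep already-converted letters out of reach of the
# later calls; lower() restores the case at the very end.
safe = (dna.replace("a", "T").replace("t", "A")
           .replace("g", "C").replace("c", "G")
           .lower())
print(safe)
```

The first result contains no "t" at all, while the second is the expected complement of the sample.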

        • lel7lel7
          New Member
          • Mar 2010
          • 12

          #5
          Basic python question

          OK, I'm still learning and this is all making sense except one thing...
          Above you write "for s in dna.strip()...". I am trying to understand what 's' represents. Is 's' the dictionary? And does it always have to be this character, or can I assign any random one?

          In other words, do 's' and (as below) 'l', 'k' and 'i' have their own function?

          The reason I ask is because I am trying to understand the function below, but I get the following error...
          Code:
          >>> def digests(enzymes):
          	digests = []
          	for i in range(len(enzymes)):
          		for k in range (i+l, len(enzymes)):
          			digests.append( [enzymes[i], enzymes[k]] )
          		return digests
          Traceback (most recent call last):
            File "<pyshell#18>", line 4, in digests
              for k in range (i+l, len(enzymes)):
          NameError: global name 'l' is not defined
          Last edited by bvdet; May 11 '10, 07:23 AM. Reason: Add code tags [code].........[/code]

          • bvdet
            Recognized Expert Specialist
            • Oct 2006
            • 2851

            #6
            "s" represents the current item of the sequence dna.strip() in the for loop. Maybe this example will clear it up:
            Code:
            >>> for s in "string":
            ... 	print s
            ... 	
            s
            t
            r
            i
            n
            g
            >>> s
            'g'
            >>> for item in [1,2,3,4,5]:
            ... 	print item
            ... 	
            1
            2
            3
            4
            5
            >>> item
            5
            >>>
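A side note on the traceback in post #5, offered as a best guess rather than a confirmed fix: loop variable names such as s, i and k are arbitrary, but `l` (lower-case L) is never assigned anywhere in the function, which is exactly what the NameError reports. The sketch below assumes the manual intended the digit `1` there, and also assumes the `return` was meant to run after both loops finish rather than inside the outer one. The local list is renamed to `pairs` (a made-up name) so it does not share a name with the function, and the enzyme names are sample input only.

```python
def digests(enzymes):
    """Return every unordered pair of enzymes as a two-item list."""
    pairs = []
    for i in range(len(enzymes)):
        # i + 1 uses the digit one, so each enzyme is paired only with
        # the enzymes that come after it in the list.
        for k in range(i + 1, len(enzymes)):
            pairs.append([enzymes[i], enzymes[k]])
    # Returning here, after both loops, lets every value of i be processed.
    return pairs

print(digests(["EcoRI", "BamHI", "HindIII"]))
```

With three enzymes this gives three pairs; with the `return` left inside the outer loop, only the pairs starting with the first enzyme would come back.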

            • lel7lel7
              New Member
              • Mar 2010
              • 12

              #7
              Ahh, I see now. Thank you again!!
              :)

              • Glenton
                Recognized Expert Contributor
                • Nov 2008
                • 391

                #8
                Although bvdet's solution is brilliant, as always, it is relatively standard in coding to swap things by having an in-between step like your upper-case letters. E.g. to swap a and b, you go x=a, a=b, b=x.

                In python this can be done better though with:
                Code:
                a,b=b,a
                e.g.
                Code:
                In [21]: a=1
                
                In [22]: b=2
                
                In [23]: a,b=b,a
                
                In [24]: a
                Out[24]: 2
                
                In [25]: b
                Out[25]: 1

                • lel7lel7
                  New Member
                  • Mar 2010
                  • 12

                  #9
                  Hi Glenton,
                  I understand what you're saying but am having trouble getting it to work...

                  Code:
                  dna = "AAATTTCCCGGG"
                  a = "A"
                  b = "T"
                  c = "G"
                  d = "C"
                  dd = a,b=b,a
                  ddd = c,d=d,c
                  seq = []
                  for x in dna.strip():
                  if x in "ATGC":
                  seq.append(dd[x]).append(ddd[x])
                  return dna
                  dnacomp = "".join(seq)
                  Obviously this doesn't work... I've tried a few other ways also but cannot figure it out.

                  Thanks for your help :)

                  • Glenton
                    Recognized Expert Contributor
                    • Nov 2008
                    • 391

                    #10
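                    Okay, so what you want to do is change t to a, a to t, g to c and c to g. The in-between step applies to the characters inside the string, using a placeholder character ("x" here) that does not occur in the DNA.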

                    Code:
                    In [1]: dna="tgactgacgatgctagct"
                    
                    In [2]: dna=dna.replace("t","x")
                    
                    In [3]: dna=dna.replace("a","t")
                    
                    In [4]: dna=dna.replace("x","a")
                    
                    In [5]: dna
                    Out[5]: 'agtcagtcgtagcatgca'
                    and swapping g and c is similar.
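Putting both swaps together, a minimal sketch for Python 3 might look like the following. `swap_chars` is a made-up helper name, and the check that the placeholder is unused is an extra safeguard that is not in the session above. Note that `a, b = b, a` only swaps which values two variable names refer to; it does not touch the characters inside a string, which is why replace() is used here.

```python
def swap_chars(text, first, second, placeholder="x"):
    """Swap two characters in text via a placeholder that must not occur in text."""
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text")
    # first -> placeholder, second -> first, placeholder -> second
    return (text.replace(first, placeholder)
                .replace(second, first)
                .replace(placeholder, second))

dna = "tgactgacgatgctagct"
dna = swap_chars(dna, "t", "a")   # same three steps as In [2] to In [4]
dna = swap_chars(dna, "g", "c")   # the "similar" g/c swap
print(dna)
```

After the first call the string matches Out[5] above; the second call completes the complement.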

                    • woooee
                      New Member
                      • Mar 2008
                      • 43

                      #11
                      but i cannot figure out how to write a script that will change t to a, a to t, g to c and c to g.
                      Perhaps using a dictionary and looking at each character individually will be easier to understand. Here is a link to an online book's description of dictionaries: http://www.greenteapress.com/thinkpy...l/book012.html
                      Code:
                      dna = """tgaattctatgaatggactgtccccaaagaagtaggacccactaatgcagatcctgga
                      tccctagctaagatgtattattctgctgtgaattcgatcccactaaagat"""
                      
                      changes_dict = {"t":"a", "a":"t", "g":"c", "c":"g" }
                      
                      ##   it is more efficient to append to a list and join()
                      ##   rather than concatenating a string each time
                      new_dna = []
                      for ch in dna:
                          if ch in changes_dict:
                              new_dna.append(changes_dict[ch])  ## new character
                          else:
                              new_dna.append(ch)     ## original character
                      new_dna_str = "".join(new_dna)
                      print new_dna_str
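As a different technique from the loop above (not something suggested earlier in this thread), Python 3 strings also have built-in `str.maketrans()` and `translate()` methods that map all the characters in a single pass. Like the loop, translate() leaves any character without a mapping (such as the newline) unchanged. This is only a sketch on a short sample sequence:

```python
# Build a translation table: a->t, t->a, g->c, c->g (Python 3 only).
table = str.maketrans("atgc", "tacg")

dna = """tgaattc
tatga"""

# Every character is looked up once in the table, so the a/t and g/c swaps
# cannot interfere with each other and no placeholder is needed.
new_dna_str = dna.translate(table)
print(new_dna_str)
```

Since each character is replaced exactly once, there is no risk of a converted letter being converted back.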