How to find an overlap between two sequences and make the function return the overlap

  • AnneTanne
    New Member
    • Jan 2013
    • 4

    I need to write a function that determines the overlap between two sequences and returns it.
    The overlap is a contiguous sequence that is at the left end of the first sequence and at the right end of the second sequence.

    My sequences are:
    s1 = "CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC"
    s2 = "GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC"

    My function should be defined as def getOverlap(left, right)

    and should return 'GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC'
    if s1 is the left sequence and s2 the right one.
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
    Have you developed an algorithm? Have you tried anything yet?

    • AnneTanne
      New Member
      • Jan 2013
      • 4

      #3
      I have tried this:
      Code:
      left = "CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC"
      right = "GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC"
      
      
      def getOverlap(left,right):
          if left == right[::-1]:
              return ""
          else:
              for i in left:
                  for i in right:
                      if left[len(left)-i]==(right[::-1])[len(right)-i]:
                          if False:
                              continue
                          return right[:len(left)-i]



      But something is very wrong :)
      Last edited by bvdet; Jan 2 '13, 09:12 PM. Reason: Please use code tags when posting code. [code]....[/code]

      • AnneTanne
        New Member
        • Jan 2013
        • 4

        #4
        When I swap s1 and s2, the answer is supposed to be 'C', but I get the same answer as before.

        • bvdet
          Recognized Expert Specialist
          • Oct 2006
          • 2851

          #5
          It appears to me that you have the sequences switched. The overlap you describe is at the left end of s2 and the right end of s1. You can use the string method find to locate the overlap.

          A possible algorithm:
          1. Initialize an empty string and assign it to an identifier, let's say "X"
          2. Iterate over the left string, character by character
          3. Assign "X" + the next character to an identifier, let's say "temp"
          4. If the current value of "temp" is found in the second string, assign the value of "temp" to "X"
          5. If the current value of "temp" is not found in the second string, return "X"
          6. If the left string is exhausted without a mismatch, return "X"

          This assumes the overlap will always start at the beginning of the "left" string.

          • bvdet
            Recognized Expert Specialist
            • Oct 2006
            • 2851

            #6
            To add to the qualification above: the algorithm I described does not require the overlap to be at the right end of the "right" string.
            Code:
            def getOverlap(left, right):
                # Grow a prefix of "left" for as long as it occurs somewhere in "right".
                overlap = ""
                for c in left:
                    temp = overlap + c
                    if right.find(temp) >= 0:
                        overlap = temp
                    else:
                        return overlap
                # All of "left" was found in "right".
                return overlap

            s1 = "GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC"
            s2 = "CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC"

            print(getOverlap(s1, s2))
            print(getOverlap(s2, s1))
            Produces:
            Code:
            >>> GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC
            CG
            >>>
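
To illustrate the qualification above, here is a small sketch using made-up strings that are not from the original question. The helper name longest_prefix_in is hypothetical; it follows the same prefix-growing idea, written with the in operator instead of find. It shows that the returned text can sit in the middle of the second string, which is why the second result above is 'CG' and not 'C'.

```python
def longest_prefix_in(left, right):
    """Return the longest prefix of `left` that occurs anywhere in `right`."""
    overlap = ""
    for c in left:
        temp = overlap + c
        if temp in right:
            overlap = temp
        else:
            break
    return overlap

# Made-up strings: "ABC" sits in the middle of the second string, not at its end.
print(longest_prefix_in("ABCD", "xxABCxx"))  # prints ABC
# The whole first string is found, so all of it is returned.
print(longest_prefix_in("ABCD", "xxABCD"))   # prints ABCD
```

If the match has to be anchored at the right end of the second string, a comparison against the end of that string is needed instead of a search anywhere in it.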

            • bvdet
              Recognized Expert Specialist
              • Oct 2006
              • 2851

              #7
              You should use the slice operator to determine the overlap, something like this:
              Code:
              s1 = "GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC"
              s2 = "CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC"
              
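              def getOverlap(left, right):
                  # The whole "left" string may already be the end of "right".
                  if left == right[-len(left):]:
                      return left
                  # Otherwise drop one character at a time from the end of "left"
                  # and compare what remains with the end of "right".
                  i = -1
                  while True:
                      temp = left[0:i]
                      if temp == right[-len(temp):]:
                          return temp
                      elif temp == "":
                          return ""
                      i -= 1

              print(getOverlap(s1, s2))
              print(getOverlap(s2, s1))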
              Output:
              Code:
              >>> GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC
              C
              >>>
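
The same slice idea can probably be written more compactly with a for loop and str.endswith. This is only a sketch, not the code from the post above: the name get_overlap is my own, and it assumes the overlap is a prefix of the first argument and a suffix of the second. It tries the longest candidate first and shrinks by one character each time.

```python
def get_overlap(left, right):
    """Return the longest prefix of `left` that is also a suffix of `right`."""
    # The overlap cannot be longer than the shorter of the two strings.
    for size in range(min(len(left), len(right)), 0, -1):
        candidate = left[:size]
        if right.endswith(candidate):
            return candidate
    # No non-empty prefix of `left` ends `right`.
    return ""

s1 = "GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC"
s2 = "CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC"

print(get_overlap(s1, s2))  # prints GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC
print(get_overlap(s2, s1))  # prints C
```

Starting from the longest candidate means the first match found is the longest overlap, so no bookkeeping of a best result is needed.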
