need help with substring alignment

  • CCGG26
    New Member
    • Sep 2010
    • 13

    need help with substring alignment

    Situation: Given two sequences, where one is a substring of the other, we define a substring alignment by matching the substring against the longer sequence and placing gaps everywhere else. For example, if the input is ACCTGTAGG and TGT, then the substring alignment is

    ACCTGTAGG
    ---TGT---

    I need help designing a program that prints the substring alignment of two unaligned sequences. If no substring alignment exists, the program should print "No alignment found". I'm puzzled by this; please help.
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
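    The string method find() is the answer. This function marks every non-overlapping occurrence of target in source:
    Code:
    def alignment(source, target):
        # Locate the first occurrence; find() returns -1 when there is none.
        i = source.find(target)
        if i < 0:
            return "No alignment"
        # Leading gaps, then the first match.
        results = ["_"*i,target]
        j = i+len(target)    # index just past the last match
        while True:
            # Search for the next occurrence, starting after the previous one.
            i = source.find(target, j)
            if i < 0:
                # No more matches: pad the remainder with gaps and finish.
                results.append(len(source[j:])*"_")
                return "".join(results)
            results.extend([(i-j)*"_", target])
            j = i+len(target)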
    Some interaction:
    Code:
    >>> s1 = "ACCTGTAGGTGTACCGTGT"
    >>> print(alignment(s1, "TGTACCGT"))
    _________TGTACCGT__
    >>> print(alignment(s1, "CTGTA"))
    __CTGTA____________
    >>> print(alignment(s1, "TGT"))
    ___TGT___TGT____TGT
    >>> print(alignment(s1, "xxx"))
    No alignment
    >>>
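The function above uses "_" for gaps, returns "No alignment", and marks every occurrence. For the exact output asked for in the first post (both sequences printed, "-" for gaps, "No alignment found" otherwise), a minimal Python 3 sketch could look like the following. The name substring_alignment is hypothetical, and treating an empty sequence as "no alignment" is an assumption, since the question does not cover that case.

```python
def substring_alignment(seq_a, seq_b):
    """Return the two-line alignment of the shorter sequence under the longer one."""
    # Treat the longer input as the reference, whichever order the two arrive in.
    if len(seq_a) >= len(seq_b):
        longer, shorter = seq_a, seq_b
    else:
        longer, shorter = seq_b, seq_a

    start = longer.find(shorter)       # -1 when shorter does not occur in longer
    if start < 0 or not shorter:       # assumption: an empty sequence counts as no alignment
        return "No alignment found"

    gaps_after = len(longer) - start - len(shorter)
    return longer + "\n" + "-" * start + shorter + "-" * gaps_after


print(substring_alignment("ACCTGTAGG", "TGT"))
# ACCTGTAGG
# ---TGT---
```

Only the first occurrence is aligned here; the loop in the function above is the way to go if every occurrence should be marked.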


    • CCGG26
      New Member
      • Sep 2010
      • 13

      #3
      OK, is it possible to make this read from a .txt file and work for any substring alignment?


      • bvdet
        Recognized Expert Specialist
        • Oct 2006
        • 2851

        #4
        What is posted is a function that receives arguments. You can read a file line by line, passing each line to the function as source and the substring to align as target. I will leave that exercise to you.


        • CCGG26
          New Member
          • Sep 2010
          • 13

          #5
          OK, I inserted this into the program, but nothing printed and there wasn't an error message.

          Code:
          filename = input("Please enter a filename:")
          with open(filename, "r") as myfile:
              data = myfile.readlines()
          for i in range(len(data)):
              data[i] = data[i].rstrip("\n")


          • bb64
            New Member
            • Sep 2010
            • 7

            #6
            If you're given ----------TGT---- as the substring, is there a way to tell the program to read only the letters TGT and ignore the "-"?


            • CCGG26
              New Member
              • Sep 2010
              • 13

              #7
              In this situation the ---TGT--- needs to be read as is. We cannot ignore the dashes or the program will not work.


              • bvdet
                Recognized Expert Specialist
                • Oct 2006
                • 2851

                #8
                You can iterate on data like this:
                Code:
                for line in data:
                    print(alignment(line.strip(), "TGT"))
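
As posted, the snippet in #5 only fills the list data; it never calls the function or prints anything, which would explain the silent run. A minimal Python 3 sketch that joins the reading step to the loop might look like this, assuming one sequence per line and a fixed target. The names first_alignment and align_lines are hypothetical, not code from this thread, and the helper marks only the first occurrence, with "-" for gaps. io.StringIO stands in for a real file so the example runs on its own.

```python
import io


def first_alignment(source, target):
    # Hypothetical helper: first occurrence only, "-" as the gap character.
    i = source.find(target)
    if i < 0:
        return "No alignment found"
    return "-" * i + target + "-" * (len(source) - i - len(target))


def align_lines(lines, target):
    """Yield one alignment string per non-blank line."""
    for line in lines:
        source = line.strip()          # drop the trailing newline and stray spaces
        if source:                     # skip blank lines
            yield first_alignment(source, target)


# With a file on disk, replace the StringIO with: open(filename) inside a with block.
handle = io.StringIO("ACCTGTAGG\nGGGG\n\nTTGT\n")
for row in align_lines(handle, "TGT"):
    print(row)
# ---TGT---
# No alignment found
# -TGT
```

Iterating over the file object directly avoids building the whole list with readlines(), which matters only for large files.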


                • bb64
                  New Member
                  • Sep 2010
                  • 7

                  #9
                  CCGG26, did you figure out this problem yet? I'm still having trouble coming up with an answer for this one.


                  • bvdet
                    Recognized Expert Specialist
                    • Oct 2006
                    • 2851

                    #10
                    Would the str method strip() do what you need?
                    Code:
                    >>> "----------TGT----".strip("-")
                    'TGT'
                    >>>
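
If the position of the letters matters as well as the letters themselves, the number of leading dashes gives the offset into the longer sequence. A small Python 3 sketch, where parse_gapped is a hypothetical name and the row is assumed to have gaps only at its two ends:

```python
def parse_gapped(row, gap="-"):
    """Split a gapped row such as '---TGT---' into (offset, letters)."""
    offset = len(row) - len(row.lstrip(gap))   # count of leading gap characters
    letters = row.strip(gap)                   # removes gaps from both ends only
    return offset, letters


offset, letters = parse_gapped("---TGT---")
print(offset, letters)
# 3 TGT

# The offset indexes straight into the longer sequence.
print("ACCTGTAGG"[offset:offset + len(letters)] == letters)
# True
```

Note that strip("-") leaves any dashes between letters in place; replace("-", "") would remove those too, at the cost of losing the position information.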

