Print column numbers that are 100% conserved

  • CCGG26
    New Member
    • Sep 2010
    • 13

    How can I write a Python program that prints the column numbers in a FASTA-format multiple alignment that are 100% conserved? I'm really having trouble getting a grip on this concept. For example, in the multiple alignment below:

    >human
    ACC
    >mouse
    ACC
    >cat
    TCC
    >dog
    ACA

    Column 2 is 100% conserved, but columns 1 and 3 are not.
    Last edited by bvdet; Nov 12 '10, 02:13 AM. Reason: Remove question from title
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
    What is the output supposed to look like? I don't know what a Fasta format multiple alignment is. :(

    • CCGG26
      New Member
      • Sep 2010
      • 13

      #3
      The FASTA-format multiple alignment is just a name, I believe; I do not think it is anything significant.

      • bvdet
        Recognized Expert Specialist
        • Oct 2006
        • 2851

        #4
        What is the output supposed to look like?

        • CCGG26
          New Member
          • Sep 2010
          • 13

          #5
          output: column 2 = 100%

          • bvdet
            Recognized Expert Specialist
            • Oct 2006
            • 2851

            #6
            What is column 2? 100% of what? Please be specific.

            • CCGG26
              New Member
              • Sep 2010
              • 13

              #7
              A 100% conserved column is one that has the exact same nucleotide in
              every sequence. For example, if the user enters 1 and the multiple
              alignment below is given as input

              >human
              ACC
              >mouse
              ACC
              >cat
              TCC
              >dog
              ACA

              then the output should be "No". But if the user enters 2 then the output
              should be "Yes".
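
The Yes/No behavior described above could be sketched roughly as follows. This is only an illustration, not a definitive solution: the helper names `parse_fasta` and `is_conserved` are made up for this example, and it assumes every sequence in the alignment has the same length and that the column number is 1-based, as in the question.

```python
def parse_fasta(text):
    """Return the sequences from FASTA-formatted text, in file order."""
    sequences = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                  # skip blank lines
        if line.startswith(">"):
            sequences.append("")      # a ">name" header starts a new record
        elif sequences:
            sequences[-1] += line     # a sequence may span several lines
    return sequences


def is_conserved(sequences, column):
    """True if the 1-based `column` holds the same character in every sequence."""
    return len({seq[column - 1] for seq in sequences}) == 1


alignment = """>human
ACC
>mouse
ACC
>cat
TCC
>dog
ACA"""

seqs = parse_fasta(alignment)
for col in (1, 2, 3):
    print(col, "Yes" if is_conserved(seqs, col) else "No")
```

For the example alignment this prints "No" for columns 1 and 3 and "Yes" for column 2. Reading the column number with `input()` instead of looping would give the interactive version.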

              • bvdet
                Recognized Expert Specialist
                • Oct 2006
                • 2851

                #8
                Following is a test script that shows how it can be done using set().
                Code:
                import random
                
                data = '''>human
                ACC
                >mouse
                ACC
                >cat
                TCC
                >dog
                ACA'''
                
                def conserved(col, seq):
                    # A column is conserved when every sequence has the same character there.
                    return len(set(item[col] for item in seq)) == 1
                
                # Lines alternate between ">name" headers and sequences; keep the sequences.
                dataList = data.split("\n")
                sequences = [dataList[i] for i in range(1, len(dataList), 2)]
                
                # Pick a column at random for the test (0-based index).
                column = random.choice([0, 1, 2])
                
                result = conserved(column, sequences)
                print("Column %s %s conserved" % (column, "IS" if result else "IS NOT"))
                The three columns are numbered 0, 1 and 2, which is consistent with list indexing.
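
To report every conserved column at once, as the original question asks, one possible sketch is to transpose the alignment with `zip()` and number the columns from 1. The function name `conserved_columns` is hypothetical, and the sketch assumes the sequences are already extracted and all the same length (`zip()` silently stops at the shortest one otherwise).

```python
def conserved_columns(sequences):
    """Return the 1-based numbers of columns whose characters are all identical."""
    return [
        number
        # zip(*sequences) yields one tuple per alignment column
        for number, column in enumerate(zip(*sequences), start=1)
        if len(set(column)) == 1
    ]


sequences = ["ACC", "ACC", "TCC", "ACA"]  # human, mouse, cat, dog
print(conserved_columns(sequences))  # [2]
```

Numbering from 1 matches the column numbers used in the question; dropping `start=1` would give the 0-based list indexes instead.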

                • dwblas
                  Recognized Expert Contributor
                  • May 2008
                  • 626

                  #9
                  Shouldn't it be true for 1 or 2, and false for 3 or 4?
