How to get the number of A's in each column?

  • CCGG26
    New Member
    • Sep 2010
    • 13

    How to get the number of A's in each column?

    Hey guys, how can you write a program that prints the number of A's in each column of a multiple sequence alignment? For example, for the multiple alignment below:

    >human
    ACCT
    >mouse
    ACCT
    >cat
    TCCT
    >dog
    ACAT

    the output should be

    3 0 1 0
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
    We are not here to write programs for you. At least show us some effort on your part to write the code.


    • CCGG26
      New Member
      • Sep 2010
      • 13

      #3
      Code:
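      # Read the alignment file; the with block closes the file automatically.
      with open("e:/dna14.txt", "r") as myfile:
          data = myfile.readlines()

      # Strip the trailing newline from every line (the escape is "\n", not "/n").
      for i in range(len(data)):
          data[i] = data[i].rstrip("\n")

      column_number = int(input("Please enter a column number: "))

      # Character of the first sequence (line index 1) in the chosen column.
      ch1 = data[1][column_number]
      Acount = 0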
      Last edited by bvdet; Nov 18 '10, 11:01 PM. Reason: Please use code tags when posting code. [code]....code goes here....[/code]


      • bvdet
        Recognized Expert Specialist
        • Oct 2006
        • 2851

        #4
        You can create a new list from data that contains every other list item (each item is a str) in data. It should look like this: ['ACCT', 'ACCT', 'TCCT', 'ACAT']

        Then you can use zip to create a list of tuples. Each tuple would contain the respective column.
        Code:
        >>> list(zip(*sequences))
        [('A', 'A', 'T', 'A'), ('C', 'C', 'C', 'C'), ('C', 'C', 'C', 'A'), ('T', 'T', 'T', 'T')]
        >>>
        Then:
        Code:
        >>> letter = 'A'
        >>> " ".join([str(item.count(letter)) for item in zip(*sequences)])
        '3 0 1 0'
        >>>
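
Putting the two steps above together, a minimal sketch might look like the following. The function name `count_letter_per_column` is hypothetical (it does not come from this thread), and the sketch assumes all sequences have the same length, because `zip` silently stops at the shortest one.

```python
def count_letter_per_column(sequences, letter="A"):
    """Return how many times `letter` occurs in each alignment column."""
    # zip(*sequences) yields one tuple per column, e.g. ('A', 'A', 'T', 'A').
    return [column.count(letter) for column in zip(*sequences)]


sequences = ["ACCT", "ACCT", "TCCT", "ACAT"]
counts = count_letter_per_column(sequences)
print(" ".join(str(n) for n in counts))  # prints: 3 0 1 0
```

Passing a different `letter` (for example "C") counts that base instead.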


        • CCGG26
          New Member
          • Sep 2010
          • 13

          #5
          OK, I'm having trouble understanding zip(*sequences). I've never used that before at my level of programming, and I was wondering if there is an alternative method.


          • bvdet
            Recognized Expert Specialist
            • Oct 2006
            • 2851

            #6
            It can also be accomplished with a list comprehension.
            Code:
            >>> sequences
            ['ACCT', 'ACCT', 'TCCT', 'ACAT', 'CAAT']
            >>> [[item[i] for item in sequences] for i in range(len(sequences[0]))]
            [['A', 'A', 'T', 'A', 'C'], ['C', 'C', 'C', 'C', 'A'], ['C', 'C', 'C', 'A', 'A'], ['T', 'T', 'T', 'T', 'T']]
            >>>
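
If list comprehensions are also unfamiliar, the same idea can be written with two plain for loops. This is only a sketch: the function name `count_a_with_loops` is made up for illustration, and it assumes every sequence is at least as long as the first one.

```python
def count_a_with_loops(sequences, letter="A"):
    """Count `letter` in each column using explicit nested loops."""
    counts = []
    # Outer loop: one pass per column index of the first sequence.
    for i in range(len(sequences[0])):
        total = 0
        # Inner loop: look at the same position in every sequence.
        for seq in sequences:
            if seq[i] == letter:
                total += 1
        counts.append(total)
    return counts


sequences = ["ACCT", "ACCT", "TCCT", "ACAT", "CAAT"]
print(count_a_with_loops(sequences))  # prints: [3, 1, 2, 0]
```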


            • CCGG26
              New Member
              • Sep 2010
              • 13

              #7
              Great. Does this approach work for any set of sequences, or just this specific problem? If the sequences were of any length, how would you combine them without having to type out each one?


              • bvdet
                Recognized Expert Specialist
                • Oct 2006
                • 2851

                #8
                You would not have to type in anything if you had a disk file to read from. The key is to parse the file as it is read. In this case I might do this:
                Code:
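                # file_name is a placeholder for the path to the alignment file.
                with open(file_name) as f:
                    # Header lines (">human", ...) sit at even indices, sequences at odd ones.
                    sequences = [line.strip() for i, line in enumerate(f) if i % 2 == 1]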
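
The every-other-line approach above relies on each sequence occupying exactly one line. As a hedged alternative, the sketch below keys on the ">" header marker instead, so a sequence wrapped over several lines is joined back together. The function name `read_fasta_sequences` and the `example` data are hypothetical and only for illustration.

```python
def read_fasta_sequences(lines):
    """Collect sequences from FASTA-style lines, joining wrapped lines."""
    sequences = []
    current = []  # pieces of the sequence currently being read
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header ends the previous sequence, if there was one.
            if current:
                sequences.append("".join(current))
                current = []
        else:
            current.append(line)
    if current:
        sequences.append("".join(current))
    return sequences


# The mouse sequence is deliberately wrapped over two lines.
example = [">human\n", "ACCT\n", ">mouse\n", "AC\n", "CT\n",
           ">cat\n", "TCCT\n", ">dog\n", "ACAT\n"]
print(read_fasta_sequences(example))  # prints: ['ACCT', 'ACCT', 'TCCT', 'ACAT']
```

Because the function only iterates over its argument, an open file object can be passed in place of the list.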
