How to get the number of A's in each column?

**bvdet** · Nov 18 '10, 07:13 PM

We are not here to write programs for you. At least show us some effort on your part to write the code.

**CCGG26** · Nov 18 '10, 09:42 PM

Code:

with open("e:/dna14.txt", "r") as myfile:
    data = myfile.readlines()  # the with block closes the file automatically

for i in range(len(data)):
    data[i] = data[i].rstrip("\n")  # strip the trailing newline

column_number = input("Please enter a column number: ")
column_number = int(column_number)

ch1 = (data[1])[column_number]
Acount = 0

**bvdet** · Nov 19 '10, 12:42 AM

You can create a new list from data that contains every other list item (each item is a str) in data. It should look like this: ['ACCT', 'ACCT', 'TCCT', 'ACAT']

Then you can use zip to create a list of tuples. Each tuple would contain the respective column.

Code:

>>> list(zip(*sequences))
[('A', 'A', 'T', 'A'), ('C', 'C', 'C', 'C'), ('C', 'C', 'C', 'A'), ('T', 'T', 'T', 'T')]
>>>

Then:

Code:

>>> letter = 'A'
>>> " ".join([str(list(item).count(letter)) for item in zip(*sequences)])
'3 0 1 0'
>>>
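For anyone unsure what the star does, here is a minimal sketch of the same idea as a script, assuming Python 3 and the four example sequences above. The function name `count_per_column` is made up for illustration.

```python
sequences = ['ACCT', 'ACCT', 'TCCT', 'ACAT']

# zip(*sequences) unpacks the list into separate arguments, so it is the
# same call as zip('ACCT', 'ACCT', 'TCCT', 'ACAT').
columns = list(zip(*sequences))

def count_per_column(seqs, letter):
    """Return how many times `letter` occurs in each column of `seqs`."""
    # Each column is a tuple, and tuples have a count() method.
    return [column.count(letter) for column in zip(*seqs)]

a_counts = count_per_column(sequences, 'A')
print(a_counts)  # [3, 0, 1, 0]
```

Note that zip stops at the shortest sequence, so this sketch assumes every sequence has the same length.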

**CCGG26** · Nov 19 '10, 01:45 AM

OK, I'm having trouble understanding zip(*sequences). I've never used that before at my level of programming, and I was wondering if there is an alternative method.

**bvdet** · Nov 19 '10, 01:36 PM

It can also be accomplished with a list comprehension.

Code:

>>> sequences
['ACCT', 'ACCT', 'TCCT', 'ACAT', 'CAAT']
>>> [[item[i] for item in sequences] for i in range(len(sequences[0]))]
[['A', 'A', 'T', 'A', 'C'], ['C', 'C', 'C', 'C', 'A'], ['C', 'C', 'C', 'A', 'A'], ['T', 'T', 'T', 'T', 'T']]
>>>
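If the nested comprehension is still hard to read, the same column walk can be written as plain loops. This is only a sketch, assuming every sequence has the same length as the first one; the variable names are illustrative.

```python
sequences = ['ACCT', 'ACCT', 'TCCT', 'ACAT', 'CAAT']

a_counts = []
for i in range(len(sequences[0])):  # one pass per column index
    count = 0
    for seq in sequences:           # look at column i of every sequence
        if seq[i] == 'A':
            count += 1
    a_counts.append(count)

print(a_counts)  # [3, 1, 2, 0]
```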

**CCGG26** · Nov 19 '10, 10:12 PM

Great. Can this work for any set of sequences, or just this specific problem? If the sequences were of any length, how would you combine them without having to type out each sequence?

**bvdet** · Nov 19 '10, 10:32 PM

You would not have to type in anything if you had a disk file to read from. The key is to parse the file as it is read. In this case I might do this:

Code:

with open(file_name) as f:
    # Keep the odd-indexed lines (1, 3, 5, ...), which hold the sequences
    sequences = [line.strip() for i, line in enumerate(f) if i % 2 == 1]
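Putting the pieces together, a rough end-to-end sketch might look like the following. It assumes the file alternates a label line with a sequence line, so the sequences sit on lines 1, 3, 5, ... counting from 0, as the filter above implies. The function names `read_sequences` and `count_letter_per_column` are hypothetical.

```python
def read_sequences(path):
    """Read every other line of the file, starting with the second line."""
    with open(path) as f:
        # Assumption: odd-indexed lines hold the sequences.
        return [line.strip() for i, line in enumerate(f) if i % 2 == 1]

def count_letter_per_column(sequences, letter="A"):
    """Count occurrences of `letter` in each column of the sequences."""
    # zip(*sequences) groups the characters column by column.
    return [column.count(letter) for column in zip(*sequences)]
```

With the file from the original post, the call would be something like `count_letter_per_column(read_sequences("e:/dna14.txt"))`. No sequences need to be typed in, and the number of sequences in the file does not matter.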
