Parsing a file, and matching numbers of a sequence (python)

**dwblas** · Apr 22 '12, 07:04 PM

We are looking for the gene on the bottom strand, meaning we count from 0 starting on the right.

You can start from either the left or the right, assuming you already know how to extract the "+" or "-" and the begin and end positions (if not, post back). Note that the smaller number always comes first in the slice [:], and that holds for negative numbers too: -20 goes before -10.

Code:

forward="AAAAAAAAAAXXXXXXXXXXTTTTTTTTTTGGGGGGGGGG"
print(forward[10:20])  ## prints the "X"s (counting from the left)

backward="AAAAAAAAAACCCCCCCCCCTTTTTTTTTTGGGGGGGGGGTTTTTTTTTTGGGGGGGGGGYYYYYYYYYYCCCCCCCCCC"
print(backward[-20:-10])  ## prints the "Y"s (counting from the right)
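
One thing to watch with negative slices on the bottom strand is a begin position of 0: `backward[-10:-0]` is the same as `backward[-10:0]` and comes back empty. Here is a minimal sketch that sidesteps this by converting right-based positions into ordinary left-based indexes. The function name `extract` and the "+"/"-" strand argument are my own placeholders, not anything taken from your file, and I am assuming 0-based positions with the end excluded.

```python
def extract(seq, strand, begin, end):
    """Return the piece between begin and end, counting from the left
    for "+" and from the right for "-" (hypothetical helper)."""
    if strand == "+":
        return seq[begin:end]
    # Bottom strand: position 0 is the last character, so mirror the
    # positions around the length; the smaller index still goes first.
    n = len(seq)
    return seq[n - end:n - begin]

forward = "AAAAAAAAAAXXXXXXXXXXTTTTTTTTTTGGGGGGGGGG"
backward = ("AAAAAAAAAACCCCCCCCCCTTTTTTTTTTGGGGGGGGGG"
            "TTTTTTTTTTGGGGGGGGGGYYYYYYYYYYCCCCCCCCCC")

print(extract(forward, "+", 10, 20))   # the "X"s
print(extract(backward, "-", 10, 20))  # the "Y"s, same as backward[-20:-10]
print(extract(backward, "-", 0, 10))   # the last ten "C"s
```

Note that this returns the characters in left-to-right order; it does not reverse or complement them, so add that step yourself if you need it.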

You can also split the string if you know that each sequence is always 10, otherwise you have to test for some break indicator.

Code:

backward="AAAAAAAAAACCCCCCCCCCTTTTTTTTTTGGGGGGGGGGTTTTTTTTTTGGGGGGGGGGAAAAAAAAAACCCCCCCCCC"

list_of_seq=[backward[start:start+10] for start in range(0, len(backward), 10)]
print(list_of_seq)

## start//10 turns a position into a list index; negative positions count from the right
for start in [10, -20, 30, -30, 70]:
    print(list_of_seq[start//10])
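
If the sequences are not all the same length, the fixed step of 10 won't work and you have to split on whatever marks the break. A rough sketch, assuming (and this is only a guess, since I haven't seen such a file) that the pieces are separated by a "|" character:

```python
# Hypothetical input: pieces of different lengths separated by "|"
raw = "AAAAAAAAAA|CCCCC|TTTTTTTTTTTTTTT|GG"

# Split on the break character; skip empty pieces left by a doubled or trailing "|"
list_of_seq = [piece for piece in raw.split("|") if piece]
print(list_of_seq)

# Indexing works as before, and negative indexes still count from the right
print(list_of_seq[1])
print(list_of_seq[-2])
```

Swap "|" for whatever indicator your data actually uses.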

**dwblas** · Apr 22 '12, 07:18 PM

fields = line.split(' \t')

This may give you an extra field because of the newline (\n), or retain the newline in the last field, which could also create problems. Instead, strip the whitespace first:
fields = line.strip().split()

Note that split() receives the return of strip(), and with no argument it splits on whitespace (any combination of spaces, tabs, or newlines).
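
To see the difference, here is a small sketch on a made-up line. The column layout (name, strand, begin, end) is only my assumption about what your file looks like:

```python
# Made-up tab-separated line; the real file's columns may differ
line = "gene1\t-\t10\t20\n"

print(line.split("\t"))       # the last field keeps the newline
print(line.strip().split())   # clean fields, no newline

name, strand, begin, end = line.strip().split()
begin, end = int(begin), int(end)  # fields are strings until converted
print(end - begin)
```

The int() conversion matters because the fields come back as strings, and you cannot slice with a string.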
