Reading from a file

  • RodAG
    New Member
    • Mar 2012
    • 6

    Reading from a file

    I'm trying to read data from a txt file, but I don't know how to do it. The data is in FASTA format (a format used in molecular biology to store protein/DNA sequences), which is very simple:
    Code:
    >Header1
    Sequence1
    
    >Header2
    Sequence2
    .
    .
    .
    >HeaderN
    SequenceN
    The ">" is always present and marks an identifier line (where we usually write the name/ID of the sequence below it). The line or lines following the header are the sequence itself, and sequences have different lengths.

    So, my question is: which instructions should I use to read the file and copy all the sequences, each one into its own list?
    Any ideas? Thanks in advance.
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
    The following code will read a text file and create a dictionary of headers and sequences.
    Code:
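    dd = {}
    current_header = None
    # "with" closes the file even if an error occurs while reading
    with open("fasta1.txt") as f:
        for line in f:
            line = line.strip()
            if line.startswith(">"):
                # header line: drop the ">" and remember it as the current key
                current_header = line[1:].strip()
            elif line and current_header is not None:
                # sequence line: skip blank lines, append to the current record
                dd.setdefault(current_header, []).append(line)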
    Each sequence is stored as a list of lines, in case it spans multiple lines in the file.
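
    If you would rather have each sequence as one string (or as a list of single residues, which may be what "each one in a list for itself" means), the pieces can be joined afterwards. The following is only a sketch of that idea: the function name parse_fasta and the sample data are made up for illustration, and it assumes headers are unique (a repeated header would overwrite the earlier record).

```python
def parse_fasta(lines):
    """Map each header to its full sequence, joined into one string."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # header line: drop the ">" and start a fresh record
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            # sequence line: collect it under the current header
            records[header].append(line)
    # join the collected lines of each record into a single string
    return {name: "".join(parts) for name, parts in records.items()}


# Made-up sample data. With a real file, pass the open file object instead:
#     with open("fasta1.txt") as f:
#         sequences = parse_fasta(f)
sample = [">seq1", "ACGT", "TTGA", "", ">seq2", "MKV"]
sequences = parse_fasta(sample)

# One list per sequence, one residue per element
as_lists = {name: list(seq) for name, seq in sequences.items()}
print(sequences)
```

    Taking any iterable of lines, rather than a filename, lets the same function work on an open file or on a plain list of strings.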
