parsing tab-delimited text file into arrays

  • cahram
    New Member
    • May 2010
    • 1

    parsing tab-delimited text file into arrays

    Hi, I'm new to Python and have a task of reading a user input text file that is tab-delimited and contains 4 columns in each line: Authors, Year, Title and Journal.

    So far I am only able to open the file, and I don't know how to begin parsing the data.

    The recommended way of sorting the data is to use the following three lists (which I set as):
    Code:
    authorsList = []
    journalsList = []
    papersList = []
    In papersList, each paper's entry holds its title, the year it was published, the index of each of its authors and the index of its journal; this way the name of each journal and author is stored in only one place.
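    To illustrate the layout with made-up entries (the names and titles below are hypothetical, and the order of the fields inside each paper entry is an assumption), the three lists might end up looking something like this:

```python
# Made-up sample data, only to illustrate the layout described above.
authorsList = ["J. Smith", "A. Jones"]
journalsList = ["Journal of Examples"]

# Assumed entry layout: [title, year, list of author indices, journal index]
papersList = [
    ["A Sample Title", 2009, [0, 1], 0],
    ["Another Title", 2010, [1], 0],
]

# Looking up the names for the first paper through the stored indices:
title, year, author_idx, journal_idx = papersList[0]
paper_authors = [authorsList[i] for i in author_idx]
paper_journal = journalsList[journal_idx]
```

    Each author and journal name appears once, and the papers refer to them only by position.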

    What I have learned so far in Python: basic I/O, loops and conditions, defining functions and a little exception handling. I've been searching Google, but most answers to this question use the csv module and regular expressions. I tried to learn those on my own but couldn't understand the suggested code. Is there a way to do it without the csv and re modules?

    I was thinking of doing something like this:
    Code:
    for line in openfile:
       a, b, c, d = line.split("\t")
       authorsList.append(a)
       papersList.append(b, c)
       journalsList.append(d)
    but I don't think that is right at all.
    Any suggestions or tips?
    Thanks for your time and consideration.
  • erbrose
    New Member
    • Oct 2006
    • 58

    #2
    This is what I've been doing... I'm pretty new to Python too, though...
    Code:
    TmpArr = []
    
    for line in openfile:
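        # strip only the trailing newline; a bare strip() would also eat
        # leading/trailing tabs and drop empty first or last columns
        line = line.rstrip('\n')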
        TmpArr.append(line.split('\t'))
    Now you have a multidimensional list (TmpArr)... you can sort by columns by doing something like this:

    Code:
    TmpArr.sort(key=lambda a:(a[0]))
    if, say, you wanted to sort by the authors column (column 0).
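    A minimal, self-contained sketch of the split-and-sort approach above. The `io.StringIO` object is only a hypothetical stand-in for the opened file, the two sample lines are made up, and the name `rows` is used here in place of TmpArr:

```python
import io

# Hypothetical in-memory stand-in for the opened file (made-up sample lines).
openfile = io.StringIO(
    "Smith\t2009\tA Sample Title\tJournal of Examples\n"
    "Adams\t2010\tAnother Title\tExample Letters\n"
)

rows = []
for line in openfile:
    line = line.rstrip("\n")       # drop only the newline, keep the tabs
    if not line:                   # skip blank lines
        continue
    rows.append(line.split("\t"))  # one list of 4 strings per line

# Sort in place by column 0 (authors).
rows.sort(key=lambda row: row[0])

# sorted() returns a new list instead; here ordered by year as a number.
by_year = sorted(rows, key=lambda row: int(row[1]))
```

    Converting the year with int() matters only for the sort key here; the stored value is still the original string.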


    • Glenton
      Recognized Expert Contributor
      • Nov 2008
      • 391

      #3
      We'll assume that the authors are in a nice consistent format.

      Then something like the following (untested) code should work.

      Code:
      authorsList = []
      journalsList = []
      papersList = []
      for line in openfile:
          # drop the trailing newline so it doesn't end up in the journal name
          authors, yea, tit, jou = line.rstrip("\n").split("\t")
          authInd = []  # This is the list of author indices we will add to the papers list.
          # authors is a single string here, so split it into individual names first.
          # The ";" separator is an assumption; use whatever your file actually uses.
          for a in authors.split(";"):
              a = a.strip()
              # first check if it's in authorsList
              if a not in authorsList:
                  # if a is not in authorsList, then add it
                  authorsList.append(a)
              # add the index to authInd
              authInd.append(authorsList.index(a))
          papersList.append(authInd)
      Etc. Since you need similar stuff for the journal, you might want to write a function that does the necessary work, and then pass both the author and journal stuff to the function.
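      As a rough sketch of that idea (again untested against your real file): the helper name `index_of`, the sample lines, the ";" separator between authors and the order of fields in each paper entry are all assumptions made for illustration.

```python
def index_of(name, names):
    """Return the index of name in names, appending it first if it is new."""
    if name not in names:
        names.append(name)
    return names.index(name)


authorsList, journalsList, papersList = [], [], []

# Hypothetical sample lines; a ";" between authors is an assumption.
lines = [
    "J. Smith; A. Jones\t2009\tA Sample Title\tJournal of Examples\n",
    "A. Jones\t2010\tAnother Title\tJournal of Examples\n",
]

for line in lines:
    authors, year, title, journal = line.rstrip("\n").split("\t")
    # The same helper serves both the authors and the journal.
    author_idx = [index_of(a.strip(), authorsList) for a in authors.split(";")]
    journal_idx = index_of(journal, journalsList)
    papersList.append([title, int(year), author_idx, journal_idx])
```

      With a real file you would loop over openfile instead of the sample list. For large files a dictionary mapping each name to its index would avoid the repeated list searches, but plain lists keep this close to what you already know.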

      Good luck.
