Large lists in Python

**bvdet** · Aug 14 '10, 03:34 PM

My thought would be to read in a range of lines at a time, process those lines, and move on to the next range, storing the results in a file as needed.

This function reads in a range of lines:

Code:

def fileLineRange(fn, start, end):
    # Return lines start..end (1-based, inclusive) of file fn, stripped.
    f = open(fn)
    # Skip the lines that come before the requested range.
    for i in xrange(start-1):
        try:
            f.next()
        except StopIteration:
            f.close()
            return "Start line %s is beyond end of file." % (start)

    outputList = []
    for line in xrange(start, end+1):
        outputList.append(f.next().strip())
    f.close()
    return outputList

fileLineRange(fn, 700, 720) would read in lines 700 through 720.
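
As a rough sketch of the "process a range, then move on to the next" idea, the generator below hands back successive blocks of lines so the file is walked only once from top to bottom. It uses Python 3 syntax, and the name `iter_line_blocks` and its `block_size` parameter are made up for illustration:

```python
from itertools import islice


def iter_line_blocks(path, block_size):
    """Yield lists of up to block_size stripped lines, reading the file once."""
    with open(path) as f:
        while True:
            # islice pulls at most block_size lines from the open file.
            block = [line.strip() for line in islice(f, block_size)]
            if not block:
                break
            yield block
```

Each block can be processed and discarded inside a plain `for block in iter_line_blocks(fn, 1000):` loop, so only one block is held in memory at a time.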

**fekioh** · Aug 14 '10, 03:48 PM

Yes, I thought of that. The problem is that (i) I need to be able to calculate statistics for different block sizes without reading the file over and over again, and (ii) I need some information from the very last line (the files have a time column and started at the same time, but are not equally long).

Is there any way to store the whole thing in some kind of data structure (e.g. by creating a class "extending" list or something)? Sorry for the Java terminology :)
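
For requirement (i), one option that avoids both rereading the file and keeping every line in memory is to accumulate running sums per block, for several block sizes at once, in a single pass. A rough Python 3 sketch, assuming each line can be parsed into a (time in seconds, value) pair; the function name `block_means` and that data layout are assumptions for illustration:

```python
from collections import defaultdict


def block_means(samples, block_sizes):
    """Compute per-block means for several block sizes in one pass.

    samples: iterable of (time_in_seconds, value) pairs.
    block_sizes: block lengths in seconds, e.g. [86400, 604800].
    Returns {block_size: {block_number: mean}}.
    """
    # For every block size keep a [running_sum, count] pair per block number.
    totals = {size: defaultdict(lambda: [0.0, 0]) for size in block_sizes}
    for t, value in samples:
        for size in block_sizes:
            acc = totals[size][int(t // size)]
            acc[0] += value
            acc[1] += 1
    return {
        size: {block: s / n for block, (s, n) in sorted(blocks.items())}
        for size, blocks in totals.items()
    }
```

Memory use grows with the number of blocks rather than the number of lines, so a month of data at per-day and per-week resolution needs only a few dozen accumulators.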

**bvdet** · Aug 14 '10, 03:57 PM

Each call only keeps one group of lines in memory, but it does rescan the file from the top, so I understand that it might not be the most efficient way. You should consider storing all the data in a MySQL database for efficient access. MySQLdb is a Python interface to MySQL.

An afterthought to the code I posted: in case the end line number is greater than the number of lines, I added a try/except block:

Code:

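def fileLineRange(fn, start, end):
    # Return lines start..end (1-based, inclusive) of file fn, stripped.
    f = open(fn)
    # Skip the lines that come before the requested range.
    for i in xrange(start-1):
        try:
            f.next()
        except StopIteration:
            f.close()
            return "Start line %s is beyond end of file." % (start)

    outputList = []
    for i in xrange(start, end+1):
        try:
            outputList.append(f.next().strip())
        except StopIteration:
            # Ran out of lines before reaching the requested end line.
            print "The last line in the file is line number %s." % (i-1)
            break
    f.close()
    return outputList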

**fekioh** · Aug 14 '10, 04:11 PM

Hmm, sorry I wasn't very clear. What I meant is:

(i) the files contain roughly month-long measurements, and once I've read a file in I'd like to be able to compute e.g. per-day or per-week means, or, for a specific file, to focus on the first hours. That's what I meant by not wanting to read the whole thing over and over again.

(ii) as for the last line, I guess it's not a big issue. I just need to know the duration of all measurements from the start to do some of the calculations. But I guess I should just read the last line at the beginning and then go back to the start of the file.
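
Reading the last line first does not require walking the whole file: one can seek to the end and step backwards until a full line is available. A minimal Python 3 sketch, assuming a UTF-8 text file with newline-terminated lines; `read_last_line` and `chunk_size` are illustrative names:

```python
import os


def read_last_line(path, chunk_size=4096):
    """Return the last line of a file, ignoring trailing newlines."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # Walk backwards in chunks until the tail holds a complete last line.
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
            if b"\n" in data.rstrip(b"\r\n"):
                break
        lines = data.rstrip(b"\r\n").splitlines()
        return lines[-1].decode() if lines else ""
```

After that, the file can simply be opened again for the normal front-to-back pass.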

**fekioh** · Aug 14 '10, 04:32 PM

Also, I'm not very familiar with MySQL. Is there no alternative "large list implementation", say one that stores the data on disk and loads a chunk ("page") into RAM at a time?
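
One way such a disk-backed "list" could look, without a database, is a small class that scans the file once to record where each line starts and then reads lines on demand with seek(). This is only a Python 3 sketch under the assumption of a UTF-8 text file that does not change while in use; the class name `LineIndexedFile` is hypothetical:

```python
class LineIndexedFile:
    """List-like, read-only view of a text file's lines, kept on disk.

    Only one byte offset per line is held in memory; the line text is
    read on demand with seek().
    """

    def __init__(self, path):
        self.path = path
        self.offsets = []
        offset = 0
        with open(path, "rb") as f:
            for raw in f:
                self.offsets.append(offset)
                offset += len(raw)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.indices(len(self))
            if step != 1:
                raise ValueError("only contiguous slices are supported")
            return self._read_range(start, stop)
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("line index out of range")
        return self._read_range(index, index + 1)[0]

    def _read_range(self, start, stop):
        if start >= stop:
            return []
        with open(self.path, "rb") as f:
            # Jump straight to the first requested line.
            f.seek(self.offsets[start])
            return [f.readline().decode().strip() for _ in range(stop - start)]
```

With this, `lines[-1]` gives the last line and `lines[700:720]` loads just that "page", at the cost of one integer per line held in memory.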

**dwblas** · Aug 15 '10, 05:17 PM

You may want to use SQL, but since you do not say what specifically you want to access or how you want to do it, it is difficult to tell whether using a list is the best way. Most of us have code generators for quick and dirty apps, so below is a generated SQL example of what you might want to do, using SQLite, which comes with Python (code comments are sparse though). I don't want to waste time on something that may not be used, so post back if you want more info.

Code:

import random
import sqlite3 as sqlite

class SQLTest:
   def __init__( self ) :
      self.SQL_filename = './SQLtest.SQL'
      self.open_files()

   ##----------------------------------------------------------------------
   def add_rec( self, val_tuple) :
      self.cur.execute('INSERT INTO example_dbf values (?,?,?,?,?)', val_tuple)
      self.con.commit()

   ##----------------------------------------------------------------------
   def list_all_recs( self ) :
      self.cur.execute("select * from example_dbf")
      recs_list = self.cur.fetchall()
      for rec in recs_list:
         print rec

   ##----------------------------------------------------------------------
   def lookup_date( self, date_in ) :
      self.cur.execute("select * from example_dbf where st_date==:dic_lookup", 
              {"dic_lookup":date_in})
      recs_list = self.cur.fetchall()
      print
      print "lookup_date" 
      for rec in recs_list:
         print "%3d %9s %10.6f %3d  %s" % (rec[0], rec[1], rec[2], rec[3], rec[4])

   ##----------------------------------------------------------------------
   def lookup_2_fields( self, lookup_dic ) :
      self.cur.execute("select * from example_dbf where st_date==:dic_field_1 and st_int==:dic_field_2", lookup_dic)

      recs_list = self.cur.fetchall()
      print
      print "lookup_2_fields" 
      if len(recs_list):
         for rec in recs_list:
            print rec
      else:
         print "no recs found"

   ##----------------------------------------------------------------------
   def open_files( self ) :
      ##  open a connection to the database file
      self.con = sqlite.connect(self.SQL_filename)

      # Get a Cursor object that operates in the context of Connection con
      self.cur = self.con.cursor()

      ##--- CREATE THE TABLE ONLY IF IT DOESN'T EXIST
      self.cur.execute("CREATE TABLE IF NOT EXISTS example_dbf(st_rec_num int, st_date varchar, st_float, st_int int, st_lit varchar)")

##===================================================================
if __name__ == "__main__":
   ST = SQLTest()

   """ add some records with the format
       record_number  date  float  int  string
   """
   rec_num = 0
   ccyy = 2010
   for x in range(1, 11):
      rec_num += 1
      mm = x + 1
      dd = x + 2
      date = "%d%02d%02d" % (ccyy, mm, dd)
      add_fl = random.random() * 1000
      add_int = random.randint(1, 21)
      lit = "test lit # %d" % (x)
      ST.add_rec( (rec_num, date, add_fl, add_int, lit) )

   ## add duplicate dates for testing
   for x in range(1, 3):
      for y in range(2):
         rec_num += 1
         mm = x + 1
         dd = x + 2
         date = "%d%02d%02d" % (ccyy, mm, dd)
         add_fl = random.random() * 1000
         add_int = random.randint(1, 21)
         lit = "test lit # %d" % (x)
         ST.add_rec( (rec_num, date, add_fl, add_int, lit) )
  
   ST.list_all_recs()
   ST.lookup_date("20100203")

   lookup_dict = {"dic_field_1":"20100203",
                  "dic_field_2":10}
   ST.lookup_2_fields(lookup_dict)
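
Once the measurements are in SQLite, block statistics such as per-day means can be left to the database with GROUP BY. A minimal Python 3 sketch, assuming a hypothetical `measurements` table with a `day` column in ccyymmdd form and a numeric `value` column; neither name comes from the example above, and `build_demo_db` only fills an in-memory database with made-up rows:

```python
import sqlite3


def per_day_means(con):
    """Return [(day, mean_value), ...] ordered by day."""
    cur = con.execute(
        "SELECT day, AVG(value) FROM measurements GROUP BY day ORDER BY day"
    )
    return cur.fetchall()


def build_demo_db():
    """Create an in-memory database holding a few made-up sample rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE measurements (day TEXT, value REAL)")
    rows = [("20100801", 1.0), ("20100801", 3.0), ("20100802", 10.0)]
    con.executemany("INSERT INTO measurements VALUES (?, ?)", rows)
    con.commit()
    return con
```

Calling `per_day_means(build_demo_db())` returns one (day, mean) tuple per distinct day; a per-week version would group on a week key instead.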

**fekioh** · Aug 15 '10, 08:36 PM

Thank you, I will look into this tomorrow and post back if I run into trouble.

Cheers!
