Parsing by Line Data

**Eddie Corns** · Jul 18 '05, 11:57 AM

Re: Parsing by Line Data

python1 <python1@spamless.net> writes:
>Having slight trouble conceptualizing a way to write this script. The
>problem is that I have a bunch of lines in a file, for example:

>01A\n
>02B\n
>01A\n
>02B\n
>02C\n
>01A\n
>02B\n
>.
>.
>.

>The lines beginning with '01' are the 'header' records, whereas the
>lines beginning with '02' are detail. There can be several detail lines
>to a header.

>I'm looking for a way to put the '01' and subsequent '02' line data into
>one list, and breaking into another list when the next '01' record is found.

>How would you do this? I'm used to using 'readlines()' to pull the file
>data line by line, but in this case, determining the break-point will
>need to be done by reading the '01' from the line ahead. Would you need
>to read the whole file into a string and use a regex to break where a
>'\n01' is found?

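```python
def gen_records(src):
    rec = []
    for line in src:
        if line.startswith('01'):
            # A header starts a new record; hand back the previous one first.
            if rec:
                yield rec
            rec = [line]
        else:
            # A detail line belongs to the current record.
            rec.append(line)
    # The last record has no following header, so emit it here.
    if rec:
        yield rec

inf = open('input-file')
for record in gen_records(inf):
    do_something_to_list(record)  # your own processing goes here
```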
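A self-contained way to try the generator, assuming the sample lines from the question are fed in from an in-memory `io.StringIO` instead of a real file (the `sample` data is made up for illustration). It shows that each yielded record is a header followed by its detail lines:

```python
import io


def gen_records(src):
    """Group lines into records: one '01' header plus its detail lines."""
    rec = []
    for line in src:
        if line.startswith('01'):
            # A new header closes the previous record, if there is one.
            if rec:
                yield rec
            rec = [line]
        else:
            # Any other line is a detail line for the current record.
            rec.append(line)
    # Emit the record still open at end of input.
    if rec:
        yield rec


# Hypothetical sample data in the format from the question.
sample = io.StringIO("01A\n02B\n01A\n02B\n02C\n01A\n02B\n")

# Strip the newlines only so the printed result is easier to read.
records = [[line.rstrip('\n') for line in rec] for rec in gen_records(sample)]
print(records)
# [['01A', '02B'], ['01A', '02B', '02C'], ['01A', '02B']]
```

Because the generator only ever looks at the current line, there is no need to peek at the line ahead: a new '01' simply closes the record built so far.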

Eddie

**Bill Dandreta** · Jul 18 '05, 11:57 AM

Re: Parsing by Line Data

python1 wrote:
> ...lines in a file, for example:
>
> 01A\n
> 02B\n
> 01A\n
> 02B\n
> 02C\n
> 01A\n
> 02B\n
> .
> .
> .
>
> The lines beginning with '01' are the 'header' records, whereas the
> lines beginning with '02' are detail. There can be several detail lines
> to a header.
>
> I'm looking for a way to put the '01' and subsequent '02' line data into
> one list, and breaking into another list when the next '01' record is
> found.
>
> How would you do this? I'm used to using 'readlines()' to pull the file
> data line by line, but in this case, determining the break-point will
> need to be done by reading the '01' from the line ahead. Would you need
> to read the whole file into a string and use a regex to break where a
> '\n01' is found?

First let me preface my remarks by saying I am not much of a programmer,
so this may not be the best way to solve this, but I would use a
dictionary, something like this (untested):

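```python
myfile = 'input-file'  # placeholder: path to your data file

myinput = open(myfile, 'r')
lines = myinput.readlines()
myinput.close()

mydict = {}
index = -1
counter = 0

for l in lines:
    if l[0:2] == '01':
        # Header line: start a new record at position 0.
        counter = 0
        index += 1
        mydict[(index, counter)] = l[2:]
    else:
        # Detail line: advance first so the header entry is not overwritten.
        counter += 1
        mydict[(index, counter)] = l[2:]
```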

You can easily extract the data with a nested loop.
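A minimal sketch of that nested loop, assuming a hard-coded list of sample lines in place of `readlines()` (the data is hypothetical). The dictionary is keyed by (record index, line number within the record), so the outer loop walks record indexes and the inner loop walks line numbers until a key is missing:

```python
# Hypothetical sample lines standing in for myinput.readlines().
lines = ["01A\n", "02B\n", "01A\n", "02B\n", "02C\n", "01A\n", "02B\n"]

# Build the dictionary keyed by (record index, line number within record).
mydict = {}
index = -1
counter = 0
for l in lines:
    if l[0:2] == '01':
        index += 1
        counter = 0
    else:
        counter += 1
    # Drop the two-character record type and the newline.
    mydict[(index, counter)] = l[2:].rstrip('\n')

# Nested loop: outer over records, inner over the lines of one record.
records = []
for i in range(index + 1):
    rec = []
    j = 0
    while (i, j) in mydict:
        rec.append(mydict[(i, j)])
        j += 1
    records.append(rec)

print(records)
# [['A', 'B'], ['A', 'B', 'C'], ['A', 'B']]
```

Entry 0 of each record is the header data and the rest are detail data; a plain list of lists would hold the same information, but the tuple keys make the header/detail position explicit.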

Bill

**python1** · Jul 18 '05, 11:57 AM

Re: Parsing by Line Data

Eddie Corns wrote:
> def gen_records(src):
>     rec = []
>     for line in src:
>         if line.startswith('01'):
>             if rec: yield rec
>             rec = [line]
>         else:
>             rec.append(line)
>     if rec: yield rec

Thanks Eddie. Very creative. Knew I'd use the 'yield' keyword someday :)

**python1** · Jul 18 '05, 11:57 AM

Re: Parsing by Line Data

Bill Dandreta wrote:
> ...I would use a dictionary something like this (untested):
> ...

Thanks, Bill. I'll use this script in place of Eddie's if the Python on our
AIX box is older than 2.2, since generators (and the 'yield' keyword) first
appeared in 2.2.

Thanks again.
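Another option for an interpreter without generators is to keep Eddie's grouping logic but collect the records into a list instead of yielding them. A sketch, using the hypothetical name `get_records` and made-up sample lines; the snippet itself is written for a current Python 3, but it relies on no generator features:

```python
def get_records(lines):
    """Return a list of records; each record is a '01' header plus its details.

    Same grouping logic as gen_records, but it builds the whole result in
    memory instead of yielding, so it does not need generators.
    """
    records = []
    rec = []
    for line in lines:
        if line[:2] == '01':
            # New header: close off the previous record, if any.
            if rec:
                records.append(rec)
            rec = [line]
        else:
            # Detail line: add it to the current record.
            rec.append(line)
    # Keep the last record too.
    if rec:
        records.append(rec)
    return records


# Hypothetical sample data in the format from the question.
sample = ["01A\n", "02B\n", "01A\n", "02B\n", "02C\n", "01A\n", "02B\n"]
for record in get_records(sample):
    print(record)
```

The trade-off is memory: every record is held at once, whereas the generator hands back one record at a time.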

**Mitja** · Jul 18 '05, 11:58 AM

Re: Parsing by Line Data

python1 <python1@spamless.net>
(news:casjot020q7@enews3.newsguy.com) wrote:
> Having slight trouble conceptualizing a way to write this script. The
> problem is that I have a bunch of lines in a file, for example:
>
> 01A\n
> 02B\n
> 01A\n
> 02B\n
> 02C\n
> 01A\n
> 02B\n
> .
> .
> .
>
> The lines beginning with '01' are the 'header' records, whereas the
> lines beginning with '02' are detail. There can be several detail
> lines to a header.
>
> I'm looking for a way to put the '01' and subsequent '02' line data
> into one list, and breaking into another list when the next '01'
> record is found.

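I'd probably do something like

```python
records = ('\n' + open('foo.data').read()).split('\n01')[1:]
```

The leading '\n' lets the very first header match '\n01', and the [1:]
drops the empty string that the split produces before it. You can later do

```python
structured = [record.split('\n') for record in records]
```

to get a list of lists. The '01' is stripped from the first line of every
record, the last record ends with an empty string left over from the final
newline, and there may be other flaws, but I guess the concept is clear.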
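A self-contained run of the split idea, assuming an in-memory string in place of reading `foo.data` (the data is hypothetical), showing exactly what comes out at each step, including the empty first piece and the trailing empty string:

```python
# Hypothetical contents of foo.data, in the format from the question.
data = "01A\n02B\n01A\n02B\n02C\n01A\n02B\n"

# Prepend '\n' so the very first header also matches '\n01'.
pieces = ('\n' + data).split('\n01')
print(pieces)
# ['', 'A\n02B', 'A\n02B\n02C', 'A\n02B\n']

# Drop the empty piece before the first header, then split each record
# into lines; the '01' prefix is gone from every header line.
records = pieces[1:]
structured = [record.split('\n') for record in records]
print(structured)
# [['A', '02B'], ['A', '02B', '02C'], ['A', '02B', '']]
```

Using `record.splitlines()` instead of `record.split('\n')` would avoid the trailing empty string on the last record.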
> How would you do this? I'm used to using 'readlines()' to pull the
> file data line by line, but in this case, determining the break-point
> will need to be done by reading the '01' from the line ahead. Would you
> need to read the whole file into a string and use a regex to break
> where a '\n01' is found?
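Reading the whole file into a string and using a regex does work, though the line-by-line approaches above do not need it. A sketch with the standard `re` module, assuming a made-up in-memory string in place of the file; rather than splitting on '\n01', it matches whole records, keeping the '01' on each header:

```python
import re

# Hypothetical file contents in the format from the question.
text = "01A\n02B\n01A\n02B\n02C\n01A\n02B\n"

# A record is a line starting with '01' plus every following line that
# does not start with '01'. re.M makes '^' match at the start of each line.
record_re = re.compile(r'^01.*(?:\n(?!01).*)*', re.M)

# findall returns each whole match; splitlines turns it into a list of lines.
records = [m.splitlines() for m in record_re.findall(text)]
print(records)
# [['01A', '02B'], ['01A', '02B', '02C'], ['01A', '02B']]
```

Note that any detail lines appearing before the first '01' header are silently skipped, because every match has to start at a header.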
