Parsing by Line Data

  • python1

    Parsing by Line Data

    Having slight trouble conceptualizing a way to write this script. The
    problem is that I have a bunch of lines in a file, for example:

    01A\n
    02B\n
    01A\n
    02B\n
    02C\n
    01A\n
    02B\n
    ..
    ..
    ..

    The lines beginning with '01' are the 'header' records, whereas the
    lines beginning with '02' are detail. There can be several detail lines
    to a header.

    I'm looking for a way to put the '01' and subsequent '02' line data into
    one list, and breaking into another list when the next '01' record is found.

    How would you do this? I'm used to using 'readlines()' to pull the file
    data line by line, but in this case, determining the break-point will
    need to be done by reading the '01' from the line ahead. Would you need
    to read the whole file into a string and use a regex to break where a
    '\n01' is found?
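Reading the whole file into a string and splitting with a regex does work. In this minimal sketch, `re.split` breaks the text at every newline that is followed by `01`. The lookahead `(?=01)` tests for the header without consuming it, so each header line keeps its prefix. The function name `split_records` and the sample string are illustrative, not from the thread, and the sketch assumes the whole file fits comfortably in memory.

```python
import re


def split_records(text):
    """Split text into header/detail groups at each newline followed by '01'.

    The lookahead (?=01) matches without consuming anything, so every header
    keeps its '01' prefix; only the newline in front of it is dropped.
    """
    chunks = re.split(r'\n(?=01)', text.strip('\n'))
    # Break each group into its individual lines, skipping empty input.
    return [chunk.split('\n') for chunk in chunks if chunk]


sample = "01A\n02B\n01A\n02B\n02C\n01A\n02B\n"
print(split_records(sample))
# [['01A', '02B'], ['01A', '02B', '02C'], ['01A', '02B']]
```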
  • Eddie Corns

    #2
    Re: Parsing by Line Data

    python1 <python1@spamless.net> writes:
    >Having slight trouble conceptualizing a way to write this script. The
    >problem is that I have a bunch of lines in a file, for example:
    >
    >01A\n
    >02B\n
    >01A\n
    >02B\n
    >02C\n
    >01A\n
    >02B\n
    >.
    >.
    >.
    >
    >The lines beginning with '01' are the 'header' records, whereas the
    >lines beginning with '02' are detail. There can be several detail lines
    >to a header.
    >
    >I'm looking for a way to put the '01' and subsequent '02' line data into
    >one list, and breaking into another list when the next '01' record is found.
    >
    >How would you do this? I'm used to using 'readlines()' to pull the file
    >data line by line, but in this case, determining the break-point will
    >need to be done by reading the '01' from the line ahead. Would you need
    >to read the whole file into a string and use a regex to break where a
    >'\n01' is found?

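    def gen_records(src):
        rec = []
        for line in src:
            if line.startswith('01'):
                # a header starts a new record; hand back the finished one
                if rec:
                    yield rec
                rec = [line]
            else:
                # detail line: attach it to the current record
                rec.append(line)
        # the last record has no header after it, so yield it here
        if rec:
            yield rec

    inf = open('input-file')
    for record in gen_records(inf):
        do_something_to_list(record)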

    Eddie
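To try the grouping without a file on disk, here is a sketch of the same generator run over an in-memory `io.StringIO`, which iterates line by line just like an open file. Two small changes are assumptions of this sketch, not part of Eddie's code: trailing newlines are stripped, and the header prefix is a parameter that defaults to `'01'`.

```python
import io


def gen_records(src, header='01'):
    """Yield lists of lines; a new list starts at each line beginning with `header`."""
    rec = []
    for line in src:
        line = line.rstrip('\n')   # variant: drop the newline from each line
        if line.startswith(header):
            if rec:
                yield rec          # finished the previous header's group
            rec = [line]
        else:
            rec.append(line)       # detail line belongs to the current group
    if rec:
        yield rec                  # don't lose the last group at end of input


# io.StringIO stands in for an open file; any iterable of lines works.
data = io.StringIO("01A\n02B\n01A\n02B\n02C\n01A\n02B\n")
for record in gen_records(data):
    print(record)
# ['01A', '02B']
# ['01A', '02B', '02C']
# ['01A', '02B']
```

Because it is a generator, only one record is held in memory at a time. Detail lines that appear before the first header are yielded as a group of their own, with no header at the front.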


    • Bill Dandreta

      #3
      Re: Parsing by Line Data

      python1 wrote:
      > ...lines in a file, for example:
      >
      > 01A\n
      > 02B\n
      > 01A\n
      > 02B\n
      > 02C\n
      > 01A\n
      > 02B\n
      > .
      > .
      > .
      >
      > The lines beginning with '01' are the 'header' records, whereas the
      > lines beginning with '02' are detail. There can be several detail lines
      > to a header.
      >
      > I'm looking for a way to put the '01' and subsequent '02' line data into
      > one list, and breaking into another list when the next '01' record is
      > found.
      >
      > How would you do this? I'm used to using 'readlines()' to pull the file
      > data line by line, but in this case, determining the break-point will
      > need to be done by reading the '01' from the line ahead. Would you need
      > to read the whole file into a string and use a regex to break where a
      > '\n01' is found?

      First let me preface my remarks by saying I am not much of a programmer,
      so this may not be the best way to solve this, but I would use a
      dictionary something like this (untested):

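      myinput = open(myfile, 'r')
      lines = myinput.readlines()
      myinput.close()

      mydict = {}
      index = -1    # record number; the first '01' line bumps it to 0
      counter = 0   # position of the line within its record

      for l in lines:
          if l[0:2] == '01':
              # header: start a new record at position 0
              index += 1
              counter = 0
          else:
              # detail: move to the next position in the current record
              counter += 1
          mydict[(index, counter)] = l[2:]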

      You can easily extract the data with a nested loop.

      Bill
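The nested loop Bill mentions could look something like this sketch. It builds the `(index, counter)` dictionary so that each header sits at position 0 and its detail lines at positions 1, 2, and so on. It assumes the file starts with an `'01'` header and, unlike the code above, strips the trailing newline from each value. The function names `build_dict` and `extract` are hypothetical. The outer loop walks the record numbers; the inner loop walks positions within a record until a key is missing.

```python
def build_dict(lines):
    """Map (record number, position in record) -> line data minus the 2-char type code."""
    mydict = {}
    index = -1    # becomes 0 at the first '01' header
    counter = 0
    for l in lines:
        if l[0:2] == '01':
            index += 1     # header: new record, position 0
            counter = 0
        else:
            counter += 1   # detail: next position in the current record
        mydict[(index, counter)] = l[2:].rstrip('\n')
    return mydict


def extract(mydict):
    """Turn the dictionary back into a list of records with a nested loop."""
    records = []
    if not mydict:
        return records
    n_records = max(index for index, _ in mydict) + 1
    for index in range(n_records):           # outer loop: one pass per record
        rec = []
        counter = 0
        while (index, counter) in mydict:    # inner loop: lines of that record
            rec.append(mydict[(index, counter)])
            counter += 1
        records.append(rec)
    return records


lines = ["01A\n", "02B\n", "01A\n", "02B\n", "02C\n", "01A\n", "02B\n"]
print(extract(build_dict(lines)))
# [['A', 'B'], ['A', 'B', 'C'], ['A', 'B']]
```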


      • python1

        #4
        Re: Parsing by Line Data

        Eddie Corns wrote:
        > python1 <python1@spamless.net> writes:
        >
        >>Having slight trouble conceptualizing a way to write this script. The
        >>problem is that I have a bunch of lines in a file, for example:
        >
        >>01A\n
        >>02B\n
        >>01A\n
        >>02B\n
        >>02C\n
        >>01A\n
        >>02B\n
        >>.
        >>.
        >>.
        >
        >>The lines beginning with '01' are the 'header' records, whereas the
        >>lines beginning with '02' are detail. There can be several detail lines
        >>to a header.
        >
        >>I'm looking for a way to put the '01' and subsequent '02' line data into
        >>one list, and breaking into another list when the next '01' record is found.
        >
        >>How would you do this? I'm used to using 'readlines()' to pull the file
        >>data line by line, but in this case, determining the break-point will
        >>need to be done by reading the '01' from the line ahead. Would you need
        >>to read the whole file into a string and use a regex to break where a
        >>'\n01' is found?
        >
        > def gen_records(src):
        >     rec = []
        >     for line in src:
        >         if line.startswith('01'):
        >             if rec:
        >                 yield rec
        >             rec = [line]
        >         else:
        >             rec.append(line)
        >     if rec:
        >         yield rec
        >
        > inf = open('input-file')
        > for record in gen_records(inf):
        >     do_something_to_list(record)
        >
        > Eddie

        Thanks Eddie. Very creative. Knew I'd use the 'yield' keyword someday :)


        • python1

          #5
          Re: Parsing by Line Data

          Bill Dandreta wrote:
          > python1 wrote:
          >
          >> ...lines in a file, for example:
          >>
          >> 01A\n
          >> 02B\n
          >> 01A\n
          >> 02B\n
          >> 02C\n
          >> 01A\n
          >> 02B\n
          >> .
          >> .
          >> .
          >>
          >> The lines beginning with '01' are the 'header' records, whereas the
          >> lines beginning with '02' are detail. There can be several detail
          >> lines to a header.
          >>
          >> I'm looking for a way to put the '01' and subsequent '02' line data
          >> into one list, and breaking into another list when the next '01'
          >> record is found.
          >>
          >> How would you do this? I'm used to using 'readlines()' to pull the
          >> file data line by line, but in this case, determining the break-point
          >> will need to be done by reading the '01' from the line ahead. Would
          >> you need to read the whole file into a string and use a regex to break
          >> where a '\n01' is found?
          >
          >
          > First let me preface my remarks by saying I am not much of a programmer,
          > so this may not be the best way to solve this, but I would use a
          > dictionary something like this (untested):
          >
          > myinput = open(myfile, 'r')
          > lines = myinput.readlines()
          > myinput.close()
          >
          > mydict = {}
          > index = -1
          > counter = 0
          >
          > for l in lines:
          >     if l[0:2] == '01':
          >         index += 1
          >         counter = 0
          >     else:
          >         counter += 1
          >     mydict[(index, counter)] = l[2:]
          >
          > You can easily extract the data with a nested loop.
          >
          > Bill

          Thanks Bill. I'll use this script in place of Eddie's if the Python on
          our AIX box is older than 2.2.

          Thanks again.
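For an interpreter without generators, one alternative (a sketch, not something posted in the thread) is to keep Eddie's grouping logic but collect the records in a list and return it, so no `yield` is needed. The trade-off is that every record is held in memory at once. The name `get_records` is made up for this example.

```python
def get_records(lines):
    """Group header ('01') lines with the detail lines that follow them.

    Returns a list of lists instead of yielding, so it needs no generator support.
    """
    records = []
    rec = []
    for line in lines:
        if line.startswith('01'):
            if rec:
                records.append(rec)   # close off the previous record
            rec = [line]
        else:
            rec.append(line)
    if rec:
        records.append(rec)           # the last record has no header after it
    return records


print(get_records(["01A\n", "02B\n", "01A\n", "02B\n", "02C\n"]))
# [['01A\n', '02B\n'], ['01A\n', '02B\n', '02C\n']]
```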


          • Mitja

            #6
            Re: Parsing by Line Data

            python1 <python1@spamless.net>
            (news:casjot020q7@enews3.newsguy.com) wrote:
            > Having slight trouble conceptualizing a way to write this script. The
            > problem is that I have a bunch of lines in a file, for example:
            >
            > 01A\n
            > 02B\n
            > 01A\n
            > 02B\n
            > 02C\n
            > 01A\n
            > 02B\n
            > .
            > .
            > .
            >
            > The lines beginning with '01' are the 'header' records, whereas the
            > lines beginning with '02' are detail. There can be several detail
            > lines to a header.
            >
            > I'm looking for a way to put the '01' and subsequent '02' line data
            > into one list, and breaking into another list when the next '01'
            > record is found.

            I'd probably do something like
            records = ('\n' + open('foo.data').read()).split('\n01')

            You can later do
            structured = [record.split('\n') for record in records]
            to get a list of lists. The leading '01' is stripped from the first
            line of every record, records[0] comes out empty, and there may be
            other flaws, but I guess the concept is clear.
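A minimal sketch of how the split result might be tidied up, assuming the data is already in a string and starts with an `'01'` header (the function name `split_on_headers` is hypothetical): skip the empty first element, drop the trailing newline, and put back the `'01'` that `split()` consumed.

```python
def split_on_headers(text):
    """Split on a newline followed by '01', then repair the pieces.

    Prepending a newline lets the very first header match as well; that
    leaves an empty string in front, and each header loses its '01' prefix.
    """
    records = ('\n' + text).split('\n01')
    structured = []
    for record in records[1:]:          # records[0] is whatever preceded the first header
        lines = record.rstrip('\n').split('\n')
        lines[0] = '01' + lines[0]      # restore the prefix consumed by split()
        structured.append(lines)
    return structured


print(split_on_headers("01A\n02B\n01A\n02B\n02C\n01A\n02B\n"))
# [['01A', '02B'], ['01A', '02B', '02C'], ['01A', '02B']]
```

Any lines that come before the first header end up in `records[0]` and are silently dropped here, which is why the sketch assumes the data starts with a header.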
            > How would you do this? I'm used to using 'readlines()' to pull the
            > file data line by line, but in this case, determining the break-point
            > will need to be done by reading the '01' from the line ahead. Would
            > you need to read the whole file into a string and use a regex to break
            > where a '\n01' is found?

