Newbie code review of parsing program Please

  • len

    Newbie code review of parsing program Please

    I have created the following program to read a text file which happens
    to be a COBOL file definition. The program then outputs a file
    containing a list definition which I can later copy and paste into a
    Python program. I will eventually expand the program to also output
    an SQL script to create an SQL file in MySQL.

    The program still needs a little work; it does not handle the following
    items yet:

    1. It does not handle OCCURS yet.
    2. It does not handle REDEFINES yet.
    3. GROUP structures will need work.
    4. Does not create SQL script yet.

    I anticipate that any files created by this program may need manual
    tweaking, but I have a large number of COBOL file definitions which I
    may need to work with, and this seemed like a better solution than
    typing each list definition and SQL create-file script by hand.

    What I would like is if some kind soul could review my code and give
    me some suggestions on how I might improve it. I think the use of
    regular expressions might cut the code down or at least simplify the
    parsing, but I'm just starting to read those chapters in the book ;)

    *** SAMPLE INPUT FILE ***

    000100 FD  SALESMEN-FILE
    000200     LABEL RECORDS ARE STANDARD
    000300     VALUE OF FILENAME IS "SALESMEN".
    000400
    000500 01  SALESMEN-RECORD.
    000600     05  SALESMEN-NO                PIC 9(3).
    000700     05  SALESMEN-NAME              PIC X(30).
    000800     05  SALESMEN-TERRITORY         PIC X(30).
    000900     05  SALESMEN-QUOTA             PIC S9(7) COMP.
    001000     05  SALESMEN-1ST-BONUS         PIC S9(5)V99 COMP.
    001100     05  SALESMEN-2ND-BONUS         PIC S9(5)V99 COMP.
    001200     05  SALESMEN-3RD-BONUS         PIC S9(5)V99 COMP.
    001300     05  SALESMEN-4TH-BONUS         PIC S9(5)V99 COMP.

    *** PROGRAM CODE ***

    #!/usr/bin/python

    import sys

    f_path = '/home/lenyel/Bruske/MCBA/Internet/'
    f_name = sys.argv[1]

    fd = open(f_path + f_name, 'r')

    def fmtline(fieldline):
        # fieldline is a split line: [level, name, 'PIC', picture, ...]
        size = ''
        type = ''
        dec = ''
        codeline = []
        if fieldline.count('COMP.') > 0:
            left = fieldline[3].find('(') + 1
            right = fieldline[3].find(')')
            num = fieldline[3][left:right].lstrip()
            if fieldline[3].count('V'):
                left = fieldline[3].find('V') + 1
                dec = int(len(fieldline[3][left:]))
                # // keeps the integer division on any Python version
                size = ((int(num) + int(dec)) // 2) + 1
            else:
                size = (int(num) // 2) + 1
                dec = 0
            type = 'Pdec'
        elif fieldline[3][0] in ('X', '9'):
            dec = 0
            left = fieldline[3].find('(') + 1
            right = fieldline[3].find(')')
            size = int(fieldline[3][left:right].lstrip('0'))
            if fieldline[3][0] == 'X':
                type = 'Xstr'
            else:
                type = 'Xint'
        else:
            dec = 0
            left = fieldline[3].find('(') + 1
            right = fieldline[3].find(')')
            size = int(fieldline[3][left:right].lstrip('0'))
            if fieldline[3][0] == 'X':
                type = 'Xint'
        codeline.append(fieldline[1].replace('-', '_').replace('.', '').lower())
        codeline.append(size)
        codeline.append(type)
        codeline.append(dec)
        return codeline

    wrkfd = []
    rec_len = 0

    for line in fd:
        if line[6] == '*':      # drop comment lines
            continue
        newline = line.split()
        if len(newline) == 1:   # drop blank line
            continue
        newline = newline[1:]   # drop the sequence number
        if 'FILENAME' in newline:
            filename = newline[-1].replace('"', '').lower()
            filename = filename.replace('.', '')
            output = open('/home/lenyel/Bruske/MCBA/Internet/' + filename + '.fd', 'w')
            code = filename + ' = [\n'
            output.write(code)
        elif newline[0].isdigit() and 'PIC' in newline:
            wrkfd.append(fmtline(newline))
            rec_len += wrkfd[-1][1]

    fd.close()

    fmtfd = []

    for wrkline in wrkfd[:-1]:
        fmtline = str(tuple(wrkline)) + ',\n'
        output.write(fmtline)

    # the last entry is written without a trailing comma
    fmtline = tuple(wrkfd[-1])
    fmtline = str(fmtline) + '\n'
    output.write(fmtline)

    lastline = ']\n'
    output.write(lastline)

    lenrec = filename + '_len = ' + str(rec_len)
    output.write(lenrec)

    output.close()

    *** RESULTING OUTPUT ***

    salesmen = [
    ('salesmen_no', 3, 'Xint', 0),
    ('salesmen_name', 30, 'Xstr', 0),
    ('salesmen_territory', 30, 'Xstr', 0),
    ('salesmen_quota', 4, 'Pdec', 0),
    ('salesmen_1st_bonus', 4, 'Pdec', 2),
    ('salesmen_2nd_bonus', 4, 'Pdec', 2),
    ('salesmen_3rd_bonus', 4, 'Pdec', 2),
    ('salesmen_4th_bonus', 4, 'Pdec', 2)
    ]
    salesmen_len = 83
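
The same tuples could probably be produced with a regular expression, as the post suggests. What follows is only a sketch: `FIELD_RE` and `parse_field` are hypothetical names that do not appear in the posted program, and the pattern assumes the layout of the sample input (a six-digit sequence number, a two-digit level, the word `PIC`, a single `X(n)` or `9(n)` group, an optional `V99` part and an optional `COMP`). Sizing a non-COMP field as one byte per digit, including digits after the `V`, is also an assumption.

```python
import re

# Hypothetical pattern for one elementary field line of the sample input.
FIELD_RE = re.compile(
    r"""^\d{6}\s+                   # sequence number in columns 1-6
        (?P<level>\d{2})\s+         # level number, e.g. 05
        (?P<name>[A-Z0-9-]+)\s+     # field name
        PIC\s+
        (?P<sign>S?)                # optional sign
        (?P<kind>[X9])\((?P<count>\d+)\)
        (?:V(?P<frac>9+))?          # implied decimal places, e.g. V99
        (?P<comp>\s+COMP)?          # optional usage
        \s*\.\s*$""",
    re.VERBOSE,
)


def parse_field(line):
    """Return (name, size, type, dec) for a PIC line, or None if it does not match."""
    m = FIELD_RE.match(line)
    if m is None:
        return None
    name = m.group("name").replace("-", "_").lower()
    count = int(m.group("count"))
    dec = len(m.group("frac") or "")
    if m.group("comp"):
        # Same size rule as the posted program: (digits // 2) + 1 bytes.
        return (name, (count + dec) // 2 + 1, "Pdec", dec)
    kind = "Xstr" if m.group("kind") == "X" else "Xint"
    # Assumption: without COMP, every character or digit takes one byte.
    return (name, count + dec, kind, dec)
```

Lines that are not elementary `PIC` fields, such as the `FD` and `01` lines, simply return `None`, so the caller can skip them.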

    If you find this code useful please feel free to use any or all of it
    at your own risk.

    Thanks
    Len S
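
For the planned SQL output, one possible starting point is sketched here. `create_table_sql` is a hypothetical helper, and the type mapping is an assumption rather than anything the post specifies: `Xstr` becomes `CHAR(n)`, `Xint` becomes `DECIMAL(n,0)`, and a `Pdec` field of `size` bytes is taken to hold at most `2 * size - 1` digits.

```python
def create_table_sql(table, fields):
    """Build a CREATE TABLE statement from (name, size, type, dec) tuples."""
    columns = []
    for name, size, kind, dec in fields:
        if kind == "Xstr":
            sql_type = "CHAR(%d)" % size
        elif kind == "Xint":
            sql_type = "DECIMAL(%d,0)" % size
        else:
            # Assumption: a packed field of `size` bytes holds 2*size - 1 digits.
            sql_type = "DECIMAL(%d,%d)" % (2 * size - 1, dec)
        columns.append("  %s %s" % (name, sql_type))
    return "CREATE TABLE %s (\n%s\n);" % (table, ",\n".join(columns))
```

Fed the `salesmen` list from the generated file, this would emit one column per tuple; the column types would still need checking against the real data.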
  • Mark Tolonen

    #2
    Re: Newbie code review of parsing program Please


    "len" <lsumnler@gmail.com> wrote in message
    news:fc3ef718-edc4-4892-8418-3eeff0975edc@u18g2000pro.googlegroups.com...
    > [...]

    You might want to check out the pyparsing library.

    -Mark

    • len

      #3
      Re: Newbie code review of parsing program Please

      On Nov 16, 12:40 pm, "Mark Tolonen" <M8R-yft...@mailinator.com> wrote:
      > "len" <lsumn...@gmail.com> wrote in message
      > news:fc3ef718-edc4-4892-8418-3eeff0975edc@u18g2000pro.googlegroups.com...
      > > [...]
      >
      > You might want to check out the pyparsing library.
      >
      > -Mark

      Thanks Mark, I will check it out right now.

      Len

      • Steve Holden

        #4
        Re: Newbie code review of parsing program Please

        Mark Tolonen wrote:
        > "len" <lsumnler@gmail.com> wrote in message
        > news:fc3ef718-edc4-4892-8418-3eeff0975edc@u18g2000pro.googlegroups.com...
        > [...]
        >
        > You might want to check out the pyparsing library.

        And you might want to trim your messages to avoid quoting irrelevant
        stuff. This is not directed personally at Mark, but at all readers.

        Loads of us do it, and I wish we'd stop it. It's poor netiquette because
        it forces people to skip past stuff that isn't relevant to the point
        being made. It's also a global waste of bandwidth and storage space,
        though that's less important than it used to be.

        regards
        Steve
        --
        Steve Holden +1 571 484 6266 +1 800 494 3119
        Holden Web LLC http://www.holdenweb.com/

        • Lawrence D'Oliveiro

          #5
          Re: Newbie code review of parsing program Please

          len wrote:
          >     if fieldline.count('COMP.') > 0:

          I take it you're only handling a particular subset of COBOL constructs: thus, "COMP" is never "COMPUTATIONAL" or "USAGE IS COMPUTATIONAL", and it always occurs just before the full stop (can't remember enough COBOL syntax to be sure if anything else can go afterwards).

          >     elif newline[0].isdigit() and 'PIC' in newline:

          Similarly, "PIC" is never "PICTURE" or "PICTURE IS".

          Aargh, I think I have to stop. I'm remembering more than I ever wanted to about COBOL. Must ... rip ... brain ... out ...
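
If the longer spellings did turn up, a single regular expression could accept them. This is a hedged sketch only: `CLAUSE_RE` and `pic_and_usage` are made-up names, and the pattern covers just the variants named in this post (`PIC`, `PICTURE`, `PICTURE IS`, and `COMP` or `COMPUTATIONAL` with an optional `USAGE` or `USAGE IS` in front), assuming the usage comes right before the closing full stop.

```python
import re

# Hypothetical pattern: PIC clause, then an optional COMP usage, then the full stop.
CLAUSE_RE = re.compile(
    r"\bPIC(?:TURE)?(?:\s+IS)?\s+"
    r"(?P<picture>\S+?)"
    r"(?P<usage>\s+(?:USAGE\s+(?:IS\s+)?)?COMP(?:UTATIONAL)?)?"
    r"\s*\.\s*$"
)


def pic_and_usage(clause):
    """Return (picture string, is_comp) for a field line, or None without a PIC clause."""
    m = CLAUSE_RE.search(clause)
    if m is None:
        return None
    return m.group("picture"), m.group("usage") is not None
```

Because the usage is only looked for after the picture string, a field name that happens to contain `COMP` is not mistaken for a usage clause.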

          • Mark Tolonen

            #6
            Re: Newbie code review of parsing program Please


            "Steve Holden" <steve@holdenweb.com> wrote in message
            news:mailman.4099.1226863455.3487.python-list@python.org...
            > Mark Tolonen wrote:
            >> "len" <lsumnler@gmail.com> wrote in message
            >> news:fc3ef718-edc4-4892-8418-3eeff0975edc@u18g2000pro.googlegroups.com...
            >> [...]
            >>
            >> You might want to check out the pyparsing library.
            >
            > And you might want to trim your messages to avoid quoting irrelevant
            > stuff. This is not directed personally at Mark, but at all readers.
            >
            > Loads of us do it, and I wish we'd stop it. It's poor netiquette because
            > it forces people to skip past stuff that isn't relevant to the point
            > being made. It's also a global waste of bandwidth and storage space,
            > though that's less important than it used to be.

            Point taken...or I could top post ;^)

            -Mark

            • Lawrence D'Oliveiro

              #7
              Re: Newbie code review of parsing program Please

              Mark Tolonen wrote:
              > Point taken...or I could top post ;^)

              A: A Rolls seats six.
              Q: What's the saddest thing about seeing a Rolls with five top-posters in it going over a cliff?

              • John Machin

                #8
                Re: Newbie code review of parsing program Please

                On Nov 17, 7:11 pm, Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> wrote:
                > Mark Tolonen wrote:
                > > Point taken...or I could top post ;^)
                >
                > A: A Rolls seats six.
                > Q: What's the saddest thing about seeing a Rolls with five top-posters in it going over a cliff?

                +1 but you forgot the boot & the roof rack AND if it was a really old
                one there'd be space for a few on the running boards (attached like
                the Norwegian Blue parrot)

                • Paul McGuire

                  #9
                  Re: Newbie code review of parsing program Please

                  On Nov 16, 12:53 pm, len <lsumn...@gmail.com> wrote:
                  > On Nov 16, 12:40 pm, "Mark Tolonen" <M8R-yft...@mailinator.com> wrote:
                  > > You might want to check out the pyparsing library.
                  > >
                  > > -Mark
                  >
                  > Thanks Mark, I will check it out right now.
                  >
                  > Len

                  Len -

                  Here is a rough pyparsing starter for your problem:

                  from pyparsing import *

                  COMP = Optional("USAGE IS") + oneOf("COMP COMPUTATIONAL")
                  PIC = oneOf("PIC PICTURE") + Optional("IS")
                  PERIOD,LPAREN,RPAREN = map(Suppress,".()")

                  ident = Word(alphanums.upper()+"_-")
                  integer = Word(nums).setParseAction(lambda t:int(t[0]))
                  lineNum = Suppress(Optional(LineEnd()) + LineStart() + Word(nums))

                  rep = LPAREN + integer + RPAREN
                  repchars = "X" + rep
                  repchars.setParseAction(lambda tokens: ['X']*tokens[1])
                  strdecl = Combine(OneOrMore(repchars | "X"))

                  SIGN = Optional("S")
                  repdigits = "9" + rep
                  repdigits.setParseAction(lambda tokens: ['9']*tokens[1])
                  intdecl = SIGN("sign") + Combine(OneOrMore(repdigits | "9"))("intpart")
                  realdecl = SIGN("sign") + \
                                  Combine(OneOrMore(repdigits | "9"))("intpart") + "V" + \
                                  Combine(OneOrMore("9" + rep | "9"))("realpart")

                  type = Group((strdecl | realdecl | intdecl) +
                                  Optional(COMP("COMP")))

                  fieldDecl = lineNum + "05" + ident("name") + \
                                  PIC + type("type") + PERIOD
                  structDecl = lineNum + "01" + ident("name") + PERIOD + \
                                  OneOrMore(Group(fieldDecl))("fields")

                  It prints out:

                  SALESMEN-RECORD
                     SALESMEN-NO ['999']
                     SALESMEN-NAME ['XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX']
                     SALESMEN-TERRITORY ['XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX']
                     SALESMEN-QUOTA ['S', '9999999', 'COMP']
                     SALESMEN-1ST-BONUS ['S', '99999', 'V', '99', 'COMP']
                     SALESMEN-2ND-BONUS ['S', '99999', 'V', '99', 'COMP']
                     SALESMEN-3RD-BONUS ['S', '99999', 'V', '99', 'COMP']
                     SALESMEN-4TH-BONUS ['S', '99999', 'V', '99', 'COMP']

                  I too have some dim, dark, memories of COBOL. I seem to recall having
                  to infer from the number of digits in an integer or real what size the
                  number would be. I don't have that logic implemented, but here is an
                  extension to the above program, which shows you where you could put
                  this kind of type inference logic (insert this code before the call to
                  searchString):

                  class TypeDefn(object):
                      @staticmethod
                      def intType(tokens):
                          self = TypeDefn()
                          self.str = "int(%d)" % (len(tokens.intpart),)
                          self.isSigned = bool(tokens.sign)
                          return self
                      @staticmethod
                      def realType(tokens):
                          self = TypeDefn()
                          self.str = "real(%d.%d)" % (len(tokens.intpart),
                                                      len(tokens.realpart))
                          self.isSigned = bool(tokens.sign)
                          return self
                      @staticmethod
                      def charType(tokens):
                          self = TypeDefn()
                          self.str = "char(%d)" % len(tokens)
                          self.isSigned = False
                          self.isComp = False
                          return self
                      def __repr__(self):
                          return ("+-" if self.isSigned else "") + self.str
                  intdecl.setParseAction(TypeDefn.intType)
                  realdecl.setParseAction(TypeDefn.realType)
                  strdecl.setParseAction(TypeDefn.charType)

                  This prints:

                  SALESMEN-RECORD
                     SALESMEN-NO [int(3)]
                     SALESMEN-NAME [char(1)]
                     SALESMEN-TERRITORY [char(1)]
                     SALESMEN-QUOTA [+-int(7), 'COMP']
                     SALESMEN-1ST-BONUS [+-real(5.2), 'COMP']
                     SALESMEN-2ND-BONUS [+-real(5.2), 'COMP']
                     SALESMEN-3RD-BONUS [+-real(5.2), 'COMP']
                     SALESMEN-4TH-BONUS [+-real(5.2), 'COMP']
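
The size inference left unimplemented above might look roughly like this. It is a sketch under stated assumptions: `storage_size` is a hypothetical function, the COMP rule is copied from the program in the original post (`digits // 2 + 1` bytes), and a field without COMP is assumed to take one byte per digit. COBOL dialects differ in how they store COMP fields, so the rule should be checked against the compiler that produced the data.

```python
def storage_size(int_digits, frac_digits=0, comp=False):
    """Return the assumed number of bytes a numeric field occupies."""
    digits = int_digits + frac_digits
    if comp:
        # Rule taken from the program in the original post: (digits // 2) + 1.
        return digits // 2 + 1
    # Assumption: without COMP, each digit occupies one byte.
    return digits
```

A parse action such as `TypeDefn.realType` could call it with `len(tokens.intpart)` and `len(tokens.realpart)` to attach a byte size to each field.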

                  You can post more questions about pyparsing on the Discussion tab of
                  the pyparsing wiki home page.

                  Best of luck!
                  -- Paul

                  • len

                    #10
                    Re: Newbie code review of parsing program Please

                    On Nov 16, 9:57 pm, Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> wrote:
                    > len wrote:
                    > >     if fieldline.count('COMP.') > 0:
                    >
                    > I take it you're only handling a particular subset of COBOL constructs: thus, "COMP" is never "COMPUTATIONAL" or "USAGE IS COMPUTATIONAL", and it always occurs just before the full stop (can't remember enough COBOL syntax to be sure if anything else can go afterwards).
                    >
                    > >     elif newline[0].isdigit() and 'PIC' in newline:
                    >
                    > Similarly, "PIC" is never "PICTURE" or "PICTURE IS".
                    >
                    > Aargh, I think I have to stop. I'm remembering more than I ever wanted to about COBOL. Must ... rip ... brain ... out ...

                    Most of the COBOL code originally comes from packages and is
                    relatively consistent.

                    Thanks
                    Len

                    • len

                      #11
                      Re: Newbie code review of parsing program Please

                      Thanks Paul

                      I will be going over your code today. I started looking at pyparsing
                      last night, but it got too late and my brain started to fog over.
                      I would really like to thank you for taking the time to provide me
                      with the code sample; I'm sure it will really help. Again, thank you
                      very much.

                      Len

                      On Nov 17, 8:01 am, Paul McGuire <pt...@austin.rr.com> wrote:
                      > [...]
