parser recommendation

  • Filipe Fernandes

    parser recommendation

    I have a project that uses a proprietary format and I've been using
    regex to extract information from it. I haven't hit any roadblocks
    yet, but I'd like to use a parsing library rather than maintain my own
    code base of complicated regexes. I've been intrigued by the parsers
    available in python, which may add some much needed flexibility.

    I've briefly looked at PLY and pyparsing. There are several others,
    but too many to enumerate. My understanding is that PLY (although
    more difficult to use) has much more flexibility than pyparsing. I'm
    basically looking to make an informed choice. Not just for this
    project, but for the long haul. I'm not afraid of using a difficult
    (to use or learn) parser either if it buys me something like
    portability (with other languages) or flexibility.

    I've been to a few websites that enumerate the parsers, but they were
    not all that helpful when it came to comparisons...




    I'm not looking to start a flame war... I'd just like some honest opinions.. ;)

    thanks,
    filipe
  • Paul McGuire

    #2
    Re: parser recommendation

    On Jun 3, 8:43 am, "Filipe Fernandes" <fernandes...@gmail.com> wrote:
    > I've briefly looked at PLY and pyparsing.  There are several others,
    > but too many to enumerate.  My understanding is that PLY (although
    > more difficult to use) has much more flexibility than pyparsing.  I'm
    > basically looking to make an informed choice.  Not just for this
    > project, but for the long haul.  I'm not afraid of using a difficult
    > (to use or learn) parser either if it buys me something like
    > portability (with other languages) or flexibility).

    Short answer: try them both. Learning curve on pyparsing is about a
    day, maybe two. And if you are already familiar with regex, PLY
    should not seem too much of a stretch. PLY parsers will probably be
    faster running than pyparsing parsers, but I think pyparsing parsers
    will be quicker to work up and get running.

    Longer answer: PLY is of the lex/yacc school of parsing libraries
    (PLY=Python Lex/Yacc). Use regular expressions to define terminal
    token specifications (a la lex). Then use "t_XXX" and "p_XXX" methods
    to build up the parsing logic - docstrings in these methods capture
    regex or BNF grammar definitions. In contrast, pyparsing is of the
    combinator school of parsers. Within your Python code, you compose
    your parser using '+' and '|' operations, building up the parser using
    pyparsing classes such as Literal, Word, OneOrMore, Group, etc. Also,
    pyparsing is 100% Python, so you won't have any portability issues
    (don't know about PLY).
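The lex half of that description ("regular expressions define the terminal tokens") can be sketched with nothing but the standard library. This is only an illustration of the idea, not PLY's API: `TOKEN_SPEC`, `MASTER`, and `tokenize` are hypothetical names, and the token set is an assumption modeled on the small function-call language discussed in this thread.

```python
import re

# Hypothetical token table: each entry pairs a token name with the regular
# expression that recognizes it (the "lex" half of a lex/yacc design).
TOKEN_SPEC = [
    ("NAME", r"[A-Za-z][A-Za-z0-9]*"),
    ("LPAR", r"\("),
    ("RPAR", r"\)"),
    ("COMMA", r","),
    ("COMMENT", r"#[^\n]*"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (kind, value) pairs, dropping whitespace and comments."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup not in ("SKIP", "COMMENT"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("bbb(ccc())  # a comment")
```

A yacc-style tool would then consume this token stream with grammar rules; a combinator library folds both steps into the expression objects you compose in Python.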

    Here is a link to a page with a PLY and pyparsing example (although
    not strictly a side-by-side comparison): http://www.rexx.com/~dkuhlman/python_201/.
    For comparison, here is a pyparsing version of the PLY parser on that
    page (this is a recursive grammar, not necessarily a good beginner's
    example for pyparsing):
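    ===============
    from pyparsing import (Word, alphas, alphanums, Forward, Literal, Group,
                           Optional, OneOrMore, restOfLine)

    # a term is an identifier: a letter followed by letters or digits
    term = Word(alphas, alphanums)

    # forward declarations, since the grammar is recursive
    func_call = Forward()
    func_call_list = Forward()
    comma = Literal(",").suppress()
    func_call_list << Group(func_call + Optional(comma + func_call_list))

    lpar = Literal("(").suppress()
    rpar = Literal(")").suppress()
    func_call << Group(term + lpar +
                       Optional(func_call_list, default=[""]) + rpar)
    command = func_call

    prog = OneOrMore(command)

    # skip '#' comments anywhere in the input
    comment = "#" + restOfLine
    prog.ignore(comment)
    ===============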
    With the data set given at Dave Kuhlman's web page, here is the
    output:
    [['aaa', ['']],
    ['bbb', [['ccc', ['']]]],
    ['ddd',
    [['eee', ['']],
    [['fff', [['ggg', ['']], [['hhh', ['']], [['iii', ['']]]]]]]]]]

    Pyparsing makes some judicious assumptions about how you will want to
    parse, most significant being that whitespace can be ignored during
    parsing (this *can* be overridden in the parser definition).
    Pyparsing also supports token grouping (for building parse trees),
    parse-time callbacks (called 'parse actions'), and assigning names
    within subexpressions (called 'results names'), which really helps in
    working with the tokens returned from the parsing process.

    If you learn both, you may find that pyparsing is a good way to
    quickly prototype a particular parsing problem, which you can then
    convert to PLY for performance if necessary. The pyparsing prototype
    will be an efficient way to work out what the grammar "kinks" are, so
    that when you get around to PLY-ifying it, you already have a clear
    picture of what the parser needs to do.

    But, really, "more flexible"? I wouldn't really say that was the big
    difference between the two.

    Cheers,
    -- Paul

    (More pyparsing info at http://pyparsing.wikispaces.com.)


    • Filipe Fernandes

      #3
      Re: parser recommendation

      On Tue, Jun 3, 2008 at 10:41 AM, Paul McGuire <ptmcg@austin.rr.com> wrote:
      > If you learn both, you may find that pyparsing is a good way to
      > quickly prototype a particular parsing problem, which you can then
      > convert to PLY for performance if necessary. The pyparsing prototype
      > will be an efficient way to work out what the grammar "kinks" are, so
      > that when you get around to PLY-ifying it, you already have a clear
      > picture of what the parser needs to do.

      Thanks (both Paul and Kay) for responding. I'm still looking at Trail
      in EasyExtend, and pyparsing is very nicely object-oriented, but PLY
      does seem to have the speed advantage, so I'm leaning towards PLY.

      But I do have more questions... when reading the ply.py header (in
      2.5) I found the following paragraph...

      # The current implementation is only somewhat object-oriented. The
      # LR parser itself is defined in terms of an object (which allows multiple
      # parsers to co-exist). However, most of the variables used during table
      # construction are defined in terms of global variables. Users shouldn't
      # notice unless they are trying to define multiple parsers at the same
      # time using threads (in which case they should have their head examined).

      Now, I'm invariably going to have to use threads... I'm not exactly
      sure what the author is alluding to, but my guess is that to overcome
      this limitation I need to acquire a thread lock first before
      "defining/creating" a parser object before I can use it?

      Has anyone run into this issue? This would definitely be a
      showstopper (for PLY anyway), if I couldn't create multiple parsers
      because of threads. I'm not saying I need more than one, I'm just not
      comfortable with that limitation.

      I have a feeling I'm just misunderstanding, since it doesn't seem to
      hold you back from creating multiple parsers under a single process.

      filipe


      • Kay Schluehr

        #4
        Re: parser recommendation

        On 3 Jun., 19:34, "Filipe Fernandes" <fernandes...@gmail.com> wrote:
        > # The current implementation is only somewhat object-oriented. The
        > # LR parser itself is defined in terms of an object (which allows multiple
        > # parsers to co-exist). However, most of the variables used during table
        > # construction are defined in terms of global variables. Users shouldn't
        > # notice unless they are trying to define multiple parsers at the same
        > # time using threads (in which case they should have their head examined).
        >
        > Now, I'm invariably going to have to use threads... I'm not exactly
        > sure what the author is alluding to, but my guess is that to overcome
        > this limitation I need to acquire a thread lock first before
        > "defining/creating" a parser object before I can use it?

        Nope. It just says that the parser-table construction itself relies on
        global state. But you will most likely build your parser offline in a
        separate run.
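One way to read that advice: do the construction step once, on a single thread, before any workers start, and let the workers only read the finished result. A minimal sketch of that pattern follows. It is not PLY code; `build_tables`, `_scratch`, and the rule strings are hypothetical stand-ins for a table builder that leans on module-level state.

```python
import threading

# Module-level scratch state, standing in for the globals a table builder
# might use while it works (hypothetical, for illustration only).
_scratch = []


def build_tables(rules):
    """Hypothetical construction step that is not safe to run concurrently."""
    _scratch.clear()
    for name, expansion in rules:
        _scratch.append((name, tuple(expansion.split())))
    return dict(_scratch)


# Build once, on the main thread, before any worker exists.
TABLES = build_tables([("call", "NAME LPAR args RPAR"), ("args", "call")])

results = []
results_lock = threading.Lock()


def worker(rule_name):
    # Workers only read the finished tables; nothing here touches _scratch.
    expansion = TABLES[rule_name]
    with results_lock:
        results.append((rule_name, len(expansion)))


threads = [threading.Thread(target=worker, args=(name,)) for name in ("call", "args")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If construction really had to happen lazily inside a worker, wrapping the `build_tables` call in a lock would be the alternative, which is roughly the guess made in the question.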


        • Paul McGuire

          #5
          Re: parser recommendation

          On Jun 3, 12:34 pm, "Filipe Fernandes" <fernandes...@gmail.com> wrote:
          > On Tue, Jun 3, 2008 at 10:41 AM, Paul McGuire <pt...@austin.rr.com> wrote:
          > But I do have more questions... when reading the ply.py header (in
          > 2.5) I found the following paragraph...
          >
          > # The current implementation is only somewhat object-oriented. The
          > # LR parser itself is defined in terms of an object (which allows multiple
          > # parsers to co-exist).  However, most of the variables used during table
          > # construction are defined in terms of global variables.  Users shouldn't
          > # notice unless they are trying to define multiple parsers at the same
          > # time using threads (in which case they should have their head examined).
          >
          > Now, I'm invariably going to have to use threads...  I'm not exactly
          > sure what the author is alluding to, but my guess is that to overcome
          > this limitation I need to acquire a thread lock first before
          > "defining/creating" a parser object before I can use it?
          >
          > Has anyone ran into this issue....?  This would definitely be a
          > showstopper (for PLY anyway), if I couldn't create multiple parsers
          > because of threads.  I'm not saying I need more than one, I'm just not
          > comfortable with that limitation.
          >
          > I have a feeling I'm just misunderstanding since it doesn't seem to
          > hold you back from creating multiple parsers under a single process.
          >
          > filipe

          You can use pyparsing from any thread, and you can create multiple
          parsers each running in a separate thread, but you cannot concurrently
          use one parser from two different threads. Some users work around
          this by instantiating a separate parser per thread using pickle to
          quickly construct the parser at thread start time.

          -- Paul
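The per-thread workaround described above can be sketched roughly as follows. To keep the example standard-library only, `TinyParser` is a hypothetical stand-in for a stateful parser object (it is not a pyparsing class), and `TEMPLATE`, `get_parser`, and `worker` are illustrative names. The assumption is simply that the real parser object can be pickled.

```python
import pickle
import re
import threading


class TinyParser:
    """Hypothetical stand-in for a parser object that keeps per-parse state."""

    def __init__(self, pattern):
        self.pattern = re.compile(pattern)
        self.last_input = None  # mutable state: the reason not to share one instance

    def parse(self, text):
        self.last_input = text
        return self.pattern.findall(text)


# Build the parser once and keep a pickled template of it.
TEMPLATE = pickle.dumps(TinyParser(r"[A-Za-z]+"))

_local = threading.local()


def get_parser():
    """Return this thread's private copy, unpickling it on first use."""
    if not hasattr(_local, "parser"):
        _local.parser = pickle.loads(TEMPLATE)
    return _local.parser


results = {}


def worker(key, text):
    # Each thread parses with its own copy, so no parser is shared across threads.
    results[key] = get_parser().parse(text)


threads = [
    threading.Thread(target=worker, args=("a", "foo(bar)")),
    threading.Thread(target=worker, args=("b", "baz()")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Unpickling a prebuilt object is usually cheaper than rebuilding the grammar from scratch in every thread, which is the point of the workaround.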



          • Filipe Fernandes

            #6
            Re: parser recommendation

            On Jun 3, 12:34 pm, "Filipe Fernandes" <fernandes...@gmail.com> wrote:
            >On Tue, Jun 3, 2008 at 10:41 AM, Paul McGuire <pt...@austin.rr.com> wrote:
            >But I do have more questions... when reading the ply.py header (in
            >2.5) I found the following paragraph...
            >>
            ># The current implementation is only somewhat object-oriented. The
            ># LR parser itself is defined in terms of an object (which allows multiple
            ># parsers to co-exist). However, most of the variables used during table
            ># construction are defined in terms of global variables. Users shouldn't
            ># notice unless they are trying to define multiple parsers at the same
            ># time using threads (in which case they should have their head examined).
            >>
            >Now, I'm invariably going to have to use threads... I'm not exactly
            >sure what the author is alluding to, but my guess is that to overcome
            >this limitation I need to acquire a thread lock first before
            >"defining/creating" a parser object before I can use it?
            >>
            >Has anyone ran into this issue....? This would definitely be a
            >showstopper (for PLY anyway), if I couldn't create multiple parsers
            >because of threads. I'm not saying I need more than one, I'm just not
            >comfortable with that limitation.
            >>
            On Tue, Jun 3, 2008 at 1:53 PM, Kay Schluehr <kay.schluehr@gmx.net> wrote:
            > Nope. It just says that the parser-table construction itself relies on
            > global state. But you will most likely build your parser offline in a
            > separate run.

            Thanks, Kay, for the context. I misunderstood completely, but your
            last sentence, coupled with a few running examples, cleared things
            right up...

            On Tue, Jun 3, 2008 at 4:36 PM, Paul McGuire <ptmcg@austin.rr.com> wrote:
            > You can use pyparsing from any thread, and you can create multiple
            > parsers each running in a separate thread, but you cannot concurrently
            > use one parser from two different threads. Some users work around
            > this by instantiating a separate parser per thread using pickle to
            > quickly construct the parser at thread start time.

            I didn't know that pyparsing wasn't thread safe. I kind of just
            assumed it was because of its OO approach. Thanks for the workaround. I
            haven't given up on pyparsing, although I'm now heavily leaning
            towards PLY as an end solution since lex and yacc parsing is available
            on other platforms as well.

            Thanks, Kay and Paul, for the advice... I'm still using the first two I
            started looking at, but I'm much more confident in the choices made.

            filipe


            • rurpy@yahoo.com

              #7
              Re: parser recommendation

              On Jun 3, 2:55 pm, "Filipe Fernandes" <fernandes...@gmail.com> wrote:
              > I haven't given up on pyparsing, although I'm now heavily leaning
              > towards PLY as an end solution since lex and yacc parsing is available
              > on other platforms as well.

              Keep in mind that PLY's "compatibility" with YACC is functional,
              not syntactical. That is, you cannot take a YACC file, replace
              the actions with Python actions and feed it to PLY.

              It's a shame that the Python world has no truly YACC compatible
              parser like YAPP in the Perl world.


              • Alan Isaac

                #8
                Re: parser recommendation

                One other possibility:
                SimpleParse (for speed).
                <URL:http://simpleparse.sourceforge.net/>
                It is very nice.
                Alan Isaac


                • Kay Schluehr

                  #9
                  Re: parser recommendation

                  On 6 Jun., 01:58, Alan Isaac <ais...@american.edu> wrote:
                  > One other possibility:
                  > SimpleParse (for speed).
                  > <URL:http://simpleparse.sourceforge.net/>
                  > It is very nice.
                  > Alan Isaac

                  How does SimpleParse manage left-factorings, left-recursion and other
                  ambiguities?

                  For example, according to [1] there are two non-terminals

                  UNICODEESCAPEDCHAR_16

                  and

                  UNICODEESCAPEDCHAR_32

                  with an equal initial section of 4 tokens. How does SimpleParse detect
                  when it has to use the second production?

                  [1] http://simpleparse.sourceforge.net/s..._grammars.html
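The common-prefix concern behind that question can be illustrated with ordered alternation in Python's `re` module. This is only a sketch of the general problem and says nothing about how SimpleParse itself behaves: `SHORT` and `LONG` are hypothetical productions invented for the example (an escape followed by 4 or by 8 hex digits), not rules copied from any SimpleParse grammar.

```python
import re

# Two hypothetical productions that share an initial section: a backslash-u
# escape followed by either 4 or 8 hex digits.
SHORT = r"\\u[0-9A-Fa-f]{4}"
LONG = r"\\u[0-9A-Fa-f]{8}"

text = r"\u0001F600"  # backslash, 'u', then 8 hex digits

# Ordered choice: the first alternative that matches wins.
short_first = re.match(f"(?:{SHORT})|(?:{LONG})", text)
long_first = re.match(f"(?:{LONG})|(?:{SHORT})", text)

# Left-factored form: match the shared prefix once, then an optional tail.
FACTORED = r"\\u[0-9A-Fa-f]{4}(?:[0-9A-Fa-f]{4})?"
factored = re.match(FACTORED, text)

consumed_short_first = short_first.group()  # stops after 4 hex digits
consumed_long_first = long_first.group()    # consumes all 8 hex digits
consumed_factored = factored.group()        # consumes all 8 hex digits
```

With the shorter alternative listed first, the match stops after four digits and leaves the rest of the input unconsumed; listing the longer alternative first, or left-factoring the shared prefix, avoids that.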
