Pyparsing Question

  • Ant

    Pyparsing Question

    Hi all,

    I have a question on PyParsing. I am trying to create a parser for a
    hierarchical todo list format, but have hit a stumbling block. I have
    parsers for the header of the list (title and description), and the body
    (recursive descent on todo items).

    Individually they are working fine, combined they throw an exception.
    The code follows:

    #!/usr/bin/python
    # parser.py
    import pyparsing as pp

    def grammar():
        underline = pp.Word("=").suppress()
        dotnum = pp.Combine(pp.Word(pp.nums) + ".")
        textline = pp.Combine(pp.Group(pp.Word(pp.alphas, pp.printables) +
                                       pp.restOfLine))
        number = pp.Group(pp.OneOrMore(dotnum))

        headtitle = textline
        headdescription = pp.ZeroOrMore(textline)
        head = pp.Group(headtitle + underline + headdescription)

        taskname = pp.OneOrMore(dotnum) + textline
        task = pp.Forward()
        subtask = pp.Group(dotnum + task)
        task << (taskname + pp.ZeroOrMore(subtask))
        maintask = pp.Group(pp.LineStart() + task)

        parser = pp.OneOrMore(maintask)

        return head, parser

    text = """


    My Title
    ========

    Text on a longer line of several words.
    More test
    and more.

    """

    text2 = """

    1. Task 1
        1.1. Subtask
            1.1.1. More tasks.
        1.2. Another subtask
    2. Task 2
        2.1. Subtask again"""

    head, parser = grammar()

    print head.parseString(text)
    print parser.parseString(text2)

    comb = head + pp.OneOrMore(pp.LineStart() + pp.restOfLine) + parser
    print comb.parseString(text + text2)

    #=====================================================================

    Now the first two print statements output the parse tree as I would
    expect, but the combined parser fails with an exception:

    Traceback (most recent call last):
      File "parser.py", line 50, in ?
        print comb.parseString(text + text2)
    ..
    .. [Stacktrace snipped]
    ..
        raise exc
    pyparsing.ParseException: Expected start of line (at char 81), (line:9, col:1)

    Any help appreciated!

    Cheers,

    --
    Ant.
  • Paul McGuire

    #2
    Re: Pyparsing Question

    On May 16, 6:43 am, Ant <ant...@gmail.com> wrote:
    > Hi all,
    >
    > I have a question on PyParsing. I am trying to create a parser for a
    > hierarchical todo list format, but have hit a stumbling block. I have
    > parsers for the header of the list (title and description), and the body
    > (recursive descent on todo items).

    LineStart *really* wants to be parsed at the beginning of a line.
    Your textline reads up to but not including the LineEnd. Try making
    these changes.

    1. Change textline to:

        textline = pp.Combine(
            pp.Group(pp.Word(pp.alphas, pp.printables) + pp.restOfLine)) + \
            pp.LineEnd().suppress()

    2. Change comb to:

    comb = head + parser

    With these changes, my version of your code runs ok.
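
The behaviour Paul describes can be checked with the standard library alone. This is a minimal sketch, not pyparsing itself: it assumes restOfLine behaves like the regular expression `.*`, which stops before the newline because `.` does not match it. The sample string and variable names are made up for illustration.

```python
import re

sample = "My Title\n========\n"

# ".*" stops just before the newline, much like restOfLine.
rest_of_line = re.compile(r".*")
m = rest_of_line.match(sample)
matched = m.group()          # the text of the first line
next_char = sample[m.end()]  # the line ending is still unconsumed

# Consuming the newline explicitly (the role LineEnd().suppress() plays
# in the suggested fix) leaves the position at the start of the next line.
with_eol = re.compile(r".*\n")
m2 = with_eol.match(sample)
at_line_start = sample[m2.end() - 1] == "\n"
following = sample[m2.end():m2.end() + 8]

print(repr(matched), repr(next_char), at_line_start, repr(following))
```

With the newline left unconsumed, whatever comes next is still sitting at the end of the current line, which is why a following LineStart() has nothing to match.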

    -- Paul


    • castironpi

      #3
      Re: Pyparsing Question

      I hold that the + operator should be overloaded for strings to include
      newlines. Python 3.0 print has parentheses around it; wouldn't it
      make sense to take them out?


      • Ant

        #4
        Re: Pyparsing Question

        Hi Paul,

        > LineStart *really* wants to be parsed at the beginning of a line.
        > Your textline reads up to but not including the LineEnd. Try making
        > these changes.
        >
        > 1. Change textline to:
        >
        >     textline = pp.Combine(
        >         pp.Group(pp.Word(pp.alphas, pp.printables) + pp.restOfLine)) + \
        >         pp.LineEnd().suppress()

        Ah - so restOfLine excludes the actual line ending, does it?

        > 2. Change comb to:
        >
        >     comb = head + parser

        Yes - I'd got this originally. I added the garbage to try to fix the
        problem and forgot to take it back out! Thanks for the advice - it works
        fine now, and will provide a base for extending the list format.

        Thanks,

        Ant...


        • castironpi

          #5
          Re: Pyparsing Question

          There is a possibility that spirals can come from doubles, which could
          be non-trivially useful, in par. in the Java library. I won't see a
          cent. Can anyone start a thread to spin letters, and see what the
          team looks like? I want to animate spinners. It's across
          dimensions. (per something.) Swipe a cross in a fluid. I'm draw
          crosses. Animate cubes to draw crosses. I.e. swipe them.


          • James A. Donald

            #6
            scaling problems

            I am just getting into python, and know little about it, and am
            posting to ask on what beaches the salt water crocodiles hang out.

            1. Looks to me that python will not scale to very large programs,
            partly because of the lack of static typing, but mostly because there
            is no distinction between creating a new variable and utilizing an
            existing variable, so the interpreter fails to catch typos and name
            collisions. I am inclined to suspect that when a successful small
            python program turns into a large python program, it rapidly reaches
            ninety percent complete, and remains ninety percent complete forever.

            2. It is not clear to me how a python web application scales. Python
            is inherently single threaded, so one will need lots of python
            processes on lots of computers, with the database software handling
            parallel accesses to the same or related data. One could organize it
            as one python program for each url, and one python process for each
            http request, but that involves a lot of overhead starting up and
            shutting down python processes. Or one could organize it as one
            python program for each url, but if one gets a lot of http requests
            for one url, a small number of python processes will each sequentially
            handle a large number of those requests. What I am really asking is:
            Are there python web frameworks that scale with hardware and how do
            they handle scaling?

            Please don't read this as "Python sucks, everyone should program in
            machine language expressed as binary numbers". I am just asking where
            the problems are.
            --
            ----------------------
            We have the right to defend ourselves and our property, because
            of the kind of animals that we are. True law derives from this
            right, not from the arbitrary power of the omnipotent state.

            http://www.jim.com/ James A. Donald


            • Reid Priedhorsky

              #7
              Re: scaling problems

              On Tue, 20 May 2008 10:47:50 +1000, James A. Donald wrote:
              > 1. Looks to me that python will not scale to very large programs,
              > partly because of the lack of static typing, but mostly because there
              > is no distinction between creating a new variable and utilizing an
              > existing variable, so the interpreter fails to catch typos and name
              > collisions. I am inclined to suspect that when a successful small
              > python program turns into a large python program, it rapidly reaches
              > ninety percent complete, and remains ninety percent complete forever.

              I find this frustrating too, but not to the extent that I choose a
              different language. pylint helps but it's not as good as a nice, strict
              compiler.

              > 2. It is not clear to me how a python web application scales. Python
              > is inherently single threaded, so one will need lots of python
              > processes on lots of computers, with the database software handling
              > parallel accesses to the same or related data. One could organize it
              > as one python program for each url, and one python process for each
              > http request, but that involves a lot of overhead starting up and
              > shutting down python processes. Or one could organize it as one
              > python program for each url, but if one gets a lot of http requests
              > for one url, a small number of python processes will each sequentially
              > handle a large number of those requests. What I am really asking is:
              > Are there python web frameworks that scale with hardware and how do
              > they handle scaling?

              This sounds like a good match for Apache with mod_python.

              Reid


              • David Stanek

                #8
                Re: scaling problems

                On Mon, May 19, 2008 at 8:47 PM, James A. Donald <jamesd@echeque.com> wrote:
                > I am just getting into python, and know little about it, and am
                > posting to ask on what beaches the salt water crocodiles hang out.
                >
                > 1. Looks to me that python will not scale to very large programs,
                > partly because of the lack of static typing, but mostly because there
                > is no distinction between creating a new variable and utilizing an
                > existing variable, so the interpreter fails to catch typos and name
                > collisions. I am inclined to suspect that when a successful small
                > python program turns into a large python program, it rapidly reaches
                > ninety percent complete, and remains ninety percent complete forever.

                I can assure you that in practice this is not a problem. If you do
                proper unit testing then you will catch many, if not all, of the
                errors that static typing catches. Tools like PyLint, PyFlakes and
                pep8.py will also catch many of those mistakes.

                > 2. It is not clear to me how a python web application scales. Python
                > is inherently single threaded, so one will need lots of python
                > processes on lots of computers, with the database software handling
                > parallel accesses to the same or related data. One could organize it
                > as one python program for each url, and one python process for each
                > http request, but that involves a lot of overhead starting up and
                > shutting down python processes. Or one could organize it as one
                > python program for each url, but if one gets a lot of http requests
                > for one url, a small number of python processes will each sequentially
                > handle a large number of those requests. What I am really asking is:
                > Are there python web frameworks that scale with hardware and how do
                > they handle scaling?

                What is the difference if you have a process with 10 threads or 10
                separate processes running in parallel? Apache is a good example of a
                server that may be configured to use multiple processes to handle
                requests. And from what I hear it scales just fine.

                I think you are looking at the problem wrong. The fundamentals are the
                same between threads and processes. You simply have a pool of workers
                that handle requests. Any process is capable of handling any request.
                The key to scalability is that the processes are persistent and not
                forked for each request.
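
The worker-pool idea can be sketched with the standard concurrent.futures module. This is only an illustration under stated assumptions: handle_request and the request paths are hypothetical, and a thread pool is used so the example runs anywhere. ProcessPoolExecutor offers the same map interface for a pool of processes.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def handle_request(path):
    """Hypothetical handler: report which worker served the request."""
    return path, threading.current_thread().name


paths = ["/page/%d" % i for i in range(20)]

# Three long-lived workers are created once and reused for every request;
# nothing is started or torn down per request.
with ThreadPoolExecutor(max_workers=3, thread_name_prefix="worker") as pool:
    results = list(pool.map(handle_request, paths))

workers_used = {name for _, name in results}
print(len(results), "requests served by", len(workers_used), "workers")
```

Any worker can pick up any request, so adding capacity is a matter of raising the pool size or running more pools on more machines.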

                > Please don't read this as "Python sucks, everyone should program in
                > machine language expressed as binary numbers". I am just asking where
                > the problems are.

                The only real problem I have had with process pools is that sharing
                resources is harder. It is harder to create things like connection
                pools.


                --
                David


                • Ben Finney

                  #9
                  Re: scaling problems

                  James A. Donald <jamesd@echeque.com> writes:
                  > I am just getting into python, and know little about it

                  Welcome to Python, and this forum.

                  > and am posting to ask on what beaches the salt water crocodiles hang
                  > out.

                  Heh. You want to avoid them, or hang out with them? :-)

                  > 1. Looks to me that python will not scale to very large programs,
                  > partly because of the lack of static typing, but mostly because there
                  > is no distinction between creating a new variable and utilizing an
                  > existing variable,

                  This seems quite a non sequitur. How do you see a connection between
                  these properties and "will not scale to large programs"?

                  > so the interpreter fails to catch typos and name collisions.

                  These errors are a small subset of possible errors. If writing a large
                  program, an automated testing suite is essential, and can catch far
                  more errors than the compiler can hope to catch. If you run a static
                  code analyser, you'll be notified of unused names and other simple
                  errors that are often caught by static-declaration compilers.
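
To give a feel for what a static code analyser does, here is a toy check written with the standard ast module. It is a deliberately crude sketch, not how PyLint or PyFlakes work internally: it ignores scoping and simply reports names that are assigned somewhere but never read. The sample source is hypothetical.

```python
import ast

source = '''
def total_price(prices, discount):
    total = sum(prices)
    totl = total - discount
    return total
'''

tree = ast.parse(source)
stored, loaded = set(), set()
for node in ast.walk(tree):
    if isinstance(node, ast.Name):
        if isinstance(node.ctx, ast.Store):
            stored.add(node.id)   # name is being assigned
        elif isinstance(node.ctx, ast.Load):
            loaded.add(node.id)   # name is being read

# Names that are written but never read are likely typos or dead code.
unused = sorted(stored - loaded)
print(unused)
```

The misspelled name shows up as assigned but unused, without the code ever being run.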
                  > I am inclined to suspect that when a successful small python program
                  > turns into a large python program, it rapidly reaches ninety percent
                  > complete, and remains ninety percent complete forever.

                  You may want to look at the Python success stories before suspecting
                  that, <URL:http://www.python.org/about/success/>.

                  > 2. It is not clear to me how a python web application scales.

                  I'll leave this one for others to speak to; I don't have experience
                  with large web applications.

                  --
                  \ "I was gratified to be able to answer promptly and I did. I |
                  `\ said I didn't know." -- Mark Twain, _Life on the Mississippi_ |
                  _o__) |
                  Ben Finney


                  • Carl Banks

                    #10
                    Re: scaling problems

                    On May 19, 8:47 pm, James A. Donald <jam...@echeque.com> wrote:
                    > 1. Looks to me that python will not scale to very large programs,
                    > partly because of the lack of static typing, but mostly because there
                    > is no distinction between creating a new variable and utilizing an
                    > existing variable, so the interpreter fails to catch typos and name
                    > collisions.

                    This factor is scale-neutral. You can expect the number of such bugs
                    to be proportional to the lines of code.

                    It might not scale up well if you engage in poor programming practices
                    (for example, importing lots of unqualified globals with tiny,
                    undescriptive names directly into every module's namespace), but if
                    you do that you have worse problems than accidental name collisions.

                    > I am inclined to suspect that when a successful small
                    > python program turns into a large python program, it rapidly reaches
                    > ninety percent complete, and remains ninety percent complete forever.

                    Unlike most C++/Java/VB/Whatever programs which finish and ship, and
                    are never patched or improved or worked on ever again?

                    > 2. It is not clear to me how a python web application scales. Python
                    > is inherently single threaded,

                    No it isn't.

                    It has some limitations in threading, but many programs make good use
                    of threads nonetheless. In fact for something like a web app Python's
                    threading limitations are relatively unimportant, since they tend to
                    be I/O-bound under heavy load.
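
A small experiment makes the I/O-bound point concrete. In this sketch, fake_io is a hypothetical stand-in for a blocking network or database call, and time.sleep plays the part of waiting on I/O. The delay and thread count are arbitrary.

```python
import threading
import time


def fake_io(delay, results, index):
    """Stand-in for a blocking network or database call."""
    time.sleep(delay)  # sleeping releases the interpreter lock, as blocking I/O does
    results[index] = "done"


delay, count = 0.2, 5
results = [None] * count
threads = [threading.Thread(target=fake_io, args=(delay, results, i))
           for i in range(count)]

start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Run one after another, the calls would take about count * delay seconds.
print("elapsed: %.2f s, sequential estimate: %.2f s" % (elapsed, count * delay))
```

Because the waits overlap, the threaded run finishes in roughly the time of a single call rather than the sum of all of them.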

                    [snip rest]


                    Carl Banks


                    • James A. Donald

                      #11
                      Re: scaling problems

                      >> 1. Looks to me that python will not scale to very large programs,
                      >> partly because of the lack of static typing, but mostly because there
                      >> is no distinction between creating a new variable and utilizing an
                      >> existing variable,

                      Ben Finney
                      > This seems quite a non sequitur. How do you see a connection between
                      > these properties and "will not scale to large programs"?

                      The larger the program, the greater the likelihood of inadvertent name
                      collisions creating rare and irreproducible interactions between
                      different and supposedly independent parts of the program that each
                      work fine on their own, and supposedly cannot possibly interact.

                      > These errors are a small subset of possible errors. If writing a large
                      > program, an automated testing suite is essential, and can catch far
                      > more errors than the compiler can hope to catch. If you run a static
                      > code analyser, you'll be notified of unused names and other simple
                      > errors that are often caught by static-declaration compilers.

                      That is handy, but the larger the program, the bigger the problem with
                      names that are overused, rather than unused.


                      • James A. Donald

                        #12
                        Re: scaling problems

                        >> 2. It is not clear to me how a python web application scales. Python
                        >> is inherently single threaded, so one will need lots of python
                        >> processes on lots of computers, with the database software handling
                        >> parallel accesses to the same or related data. One could organize it
                        >> as one python program for each url, and one python process for each
                        >> http request, but that involves a lot of overhead starting up and
                        >> shutting down python processes. Or one could organize it as one
                        >> python program for each url, but if one gets a lot of http requests
                        >> for one url, a small number of python processes will each sequentially
                        >> handle a large number of those requests. What I am really asking is:
                        >> Are there python web frameworks that scale with hardware and how do
                        >> they handle scaling?

                        Reid Priedhorsky
                        > This sounds like a good match for Apache with mod_python.

                        I would hope that it is, but what I would like to know is how
                        mod_python handles the problem: how do python programs and
                        processes relate to web pages and http requests when one is using
                        mod_python, and what happens when one has quite a lot of web pages and
                        a very large number of http requests?

                        • James A. Donald

                          #13
                          Re: scaling problems

                          On Mon, 19 May 2008 21:04:28 -0400, "David Stanek"
                          <dstanek@dstanek.com> wrote:
                          > What is the difference if you have a process with 10 threads or 10
                          > separate processes running in parallel? Apache is a good example of a
                          > server that may be configured to use multiple processes to handle
                          > requests. And from what I hear it scales just fine.
                          >
                          > I think you are looking at the problem wrong. The fundamentals are the
                          > same between threads and processes.

                          I am not planning to write a web server framework, but to use one.
                          Doubtless a python framework could be written to have satisfactory
                          scaling properties, but what are the scaling properties of the ones
                          that have been written?


                          • Arnaud Delobelle

                            #14
                            Re: scaling problems

                            James A. Donald <jamesd@echeque.com> writes:
                            > Ben Finney
                            > The larger the program, the greater the likelihood of inadvertent name
                            > collisions creating rare and irreproducible interactions between
                            > different and supposedly independent parts of the program that each
                            > work fine on their own, and supposedly cannot possibly interact.
                            >
                            >> These errors are a small subset of possible errors. If writing a large
                            >> program, an automated testing suite is essential, and can catch far
                            >> more errors than the compiler can hope to catch. If you run a static
                            >> code analyser, you'll be notified of unused names and other simple
                            >> errors that are often caught by static-declaration compilers.
                            >
                            > That is handy, but the larger the program, the bigger the problem with
                            > names that are over used, rather than unused.

                            Fortunately for each file that you group functionality in (called a
                            'module'), Python creates a brand new namespace where it puts all the
                            names defined in that file. That makes name collision unlikely,
                            provided that you don't write gigantic modules with plenty of globals
                            in them (which would be very unnatural in Python), and don't use from
                            mymodule import * too liberally.
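
The per-module namespace can be seen in a few lines. This is a minimal sketch: the module names billing and shipping are hypothetical, and the modules are built in memory with types.ModuleType only so the example is self-contained. Normally each would be its own .py file.

```python
import types

# Build two modules in memory; normally each would be a separate file.
billing = types.ModuleType("billing")
shipping = types.ModuleType("shipping")

exec("rate = 2\ndef cost(amount):\n    return amount * rate\n", billing.__dict__)
exec("rate = 5\ndef cost(weight):\n    return weight * rate\n", shipping.__dict__)

# Both modules define "rate" and "cost", yet each function looks up
# "rate" in the namespace of the module it was defined in.
billing_cost = billing.cost(100)
shipping_cost = shipping.cost(3)

# Rebinding the name in one module leaves the other untouched.
billing.rate = 3
print(billing_cost, shipping_cost, billing.cost(100), shipping.rate)
```

The two modules share names without sharing state, so a collision would only arise if both sets of names were pulled into one namespace, for example with `from billing import *` followed by `from shipping import *`.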

                            Why not download a largish project in Python (a web framework for
                            instance, since you have a particular interest in this), study the
                            code and see if your concerns seem founded?

                            Arnaud
                            --
                            La propriete, c'est le vol !
                            - Pierre-Joseph Proudhon


                            • Marc 'BlackJack' Rintsch

                              #15
                              Re: scaling problems

                              On Tue, 20 May 2008 13:57:26 +1000, James A. Donald wrote:
                              > The larger the program, the greater the likelihood of inadvertent name
                              > collisions creating rare and irreproducible interactions between
                              > different and supposedly independent parts of the program that each
                              > work fine on their own, and supposedly cannot possibly interact.

                              How should such collisions happen? You don't throw all your names into
                              the same namespace!?

                              Ciao,
                              Marc 'BlackJack' Rintsch
