Parser Generator?

  • Jack

    Parser Generator?

    Hi all, I need to do syntax parsing of simple natural languages,
    for example, "weather of London" or "what is the time", simple
    things like these, with Unicode support in the syntax.

    In Java, there are JavaCC, Antlr, etc. I wonder what people use
    in Python? Antlr also has Python support but I'm not sure how good
    it is. Comments/hints are welcome.


  • Diez B. Roggisch

    #2
    Re: Parser Generator?

    Jack wrote:
    > Hi all, I need to do syntax parsing of simple natural languages,
    > for example, "weather of London" or "what is the time", simple
    > things like these, with Unicode support in the syntax.
    >
    > In Java, there are JavaCC, Antlr, etc. I wonder what people use
    > in Python? Antlr also has Python support but I'm not sure how good
    > it is. Comments/hints are welcome.

    There are several options. I personally like spark.py, the most common
    answer is pyparsing, and you should also check out NLTK, the Natural
    Language Toolkit.

    Diez

    • beginner

      #3
      Re: Parser Generator?

      On Aug 18, 5:22 pm, "Jack" <nos...@invalid.com> wrote:
      > Hi all, I need to do syntax parsing of simple natural languages,
      > for example, "weather of London" or "what is the time", simple
      > things like these, with Unicode support in the syntax.
      >
      > In Java, there are JavaCC, Antlr, etc. I wonder what people use
      > in Python? Antlr also has Python support but I'm not sure how good
      > it is. Comments/hints are welcome.

      Antlr seems to be able to generate Python code, too.

      • Tommy Nordgren

        #4
        Re: Parser Generator?


        On 19 Aug 2007, at 00.22, Jack wrote:
        > Hi all, I need to do syntax parsing of simple natural languages,
        > for example, "weather of London" or "what is the time", simple
        > things like these, with Unicode support in the syntax.
        >
        > In Java, there are JavaCC, Antlr, etc. I wonder what people use
        > in Python? Antlr also has Python support but I'm not sure how good
        > it is. Comments/hints are welcome.

        Antlr can generate Python code.
        However, I don't think a parser generator is suitable for generating
        natural language parsers; they are intended to generate code for
        computer language parsers.
        For examples of parsing imperative English sentences, I suggest
        taking a look at the class library for TADS 3 (Text Adventure
        Development System): <http://www.tads.org>
        The language has a syntax reminiscent of C++ and Java.
        -----------------------------------------------------
        An astronomer to a colleague:
        -I can't understand how you can go to the brothel as often as you
        do. Not only is it a filthy habit, but it must cost a lot of money too.
        -That's no problem. I've got a big government grant for the study of
        black holes.
        Tommy Nordgren
        tommy.nordgren@comhem.se



        • Jack

          #5
          Re: Parser Generator?

          Thanks for all the replies!

          SPARK looks promising. Its doc doesn't say if it handles Unicode
          (CJK in particular) encoding though.

          Yapps also looks powerful: http://theory.stanford.edu/~amitp/yapps/

          There's also PyGgy http://lava.net/~newsham/pyggy/

          I may also give Antlr a try.

          If anyone has experience using any of the parser generators with CJK
          languages, I'd be very interested in hearing about it.

          Jack

          • samwyse

            #6
            Re: Parser Generator?

            Jack wrote:
            > Thanks for all the replies!
            >
            > SPARK looks promising. Its doc doesn't say if it handles Unicode
            > (CJK in particular) encoding though.
            >
            > Yapps also looks powerful: http://theory.stanford.edu/~amitp/yapps/
            >
            > There's also PyGgy http://lava.net/~newsham/pyggy/
            >
            > I may also give Antlr a try.
            >
            > If anyone has experience using any of the parser generators with CJK
            > languages, I'd be very interested in hearing about it.

            I'm going to echo Tommy's reply. If you want to parse natural language,
            conventional parsers are going to be worse than useless (because you'll
            keep thinking, "Just one more tweak and this time it'll work for
            sure!"). Instead, go look at what the interactive fiction community
            uses. They analyse the statement in multiple passes, first picking out
            the verbs, then the noun phrases. Some of their parsers can do
            on-the-fly domain-specific spelling correction, etc, and all of them can
            ask the user for clarification. (I'm currently cobbling together
            something similar for pre-teen users.)
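
            A minimal sketch of the multi-pass idea described above, assuming a
            tiny hand-written vocabulary. The names VERBS, STOP_WORDS and analyse
            are hypothetical and are not taken from any interactive fiction
            library; real IF parsers are far more elaborate.

```python
import re

# Hypothetical vocabulary; a real system would load a much larger one.
VERBS = {"show", "find", "open", "tell"}
STOP_WORDS = {"the", "a", "an", "me", "of", "in", "to", "please"}


def analyse(command):
    """Pass 1 picks out the verb; pass 2 treats the remaining words as a noun phrase."""
    words = re.findall(r"\w+", command.lower())

    # Pass 1: the first known verb wins.
    verb = next((w for w in words if w in VERBS), None)

    # Pass 2: keep the words after the verb that are not stop words.
    rest = words[words.index(verb) + 1:] if verb else words
    noun_phrase = [w for w in rest if w not in STOP_WORDS]

    # No verb found: ask the user instead of guessing.
    question = None if verb else "What would you like me to do?"
    return {"verb": verb, "noun_phrase": noun_phrase, "ask": question}


print(analyse("Please show me the weather in London"))
```

            When no verb is recognised, the sketch returns a question for the
            user rather than a guess, which mirrors the "ask for clarification"
            behaviour mentioned above.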

            • Jack

              #7
              Re: Parser Generator?

              Thanks for the suggestion. I understand that more work is needed for
              natural language understanding. What I want to do is actually very
              simple: I pre-screen the text the user types. If it has a simple syntax
              my code understands, like "Weather in London", I'll redirect it to a
              weather site. Or, if it's "What is ...", I'll probably redirect it to
              Wikipedia. Otherwise, I'll throw it to a search engine. So, extremely
              simple stuff ...
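
              For pre-screening this simple, a few regular expressions may be
              enough before reaching for a parser generator. The sketch below is
              one possible shape, not a recommendation from the thread: the URL
              templates are placeholders on example domains, and RULES and route
              are hypothetical names.

```python
import re
from urllib.parse import quote_plus

# Placeholder URL templates (assumptions, not real endpoints).
WEATHER_URL = "https://weather.example/?q={}"
WIKI_URL = "https://wiki.example/?q={}"
SEARCH_URL = "https://search.example/?q={}"

# Each rule pairs a pattern with the URL template it redirects to.
# In Python 3, str patterns are Unicode-aware, so CJK arguments match too.
RULES = [
    (re.compile(r"^weather\s+(?:in|of|for)\s+(?P<arg>.+)$", re.IGNORECASE), WEATHER_URL),
    (re.compile(r"^what\s+is\s+(?P<arg>.+?)\??$", re.IGNORECASE), WIKI_URL),
]


def route(text):
    """Return the redirect URL for the first matching rule, else a search URL."""
    text = text.strip()
    for pattern, template in RULES:
        match = pattern.match(text)
        if match:
            return template.format(quote_plus(match.group("arg").strip()))
    # Nothing matched: fall through to the search engine.
    return SEARCH_URL.format(quote_plus(text))


print(route("Weather in London"))
print(route("what is the time?"))
print(route("python parser generator"))
```

              New query shapes are added by appending to RULES; anything the
              rules do not recognise falls through to the search engine.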

              • Alex Martelli

                #8
                Re: Parser Generator?

                Jack <nospam@invalid.com> wrote:
                > Thanks for the suggestion. I understand that more work is needed for
                > natural language understanding. What I want to do is actually very
                > simple: I pre-screen the text the user types. If it has a simple syntax
                > my code understands, like "Weather in London", I'll redirect it to a
                > weather site. Or, if it's "What is ...", I'll probably redirect it to
                > Wikipedia. Otherwise, I'll throw it to a search engine. So, extremely
                > simple stuff ...

                <http://nltk.sourceforge.net/index.php/Main_Page>

                """
                NLTK — the Natural Language Toolkit — is a suite of open source Python
                modules, data sets and tutorials supporting research and development in
                natural language processing.
                """


                Alex

                • Jack

                  #9
                  Re: Parser Generator?

                  Very interesting work. Thanks for the link!

                  • Jason Evans

                    #10
                    Re: Parser Generator?

                    On Aug 18, 3:22 pm, "Jack" <nos...@invalid.com> wrote:
                    > Hi all, I need to do syntax parsing of simple natural languages,
                    > for example, "weather of London" or "what is the time", simple
                    > things like these, with Unicode support in the syntax.
                    >
                    > In Java, there are JavaCC, Antlr, etc. I wonder what people use
                    > in Python? Antlr also has Python support but I'm not sure how good
                    > it is. Comments/hints are welcome.

                    I use Parsing.py. I like it a lot, probably because I wrote it.



                    Jason

                    • Jack

                      #11
                      Re: Parser Generator?

                      Thanks Jason. Does Parsing.py support Unicode characters (especially CJK)?
                      I'll take a look.

                      • Paul McGuire

                        #12
                        Re: Parser Generator?

                        On Aug 18, 11:37 pm, "Jack" <nos...@invalid.com> wrote:
                        > Thanks for all the replies!
                        >
                        > SPARK looks promising. Its doc doesn't say if it handles Unicode
                        > (CJK in particular) encoding though.
                        >
                        > Yapps also looks powerful: http://theory.stanford.edu/~amitp/yapps/
                        >
                        > There's also PyGgy http://lava.net/~newsham/pyggy/
                        >
                        > I may also give Antlr a try.
                        >
                        > If anyone has experience using any of the parser generators with CJK
                        > languages, I'd be very interested in hearing about it.
                        >
                        > Jack
                        Jack -

                        Pyparsing was already mentioned once on this thread. Here is an
                        application using pyparsing that parses Chinese characters to convert
                        to English Python.



                        -- Paul

                        • Jason Evans

                          #13
                          Re: Parser Generator?

                          On Aug 24, 1:21 pm, "Jack" <nos...@invalid.com> wrote:
                          > Thanks Jason. Does Parsing.py support Unicode characters (especially CJK)?
                          > I'll take a look.

                          Parsers typically deal with tokens rather than individual characters,
                          so the scanner that creates the tokens is the main thing that Unicode
                          matters to. I have written Unicode-aware scanners for use with
                          Parsing-based parsers, with no problems. This is pretty easy to do,
                          since Python has built-in support for Unicode strings.
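
                          As a rough illustration of that point, here is a small
                          Unicode-aware scanner built on the standard re module. It
                          is a sketch under stated assumptions, not Parsing.py's API:
                          the token kinds, TOKEN_SPEC and scan are hypothetical names.

```python
import re

# Token kinds and patterns are illustrative assumptions.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("WORD", r"[^\W\d]+"),   # letters (any script, including CJK) and underscores
    ("PUNCT", r"[^\w\s]"),   # any single non-word, non-space character
    ("SKIP", r"\s+"),        # whitespace is matched but not emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Yield (kind, lexeme) pairs for a parser to consume."""
    for match in MASTER.finditer(text):
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()


for token in scan("weather 東京 24?"):
    print(token)
```

                          Note that an unbroken run of CJK characters comes out as
                          a single WORD token here; splitting Chinese or Japanese
                          text into words needs a separate segmentation step.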

                          Jason

                          • Jack

                            #14
                            Re: Parser Generator?

                            Good to know, thanks Paul!

                            • Jack

                              #15
                              Re: Parser Generator?

                              Thanks Jason. There seem to be a few options that I can pursue. Having a
                              hard time choosing one now :)