Writing a parser the right way?

  • beza1e1

    Writing a parser the right way?

    I'm writing a parser for the English language. This is a simple function
    that identifies what kind of sentence we have. Do you think this class
    wrapping is the right way to represent the result of the function? Further
    parsing then checks isinstance(text, Declarative).

    -------------------
    class Sentence(str): pass
    class Declarative(Sentence): pass
    class Question(Sentence): pass
    class Command(Sentence): pass

    def identify_sentence(text):
        text = text.strip()
        # endswith() also copes with empty input, where text[-1] would raise
        if text.endswith('.'):
            return Declarative(text)
        elif text.endswith('!'):
            return Command(text)
        elif text.endswith('?'):
            return Question(text)
        # no recognized end punctuation: hand back the plain string
        return text
    -------------------

    At first I just returned the class; then I decided to derive Sentence
    from str, so I can include the text as well.

  • Ben Sizer

    #2
    Re: Writing a parser the right way?

    beza1e1 wrote:
    > I'm writing a parser for the English language. This is a simple function
    > that identifies what kind of sentence we have. Do you think this class
    > wrapping is the right way to represent the result of the function? Further
    > parsing then checks isinstance(text, Declarative).
    >
    > -------------------
    > class Sentence(str): pass
    > class Declarative(Sentence): pass
    > class Question(Sentence): pass
    > class Command(Sentence): pass

    As far as the parser is concerned, making these separate classes is
    unnecessary when you could just store the sentence type as a normal
    data member of Sentence. So the answer to your question is no, in my
    opinion.
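
    A minimal sketch of the data-member alternative, assuming the type is
    decided from the closing punctuation as in the original function. The
    `kind` attribute and the `KINDS` table are names invented for this
    illustration; they are not from the original post.

```python
class Sentence(str):
    """A str subclass that carries its sentence type as plain data."""

    # Hypothetical mapping from closing punctuation to a type label.
    KINDS = {".": "declarative", "!": "command", "?": "question"}

    def __new__(cls, text):
        text = text.strip()
        self = super().__new__(cls, text)
        # text[-1:] is "" for empty input, so the lookup falls back to "unknown".
        self.kind = cls.KINDS.get(text[-1:], "unknown")
        return self


s = Sentence("Open the door!")
print(s.kind)     # command
print(s.upper())  # OPEN THE DOOR!
```

    Later parsing stages would then test `s.kind == "command"` instead of
    calling isinstance().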

    However, when you come to actually use the resulting Sentence objects,
    perhaps the behaviour is different? If you're looking to use a standard
    interface to Sentences but are going to be doing substantially
    different processing depending on which sentence type you have, then
    yes, this class hierarchy may be useful to you.

    --
    Ben Sizer

    • beza1e1

      #3
      Re: Writing a parser the right way?

      Well, a declarative sentence is essentially subject-predicate-object,
      while a question is predicate-subject-object. This is important in
      further processing. So perhaps I should code this order into the
      classes? I need to think a little bit more about this.
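
      One way the order could be coded into the classes, as a rough sketch
      only. The `word_order` attribute and the `label_chunks` helper are
      hypothetical names, and the sentence is assumed to be split into
      chunks already, which is the hard part.

```python
class Sentence(str):
    # Role order implied by the sentence type; subclasses override it.
    word_order = ()


class Declarative(Sentence):
    word_order = ("subject", "predicate", "object")


class Question(Sentence):
    word_order = ("predicate", "subject", "object")


def label_chunks(sentence, chunks):
    """Pair each pre-split chunk with the role its position implies."""
    return dict(zip(sentence.word_order, chunks))


print(label_chunks(Declarative("I drive a truck."), ["I", "drive", "a truck"]))
print(label_chunks(Question("Is this a truck?"), ["Is", "this", "a truck"]))
```

      Further processing can then ask for the "subject" without caring
      which sentence type it is looking at.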

      Thanks for your food for thought! :)

      • Christopher Subich

        #4
        Re: Writing a parser the right way?

        beza1e1 wrote:
        > Well, a declarative sentence is essentially subject-predicate-object,
        > while a question is predicate-subject-object. This is important in
        > further processing. So perhaps I should code this order into the
        > classes? I need to think a little bit more about this.

        A question is subject-predicate-object?

        That was unknown by me.

        Honestly, if you're trying a general English parser, good luck.

        • Paul McGuire

          #5
          Re: Writing a parser the right way?

          "beza1e1" <andreas.zwinkau@googlemail.com> wrote in message
          news:1127300661.440587.287950@g47g2000cwa.googlegroups.com...
          > I'm writing a parser for the English language. This is a simple function
          > that identifies what kind of sentence we have. Do you think this class
          > wrapping is the right way to represent the result of the function? Further
          > parsing then checks isinstance(text, Declarative).
          >
          > -------------------
          > class Sentence(str): pass
          > class Declarative(Sentence): pass
          > class Question(Sentence): pass
          > class Command(Sentence): pass
          >
          > def identify_sentence(text):
          >     text = text.strip()
          >     if text.endswith('.'):
          >         return Declarative(text)
          >     elif text.endswith('!'):
          >         return Command(text)
          >     elif text.endswith('?'):
          >         return Question(text)
          >     return text
          > -------------------
          >
          > At first I just returned the class; then I decided to derive Sentence
          > from str, so I can include the text as well.
          >
          Andreas -

          Are you trying to parse any English sentence, or just a limited subset of
          them? Parsing *any* English sentence (or question or interjection or
          command) is a ***huge*** undertaking - Google for "natural language" and you
          will find many efforts (with substantial time, money, and manpower)
          working on this problem. Applications range from automated
          language translation to automated helpdesk analysis. I really suggest you
          do a bit of research on this topic, just to get an idea of how big this job
          is. Here's a Wikipedia link:


          Here are some simple examples, that quickly go beyond
          subject-predicate-object:

          I drive a truck.
          I drive a red truck.
          I drive a red truck to work.
          I drive a red truck to the shop to work on it.
          I drive a red truck to the shop to have some work done on it.
          I drive a red truck very fast.
          I drive a red truck through a red light.

          Then factor in other sentences (past and future tenses, past and future
          perfect tenses, figurative metaphors) and parsing general English is a major
          job. The favorite test case of the natural language folks is "Time flies
          like an arrow," which early auto-translation software converted to "Temporal
          insects enjoy a pointed projectile."

          On the other hand, if you plan to limit the type and/or content of the
          sentences being parsed (such as computer system commands or adventure game
          inputs, or descriptions of physical objects), then you can scope out a
          reasonable capability by choosing a vocabulary of known verbs and objects,
          and avoiding ambiguities (such as "set", as in "I set the set of glasses
          next to the TV set," or "lead" as in "Lead me to the store that sells lead
          pencils.").
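
          To give a feel for the restricted approach, here is a minimal
          sketch that accepts only "verb [article] object" commands. The
          word lists and the `parse_command` name are made up for the
          example; a real game would need a much larger vocabulary.

```python
# Hypothetical vocabulary for a small adventure game.
VERBS = {"take", "drop", "open", "drive"}
OBJECTS = {"key", "door", "truck", "lamp"}
FILLER = {"the", "a", "an"}


def parse_command(text):
    """Return (verb, object) for a 'verb [article] object' command, else None."""
    # Lowercase, trim surrounding spaces and end punctuation, drop articles.
    words = [w for w in text.lower().strip(" .!").split() if w not in FILLER]
    if len(words) == 2 and words[0] in VERBS and words[1] in OBJECTS:
        return words[0], words[1]
    return None


print(parse_command("Open the door!"))    # ('open', 'door')
print(parse_command("Set the set down"))  # None
```

          Ambiguous words such as "set" are simply left out of the
          vocabulary, so that input is rejected instead of guessed at.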

          Hope this sheds some light on your task,
          -- Paul


          • Steven Bethard

            #6
            Re: Writing a parser the right way?

            Christopher Subich wrote:
            > beza1e1 wrote:
            >
            >> Well, a declarative sentence is essentially subject-predicate-object,
            >> while a question is predicate-subject-object. This is important in
            >> further processing. So perhaps I should code this order into the
            >> classes? I need to think a little bit more about this.
            >
            > A question is subject-predicate-object?
            >
            > That was unknown by me.
            >
            > Honestly, if you're trying a general English parser, good luck.

            I second that. Have you read any of the natural language processing
            research in this area? There are a variety of English parsers already
            available. Googling for "charniak parser" or "collins parser" should
            get you something. I believe Dan Bikel has one too. Those are trained
            on Wall Street Journal text. You might also look into Minipar, which is
            rule-based and not as WSJ-specific.

            STeVe

            • beza1e1

              #7
              Re: Writing a parser the right way?

              Thanks for the hints. I just found NLTK and MontyLingua.

              And yes, it is just adventure game language. This means every tense
              except the present tense is discarded as "not changing the world".
              Furthermore, the parser will make a lot of assumptions, which are
              perhaps 90% right, not perfect:

              if word[-2:] == "ly":
                  return Adverb(word)

              Note that capitalized words are identified beforehand, so Willy is
              parsed correctly as a noun. On the other hand, "silly boy" will not
              return a correct result.
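
              Put together, the two heuristics look roughly like this. It is
              a simplified sketch that returns plain labels instead of the
              Adverb and noun classes, and `classify_word` is a hypothetical
              name, not the actual function in the parser.

```python
def classify_word(word):
    """Guess a word class from surface form only; deliberately crude."""
    # Capitalized words are checked first, so "Willy" is not taken for an adverb.
    if word[:1].isupper():
        return "noun"
    # Right most of the time, but wrong for adjectives such as "silly".
    if word.endswith("ly"):
        return "adverb"
    return "unknown"


for w in ("Willy", "quickly", "silly", "truck"):
    print(w, classify_word(w))
```

              The capitalization check also fires on the first word of a
              sentence, which is one more source of wrong guesses.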

              Currently it is just a proof of concept. Maybe I can integrate a better
              parser engine later. The idea is a kind of MUD where you talk in correct
              sentences instead of "go north". I envision a difference like Diablo to
              Pen&Paper. I'd call it more a collaborative storytelling game than an
              actual RPG.

              I fed it your sentences, Paul. Result:
              <['I', 'drive', 'a']> <['red']> <['truck']>
              should be:
              <['I']> <['drive']> <['a', 'red', 'truck']>

              Verbs are the tricky part, I think. There is no way to recognize them.
              So I will have to get a database ... work to do. ;)
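
              With even a tiny verb list, the split comes out as intended. A
              rough sketch, assuming a hand-written set stands in for the
              database; `KNOWN_VERBS` and `split_svo` are names made up here.

```python
# Hypothetical stand-in for a real verb database.
KNOWN_VERBS = {"drive", "take", "open", "go"}


def split_svo(sentence):
    """Split on the first known verb: words before it, the verb, words after it."""
    words = sentence.rstrip(".!?").split()
    for i, word in enumerate(words):
        if word.lower() in KNOWN_VERBS:
            return words[:i], [word], words[i + 1:]
    return None  # no known verb in the sentence


print(split_svo("I drive a red truck."))
# (['I'], ['drive'], ['a', 'red', 'truck'])
```

              Everything after the verb still lands in one chunk, so "to
              work" in "I drive a red truck to work" is not separated from
              the object.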

              • Steven Bethard

                #8
                Re: Writing a parser the right way?

                beza1e1 wrote:
                > Verbs are the tricky part, I think. There is no way to recognize them.
                > So I will have to get a database ... work to do. ;)

                Try the Brill tagger[1] or MXPOST[2].

                STeVe

                [1] http://www.cs.jhu.edu/~brill/code.html
                [2] ftp://ftp.cis.upenn.edu/pub/adwait/jmx/jmx.tar.gz
