problem with pyparsing - suppress

  • kc2ine
    New Member
    • Feb 2010
    • 5

    Hi,
    I want to learn pyparsing and got stuck with this:
    Code:
    from pyparsing import *
    
    body = ZeroOrMore(word)
    begin = Keyword('begin story').suppress()
    end = Keyword('end story').suppress()
    
    sentence = begin + body + end 
    print sentence.parseString("begin story once upon a time end story")
    I'm getting this error:
    ParseException: Expected "end story" (at char 36), (line:1, col:37)

    I don't get why. 'end story' is there, isn't it?
    Thanks
    Last edited by bvdet; Mar 2 '10, 02:29 PM. Reason: Add code tags
  • Glenton
    Recognized Expert Contributor
    • Nov 2008
    • 391

    #2
    Hi

    Your line 3 should be "body = ZeroOrMore(Word(alphas))", right? Or did you already define word=Word(alphas)?

    Anyway, this is a classic "gotcha", the same kind you meet with greedy regular expressions. The repetition always takes the longest string it can match.

    For example, if you run it with sentence defined as begin + body, you get
    ['once', 'upon', 'a', 'time', 'end', 'story']. In other words, the "end story" is matched by the body. Then it comes to the end of the string and goes, "But where's the 'end story' that was prophesied?" *
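
    A minimal sketch of that greedy behaviour, written for Python 3 (the code in this thread is Python 2) and assuming pyparsing is installed. The names greedy_tokens and error are made up for this illustration:

```python
from pyparsing import Keyword, ParseException, Word, ZeroOrMore, alphas

text = "begin story once upon a time end story"

word = Word(alphas)
body = ZeroOrMore(word)  # greedy: happily consumes 'end' and 'story' too
begin = Keyword("begin story").suppress()
end = Keyword("end story").suppress()

# Without the end marker the parse succeeds, and body has swallowed it.
greedy_tokens = list((begin + body).parseString(text))
print(greedy_tokens)

# With the end marker there is nothing left for it to match.
error = None
try:
    (begin + body + end).parseString(text)
except ParseException as exc:
    error = exc
print("parse failed:", error)
```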

    It needs something to help it differentiate. D*mmit, man, it's a string-parser, not a mind-reader!!** But if you gave it something to work with:
    Code:
    from pyparsing import *
    
    word=Word(alphas) 
    body = ZeroOrMore(word)
    begin = Keyword('begin story').suppress()
    end = Keyword('$end story').suppress()
     
    sentence = begin + body + end 
    print sentence.parseString("begin story once upon a time $end story")
    Good luck!

    *might be over-anthropomorphising the string parser
    **might not be an exact Star Trek quote


    • kc2ine
      New Member
      • Feb 2010
      • 5

      #3
      LOL, it's not a mind reader? Shoot :)

      But what if I want an 'end story' ending tag without the dollar sign... :(

      Thanks anyway, Glenton.


      • Glenton
        Recognized Expert Contributor
        • Nov 2008
        • 391

        #4
        Well, if you know it ends with ' end story', you could just use string slicing.
        Code:
        from pyparsing import *
         
        word=Word(alphas) 
        body = ZeroOrMore(word)
        begin = Keyword('begin story').suppress()
         
        sentence = begin + body
        
        myString="begin story once upon a time end story"
        print sentence.parseString(myString[:-10])


        • ptmcg
          New Member
          • Mar 2010
          • 1

          #5
          Don't make pyparsing read your mind - just tell it!

          This is a very common issue when learning pyparsing. Pyparsing does not do any right-to-left backtracking like regexes do. It is purely left-to-right. So make sure your repetition does not accidentally include the terminating sentinel value.
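
          For contrast, here is a rough sketch of the backtracking that a regex does, using only the standard-library re module. The pattern and the variable names are assumptions made up for this illustration, not part of the original question:

```python
import re

text = "begin story once upon a time end story"

# The group is greedy and first grabs every word, 'end' and 'story' included.
# When the trailing 'end story' then fails to match, the regex engine gives
# words back one at a time until the whole pattern fits.
pattern = re.compile(r"begin story((?:\s+[A-Za-z]+)*)\s+end story$")

match = pattern.match(text)
regex_words = match.group(1).split() if match else []
print(regex_words)
```

          Pyparsing's ZeroOrMore never gives words back like this, which is why the repetition itself has to be told where to stop.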

          See embedded comments below:

          Code:
          from pyparsing import * 
          
          # define these up front
          begin = Keyword('begin story').suppress() 
          end = Keyword('end story').suppress() 
          word=Word(alphas)  
          
          # what you *really* mean by 'body' - you want
          # ZeroOrMore words, as long as they aren't 'end story' -
          # so just say that
          body = ZeroOrMore(~end + word) 
          
          # the rest is just like you had it
          sentence = begin + body + end  
          print sentence.parseString("begin story once upon a time end story")
          prints:
          Code:
          ['once', 'upon', 'a', 'time']
          -- Paul
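
          For anyone on Python 3, the same negative-lookahead approach can be sketched like this, assuming pyparsing is installed. The second input string and the names story_tokens and tricky_tokens are hypothetical, added only to show that a lone 'end' inside the story is still accepted:

```python
from pyparsing import Keyword, Word, ZeroOrMore, alphas

begin = Keyword("begin story").suppress()
end = Keyword("end story").suppress()
word = Word(alphas)

# ~end is a negative lookahead: accept a word only if 'end story'
# does not start at the current position.
body = ZeroOrMore(~end + word)
sentence = begin + body + end

story_tokens = list(sentence.parseString("begin story once upon a time end story"))
print(story_tokens)

# A story whose body contains the word 'end' on its own still parses,
# because the lookahead only rejects the full 'end story' keyword.
tricky_tokens = list(sentence.parseString("begin story the end is near end story"))
print(tricky_tokens)
```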
