problem with pyparsing - suppress

**Glenton** · Mar 3 '10, 01:37 AM

Hi

Your line 3 should be "body = ZeroOrMore(Word(alphas))", right? Or did you already define word=Word(alphas)?

Anyway, this is a classic "gotcha", familiar from regular expressions: repetition is greedy. It always takes the longest string it can that matches the pattern.

For example, if you run it with sentence defined as begin+body, you get
['once', 'upon', 'a', 'time', 'end', 'story']. In other words, the "end story" is matched by the body. Then it comes to the end of the string and goes, "But where's the 'end story' that was prophesied?"*
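A minimal sketch of that failure, assuming the original grammar used a plain `Keyword('end story')` with no dollar sign (the variable names `greedy_tokens` and `full_parse_failed` are only for illustration):

```python
from pyparsing import Keyword, ParseException, Word, ZeroOrMore, alphas

text = "begin story once upon a time end story"

begin = Keyword('begin story').suppress()
end = Keyword('end story').suppress()  # assumed original tag, no '$'
body = ZeroOrMore(Word(alphas))

# Without the end tag in the grammar, body also eats 'end' and 'story'.
greedy_tokens = list((begin + body).parseString(text))
print(greedy_tokens)

# With the end tag required, body has already consumed it,
# so there is nothing left for 'end' to match and the parse fails.
try:
    (begin + body + end).parseString(text)
    full_parse_failed = False
except ParseException as err:
    full_parse_failed = True
    print("parse failed:", err)
```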

It needs something to help it differentiate. D*mmit, man, it's a string-parser, not a mind-reader!!** But if you gave it something to work with:

Code:

from pyparsing import *

word=Word(alphas) 
body = ZeroOrMore(word)
begin = Keyword('begin story').suppress()
end = Keyword('$end story').suppress()
 
sentence = begin + body + end 
print(sentence.parseString("begin story once upon a time $end story"))

Good luck!

*might be over-anthropomorphising the string parser
**might not be an exact Star Trek quote

**kc2ine** · Mar 4 '10, 08:05 PM

LOL, it's not a mind reader? Shoot :)

But what if I want an 'end story' ending tag without the dollar sign... :(

Thanks anyway, Glenton.

**Glenton** · Mar 5 '10, 02:57 AM

Well, if you know it ends with ' end story', you could just use string slicing.

Code:

from pyparsing import *
 
word=Word(alphas) 
body = ZeroOrMore(word)
begin = Keyword('begin story').suppress()
 
sentence = begin + body

myString="begin story once upon a time end story"
print(sentence.parseString(myString[:-10]))
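If you go the slicing route, it may be worth checking that the tag is really there before chopping off the last 10 characters. A minimal sketch in plain Python (the names `END_TAG` and `strip_end_tag` are made up for illustration):

```python
END_TAG = " end story"  # hypothetical constant; leading space included, 10 characters


def strip_end_tag(text, tag=END_TAG):
    """Return text without the trailing tag; raise if the tag is missing."""
    if not text.endswith(tag):
        raise ValueError("missing end tag: %r" % tag)
    return text[:-len(tag)]


trimmed = strip_end_tag("begin story once upon a time end story")
print(trimmed)
```

The trimmed string can then be passed to sentence.parseString in place of myString[:-10].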

**ptmcg** · Mar 9 '10, 12:39 AM

Don't make pyparsing read your mind - just tell it!

This is a very common issue when learning pyparsing. Pyparsing does not do any right-to-left backtracking like regexes do. It is purely left-to-right. So make sure your repetition does not accidentally include the terminating sentinel value.
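For contrast, here is a rough sketch of the regex behaviour being described, using only the standard `re` module (the pattern is just an illustration, not something from the original question):

```python
import re

text = "begin story once upon a time end story"

# (?: \w+)* is greedy and first swallows ' end' and ' story' as well.
# The regex engine then backs up two words so the literal ' end story'
# at the end of the pattern can still match.
pattern = re.compile(r"begin story((?: \w+)*) end story$")
match = pattern.match(text)
words = match.group(1).split()
print(words)
```

A pyparsing repetition never gives back what it has already consumed, which is why the grammar itself has to say where the body stops.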

See embedded comments below:

Code:

from pyparsing import * 

# define these up front
begin = Keyword('begin story').suppress() 
end = Keyword('end story').suppress() 
word=Word(alphas)  

# what you *really* mean by 'body' - you want
# ZeroOrMore words, as long as they aren't 'end story' -
# so just say that
body = ZeroOrMore(~end + word) 

# the rest is just like you had it
sentence = begin + body + end  
print(sentence.parseString("begin story once upon a time end story"))

prints:

Code:

['once', 'upon', 'a', 'time']

-- Paul
