Proposed API change for pyparsing CaselessLiteral - could break existing code

  • Paul McGuire

    ***This is of especial interest for those who are using the pyparsing
    module and have defined grammars that make use of CaselessLiteral.***

    One of the bugfix requests I recently got for pyparsing was to fix the
    tokens returned by CaselessLiteral. CaselessLiteral is an interesting
    special case of Literal, since a single CaselessLiteral can match many
    different input strings. That is:

    CaselessLiteral("abcd")

    could match abcd, abcD, abCD, abCd, etc. When parsing something like SQL, a
    CaselessLiteral('select') has even more options.
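To put numbers on how many inputs a single CaselessLiteral accepts: each letter can independently be upper- or lowercase, so a word of n letters has 2^n spellings. A short Python sketch that enumerates them (case_variants is a hypothetical helper, not part of pyparsing):

```python
from itertools import product


def case_variants(word):
    """Return every upper/lowercase spelling of word; non-letters stay fixed."""
    choices = [(c.lower(), c.upper()) if c.isalpha() else (c,) for c in word]
    return ["".join(p) for p in product(*choices)]


print(len(case_variants("abcd")))    # 16
print(len(case_variants("select")))  # 64
print(case_variants("abcd")[:4])     # ['abcd', 'abcD', 'abCd', 'abCD']
```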

    Ordinarily, the other parsing classes in pyparsing return the original
    matched text from the input stream, but for CaselessLiteral that could be
    unwieldy. Suppose you had something more complex, like this possible
    grammar for a Zork-type game:

    verb = ( CaselessLiteral("pick") | CaselessLiteral("turn") |
             CaselessLiteral("drop") | <...and so on...> )

    then the code to process the tokens would have to neutralize the case of the
    input text, since it could be 'picK', 'Turn', or 'DROP'. So I expected that
    the first thing the calling code would do with the results would be to
    convert them to uppercase or lowercase. Since this was so predictable, I
    decided to build it into the interface: for CaselessLiteral, the input text
    would *not* be returned as the matching text; instead, the original
    specifying text string would be returned. After all, the caller had already
    decided that case was not significant. That is how the documentation reads:
    CaselessLiteral returns as its match text the original specifying text
    string.
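The documented behavior can be modeled in a few lines of plain Python. This is a minimal sketch, not pyparsing's implementation: CaselessMatcher, match_at, and parse_verb are hypothetical names, and only the three verbs from the Zork example are used. The matcher compares case-insensitively but returns the string it was defined with, so the caller never sees 'picK' or 'DROP'.

```python
class CaselessMatcher:
    """Hypothetical stand-in for a caseless literal (not pyparsing's code).

    Matches its defining string regardless of case, but returns the
    defining string itself, as the documentation describes.
    """

    def __init__(self, match_string):
        self.match_string = match_string
        self._folded = match_string.upper()

    def match_at(self, text, loc=0):
        """Return (new_loc, token) if the text at loc matches, else None."""
        end = loc + len(self.match_string)
        if text[loc:end].upper() == self._folded:
            return end, self.match_string  # defining text, not input text
        return None


verbs = [CaselessMatcher(w) for w in ("pick", "turn", "drop")]


def parse_verb(text):
    """Try each verb in order, like an alternation of caseless literals."""
    for v in verbs:
        result = v.match_at(text)
        if result is not None:
            return result[1]
    return None


print(parse_verb("picK up lamp"))  # pick
print(parse_verb("DROP sword"))    # drop
```

Like a plain literal, this sketch does no word-boundary check, so it would also accept the "pick" at the start of "pickle".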

    Unfortunately, I botched it, waaaaay back pre-1.0.0, when I first put
    pyparsing up on SourceForge. CaselessLiteral always returns the input text
    converted to uppercase. That is still a predictably cased string, but not
    what the docs describe. In fact, if someone were to specify
    CaselessLiteral("pick") and get "PICK" back, that could cause problems of
    its own.
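The kind of problem the uppercasing causes can be shown with two hypothetical functions, documented_token and shipped_token: one returns what the docs promise, the other reproduces the shipped behavior as described in the post (input text converted to uppercase). A caller who wrote the grammar with "pick" would naturally compare against "pick", and only the documented behavior passes that test:

```python
def documented_token(match_string, matched_text):
    # What the docs promise: hand back the string the grammar author wrote.
    return match_string


def shipped_token(match_string, matched_text):
    # What the post says the code actually did: input text, uppercased.
    return matched_text.upper()


matched = "picK"  # the text as the user typed it
for behavior in (documented_token, shipped_token):
    token = behavior("pick", matched)
    # A caller who wrote CaselessLiteral("pick") naturally tests for "pick".
    print(behavior.__name__, repr(token), token == "pick")
# documented_token 'pick' True
# shipped_token 'PICK' False
```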

    So for those of you who are still reading, and who use pyparsing, and have
    CaselessLiterals in your code, and test on the returned text, which choice
    would you prefer:

    1. Keep the current behavior, and just change the docs.
    2. Fix the current behavior to match the docs, and fix up any code that uses
    it.
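Whichever option is chosen, caller code that compares tokens case-insensitively keeps working. A minimal sketch, assuming the caller only needs to know which verb matched; is_verb is a hypothetical helper, and str.casefold is the standard Python 3 method for caseless comparison:

```python
def is_verb(token, verb):
    """Compare case-insensitively so either option (1 or 2) passes."""
    return token.casefold() == verb.casefold()


# Under option 1 the token arrives as "PICK"; under option 2 as "pick".
for token in ("PICK", "pick"):
    print(token, is_verb(token, "pick"))  # True in both cases
```

The trade-off is that the caller pays for normalization on every comparison, which is exactly the step the documented behavior was meant to make unnecessary.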

    My personal preference is #2. We are still early in pyparsing's code life -
    it has only been generally available for about 4 months - and I think it
    really is the preferred way to go.

    On the other hand, pyparsing has been downloaded almost 900 times from SF,
    and so I want to find out if this will end up making me lots of enemies. :)

    Overall, this has been a gratifying experience - I have gotten many e-mails
    from people who feel this tool is easy to pick up, and fills a common need.
    I want to stay in those 900 people's good graces, or at least as many of
    them as I can.

    -- Paul


  • Dan Dang Griffith

    Re: Proposed API change for pyparsing CaselessLiteral - could break existing code

    "Paul McGuire" <ptmcg@austin.rr._bogus_.com> wrote in message news:<geDic.22772$hR1.20302@fe2.texas.rr.com>...
    > So for those of you who are still reading, and who use pyparsing, and have
    > CaselessLiterals in your code, and test on the returned text, what choice
    > would you prefer:
    >
    > 1. Keep the current behavior, and just change the docs.
    > 2. Fix the current behavior to match the docs, and fix up any code that uses
    > it.
    >
    > My personal preference is #2. We are still early in pyparsing's code life -
    > it has only been generally available for about 4 months - and I think it
    > really is the preferred way to go.

    I'm +1 on #2, i.e., change the code to match the docs.
    I have a use case where the parser acts as a "cleanup"
    pass over the input, making it conform to a coding standard,
    and CaselessLiteral working as described in the docs
    would be perfect. I suppose I could write a parse action
    (via setParseAction) that converts each match to the
    appropriate case, but that would slow it down, plus I'd
    have to keep track of the literal in two places (well, I
    suppose I could define a name for the value and import it
    into both my grammar and the code that has the parse
    actions, but still...).
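The workaround described here, re-casing each match in a parse action, can be sketched without pyparsing using the standard library re module: a regular expression stands in for the grammar, and the replacement function plays the role of a parse action. The keyword list, the uppercase coding standard, and the names KEYWORDS, to_canonical, and clean_up are all hypothetical. Defining each canonical spelling once, in KEYWORDS, is one way around keeping the literal in two places:

```python
import re

# Hypothetical coding standard: SQL keywords written in uppercase.
# Each canonical spelling is defined once and drives both matching and output.
KEYWORDS = ("SELECT", "FROM", "WHERE")

_keyword_re = re.compile(
    r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b", re.IGNORECASE
)
_canonical = {kw.upper(): kw for kw in KEYWORDS}


def to_canonical(match):
    # Plays the role of a parse action: swap the input's spelling
    # for the canonical one from KEYWORDS.
    return _canonical[match.group(0).upper()]


def clean_up(source):
    """Rewrite every keyword in source to its canonical spelling."""
    return _keyword_re.sub(to_canonical, source)


print(clean_up("select name From users where id = 1"))
# SELECT name FROM users WHERE id = 1
```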

    Thanks for pyparsing.
    --dang
