regular expression reverse match?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Ron Adam

    regular expression reverse match?


    Is it possible to match a string to regular expression pattern instead
    of the other way around?

    For example, instead of finding a match within a string, I want to
    find out, (pass or fail), if a string is a partial match to an re.

    Given an re of 'abcd and a bunch of other stuff'

    This is what i'm looking for:

    string / result
    'a' / pass
    'ab' / pass
    'abc' / pass
    'abd' / fail
    'aaaa' / fail
    'abcd and a bunch of other stuff and then some' / fail

    Is there a way to form a regular expression that will do this?

    I'm hoping to use more complex regular expressions than this. But
    the left to right precedence will still be the same.

    _Ron Adam




  • Emile van Sebille

    #2
    Re: regular expression reverse match?


    "Ron Adam" <radam2@tampaba y.rr.com> wrote in message
    news:fqcupvkic6 3mrb5hv1hbn1g06 fggkdl91g@4ax.c om...[color=blue]
    >
    > Is it possible to match a string to regular expression pattern instead
    > of the other way around?[/color]

    You can abuse the implmentation details to discover the original search
    string.
    [color=blue]
    >
    > For example, instead of finding a match within a string, I want to
    > find out, (pass or fail), if a string is a partial match to an re.
    >
    > Given an re of 'abcd and a bunch of other stuff'[/color]

    I'll assume you mean comething like:

    x = re.compile('abc d and a bunch of other stuff')

    [color=blue]
    >
    > This is what i'm looking for:
    >
    > string / result
    > 'a' / pass
    > 'ab' / pass
    > 'abc' / pass
    > 'abd' / fail
    > 'aaaa' / fail
    > 'abcd and a bunch of other stuff and then some' / fail
    >
    > Is there a way to form a regular expression that will do this?[/color]

    for k,v in re._cache.items ():
    if v == x:
    ss=k[0]
    break

    Then it's a normal:
    [color=blue][color=green][color=darkred]
    >>> re.match('a',ss )[/color][/color][/color]
    <_sre.SRE_Mat ch object at 0x009828E0>[color=blue][color=green][color=darkred]
    >>> re.match('abd', ss)
    >>>[/color][/color][/color]


    Not sure that's what you're looking for, but reasonably sure it won't work
    in all cases.

    HTH,

    Emile van Sebille
    emile@fenx.com


    Comment

    • Jay Dorsey

      #3
      Re: regular expression reverse match?

      Ron Adam wrote:
      [color=blue]
      > Is it possible to match a string to regular expression pattern instead
      > of the other way around?
      >
      > For example, instead of finding a match within a string, I want to
      > find out, (pass or fail), if a string is a partial match to an re.
      >
      > Given an re of 'abcd and a bunch of other stuff'
      >
      > This is what i'm looking for:
      >
      > string / result
      > 'a' / pass
      > 'ab' / pass
      > 'abc' / pass
      > 'abd' / fail
      > 'aaaa' / fail
      > 'abcd and a bunch of other stuff and then some' / fail
      >[/color]

      How about:
      [color=blue][color=green][color=darkred]
      >>> matcher = "abcd and a bunch of other stuff"
      >>> phrases = ["a", "ab", "abc", "abd", "aaaa", "abcd and a bunch of[/color][/color][/color]
      other stuff and then some"][color=blue][color=green][color=darkred]
      >>> for phrase in phrases:[/color][/color][/color]
      .... if phrase == matcher[:len(phrase)]: print "pass"
      .... else: print "fail"
      ....
      pass
      pass
      pass
      fail
      fail
      fail


      Jay


      Comment

      • Ron Adam

        #4
        Re: regular expression reverse match?

        On Tue, 28 Oct 2003 20:09:38 -0800, "Emile van Sebille"
        <emile@fenx.com > wrote:
        [color=blue]
        >
        >"Ron Adam" <radam2@tampaba y.rr.com> wrote in message
        >news:fqcupvkic 63mrb5hv1hbn1g0 6fggkdl91g@4ax. com...[color=green]
        >>
        >> Is it possible to match a string to regular expression pattern instead
        >> of the other way around?[/color]
        >
        >You can abuse the implmentation details to discover the original search
        >string.
        >[/color]


        I already know the search string... Or will once I understand the how
        to form them to do what I want. <hopefully> What I don't know is
        the completed string that will match to it beforehand. Or at least
        that's the idea.

        I'm writing an interactive keyboard input routine and want to use re
        to specify the allowed input. Some things are easy like only allowing
        digits or characters.

        So far I can check for %i and %f first to specify simple ints and
        floats. A test cast operation with an exception handles those
        perfectly.

        As it is, the program checks the input buffer after each keystroke and
        determines if the buffer is acceptable against a pattern as the string
        is being built. A reverse match. It allows new keystrokes to be
        added to the buffer as long as it's within the pattern.

        I was hoping to use regular expression patterns as a function argument
        to specify a wide range of input patterns.

        I'm still a little new to Python and a lot new to the regular
        expressions although I've used variations of them before.


        So how do I say.... accept only 1 character out of set [YyNn] but
        only one character and do not match Ye, yy nn etc...

        .... accept only 10 of any type characters but not 11

        .... accept only the letters of "a particular string" sequentially and
        then no more.

        .... accept a number in the form of "nnn-nnn-nnn" sequentially with
        required dashes.

        ..... accept any sequence of characters and numbers as long as they
        contain at least 1 of each type, cap letter, small letter, and digit
        and be a minimum of 6 characters long. ie.... a password check.

        It's probably easy to do all of these given a completed string first.

        This started out as a python learning project.. <grin> but it's sort
        turned into a real interesting coding situation.




        Thanks for the reply.

        _Ron Adam



        [color=blue][color=green]
        >>
        >> For example, instead of finding a match within a string, I want to
        >> find out, (pass or fail), if a string is a partial match to an re.
        >>
        >> Given an re of 'abcd and a bunch of other stuff'[/color]
        >
        >I'll assume you mean comething like:
        >
        >x = re.compile('abc d and a bunch of other stuff')
        >
        >[color=green]
        >>
        >> This is what i'm looking for:
        >>
        >> string / result
        >> 'a' / pass
        >> 'ab' / pass
        >> 'abc' / pass
        >> 'abd' / fail
        >> 'aaaa' / fail
        >> 'abcd and a bunch of other stuff and then some' / fail
        >>
        >> Is there a way to form a regular expression that will do this?[/color]
        >
        >for k,v in re._cache.items ():
        > if v == x:
        > ss=k[0]
        > break
        >
        >Then it's a normal:
        >[color=green][color=darkred]
        >>>> re.match('a',ss )[/color][/color]
        ><_sre.SRE_Matc h object at 0x009828E0>[color=green][color=darkred]
        >>>> re.match('abd', ss)
        >>>>[/color][/color]
        >[/color]

        I'll give it a try... not sure either. Like I said, this is kind of
        new to me. :)

        [color=blue]
        >
        >Not sure that's what you're looking for, but reasonably sure it won't work
        >in all cases.
        >
        >HTH,
        >
        >Emile van Sebille
        >emile@fenx.c om
        >[/color]

        Comment

        • Ron Adam

          #5
          Re: regular expression reverse match?

          On Tue, 28 Oct 2003 21:19:56 -0600, Jay Dorsey <jay@jaydorsey. com>
          wrote:
          [color=blue]
          >
          >How about:
          >[color=green][color=darkred]
          > >>> matcher = "abcd and a bunch of other stuff"
          > >>> phrases = ["a", "ab", "abc", "abd", "aaaa", "abcd and a bunch of[/color][/color]
          >other stuff and then some"][color=green][color=darkred]
          > >>> for phrase in phrases:[/color][/color]
          >... if phrase == matcher[:len(phrase)]: print "pass"
          >... else: print "fail"
          >...
          >pass
          >pass
          >pass
          >fail
          >fail
          >fail
          >
          >
          >Jay
          >[/color]

          Hi Jay, That was an overly simple example I gave I think. The
          'pattern' will in most case not be just a simple string. And I'll
          only be testing on one string at a time as it forms.

          It's an interactive input routine. So as the person types, I want to
          test the buffer and accept or reject each key according to the
          pattern. Pass or fail.


          This method would work though if it's possible to expand a regular
          expression out to a string of single matching re characters I think.
          Then it would be a matter of doing re.match('ibuff er',
          pattern[:len(ibuffer)].

          I just don't know enough yet, but am learning quickly though,
          thanks.

          _Ron



          Comment

          • Ron Adam

            #6
            Re: regular expression reverse match?

            On Tue, 28 Oct 2003 20:09:38 -0800, "Emile van Sebille"
            <emile@fenx.com > wrote:
            [color=blue]
            >I'll assume you mean comething like:
            >
            >x = re.compile('abc d and a bunch of other stuff')
            >[color=green]
            >>
            >> This is what i'm looking for:
            >>
            >> string / result
            >> 'a' / pass
            >> 'ab' / pass
            >> 'abc' / pass
            >> 'abd' / fail
            >> 'aaaa' / fail
            >> 'abcd and a bunch of other stuff and then some' / fail
            >>
            >> Is there a way to form a regular expression that will do this?[/color]
            >
            >for k,v in re._cache.items ():
            > if v == x:
            > ss=k[0]
            > break
            >
            >Then it's a normal:
            >[color=green][color=darkred]
            >>>> re.match('a',ss )[/color][/color]
            ><_sre.SRE_Matc h object at 0x009828E0>[color=green][color=darkred]
            >>>> re.match('abd', ss)
            >>>>[/color][/color]
            >[/color]


            Hi again Emile,

            I tried it and get an error.
            Here's my result. Am I missing something?
            [color=blue][color=green][color=darkred]
            >>>
            >>> import re
            >>> x = re.compile('abc d and a bunch of other stuff')
            >>> for k,v in re._cache.items ():[/color][/color][/color]
            if v==x:
            ss=k[0]
            break


            Traceback (most recent call last):
            File "<pyshell#2 1>", line 1, in ?
            for k,v in re._cache.items ():
            AttributeError: 'module' object has no attribute '_cache'[color=blue][color=green][color=darkred]
            >>>[/color][/color][/color]


            With a trial and error method, (works for me eventually), this is what
            I've been able to get to work so far. I found the '$' is what I
            needed to limit the buffer length to the pattern.

            e = kb_input('Enter "Y" or "N" please: ', '[YyNn]$')
            f = kb_input('Enter "yes" please: ', 'y$|ye$|yes$')
            g = kb_input('Enter "yes" or "no": ', '(y$|ye$|yes$)| (n$|no$)')
            h = kb_input('Enter a 5 digit number:','\d$|\ d{2}$\d{3}$\d{4 }$\d{5}$')

            New problem: Is there a way to expand an re from:

            'yes$' to 'y$|ye$|yes$'

            and

            '(yes$)|(no$)' to '(y$|ye$|yes$)| (n$|no$)'

            and

            '\d{30}$' to '\d{1}$|\d{2}$| \d{3}$|\d{4}$|\ d{5}$| .......'


            Other expressions that I might use would be:

            '\d{3}-\d{3}-d\{4}$' to match a phone number

            or '\c{40}' to specify a character string 40 characters long.

            but if I have to manually expend these to the above formats they can
            get pretty long.

            ''abcd and a bunch of other stuff' becomes...
            'a$|ab$|abc$|ab cd$|abcd $|abcd a$|abc... etc... etc... ....stuff$'

            Well at least I know what the re's look like now. Time to sleep on it
            and see what tomorrow brings.

            _Ron


            Comment

            • Emile van Sebille

              #7
              Re: regular expression reverse match?


              "Ron Adam" <radam2@tampaba y.rr.com> wrote in message
              news:1jrupvoprk 35v633ecb2lnlu5 r1eu0sceb@4ax.c om...[color=blue]
              > On Tue, 28 Oct 2003 20:09:38 -0800, "Emile van Sebille"
              > <emile@fenx.com > wrote:
              >[color=green]
              > >I'll assume you mean comething like:
              > >
              > >x = re.compile('abc d and a bunch of other stuff')
              > >[color=darkred]
              > >>
              > >> This is what i'm looking for:
              > >>
              > >> string / result
              > >> 'a' / pass
              > >> 'ab' / pass
              > >> 'abc' / pass
              > >> 'abd' / fail
              > >> 'aaaa' / fail
              > >> 'abcd and a bunch of other stuff and then some' / fail
              > >>
              > >> Is there a way to form a regular expression that will do this?[/color]
              > >
              > >for k,v in re._cache.items ():
              > > if v == x:
              > > ss=k[0]
              > > break
              > >
              > >Then it's a normal:
              > >[color=darkred]
              > >>>> re.match('a',ss )[/color]
              > ><_sre.SRE_Matc h object at 0x009828E0>[color=darkred]
              > >>>> re.match('abd', ss)
              > >>>>[/color]
              > >[/color]
              >
              >
              > Hi again Emile,
              >
              > I tried it and get an error.
              > Here's my result. Am I missing something?[/color]

              See, I told you it wouldn't work ;-) I abused an implementation detail
              that apparently doesn't exist on the version you've got.

              [snip]
              [color=blue]
              > New problem: Is there a way to expand an re from:
              >
              > 'yes$' to 'y$|ye$|yes$'
              >
              > and
              >
              > '(yes$)|(no$)' to '(y$|ye$|yes$)| (n$|no$)'
              >
              > and
              >
              > '\d{30}$' to '\d{1}$|\d{2}$| \d{3}$|\d{4}$|\ d{5}$| .......'
              >
              >
              > Other expressions that I might use would be:
              >
              > '\d{3}-\d{3}-d\{4}$' to match a phone number
              >[/color]

              You'll probably want to build a validation routine. Here's one quick idea:

              import re

              def oksofar(pattern , test, sample):
              teststr = "".join([test,sample[len(test):]])
              return not not re.match(patter n, teststr)

              for p,t,s in [
              (r'^(\d{5})$', '123456', '56789'),
              (r'^([A-Z]{2}\d(2))$', 'AB4x', 'XY12'),
              (r'^(\d{5})-(\d{4})$', '55555-1234', '55555-1212')
              ]:
              print p,t,s,not not re.match(p, s)
              for ii in range(len(t)):
              print ii,t[:ii+1], oksofar(p, t[:ii+1], s)


              HTH,

              Emile van Sebille
              emile@fenx.com


              Comment

              • Diez B. Roggisch

                #8
                Re: regular expression reverse match?

                Hi,
                [color=blue]
                > As it is, the program checks the input buffer after each keystroke and
                > determines if the buffer is acceptable against a pattern as the string
                > is being built. A reverse match. It allows new keystrokes to be
                > added to the buffer as long as it's within the pattern.[/color]

                I don't think its possible with regular expressions, or at least the way you
                are using them right now, that is to come from a "in the end, I want to
                look it like this, but it shall accept everything that prefixes a valid
                result".

                That won't work - the reason is simply that regular expressions are
                equivalent to finite state automata. I don't know if you are familiar with
                these, but the consist of states, which are nodes, and transitions between
                these, which are edges/arrows between the states.

                Now ususally there is one special starting state S, and there should be at
                least on reachable end-state. As the names suggest, recoginizing a
                specified String starts at S, and then every read character advances the
                internal state until one of the end-states is reached _and_ there is no
                more input.

                Now lets see at a simple example:

                "abc"

                Here every character becomes a state, and the automata looks like this:

                S-a->(a)-b->(b)-c->[c]

                The -*-> means that this transition is taken when * is matched by the
                current input. So -a-> is taken when an "a" is input.

                [c] is an end-state.

                Now what you want is, that _all_ states that are legal are also end-states.
                That is very easy to accomplish when you implement the automata directly
                like this:

                S-a->[a]-b->[b]-c->[c]

                However, its way more complicated to create this as a rex:

                "a(b(c)?)?"

                This looks straightforward , so you might be successful to create rexes like
                this using a preprocessing step - but the more complicated your rex gets,
                this approach will be hard to follow. Actually, I have currently no idea.
                Which doesn't mean its not doable, I just don't have enough time to think
                about it right now :)

                It looks to me, that if you need this feature not on a per-case base where
                you can think about your rexes more thoroughly, you have to sort of roll
                out your own rex-implmenetation. Which isn't too hard, but definitely
                something thats more than just a couple of lines.

                Regards,

                Diez

                Comment

                • Ron Adam

                  #9
                  Re: regular expression reverse match?

                  On Wed, 29 Oct 2003 00:48:46 -0800, "Emile van Sebille"
                  <emile@fenx.com > wrote:


                  [snip]
                  [color=blue]
                  >
                  >See, I told you it wouldn't work ;-) I abused an implementation detail
                  >that apparently doesn't exist on the version you've got.
                  >[/color]

                  It's not the first time something doesn't work on a winxp system. :-)


                  [snip]
                  [color=blue]
                  >
                  >You'll probably want to build a validation routine. Here's one quick idea:
                  >
                  >import re
                  >
                  >def oksofar(pattern , test, sample):
                  > teststr = "".join([test,sample[len(test):]])
                  > return not not re.match(patter n, teststr)
                  >
                  >for p,t,s in [
                  > (r'^(\d{5})$', '123456', '56789'),
                  > (r'^([A-Z]{2}\d(2))$', 'AB4x', 'XY12'),
                  > (r'^(\d{5})-(\d{4})$', '55555-1234', '55555-1212')
                  > ]:
                  > print p,t,s,not not re.match(p, s)
                  > for ii in range(len(t)):
                  > print ii,t[:ii+1], oksofar(p, t[:ii+1], s)
                  >
                  >
                  >HTH,
                  >
                  >Emile van Sebille
                  >emile@fenx.c om
                  >[/color]


                  Ok, it took me a while to see what you did. Using a sample to
                  complete the partial input is a good Idea, Thanks! :-)

                  This works in the case of regular expressions that are linear and only
                  have one possible path.

                  Doesn't work with expressions that have more than one possible path,
                  such as r'yes$|no$' . I think that's what Diez was trying to explain
                  to me. That this could become rather complex when grouping and
                  alternate cases are present.



                  It seems there are three possible routs to take on this.

                  1. Process the input so it will match the re. This will require
                  generating a set of samples, 1 for each re branch. Then matching with
                  each sample. Using the method above.

                  So now the problem becomes.... <starting to feel as though I fell in
                  a rabbit hole> .... is there a way to generate samples where there
                  is one of each for each re branch?

                  2. Process the re so it will match a subset of all possible final end
                  points. ie... a less than operation.

                  if this were a math problem it might look something like this.

                  string <= yes$ | no$
                  string = (y | ye | yes)$ | (n | no)$
                  string = (y$ | ye$ | yes$) | (n$ | no$)
                  string = y$ | ye$ | yes$ | n$ | no$

                  I was hoping there was a way to do this already. Since there isn't,
                  it would require building a parser to convert strings of 'abc' to
                  'a|ab|abc' and handle distributed operations with ^,$ and other
                  special characters. And there are probably exceptions that will
                  complicate the process. :-/

                  This is method is probably only worth doing if it had wider
                  applications. <shrug> I don't know enough (yet) to know.


                  3. Process both the input and the re.

                  1. string <= yes$|no$

                  2. string <= yes$
                  string <= no$

                  3. generate a sample list, 1 sample for each re
                  4. test the input using each sample for a possible match

                  This might be doable.... will have to think on it and do a little
                  experimenting.




                  Comment

                  • Ron Adam

                    #10
                    Re: regular expression reverse match?

                    On Wed, 29 Oct 2003 11:32:45 +0100, "Diez B. Roggisch"
                    <nospam-deets@web.de> wrote:
                    [color=blue]
                    >Hi,
                    >[color=green]
                    >> As it is, the program checks the input buffer after each keystroke and
                    >> determines if the buffer is acceptable against a pattern as the string
                    >> is being built. A reverse match. It allows new keystrokes to be
                    >> added to the buffer as long as it's within the pattern.[/color]
                    >
                    >I don't think its possible with regular expressions, or at least the way you
                    >are using them right now, that is to come from a "in the end, I want to
                    >look it like this, but it shall accept everything that prefixes a valid
                    >result".
                    >
                    >That won't work - the reason is simply that regular expressions are
                    >equivalent to finite state automata. I don't know if you are familiar with
                    >these, but the consist of states, which are nodes, and transitions between
                    >these, which are edges/arrows between the states.[/color]

                    I'm not familiar with the terms, but I am familiar with tree data
                    structures.


                    [clipped good explanation]

                    [color=blue]
                    >However, its way more complicated to create this as a rex:
                    >
                    >"a(b(c)?)?"
                    >
                    >This looks straightforward , so you might be successful to create rexes like
                    >this using a preprocessing step - but the more complicated your rex gets,
                    >this approach will be hard to follow. Actually, I have currently no idea.
                    >Which doesn't mean its not doable, I just don't have enough time to think
                    >about it right now :)[/color]

                    I'm still definitely learning this, and appreciate the help.

                    In reply to Emily, I compared rex to simplifying a math problem like
                    this. I'm hoping this will lead to a method that will work.
                    [color=blue]
                    > if this were a math problem it might look something like this.
                    >
                    > string <= yes$ | no$
                    > string = (y | ye | yes)$ | (n | no)$
                    > string = (y$ | ye$ | yes$) | (n$ | no$)
                    > string = y$ | ye$ | yes$ | n$ | no$[/color]

                    Using the same approach for the example above might be something like:

                    s <= a( b (c)? )?
                    s <= a( b | bc )?
                    s <= a | ab | abc
                    s = a | (a | ab) | ( a | ab | abc)
                    s = a | a | ab | a | ab | abc
                    s = a | ab | abc

                    I have no idea if what I'm doing here is actually valid. I'm sort of
                    thinking it out and learning as I go. Are there rules and methods to
                    manipulating regular expressions in this manner, rex algebra? Or is
                    this a subset of set theory? I think my statistics instructor did
                    something similar to this. It was a few years ago.

                    [color=blue]
                    >It looks to me, that if you need this feature not on a per-case base where
                    >you can think about your rexes more thoroughly, you have to sort of roll
                    >out your own rex-implmenetation. Which isn't too hard, but definitely
                    >something thats more than just a couple of lines.
                    >
                    >Regards,
                    >
                    >Diez[/color]

                    Looks like I only <obvious understatement> need to create a few
                    functions to manipulate rex expressions. Simpler than a full
                    rex-emplementations , but still definitely more than a few lines of
                    code.

                    _Ron Adam



                    Comment

                    Working...