trying to match a string

  • Andrew Freeman

    #16
    Re: trying to match a string

    John Machin wrote:
    On Jul 20, 5:00 am, Andrew Freeman <alif...@gmail.com> wrote:
    >
    >Andrew Freeman wrote:
    >>
    >>John Machin wrote:
    >>>
    >>>A couple of points:
    >>>(1) Instead of search(r'^blahblah', ...) use match(r'blahblah', ...)
    >>>(2) You need to choose your end-anchor correctly; your pattern is
    >>>permitting a newline at the end:
    >>>>
    >I forgot to change search to match. This should be better:
    >>
    >def match(var):
    >    if re.match(r'[LRM]*\Z', var):
    >        return True
    >    else:
    >        return False
    >>
    >
    A bit wordy ...
    >
    if blahblah:
        return True
    else:
        return False
    >
    can in total generality be replaced by:
    >
    return blahblah
    >
    >
    >
    >I was also thinking if you had a list of these items needing to be
    >verified you could use this:
    >>
    >
    You could, but I suggest you don't use it in a job interview :-)
    >
    >
    > >>> l = ['LLMMRR', '00thLL', 'L', '\n']
    >>
    >
    (1) Don't use 'L'.lower() as a name; it slows down reading as people
    need to fire up their mental parser to distinguish it from the result
    of 3 - 2
    >
    >
    > >>> out = []
    > >>> map(lambda i: match(i)==False or out.append(i), l)
    >>
    (2) Read PEP 8
    (3) blahblah == False ==> not blahblah
    (4) You didn't show the output from map() i.e. something like [None,
    True, None, True]
    (5) or out.append(...) is a baroque use of a side-effect, and is quite
    unnecessary. If you feel inexorably drawn to following the map way,
    read up on the filter and reduce functions. Otherwise learn about list
    comprehensions and generators.
    >
    >
    > >>> print out
    >['LLMMRR', 'L']
    >>
    >>
    >
    Consider this:
    >
    >
    >>> import re
    >>> alist = ['LLMMRR', '00thLL', 'L', '\n']
    >>> zeroplusLRM = re.compile(r'[LRM]*\Z').match
    >>> filter(zeroplusLRM, alist)
    ['LLMMRR', 'L']
    >>> [x for x in alist if zeroplusLRM(x)]
    ['LLMMRR', 'L']
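
A note for anyone trying the session above on Python 3 (the thread itself uses Python 2): there, filter() returns a lazy iterator rather than a list, so the output shown only appears once the result is wrapped in list(). A minimal sketch, reusing the names from the quoted session:

```python
import re

alist = ['LLMMRR', '00thLL', 'L', '\n']
zeroplusLRM = re.compile(r'[LRM]*\Z').match

# In Python 3, filter() is lazy; list() forces it to produce its items.
filtered = list(filter(zeroplusLRM, alist))

# The list comprehension gives a list directly in both Python 2 and 3.
comprehended = [x for x in alist if zeroplusLRM(x)]

print(filtered)
print(comprehended)
```

Both forms keep only the strings for which the bound match method returns a match object.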
    >
    Thank you for the pointers!
    (1) Depending on the typeface, I totally agree: Courier New has a nearly
    indistinguishable 1 and l. I'm using DejaVu Sans Mono (Bitstream Vera
    based). I was just thinking of it as a generic variable name for some
    input. I'm fairly new to Python and programming in general; it's more of
    a hobby.

    (2-3) This is actually the first time I've used map. Maybe I should not
    give extra examples; I was actually using it as a learning tool for
    myself. I'm very thankful the mailing list has such skilled
    contributors as yourself, but I assume that it can't hurt to give
    working code, even though the style is less than perfect.

    (3) Personally I think map(lambda i: match(i)==False or out.append(i),
    l) is a little more readable than map(lambda i: not match(i) or
    out.append(i), l), even if "baroque". Your use of filter is obviously
    much clearer than either.

    (4) I highly doubt that this code was actually to be used in an
    interactive session, the False/True output was truncated intentionally,
    it's an obvious, but superfluous output (unless you were to rely on this
    by attaching it to a variable which might lead to sorting issues).

    (5) Thank you very much, I've read of the filter and reduce functions,
    but haven't used them enough to recognize their usefulness.

    I did realize that a list comprehension would be useful, but wanted to
    try map()
    I put together a generic matcher that returns either a list of True data
    (if the input is a list or tuple) or a boolean value:

    def match(ex, var):
        """ex is the regular expression to match for, var the iterable or
        string to return a list of matching items or a boolean value respectively."""
        ex = re.compile(ex).match
        if isinstance(var, (list, tuple)):
            return filter(ex, var)
        else:
            return bool(ex(var))

    I believe this is fairly clean and succinct code, it would help my
    learning immensely if you feel there is a more succinct, generic way of
    writing this function.
    --
    Andrew


    • John Machin

      #17
      Re: trying to match a string

      On Jul 20, 11:14 am, Andrew Freeman <alif...@gmail.com> wrote:
      > John Machin wrote:
      > (4) I highly doubt that this code was actually to be used in an
      > interactive session,

      The offending code is a nonsense wherever it is used.

      > the False/True output was truncated intentionally,

      What meaning are you attaching to "truncated" ?

      > it's an obvious, but superfluous output (unless you were to rely on this
      > by attaching it to a variable which might lead to sorting issues).
      >
      > I put together a generic matcher that returns either a list of True data
      > (if the input is a list or tuple) or a boolean value:
      >
      > def match(ex, var):
      >     """ex is the regular expression to match for, var the iterable or
      >     string to return a list of matching items or a boolean value respectively."""
      >     ex = re.compile(ex).match

      You lose clarity by rebinding ex like that, and you gain nothing.

      >     if isinstance(var, (list, tuple)):
      >         return filter(ex, var)
      >     else:
      >         return bool(ex(var))
      >
      > I believe this is fairly clean and succinct code, it would help my
      > learning immensely if you feel there is a more succinct, generic way of
      > writing this function.

      You have created a function which does two quite different things
      depending on whether one of the arguments is one of only two of the
      many kinds of iterables and which has a rather generic (match what?)
      and misleading (something which filters matches is called "match"??)
      name. The loss of clarity and ease of understanding caused by the
      readers having to find the code for the function so that they can
      puzzle through it means that the most succinct, generic and
      *recommended* way of writing this function would be not to write it at
      all.

      Write a function which returns a MatchObject. In the unlikely event
      that anyone really wants to put bools in a list and sort them,
      then they can wrap bool() around it. Give it a meaningful name e.g.
      match_LRM.

      You want to check if a single variable refers to a valid LRM string?
      Use match_LRM(the_variable). Nice and clear.

      You want to filter out of some iterable all the occurrences of valid
      LRM strings? Use filter (whose name indicates its task) or a generator
      or list comprehension ... what [x for x in some_iterable if
      match_LRM(x)] does should be screamingly obvious i.e. have less chance
      than filter of needing a trip to the manual.
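
A minimal sketch of the advice above, written in Python 3 syntax. The name match_LRM comes from the post; the pattern is the one used earlier in the thread, and the sample values (the_variable, some_iterable) are only illustrative assumptions.

```python
import re

# match_LRM is the bound match method of a compiled pattern:
# it returns a match object on success and None on failure.
match_LRM = re.compile(r'[LRM]*\Z').match

# Checking a single value: the truthiness of the result is enough.
the_variable = 'LLMMRR'
if match_LRM(the_variable):
    print('valid')

# Filtering an iterable with a list comprehension.
some_iterable = ['LLMMRR', '00thLL', 'L', '\n', '']
valid = [x for x in some_iterable if match_LRM(x)]
print(valid)

# Only if actual booleans are needed, wrap the call in bool().
as_bools = [bool(match_LRM(x)) for x in some_iterable]
print(as_bools)
```

Note that the empty string passes, because [LRM]* allows zero characters; whether that is wanted depends on the original requirement.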

      HTH,
      John


      • Andrew Freeman

        #18
        Re: trying to match a string

        John Machin wrote:
        > On Jul 20, 11:14 am, Andrew Freeman <alif...@gmail.com> wrote:
        >> John Machin wrote:
        >> (4) I highly doubt that this code was actually to be used in an
        >> interactive session,
        >
        > The offending code is a nonsense wherever it is used.
        >
        >> the False/True output was truncated intentionally,
        >
        > What meaning are you attaching to "truncated" ?

        I'm attaching the meaning of "deleted the line (manually (not in
        Python))" to truncated. I'm actually using IPython, but thought it would
        be good practice to type it out as if it came from the standard
        interpreter. I also thought it would be OK to leave out some output which
        I considered unnecessary.


        • oj

          #19
          Re: trying to match a string

          On Jul 19, 3:04 am, Andrew Freeman <alif...@gmail.com> wrote:
          let me revise it please:
          >
          To show if valid:
          >
          if re.search(r'^[LRM]*$', 'LM'):
              print 'Valid'
          Fine, this works, although match instead of search blah blah blah as
          has already been mentioned. I still think searching for one invalid
          character is more elegant than trying to match the entire string, but
          that's just personal preference, I guess.
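
One caveat on the pattern just quoted, picking up the end-anchor point raised earlier in the thread: '$' also matches just before a trailing newline, while '\Z' matches only at the very end of the string. A small sketch of the difference (Python 3 syntax; the sample string is an assumption chosen to show the edge case, and re.fullmatch is available from Python 3.4 on):

```python
import re

candidate = 'LM\n'  # valid characters followed by a trailing newline

# '$' matches at the end of the string and also just before a final newline.
dollar = re.search(r'^[LRM]*$', candidate)

# '\Z' matches only at the very end of the string.
strict = re.match(r'[LRM]*\Z', candidate)

# re.fullmatch requires the whole string to match the pattern.
full = re.fullmatch(r'[LRM]*', candidate)

print(bool(dollar))
print(bool(strict))
print(bool(full))
```

So the '$' version accepts 'LM\n', while the other two reject it.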
          >
          To show if invalid,
          >
          if re.search(r'^[^LRM]*$', '0'):
              print 'Invalid'
          No. This is wrong. This only matches strings that consist entirely of
          characters that are not L, R or M:
          >>> import re
          >>> if re.search(r'^[^LRM]*$', 'ZZZLZZZ'):
          ...     print "Invalid"
          ...
          >>>
          This doesn't print "Invalid" because there is one non-invalid
          character there, which is clearly not what the OP wanted.
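
For comparison, a sketch of the "search for one invalid character" idea done without anchors, in Python 3 syntax. The helper name has_invalid_char and the sample strings are hypothetical, not from the thread.

```python
import re

def has_invalid_char(s):
    """Return True if s contains any character other than L, R or M."""
    # No anchors and no '*': a single offending character anywhere is enough.
    return re.search(r'[^LRM]', s) is not None

for sample in ['ZZZLZZZ', 'LLMMRR', '0', 'LM\n', '']:
    verdict = 'Invalid' if has_invalid_char(sample) else 'Valid'
    print(repr(sample), verdict)
```

Because a negated character class also matches a newline, a trailing '\n' is reported as invalid here, and the empty string counts as valid.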


          • Fredrik Lundh

            #20
            Re: trying to match a string

            oj wrote:
            > Fine, this works, although match instead of search blah blah blah as
            > has already been mentioned. I still think searching for one invalid
            > character is more elegant than trying to match the entire string, but
            > that's just personal preference, I guess.

            The drawback is that it's a lot easier to mess up the edge cases if you
            do that (as this thread has shown). The small speedup you get in
            typical cases is quickly offset by extra debugging/testing time (or, for
            that matter, arguing with c.l.py:ers over more or less contrived ways to
            interpret the original post).

            Guess it's up to personal preferences for how to best help others.
            Unless the OP explicitly asks for something else, I prefer to use simple
            and straight-forward solutions with reasonable execution behaviour over
            clever tricks or odd-ball solutions; it's not a JAPH contest, after all.

            </F>


            • oj

              #21
              Re: trying to match a string

              On Jul 21, 11:04 am, Fredrik Lundh <fred...@pythonware.com> wrote:
              > The drawback is that it's a lot easier to mess up the edge cases if you
              > do that (as this thread has shown).  The small speedup you get in
              > typical cases is quickly offset by extra debugging/testing time (or, for
              > that matter, arguing with c.l.py:ers over more or less contrived ways to
              > interpret the original post).

              I disagree, from this thread, most of the erroneous solutions have
              been attempts to match the entire string.

              > Guess it's up to personal preferences for how to best help others.
              > Unless the OP explicitly asks for something else, I prefer to use simple
              > and straight-forward solutions with reasonable execution behaviour over
              > clever tricks or odd-ball solutions; it's not a JAPH contest, after all.

              [^LRM] *is* a simple and straight-forward regex - it isn't attempting
              to do any clever tricks or anything odd-ball.

              That said, I still think the sets solution is more elegant than the
              regex solutions.
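
The sets solution mentioned here does not appear in this part of the thread. One common form of the idea, offered as an assumption about what was meant, is a subset test; the names VALID_CHARS and is_valid_lrm are hypothetical.

```python
# Hypothetical names; the thread's own sets solution is not shown in this excerpt.
VALID_CHARS = set('LRM')

def is_valid_lrm(s):
    """Return True if every character of s is one of L, R or M."""
    # set(s) <= VALID_CHARS is a subset test on the characters of s.
    return set(s) <= VALID_CHARS

for sample in ['LLMMRR', '00thLL', 'L', '\n', '']:
    print(repr(sample), is_valid_lrm(sample))
```

As with the regex versions, the empty string is accepted, since the empty set is a subset of any set.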
