Regex problem, match if line contains <a>, unless it also contains <b>

  • James Dyer

    Regex problem, match if line contains <a>, unless it also contains <b>

    I'm having problems getting a regex to work.
    Basically, given two search parameters ($search1 and $search2), it
    should allow me to filter a log file such that lines with the $search1
    string in are printed, unless the $search2 string is also in that line
    somewhere (either before or after $search1).

    I'm creating my regex like this:
    $compiled_regex = qr/^(?!.*$search2)$search1(?!.*$search2)/;

    I then use it:

    while( <> ) {
    next if( $_ !~ /$compiled_regex/ );
    print $_ . "\n";
    }

    With the following test data:

    2004-02-18 04:06:50 1AtIua-0001Hh-00 -> sysadmin@foobar.com R=lookuphost T=remote_smtp H=mxhost-1.foo.bar [0.0.0.0]
    2004-02-19 04:02:02 1AtfNx-0008DC-00 -> sysadmin@foobar.com R=lookuphost T=remote_smtp H=mxhost-1.foo.bar [0.0.0.0]
    2004-02-19 04:07:26 1AtfO5-0008Gs-00 -> sysadmin@foobar.com R=lookuphost T=remote_smtp H=mxhost-1.foo.bar [0.0.0.0]

    If $search1 is set to 'sysadmin', and $search2 is set to '0008Gs',
    none of the lines in the data are displayed, whereas I would expect the
    first two to be displayed.

    With this test data:
    foo
    foo foo
    foo foo foo
    foo bar
    bar foo
    foo bar foo
    foo bar bar
    bar foo bar
    bar foo foo
    bar
    bar bar
    bar bar bar

    $search1 set to 'foo', and $search2 set to 'bar', I get the
    expected results (foo, foo foo and foo foo foo displayed).

    I just can't figure out why nothing is being displayed in my first test case.
    My gut instinct is that it's got something to do with the special-ish
    characters in the data ('-', '>' etc.), but I'm not sure.

    Any thoughts?

    J
  • toylet

    #2
    Re: Regex problem, match if line contains <a>, unless it also contains <b>

    Shouldn't it be ".*$search2 ?" rather than "?.*$search2" ?
    > I'm creating my regex like this:
    > $compiled_regex = qr/^(?!.*$search2)$search1(?!.*$search2)/;

    --
    .~. Might, Courage, Vision. In Linux We Trust.
    / v \ http://www.linux-sxs.org
    /( _ )\ Linux 2.4.22-xfs
    ^ ^ 4:14pm up 5:47 1 user 1.01 1.00


    • James Dyer

      #3
      Re: Regex problem, match if line contains <a>, unless it also contains <b>

      OK, I was being stupid, and really not thinking about what my regex
      was actually doing.
      I've now solved the problem - for those of you who are interested,
      this appears to work:

      $compiled_regex = qr/^(?!.*$search2).*$search1/;

      J
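A minimal sketch of the filter above, run against the log lines from the first post. The build_filter helper and the \Q...\E quoting are assumptions added for illustration (they are not in the thread); the quoting simply treats both search strings as literal text. The (?!...) group is a zero-width negative lookahead: anchored at the start of the line, it fails if $search2 appears anywhere on that line.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): build the single-regex filter.
sub build_filter {
    my ($search1, $search2) = @_;
    # ^(?!.*X) rejects the line if X occurs anywhere in it;
    # .*Y then requires Y somewhere on the same line.
    return qr/^(?!.*\Q$search2\E).*\Q$search1\E/;
}

my @log = (
    '2004-02-18 04:06:50 1AtIua-0001Hh-00 -> sysadmin@foobar.com R=lookuphost T=remote_smtp H=mxhost-1.foo.bar [0.0.0.0]',
    '2004-02-19 04:02:02 1AtfNx-0008DC-00 -> sysadmin@foobar.com R=lookuphost T=remote_smtp H=mxhost-1.foo.bar [0.0.0.0]',
    '2004-02-19 04:07:26 1AtfO5-0008Gs-00 -> sysadmin@foobar.com R=lookuphost T=remote_smtp H=mxhost-1.foo.bar [0.0.0.0]',
);

my $re   = build_filter('sysadmin', '0008Gs');
my @kept = grep { /$re/ } @log;

# Only the two lines without '0008Gs' should be printed.
print "$_\n" for @kept;
```

With the data above this should keep the 0001Hh and 0008DC lines and drop the 0008Gs one.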



      • toylet

        #4
        Re: Regex problem, match if line contains <a>, unless it also contains <b>

        > I've now solved the problem - for those of you who are interested,
        > this appears to work:
        > $compiled_regex = qr/^(?!.*$search2).*$search1/;

        what's the meaning of "?!" in the regex?


        • toylet

          #5
          Re: Regex problem, match if line contains <a>, unless it also contains <b>

          >> I've now solved the problem - for those of you who are interested,
          >> this appears to work:
          >> $compiled_regex = qr/^(?!.*$search2).*$search1/;
          >
          > what's the meaning of "?!" in the regex?

          I figured it out. I need to force the context of the $! variable.

          print int($!) . $!;

          int($!) prints the error number; the second $! prints the message.
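A small sketch of the context behaviour described in this post: $! yields the errno number in numeric context and the error message in string context. Setting $! by hand to 2 is purely for illustration, and the message text depends on the operating system and locale, so none is shown here.

```perl
use strict;
use warnings;

# Assumption for illustration: set errno by hand instead of triggering a real failure.
$! = 2;

my $errno   = $! + 0;   # numeric context: the error number
my $message = "$!";     # string context: the system's message for that number

print "$errno: $message\n";
```

Adding 0 and calling int($!) are two ways of forcing the same numeric context.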



          • nobull@mail.com

            #6
            Re: Regex problem, match if line contains <a>, unless it also contains <b>

            jad@hungover.org (James Dyer) wrote in message news:<392df3a3.0402190543.80b22ea@posting.google.com>...

            > $compiled_regex = qr/^(?!.*$search2)$search1(?!.*$search2)/;

            Ignoring the possibility that $search1 matches a newline, the second
            (?!.*$search2) is redundant. It can never fail to match, since the regex
            engine wouldn't get that far if there was a match for $search2
            anywhere in the data.

            $compiled_regex = qr/^(?!.*$search2)$search1/s;

            > 2004-02-18 04:06:50 1AtIua-0001Hh-00 -> sysadmin@foobar.com R=lookuphost T=remot

            > If $search1 is set to 'sysadmin', and $search2 is set to '0008Gs',

            You are only looking for $search1 at the start of the string. You
            probably wanted:

            $compiled_regex = qr/^(?!.*$search2).*$search1/s;
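The difference between the two patterns can be checked on a shortened copy of the first log line. This is only a sketch: the variable names $anchored and $anywhere are made up for the comparison, and the search strings are interpolated unquoted, as in the thread.

```perl
use strict;
use warnings;

my ($search1, $search2) = ('sysadmin', '0008Gs');
my $line = '2004-02-18 04:06:50 1AtIua-0001Hh-00 -> sysadmin@foobar.com';

# Requires $search1 immediately at the start of the line.
my $anchored = qr/^(?!.*$search2)$search1/s;
# Allows $search1 anywhere after the start of the line.
my $anywhere = qr/^(?!.*$search2).*$search1/s;

my $anchored_hit = $line =~ $anchored ? 1 : 0;
my $anywhere_hit = $line =~ $anywhere ? 1 : 0;

print "anchored: $anchored_hit, anywhere: $anywhere_hit\n";
```

The line starts with the date rather than 'sysadmin', so only the second pattern matches. That is also why the foo/bar data appeared to work: every expected line there begins with 'foo'.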

            Note - using a single regex for this is probably not a good idea
            unless you are forced into doing so by the fact that you are calling
            an existing function that you can't modify and that takes a single
            regex as an argument.

            If you are not compelled to use a single regex it is clearer, and
            probably faster to use two.

            /$search1/ && !/$search2/
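As a sketch of the two-regex approach, here is the same test applied to part of the foo/bar data from the first post. The wanted() helper and the \Q...\E quoting are assumptions added for illustration, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the line contains $search1 but not $search2.
sub wanted {
    my ($line, $search1, $search2) = @_;
    return $line =~ /\Q$search1\E/ && $line !~ /\Q$search2\E/;
}

my @data = ('foo', 'foo foo', 'foo bar', 'bar foo', 'foo bar foo', 'bar');
my @kept = grep { wanted($_, 'foo', 'bar') } @data;

# Only the lines made up entirely of 'foo' should remain.
print "$_\n" for @kept;
```

Two plain matches avoid the lookahead entirely, which keeps the intent readable at a glance.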

            > Any thoughts?

            Well since you ask...

            This topic has been frequently discussed in the Perl newsgroups that
            exist on Usenet. I think you should have done a search before you
            posted. Having decided you wanted to post I think you should have
            done so to a newsgroup that still exists. This one doesn't (see FAQ)
            so very few people will see what you post here.
