Regular expression to exclude lines?

  • Shannon Jacobs

    Regular expression to exclude lines?

    Sorry to ask what is surely a trivial question. Also sorry that I don't have
    my current code version on hand, but... Anyway, there must be some problem
    with trying to do the negative. It seems like I get into these ruts every
    time I try to deal with regular expressions.

    All I'm trying to do is delete the lines which don't contain a particular
    string. Actually a filter to edit a log file. I can find and replace a thing
    with null, but can't figure out how to find the lines which do not contain
    the thing.

    Going further, I want to generalize and use a JavaScript variable containing
    the decision string, but first I need to worry about the not-within-a-line
    problem.

  • Thomas 'PointedEars' Lahn

    #2
    Re: Regular expression to exclude lines?

    Shannon Jacobs wrote:
    > Sorry to ask what is surely a trivial question.

    Hm, I don't think it is that trivial.

    > All I'm trying to do is delete the lines which don't contain a
    > particular string. Actually a filter to edit a log file. I can
    > find and replace a thing with null, but can't figure out how to
    > find the lines which do not contain the thing.

    Here's a quickhack that filters out of three lines the one that
    does not contain the word `line':

    alert("this is a line\nthis is a\nthis is a line".match(/\n*[^\n]*\n*([^\n]*[^l][^i][^n][^e][^\n]*\n)*[^\n]*\n*/)[1])

    But there must be a better way. IIRC there is something
    called `negative lookahead', supported from JavaScript 1.5 on,
    which I have not yet worked with.


    PointedEars


    • Lasse Reichstein Nielsen

      #3
      Re: Regular expression to exclude lines?

      Thomas 'PointedEars' Lahn <PointedEars@web.de> writes:

      > Shannon Jacobs wrote:
      >
      >> Sorry to ask what is surely a trivial question.
      >
      > Hm, I don't think it is that trivial.

      Neither do I. Negative matches in regular expressions rarely are.

      > Here's a quickhack that filters out of three lines the one that
      > does not contain the word `line':
      >
      > alert("this is a line\nthis is a\nthis is a
      > line".match(/\n*[^\n]*\n*([^\n]*[^l][^i][^n][^e][^\n]*\n)*[^\n]*\n*/)[1])

      That's purely accidental. If you add a line in front, e.g.,
      "bad thing\nthis is a line\nthis is a\n this is a line", it matches
      the string containing the second and third line.

      > But there must be a better way, IIRC there is something
      > called `negative lookahead', supported from JavaScript 1.5 on,
      > which I have not yet worked with.

      Negative lookahead might be an easier way to do it.

      The hard way:

      /^([^l\n]*(l[^i]|li[^n]|lin[^e]))*([^l\n])*$/m
      (any "l" is not followed by "ine")

      With negative lookahead:
      /^([^l\n]*l(?!ine))*[^l\n]*$/m

      The "m" at the end makes "^" and "$" match beginning/end of line.
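A quick way to see what the "m" flag changes is to run the same anchored pattern with and without it. This is a minimal sketch; the sample string is only illustrative:

```javascript
var text = "bad thing\nthis is a line";

// Without "m", ^ and $ only match at the very start and end of the string.
var whole = /^this is a line$/;
// With "m", ^ and $ also match right after and right before each "\n".
var perLine = /^this is a line$/m;

console.log(whole.test(text));   // false: the string starts with "bad thing"
console.log(perLine.test(text)); // true: the second line matches by itself
```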

      These regexps only check for the letters "line", not whether they
      occur as a word. To do that, one must check for word boundaries around it:

      Hard:
      /^([^l\n]*(\bl([^i]|i[^n]|in[^e]|ine\B)|\Bl))*[^l\n]*$/m
      Easy:
      /^([^l\n]*(\bl(?!ine\b)|\Bl))*[^l\n]*$/m

      Any "l" right after a word boundary is not followed by ine+word boundary.

      To test this regexp, try:
      ---
      var regexp = /^([^l\n]*(\bl(?!ine\b)|\Bl))*[^l\n]*$/mg;
      var lines = "nonline\nline\nlinefeed\nwith line in the middle\n"+
      "no l-word here\n\nprevious l-word was empty\nand ending in line";
      var dellines = lines.replace(regexp,"---DELETED---");
      alert(lines);
      alert(dellines) ;
      ---


      A longer explanation of:
      /^([^l\n]*(\bl(?!ine\b)|\Bl))*[^l\n]*$/m
        ^              beginning of line
        [^l\n]*        some non-l/non-newlines
        \bl(?!ine\b)   either word boundary + l not followed by "ine" + word boundary
        |\Bl           or l not after a word boundary
        ( ... )*       any number of times
        [^l\n]*        and then some non-l/non-newlines again
        $              end of line
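To see the difference between the letters-only and the whole-word versions, the two lookahead patterns can be tested one line at a time. This is a sketch: the sample lines come from the test string above, and the "m" flag is dropped because each call tests a single line.

```javascript
// true when the letters "line" do not occur anywhere on the line
var lettersOnly = /^([^l\n]*l(?!ine))*[^l\n]*$/;
// true when "line" does not occur as a whole word on the line
var wholeWord = /^([^l\n]*(\bl(?!ine\b)|\Bl))*[^l\n]*$/;

console.log(lettersOnly.test("nonline"), wholeWord.test("nonline"));   // false true
console.log(lettersOnly.test("linefeed"), wholeWord.test("linefeed")); // false true
console.log(lettersOnly.test("with line in the middle"),
            wholeWord.test("with line in the middle"));              // false false
console.log(lettersOnly.test("no l-word here"),
            wholeWord.test("no l-word here"));                        // true true
```

Only "with line in the middle" is kept by both; "nonline" and "linefeed" contain the letters but not the word.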

      Good luck:)
      /L
      --
      Lasse Reichstein Nielsen - lrn@hotpop.com
      DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
      'Faith without judgement merely degrades the spirit divine.'


      • Evertjan.

        #4
        Re: Regular expression to exclude lines?

        Lasse Reichstein Nielsen wrote on 24 nov 2003 in comp.lang.javascript:
        > Negative lookahead might be an easier way to do it.

        What about this non-greedy "*?" form:

        <script>

        function replLine(x,t){
        t+="\n"
        var re = new RegExp("[^\n]*?"+x+"[^\n]*\n","g");
        t = t.replace(re ,"")
        return t.replace(/\n$/,"")
        }

        tx="bad thing\nthis is a line\nthis is a\n this is a line"

        alert(replLine( "thing",tx) )
        alert(replLine( "line",tx))

        </script>




        --
        Evertjan.
        The Netherlands.
        (Please change the x'es to dots in my emailaddress)


        • Evertjan.

          #5
          Re: Regular expression to exclude lines?

          Evertjan. wrote on 24 nov 2003 in comp.lang.javascript:

          > Lasse Reichstein Nielsen wrote on 24 nov 2003 in comp.lang.javascript:
          >> Negative lookahead might be an easier way to do it.
          >
          > What about this non greedy "*?" form:
          >
          > <script>
          >
          > function replLine(x,t){
          > t+="\n"
          > var re = new RegExp("[^\n]*?"+x+"[^\n]*\n","g");
          > t = t.replace(re ,"")
          > return t.replace(/\n$/,"")
          >}
          >
          > tx="bad thing\nthis is a line\nthis is a\n this is a line"
          >
          > alert(replLine( "thing",tx) )
          > alert(replLine( "line",tx))
          >
          > </script>

          "All I'm trying to do is delete the lines which don't contain a
          particular string. "

          Wow, I missed the "n't"

          I will try again later.



          • Evertjan.

            #6
            Re: Regular expression to exclude lines?

            Evertjan. wrote on 24 nov 2003 in comp.lang.javascript:

            > "All I'm trying to do is delete the lines which don't contain a
            > particular string. "
            >
            > Wow, I missed the "n't"
            >
            > I will try again later.

            This better?

            <script>

            function replLine(x,t){
            var re = new RegExp(x,"");
            t+="\n"
            t = t.replace(
            /.*?\n/g,
            function($0,$1, $2)
            {return (!re.test($0))? $0:""}
            )
            return t.replace(/\n$/,"")
            }

            tx="bad thing\nthis is a line\nthis is a\n this is a line"

            alert(replLine( "thing",tx) )
            alert(replLine( "line",tx))

            </script>





            • Evertjan.

              #7
              Re: Regular expression to exclude lines?

              Evertjan. wrote on 24 nov 2003 in comp.lang.javascript:

              > Evertjan. wrote on 24 nov 2003 in comp.lang.javascript:
              >
              >> "All I'm trying to do is delete the lines which don't contain a
              >> particular string. "
              >>
              >> Wow, I missed the "n't"
              >>
              >> I will try again later.
              >
              > This better?
              >
              > <script>
              >
              > function replLine(x,t){
              > var re = new RegExp(x,"");
              > t+="\n"
              > t = t.replace(
              > /.*?\n/g,
              > function($0,$1, $2)
              > {return (!re.test($0))? $0:""}
              > )
              > return t.replace(/\n$/,"")
              >}
              >
              > tx="bad thing\nthis is a line\nthis is a\n this is a line"
              >
              > alert(replLine( "thing",tx) )
              > alert(replLine( "line",tx))
              >
              > </script>

              Monologue follows.

              Damn, forgot to remove the "!"

              <script>

              function replLine(x,t){
              var re = new RegExp(x,"");
              t+="\n"
              t = t.replace(
              /.*?\n/g,
              function($0,$1, $2)
              {return (re.test($0))?$0:""}
              )
              return t.replace(/\n$/,"")
              }

              tx="bad thing\nthis is a line\nthis is a\n this is a line"

              alert(replLine( "thing",tx) )
              alert(replLine( "line",tx))

              </script>





              • Lasse Reichstein Nielsen

                #8
                Re: Regular expression to exclude lines?

                "Evertjan." <exjxw.hannivoort@interxnl.net> writes:

                > Evertjan. wrote on 24 nov 2003 in comp.lang.javascript:
                >> This better?

                > Monologue follows.
                >
                > Damn, forgot to remove the "!"

                > function replLine(x,t){
                > var re = new RegExp(x,"");
                > t+="\n"
                > t = t.replace(
                > /.*?\n/g,
                > function($0,$1, $2)
                > {return (re.test($0))?$0:""}
                > )
                > return t.replace(/\n$/,"")
                > }

                This first splits the string into lines, and then replaces each line
                based on a second test.
                It should work (and seems to).

                I don't think you need a non-greedy match (.*?) since . doesn't match
                a newline character.


                Maybe you can get around adding the extra "\n" by using multiline
                matching: /^.*$/gm.
                It doesn't remove the newlines in the string, though. None of my
                attempts have done that so far.
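One way to make the dropped lines take their newlines with them is to match each line together with its trailing "\n" and let a replacement function decide. A minimal sketch, assuming a hypothetical helper named keepLines and a test pattern without the "g" flag (so that test() does not carry lastIndex from one call to the next):

```javascript
// Hypothetical helper: keep only the lines of `text` whose content matches `re`.
function keepLines(text, re) {
  // Each match is one line (captured as `body`) plus its "\n", if any,
  // so a dropped line disappears together with its newline.
  return text.replace(/^(.*)\n?/gm, function (whole, body) {
    return re.test(body) ? whole : "";
  });
}

var tx = "bad thing\nthis is a line\nthis is a\n this is a line";
console.log(keepLines(tx, /line/)); // this is a line\n this is a line
```

If the last kept line is not the last line of the input, its newline stays: keepLines(tx, /thing/) gives "bad thing\n".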

                This method uses several regexp matches, not just one (which is
                sometimes the better approach :), but the first is really just
                splitting into lines. You can use the split method for that.

                How about this:

                // returns new array containing only those elements that match re
                Array.prototype.filter = function filter(re) {
                var res = [];
                for (var i=0;i<this.length;i++) {
                if (re.test(this[i])) {res.push(this[i]);}
                }
                return res;
                }

                var tx="bad thing\nthis is a line\nthis is a\n this is a line";
                alert(tx.split("\n").filter(/line/).join("\n"));

                Sadly, adding properties to Array.prototype means that you can't
                (easily) use
                for (var i in this)
                to iterate through a sparse array. The filter method is enumerable,
                so it is also included.
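In engines that implement ECMAScript 5 or later, Array.prototype.filter already exists and takes a predicate function rather than a RegExp, so assigning the version above would replace the built-in. A sketch of the same split/filter/join idea using the built-in method, with nothing added to Array.prototype (built-in prototype methods are non-enumerable, so for...in loops are not affected):

```javascript
var tx = "bad thing\nthis is a line\nthis is a\n this is a line";

// The built-in filter calls the predicate once per element and keeps
// the elements for which it returns true.
var kept = tx.split("\n").filter(function (line) {
  return /line/.test(line);
}).join("\n");

console.log(kept);
```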

                /L


                • Evertjan.

                  #9
                  Re: Regular expression to exclude lines?

                  Lasse Reichstein Nielsen wrote on 24 nov 2003 in comp.lang.javascript:
                  > I don't think you need a non-greedy match (.*?) since . doesn't match
                  > a newline character.


                  True !



                  • Dr John Stockton

                    #10
                    Re: Regular expression to exclude lines?

                    JRS: In article <3FC2262D.4060701@PointedEars.de>, seen in
                    news:comp.lang.javascript, Thomas 'PointedEars' Lahn
                    <PointedEars@web.de> posted at Mon, 24 Nov 2003 16:39:25 :-
                    >Shannon Jacobs wrote:
                    >
                    >> Sorry to ask what is surely a trivial question.
                    >
                    >Hm, I don't think it is that trivial.
                    >
                    >> All I'm trying to do is delete the lines which don't contain a
                    >> particular string. Actually a filter to edit a log file. I can
                    >> find and replace a thing with null, but can't figure out how to
                    >> find the lines which do not contain the thing.
                    >
                    >Here's a quickhack that filters out of three lines the one that
                    >does not contain the word `line':
                    >
                    >alert("this is a line\nthis is a\nthis is a
                    >line".match(/\n*[^\n]*\n*([^\n]*[^l][^i][^n][^e][^\n]*\n)*[^\n]*\n*/)[1])
                    >
                    >But there must be a better way, IIRC there is something
                    >called `negative lookahead', supported from JavaScript 1.5 on,
                    >which I have not yet worked with.


                    AIUI, the OP wants a file which is the previous file minus those lines
                    which do not contain the string. That code, after broken-string
                    correction, pops up a box showing the first unwanted line.

                    Javascript "alone" is not capable of file handling, AFAICS.

                    If the OP can read and write the file line by line, controlled by
                    javascript, and apply script to each line, then it is only necessary to
                    do (pseudo-code follows)

                    while not EoF(FI) do begin Readln(FI, S) ; // pascal
                    if ( ! /«string»/.test(S) ) continue // javascript
                    Writeln(FO, S) end ; // pascal
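In a present-day host that does provide file I/O, such as Node.js, the loop above might be sketched as follows. The helper name filterFile and the file names are hypothetical; the whole file is read into memory, which suits moderate log files (for very large ones, Node's readline module can stream the lines instead); and the pattern should not carry the "g" flag.

```javascript
var fs = require("fs");

// Hypothetical helper: write to outPath only those lines of inPath
// that match `pattern`, as in the pseudo-code above.
function filterFile(inPath, outPath, pattern) {
  var lines = fs.readFileSync(inPath, "utf8").split(/\r?\n/);
  // A final newline leaves one empty element at the end; drop it.
  if (lines[lines.length - 1] === "") {
    lines.pop();
  }
  var out = "";
  for (var i = 0; i < lines.length; i++) {
    if (!pattern.test(lines[i])) continue; // same test as the pseudo-code
    out += lines[i] + "\n";               // Writeln, with "\n" line endings
  }
  fs.writeFileSync(outPath, out);
}

// Usage: filterFile("old.log", "new.log", /mystring/);
```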




                    The OP has MSOE, which suggests Windows. If the job is to be run in
                    DOS, Windows, or UNIX, then the task is trivial using MiniTrue, which
                    IMHO is a most valuable tool. Example :

                    mtr -no~ jt.htm - e

                    will put, on standard output, all those lines of jt.htm which do not
                    contain the letter e. A RegExp can be used for the search, in place
                    of e. There may be a way of doing it without using standard output.

                    --
                    © John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 MIME. ©
                    Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
                    I find MiniTrue useful for viewing/searching/altering files, at a DOS prompt;
                    free, DOS/Win/UNIX, <URL:http://www.idiotsdelight.net/minitrue/> Update soon?


                    • Mark Szlazak

                      #11
                      Re: Regular expression to exclude lines?

                      Try this to exclude lines that don't have "something" in them:

                      rx = /^(?:(?!\bsomething\b).)*$/gm;
                      output = input.replace(rx,'');
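Since the original question also asked about taking the decision string from a JavaScript variable, the same pattern can be assembled with the RegExp constructor. A sketch, assuming hypothetical helpers escapeRegExp and linesWithoutWord; escaping makes characters such as "." or "+" in the string match literally. Note that \b only acts as a word boundary next to word characters, and that, as Lasse observed, replacing with '' leaves the newlines behind as empty lines.

```javascript
// Hypothetical helper: backslash-escape every character that is special
// in a RegExp, so an arbitrary string is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: a global, multiline RegExp matching every line
// that does NOT contain `word` as a whole word.
function linesWithoutWord(word) {
  return new RegExp("^(?:(?!\\b" + escapeRegExp(word) + "\\b).)*$", "gm");
}

var input = "bad thing\nthis is a line\nthis is a\n this is a line";
var output = input.replace(linesWithoutWord("line"), "");
console.log(JSON.stringify(output)); // "\nthis is a line\n\n this is a line"
```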




                      • Dr John Stockton

                        #12
                        Re: Regular expression to exclude lines?

                        JRS: In article <3fc1fe91$0$598$44c9b20d@news3.asahi-net.or.jp>, seen
                        in news:comp.lang.javascript, Shannon Jacobs <shanen@my-deja.com> posted
                        at Mon, 24 Nov 2003 21:50:22 :-

                        >All I'm trying to do is delete the lines which don't contain a particular
                        >string. Actually a filter to edit a log file. I can find and replace a thing
                        >with null, but can't figure out how to find the lines which do not contain
                        >the thing.

                        Of course, for a *particular* string, not requiring a RegExp, DOS batch
                        provides the answer, and there must surely be a UNIX equivalent.

                        find "mystring" < old.log > new.log

                        It seems likely that someone has written a version of DOS find that
                        accepts RegExps; given RegExp code in library form, the job seems
                        trivial. UNIX has grep, which should do; and there are ports of grep to
                        DOS & Windows. Also, WSH has file I/O and RegExps, AIUI.

                        The important thing seems to be to not bother with a RegExp
                        substitution, but to work line-by-line and do a RegExp (or other) test
                        of acceptability.

                        --
                        © John Stockton, Surrey, UK. ?@merlyn.demon.co.uk DOS 3.3, 6.20; Win98. ©
                        Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms & links.
                        PAS EXE TXT ZIP via <URL:http://www.merlyn.demon.co.uk/programs/00index.htm>
                        My DOS <URL:http://www.merlyn.demon.co.uk/batfiles.htm> - also batprogs.htm.


                        • Thomas 'PointedEars' Lahn

                          #13
                          Re: Regular expression to exclude lines?

                          Dr John Stockton wrote:
                          > Of course, for a *particular* string, not requiring a RegExp, DOS batch
                          > provides the answer, and there must surely be a UNIX equivalent.
                          >
                          > find "mystring" < old.log > new.log

                          grep 'mystring' old.log >new.log 2>&1

                          The single quotes are only required if special shell expressions are
                          used but not escaped. 2>&1 captures (error) messages in new.log, too.
                          If you do not want that, leave it out.


                          PointedEars


                          • Dr John Stockton

                            #14
                            Re: Regular expression to exclude lines?

                            JRS: In article <3FC4D461.3000508@PointedEars.de>, seen in
                            news:comp.lang.javascript, Thomas 'PointedEars' Lahn
                            <PointedEars@web.de> posted at Wed, 26 Nov 2003 17:27:13 :-
                            >Dr John Stockton wrote:
                            >
                            >> Of course, for a *particular* string, not requiring a RegExp, DOS batch
                            >> provides the answer, and there must surely be a UNIX equivalent.
                            >>
                            >> find "mystring" < old.log > new.log
                            >
                            > grep 'mystring' old.log >new.log 2>&1
                            >
                            >The single quotes are only required if special shell expressions are
                            >used but not escaped. 2>&1 captures (error) messages in new.log, too.
                            >If you do not want that, leave it out.

                            I'm not aware of 2>&1 being valid in either DOS or Win98, and GREP is
                            not part of those systems but must be imported.

                            Why did you cut the part where I wrote "UNIX has grep, which should do;
                            and there are ports of grep to DOS & Windows." ?

                            IMHO, MiniTrue is more useful than GREP and SED; see my reply to your
                            earlier post.



                            • Thomas 'PointedEars' Lahn

                              #15
                              Re: Regular expression to exclude lines?

                              Dr John Stockton wrote:

                              > Thomas 'PointedEars' Lahn wrote:
                              >>Dr John Stockton wrote:
                              >>
                              >>> Of course, for a *particular* string, not requiring a RegExp, DOS batch
                              >>> provides the answer, and there must surely be a UNIX equivalent.
                              >>>
                              >>> find "mystring" < old.log > new.log
                              >>
                              >> grep 'mystring' old.log >new.log 2>&1
                              >>
                              >>The single quotes are only required if special shell expressions are
                              >>used but not escaped. 2>&1 captures (error) messages in new.log, too.
                              >>If you do not want that, leave it out.
                              >
                              > I'm not aware of 2>&1 being valid in either DOS or Win98,

                              It is only available in cmd.exe on Windows NT-based systems and on
                              Unices; generally speaking, in POSIX-compatible shells, of course.

                              > and GREP is not part of those systems but must be imported.

                              Of course. My posting was only regarding "there must
                              surely be a UNIX equivalent". Here you are :)

                              > Why did you cut the part where I wrote "UNIX has grep, which should do;
                              > and there are ports of grep to DOS & Windows." ?

                              I just overlooked it.

                              > IMHO, MiniTrue is more useful than GREP and SED; see my reply to your
                              > earlier post.

                              MiniTrue is not part of a basic installation of Unices,
                              though. (Instead, mtr refers to Matt's Traceroute.)


                              F'up2 poster

                              PointedEars
