Regular expression to exclude lines?

  • Dr John Stockton

    #16
    Re: Regular expression to exclude lines?

    JRS: In article <3FC544FD.2050103@PointedEars.de>, seen in
    news:comp.lang.javascript, Thomas 'PointedEars' Lahn
    <PointedEars@web.de> posted at Thu, 27 Nov 2003 01:27:41 :-
    >Dr John Stockton wrote:
    >
    >> Thomas 'PointedEars' Lahn wrote:
    >>> grep -v 'mystring' old.log >new.log 2>&1
    >>>
    >>>The single quotes are only required if special shell expressions are
    >>>used but not escaped. 2>&1 captures (error) messages in new.log, too.
    >>>If you do not want that, leave it out.
    >>
    >> I'm not aware of 2>&1 being valid in either DOS or Win98,
    >
    >It is only available in Cmd.exe of Windows NT-based systems and
    >Unices, generally speaking in POSIX-compatible shells, of course.

    There is no reason to assume that the OP is aware of that; or even of
    the part of it that is applicable to the system in question. A
    plausible but inapplicable or incorrect "solution" is worse than
    useless.

    >> IMHO, MiniTrue is more useful than GREP and SED; see my reply to your
    >> earlier post.
    >
    >MiniTrue is not part of a basic installation of Unices,
    >though. (Instead, mtr refers to Matt's Traceroute.)

    Indeed; nor of DOS; which is sufficiently clearly indicated by my
    signature to that article.

    --
    © John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 MIME ©
    Web <URL:http://www.uwasa.fi/~ts/http/tsfaq.html> -> Timo Salmi: Usenet Q&A.
    Web <URL:http://www.merlyn.demon.co.uk/news-use.htm> : about usage of News.
    No Encoding. Quotes before replies. Snip well. Write clearly. Don't Mail News.


    • Thomas 'PointedEars' Lahn

      #17
      Re: Regular expression to exclude lines?

      Dr John Stockton wrote:

      > Thomas 'PointedEars' Lahn wrote:
      >> Dr John Stockton wrote:
      >>> Thomas 'PointedEars' Lahn wrote:
      >>>> grep -v 'mystring' old.log >new.log 2>&1
      >>>>
      >>>> The single quotes are only required if special shell
      >>>> expressions are used but not escaped. 2>&1 captures (error)
      >>>> messages in new.log, too. If you do not want that, leave it
      >>>> out.
      >>>
      >>> I'm not aware of 2>&1 being valid in either DOS or Win98,
      >>
      >> It is only available in Cmd.exe of Windows NT-based systems and
      >> Unices, generally speaking in POSIX-compatible shells, of course.
      >
      > There is no reason to assume that the OP is aware of that;

      There is no reason to assume that she is interested in it, either.

      > or even of the part of it that is applicable to the system in
      > question.

      The system in question is unknown.

      > A plausible but inapplicable or incorrect "solution" is worse than
      > useless.

      This is a JavaScript newsgroup, not a shell script newsgroup. As the
      OP has not provided the system on which solutions are supposed to work,
      she will test if the provided solutions will work on her system.


      PointedEars


      • Shannon Jacobs

        #18
        Re: Regular expression to exclude lines?

        Thomas 'PointedEars' Lahn wrote:
        > Dr John Stockton wrote:
        <snip>
        >> There is no reason to assume that the OP is aware of that;
        >
        > There is no reason that she is interested in it, either.
        <snip>

        Return of the original poster... Sorry I've been rather busy and haven't
        been able to follow this very interesting thread more closely. (The OP is
        male, by the way.) However, judging by the complexity of the discussion, it
        seems that there was some reason for my original perplexity, though I
        thought it a trivial notion.

        First let me try to clarify what I'm doing. I know that JavaScript has no
        access to the file system. The files are to be handled directly by the user
        of the utility. In the Windows environment, this is trivial with ^A, ^C, and
        ^V. I didn't mention that part because it's almost mindless now (for me).
        The actual steps for the file part are:

        Open the converter JavaScript form, then open the target file, ^A, ^C, click
        in the form, ^V, click on the convert button of the form, ^A, ^C, click back
        in the original file, ^A, ^V, and save the file. Done. (If anyone is curious
        and the results of these steps are not obvious enough, I can explain.)

        Now to clarify the JavaScript part. This example is from an existing utility
        that converts raw HTML into JavaScript. The variable HtmlText is the body of
        the file from an input field in the form. The critical function is:

        function jsFromHtml(HtmlText) {
          HtmlText = HtmlText.replace(/\"/g,"\\\"");
          HtmlText = HtmlText.replace(/[\r\n]+/g,"\");\r\ndocument.writeln(\"");
          return "document.writeln(\"" + HtmlText + "\");";
        }

        I know this is rather ugly code, and I'd also be interested in improvements,
        or even a completely different approach. My JavaScript skills are obviously
        rather limited, but this was adequate for my purposes at the time. Since
        it's probably not IOttMCO, I'll explain what it does. In the first
        executable line, the regular expression escapes all of the double quotes in
        the original HTML. In the next line, all of the embedded line breaks are
        replaced with the end and start of document.writeln statements, and then the
        last line puts one more start and end around the entire thing. The result is
        a block of JavaScript code which outputs the arbitrary HTML input. You stick
        that into a JavaScript function to create that block of HTML under program
        control wherever it is required. (I was especially unhappy with my treatment
        of line breaks, and believe this is not a properly general method, though it
        works.)

        My goal now is to do something similar, but excluding the lines that do not
        contain some string. I'm most interested in an elegant solution, though the
        discussion so far seems to suggest that there may be no better approach than
        parsing the input one line at a time...

        An additional wrinkle is that I'd like to generalize a bit by treating the
        decision string as a parameter returned in another field of the form.


        • Lasse Reichstein Nielsen

          #19
          Re: Regular expression to exclude lines?

          "Shannon Jacobs" <shanen@my-deja.com> writes:

          > First let me try to clarify what I'm doing. I know that JavaScript has no
          > access to the file system. The files are to be handled directly by the user
          > of the utility. In the Windows environment, this is trivial with ^A, ^C, and
          > ^V.

          I have pages like that too, for colorizing HTML and JavaScript :)

          > Now to clarify the JavaScript part. This example is from an existing utility
          > that converts raw HTML into JavaScript. The variable HtmlText is the body of
          > the file from an input field in the form. The critical function is:
          >
          > function jsFromHtml(HtmlText) {
          > HtmlText = HtmlText.replace(/\"/g,"\\\"");
          > HtmlText = HtmlText.replace(/[\r\n]+/g,"\");\r\ndocument.writeln(\"");
          > return "document.writeln(\"" + HtmlText + "\");";
          > }
          >
          > I know this is rather ugly code, and I'd also be interested in improvements,
          > or even a completely different approach.

          I would use split:

          function jsFromHtml(HtmlText) { // I would write HTML in all caps :)
            var inputLines = HtmlText.split(/[\r\n]+/);
            var outputLines = [];
            for (var i=0;i<inputLines.length;i++) {
              var safeLine = inputLines[i].replace(/[\\"]/g,"\\$&");
              outputLines[i] = "document.writeln(\"" + safeLine + "\");";
            }
            return outputLines.join("\n");
          }

          (I put a backslash in front of both double quotes and backslashes,
          since neither can occur alone in a string. If there are other
          characters that make no sense in a string, they should be
          handled as well. Examples could be \t or \b).
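
As a hedged sketch of the "other characters" point above: JSON.stringify already emits a double-quoted literal with quotes, backslashes, tabs and other control characters escaped, so it could stand in for the hand-written replace. This is a swapped-in technique, not the code from the thread, and the name jsFromHtmlJson is hypothetical. It also splits on each line break separately, so blank lines are kept rather than collapsed.

```javascript
// Hypothetical variant of jsFromHtml that lets JSON.stringify do the escaping.
function jsFromHtmlJson(htmlText) {
  return htmlText
    .split(/\r\n|\r|\n/) // one entry per line; blank lines are preserved
    .map(function (line) {
      // JSON.stringify returns the line as a quoted, escaped string literal.
      return "document.writeln(" + JSON.stringify(line) + ");";
    })
    .join("\n");
}

console.log(jsFromHtmlJson('<a href="x">\tlink</a>\r\n<p>C:\\temp</p>'));
```

If the generated code is pasted into an inline script element, a literal "</script>" in the input would still need separate handling.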

          > My JavaScript skills are obviously rather limited,

          Not obviously. It works, and it's something I could find myself doing if
          I didn't have split available. It's possibly even faster to change all
          the quotes from the beginning instead of doing one replace per line.

          > but this was adequate for my purposes at the time. Since it's
          > probably not IOttMCO,

          You lost me there :) IOttMCO?

          > I'll explain what it does.

          It's fairly easy to read, as long as you can see what's inside a string
          and what's not :)

          > My goal now is to do something similar, but excluding the lines that do not
          > contain some string. I'm most interested in an elegant solution, though the
          > discussion so far seems to suggest that there may be no better approach than
          > parsing the input one line at a time...

          Nothing wrong with one line at a time. If you use the code I showed above,
          all you need is to wrap the content of the for loop in an if statement:

          if (!/badWord/.test(inputLines[i])) {
            ... add to output ...
          }

          or

          if (inputLines[i].indexOf("badWord") == -1) {
            ... add to output ...
          }
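
Put together, the line-at-a-time approach might look like the sketch below. It is only an illustration under assumptions: the function name dropLinesContaining and the sample text are hypothetical, and it uses the plain indexOf test so the search word needs no regex escaping.

```javascript
// Hypothetical helper: drop every line that contains `badWord`.
function dropLinesContaining(text, badWord) {
  var inputLines = text.split(/\r\n|\r|\n/); // accept any line-break style
  var outputLines = [];
  for (var i = 0; i < inputLines.length; i++) {
    if (inputLines[i].indexOf(badWord) === -1) {
      outputLines.push(inputLines[i]); // keep lines without the word
    }
  }
  return outputLines.join("\n");
}

console.log(dropLinesContaining("keep me\nbadWord here\nalso fine", "badWord"));
```

Flipping the comparison to `!== -1` would keep only the lines that do contain the word, which is the direction the original poster asked for.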

          > An additional wrinkle is that I'd like to generalize a bit by treating the
          > decision string as a parameter returned in another field of the form.

          var testRE = RegExp(form.elements['otherField'].value);
          if (! testRE.test(inputLines[i])) {
            ...
          }

          (To avoid problems or crashes, you might want to screen the other field's
          values for characters that are meaningful in regular expressions)
          or

          var testWord = form.elements['otherField'].value;
          if (inputLines[i].indexOf(testWord)==-1) {
            ... add to output ...
          }
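
One possible way to do the screening mentioned above is to backslash-escape every character that is special in a regular expression before handing the text to the RegExp constructor. The helper name escapeRegExp is an assumption for illustration, and plain strings stand in for the form field value.

```javascript
// Hypothetical helper: make arbitrary text safe to embed in a RegExp source.
function escapeRegExp(text) {
  // Prefix each regex metacharacter with a backslash; "$&" is the matched char.
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

var userInput = "C:\\blurb (x.y)"; // stands in for the form field's value
var testRE = new RegExp(escapeRegExp(userInput));

console.log(testRE.source);
console.log(testRE.test("path is C:\\blurb (x.y) here"));
```

Escaping this way treats the whole input as literal text, so a user could no longer type regex syntax such as alternation into the field.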


          Good luck
          /L
          --
          Lasse Reichstein Nielsen - lrn@hotpop.com
          DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
          'Faith without judgement merely degrades the spirit divine.'


          • Thomas 'PointedEars' Lahn

            #20
            Re: Regular expression to exclude lines?

            Lasse Reichstein Nielsen wrote:

            > "Shannon Jacobs" <shanen@my-deja.com> writes:
            >> but this was adequate for my purposes at the time. Since it's
            >> probably not IOttMCO,
            >
            > You lost me there :) IOttMCO?

            Intuitively Obvious to the Most Casual Observer

            http://babylon.com/ (the tool) or http://online.babylon.com/combo/
            (the website) are sometimes quite handy :)


            HTH

            PointedEars


            • Mark Szlazak

              #21
              Re: Regular expression to exclude lines?

              "Shannon Jacobs" <shanen@my-deja.com> wrote in message news:<3fc7df81$0$11217$44c9b20d@news3.asahi-net.or.jp>...

              > My goal now is to do something similar, but excluding the lines that do not
              > contain some string. I'm most interested in an elegant solution, though the
              > discussion so far seems to suggest that there may be no better approach than
              > parsing the input one line at a time...
              >
              > An additional wrinkle is that I'd like to generalize a bit by treating the
              > decision string as a parameter returned in another field of the form.

              I've tried twice to post a simple solution through Developers Dex, but
              the posts haven't appeared in about two days. I'm assuming they're lost.
              Anyway, my previous post that did appear starts to point you to a
              solution. It doesn't require a separate process to break up lines, and
              it works. To remove lines without the substring "something" in them,
              here's that solution again.

              rx = /^(?:(?!\bsomething\b).)*$/gm;
              outText = inText.replace(rx,'');

              To make this regular expression dynamic, use the RegExp object
              constructor.

              skip = 'something';
              pattern = '^(?:(?!\\b' + skip + '\\b).)*$';
              rx = new RegExp(pattern, 'gm');
              outText = inText.replace(rx,'');

              Also, one of your posts talks about linefeeds and the \r\n pattern.
              This is OS dependent and linefeeds could also be just \r or \n.
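
To see what the pattern above does on concrete input, here is a small runnable sketch. The sample text and the variable names rxStatic, rxDynamic, outStatic and outDynamic are made up for illustration; the patterns are the ones from the post.

```javascript
var inText = "alpha\nhas something here\nbeta";

// Static form: lines without the word "something" are emptied, but their
// line breaks stay behind as blank lines.
var rxStatic = /^(?:(?!\bsomething\b).)*$/gm;
var outStatic = inText.replace(rxStatic, "");
console.log(JSON.stringify(outStatic)); // "\nhas something here\n"

// Dynamic form: the same pattern built from a string, so each backslash
// has to be doubled inside the string literal.
var skip = "something";
var rxDynamic = new RegExp("^(?:(?!\\b" + skip + "\\b).)*$", "gm");
var outDynamic = inText.replace(rxDynamic, "");
console.log(outDynamic === outStatic); // true
```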


              • Shannon Jacobs

                #22
                Re: Regular expression to exclude lines?

                Mark Szlazak wrote:
                <snip of lengthy text describing goal of deleting lines that do not include
                a key string>
                >
                > rx = /^(?:(?!\bsomething\b).)*$/gm;
                > outText = inText.replace(rx,'');
                >
                > To make this regular expression dynamic, use the RegExp object
                > constructor.
                >
                > skip = 'something';
                > pattern = '^(?:(?!\\b' + skip + '\\b).)*$';
                > rx = new RegExp(pattern, 'gm');
                > outText = inText.replace(rx,'');
                >
                > Also, one of your posts talks about linefeeds and the \r\n pattern.
                > This is OS dependent and linefeeds could also be just \r or \n.

                Below is the working code. I'm extremely obliged, and I hope the embedded
                acknowledgment is sufficient, even though I don't expect to actively
                broadcast the code. You're certainly a guru in my JavaScript book. The only
                real change I had to make was the thing at the end to include the ends of
                the lines. Your original version left a blank line, while I wanted to remove
                those lines completely. By the way, I tested an earlier non-dynamic version
                with Opera and it worked fine. I'll test the dynamic version tomorrow.

                My main regret is that I still don't fully understand how it works... Rather
                embarrassing, but looks like I'll have to break out the Perl manual
                tomorrow.

                function keepSelectedLines(keepString, blockOfText) {
                  // based on tips from Mark Szlazak
                  pattern = '^(?:(?!\\b' + keepString + '\\b).)*$[\r\n]*';
                  rx = new RegExp(pattern, 'gm');
                  blockOfText = blockOfText.replace(rx,'');
                  return blockOfText;
                }
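
A hedged variant of the function above, for experimenting: it declares its variables locally and normalizes line breaks to \n first. The normalization is an added assumption, not part of the posted code. In JavaScript, \r and \n each count as a line terminator for ^ and $ under the m flag, so with \r\n input the pattern can also match the empty position between the two characters; normalizing first sidesteps that. The name keepSelectedLinesSketch is hypothetical.

```javascript
// Sketch: keep only the lines that contain keepString as a whole word.
function keepSelectedLinesSketch(keepString, blockOfText) {
  // Assumption: treat \r\n and lone \r as \n before matching.
  var normalized = blockOfText.replace(/\r\n?/g, "\n");
  var pattern = "^(?:(?!\\b" + keepString + "\\b).)*$[\\r\\n]*";
  var rx = new RegExp(pattern, "gm");
  return normalized.replace(rx, "");
}

var sample = "first line\r\nkeep this one\r\nlast line";
console.log(JSON.stringify(keepSelectedLinesSketch("keep", sample)));
// "keep this one\n"
```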


                • Thomas 'PointedEars' Lahn

                  #23
                  Re: Regular expression to exclude lines?

                  Shannon Jacobs wrote:

                  > Below is the working code. [...]
                  > My main regret is that I still don't fully understand how it works... Rather
                  > embarrassing, but looks like I'll have to break out the Perl manual
                  > tomorrow.

                  Why, see the Reference:


                  > function keepSelectedLines(keepString, blockOfText) {
                  > // based on tips from Mark Szlazak
                  > pattern = '^(?:(?!\\b' + keepString + '\\b).)*$[\r\n]*';

                  This string literal contains the source of a Regular Expression (RegExp
                  object), created later, that matches the beginning of the text (^)
                  followed by zero or more occurrences (*) of the following:

                  Match the following but don't remember the match (/(?:...)/):
                  Continue only if the text at the current position does _not_ match
                  (/(?!...)/, negative lookahead) a word boundary ("\\b" becoming /\b/)
                  followed by the value of `keepString' followed by a word boundary;
                  then match any single character except a line terminator (/./).

                  The above matches only if it is followed by the end of the text ($)
                  followed by zero or more occurrences (*) of any of the characters
                  ([...]) \r (carriage return) and \n (linefeed).

                  > rx = new RegExp(pattern, 'gm');

                  This creates a RegExp object from the above string literal, matching
                  it on every single line instead of on the whole text ('m'; consider
                  multiline input), having /^/ and /$/ match the beginning and the end
                  of line instead of the beginning and the end of text, and matches all
                  occurrences, not only the first one ('g'; global).

                  However, it should be noted that it fails if the above string literal,
                  especially the value of the `keepString' argument, contains
                  single-escaped or certain double-escaped sequences, e.g. "C:\blurb",
                  where "\b" in a string literal is the backspace character, so that the
                  RegExp would look for a literal backspace followed by `lurb', or
                  "C:\\blurb" which would result in /C:\blurb/mg, meaning /\b/ as word
                  boundary. For this function, an input of "C:\\\\blurb" would have to be
                  used to get /C:\\blurb/ in the RegExp, having /\\/ to match the literal
                  backslash character (`\'), as it was intended.
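
The two escaping levels discussed above can be checked directly. This is a small sketch with made-up variable names (asWordBoundary, asBackslash); it only exercises string literals passed to the RegExp constructor.

```javascript
// "C:\\blurb" is the 8-character string C:\blurb, so the RegExp sees \b
// and treats it as a word boundary followed by "lurb".
var asWordBoundary = new RegExp("C:\\blurb");

// "C:\\\\blurb" is the string C:\\blurb, so the RegExp sees \\ and
// matches one literal backslash followed by "blurb".
var asBackslash = new RegExp("C:\\\\blurb");

var path = "C:\\blurb"; // the text C:\blurb

console.log(asBackslash.test(path));       // true
console.log(asWordBoundary.test(path));    // false
console.log(asWordBoundary.test("C:lurb")); // true
```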

                  (AFAIS there is no built-in method in JavaScript to convert a string so
                  that it can be used as argument for the RegExp constructor function with
                  the resulting RegExp to match the string literally; every character that
                  is special in a regular expression would have to be escaped with a
                  backslash first.)

                  > blockOfText = blockOfText.replace(rx,'');

                  Replaces matches of `rx' with the empty string (i.e. deletes the
                  matching substrings).

                  > return blockOfText;

                  Returns the changed text.

                  > }

                  HTH

                  PointedEars


                  • Shannon Jacobs

                    #24
                    Re: Regular expression to exclude lines?

                    Mark Szlazak wrote:
                    <snip of lengthy text describing goal of deleting lines that do not
                    include a key string>
                    >
                    > rx = /^(?:(?!\bsomething\b).)*$/gm;
                    > outText = inText.replace(rx,'');
                    >
                    > To make this regular expression dynamic, use the RegExp object
                    > constructor.
                    >
                    > skip = 'something';
                    > pattern = '^(?:(?!\\b' + skip + '\\b).)*$';
                    > rx = new RegExp(pattern, 'gm');
                    > outText = inText.replace(rx,'');
                    >
                    > Also, one of your posts talks about linefeeds and the \r\n pattern.
                    > This is OS dependent and linefeeds could also be just \r or \n.

                    Below is the working code. I'm extremely obliged and I hope the
                    embedded acknowledgment is sufficient, even though I don't expect to
                    actively broadcast the code. You're certainly a guru in my JavaScript
                    book. The only real change I had to make was the thing at the end to
                    include the ends of the lines. Your original version left a blank
                    line, while I wanted to remove those lines completely. By the way, I
                    tested an earlier non-dynamic version with Opera and it worked fine.
                    I'll test the dynamic version tomorrow.

                    My main regret is that I still don't fully understand how it works...
                    Rather embarrassing, but looks like I'll have to break out the Perl
                    manual tomorrow. [Actually, I did look at the manual, and still don't
                    understand all of it, though I feel like the pair of \b is not really
                    required?]

                    function keepSelectedLines(keepString, blockOfText) {
                      // based on tips from Mark Szlazak
                      pattern = '^(?:(?!\\b' + keepString + '\\b).)*$[\r\n]*';
                      rx = new RegExp(pattern, 'gm');
                      blockOfText = blockOfText.replace(rx,'');
                      return blockOfText;
                    }

                    (Apologies if this post appears twice, but something strange is going
                    on here... My newsreader definitely thinks I posted this reply
                    yesterday, but it seems to have disappeared, just as Mr. Szlazak
                    reported some of his posts had disappeared. I rather suspect that the
                    spammers' efforts are resulting in so much newsgroup pollution that
                    non-spam posts are getting caught in the crossfire. Hopefully the
                    Google routing will work better.)


                    • Mark Szlazak

                      #25
                      Re: Regular expression to exclude lines?

                      "Shannon Jacobs" <shanen@my-deja.com> wrote in message news:<3fcdde58$0$11227$44c9b20d@news3.asahi-net.or.jp>...
                      > Mark Szlazak wrote:
                      > <snip of lengthy text describing goal of deleting lines that do not include
                      > a key string>
                      > >
                      > > rx = /^(?:(?!\bsomething\b).)*$/gm;
                      > > outText = inText.replace(rx,'');
                      > >
                      > > To make this regular expression dynamic, use the RegExp object
                      > > constructor.
                      > >
                      > > skip = 'something';
                      > > pattern = '^(?:(?!\\b' + skip + '\\b).)*$';
                      > > rx = new RegExp(pattern, 'gm');
                      > > outText = inText.replace(rx,'');
                      > >
                      > > Also, one of your posts talks about linefeeds and the \r\n pattern.
                      > > This is OS dependent and linefeeds could also be just \r or \n.
                      >
                      > Below is the working code. I'm extremely obliged, and I hope the embedded
                      > acknowledgment is sufficient, even though I don't expect to actively
                      > broadcast the code. You're certainly a guru in my JavaScript book. The only
                      > real change I had to make was the thing at the end to include the ends of
                      > the lines. Your original version left a blank line, while I wanted to remove
                      > those lines completely. By the way, I tested an earlier non-dynamic version
                      > with Opera and it worked fine. I'll test the dynamic version tomorrow.
                      >
                      > My main regret is that I still don't fully understand how it works... Rather
                      > embarrassing, but looks like I'll have to break out the Perl manual
                      > tomorrow.
                      >
                      > function keepSelectedLines(keepString, blockOfText) {
                      > // based on tips from Mark Szlazak
                      > pattern = '^(?:(?!\\b' + keepString + '\\b).)*$[\r\n]*';
                      > rx = new RegExp(pattern, 'gm');
                      > blockOfText = blockOfText.replace(rx,'');
                      > return blockOfText;
                      > }

                      NOTE: I've tried posting this in two previous replies which again seem
                      to be lost.

                      Thanks Shannon! This regex isn't original and it's probably more
                      commonly known among Perl programmers.

                      I have a suggestion. If you want to consume the linefeeds then you
                      don't need $ in the regex. Change $[\r\n]* to [\r\n]+
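
A quick side-by-side of the two endings, as a sketch with made-up sample text and variable names (withDollar, withPlus). One difference worth noticing: the [\r\n]+ form requires at least one line break, so a final line with no trailing line break is left in place.

```javascript
var text = "alpha\nhas something here\nbeta";

// Ending used in keepSelectedLines: end of line, then any line breaks.
var withDollar = new RegExp("^(?:(?!\\bsomething\\b).)*$[\\r\\n]*", "gm");

// Suggested ending: one or more line-break characters, no $.
var withPlus = new RegExp("^(?:(?!\\bsomething\\b).)*[\\r\\n]+", "gm");

var dollarResult = text.replace(withDollar, "");
var plusResult = text.replace(withPlus, "");

console.log(JSON.stringify(dollarResult)); // "has something here\n"
console.log(JSON.stringify(plusResult));   // "has something here\nbeta"
```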

                      Here's how I think about this regex. Starting at a position before the
                      first character of the string, the negative lookahead checks whether its
                      subpattern is present; if not, the "dot" matches any character except
                      linefeeds and moves us to a new position just after that character. This
                      is repeated until the end of the line, unless the negative lookahead's
                      subpattern is found, in which case there is no match. Now, the caret ^ at
                      the beginning of the regex eliminates "bump-alongs" when the negative
                      lookahead's subpattern is found. Without the caret, the regex engine
                      would do our scanning all over again, except from the next position in
                      the line. Again, if a match isn't found (e.g., the lookahead's subpattern
                      is found) then it bumps along to start at the next position, re-does the
                      scan, and this bumping-along could continue to the end of the line.

                      You want to suppress this because it's not needed, it will not match
                      the entire line, and it will cause false matches when the engine moves
                      past say "s" in "something" to start scanning from "omething..." in a
                      negative lookahead that has "something" as its subpattern.

                      At least I think that's how this works ;-)
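
The bump-along effect described above can be reproduced in a few lines. This is a sketch with an invented sample line; the only difference between the two patterns is the leading caret.

```javascript
var line = "has something here";

// With the caret, a match may only start at the beginning of a line.
var anchored = /^(?:(?!\bsomething\b).)*$/gm;

// Without it, the engine retries from later positions and eventually
// starts just past the "s" of "something", where the lookahead no
// longer sees the whole word.
var unanchored = /(?:(?!\bsomething\b).)*$/gm;

var anchoredResult = line.replace(anchored, "");
var unanchoredResult = line.replace(unanchored, "");

console.log(anchoredResult);   // has something here
console.log(unanchoredResult); // has s
```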


                      • Shannon Jacobs

                        #26
                        Re: Regular expression to exclude lines?

                        Not sure what to make of it, but my original post showed up again after a
                        couple of days. Maybe server problems at my end?

                        Mark Szlazak wrote:
                        <snip>
                        >>> Also, one of your posts talks about linefeeds and the \r\n pattern.
                        >>> This is OS dependent and linefeeds could also be just \r or \n.
                        <snip>
                        [My first derived version]
                        >> function keepSelectedLines(keepString, blockOfText) {
                        >> // based on tips from Mark Szlazak
                        >> pattern = '^(?:(?!\\b' + keepString + '\\b).)*$[\r\n]*';
                        >> rx = new RegExp(pattern, 'gm');
                        >> blockOfText = blockOfText.replace(rx,'');
                        >> return blockOfText;
                        >> }
                        > I have a suggestion. If you want to consume the linefeeds then you
                        > don't need $ in the regex. Change $[\r\n]* to [\r\n]+

                        I'll probably try that suggestion, but I already went in and removed the \b
                        pair. I'm not sure why you recommended those. Actually, the first person I
                        showed it to also wanted to be able to do two keys at a time. That turned
                        out to be easy by entering the keepString as:

                        (key1|key2)

                        However, I did run into one problem already... The operation is inconsistent
                        with Japanese, which uses a DBCS (part of the time). I suspected it might be
                        one of those byte-alignment problems, but that doesn't seem to make sense if
                        the regexp is trying to match from every byte position...

                        And thanks for the explanation of how it works. Already seen a couple, but
                        that seems to be another aspect of regexp newsgroups?


                        • Mark Szlazak

                          #27
                          Re: Regular expression to exclude lines?

                          "Shannon Jacobs" <shanen@my-deja.com> wrote in message news:<3fd0705c$0$11073$44c9b20d@news3.asahi-net.or.jp>...
                          > Not sure what to make of it, but my original post showed up again after a
                          > couple of days. Maybe server problems at my end?
                          >
                          > Mark Szlazak wrote:
                          > <snip>
                          > >>> Also, one of your posts talks about linefeeds and the \r\n pattern.
                          > >>> This is OS dependent and linefeeds could also be just \r or \n.
                          > <snip>
                          > [My first derived version]
                          > >> function keepSelectedLines(keepString, blockOfText) {
                          > >> // based on tips from Mark Szlazak
                          > >> pattern = '^(?:(?!\\b' + keepString + '\\b).)*$[\r\n]*';
                          > >> rx = new RegExp(pattern, 'gm');
                          > >> blockOfText = blockOfText.replace(rx,'');
                          > >> return blockOfText;
                          > >> }
                          > > I have a suggestion. If you want to consume the linefeeds then you
                          > > don't need $ in the regex. Change $[\r\n]* to [\r\n]+
                          >
                          > I'll probably try that suggestion, but I already went in and removed the \b
                          > pair. I'm not sure why you recommended those. Actually, the first person I
                          > showed it to also wanted to be able to do two keys at a time. That turned
                          > out to be easy by entering the keepString as:
                          >
                          > (key1|key2)
                          >
                          > However, I did run into one problem already... The operation is inconsistent
                          > with Japanese, which uses a DBCS (part of the time). I suspected it might be
                          > one of those byte-alignment problems, but that doesn't seem to make sense if
                          > the regexp is trying to match from every byte position...
                          >
                          > And thanks for the explanation of how it works. Already seen a couple, but
                          > that seems to be another aspect of regexp newsgroups?

                          The \b's are for word boundaries. See what happens when one line
                          has "Java" but not "JavaScript" and another line has "JavaScript"
                          but not "Java" with this negative lookahead (?!Java)
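
The Java versus JavaScript case can be tried with a small sketch. The helper keepLinesMatching is hypothetical; it takes the key as a ready-made pattern fragment so the \b pair can be switched on and off, and the sample text is invented.

```javascript
// Hypothetical helper: remove every line in which keyPattern never matches.
function keepLinesMatching(keyPattern, text) {
  var rx = new RegExp("^(?:(?!" + keyPattern + ").)*$[\\r\\n]*", "gm");
  return text.replace(rx, "");
}

var text = "uses JavaScript\nuses Java\nuses Perl";

// Without \b, "Java" is also found inside "JavaScript", so both lines stay.
console.log(JSON.stringify(keepLinesMatching("Java", text)));
// "uses JavaScript\nuses Java\n"

// With \b on both sides, only the whole word "Java" counts.
console.log(JSON.stringify(keepLinesMatching("\\bJava\\b", text)));
// "uses Java\n"
```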

                          In JavaScript 1.5 regular expressions, classes such as \w (and with it
                          \b) are defined only for ASCII letters, digits and the underscore, so
                          they do not cover Japanese or most other Unicode characters. However,
                          you can specify Unicode character ranges by hex. The following regex
                          would filter (halfwidth) Katakana letters when using the Japanese
                          encoding of this table,


                          katakana = /[\uff65-\uff9f]/;
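
A short sketch of how such ranges behave. The range \uff65-\uff9f is the halfwidth Katakana range; the second range, \u30a0-\u30ff, is the standard Katakana block and is added here only for comparison. The variable names are made up. JavaScript strings are sequences of 16-bit code units rather than bytes, so each of these characters counts as one position for the regex engine.

```javascript
var halfwidthKatakana = /[\uff65-\uff9f]/;
var katakanaBlock = /[\u30a0-\u30ff]/; // added for comparison, not from the post

var halfwidthSample = "\uff76\uff85"; // halfwidth KA, NA
var fullwidthSample = "\u30ab\u30ca"; // ordinary (fullwidth) KA, NA

console.log(halfwidthKatakana.test(halfwidthSample)); // true
console.log(halfwidthKatakana.test(fullwidthSample)); // false
console.log(katakanaBlock.test(fullwidthSample));     // true
console.log(fullwidthSample.length);                  // 2
```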
