Search/replace patterns in web pages?

  • Jane Doe

    Search/replace patterns in web pages?

    Hi,

    I need to search and replace patterns in web pages, but I
    can't find a way, even after reading the relevant chapter in New Riders'
    "Inside JavaScript".

    Here's what I want to do:

    function filter() {
    var items = new Array("John", "Jane");

    for (x = 0; x < items.length; x++) {
    //Doesn't work
    pattern = '/' + items[x] + '/';
    //Doesn't work either
    document.body = document.body.replace(pattern, "IGNORED");
    }

    i.e., create an array of items to look for in the BODY section of the
    page, and if any item exists, replace the item with IGNORED.

    Does anyone know how to do this?

    Thank you
    JD.
  • Lasse Reichstein Nielsen

    #2
    Re: Search/replace patterns in web pages?

    Jane Doe <jane.doe@acme.com> writes:

    > Hi,
    >
    > I need to search and replace patterns in web pages, but I
    > can't find a way, even after reading the relevant chapter in New Riders'
    > "Inside JavaScript".
    >
    > Here's what I want to do:
    >
    > function filter() {
    > var items = new Array("John", "Jane");
    >
    > for (x = 0; x < items.length; x++) {
    > //Doesn't work
    > pattern = '/' + items[x] + '/';

    This builds a string, not a regular expression. (Declare pattern as a
    local variable with the "var" keyword; there is no need for it to be global.)

    > //Doesn't work either
    > document.body = document.body.replace(pattern, "IGNORED");

    The object document.body is a DOM Node, not a text string.
    What you can do, in some browsers, is to work on
    document.body.innerHTML.

    Also, change "pattern" to "new RegExp(items[x],'')" in this line. Then
    you have created a regular expression with the name as content.

    There is no need to run through the items one at a time.
    You can replace the entire for loop with

    document.body.innerHTML =
    document.body.innerHTML.replace(new RegExp(items.join("|"),""),"IGNORED");

    (This way, the regular expression becomes "John|Jane". Since you replace
    them with the same string, you can just match them at the same time.)
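
To illustrate the difference described above, here is a small sketch that contrasts the string built in the original loop with a real RegExp object built from the joined names. The sample text and the variable names asString and asRegExp are made up for this example.

```javascript
var items = ["John", "Jane"];
var text = "John met Jane.";

// A string that merely looks like a regex literal is matched literally,
// so nothing is replaced: the text does not contain "/John/".
var asString = text.replace("/" + items[0] + "/", "IGNORED");

// A RegExp built from the joined names matches either name.
var pattern = new RegExp(items.join("|"), "");
var asRegExp = text.replace(pattern, "IGNORED");

console.log(pattern.source); // John|Jane
console.log(asString);       // John met Jane.
console.log(asRegExp);       // IGNORED met Jane.
```

This assumes the names contain no characters that are special in a regular expression; names with such characters would need escaping first.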

    /L
    --
    Lasse Reichstein Nielsen - lrn@hotpop.com
    Art D'HTML: <URL:http://www.infimum.dk/HTML/randomArtSplit.html>
    'Faith without judgement merely degrades the spirit divine.'

    • Jane Doe

      #3
      Re: Search/replace patterns in web pages?

      On 09 Sep 2003 14:03:34 +0200, Lasse Reichstein Nielsen
      <lrn@hotpop.com> wrote:
      > document.body.innerHTML =
      > document.body.innerHTML.replace(new RegExp(items.join("|"),""),"IGNORED");

      Thx a bunch Lasse for the prompt answer :-) It looks like a much
      better solution, although I'll still have to find out the following:

      1. innerHTML only seems to work in IE. Doesn't work with Opera 5 and
      might not work with Netscape

      2. Only the first occurrence of the pattern is replaced, i.e. if I have
      (John|Jane), and those items both appear in the page, only the first
      occurrence is replaced (the second is ignored). I assume I need to add
      /g somewhere to tell JS to search & replace _all_ occurrences

      3. I'm actually parsing rows in a table, so I need to construct a more
      complicated search pattern than the one I gave to get started. The
      goal is to replace any row that contains any of the items with an
      empty row (i.e.
      <tr><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>).

      FWIW, here's what I'd like to do:

      ---------
      function clean() {
      var items = new Array("John", "Jane");
      document.body.innerHTML = document.body.innerHTML.replace(new
      RegExp(items.join("|"),""),"IGNORED");
      }

      [...]

      <body onload='clean()'>

      <table>
      <tr>
      <td bgcolor="#FFFFFF"><a
      href="forum.php?forum=myforum&m=123">Title</a></td>
      <td bgcolor="#FFFFFF">John</td>
      <td bgcolor="#FFFFFF">10</td>
      <td bgcolor="#FFFFFF">Posted 13 sept</td>
      </tr>
      <tr>
      <td bgcolor="#FFFFFF"><a
      href="forum.php?forum=myforum&m=124">Title</a></td>
      <td bgcolor="#FFFFFF">Jane</td>
      <td bgcolor="#FFFFFF">2</td>
      <td bgcolor="#FFFFFF">Posted 12 sept</td>
      </tr>
      </table>

      ---------

      If you have any ideas or know of sample code somewhere on the Net, I'm
      interested :-)

      Thx again for your help
      JD.

      • Lasse Reichstein Nielsen

        #4
        Re: Search/replace patterns in web pages?

        Jane Doe <jane.doe@acme.com> writes:

        > On 09 Sep 2003 14:03:34 +0200, Lasse Reichstein Nielsen
        > <lrn@hotpop.com> wrote:
        > > document.body.innerHTML =
        > > document.body.innerHTML.replace(new RegExp(items.join("|"),""),"IGNORED");
        >
        > Thx a bunch Lasse for the prompt answer :-) It looks like a much
        > better solution, although I'll still have to find out the following:
        >
        > 1. innerHTML only seems to work in IE. Doesn't work with Opera 5 and
        > might not work with Netscape

        It works in IE 4+, Opera 7 and Mozilla, and perhaps a few other recent
        browsers. Any older browsers are out.

        On the other hand, Netscape 4 and Opera 6 will not allow you to change
        the contents of the page at all, after it is loaded. So there is no
        method that works there.

        If you can ignore IE 4, I would prefer to use DOM methods, traversing
        the DOM tree and changing the text in the text nodes.
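
A minimal sketch of that DOM approach, assuming a browser that exposes childNodes, nodeType and nodeValue on its nodes. The function name replaceInTextNodes is made up for this example and does not come from the thread.

```javascript
// Walk the tree below `node` and rewrite only text nodes (nodeType 3),
// leaving tags and attributes untouched.
function replaceInTextNodes(node, regexp, replacement) {
  if (node.nodeType === 3) {
    node.nodeValue = node.nodeValue.replace(regexp, replacement);
    return;
  }
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    replaceInTextNodes(children[i], regexp, replacement);
  }
}

// In a page, a call might look like this:
// replaceInTextNodes(document.body, /John|Jane/g, "IGNORED");
```

Because only text nodes are changed, a name that happens to appear inside an attribute or a tag name is left alone, which the innerHTML approach cannot guarantee.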

        > 2. Only the first occurrence of the pattern is replaced, i.e. if I have
        > (John|Jane), and those items both appear in the page, only the first
        > occurrence is replaced (the second is ignored). I assume I need to add
        > /g somewhere to tell JS to search & replace _all_ occurrences

        Doh. Yes, the place to add the "g" is in the second argument to RegExp
        (currently an empty string, make it "g", and perhaps even "gi").
        Also notice that you match even inside words, so Johnson becomes
        IGNOREDson. You can fix that, by making the regular expression

        new RegExp("\\b("+items.join("|")+")\\b","gi");

        The "\b" matches the boundary between a word character and a non-word
        character, so it won't match after "John" in "Johnson".
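
As a quick check of the word-boundary behaviour, here is a small sketch with a made-up sample sentence. It assumes the names contain no characters that are special in a regular expression.

```javascript
var items = ["John", "Jane"];
var itemRE = new RegExp("\\b(" + items.join("|") + ")\\b", "gi");

// "Johnson" and "Janet" are left alone; "JANE" matches because of the "i" flag.
var result = "John Johnson, JANE and Janet".replace(itemRE, "IGNORED");
console.log(result); // IGNORED Johnson, IGNORED and Janet
```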

        > 3. I'm actually parsing rows in a table, so need to construct a more
        > complicated search pattern than the one I gave to get started.

        It is sometimes easier to split the problem into more than one regular
        expression, e.g., one to find a table row and another to test whether
        it contains the forbidden words. You can always combine them, but the
        result may be horribly large.

        > function clean() {

        Ok. If we only aim at newer browsers, try this:

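        function clean() {
          // The names to filter out, as in your version
          var items = new Array("John", "Jane");
          var body = document.body.innerHTML;
          var itemRE = new RegExp("\\b(" + items.join("|") + ")\\b", "gi");
          // Visit each table row and drop the ones that mention a listed name
          body = body.replace(/<tr(.|\s)*?<\/tr>/gi, function(row) {
            if (row.match(itemRE)) {
              return "";
            } else {
              return row;
            }
          });
          document.body.innerHTML = body;
        }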

        It replaces each table row (from "<tr" to "</tr>") with either
        itself or the empty string, depending on whether the row
        contains any of the words in the "items" array.
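
If the matching rows should be blanked rather than removed, as asked in point 3, a variant along these lines might do. This is only a sketch: the helper name blankRows and the sample markup are made up, the four-cell empty row is copied from the question, and it assumes no nested tables and no regular-expression special characters in the names.

```javascript
// Hypothetical helper: blank out every table row that mentions a listed name.
function blankRows(html, items) {
  // No "g" flag here, so test() keeps no state between rows.
  var itemRE = new RegExp("\\b(" + items.join("|") + ")\\b", "i");
  var emptyRow = "<tr><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>";
  // [\s\S]*? matches any characters, including newlines, up to the nearest </tr>.
  return html.replace(/<tr[\s\S]*?<\/tr>/gi, function (row) {
    return itemRE.test(row) ? emptyRow : row;
  });
}

var sample = "<table><tr><td>Title</td><td>John</td></tr>\n" +
             "<tr><td>Title</td><td>Mary</td></tr></table>";
console.log(blankRows(sample, ["John", "Jane"]));
```

Working on a string keeps the filtering logic separate from the page, so it can be tried on sample markup before being applied to document.body.innerHTML.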

        /L

        • Jane Doe

          #5
          Re: Search/replace patterns in web pages?

          On 09 Sep 2003 14:56:54 +0200, Lasse Reichstein Nielsen
          <lrn@hotpop.com> wrote:
          > Ok. If we only aim at newer browsers, try this:

          You're awesome :-) Works like a charm. I owe you dinner next time
          you're in town.

          Thx again
          JD.
