Regex problem

  • Razvan

    Regex problem

    Hello there,

    I have the following problem:
    I have a large HTML file and I want to remove everything between certain
    tags and keep the rest. I would prefer a regex, but any solution would
    be great.
    The number and type of tags may vary. Here is an example:

    <body>
    text text text text text text text
    text text text
    text text text text

    <remove1>
    text text text text text text
    text text
    text
    text text text
    </remove1>

    text text text
    text text

    <remove1>
    text text text text
    </remove1>

    text text
    text text
    text text text

    <remove2>
    text text text text text
    text text text
    text text
    </remove2>

    text text text text text
    text text text text
    </body>

    Any suggestions will be appreciated!
    Thanks.

  • Alan

    #2
    Re: Regex problem


    "Razvan" <defconhaya@gmail.com> wrote in message
    news:1174729037.379978.230910@o5g2000hsb.googlegroups.com...
    > I have a big html and i want to remove from it everything between some
    > tags and to keep the rest, of course using regex, but any solution
    > will be great.
    > The number and type of tags may vary.
    > [...]
    >
    A regex search and replace with <(/?[^\>]+)> and "" leaves just your text text
    text etc.

    Possibly some flavours may need escaping: \<(/?[^\>]+)\>
    hth

    Alan
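
To make the effect of this pattern concrete, here is a minimal Python sketch. Python and the short sample string are assumptions for illustration only; the post does not name a language. The character class is written as `[^>]`, which should be equivalent to `[^\>]`. Note that the pattern strips the tags themselves but keeps the text between them:

```python
import re

# Pattern from the post above: matches any opening or closing tag.
TAG = re.compile(r"<(/?[^>]+)>")

# Made-up sample, much shorter than the HTML in the question.
sample = "<body>\nkeep 1\n<remove1>\ndrop me\n</remove1>\nkeep 2\n</body>\n"

# Replace every tag with the empty string.
stripped = TAG.sub("", sample)
print(stripped)
```

The text inside `<remove1>` survives here, so this on its own does not delete the enclosed content.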



    • Razvan

      #3
      Re: Regex problem

      On Mar 24, 1:45 pm, "Alan" <a...@spamless.net> wrote:
      > regex search and replace with <(/?[^\>]+)> and "" leaves just your text text
      > text etc
      >
      > Possible some flavours may need escaping: \<(/?[^\>]+)\>
      > hth
      >
      > Alan

      I don't understand what you are trying to say. I want to remove
      everything between <removeX> and </removeX>, including the tags.


      • Alan

        #4
        Re: Regex problem


        "Razvan" <defconhaya@gmail.com> wrote in message
        news:1174814928.079661.94330@n76g2000hsh.googlegroups.com...
        > i dont understand what are you trying to say. i want to remove
        > everything between <removeX> and </removeX> including tags.
        >
        Sorry, I didn't read your post carefully enough. As there has been no
        other response, perhaps this may help:

        Similar to your original:

        <body>
        text text text text text text text
        text text text
        text text text text

        <remove1>
        text text text text text text
        text text
        text
        text text text
        </remove1>

        text text text
        text text

        <anotherremove1>
        text text text text
        </anotherremove1>

        text text
        text text
        text text text

        <remove2>
        text text text text text
        text text text
        text text
        </remove2>

        text text text text text
        text text text text
        </body>

        Processing this with basically:

        (?<=<[ra])(.+\s)+|<[ra]

        e.g. PHP processing the file with
        $RegStr = '/(?<=<[ra])(.+\s)+|<[ra]/mi';
        $OutStr = preg_replace($RegStr, "", $TstStr);
        with $TstStr containing the file contents,

        will do what you (I think!) want.
        It outputs:

        <body>
        text text text text text text text
        text text text
        text text text text


        text text text
        text text


        text text
        text text
        text text text


        text text text text text
        text text text text
        </body>


        You will need to define the contents of the [ ] precisely enough to
        identify the tags and contents you want to remove. I don't know whether
        this is the best (simplest?) way to achieve what you want.
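
As a rough check of the pattern above, here is a Python sketch (Python is an assumption; the post uses PHP). It applies the same expression with flags corresponding to /mi to two made-up samples. The second sample has no blank line after the closing tag, which shows that the match only stops at an empty line:

```python
import re

# Same pattern as in the PHP example; re.M | re.I mirror the /mi flags.
PATTERN = re.compile(r"(?<=<[ra])(.+\s)+|<[ra]", re.M | re.I)

# Blank lines around the block, as in the sample file in the post.
spaced = "<body>\nkeep one\n\n<remove1>\ndrop\n</remove1>\n\nkeep two\n</body>\n"
spaced_out = PATTERN.sub("", spaced)
print(repr(spaced_out))

# No blank line after </remove1>: the match runs on until the next empty line,
# so "keep two" is removed as well.
tight = "<remove1>\ndrop\n</remove1>\nkeep two\n\nkeep three\n"
tight_out = PATTERN.sub("", tight)
print(repr(tight_out))
```

So the approach appears to rely on each removable block being followed by a blank line, in addition to the [ ] contents identifying the right tags.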

        If you process the file with a regex search and replace, the tool will
        need to support positive lookbehind assertions.

        hth
        Alan
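
For comparison, a sketch of a different technique that needs no lookbehind: match the opening tag, everything up to the matching closing tag, and tie the two together with a backreference. This is not the method from the post. The function name `remove_elements`, its parameters and the sample string are hypothetical, and Python is an assumption:

```python
import re

def remove_elements(html, tag_names):
    """Remove each listed element: its tags and everything between them."""
    names = "|".join(re.escape(name) for name in tag_names)
    # <name ...> ... </name>, non-greedy; \1 ties the closing tag to the opening one.
    pattern = re.compile(
        r"<(" + names + r")\b[^>]*>.*?</\1\s*>",
        re.DOTALL | re.IGNORECASE,
    )
    return pattern.sub("", html)

# Made-up sample with two different removable tags.
sample = (
    "<body>\nkeep 1\n<remove1>\ndrop\n</remove1>\n"
    "keep 2\n<remove2>drop too</remove2>\nkeep 3\n</body>\n"
)
cleaned = remove_elements(sample, ["remove1", "remove2"])
print(cleaned)
```

This does not handle nested elements of the same name, comments, or attribute values containing ">"; for arbitrary real-world HTML a proper parser is likely to be more robust.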

