Any regex pro's out there?

  • nel

    Any regex pro's out there?

    I have two tags:
    <!--// Remove Begin //--> and <!--// Remove End //-->

    I want to use regi_replace() to remove everything between these tags.

    The thing is, these tags can be repeated throughout the code.

    <!--// Remove Begin //-->(.+)<!--// Remove End //--> works, but only
    if the tags exist once. Otherwise, it removes everything between
    the first <!--// Remove Begin //--> and the last <!--// Remove End //-->.

    How could I modify this so that it will...

    convert: aaa<!--// Remove Begin //-->bbb <!--// Remove End //-->ccc<!--// Remove Begin //-->ddd<!--// Remove End //-->
    into: aaaccc

    ??

  • shimmyshack

    #2
    Re: Any regex pro's out there?

    On Jun 5, 2:18 am, nel <NajibKa...@gmail.com> wrote:
    > I have two tags:
    > <!--// Remove Begin //--> and <!--// Remove End //-->
    >
    > I want to use regi_replace() to remove everything between these tags.
    >
    > The thing is, these tags can be repeated throughout the code.
    >
    > <!--// Remove Begin //-->(.+)<!--// Remove End //--> works, but only
    > if the tags exist once. Otherwise, it removes everything between
    > the first <!--// Remove Begin //--> and the last <!--// Remove End //-->.
    >
    > How could I modify this so that it will...
    >
    > convert: aaa<!--// Remove Begin //-->bbb <!--// Remove End //-->ccc<!--// Remove Begin //-->ddd<!--// Remove End //-->
    > into: aaaccc
    >
    > ??

    You need to add a rule: "remove .+, but not if .+ contains the end
    marker <!--// Remove End //-->".
    I am assuming you are doing this inside a web page, where < or
    possibly <!-- will be present WITHIN the sections to be removed, so
    bbb could be
    <!--comment--><b>hello</b>bb
    If HTML will not be present, you could simply use a negated character
    class that excludes <:
    [^<]+
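
One way to write the rule described above ("not if it contains the end marker") is a negative lookahead checked before every character. This is only a minimal sketch, assuming PHP's PCRE functions (preg_*) are in use; the function name `strip_marked_tempered` is made up for illustration and does not come from the thread.

```php
<?php
// Hypothetical helper: removes every Begin...End block, markers included.
function strip_marked_tempered(string $html): string
{
    // preg_quote() escapes the markers so they are matched literally.
    $begin = preg_quote('<!--// Remove Begin //-->', '~');
    $end   = preg_quote('<!--// Remove End //-->', '~');

    // (?:(?!END).)* consumes one character at a time, but only while the
    // end marker does not start at the current position.
    // The s flag lets "." match newlines as well.
    $pattern = '~' . $begin . '(?:(?!' . $end . ').)*' . $end . '~s';

    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace($pattern, '', $html) ?? $html;
}

echo strip_marked_tempered(
    'aaa<!--// Remove Begin //-->bbb <!--// Remove End //-->'
    . 'ccc<!--// Remove Begin //-->ddd<!--// Remove End //-->'
);
// prints: aaaccc
```

If the removed sections are known never to contain <, the middle part could be the simpler [^<]+ instead of the lookahead.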


    • shimmyshack

      #3
      Re: Any regex pro's out there?

      I should have added: google for the ungreedy U switch. Your matching is
      too greedy, and slurps up one giant match rather than many "least"
      matches.
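
To see the difference the U switch makes, here is a small sketch that runs the pattern from the question with and without it. It assumes the PCRE function preg_replace (the U switch is a PCRE pattern modifier); the variable names are only illustrative.

```php
<?php
$input = 'aaa<!--// Remove Begin //-->bbb <!--// Remove End //-->'
       . 'ccc<!--// Remove Begin //-->ddd<!--// Remove End //-->';

// Greedy: .+ grabs as much as it can, so one match runs from the
// first Begin marker to the last End marker.
$greedy = '~<!--// Remove Begin //-->(.+)<!--// Remove End //-->~';

// Same pattern with the U modifier: quantifiers become ungreedy,
// so each Begin marker pairs with the nearest End marker.
$ungreedy = '~<!--// Remove Begin //-->(.+)<!--// Remove End //-->~U';

$greedyResult   = preg_replace($greedy, '', $input);   // "aaa"
$ungreedyResult = preg_replace($ungreedy, '', $input); // "aaaccc"

echo $greedyResult, "\n", $ungreedyResult, "\n";
```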


      • Rik

        #4
        Re: Any regex pro's out there?

        On Tue, 05 Jun 2007 03:56:05 +0200, shimmyshack <matt.farey@gmail.com>
        wrote:
        > I should have added: google for the ungreedy U switch. Your matching is
        > too greedy, and slurps up one giant match rather than many "least"
        > matches.

        Or just use the ? modifier:
        preg_replace('|<!--// Remove Begin //-->.*?<!--// Remove End //-->|si', '', $string);
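
The two flags at the end of that pattern matter as much as the ?. The sketch below is only an illustration with a made-up input string: s lets . match newlines, and i makes the markers case-insensitive. Without s, a block that spans several lines is not matched at all.

```php
<?php
// Made-up input: the block spans two lines and the End marker is upper-case.
$string = "aaa<!--// Remove Begin //-->bbb\nmore <!--// REMOVE END //-->ccc";

// .*? is lazy, s lets "." cross the newline, i ignores case.
$withS = preg_replace(
    '|<!--// Remove Begin //-->.*?<!--// Remove End //-->|si', '', $string
);

// Without s, "." stops at the newline, so nothing matches and the
// string comes back unchanged.
$withoutS = preg_replace(
    '|<!--// Remove Begin //-->.*?<!--// Remove End //-->|i', '', $string
);

echo $withS, "\n"; // prints: aaaccc
```

Unlike the U switch, the ? only affects the quantifier it follows, so other quantifiers in the same pattern keep their default greedy behaviour.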

        --
        Rik Wasmus


        • Mike P2

          #5
          Re: Any regex pro's out there?

          On Jun 5, 11:20 am, Rik <luiheidsgoe...@hotmail.com> wrote:
          > Or just use the ? modifier:
          > preg_replace('|<!--// Remove Begin //-->.*?<!--// Remove End //-->|si', '', $string);

          Just a side note to nel: if you are going to use shimmyshack's U
          modifier, you have to switch to PCRE (the preg_* functions), as Rik
          is doing. And be sure not to copy Rik's exact pattern unless you
          switch, because right now you are using PHP's built-in POSIX regex
          functions, which understand neither the delimiters and modifiers nor
          the lazy .*? syntax.

          At least, I think you are using PHP's built-in regex stuff, assuming
          that by regi_replace() you mean eregi_replace().

          -Mike PII
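
For anyone making that switch, here is a rough sketch of what changes in the call. It assumes the original code used eregi_replace() as guessed above; that function is shown only in a comment because the ereg family was removed in PHP 7.0. The sample $content is made up.

```php
<?php
// POSIX-style call the question appears to use (no delimiters, no modifiers):
//   eregi_replace('<!--// Remove Begin //-->(.+)<!--// Remove End //-->', '', $content);

// PCRE version: the pattern is wrapped in delimiters (# here, so the
// slashes need no escaping), and the case-insensitivity implied by the
// "i" in eregi becomes the i modifier after the closing delimiter.
$content = 'xx<!--// remove begin //-->yy<!--// remove end //-->zz';
$result  = preg_replace(
    '#<!--// Remove Begin //-->(.+)<!--// Remove End //-->#i', '', $content
);

echo $result, "\n"; // prints: xxzz
```

This direct translation is still greedy; once the pattern has delimiters, the U modifier or a lazy quantifier can be added to it.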


          • nel

            #6
            Re: Any regex pro's out there?

            Yep thanks, I realized that when I googled "U modifier".

            This is what I'm using in case anyone wants to know:

            // First replace any line breaks with a token identifier, since by
            // default "." in a preg_replace pattern does not match newlines
            $cleaned_content = str_ireplace("\n", "<!--// New Line //-->", $content);
            // This creates our regular query sequence... perl-stylezzz
            $reg = '/<!--\/\/ Remove Begin \/\/-->(.+)<!--\/\/ Remove End \/\/-->/U';
            $cleaned_content = preg_replace($reg, "", $cleaned_content);
            // Now just put our line breaks back into place
            $cleaned_content = str_ireplace("<!--// New Line //-->", "\n", $cleaned_content);

            The above code returns my string (which I pulled from an HTML file)
            with all the <!--// Remove Begin //--> and <!--// Remove End //-->
            tags, and anything in between them, removed!

            The U modifier solved my problem: the greedy match was turning
            "abxxab" into "", when it is supposed to remove only what sits
            between each a and b, turning "abxxab" into "xx".

            Thanks again!
            -nel
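
The newline-token step in the code above may not be necessary. As a sketch only, and assuming the goal is simply to let the match run across line breaks, the s modifier (already used in Rik's pattern) can be combined with U. The function name `remove_marked_blocks` and the sample content are invented for illustration.

```php
<?php
// Hypothetical one-step variant of the three-step approach:
// U makes .+ ungreedy, s lets "." match newlines.
function remove_marked_blocks(string $content): string
{
    $reg = '/<!--\/\/ Remove Begin \/\/-->(.+)<!--\/\/ Remove End \/\/-->/Us';

    // preg_replace() returns null on error; keep the input in that case.
    return preg_replace($reg, '', $content) ?? $content;
}

$content = "keep 1\n<!--// Remove Begin //-->drop\nthis<!--// Remove End //-->\nkeep 2";
echo remove_marked_blocks($content);
// prints "keep 1", an empty line, then "keep 2"
```

This also avoids a corner case of the token approach: content that already contains the text <!--// New Line //--> would be turned into a line break on the way back.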


            On Jun 5, 8:04 pm, Mike P2 <sumguyovrt...@gmail.com> wrote:
            > Just a side note to nel, if you are going to use shimmyshack's U
            > modifier you have to use PCRE instead as Rik is doing, and be sure not
            > to copy Rik's exact pattern unless you switch because you are using
            > PHP's built in regex functions.
            >
            > At least, I think you are using PHP's built-in regex stuff, assuming
            > that by regi_replace() you mean eregi_replace()
            >
            > -Mike PII
