regex mystery

  • Red

    regex mystery

    In Netscape bookmark files, there are lots of lines like this:
    <DT><A HREF="http://www.commondreams.org/" ADD_DATE="1091500674"
    LAST_CHARSET="ISO-8859-1" ID="rdf:#$uiYyb3">Common Dreams</A>

    I want to eliminate the excess attributes and values to get this:
    <DT><A HREF="http://www.commondreams.org/">Common Dreams</A>

    I almost succeed with this:
    $lines[] = preg_replace("{(<A HREF=\".*\")( ADD.*)(>.*</A>)}", "\\1\\3", $line);

    The only problem is the explicit "ADD". The code only works if there is
    an ADD_DATE attribute immediately after the URL. I tried replacing
    ( ADD.*) with ( .*), which I thought would match everything up to the ">":
    $lines[] = preg_replace("{(<A HREF=\".*\")( .*)(>.*</A>)}", "\\1\\3", $line);

    For some reason, this does not find a match. Since " ADD" is also matched
    by " .*", I don't understand why I need the explicit " ADD".

    How do I match without the explicit " ADD"?
  • steve

    #2
    Re: regex mystery

    "Red" wrote:
    > In Netscape bookmark files, there are lots of lines like this:
    > <DT><A HREF="http://www.commondreams.org/" ADD_DATE="1091500674"
    > LAST_CHARSET="ISO-8859-1" ID="rdf:#$uiYyb3">Common Dreams</A>
    >
    > I want to eliminate the excess attributes and values to get this:
    > <DT><A HREF="http://www.commondreams.org/">Common Dreams</A>
    >
    > I almost succeed with this:
    > $lines[] = preg_replace("{(<A HREF=\".*\")( ADD.*)(>.*</A>)}", "\\1\\3", $line);
    >
    > The only problem is the explicit "ADD". The code only works if there is
    > an ADD_DATE attribute immediately after the URL. I tried replacing
    > ( ADD.*) with ( .*), which I thought would match everything up to the ">":
    > $lines[] = preg_replace("{(<A HREF=\".*\")( .*)(>.*</A>)}", "\\1\\3", $line);
    >
    > For some reason, this does not find a match. Since " ADD" is also matched
    > by " .*", I don't understand why I need the explicit " ADD".
    >
    > How do I match without the explicit " ADD"?

    I could not follow the code, but this should work for
    ADD_DATE="1091500674":

    $changedline = preg_replace("/ADD_DATE\=\"\d+\"/", '', $originalline);
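
As a rough sketch only, the single-attribute replacement could be extended to drop several named attributes in one pass. The helper name stripAttributes and the list of attribute names are assumptions for illustration (the names are taken from the sample line in the question), not something from the original reply.

```php
<?php
// Hypothetical helper: remove each named attribute together with the
// whitespace in front of it. Not part of the original reply.
function stripAttributes(string $line, array $names): string
{
    // Escape each name so it is safe inside a '/'-delimited pattern.
    $alternation = implode('|', array_map(
        function ($name) {
            return preg_quote($name, '/');
        },
        $names
    ));

    // [^"]* cannot run past the closing quote of the value,
    // so greedy matching does no harm here.
    return preg_replace('/\s+(?:' . $alternation . ')="[^"]*"/i', '', $line);
}

// Sample line from the question, assumed to sit on a single line.
$line = '<DT><A HREF="http://www.commondreams.org/" ADD_DATE="1091500674" '
      . 'LAST_CHARSET="ISO-8859-1" ID="rdf:#$uiYyb3">Common Dreams</A>';

$clean = stripAttributes($line, ['ADD_DATE', 'LAST_CHARSET', 'ID']);
echo $clean, "\n";
// prints: <DT><A HREF="http://www.commondreams.org/">Common Dreams</A>
```

This only removes attributes that are listed explicitly, so any attribute not in the list would survive.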


    • Michael Fesser

      #3
      Re: regex mystery

      .oO(Red)
      >In Netscape bookmark files, there are lots of lines like this:
      ><DT><A HREF="http://www.commondreams.org/" ADD_DATE="1091500674"
      >LAST_CHARSET="ISO-8859-1" ID="rdf:#$uiYyb3">Common Dreams</A>
      >
      >I want to eliminate the excess attributes and values to get this:
      ><DT><A HREF="http://www.commondreams.org/">Common Dreams</A>
      >
      >I almost succeed with this:
      >$lines[] = preg_replace("{(<A HREF=\".*\")( ADD.*)(>.*</A>)}", "\\1\\3", $line);
      >
      >The only problem is the explicit "ADD". The code only works if there is
      >an ADD_DATE attribute immediately after the URL. I tried replacing
      >( ADD.*) with ( .*), which I thought would match everything up to the ">":
      >$lines[] = preg_replace("{(<A HREF=\".*\")( .*)(>.*</A>)}", "\\1\\3", $line);
      >
      >For some reason, this does not find a match. Since " ADD" is also matched
      >by " .*", I don't understand why I need the explicit " ADD".

      It's because of the default greediness of the quantifiers. The .* after
      the HREF=\" in your second pattern is quite hungry and eats up everything
      until the last " in the tag, including the ADD_DATE and everything else.
      You can change this behaviour with the U modifier, e.g.

      $pattern = '#(<a href=".*").*(>.*</a>)#iU';
      $replace = '$1$2';
      $lines[] = preg_replace($pattern, $replace, $line);
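
To make the difference visible, here is a small sketch that runs the pattern from the question with and without the U modifier and prints what the second group captures. It assumes the bookmark entry sits on a single line; the variable names $greedy, $lazy and $result are made up for this illustration.

```php
<?php
// Sample line from the question, assumed to sit on a single line.
$line = '<DT><A HREF="http://www.commondreams.org/" ADD_DATE="1091500674" '
      . 'LAST_CHARSET="ISO-8859-1" ID="rdf:#$uiYyb3">Common Dreams</A>';

// Greedy (default): the first .* runs to the last quote that still lets the
// rest of the pattern match, so only the final attribute is left for group 2.
preg_match('{(<A HREF=".*")( .*)(>.*</A>)}', $line, $greedy);

// Ungreedy (U modifier): every quantifier stops as early as it can,
// so group 1 ends at the closing quote of the HREF value.
preg_match('{(<A HREF=".*")( .*)(>.*</A>)}U', $line, $lazy);

echo $greedy[2], "\n";
// prints:  ID="rdf:#$uiYyb3"
echo $lazy[2], "\n";
// prints:  ADD_DATE="1091500674" LAST_CHARSET="ISO-8859-1" ID="rdf:#$uiYyb3"

// Dropping group 2 with the ungreedy pattern gives the wanted line.
$result = preg_replace('{(<A HREF=".*")( .*)(>.*</A>)}U', '$1$3', $line);
echo $result, "\n";
// prints: <DT><A HREF="http://www.commondreams.org/">Common Dreams</A>
```

On this sample the greedy pattern does match, but it strips only the last attribute, which is why the output looks almost unchanged.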

      Pattern Modifiers
      <http://www.php.net/manual/en/pcre.pattern.modifiers.php>

      HTH
      Micha


      • Red

        #4
        Re: regex mystery

        Michael Fesser wrote:
        > .oO(Red)
        >
        >>In Netscape bookmark files, there are lots of lines like this:
        >><DT><A HREF="http://www.commondreams.org/" ADD_DATE="1091500674"
        >>LAST_CHARSET="ISO-8859-1" ID="rdf:#$uiYyb3">Common Dreams</A>
        >>
        >>I want to eliminate the excess attributes and values to get this:
        >><DT><A HREF="http://www.commondreams.org/">Common Dreams</A>
        >>
        >>I almost succeed with this:
        >>$lines[] = preg_replace("{(<A HREF=\".*\")( ADD.*)(>.*</A>)}", "\\1\\3", $line);
        >>
        >>The only problem is the explicit "ADD". The code only works if there is
        >>an ADD_DATE attribute immediately after the URL. I tried replacing
        >>( ADD.*) with ( .*), which I thought would match everything up to the ">":
        >>$lines[] = preg_replace("{(<A HREF=\".*\")( .*)(>.*</A>)}", "\\1\\3", $line);
        >>
        >>For some reason, this does not find a match. Since " ADD" is also matched
        >>by " .*", I don't understand why I need the explicit " ADD".
        >
        > It's because of the default greediness of the quantifiers. The .* after
        > the HREF=\" in your second pattern is quite hungry and eats up everything
        > until the last " in the tag, including the ADD_DATE and everything else.
        > You can change this behaviour with the U modifier, e.g.
        >
        > $pattern = '#(<a href=".*").*(>.*</a>)#iU';
        > $replace = '$1$2';
        > $lines[] = preg_replace($pattern, $replace, $line);
        >
        > Pattern Modifiers
        > <http://www.php.net/manual/en/pcre.pattern.modifiers.php>
        >
        > HTH
        > Micha


        What a handy modifier, thanks.

        red
