Need help replacing a specific <a href ... </a>

  • Sorby

    Hi
    I've been coding in PHP for a little while now and was starting to feel
    pretty confident, but then realised I need to understand regular expressions
    to solve a particular problem I've got... What a horrible can of worms
    regex is!! (to the uninitiated at least).

    I realise that if I spend the next few weeks researching regex I'll probably
    find the answer but I was wondering if anyone here would kindly help speed
    up the process?

    Basically I am grabbing the html from another URL, finding a specific <a
    href .. </a> block and replacing the contents of that block with another
    link. (n.b. the code examples below are pseudo code - I realise I need to
    'escape' some characters)

    Here's how I'm grabbing the source html :
    $html = join ("", file (http://www.sitexyz.com/index.htm));

    the link I want to replace looks something like :
    <a href="http://www.sitexyz.com/page2"><img height="120"
    src="origpic.jpg"></a>
    Note - the only thing I know to be constant about the above link is that the
    height is always 120 - the link destination and source image are completely
    unpredictable.

    Here's a variable holding the replacement link to my site using a different
    image :
    $newlink = "<a href="http://www.mysite.com" ><img height="120"
    src="mypic.jpg" ></a>";

    So... how do I use preg_replace() or ereg_replace() to find the <a href ..
    </a> block encompassing an image of height 120 and replace it with the
    contents of my $newlink variable?

    *** To precis my question ... How do I replace an HTML link (where all I
    know is the image height) with a link of my own? ***

    TIA!

    p.s. I promise I'll go and sit on the top of a mountain and not come down
    until I've memorised & understood everything about regex ... once I've
    sorted this problem! :o)

    --
    Sorby



  • Pedro Graca

    #2
    Re: Need help replacing a specific <a href ... </a>

    Sorby wrote:
    > Here's how I'm grabbing the source html :
    > $html = join ("", file (http://www.sitexyz.com/index.htm));

    You realize the parameter to file() should be between quotes, right?
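A minimal sketch of the fetch step with the argument quoted, next to `file_get_contents()`, which reads a whole file or URL into a single string in one call. To run offline it reads a temporary local file instead of the thread's URL (an assumption); fetching an `http://` address this way needs `allow_url_fopen` enabled.

```php
<?php
// Write some sample HTML to a temporary file so the example runs offline.
// (With allow_url_fopen enabled, file() and file_get_contents() also accept
// a quoted URL such as 'http://www.sitexyz.com/index.htm'.)
$path = tempnam(sys_get_temp_dir(), 'html');
file_put_contents($path, "<p>line one</p>\n<p>line two</p>\n");

// file() returns an array of lines (newlines kept); join() glues them back.
$viaFile = join('', file($path));

// file_get_contents() reads the whole resource into one string directly.
$viaGetContents = file_get_contents($path);

var_dump($viaFile === $viaGetContents); // bool(true)

unlink($path);
```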

    > the link I want to replace looks something like :
    > <a href="http://www.sitexyz.com/page2"><img height="120"
    > src="origpic.jpg"></a>
    > Note - the only thing I know to be constant about the above link is that the
    > height is always 120 - the link destination and source image are completely
    > unpredictable.

    What about the order they appear in?
    What about white space (including newlines)?
    What about other attributes?

    > Here's a variable holding the replacement link to my site using a different
    > image :
    > $newlink = "<a href="http://www.mysite.com" ><img height="120"
    > src="mypic.jpg" ></a>";
    >
    > So... how do I use preg_replace() or ereg_replace() to find the <a href ..
    > </a> block encompassing an image of height 120 and replace it with the
    > contents of my $newlink variable?

    You're better off using an HTML parser than relying on the structure of
    the HTML you're fetching.
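A sketch of the parser route, using PHP's bundled DOM extension (assumed to be enabled) and an XPath query for any `<a>` that contains an `<img>` with `height="120"`. The function name `replaceLinkByImageHeight` and the sample page are hypothetical. Because the DOM works with attributes rather than raw text, attribute order, extra attributes and line breaks do not matter.

```php
<?php
// Sketch (hypothetical helper): replace each <a> that wraps an
// <img height="..."> using the DOM extension instead of a regex.
// Assumption: the image is somewhere inside the link element.
function replaceLinkByImageHeight(string $html, string $newLinkHtml, string $height = '120'): string
{
    $previous = libxml_use_internal_errors(true); // tolerate messy real-world HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Parse the replacement link in its own document, then import it.
    $tmp = new DOMDocument();
    $tmp->loadHTML($newLinkHtml);
    $newLink = $doc->importNode($tmp->getElementsByTagName('a')->item(0), true);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Every <a> with an <img height="..."> descendant. XPath results are a
    // snapshot, so replacing nodes inside the loop is safe.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query("//a[.//img[@height='$height']]") as $link) {
        $link->parentNode->replaceChild($newLink->cloneNode(true), $link);
    }

    return $doc->saveHTML();
}

// Hypothetical page: one image link to replace, one text link to keep.
$page = '<p>Intro <a href="/1/files/small/north/page2.htm"><img height="120" width="80" '
      . 'src="http://www.mysite.com/origpic.jpg" /></a> and <a href="/about">About</a></p>';
$newlink = '<a href="http://www.mysite.com"><img height="120" src="mypic.jpg"></a>';

$out = replaceLinkByImageHeight($page, $newlink);
echo $out;
```

`saveHTML()` returns the whole document, with `<html>`/`<body>` wrappers added around a fragment, so the output is not byte-identical to the input.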

    Anyway, if they don't ever change, here's a starting point:

    <?php
    error_reporting (E_ALL);
    ini_set('display_errors', '1');

    $x = '... <a href="http://www.sitexyz.com/page2">';
    $x .= '<img height="120" src="origpic.jpg"></a> ...';

    $newlink = '<a href="http://www.mysite.com" >';
    $newlink .='<img height="120" src="mypic.jpg" ></a>';

    $y = preg_replace('@ <a href="[^"]+"><img height="120" src="[^"]+"></a>@',
    $newlink, $x);

    echo "$x\n$y\n";
    ?>
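If the markup may vary a little, a looser pattern can tolerate attribute order, extra attributes, `/>` endings and whitespace between the tags. This is a sketch, not a full HTML matcher: it still assumes the `<img>` is the only thing inside the `<a>` and that the height is written as `height="120"` in double quotes.

```php
<?php
// Looser pattern: <a ...> then optional whitespace, an <img ...> tag that
// carries height="120" anywhere among its attributes, optional whitespace,
// then </a>. [^>]* keeps each match inside a single tag.
$pattern = '@<a\b[^>]*>\s*<img\b[^>]*\bheight\s*=\s*"120"[^>]*>\s*</a>@i';

$newlink = '<a href="http://www.mysite.com"><img height="120" src="mypic.jpg"></a>';

$samples = [
    '<a href="x"><img height="120" src="a.jpg"></a>',           // original order
    '<a href="x"><img src="a.jpg" height="120" /></a>',         // reordered, self-closing
    "<a href=\"x\">\n  <img width=\"80\" height=\"120\" src=\"a.jpg\">\n</a>", // newlines
    '<a href="x"><img height="60" src="a.jpg"></a>',            // different height: untouched
    '<a href="/about">About</a>',                               // no image: untouched
];

foreach ($samples as $s) {
    echo preg_replace($pattern, $newlink, $s), "\n";
}
```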

    > *** To precis my question ... How do I replace an HTML link (where all I
    > know is the image height) with a link of my own? ***

    HTML links do not always have an image associated with them!
    --
    USENET would be a better place if everybody read: : mail address :
    http://www.catb.org/~esr/faqs/smart-questions.html : is valid for :
    http://www.netmeister.org/news/learn2quote2.html : "text/plain" :
    http://www.expita.com/nomime.html : to 10K bytes :

    • Sorby

      #3
      Re: Need help replacing a specific <a href ... </a>

      "Pedro Graca" <hexkid@hotpop.com> wrote in message
      news:c6pbtn$el89j$2@ID-203069.news.uni-berlin.de...
      > Sorby wrote:
      > > Here's how I'm grabbing the source html :
      > > $html = join ("", file (http://www.sitexyz.com/index.htm));
      >
      > You realize the parameter to file() should be between quotes, right?

      Yes - thanks. I just typed it in that way - but in my source code it has
      quotes.

      > > the link I want to replace looks something like :
      > > <a href="http://www.sitexyz.com/page2"><img height="120"
      > > src="origpic.jpg"></a>
      > > Note - the only thing I know to be constant about the above link is that the
      > > height is always 120 - the link destination and source image are completely
      > > unpredictable.
      >
      > What about the order they appear in?

      Here is an exact cut'n'paste of the bit I want to replace.

      <a href="/1/files/small/north/page2.htm"><img height="120" hspace="0"
      align="left" vspace="0" border="0" width="80" alt="original photo"
      src="http://www.mysite.com/origpic.jpg" /></a>

      > What about white space (including newlines)?

      I don't think this will be a problem. Hope not anyway. All the examples I've
      seen are as above and with no line-breaks/carriage-returns.

      > What about other attributes?

      I've included them in the new example above.

      > > Here's a variable holding the replacement link to my site using a different
      > > image :
      > > $newlink = "<a href="http://www.mysite.com" ><img height="120"
      > > src="mypic.jpg" ></a>";
      > >
      > > So... how do I use preg_replace() or ereg_replace() to find the <a href ..
      > > </a> block encompassing an image of height 120 and replace it with the
      > > contents of my $newlink variable?
      >
      > You're better off using an HTML parser than relying on the structure of
      > the HTML you're fetching.

      Am I relying on the structure of the HTML I'm fetching? If the structure of
      the HTML changes over time but the image height remains 120 then the code
      should still work, right?
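A quick way to check is to run the starting-point pattern against the markup pasted above, joined onto one line (an assumption, since Usenet wrapped it). The literal text between the two `[^"]+` holes has to appear exactly, so the extra attributes and the ` />` ending stop it from matching:

```php
<?php
// The starting-point pattern, unchanged.
$pattern = '@ <a href="[^"]+"><img height="120" src="[^"]+"></a>@';

// The pasted markup, joined onto one line, with text on either side.
$real = '... <a href="/1/files/small/north/page2.htm"><img height="120" hspace="0" '
      . 'align="left" vspace="0" border="0" width="80" alt="original photo" '
      . 'src="http://www.mysite.com/origpic.jpg" /></a> ...';

// preg_match() returns the number of matches: here none, because
// 'hspace="0"' appears where the pattern expects 'src="'.
var_dump(preg_match($pattern, $real)); // int(0)
```

So yes, the regex relies on the exact structure: a change in attribute order or an extra attribute breaks it even if the height stays 120.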

      > Anyway, if they don't ever change, here's a starting point:
      >
      > <?php
      > error_reporting (E_ALL);
      > ini_set('display_errors', '1');
      >
      > $x = '... <a href="http://www.sitexyz.com/page2">';
      > $x .= '<img height="120" src="origpic.jpg"></a> ...';
      >
      > $newlink = '<a href="http://www.mysite.com" >';
      > $newlink .='<img height="120" src="mypic.jpg" ></a>';
      >
      > $y = preg_replace('@ <a href="[^"]+"><img height="120" src="[^"]+"></a>@',
      > $newlink, $x);
      >
      > echo "$x\n$y\n";
      > ?>

      Thanks for taking the time to post this solution Pedro. I will try it out
      now.

      > > *** To precis my question ... How do I replace an HTML link (where all I
      > > know is the image height) with a link of my own? ***
      >
      > HTML links do not always have an image associated with them!

      Good point but thankfully I am assured that in my case the links I'm looking
      for will always have an image.

      Thanks again - your help is greatly appreciated.

      --
      Sorby


      • Sorby

        #4
        Re: Need help replacing a specific <a href ... </a>

        "Pedro Graca" <hexkid@hotpop.com> wrote in message
        news:c6pbtn$el89j$2@ID-203069.news.uni-berlin.de...
        > Sorby wrote:
        > > Here's how I'm grabbing the source html :
        > > $html = join ("", file (http://www.sitexyz.com/index.htm));
        >
        > You realize the parameter to file() should be between quotes, right?
        >
        > > the link I want to replace looks something like :
        > > <a href="http://www.sitexyz.com/page2"><img height="120"
        > > src="origpic.jpg"></a>
        > > Note - the only thing I know to be constant about the above link is that the
        > > height is always 120 - the link destination and source image are completely
        > > unpredictable.
        >
        > What about the order they appear in?
        > What about white space (including newlines)?
        > What about other attributes?
        >
        > > Here's a variable holding the replacement link to my site using a different
        > > image :
        > > $newlink = "<a href="http://www.mysite.com" ><img height="120"
        > > src="mypic.jpg" ></a>";
        > >
        > > So... how do I use preg_replace() or ereg_replace() to find the <a href ..
        > > </a> block encompassing an image of height 120 and replace it with the
        > > contents of my $newlink variable?
        >
        > You're better off using an HTML parser than relying on the structure of
        > the HTML you're fetching.
        >
        > Anyway, if they don't ever change, here's a starting point:
        >
        > <?php
        > error_reporting (E_ALL);
        > ini_set('display_errors', '1');
        >
        > $x = '... <a href="http://www.sitexyz.com/page2">';
        > $x .= '<img height="120" src="origpic.jpg"></a> ...';

        Sorry - I probably wasn't clear enough - the href link and the image source
        name could be anything - I can't predict them - or even part of them.

        --
        Sorby


        • Pedro Graca

          #5
          Re: Need help replacing a specific <a href ... </a>

          Sorby wrote:
          > "Pedro Graca" <hexkid@hotpop.com> wrote in message
          > news:c6pbtn$el89j$2@ID-203069.news.uni-berlin.de...

          >> $x = '... <a href="http://www.sitexyz.com/page2">';
          >> $x .= '<img height="120" src="origpic.jpg"></a> ...';

          > Sorry - I probably wasn't clear enough - the href link and the image source
          > name could be anything - I can't predict them - or even part of them.

          It's OK, just try it with different $x's :)


          $x = '<a href="one"><img src="one"/></a>';
          # $y = preg_replace();

          $x = '<a href="two"><img src="two"></a>';
          # $y = preg_replace();

          $x = '<a href="three"><img src="three"></a>';
          # $y = preg_replace();

          ....
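Filled in, that suggestion might look like the loop below. It uses the starting-point pattern without its leading space (these samples have nothing before the `<a>`); the sample hrefs and image paths are made up for illustration.

```php
<?php
// Same shape as the starting-point pattern: [^"]+ stands in for
// "whatever is inside the quotes" in both href and src.
$pattern = '@<a href="[^"]+"><img height="120" src="[^"]+"></a>@';
$newlink = '<a href="http://www.mysite.com"><img height="120" src="mypic.jpg"></a>';

$tests = [
    '<a href="one"><img height="120" src="one"></a>',
    '<a href="/two/page.htm"><img height="120" src="/img/two.gif"></a>',
    '<a href="three"><img src="three"></a>',   // no height attribute
];

foreach ($tests as $x) {
    $y = preg_replace($pattern, $newlink, $x);
    echo ($y === $x ? 'unchanged: ' : 'replaced:  '), $x, "\n";
}
```

Arbitrary href and src values are fine; what the pattern cannot absorb is a change in the text around them, as the third sample shows.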

          • Pedro Graca

            #6
            Re: Need help replacing a specific <a href ... </a>

            Sorby wrote:
            > Here is an exact cut'n'paste of the bit I want to replace.
            >
            > <a href="/1/files/small/north/page2.htm"><img height="120" hspace="0"
            > align="left" vspace="0" border="0" width="80" alt="original photo"
            > src="http://www.mysite.com/origpic.jpg" /></a>

            Hmmmm ... and you want to change just the URL and image SRC?

            <a href="========== CHANGED =========="><img height="120" hspace="0"
            align="left" vspace="0" border="0" width="80" alt="original photo"
            src="========== CHANGED ==========" /></a>

            Copy the stuff that you want to keep into the regexp as-is, and for the
            stuff you want replaced use
            [^"]+
            which means: one or more of anything except quotes
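A small illustration of why `[^"]+` is used rather than `.+`: the negated class cannot cross a quote, so it stops at the end of the attribute value, while a greedy `.+` runs on to the last quote on the line. The sample string is a fragment of the pasted markup.

```php
<?php
// [^"]+ : one or more characters that are not a double quote,
// so the capture ends at the closing quote of the href value.
$html = '<a href="/1/files/small/north/page2.htm"><img height="120"';

preg_match('@href="([^"]+)"@', $html, $m);
echo $m[1], "\n";   // /1/files/small/north/page2.htm

// .+ is greedy and may contain quotes, so it backtracks only as far
// as the LAST quote in the string.
preg_match('@href="(.+)"@', $html, $m2);
echo $m2[1], "\n";  // /1/files/small/north/page2.htm"><img height="120
```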

            so

            $regexp = '@' .
            '<a href="[^"]+"><img height="120" hspace="0" ' .
            ## =URL=
            'align="left" vspace="0" border="0" width="80" alt="original photo" ' .
            'src="[^"]+" /></a>' .
            ## =SRC=
            '@';

            $destin = preg_replace($regexp, $newlink, $source);
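Putting the pieces together, a runnable sketch of the regexp above applied to the pasted markup. It assumes the markup is on one line, as Sorby reports, and wraps it in a hypothetical `<td>` for context; `$count` (the optional fifth argument to `preg_replace()`) reports how many links were replaced.

```php
<?php
// The regexp from above: literal attributes kept, URL and SRC left open.
$regexp = '@'
        . '<a href="[^"]+"><img height="120" hspace="0" '
        . 'align="left" vspace="0" border="0" width="80" alt="original photo" '
        . 'src="[^"]+" /></a>'
        . '@';

$newlink = '<a href="http://www.mysite.com"><img height="120" src="mypic.jpg"></a>';

// Hypothetical surrounding page content around the pasted link.
$source = '<td>before <a href="/1/files/small/north/page2.htm"><img height="120" hspace="0" '
        . 'align="left" vspace="0" border="0" width="80" alt="original photo" '
        . 'src="http://www.mysite.com/origpic.jpg" /></a> after</td>';

// -1 means no limit on replacements; $count receives how many were made.
$destin = preg_replace($regexp, $newlink, $source, -1, $count);
echo $destin, "\n";
echo "replacements: $count\n";
```

If `$count` comes back as 0, the fetched markup differs somewhere from the literal parts of the pattern.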