extract website into string

  • giloosh

    extract website into string

    Hello,
    how can I extract a website's HTML into a string?
    also, is there a limit to how many chars a string can hold?

    something like:
    $string = extracted_html($url);

    thanks for any help!

  • Al

    #2
    Re: extract website into string

    As far as I know there is no fixed limit... in practice a string can
    grow until PHP hits its memory_limit, or whatever memory your server
    has available to it, so probably somewhere in the gigabyte range.

    As for getting the HTML into a string, there are two main ways: using
    PHP's native fopen() family of functions, or using an extension
    library such as cURL.

    Using the first method is slightly easier, but you'll need to have
    allow_url_fopen enabled. See
    http://uk.php.net/manual/en/function...t-contents.php (and related
    functions) and
    http://uk.php.net/manual/en/ref.file...llow-url-fopen for
    more information.
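As a sketch of the first method (the extracted_html() wrapper below is my own naming, matching what the original poster asked for; only file_get_contents() itself is a real built-in), fetching a page is a near one-liner once allow_url_fopen is on:

```php
<?php
// Sketch of method one: PHP's native file functions.
// Requires allow_url_fopen = On in php.ini for remote URLs.
// extracted_html() is the hypothetical name the original poster wanted.
function extracted_html($url) {
    $html = @file_get_contents($url); // @ silences the warning on failure
    return ($html === false) ? "" : $html; // empty string on failure
}

$string = extracted_html("http://www.example.com/");
?>
```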

    Using the second method requires a small wrapper function, but after
    that it's relatively easy to code. Unfortunately you'll also need the
    cURL library to be installed, although it comes with most server
    setups as far as I know.

    A basic cURL script, which I got from a Firefox downloads ticker
    script and modified slightly, is below:

    <?php

    // Fetch a URL and return the response body as a string.
    function getResponse($url, $port, $timeout) {
        $ch = curl_init();

        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_PORT, $port);

        $data = curl_exec($ch);
        curl_close($ch);

        // curl_exec() returns false if no data was retrieved;
        // I'm not sure what you'd want to do here
        if ($data === false)
            return "Error";

        return $data;
    }

    // standard wrapper for getResponse (so you don't
    // have to do the port/timeout business every time)
    function extract_html($url) {
        return getResponse($url, 80, 25);
    }

    $siteString = extract_html("http://www.example.com/");

    ?>
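One aside on the script above (my addition, not part of the original): when curl_exec() fails, you can ask cURL why via curl_error(), as long as you read it before curl_close() discards the handle:

```php
<?php
// Reading cURL's error message after a failed transfer.
// The .invalid TLD is reserved and never resolves, so this fetch fails.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://no-such-host.invalid/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);

$data = curl_exec($ch);
$why = "";
if ($data === false) {
    $why = curl_error($ch); // must be read before curl_close()
}
curl_close($ch);
?>
```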


    • d

      #3
      Re: extract website into string

      "Al" <alexrussell101@gmail.com> wrote in message
      news:1137561283.796882.41560@z14g2000cwz.googlegroups.com...
      > As far as I know there is no fixed limit... in practice a string can
      > grow until PHP hits its memory_limit, or whatever memory your server
      > has available to it, so probably somewhere in the gigabyte range.
      >
      > As for getting the HTML into a string, there are two main ways: using
      > PHP's native fopen() family of functions, or using an extension
      > library such as cURL.

      You can also use fsockopen():

      $host = "google.com";
      $buffer = "";
      $fp = fsockopen($host, 80);
      fwrite($fp, "GET / HTTP/1.0\r\nHost: $host\r\n\r\n");
      while (!feof($fp)) $buffer .= fread($fp, 4096);
      fclose($fp);

      that doesn't need any special flags set, or external modules.
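One caveat with the raw-socket approach (my note, not from the post above): what you read back includes the HTTP response headers, not just the page. A small helper - split_response() is a hypothetical name of mine - can peel them off:

```php
<?php
// Split a raw HTTP response into (headers, body).
// Per the HTTP spec, a blank line (\r\n\r\n) separates the two.
function split_response($raw) {
    $pos = strpos($raw, "\r\n\r\n");
    if ($pos === false) {
        return array($raw, ""); // no separator found
    }
    return array(substr($raw, 0, $pos), substr($raw, $pos + 4));
}

list($headers, $body) = split_response(
    "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
);
?>
```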



      • Al

        #4
        Re: extract website into string

        d wrote:
        > You can also use fsockopen():
        >
        > $host = "google.com";
        > $buffer = "";
        > $fp = fsockopen($host, 80);
        > fwrite($fp, "GET / HTTP/1.0\r\nHost: $host\r\n\r\n");
        > while (!feof($fp)) $buffer .= fread($fp, 4096);
        > fclose($fp);
        >
        > that doesn't need any special flags set, or external modules.

        My bad, I was under the impression that fsockopen() was in the
        fopen() (i.e. requiring the whole url_fopen thingy) group of
        commands. I should have read up on it better.


        • d

          #5
          Re: extract website into string

          "Al" <alexrussell101@gmail.com> wrote in message
          news:1137722075.518098.287090@o13g2000cwo.googlegroups.com...
          > d wrote:
          >> You can also use fsockopen():
          >>
          >> $host = "google.com";
          >> $buffer = "";
          >> $fp = fsockopen($host, 80);
          >> fwrite($fp, "GET / HTTP/1.0\r\nHost: $host\r\n\r\n");
          >> while (!feof($fp)) $buffer .= fread($fp, 4096);
          >> fclose($fp);
          >>
          >> that doesn't need any special flags set, or external modules.
          >
          > My bad, I was under the impression that fsockopen() was in the
          > fopen() (i.e. requiring the whole url_fopen thingy) group of
          > commands. I should have read up on it better.
          >

          No problemo ;)



          • Jim Michaels

            #6
            Re: extract website into string


            "giloosh" <giloosh99@gmail.com> wrote in message
            news:1137557983.480184.227480@o13g2000cwo.googlegroups.com...
            > Hello,
            > how can I extract a website's HTML into a string?
            > also, is there a limit to how many chars a string can hold?
            >
            > something like:
            > $string = extracted_html($url);
            >
            > thanks for any help!
            >

            one function: $page = file_get_contents(http://www.google.com);



            • Jim Michaels

              #7
              Re: extract website into string


              "Jim Michaels" <jmichae3@nospam.yahoo.com> wrote in message
              news:z6adnS5CQ7uI8XfeRVn-rw@comcast.com...
              >
              > "giloosh" <giloosh99@gmail.com> wrote in message
              > news:1137557983.480184.227480@o13g2000cwo.googlegroups.com...
              >> Hello,
              >> how can I extract a website's HTML into a string?
              >> also, is there a limit to how many chars a string can hold?
              >>
              >> something like:
              >> $string = extracted_html($url);
              >>
              >> thanks for any help!
              >>
              >
              > one function: $page = file_get_contents(http://www.google.com);

              OOPS, forgot quotes:
              $page = file_get_contents("http://www.google.com");
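A last footnote from me (not in the thread): file_get_contents() returns false on failure, and a stream context can give it a timeout, so a slightly more defensive wrapper - fetch_page() is my own hypothetical name - might look like:

```php
<?php
// Defensive wrapper around file_get_contents() (sketch; fetch_page()
// is a made-up name, not a standard function).
function fetch_page($url, $timeout = 10) {
    // Give up on HTTP fetches after $timeout seconds.
    $context = stream_context_create(array(
        'http' => array('timeout' => $timeout),
    ));
    $page = @file_get_contents($url, false, $context);
    if ($page === false) {
        return null; // fetch failed; the caller decides what to do
    }
    return $page;
}
?>
```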

