Trapping file_get_contents for bad urls

  • Csaba Gabor

    Trapping file_get_contents for bad urls

    I've got a section of code that takes a user supplied $url and
    essentially does:
    function redirectRequest ($url) {
        $resultRemote = @file_get_contents($url);
        return $resultRemote;
    }

    Is there a reasonable way to trap for and differentiate between the
    major (most common) types of failure?

    For example, common scenarios in which I would not get back a desired
    response include: malformed URL, the domain can't be mapped (i.e. the
    DNS servers don't return an IP address), the domain is not reachable,
    the directory or file is not found, standard HTTP error response
    header.


    I'm sure it's been solved many times over, but my searches didn't lead
    me to substantive results. My initial attempt only captures the fact
    that there was a problem, but not the nature of it:

    function redirectRequest ($url) {
        $oldTrack = ini_set("track_errors", "true"); // allow $php_errormsg to be set
        $php_errormsg = "";  // initialize; will be set if error with next line
        $resultRemote = @file_get_contents($url);
        ini_set("track_errors", $oldTrack);  // restore setting
        if ($php_errormsg) return -1;  // this site had a problem
        return $resultRemote;
    }


    Thanks for any pointers,
    Csaba Gabor from Vienna

    Note: This is to be used between cooperating domains where each server
    puts out a serialized response when it receives a redirectRequest as
    above. So if the requesting site can't unserialize the return value,
    it knows there is a problem, but this would still leave a user (which
    for now is me) confused as to the source of the problem.

  • Chung Leong

    #2
    Re: Trapping file_get_contents for bad urls

    In theory, you can create a stream context, then attach a callback
    function to it to receive notifications:

    $context = stream_context_create(array());
    stream_context_set_params($context, array('notification' =>
        'stream_callback'));
    $f = fopen("http://www.google.com/", "rb", false, $context);

    The parameters passed to the function are as follows:

    function stream_callback (
    $notifycode,
    $severity,
    $xmsg,
    $xcode,
    $bytes_sofar,
    $bytes_max)

    Notification codes are defined as such:

    1 = RESOLVE
    2 = CONNECT
    3 = AUTH_REQUIRED
    4 = MIME_TYPE_IS
    5 = FILE_SIZE_IS
    6 = REDIRECTED
    7 = PROGRESS
    8 = COMPLETED
    9 = FAILURE
    10 = AUTH_RESULT

    Severity codes:

    0 = INFO
    1 = WARNING
    2 = ERROR

    In practice, however, the PHP stream API is somewhat of a
    work-in-progress. Not everything listed above is actually
    implemented. As of PHP 4.3.6, the callback doesn't receive
    notifications for the resolve stage.
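
For what it's worth, a minimal sketch of the notification approach on a more recent PHP might look like the following. The helper names notifyLabel() and fetchWithEvents() are made up for illustration; the STREAM_NOTIFY_* constants are PHP's built-in names for the codes listed above. Which events actually arrive depends on the PHP version and the wrapper, so treat the event list as a hint rather than a complete trace.

```php
<?php
// Map a notification code to a readable label using PHP's built-in constants.
function notifyLabel(int $code): string {
    $labels = [
        STREAM_NOTIFY_RESOLVE       => 'RESOLVE',
        STREAM_NOTIFY_CONNECT       => 'CONNECT',
        STREAM_NOTIFY_AUTH_REQUIRED => 'AUTH_REQUIRED',
        STREAM_NOTIFY_MIME_TYPE_IS  => 'MIME_TYPE_IS',
        STREAM_NOTIFY_FILE_SIZE_IS  => 'FILE_SIZE_IS',
        STREAM_NOTIFY_REDIRECTED    => 'REDIRECTED',
        STREAM_NOTIFY_PROGRESS      => 'PROGRESS',
        STREAM_NOTIFY_COMPLETED     => 'COMPLETED',
        STREAM_NOTIFY_FAILURE       => 'FAILURE',
        STREAM_NOTIFY_AUTH_RESULT   => 'AUTH_RESULT',
    ];
    return $labels[$code] ?? "UNKNOWN($code)";
}

// Hypothetical helper: fetch $url and return [body or false, list of events seen].
function fetchWithEvents(string $url): array {
    $events = [];
    $context = stream_context_create();
    stream_context_set_params($context, [
        'notification' => function ($code, $severity, $message, $messageCode,
                                    $bytesSoFar, $bytesMax) use (&$events) {
            // Record label, severity (0 = INFO, 1 = WARNING, 2 = ERROR) and message.
            $events[] = [notifyLabel($code), $severity, $message, $messageCode];
        },
    ]);
    $body = @file_get_contents($url, false, $context);
    return [$body, $events];
}
```

After a call such as `[$body, $events] = fetchWithEvents($url);`, a FAILURE entry in $events (if one was delivered) carries the message and code that the suppressed warning would otherwise hide.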


    • Mara Guida

      #3
      Re: Trapping file_get_contents for bad urls

      Csaba Gabor wrote:
      > I've got a section of code that takes a user supplied $url and
      > essentially does:
      > function redirectRequest ($url) {
      > $resultRemote = @file_get_contents($url);
      > return $resultRemote; }
      <snip>


      Try cURL
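
To make the cURL suggestion a bit more concrete, here is a rough sketch, assuming the curl extension is loaded. The function names classifyCurlResult() and fetchWithCurl() and the label strings are invented for this example; the CURLE_* and CURLOPT_* constants are the extension's own. The failure categories are a coarse guess at the cases named in the question, not an exhaustive mapping.

```php
<?php
// Pure classification step: map a cURL error number and HTTP status to a coarse label.
function classifyCurlResult(int $errno, int $status): string {
    switch ($errno) {
        case CURLE_OK:
            break; // transport succeeded; look at the HTTP status below
        case CURLE_URL_MALFORMAT:
        case CURLE_UNSUPPORTED_PROTOCOL:
            return 'malformed_url';
        case CURLE_COULDNT_RESOLVE_HOST:
            return 'dns_failure';
        case CURLE_COULDNT_CONNECT:
            return 'connect_failure';
        case CURLE_OPERATION_TIMEOUTED:
            return 'timeout';
        default:
            return 'other_curl_error';
    }
    return $status >= 400 ? 'http_error' : 'ok';
}

// Hypothetical replacement for redirectRequest(): returns the body plus diagnostics.
function fetchWithCurl(string $url, int $timeout = 10): array {
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_TIMEOUT        => $timeout,
    ]);
    $body   = curl_exec($ch);            // false on transport failure
    $errno  = curl_errno($ch);
    $error  = curl_error($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return [
        'label'  => classifyCurlResult($errno, $status),
        'errno'  => $errno,
        'error'  => $error,
        'status' => $status,
        'body'   => $body === false ? null : $body,
    ];
}
```

The caller can then branch on `$result['label']` and show `$result['error']` or `$result['status']` to the user, which addresses the "source of the problem" concern in the original post.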



      • NC

        #4
        Re: Trapping file_get_contents for bad urls

        Csaba Gabor wrote:
        >
        > I've got a section of code that takes a user supplied $url and
        > essentially does:
        > function redirectRequest ($url) {
        > $resultRemote = @file_get_contents($url);
        > return $resultRemote; }
        >
        > Is there a reasonable way to trap for and differentiate between
        > the major (most common) types of failure?

        Not if you insist on using file_get_contents()...
        > For example, common scenarios in which I would not get back a desired
        > response include: malformed url, the domain can't be mapped (ie. the
        > DNS servers don't return an IP address), the domain is not reachable,
        > the directory or file is not found, standard HTTP error response
        > header.

        Consider an alternative function that does something like this:

        1. Parse the argument URL into parts using parse_url() to
        verify that it is not obviously malformed.
        2. Run the server name through gethostbyname() to verify
        that the host name in fact resolves to an IP address.
        3. Connect to the server using fsockopen() to verify that
        it is operational.
        4. Attempt to retrieve the file by emulating a GET request
        with fputs() and reading the response with fgets().
        5. Parse the response headers and, if applicable, the body.
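
The five steps above could be sketched roughly as follows. This is only an illustration under a few assumptions: plain http on port 80 unless the URL says otherwise, an HTTP/1.0 request so the body is not chunked, and made-up names (parseHttpResponse(), fetchStepwise(), and the stage labels). It uses fwrite(), of which fputs() is an alias.

```php
<?php
// Step 5 helper: split a raw response into [status code, body]; status 0 if no status line.
function parseHttpResponse(string $raw): array {
    [$head, $body] = array_pad(explode("\r\n\r\n", $raw, 2), 2, '');
    $status = preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#', $head, $m) ? (int) $m[1] : 0;
    return [$status, $body];
}

// Hypothetical stepwise fetch; 'stage' names the first step that failed, or 'ok'.
function fetchStepwise(string $url, int $timeout = 10): array {
    // 1. Parse the URL (this sketch only handles plain http).
    $parts = parse_url($url);
    if ($parts === false || empty($parts['host'])
        || strtolower($parts['scheme'] ?? '') !== 'http') {
        return ['stage' => 'malformed_url', 'detail' => $url, 'body' => null];
    }
    $host = $parts['host'];
    $port = $parts['port'] ?? 80;
    $path = ($parts['path'] ?? '/') . (isset($parts['query']) ? '?' . $parts['query'] : '');

    // 2. Resolve the name; gethostbyname() returns its input unchanged on failure.
    $ip = gethostbyname($host);
    if ($ip === $host && filter_var($host, FILTER_VALIDATE_IP) === false) {
        return ['stage' => 'dns_failure', 'detail' => $host, 'body' => null];
    }

    // 3. Connect.
    $sock = @fsockopen($ip, $port, $errno, $errstr, $timeout);
    if ($sock === false) {
        return ['stage' => 'connect_failure', 'detail' => "$errno $errstr", 'body' => null];
    }

    // 4. Send a GET request and read the response line by line.
    stream_set_timeout($sock, $timeout);
    fwrite($sock, "GET $path HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n");
    $raw = '';
    while (($line = fgets($sock)) !== false) {
        $raw .= $line;
    }
    fclose($sock);

    // 5. Parse the status line and body.
    [$status, $body] = parseHttpResponse($raw);
    if ($status === 0) {
        return ['stage' => 'bad_response', 'detail' => 'no HTTP status line', 'body' => null];
    }
    if ($status >= 400) {
        return ['stage' => 'http_error', 'detail' => (string) $status, 'body' => $body];
    }
    return ['stage' => 'ok', 'detail' => (string) $status, 'body' => $body];
}
```

Because each step returns as soon as it fails, the 'stage' value tells the user whether the problem was the URL, the DNS lookup, the connection, or the HTTP response itself.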

        Cheers,
        NC
