detection of a robot in php

  • giminik@gmail.com

    detection of a robot in php

    Hello everybody :)

    A friend recently showed me something odd while he was playing with the
    wget command under Linux (I don't know why), and the result surprised me:
    $ wget http://www.prizee.com/parole.php
    --02:35:29-- http://www.prizee.com/parole.php
    => `parole.php'
    Resolving www.prizee.com... 213.186.63.5
    Connecting to www.prizee.com|213.186.63.5|:80... connected.
    HTTP request sent, awaiting response... 302 Found
    Location: /index.php?joueur=1 [following]
    --02:35:30-- http://www.prizee.com/index.php?joueur=1
    => `index.php?joueur=1.1'
    Connecting to www.prizee.com|213.186.63.5|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: unspecified [text/html]

    [ <=> ] 12,521 --.--K/s

    02:35:30 (103.57 KB/s) - `index.php?joueur=1.1' saved [12521]


    So he gets an HTTP 302 response (a redirect, not an error), which sends
    him to the index page of the site.
    With a browser like Firefox, IE, or Safari, we get the right page without
    any redirection.
    After that, I ran some tests. I changed wget's user agent string to
    identify it as Mozilla, but got the same result (redirection). I also
    tried links (a command-line browser) and curl, with the same problem.
    Here is the output of the curl command:

    $ curl -v http://www.prizee.com/parole.php
    * About to connect() to www.prizee.com port 80
    * Trying 213.186.63.5... connected
    * Connected to www.prizee.com (213.186.63.5) port 80
    > GET /parole.php HTTP/1.1
    > User-Agent: curl/7.15.1 (i686-pc-linux-gnu) libcurl/7.15.1 GnuTLS/1.2.10 zlib/1.2.3 libidn/0.5.15
    > Host: www.prizee.com
    > Accept: */*
    >
    < HTTP/1.1 302 Found
    < Date: Wed, 09 Aug 2006 00:02:57 GMT
    < Server: Apache/1.3.33 (Unix) PHP/4.3.10
    < X-Powered-By: PHP/4.3.10
    < X-Accelerated-By: PHPA/1.3.3r2
    < Expires: Mon, 26 Jul 1997 05:00:00 GMT
    < Last-Modified: Wed, 09 Aug 2006 00:02:59 GMT
    < Cache-Control: no-cache, must-revalidate
    < Pragma: no-cache
    < Set-Cookie: COOKIEis_accepted=1; path=/; domain=.prizee.com
    < Location: /index.php?joueur=1
    < Connection: close
    < Transfer-Encoding: chunked
    < Content-Type: text/html
    * Closing connection #0


    So my question is: how can a web site detect the use of a command-line
    tool, as the site above seems to do? Thank you for your answers.

    Sorry for my bad English, I'm French ;)

  • Chris Hope

    #2
    Re: detection of a robot in php

    giminik@gmail.com wrote:
    > So, my question is: how can we detect the use of a command-line tool
    > on a web site, like the site above? Thank you for your answers.

    I tried both Firefox and Konqueror and they both redirected me to the
    second page, so there doesn't appear to be anything different between
    using wget and using a graphical browser, at least to me.

    You can't detect the use of a command line tool if they set the user
    agent correctly. For example:

    wget --user-agent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

    followed by the url will tell the website you're using IE on Windows XP.
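
The point about the user agent can be illustrated with a minimal Python sketch. It only builds a request (nothing is sent) to show that the User-Agent header is whatever string the client chooses to put there. The helper name `build_request` and the `www.example.com` URL are placeholders made up for this illustration; the user agent string is the one quoted above.

```python
import urllib.request

# The user agent string quoted in the post above (IE 6 on Windows XP).
SPOOFED_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"


def build_request(url: str, user_agent: str = SPOOFED_UA) -> urllib.request.Request:
    """Build (but do not send) a GET request with a client-chosen User-Agent.

    build_request is a hypothetical helper for this sketch only.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder URL; no network traffic happens here.
req = build_request("http://www.example.com/")

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Because the server sees only the header text, a check based on it cannot tell this request apart from one sent by the browser it claims to be.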

    --
    Chris Hope | www.electrictoolbox.com | www.linuxcdmall.com


    • flamer die.spam@hotmail.com

      #3
      Re: detection of a robot in php


      giminik@gmail.com wrote:
      > So, my question is: how can we detect the use of a command-line tool
      > on a web site, like the site above? Thank you for your answers.
      >
      > Sorry for my bad English, I'm French ;)

      It probably just redirects Linux users, and not MS Windows users, by
      checking the user agent.

      Flamer.


      • giminik@gmail.com

        #4
        Re: detection of a robot in php

        Thanks for your answers.
        I found the problem: it was a session cookie issue.
        I just used the wget option --keep-session-cookies together with
        --load-cookies to solve it.
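
To make the session cookie explanation concrete, here is a toy Python model of the behaviour. It is a sketch under assumptions, not the real site's code: the actual server logic is unknown, and the cookie name `session`, its value, and the functions `handle_request` and `fetch` are all invented for illustration. The server redirects any client that arrives without the cookie, and the client either discards cookies or keeps them in a jar between requests.

```python
from typing import Dict, List, Optional, Tuple

SESSION_COOKIE = "session"  # hypothetical cookie name, not the real site's


def handle_request(path: str, cookies: Dict[str, str]) -> Tuple[int, Dict[str, str]]:
    """Toy server: clients without the cookie are redirected to the index."""
    if path != "/index.php" and SESSION_COOKIE not in cookies:
        headers = {"Location": "/index.php",
                   "Set-Cookie": f"{SESSION_COOKIE}=abc123"}
        return 302, headers
    return 200, {}


def fetch(path: str, jar: Optional[Dict[str, str]] = None) -> List[Tuple[int, str]]:
    """Toy client: follow redirects and return the (status, path) hops.

    With jar=None every cookie is discarded; with a dict, cookies are
    stored in it and sent back on later requests.
    """
    hops: List[Tuple[int, str]] = []
    for _attempt in range(5):  # cap the number of redirects followed
        status, headers = handle_request(path, jar or {})
        hops.append((status, path))
        if jar is not None and "Set-Cookie" in headers:
            name, _sep, value = headers["Set-Cookie"].partition("=")
            jar[name] = value
        if status != 302:
            break
        path = headers["Location"]
    return hops


# Without a cookie jar: redirected to the index, as in the traces above.
print(fetch("/parole.php"))

# With a jar: the first visit stores the cookie, the second is served directly.
jar: Dict[str, str] = {}
fetch("/parole.php", jar)
print(fetch("/parole.php", jar))
```

In this model the user agent plays no role at all: the only difference between the redirected client and the one served directly is whether the cookie from an earlier response is sent back.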
