user identification

  • d.schulz81@gmx.net

    user identification

    Hi all,

    We have about 10 different domains that are linked very closely and we
    want to identify and track every single user that surfs our websites.
    Later we want to analyse user paths and find out the search robots with
    the referring search words.

    What are the possibilities?
    Cookies are not accepted by 40% of our users, and in addition a
    different cookie is created for each domain, which makes it really
    complicated.
    I guess a combination of browser type, operating system, hostname etc.
    is really insecure as there are many users using the same stuff.
    I think the only secure way to log this is by using sessions.

    One disadvantage of sessions is that they cost a lot of server
    performance when there are many users at the same time.
    Is there a way to reduce performance of the sessions?
    Are there any other possibilities except sessions?
    Are there freeware PHP statistic functions that allow the reuse of
    their statistic data in our own implementations?

    Thank you very much for your help!

    Dennis

  • Gordon Burditt

    #2
    Re: user identification

    >We have about 10 different domains that are linked very closely and we
    >want to identify and track every single user that surfs our websites.
    >Later we want to analyse user paths and find out the search robots with
    >the referring search words.
    >
    >What are the possibilities?

    Require the users to log in with an ID and password after registration.
    That may not be an option. You also probably need sessions to keep
    track of the fact that they HAVE logged in.

    When the user first hits one of your pages, the URL passed from
    many search engines gives you the keywords. This you can get from
    Apache logs, or with PHP. Figuring out where the user goes after
    that is harder, especially across domains.
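
As a rough illustration of the keyword point above, here is a minimal PHP sketch. The function name `search_keywords` and the list of parameter names (`q`, `query`, `p`) are assumptions made for the example; each search engine picks its own parameter name, so the list would need adjusting for the engines that actually send you traffic.

```php
<?php
// Hypothetical helper: pull the search phrase out of a referring URL.
// Assumes the engine passes the phrase in a query parameter such as "q";
// the parameter names checked here are illustrative, not exhaustive.
function search_keywords(string $referer, array $params = ['q', 'query', 'p']): ?string
{
    $query = parse_url($referer, PHP_URL_QUERY);
    if (!is_string($query) || $query === '') {
        return null;            // no query string, so no keywords
    }
    parse_str($query, $fields); // decodes "+" and %xx escapes
    foreach ($params as $name) {
        if (isset($fields[$name]) && is_string($fields[$name]) && $fields[$name] !== '') {
            return $fields[$name];
        }
    }
    return null;
}

// Typical use at the top of a page:
// $keywords = search_keywords($_SERVER['HTTP_REFERER'] ?? '');
```

The referrer is sent by the browser and can be missing or forged, so the result is only good for rough statistics.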

    >Cookies are not accepted by 40 % of our users and in addition to that
    >for each domain a different cookie is created, which makes it really
    >complicated.

    >I guess a combination of Browser type, Operating System, Hostname etc
    >is really insecure as there are many users using the same stuff.

    I hope 'secure' is *NOT* what you are looking for, and what you are
    looking for is more like 'accurate Big Brother watching'. If
    'security' is any kind of issue (say, you are a bank or doctor or
    auction site), you shouldn't even be thinking about fingerprinting
    browsers as a way of identifying users.

    Browser fingerprinting is inaccurate for a number of reasons:
    (a) Load-sharing proxies can make requests, even for items on the SAME
    page, come from several different IP addresses (and I believe
    most AOL users go through one).
    (b) If you don't use IP address, there's not enough info to distinguish
    the users (what percentage are using IE(latest) with Windows XP?
    Over half?)
    (c) NAT gateways make lots of really different users appear to come
    from the SAME IP. See (b) for why you can't tell them apart.

    It might work OK for marketing stats but not for anything requiring
    'security'.
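
To make points (a) and (c) concrete, here is a small sketch. The `fingerprint` function is hypothetical (a plain hash of IP address and User-Agent), and the addresses are documentation-range examples, not real data.

```php
<?php
// Hypothetical fingerprint: a hash of the client's IP address and User-Agent.
function fingerprint(string $ip, string $userAgent): string
{
    return sha1($ip . '|' . $userAgent);
}

$ua = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';

// Two different people behind one NAT gateway, same browser: identical fingerprints.
$alice = fingerprint('203.0.113.7', $ua);
$bob   = fingerprint('203.0.113.7', $ua);
var_dump($alice === $bob);      // bool(true)

// One person whose requests leave through two proxy addresses: two fingerprints.
$first  = fingerprint('198.51.100.10', $ua);
$second = fingerprint('198.51.100.11', $ua);
var_dump($first === $second);   // bool(false)
```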

    >I think the only secure way to log this is by the way of using
    >sessions.

    Well, you've got a problem. To keep track of a session, you need at
    least one of (a) cookies, (b) session IDs passed via URL transparently
    using trans_sid, (c) session IDs passed explicitly via URL manually,
    or (d) hidden form variables on the page.

    (a) doesn't work across domains and besides, a lot of your users have
    them turned off.
    (b) only works with relative URLs (which have to be the same domain).
    (c) is a pain in the butt and may offend users for the same reason
    they have cookies off,
    and (d) works only if every link is a form, and is also a pain in the butt.
    I presume that if cookies are often turned off, Javascript is also often
    turned off.

    You're essentially stuck with (c), with the others sometimes working
    as a backup.
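
A minimal sketch of option (c), assuming a hypothetical helper named `link_with_sid` and made-up example URLs. It only builds the link; the receiving domain still has to accept the ID and read the same session storage.

```php
<?php
// Hypothetical helper: append the session ID to a link so it survives a hop
// to another of your domains. $name and $id would normally come from
// session_name() and session_id() after session_start().
function link_with_sid(string $url, string $name, string $id): string
{
    $fragment = '';
    $hashPos = strpos($url, '#');
    if ($hashPos !== false) {           // keep any #fragment at the very end
        $fragment = substr($url, $hashPos);
        $url = substr($url, 0, $hashPos);
    }
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $separator . rawurlencode($name) . '=' . rawurlencode($id) . $fragment;
}

// In a page:
// session_start();
// $href = link_with_sid('http://shop.example.org/cart.php', session_name(), session_id());
// echo '<a href="' . htmlspecialchars($href) . '">Cart</a>';
```

On the receiving page you would probably call `session_id()` with the value taken from the URL before `session_start()`, since PHP ignores URL session IDs when `session.use_only_cookies` is on. Bear in mind that IDs in URLs leak through referrers and bookmarks.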

    >One disadvantage of sessions is that they take very much performance of
    >the server when there are many users at the same time.

    It is possible to write session handlers that stuff the session info
    into a MySQL database instead of using lots of little files in a
    directory. Whether this is a performance increase or decrease
    depends on your setup and on things like how much data gets stuffed
    into sessions. Among other things, this gets you the ability to
    share session data between different (load-shared) physical servers.
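
One way such a handler might look, as a sketch only. It assumes PHP 8.0 or later, a PDO connection, and a hypothetical `sessions` table; the class name `DbSessionHandler` and the column names are invented for the example.

```php
<?php
// A sketch of a database-backed session handler (PHP 8.0 or later).
// Assumed table, not from the thread:
//   CREATE TABLE sessions (id VARCHAR(128) PRIMARY KEY, data TEXT, updated_at INT);
class DbSessionHandler implements SessionHandlerInterface
{
    private PDO $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    public function open(string $path, string $name): bool
    {
        return true;    // the PDO connection is already open
    }

    public function close(): bool
    {
        return true;
    }

    public function read(string $id): string|false
    {
        $stmt = $this->pdo->prepare('SELECT data FROM sessions WHERE id = ?');
        $stmt->execute([$id]);
        $data = $stmt->fetchColumn();
        return ($data === false) ? '' : (string) $data;   // unknown ID: empty session
    }

    public function write(string $id, string $data): bool
    {
        // REPLACE inserts the row or overwrites an existing one with the same id
        $stmt = $this->pdo->prepare('REPLACE INTO sessions (id, data, updated_at) VALUES (?, ?, ?)');
        return $stmt->execute([$id, $data, time()]);
    }

    public function destroy(string $id): bool
    {
        $stmt = $this->pdo->prepare('DELETE FROM sessions WHERE id = ?');
        return $stmt->execute([$id]);
    }

    public function gc(int $max_lifetime): int|false
    {
        $stmt = $this->pdo->prepare('DELETE FROM sessions WHERE updated_at < ?');
        $stmt->execute([time() - $max_lifetime]);
        return $stmt->rowCount();   // number of expired sessions removed
    }
}

// Usage, with placeholder connection details:
// $pdo = new PDO('mysql:host=localhost;dbname=yourdb', 'user', 'password');
// session_set_save_handler(new DbSessionHandler($pdo), true);
// session_start();
```

Pointing every web server at the same database is what would give the shared session data mentioned above.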

    >Is there a way to reduce performance of the sessions?

    You could always add time-wasting code such as checking whether a
    user is logged in several thousand times on each page, but I don't
    think this is what you meant to ask.

    >Are there any other possibilities except sessions?

    There are some do-it-yourself methods which essentially re-implement
    sessions, often poorly, using the methods (a) through (d) to
    keep track of a session ID. Once you can track the session ID,
    you can stuff the session data anywhere you need to (files, database,
    whatever).

    >Are there freeware php statistic functions that allow the reuse of
    >their statistic data in our own implementations ?

    Probably, but I'm not sure how calculating mean and standard deviation
    of something is going to get you the raw data in the first place.

    Gordon L. Burditt


    • BearItAll

      #3
      Re: user identification

      On Mon, 30 May 2005 13:40:19 -0700, d.schulz81 wrote:

      > Hi all,
      >
      > We have about 10 different domains that are linked very closely and we
      > want to identify and track every single user that surfs our websites.
      > Later we want to analyse user paths and find out the search robots with
      > the referring search words.
      >
      > What are the possibilities?
      > Cookies are not accepted by 40 % of our users and in addition to that for
      > each domain a different cookie is created, which makes it really
      > complicated.
      > I guess a combination of Browser type, Operating System, Hostname etc is
      > really insecure as there are many users using the same stuff. I think the
      > only secure way to log this is by the way of using sessions.
      >
      > One disadvantage of sessions is that they take very much performance of
      > the server when there are many users at the same time. Is there a way to
      > reduce performance of the sessions? Are there any other possibilities
      > except sessions? Are there freeware php statistic functions that allow the
      > reuse of their statistic data in our own implementations ?
      >
      > Thank you very much for your help!
      >
      > Dennis

      This one is based on a MySQL table; it records standard information from
      $_SERVER[]. You might want to resolve the IPs on entry to the table.
      Personally I prefer to look up only those that haven't been looked up
      before, which I do locally, so I haven't included that here. Also, I moved
      the database connect inside the function, for the sake of those new to
      this stuff who want to see how it fits together.
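
The "only look up IPs not seen before" idea could be sketched like this. The helper name `resolve_cached` and the array cache are assumptions for illustration; this is not the poster's local code, and the cache could just as well live in a table.

```php
<?php
// Hypothetical helper: reverse-resolve an IP once and remember the answer.
// $cache is any array kept between lookups; $resolver defaults to PHP's
// gethostbyaddr() and can be swapped out, e.g. for testing.
function resolve_cached(string $ip, array &$cache, ?callable $resolver = null): string
{
    if (!isset($cache[$ip])) {
        $resolver = $resolver ?? 'gethostbyaddr';
        $host = $resolver($ip);
        // gethostbyaddr() returns the IP unchanged on failure and false on bad input
        $cache[$ip] = is_string($host) ? $host : $ip;
    }
    return $cache[$ip];
}

$cache = [];
// echo resolve_cached($_SERVER['REMOTE_ADDR'], $cache);
```

Reverse DNS lookups can be slow, which is one reason to do them once per address rather than on every hit.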

      Finally, don't criticise my crappy HTML bits. I have a real battle with
      HTML. Don't bother asking me why, because I don't know the answer. If only
      we could do PHP without having to bother at all with the HTML side, I'd be
      alright.


      The first function is the one to include in each of your pages (it records
      the page hits as well as site hits).

      <?php
      // Visitor hits: include this function and call VisitorHit() on each page.
      function VisitorHit()
      {
          // constants (placeholders: fill in your own details)
          define('DB_USER', 'your mysql user name');
          define('DB_PASSWORD', 'your password');
          define('DB_HOST', 'localhost');
          define('DB_NAME', 'database name');

          // connect to the server and open the database (mysqli extension)
          mysqli_report(MYSQLI_REPORT_OFF);   // report failures via return values
          $dbh = mysqli_connect(DB_HOST, DB_USER, DB_PASSWORD, DB_NAME);
          if (!$dbh) {
              die("I'm terribly sorry but I cannot connect to the database because: "
                  . mysqli_connect_error());
          }

          // collect the request details; anything missing is stored as "none"
          $keys = array('PHP_SELF', 'HTTP_USER_AGENT', 'REMOTE_ADDR', 'REQUEST_METHOD',
                        'QUERY_STRING', 'REQUEST_URI', 'HTTP_REFERER');
          $values = array();
          foreach ($keys as $key) {
              $value = isset($_SERVER[$key]) ? (string) $_SERVER[$key] : '';
              $values[] = (strlen($value) < 1) ? 'none' : $value;
          }

          // build SQL string; placeholders keep client-supplied text (user agent,
          // referer, query string) from being interpreted as SQL.
          // NULL for visitorid assumes an AUTO_INCREMENT column.
          $sql = "INSERT INTO `visitors` (`visitorid`, `self`, `browser`, `ip`, "
               . "`requestmethod`, `querystring`, `requesturl`, `referer`, `touched`) "
               . "VALUES (NULL, ?, ?, ?, ?, ?, ?, ?, NOW())";

          // perform query
          // no checking because we aren't that bothered about missing the odd hit
          $stmt = mysqli_prepare($dbh, $sql);
          if ($stmt) {
              mysqli_stmt_bind_param($stmt, 'sssssss', ...$values);
              mysqli_stmt_execute($stmt);
              mysqli_stmt_close($stmt);
          }
          mysqli_close($dbh);
      }

      ?>

      This next part is really an example of how a 'stats.php' file might be
      used to examine the data.


      <?php
      $page_title = 'Stats';
      include 'header.inc';

      define('DB_USER', 'your mysql username');
      define('DB_PASSWORD', 'your password');
      define('DB_HOST', 'localhost');
      define('DB_NAME', 'your database name');

      // connect to the server and open the database (mysqli extension)
      mysqli_report(MYSQLI_REPORT_OFF);   // report failures via return values
      $dbh = mysqli_connect(DB_HOST, DB_USER, DB_PASSWORD, DB_NAME);
      if (!$dbh) {
          die('I cannot connect to the database because: ' . mysqli_connect_error());
      }

      // run a query that yields a single value; gives 'n/a' if it fails
      function single_value($dbh, $sql)
      {
          $result = mysqli_query($dbh, $sql);
          $row = $result ? mysqli_fetch_row($result) : null;
          return $row ? $row[0] : 'n/a';
      }

      // select minimum and maximum date records
      $mindate = single_value($dbh, "SELECT DATE_FORMAT(MIN(touched), '%D %b %Y') FROM visitors");
      $maxdate = single_value($dbh, "SELECT DATE_FORMAT(MAX(touched), '%D %b %Y') FROM visitors");

      // DISTINCT visitors (counted by IP address)
      $distinct = single_value($dbh, "SELECT COUNT(DISTINCT ip) FROM visitors");

      // DISTINCT browsers
      $browser = single_value($dbh, "SELECT COUNT(DISTINCT browser) FROM visitors");

      ?>

      <div>
      <h2><a name="Stats">Statistics</a></h2>
      <?php
      echo "<p>Stats between $mindate and $maxdate</p>";
      echo "<p>Number of visitors $distinct</p>";
      echo "<p>Number of browsers $browser</p>";

      // hits per browser, most frequent first
      $sql = "SELECT browser, COUNT(*) AS icount FROM visitors GROUP BY browser ORDER BY icount DESC";
      $result = mysqli_query($dbh, $sql);

      echo "<p>";
      if ($result) {
          while ($row = mysqli_fetch_row($result)) {
              // the user agent is client-supplied, so escape it before output
              printf("%s --- %s<br>", $row[1], htmlspecialchars($row[0]));
          }
      }
      echo "</p>";

      ?>

      </div>

      <?php
      include 'footer.inc';
      ?>

