problem extracting count of visits

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • francdela
    New Member
    • Mar 2010
    • 1

    problem extracting count of visits

    hello, i have this structure

    Code:
    mysql> select id,refer,ip,date from logs;
    +-----+-------------------------------------------+----------------+---------------------+
    | id  | refer                                     | ip             | date                |
    +-----+-------------------------------------------+----------------+---------------------+
    | 354 | http://www.google.com                     | 125.122.211.11 | 2010-03-03 20:51:46 |
    | 355 | http://www.google.com                     | 12.123.125.121 | 2010-03-03 20:52:46 |
    | 356 | http://bing.com                           | 123.121.54.152 | 2010-03-03 20:53:46 |
    | 357 | http://bing.com                           | 125.122.211.11 | 2010-03-06 16:00:46 |
    | 358 | http://bing.com                           | 125.122.211.11 | 2010-03-06 16:20:46 |
    +-----+-------------------------------------------+----------------+---------------------+
    i want to extract count of visits (where difference between dates is more than 30 minutes for the same ip).

    this should work like this:

    125.122.211.11# 2010-03-03 20:51:46#visit = 1
    12.123.125.121# 2010-03-03 20:52:46#visit = 2
    123.121.54.152# 2010-03-03 20:53:46#visit = 3
    125.122.211.11# 2010-03-06 16:00:46#visit = 4
    125.122.211.11# 2010-03-06 16:20:46#visit = 4(diff less than 30 minutes)

    i try this but not working :(

    Code:
    SELECT date FROM logs 
    WHERE date >= now() - INTERVAL 30 DAY 
    GROUP BY ip HAVING TIMESTAMPDIFF(MINUTE, date, now()) < 30
    thanks
    Last edited by Atli; Mar 11 '10, 12:07 AM. Reason: Added [code] tags, and a couple of line-breaks to the query.
  • Atli
    Recognized Expert Expert
    • Nov 2006
    • 5062

    #2
    Hey.

    This is actually kind of tricky, because to do this you need to programmaticall y compare each row with previous rows. Which means you will need to be able to store data from a previous row to compare it with the current row. - That can be done using User Defined variables. They allow you to set a variable pretty much anywhere in a session and call them later in the same session.

    So what you need to do is iterate through all the IP's, and for each IP you need to run through all the dates, counting each date that was recorded 30 minutes after the previous date for that IP.

    The first step is to set up a query to iterate through all the IP's.
    [code=sql]SELECT
    ol.`ip`
    FROM `logs` AS ol
    GROUP BY ol.`ip`;[/code]
    And then we add a user-defined variable to hold the Unix time-stamp (seconds since January 1'st, 1970) for the first date recorded for the current IP.
    (Note that this all assumes the dates are in order, from oldest to newest!)
    [code=sql]SELECT
    ol.`ip`,
    @last_date := UNIX_TIMESTAMP( ol.`date`) AS `initial_stamp`
    FROM `logs` AS ol
    GROUP BY ol.`ip`;[/code]
    Note that the line I added there not only returns the value, but it also sets the @last_date to that value. The ":=" operator does that; it sets a variable, whereas a single "=" compares values.


    Now we need to go through all the dates for each IP. That will require the use of a Correlated Subquery. Those are subqueries that reference tables outside the current query. (See the manual for examples.) -- Here we use it to create a subquery that only runs through the rows of the table, where the IP matches the IP of the outer query.
    [code=sql]SELECT
    ol.`ip`,
    @last_date := UNIX_TIMESTAMP( ol.`date`) AS `initial_stamp` ,
    (
    SELECT
    COUNT(il.`ip`)
    FROM `logs` AS il
    WHERE il.`ip` = ol.`ip`
    ) AS 'visits'
    FROM `logs` AS ol
    GROUP BY ol.`ip`;[/code]

    And now, for the tricky part. To compare two values and perform different actions based on the results, we can use the IF function. - We compare the @last_date, who's initial value should always be the first date for that IP, to the date of the current entry for the IP. If it is more than 1800 seconds (30 minutes) away from the previous date, we have the IF() function return 1. If not, we have it return 0. - Then we use the SUM() function to add up the total result.
    [code=sql]SELECT
    ol.`ip`,
    @last_date := UNIX_TIMESTAMP( ol.`date`) AS `initial_stamp` ,
    (
    SELECT
    1 + SUM(
    IF(UNIX_TIMESTA MP(il.`date`) - @last_date > 1800,
    1, 0
    )
    )
    FROM `logs` AS il
    WHERE il.`ip` = ol.`ip`
    ) AS 'visits'
    FROM `logs` AS ol
    GROUP BY ol.`ip`;[/code]

    Now there is only one thing left to do. Because we need the @last_date variable to always hold the date for the last row we examined, we need to set it somewhere in the subquery; somewhere after the IF() has been completed. - The only place we can do that is in the return statements in the IF itself. -- We just need to manupulate it so that the expression that sets the variable returns 1 on TRUE or 0 on FALSE.
    [code=sql]SELECT
    ol.`ip`,
    @last_date := UNIX_TIMESTAMP( ol.`date`) AS `initial_stamp` ,
    (
    SELECT
    1 + SUM(
    IF(UNIX_TIMESTA MP(il.`date`) - @last_date > 1800,
    (@last_date := UNIX_TIMESTAMP( il.`date`)) > 0,
    (@last_date := UNIX_TIMESTAMP( il.`date`)) < 0
    )
    )
    FROM `logs` AS il
    WHERE il.`ip` = ol.`ip`
    ) AS 'visits'
    FROM `logs` AS ol
    GROUP BY ol.`ip`;[/code]

    And there you have it. According to my tests on a very limited data-set, this works fine. You may want to do some more testing on a larger set if you plan on using this somewhere.

    Comment

    Working...