Finding overlapping time intervals

**Motoma** · Oct 2 '07, 12:22 AM

Could you explain what the logic is behind how you came to your result set? I am afraid it is not as apparent to us as it is to you.

**corsin** · Oct 2 '07, 08:00 AM

Hi Motoma! Thanks for your reply!

The logic is as follows. I have several IP network flows in a MySQL database. These IP flows are basically represented as I wrote in my post (i.e., Src, Dst, Start_Time , End_Time, and amount of bytes). What I want to do is to merge those IP flows with the same source (Src) and Destination (Dst) that in some point in time overlapped.

Following the example that I gave to you we have...

Src Dst Start_time End_time Bytes
A B 10 16 40
A B 8 13 20
A B 1 5 100
A C 9 15 40
A B 7 12 60

... it's possible to see that there are 3 IP flows A B that overlapped in time: 1st row (with Start_time 10 and End_time 16), 2nd row (with Start_time 8 and End_time 13) and 5th row (with Start_time 7 and End_time 12). The IP flow A B in the 3rd row didn't overlap with any other IP flow A B so it does not have to be merged. It's interesting to notice that the IP flow A C (4th row) overlaps, but IP flow A C is different of IP flow A B (i.e., they have different destinations).

Therefore, the result that I would like to get is something like this:

Src Dst Start_time End_time Bytes
A B 7 16 120 *
A B 1 5 100
A C 9 15 40

* Notice here that the 'new' Start_time should be the earliest start_time (e.g., among the overlapped flows with Start_times 7, 8, and 10, 7 is the earliest) and the new End_time should be the latest end_time (among the overlapped flows with End_times 12, 13 and 16, 16 is the latest).

I hope I've made myself clearer now. If not, please let me know.

Regards,

**Motoma** · Oct 2 '07, 01:25 PM

Originally posted by corsin

Hi Motoma! Thanks for your reply!

The logic is as follows. I have several IP network flows in a MySQL database. These IP flows are basically represented as I wrote in my post (i.e., Src, Dst, Start_Time , End_Time, and amount of bytes). What I want to do is to merge those IP flows with the same source (Src) and Destination (Dst) that in some point in time overlapped.

Following the example that I gave to you we have...

Src Dst Start_time End_time Bytes
A B 10 16 40
A B 8 13 20
A B 1 5 100
A C 9 15 40
A B 7 12 60

... it's possible to see that there are 3 IP flows A B that overlapped in time: 1st row (with Start_time 10 and End_time 16), 2nd row (with Start_time 8 and End_time 13) and 5th row (with Start_time 7 and End_time 12). The IP flow A B in the 3rd row didn't overlap with any other IP flow A B so it does not have to be merged. It's interesting to notice that the IP flow A C (4th row) overlaps, but IP flow A C is different of IP flow A B (i.e., they have different destinations).

Therefore, the result that I would like to get is something like this:

Src Dst Start_time End_time Bytes
A B 7 16 120 *
A B 1 5 100
A C 9 15 40

* Notice here that the 'new' Start_time should be the earliest start_time (e.g., among the overlapped flows with Start_times 7, 8, and 10, 7 is the earliest) and the new End_time should be the latest end_time (among the overlapped flows with End_times 12, 13 and 16, 16 is the latest).

I hope I've made myself clearer now. If not, please let me know.

Regards,

Sounds good, but describe how this data would be represented:
Src Dst Start_time End_time Bytes
A B 1 5 10
A B 4 9 10
A B 8 13 10
A C 1 5 10
A C 5 7 10

**corsin** · Oct 2 '07, 03:49 PM

Originally posted by Motoma

Sounds good, but describe how this data would be represented:
Src Dst Start_time End_time Bytes
A B 1 5 10
A B 4 9 10
A B 8 13 10
A C 1 5 10
A C 5 7 10

Hi!

Src, Dst = varchar(16)
Start_Time, End_Time, and Bytes = bigint(20)

If this is not the answer you're looking for, my apologizes since databases are not my field of expertise :)

**Motoma** · Oct 2 '07, 04:11 PM

Originally posted by corsin

Hi!

Src, Dst = varchar(16)
Start_Time, End_Time, and Bytes = bigint(20)

If this is not the answer you're looking for, my apologizes since databases are not my field of expertise :)

Sorry, let me rephrase: Could you take a look at the dataset I gave, and tell me what the end result would be? That is to say, what do you want the SQL to return when the data is set up as I presented it earlier?

**corsin** · Oct 2 '07, 04:23 PM

Originally posted by Motoma

Sorry, let me rephrase: Could you take a look at the dataset I gave, and tell me what the end result would be? That is to say, what do you want the SQL to return when the data is set up as I presented it earlier?

Ok... I thought you meant the data type. Here we go then

Src Dst Start_time End_time Bytes
A B 1 5 10
A B 4 9 10
A B 8 13 10
A C 1 5 10
A C 5 7 10

The expected result should be:
Src Dst Start_time End_time Bytes
A B 1 13 30
A C 1 7 20

Regards,

**Motoma** · Oct 2 '07, 04:40 PM

Originally posted by corsin

Ok... I thought you meant the data type. Here we go then

Src Dst Start_time End_time Bytes
A B 1 5 10
A B 4 9 10
A B 8 13 10
A C 1 5 10
A C 5 7 10

The expected result should be:
Src Dst Start_time End_time Bytes
A B 1 13 30
A C 1 7 20

Regards,

As I expected.
There is no good way to do this with SQL; you can do it with cursors, but my expectation is that it will be much slower than performing the same calculations with code.

**corsin** · Oct 3 '07, 05:02 PM

Originally posted by Motoma

As I expected.
There is no good way to do this with SQL; you can do it with cursors, but my expectation is that it will be much slower than performing the same calculations with code.

Hello Motoma!

Could you please show me an example of how cursors would do the job? I tried with code and it worked quite well with a small set of rows, but with a table of +or- 5000000 rows my program dies.

I've the SQL query below without recursivity, which results that I can't get all consecutive overlapping intervals:

with ver1 as (
select a.id AS origem,
b.id AS pertence
from VER a,
VER b
where
a.ipv4_src = b.ipv4_src and
a.ipv4_dst = b.ipv4_dst and
a.port_src = b.port_src and
a.port_dst = b.port_dst and
(a.start_time between b.start_time and b.end_time or
a.end_time between b.start_time and b.end_time)
),
v12 as (
select
distinct a.origem AS origem,
a.pertence AS pertence
from
ver1 a
union
select
distinct a.origem AS origem,
b.pertence AS pertence
from
ver1 a,
ver1 b
where
a.pertence = b.origem
),
v13 as (
select
distinct a.origem AS origem,
a.pertence AS pertence
from
v12 a
union
select
distinct a.origem AS origem,
b.pertence AS pertence
from
v12 a,
v12 b
where
a.pertence = b.origem
),
ver2 as (
select
distinct a.origem AS origem,
b.ipv4_src, b.ipv4_dst,
b.port_src, b.port_dst,
min(b.start_tim e) as start_time,
max(b.end_time) as end_time,
sum(b.nb_bytes) AS nb_bytes
from v13 a,
VER b
where
a.pertence = b.id and
not exists
(select 1
from ver1 c
where
c.pertence = a.origem and
c.origem a.origem)
group by
origem, b.ipv4_src, b.ipv4_dst, b.port_src, b.port_dst
)
select from ver2;

**Motoma** · Oct 3 '07, 05:32 PM

Okay, here is the way I would do it, in descriptive pseudo code:

Code:

Create CURSOR list of unique paths (treat source-dest as PK)
For each element in PathCursor
{
    Set totalBias = 0;
    Set totalTransfered = 0;
    Construct CURSOR which a UNION of two SELECTs, merge StartTime and EndTime into one column labeled Time, along with a colunm labeled Bias which is 1 for StartTimes, and -1 for EndTimes, and if it is an end time, include the BytesTransfered, ORDER the result set BY Time;
    Set PathStartTime = 0;
    For each element in TimeCursor
    {
        if PathStartTime = 0
        {
            PathStartTime = StartTime
        }
        Add Bias to TotalBias;
        Add BytesTransfered to totalTransfered;
        If TotalBias = 0
        {
            One row in your result set will be made of Path, PathStartTime, EndTime, totalBytes;
            Set PathStartTime, totalByte = 0;
        }
    }
}

**flgllm1** · Oct 18 '07, 03:49 AM

How about this:

SELECT b.id, 'Overlaps", a.id
from
(SELECT * from TBL) a,
(SELECT * from TBL) b,
where
a.id > b.id and
b.start >= a.start
and b.start <= a.end
order by b.id

**Motoma** · Oct 18 '07, 12:58 PM

Originally posted by flgllm1

How about this:

SELECT b.id, 'Overlaps", a.id
from
(SELECT * from TBL) a,
(SELECT * from TBL) b,
where
a.id > b.id and
b.start >= a.start
and b.start <= a.end
order by b.id

Hi flgllm1! Welcome to The Scripts.
Looking at the original problem, I don't believe that a single JOIN will be able to perform the necessary operations: take for example this case.

Code:

start     end     tx
0         15      10
10        25      10
20        35      10
30        45      10

This would be combined into one connection, with a start time of 0, an end time of 45, and a bytes transfered value of 40.

Finding overlapping time intervals

Finding overlapping time intervals

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment