Finding overlapping time intervals

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • corsin
    New Member
    • Oct 2007
    • 5

    Finding overlapping time intervals

    Hello everybody!
    I've been trying to find a way to create a MySQL query to find overlapping intervals, but no success so far. I've the following table as an example:

    Src Dst Start_time End_time Bytes
    A B 10 16 40
    A B 8 13 20
    A B 1 5 100
    A C 9 15 40
    A B 7 12 60

    I would like to group by 'Src' and 'Dst' and have a result that looks like the following:

    Src Dst Bytes
    A B 120
    A B 100
    A C 40

    Does anybody has any idea about how to create a query to achieve that? Any help will be appreciated!

    Regards,
  • Motoma
    Recognized Expert Specialist
    • Jan 2007
    • 3236

    #2
    Could you explain what the logic is behind how you came to your result set? I am afraid it is not as apparent to us as it is to you.

    Comment

    • corsin
      New Member
      • Oct 2007
      • 5

      #3
      Hi Motoma! Thanks for your reply!

      The logic is as follows. I have several IP network flows in a MySQL database. These IP flows are basically represented as I wrote in my post (i.e., Src, Dst, Start_Time , End_Time, and amount of bytes). What I want to do is to merge those IP flows with the same source (Src) and Destination (Dst) that in some point in time overlapped.

      Following the example that I gave to you we have...

      Src Dst Start_time End_time Bytes
      A B 10 16 40
      A B 8 13 20
      A B 1 5 100
      A C 9 15 40
      A B 7 12 60

      ... it's possible to see that there are 3 IP flows A B that overlapped in time: 1st row (with Start_time 10 and End_time 16), 2nd row (with Start_time 8 and End_time 13) and 5th row (with Start_time 7 and End_time 12). The IP flow A B in the 3rd row didn't overlap with any other IP flow A B so it does not have to be merged. It's interesting to notice that the IP flow A C (4th row) overlaps, but IP flow A C is different of IP flow A B (i.e., they have different destinations).

      Therefore, the result that I would like to get is something like this:

      Src Dst Start_time End_time Bytes
      A B 7 16 120 *
      A B 1 5 100
      A C 9 15 40

      * Notice here that the 'new' Start_time should be the earliest start_time (e.g., among the overlapped flows with Start_times 7, 8, and 10, 7 is the earliest) and the new End_time should be the latest end_time (among the overlapped flows with End_times 12, 13 and 16, 16 is the latest).

      I hope I've made myself clearer now. If not, please let me know.

      Regards,

      Comment

      • Motoma
        Recognized Expert Specialist
        • Jan 2007
        • 3236

        #4
        Originally posted by corsin
        Hi Motoma! Thanks for your reply!

        The logic is as follows. I have several IP network flows in a MySQL database. These IP flows are basically represented as I wrote in my post (i.e., Src, Dst, Start_Time , End_Time, and amount of bytes). What I want to do is to merge those IP flows with the same source (Src) and Destination (Dst) that in some point in time overlapped.

        Following the example that I gave to you we have...

        Src Dst Start_time End_time Bytes
        A B 10 16 40
        A B 8 13 20
        A B 1 5 100
        A C 9 15 40
        A B 7 12 60

        ... it's possible to see that there are 3 IP flows A B that overlapped in time: 1st row (with Start_time 10 and End_time 16), 2nd row (with Start_time 8 and End_time 13) and 5th row (with Start_time 7 and End_time 12). The IP flow A B in the 3rd row didn't overlap with any other IP flow A B so it does not have to be merged. It's interesting to notice that the IP flow A C (4th row) overlaps, but IP flow A C is different of IP flow A B (i.e., they have different destinations).

        Therefore, the result that I would like to get is something like this:

        Src Dst Start_time End_time Bytes
        A B 7 16 120 *
        A B 1 5 100
        A C 9 15 40

        * Notice here that the 'new' Start_time should be the earliest start_time (e.g., among the overlapped flows with Start_times 7, 8, and 10, 7 is the earliest) and the new End_time should be the latest end_time (among the overlapped flows with End_times 12, 13 and 16, 16 is the latest).

        I hope I've made myself clearer now. If not, please let me know.

        Regards,
        Sounds good, but describe how this data would be represented:
        Src Dst Start_time End_time Bytes
        A B 1 5 10
        A B 4 9 10
        A B 8 13 10
        A C 1 5 10
        A C 5 7 10

        Comment

        • corsin
          New Member
          • Oct 2007
          • 5

          #5
          Originally posted by Motoma
          Sounds good, but describe how this data would be represented:
          Src Dst Start_time End_time Bytes
          A B 1 5 10
          A B 4 9 10
          A B 8 13 10
          A C 1 5 10
          A C 5 7 10
          Hi!

          Src, Dst = varchar(16)
          Start_Time, End_Time, and Bytes = bigint(20)

          If this is not the answer you're looking for, my apologizes since databases are not my field of expertise :)

          Comment

          • Motoma
            Recognized Expert Specialist
            • Jan 2007
            • 3236

            #6
            Originally posted by corsin
            Hi!

            Src, Dst = varchar(16)
            Start_Time, End_Time, and Bytes = bigint(20)

            If this is not the answer you're looking for, my apologizes since databases are not my field of expertise :)
            Sorry, let me rephrase: Could you take a look at the dataset I gave, and tell me what the end result would be? That is to say, what do you want the SQL to return when the data is set up as I presented it earlier?

            Comment

            • corsin
              New Member
              • Oct 2007
              • 5

              #7
              Originally posted by Motoma
              Sorry, let me rephrase: Could you take a look at the dataset I gave, and tell me what the end result would be? That is to say, what do you want the SQL to return when the data is set up as I presented it earlier?
              Ok... I thought you meant the data type. Here we go then

              Src Dst Start_time End_time Bytes
              A B 1 5 10
              A B 4 9 10
              A B 8 13 10
              A C 1 5 10
              A C 5 7 10

              The expected result should be:
              Src Dst Start_time End_time Bytes
              A B 1 13 30
              A C 1 7 20

              Regards,

              Comment

              • Motoma
                Recognized Expert Specialist
                • Jan 2007
                • 3236

                #8
                Originally posted by corsin
                Ok... I thought you meant the data type. Here we go then

                Src Dst Start_time End_time Bytes
                A B 1 5 10
                A B 4 9 10
                A B 8 13 10
                A C 1 5 10
                A C 5 7 10

                The expected result should be:
                Src Dst Start_time End_time Bytes
                A B 1 13 30
                A C 1 7 20

                Regards,

                As I expected.
                There is no good way to do this with SQL; you can do it with cursors, but my expectation is that it will be much slower than performing the same calculations with code.

                Comment

                • corsin
                  New Member
                  • Oct 2007
                  • 5

                  #9
                  Originally posted by Motoma
                  As I expected.
                  There is no good way to do this with SQL; you can do it with cursors, but my expectation is that it will be much slower than performing the same calculations with code.
                  Hello Motoma!

                  Could you please show me an example of how cursors would do the job? I tried with code and it worked quite well with a small set of rows, but with a table of +or- 5000000 rows my program dies.

                  I've the SQL query below without recursivity, which results that I can't get all consecutive overlapping intervals:

                  with ver1 as (
                  select a.id AS origem,
                  b.id AS pertence
                  from VER a,
                  VER b
                  where
                  a.ipv4_src = b.ipv4_src and
                  a.ipv4_dst = b.ipv4_dst and
                  a.port_src = b.port_src and
                  a.port_dst = b.port_dst and
                  (a.start_time between b.start_time and b.end_time or
                  a.end_time between b.start_time and b.end_time)
                  ),
                  v12 as (
                  select
                  distinct a.origem AS origem,
                  a.pertence AS pertence
                  from
                  ver1 a
                  union
                  select
                  distinct a.origem AS origem,
                  b.pertence AS pertence
                  from
                  ver1 a,
                  ver1 b
                  where
                  a.pertence = b.origem
                  ),
                  v13 as (
                  select
                  distinct a.origem AS origem,
                  a.pertence AS pertence
                  from
                  v12 a
                  union
                  select
                  distinct a.origem AS origem,
                  b.pertence AS pertence
                  from
                  v12 a,
                  v12 b
                  where
                  a.pertence = b.origem
                  ),
                  ver2 as (
                  select
                  distinct a.origem AS origem,
                  b.ipv4_src, b.ipv4_dst,
                  b.port_src, b.port_dst,
                  min(b.start_tim e) as start_time,
                  max(b.end_time) as end_time,
                  sum(b.nb_bytes) AS nb_bytes
                  from v13 a,
                  VER b
                  where
                  a.pertence = b.id and
                  not exists
                  (select 1
                  from ver1 c
                  where
                  c.pertence = a.origem and
                  c.origem a.origem)
                  group by
                  origem, b.ipv4_src, b.ipv4_dst, b.port_src, b.port_dst
                  )
                  select from ver2;

                  Comment

                  • Motoma
                    Recognized Expert Specialist
                    • Jan 2007
                    • 3236

                    #10
                    Okay, here is the way I would do it, in descriptive pseudo code:
                    Code:
                    Create CURSOR list of unique paths (treat source-dest as PK)
                    For each element in PathCursor
                    {
                        Set totalBias = 0;
                        Set totalTransfered = 0;
                        Construct CURSOR which a UNION of two SELECTs, merge StartTime and EndTime into one column labeled Time, along with a colunm labeled Bias which is 1 for StartTimes, and -1 for EndTimes, and if it is an end time, include the BytesTransfered, ORDER the result set BY Time;
                        Set PathStartTime = 0;
                        For each element in TimeCursor
                        {
                            if PathStartTime = 0
                            {
                                PathStartTime = StartTime
                            }
                            Add Bias to TotalBias;
                            Add BytesTransfered to totalTransfered;
                            If TotalBias = 0
                            {
                                One row in your result set will be made of Path, PathStartTime, EndTime, totalBytes;
                                Set PathStartTime, totalByte = 0;
                            }
                        }
                    }

                    Comment

                    • flgllm1
                      New Member
                      • Oct 2007
                      • 1

                      #11
                      How about this:

                      SELECT b.id, 'Overlaps", a.id
                      from
                      (SELECT * from TBL) a,
                      (SELECT * from TBL) b,
                      where
                      a.id > b.id and
                      b.start >= a.start
                      and b.start <= a.end
                      order by b.id

                      Comment

                      • Motoma
                        Recognized Expert Specialist
                        • Jan 2007
                        • 3236

                        #12
                        Originally posted by flgllm1
                        How about this:

                        SELECT b.id, 'Overlaps", a.id
                        from
                        (SELECT * from TBL) a,
                        (SELECT * from TBL) b,
                        where
                        a.id > b.id and
                        b.start >= a.start
                        and b.start <= a.end
                        order by b.id
                        Hi flgllm1! Welcome to The Scripts.
                        Looking at the original problem, I don't believe that a single JOIN will be able to perform the necessary operations: take for example this case.

                        Code:
                        start     end     tx
                        0         15      10
                        10        25      10
                        20        35      10
                        30        45      10
                        This would be combined into one connection, with a start time of 0, an end time of 45, and a bytes transfered value of 40.

                        Comment

                        Working...