Date range problem

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Thomas R. Hummel

    Date range problem

    Hello all,

    I am trying to write a query that compares a member's enrollment period
    with the products that their group has had during that period (all
    members belong to a group and the products that the member has are
    based on that group). I need to get the date range for all products
    that the member had during their enrollment.

    Here are a few rules:
    - In the source table there are some group products that have two
    ranges that are really contiguous. This is because another column that
    we don't care about may have changed between those two periods. If the
    end_date = DATEADD(dy, 1, start_date) then the two periods are actually
    contiguous. These should only appear as one row in the output.
    - If the gap is greater than one day then two rows should result
    - If the product changes, of course it should be two rows in the output
    - If a group has a product from before the start of the member's
    enrollment then the start_date for the row should be the member's
    start_date. If the product extends past the member leaving the group
    then the end_date should be that of the member.
    - In my sample data below I only have as many as two rows back to back
    for the same product that are contiguous. In reality there could be
    even more than that.

    I have SQL that will join the two tables based on either the start or
    the end date of the group product falling in the member's enrollment
    period, but I'm not sure of the best way to merge the contiguous date
    ranges into single rows. Any suggestions?

    Erland, despite it being late on a Friday afternoon, the SQL and sample
    output are below. ;-)

    Thanks,
    -Tom.

    CREATE TABLE Members (
    group_id INT NOT NULL,
    member_id INT NOT NULL,
    start_date DATETIME NOT NULL,
    end_date DATETIME NOT NULL )
    GO

    ALTER TABLE Membership ADD CONSTRAINT PRIMARY KEY PK_Members PRIMARY
    KEY (group_id, member_id)
    GO

    CREATE TABLE Group_Products (
    group_id INT NOT NULL,
    product_id INT NOT NULL,
    start_date DATETIME NOT NULL,
    end_date DATETIME NOT NULL )
    GO

    ALTER TABLE Group_Products ADD CONSTRAINT PRIMARY KEY PK_Group_Produc ts
    PRIMARY KEY (group_id, product_id, start_date)
    GO

    INSERT INTO Members VALUES (1, 1, '2002-01-01', '9999-12-31')
    INSERT INTO Members VALUES (1, 2, '2004-11-01', '9999-12-31')
    INSERT INTO Members VALUES (1, 3, '2000-10-01', '2004-12-31')
    INSERT INTO Members VALUES (2, 4, '2002-01-01', '2005-01-15')
    INSERT INTO Members VALUES (2, 5, '2004-10-01', '9999-12-31')
    GO

    INSERT INTO Group_Products VALUES (1, 1, '2001-01-01', '2003-12-31')
    INSERT INTO Group_Products VALUES (1, 1, '2004-01-01', '2004-11-15')
    INSERT INTO Group_Products VALUES (1, 2, '2004-11-16', '9999-12-31')
    INSERT INTO Group_Products VALUES (2, 1, '2002-01-01', '2004-11-01')
    INSERT INTO Group_Products VALUES (2, 1, '2004-11-15', '9999-12-31')
    GO

    Expected Output:

    group_id member_id product_id start_date end_date
    -------- --------- ---------- ---------- ----------
    1 1 1 2002-01-01 2004-11-15
    1 1 2 2004-11-16 9999-12-31
    1 2 1 2004-11-01 2004-11-15
    1 2 2 2004-11-16 9999-12-31
    1 3 1 2001-01-01 2004-11-15
    1 3 2 2004-11-16 2004-12-31
    2 4 1 2002-01-01 2004-11-01
    2 4 1 2004-11-15 9999-12-31
    2 5 1 2004-10-01 2004-11-01
    2 5 1 2004-11-15 9999-12-31

  • Erland Sommarskog

    #2
    Re: Date range problem

    Thomas R. Hummel (tom_hummel@hot mail.com) writes:[color=blue]
    > I have SQL that will join the two tables based on either the start or
    > the end date of the group product falling in the member's enrollment
    > period, but I'm not sure of the best way to merge the contiguous date
    > ranges into single rows. Any suggestions?
    >
    > Erland, despite it being late on a Friday afternoon, the SQL and sample
    > output are below. ;-)[/color]

    Hey, here on my side it's not late Friday afternoon, it's just about
    midnight!

    Anyway, this looks like it will work, but please test further:

    SELECT m.group_id, m.member_id, gp.product_id, gp.start_date, gp.end_date
    FROM Members m
    JOIN (SELECT a.group_id, a.product_id, a.start_date,
    end_date = MIN(b.end_date)
    FROM Group_Products a
    JOIN Group_Products b ON a.group_id = b.group_id
    AND a.product_id = b.product_id
    WHERE a.start_date <= b.start_date
    AND NOT EXISTS
    (SELECT *
    FROM Group_Products c
    WHERE a.group_id = c.group_id
    AND a.product_id = c.product_id
    AND dateadd(DAY, -1, a.start_date) = c.end_date)
    AND NOT EXISTS
    (SELECT *
    FROM Group_Products d
    WHERE b.group_id = d.group_id
    AND b.product_id = d.product_id
    AND b.end_date = dateadd(DAY, -1, d.start_date))
    GROUP BY a.group_id, a.product_id, a.start_date) gp
    ON m.group_id = gp.group_id
    ORDER BY m.group_id, m.member_id, gp.product_id, gp.start_date

    All the fun goes on the derived table. First I make partial self-join
    between two instances of Group_Products, so I get all combinations
    of start_date, end_date for a group/product combo. The first WHERE
    condition filters away those where the left table has start_date later
    then than to the right. Then the two NOT EXISTS does away with the
    rows where end_date to the left has an adjacent start_date and
    vice versa.

    Here is some augmented test data:

    INSERT INTO Group_Products VALUES (1, 1, '2001-01-01', '2003-12-31')
    INSERT INTO Group_Products VALUES (1, 1, '2004-01-01', '2004-01-05')
    INSERT INTO Group_Products VALUES (1, 1, '2004-01-06', '2004-11-15')
    INSERT INTO Group_Products VALUES (1, 1, '2005-01-06', '2005-11-15')
    INSERT INTO Group_Products VALUES (1, 2, '2004-11-16', '9999-12-31')
    INSERT INTO Group_Products VALUES (2, 1, '2002-01-01', '2004-11-01')
    INSERT INTO Group_Products VALUES (2, 1, '2004-11-15', '9999-12-31')

    There's three adjacent periods for 1/1, and then there is a non-adjcent
    rerun as well.


    --
    Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

    Books Online for SQL Server SP3 at
    Accelerate your AI application's time to market by harnessing the power of your own data and the built-in AI capabilities of SQL Server 2025, the enterprise database with best-in-class security, performance and availability.

    Comment

    • Hugo Kornelis

      #3
      Re: Date range problem

      On 18 Mar 2005 13:01:32 -0800, Thomas R. Hummel wrote:
      [color=blue]
      >Erland, despite it being late on a Friday afternoon, the SQL and sample
      >output are below. ;-)[/color]

      Hi Tom,

      I hope this doesn't mean that only Erland may reply?

      Let's break this down in easy steps.

      First, create a query that uses lots of self-joins and not exists
      queries to join contiguous group_products. The logic is: only take rows
      that are the start of a contiguous set (no row has end_date equal to
      start_date - 1); join these to all rows with same group/product that
      don't start earlier and that are not themselves the start of a
      contiguous set. In SQL:

      SELECT a.group_id, a.product_id, a.start_date, MAX(b.end_date)
      FROM Group_Products AS a
      INNER JOIN Group_Products AS b
      ON b.group_id = a.group_id
      AND b.product_id = a.product_id
      AND b.start_date >= a.start_date
      AND NOT EXISTS (SELECT *
      FROM Group_Products AS c
      WHERE c.group_id = a.group_id
      AND c.product_id = a.product_id
      AND c.start_date > a.start_date
      AND c.start_date < b.end_date
      AND NOT EXISTS (SELECT *
      FROM Group_Products AS d
      WHERE d.group_id = c.group_id
      AND d.product_id =
      c.product_id
      AND d.end_date =
      DATEADD(day, -1, c.start_date)))
      WHERE NOT EXISTS (SELECT *
      FROM Group_Products AS e
      WHERE e.group_id = a.group_id
      AND e.product_id = a.product_id
      AND e.end_date = DATEADD(day, -1,
      a.start_date))
      GROUP BY a.group_id, a.product_id, a.start_date

      Converting NOT EXISTS to outer joins can be interesting. It sometimes
      speeds up the query. And if you do it with nested NOT EXISTS, you end up
      with nested outer joins - always fun, at friday midnight! <g>

      SELECT a.group_id, a.product_id, a.start_date, MAX(b.end_date)
      FROM Group_Products AS a
      INNER JOIN Group_Products AS b
      ON b.group_id = a.group_id
      AND b.product_id = a.product_id
      AND b.start_date >= a.start_date
      LEFT JOIN Group_Products AS c
      LEFT JOIN Group_Products AS d
      ON d.group_id = c.group_id
      AND d.product_id = c.product_id
      AND d.end_date = DATEADD(day, -1, c.start_date)
      ON c.group_id = a.group_id
      AND c.product_id = a.product_id
      AND c.start_date > a.start_date
      AND c.start_date < b.end_date
      AND d.group_id IS NULL
      LEFT JOIN Group_Products AS e
      ON e.group_id = a.group_id
      AND e.product_id = a.product_id
      AND e.end_date = DATEADD(day, -1, a.start_date)
      WHERE e.group_id IS NULL
      AND c.group_id IS NULL
      GROUP BY a.group_id, a.product_id, a.start_date

      In this case, I saw no performance difference. Test it on your own data
      to see how it works for you. I'll stick with this version for the rest
      of the post (if only to prevent linewrapping). you use whichever suits
      you best.

      Once we have this result, the rest is easy. Just use the query above as
      derived table (or you could create a view) and join that to the members
      table. Use CASE to find out which start_date and end_date to display:

      SELECT m.group_id, m.member_id, g.product_id,
      CASE WHEN m.start_date < g.start_date
      THEN g.start_date
      ELSE m.start_date END AS start_date,
      CASE WHEN m.end_date > g.end_date
      THEN g.end_date
      ELSE m.end_date END AS end_date
      FROM Members AS m
      INNER JOIN (SELECT a.group_id, a.product_id, a.start_date,
      MAX(b.end_date) AS end_date
      FROM Group_Products AS a
      INNER JOIN Group_Products AS b
      ON b.group_id = a.group_id
      AND b.product_id = a.product_id
      AND b.start_date >= a.start_date
      LEFT JOIN Group_Products AS c
      LEFT JOIN Group_Products AS d
      ON d.group_id = c.group_id
      AND d.product_id = c.product_id
      AND d.end_date = DATEADD(day, -1, c.start_date)
      ON c.group_id = a.group_id
      AND c.product_id = a.product_id
      AND c.start_date > a.start_date
      AND c.start_date < b.end_date
      AND d.group_id IS NULL
      LEFT JOIN Group_Products AS e
      ON e.group_id = a.group_id
      AND e.product_id = a.product_id
      AND e.end_date = DATEADD(day, -1, a.start_date)
      WHERE e.group_id IS NULL
      AND c.group_id IS NULL
      GROUP BY a.group_id, a.product_id, a.start_date) AS g
      ON g.group_id = m.group_id
      AND g.start_date <= m.end_date
      AND g.end_date >= m.start_date
      ORDER BY m.group_id, m.member_id, g.product_id, start_date

      This query produces the requested output when I run it on your test
      data. I didn't test it on other input data.

      Best, Hugo
      --

      (Remove _NO_ and _SPAM_ to get my e-mail address)

      Comment

      • Hugo Kornelis

        #4
        Re: Date range problem

        On Fri, 18 Mar 2005 23:03:10 +0000 (UTC), Erland Sommarskog wrote:
        [color=blue]
        >Anyway, this looks like it will work, but please test further:[/color]

        Drat! You did manage to beat me to it...

        Have a nice weekend!

        Best, Hugo
        --

        (Remove _NO_ and _SPAM_ to get my e-mail address)

        Comment

        • John Gilson

          #5
          Re: Date range problem

          "Thomas R. Hummel" <tom_hummel@hot mail.com> wrote in message
          news:1111179692 .412743.218250@ f14g2000cwb.goo glegroups.com.. .[color=blue]
          > Hello all,
          >
          > I am trying to write a query that compares a member's enrollment period
          > with the products that their group has had during that period (all
          > members belong to a group and the products that the member has are
          > based on that group). I need to get the date range for all products
          > that the member had during their enrollment.
          >
          > Here are a few rules:
          > - In the source table there are some group products that have two
          > ranges that are really contiguous. This is because another column that
          > we don't care about may have changed between those two periods. If the
          > end_date = DATEADD(dy, 1, start_date) then the two periods are actually
          > contiguous. These should only appear as one row in the output.
          > - If the gap is greater than one day then two rows should result
          > - If the product changes, of course it should be two rows in the output
          > - If a group has a product from before the start of the member's
          > enrollment then the start_date for the row should be the member's
          > start_date. If the product extends past the member leaving the group
          > then the end_date should be that of the member.
          > - In my sample data below I only have as many as two rows back to back
          > for the same product that are contiguous. In reality there could be
          > even more than that.
          >
          > I have SQL that will join the two tables based on either the start or
          > the end date of the group product falling in the member's enrollment
          > period, but I'm not sure of the best way to merge the contiguous date
          > ranges into single rows. Any suggestions?
          >
          > Erland, despite it being late on a Friday afternoon, the SQL and sample
          > output are below. ;-)
          >
          > Thanks,
          > -Tom.
          >
          > CREATE TABLE Members (
          > group_id INT NOT NULL,
          > member_id INT NOT NULL,
          > start_date DATETIME NOT NULL,
          > end_date DATETIME NOT NULL )
          > GO
          >
          > ALTER TABLE Membership ADD CONSTRAINT PRIMARY KEY PK_Members PRIMARY
          > KEY (group_id, member_id)
          > GO
          >
          > CREATE TABLE Group_Products (
          > group_id INT NOT NULL,
          > product_id INT NOT NULL,
          > start_date DATETIME NOT NULL,
          > end_date DATETIME NOT NULL )
          > GO
          >
          > ALTER TABLE Group_Products ADD CONSTRAINT PRIMARY KEY PK_Group_Produc ts
          > PRIMARY KEY (group_id, product_id, start_date)
          > GO
          >
          > INSERT INTO Members VALUES (1, 1, '2002-01-01', '9999-12-31')
          > INSERT INTO Members VALUES (1, 2, '2004-11-01', '9999-12-31')
          > INSERT INTO Members VALUES (1, 3, '2000-10-01', '2004-12-31')
          > INSERT INTO Members VALUES (2, 4, '2002-01-01', '2005-01-15')
          > INSERT INTO Members VALUES (2, 5, '2004-10-01', '9999-12-31')
          > GO
          >
          > INSERT INTO Group_Products VALUES (1, 1, '2001-01-01', '2003-12-31')
          > INSERT INTO Group_Products VALUES (1, 1, '2004-01-01', '2004-11-15')
          > INSERT INTO Group_Products VALUES (1, 2, '2004-11-16', '9999-12-31')
          > INSERT INTO Group_Products VALUES (2, 1, '2002-01-01', '2004-11-01')
          > INSERT INTO Group_Products VALUES (2, 1, '2004-11-15', '9999-12-31')
          > GO
          >
          > Expected Output:
          >
          > group_id member_id product_id start_date end_date
          > -------- --------- ---------- ---------- ----------
          > 1 1 1 2002-01-01 2004-11-15
          > 1 1 2 2004-11-16 9999-12-31
          > 1 2 1 2004-11-01 2004-11-15
          > 1 2 2 2004-11-16 9999-12-31
          > 1 3 1 2001-01-01 2004-11-15
          > 1 3 2 2004-11-16 2004-12-31
          > 2 4 1 2002-01-01 2004-11-01
          > 2 4 1 2004-11-15 9999-12-31
          > 2 5 1 2004-10-01 2004-11-01
          > 2 5 1 2004-11-15 9999-12-31[/color]

          The following view will coalesce ranges. It's been separated out from
          the main query for readability and other applicability.

          CREATE VIEW GroupProductsCo alesced
          (group_id, product_id, start_date, end_date)
          AS
          SELECT P.group_id, P.product_id,
          MIN(P.start_dat e) AS start_date, P.end_date
          FROM (SELECT P2.group_id, P2.product_id,
          P2.start_date, MIN(P1.end_date ) AS end_date
          FROM (SELECT group_id, product_id, start_date, end_date
          FROM Group_Products AS P1
          WHERE NOT EXISTS
          (SELECT *
          FROM Group_Products AS P2
          WHERE P1.group_id = P2.group_id AND
          P1.product_id = P2.product_id AND
          P1.end_date = P2.start_date - 1))
          AS P1
          INNER JOIN
          Group_Products AS P2
          ON P1.group_id = P2.group_id AND
          P1.product_id = P2.product_id AND
          P1.start_date >= P2.start_date
          GROUP BY P2.group_id, P2.product_id, P2.start_date) AS P
          GROUP BY P.group_id, P.product_id, P.end_date

          SELECT M.group_id, M.member_id, P.product_id,
          CASE WHEN P.start_date >= M.start_date
          THEN P.start_date
          ELSE M.start_date
          END AS start_date,
          CASE WHEN P.end_date <= M.end_date
          THEN P.end_date
          ELSE M.end_date
          END AS end_date
          FROM Members AS M
          INNER JOIN
          GroupProductsCo alesced AS P
          ON M.group_id = P.group_id AND
          P.start_date < M.end_date AND
          P.end_date > M.start_date
          ORDER BY M.group_id, M.member_id, P.product_id, start_date

          group_id member_id product_id start_date end_date
          1 1 1 2002-01-01 00:00:00.000 2004-11-15 00:00:00.000
          1 1 2 2004-11-16 00:00:00.000 9999-12-31 00:00:00.000
          1 2 1 2004-11-01 00:00:00.000 2004-11-15 00:00:00.000
          1 2 2 2004-11-16 00:00:00.000 9999-12-31 00:00:00.000
          1 3 1 2001-01-01 00:00:00.000 2004-11-15 00:00:00.000
          1 3 2 2004-11-16 00:00:00.000 2004-12-31 00:00:00.000
          2 4 1 2002-01-01 00:00:00.000 2004-11-01 00:00:00.000
          2 4 1 2004-11-15 00:00:00.000 2005-01-15 00:00:00.000
          2 5 1 2004-10-01 00:00:00.000 2004-11-01 00:00:00.000
          2 5 1 2004-11-15 00:00:00.000 9999-12-31 00:00:00.000

          --
          JAG


          Comment

          • Thomas R. Hummel

            #6
            Re: Date range problem

            I ended up using a method similar to this, but encapsulating the main
            logic for determining the ranges within a view as John had suggested. I
            checked about a dozen different cases and it looks like it is working
            correctly. Performance isn't too bad either.

            Thanks to everyone for the suggestions.

            -Tom.

            Comment

            • Thomas R. Hummel

              #7
              Re: Date range problem

              Well, for some reason my original reply to this never seemed to get
              posted, so...
              [color=blue]
              >On 18 Mar 2005 13:01:32 -0800, Thomas R. Hummel wrote:[color=green]
              >>Erland, despite it being late on a Friday afternoon, the SQL and[/color][/color]
              sample[color=blue][color=green]
              >>output are below. ;-)[/color]
              >
              > I hope this doesn't mean that only Erland may reply?[/color]

              Not at all. Last week I posted a question to the newsgroup and I
              neglected to include all of the CREATE TABLE, INSERT, and expected
              outcome information. Erland gave me a (minor) hard time about it and my
              excuse was that it was late on a Friday. I was just making a reference
              to that.

              Thanks for your solution!

              -Tom.

              Comment

              • Hugo Kornelis

                #8
                Re: Date range problem

                On 22 Mar 2005 13:27:41 -0800, Thomas R. Hummel wrote:
                [color=blue][color=green]
                >> I hope this doesn't mean that only Erland may reply?[/color]
                >
                >Not at all.[/color]
                (snip)

                Hi Tom,

                Woops, I appear to have caused confusion.

                I meant to write a smiley on that line, but I apparently forgot it. Here
                is is: :-)

                Best, Hugo
                --

                (Remove _NO_ and _SPAM_ to get my e-mail address)

                Comment

                Working...