Query to group sequential items

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • cyrus.kapadia@us.pm.com

    Query to group sequential items

    Let's say I have the following table:

    entry product quality
    1 A 80
    2 A 70
    3 A 80
    4 B 60
    5 B 90
    6 C 80
    7 D 80
    8 A 50
    9 C 70

    I'm looking for a way to find the average "quality" value for a
    SEQUENTIAL GROUPING of the same Product. For exmple, I need an
    average of Entry 1+2+3 (because this is the first grouping of the same
    product type), but NOT want that average to include row 8 (which is
    also Product A, but in a different "group".)

    I'm sure it can be done (because I can describe it!), but I'll be a
    monkey's uncle if I can figure out how. I would imagine it would
    involve some sort of running tally that references the next record as
    it goes... to see if the product type has changed. Perhaps use of a
    temporary table?

    Muchas gracias!!
    Cy.

  • David Rawheiser

    #2
    Re: Query to group sequential items

    Easy way ... cursor or loop thru as you stated.

    WARNING - THE FOLLOWING IS AN UNTESTED HALF BACKED IDEA -
    CONSUME AT YOUR OWN RISK

    The set oriented way would require the addition of a grouping column,
    initially null and populated via update statements from a temp table

    use something like this to generate a set of the grouping transition rows.

    -- GENERATED GROUP IDS AND GET MAX ENTRY IN GROUP
    select
    identity(int,1, 1) as groupid
    a.product,
    a.entry
    into #groupings
    from mytable a
    join mytable b on a.product != b.product and a.entry = b.entry + 1

    -- UPDATES BASE TABLE WITH FOR MAX ENTRY IN GROUP
    update a
    set groupid = g.groupid
    from mytable a
    join #grouping g on a.entry = g.entry

    -- UPDATES PRIOR ENTRIES IN GROUP
    update a
    set a.groupid = g.groupid
    from mytable a
    join #grouping g on a.entry < g.entry
    where a.groupid is null

    -- QUERY TO RETURN RESULTS YOU ARE LOOKING FOR
    select groupid , min( product ) , max( entry ) , min( entry) , sum (
    quantity ) , count(*) , avg( quantity)
    from mytable
    group by groupid

    -------------

    I am not so sure about the 2nd update here, as I am tired and going to bed
    soon. you may also need to join to the grouping temp table on the product
    and also put a not exists() in the where clause, but you may be covered by
    the simple is null to prevent muliple updates.

    Let me know how you make out, and if this points you in a good direction or
    throws you off track.


    <cyrus.kapadia@ us.pm.com> wrote in message
    news:1102562637 .046747.292110@ c13g2000cwb.goo glegroups.com.. .[color=blue]
    > Let's say I have the following table:
    >
    > entry product quality
    > 1 A 80
    > 2 A 70
    > 3 A 80
    > 4 B 60
    > 5 B 90
    > 6 C 80
    > 7 D 80
    > 8 A 50
    > 9 C 70
    >
    > I'm looking for a way to find the average "quality" value for a
    > SEQUENTIAL GROUPING of the same Product. For exmple, I need an
    > average of Entry 1+2+3 (because this is the first grouping of the same
    > product type), but NOT want that average to include row 8 (which is
    > also Product A, but in a different "group".)
    >
    > I'm sure it can be done (because I can describe it!), but I'll be a
    > monkey's uncle if I can figure out how. I would imagine it would
    > involve some sort of running tally that references the next record as
    > it goes... to see if the product type has changed. Perhaps use of a
    > temporary table?
    >
    > Muchas gracias!!
    > Cy.
    >[/color]



    Comment

    • John Gilson

      #3
      Re: Query to group sequential items

      <cyrus.kapadia@ us.pm.com> wrote in message
      news:1102562637 .046747.292110@ c13g2000cwb.goo glegroups.com.. .[color=blue]
      > Let's say I have the following table:
      >
      > entry product quality
      > 1 A 80
      > 2 A 70
      > 3 A 80
      > 4 B 60
      > 5 B 90
      > 6 C 80
      > 7 D 80
      > 8 A 50
      > 9 C 70
      >
      > I'm looking for a way to find the average "quality" value for a
      > SEQUENTIAL GROUPING of the same Product. For exmple, I need an
      > average of Entry 1+2+3 (because this is the first grouping of the same
      > product type), but NOT want that average to include row 8 (which is
      > also Product A, but in a different "group".)
      >
      > I'm sure it can be done (because I can describe it!), but I'll be a
      > monkey's uncle if I can figure out how. I would imagine it would
      > involve some sort of running tally that references the next record as
      > it goes... to see if the product type has changed. Perhaps use of a
      > temporary table?
      >
      > Muchas gracias!!
      > Cy.[/color]

      CREATE TABLE ProductEntries
      (
      product_entry INT NOT NULL PRIMARY KEY,
      product_code CHAR(1) NOT NULL,
      product_quality INT NOT NULL
      )

      INSERT INTO ProductEntries (product_entry, product_code, product_quality )
      VALUES (1, 'A', 80)
      INSERT INTO ProductEntries (product_entry, product_code, product_quality )
      VALUES (2, 'A', 70)
      INSERT INTO ProductEntries (product_entry, product_code, product_quality )
      VALUES (3, 'A', 80)
      INSERT INTO ProductEntries (product_entry, product_code, product_quality )
      VALUES (4, 'B', 60)
      INSERT INTO ProductEntries (product_entry, product_code, product_quality )
      VALUES (5, 'B', 90)
      INSERT INTO ProductEntries (product_entry, product_code, product_quality )
      VALUES (6, 'C', 80)
      INSERT INTO ProductEntries (product_entry, product_code, product_quality )
      VALUES (7, 'D', 80)
      INSERT INTO ProductEntries (product_entry, product_code, product_quality )
      VALUES (8, 'A', 50)
      INSERT INTO ProductEntries (product_entry, product_code, product_quality )
      VALUES (9, 'C', 70)

      SELECT PR.product_code AS product_code,
      PR.start_produc t_entry AS start_product_e ntry,
      MAX(P.product_e ntry) AS end_product_ent ry,
      AVG(CAST(P.prod uct_quality AS DECIMAL)) AS avg_product_qua lity
      FROM (SELECT MIN(PE.product_ entry) AS start_product_e ntry,
      PE.next_product _entry AS end_product_ent ry,
      PE.product_code
      FROM (SELECT P1.product_entr y, P1.product_code ,
      MIN(P2.product_ entry) AS next_product_en try
      FROM ProductEntries AS P1
      LEFT OUTER JOIN
      ProductEntries AS P2
      ON P2.product_entr y > P1.product_entr y AND
      P2.product_code <> P1.product_code
      GROUP BY P1.product_entr y, P1.product_code ) AS PE
      GROUP BY PE.product_code , PE.next_product _entry) AS PR
      INNER JOIN
      ProductEntries AS P
      ON P.product_code = PR.product_code AND
      P.product_entry >= PR.start_produc t_entry AND
      (PR.end_product _entry IS NULL OR
      P.product_entry < PR.end_product_ entry)
      GROUP BY PR.product_code , PR.start_produc t_entry
      ORDER BY start_product_e ntry

      product_code start_product_e ntry end_product_ent ry avg_product_qua lity
      A 1 3 76.666666
      B 4 5 75.000000
      C 6 6 80.000000
      D 7 7 80.000000
      A 8 8 50.000000
      C 9 9 70.000000

      --
      JAG


      Comment

      • David Rawheiser

        #4
        Re: Query to group sequential items

        Sure, that may work as well.

        "John Gilson" <jag@acm.org> wrote in message
        news:5zQtd.7206 0$Vk6.20781@twi ster.nyc.rr.com ...[color=blue]
        > <cyrus.kapadia@ us.pm.com> wrote in message
        > news:1102562637 .046747.292110@ c13g2000cwb.goo glegroups.com.. .[color=green]
        >> Let's say I have the following table:
        >>
        >> entry product quality
        >> 1 A 80
        >> 2 A 70
        >> 3 A 80
        >> 4 B 60
        >> 5 B 90
        >> 6 C 80
        >> 7 D 80
        >> 8 A 50
        >> 9 C 70
        >>
        >> I'm looking for a way to find the average "quality" value for a
        >> SEQUENTIAL GROUPING of the same Product. For exmple, I need an
        >> average of Entry 1+2+3 (because this is the first grouping of the same
        >> product type), but NOT want that average to include row 8 (which is
        >> also Product A, but in a different "group".)
        >>
        >> I'm sure it can be done (because I can describe it!), but I'll be a
        >> monkey's uncle if I can figure out how. I would imagine it would
        >> involve some sort of running tally that references the next record as
        >> it goes... to see if the product type has changed. Perhaps use of a
        >> temporary table?
        >>
        >> Muchas gracias!!
        >> Cy.[/color]
        >
        > CREATE TABLE ProductEntries
        > (
        > product_entry INT NOT NULL PRIMARY KEY,
        > product_code CHAR(1) NOT NULL,
        > product_quality INT NOT NULL
        > )
        >
        > INSERT INTO ProductEntries (product_entry, product_code, product_quality )
        > VALUES (1, 'A', 80)
        > INSERT INTO ProductEntries (product_entry, product_code, product_quality )
        > VALUES (2, 'A', 70)
        > INSERT INTO ProductEntries (product_entry, product_code, product_quality )
        > VALUES (3, 'A', 80)
        > INSERT INTO ProductEntries (product_entry, product_code, product_quality )
        > VALUES (4, 'B', 60)
        > INSERT INTO ProductEntries (product_entry, product_code, product_quality )
        > VALUES (5, 'B', 90)
        > INSERT INTO ProductEntries (product_entry, product_code, product_quality )
        > VALUES (6, 'C', 80)
        > INSERT INTO ProductEntries (product_entry, product_code, product_quality )
        > VALUES (7, 'D', 80)
        > INSERT INTO ProductEntries (product_entry, product_code, product_quality )
        > VALUES (8, 'A', 50)
        > INSERT INTO ProductEntries (product_entry, product_code, product_quality )
        > VALUES (9, 'C', 70)
        >
        > SELECT PR.product_code AS product_code,
        > PR.start_produc t_entry AS start_product_e ntry,
        > MAX(P.product_e ntry) AS end_product_ent ry,
        > AVG(CAST(P.prod uct_quality AS DECIMAL)) AS
        > avg_product_qua lity
        > FROM (SELECT MIN(PE.product_ entry) AS start_product_e ntry,
        > PE.next_product _entry AS end_product_ent ry,
        > PE.product_code
        > FROM (SELECT P1.product_entr y, P1.product_code ,
        > MIN(P2.product_ entry) AS
        > next_product_en try
        > FROM ProductEntries AS P1
        > LEFT OUTER JOIN
        > ProductEntries AS P2
        > ON P2.product_entr y >
        > P1.product_entr y AND
        > P2.product_code <>
        > P1.product_code
        > GROUP BY P1.product_entr y, P1.product_code ) AS
        > PE
        > GROUP BY PE.product_code , PE.next_product _entry) AS PR
        > INNER JOIN
        > ProductEntries AS P
        > ON P.product_code = PR.product_code AND
        > P.product_entry >= PR.start_produc t_entry AND
        > (PR.end_product _entry IS NULL OR
        > P.product_entry < PR.end_product_ entry)
        > GROUP BY PR.product_code , PR.start_produc t_entry
        > ORDER BY start_product_e ntry
        >
        > product_code start_product_e ntry end_product_ent ry avg_product_qua lity
        > A 1 3 76.666666
        > B 4 5 75.000000
        > C 6 6 80.000000
        > D 7 7 80.000000
        > A 8 8 50.000000
        > C 9 9 70.000000
        >
        > --
        > JAG
        >
        >[/color]


        Comment

        • --CELKO--

          #5
          Re: Query to group sequential items

          That is too much work! Let's move the average calculation into a
          scalar subquery that will be done last, after all the clusters are
          found. The little-used = ALL predicate can replace a lot of your
          logic. And we pull up the usual Sequence auxiliary table.

          SELECT prod_code, MIN(start) AS start, finish,
          (SELECT AVG(CAST(prod_q uality AS DECIMAL(8,4)))
          FROM ProductEntries AS P3
          WHERE P3.prod_entry
          BETWEEN MIN(start)
          AND X.finish) AS avg_quality
          FROM (SELECT P1.prod_code, S1.seq, MAX(S2.seq) AS finish
          FROM ProductEntries AS P1, Sequence AS S1, Sequence AS S2
          WHERE S1.seq <= S2.seq
          AND S2.seq <= (SELECT MAX(prod_entry) FROM ProductEntries)
          AND P1.prod_code
          = ALL (SELECT P2.prod_code
          FROM ProductEntries AS P2
          WHERE P2.prod_entry BETWEEN S1.seq AND S2.seq)
          GROUP BY P1.prod_code, S1.seq)
          AS X (prod_code, start, finish)
          GROUP BY prod_code, finish;

          Another version requires two sentinal values
          --
          INSERT INTO ProductEntries VALUES (0, '?', 0);
          INSERT INTO ProductEntries VALUES (10, '?', 0);

          SELECT DISTINCT P1.prod_code, S1.seq AS start, S2.seq AS finish,
          (SELECT AVG(CAST(prod_q uality AS DECIMAL(8,4)))
          FROM ProductEntries AS P3
          WHERE P3.prod_entry
          BETWEEN S1.seq AND S2.seq) AS avg_quality
          FROM ProductEntries AS P1,
          (SELECT seq FROM Sequence
          UNION ALL SELECT 0) AS S1, Sequence AS S2
          WHERE S1.seq <= S2.seq
          AND S2.seq <= (SELECT MAX(prod_entry) + 1 FROM ProductEntries)
          AND P1.prod_code
          <> (SELECT P3.prod_code
          FROM ProductEntries AS P3
          WHERE P3.prod_entry = S1.seq - 1)
          AND P1.prod_code
          <> (SELECT P4.prod_code
          FROM ProductEntries AS P4
          WHERE P4.prod_entry = S2.seq + 1)
          AND P1.prod_code
          = ALL (SELECT P2.prod_code
          FROM ProductEntries AS P2
          WHERE P2.prod_entry BETWEEN S1.seq AND S2.seq);

          Comment

          • John Gilson

            #6
            Re: Query to group sequential items

            "--CELKO--" <jcelko212@eart hlink.net> wrote in message
            news:1102618417 .146169.127100@ c13g2000cwb.goo glegroups.com.. .[color=blue]
            > That is too much work! Let's move the average calculation into a
            > scalar subquery that will be done last, after all the clusters are
            > found. The little-used = ALL predicate can replace a lot of your
            > logic. And we pull up the usual Sequence auxiliary table.
            >
            > SELECT prod_code, MIN(start) AS start, finish,
            > (SELECT AVG(CAST(prod_q uality AS DECIMAL(8,4)))
            > FROM ProductEntries AS P3
            > WHERE P3.prod_entry
            > BETWEEN MIN(start)
            > AND X.finish) AS avg_quality
            > FROM (SELECT P1.prod_code, S1.seq, MAX(S2.seq) AS finish
            > FROM ProductEntries AS P1, Sequence AS S1, Sequence AS S2
            > WHERE S1.seq <= S2.seq
            > AND S2.seq <= (SELECT MAX(prod_entry) FROM ProductEntries)
            > AND P1.prod_code
            > = ALL (SELECT P2.prod_code
            > FROM ProductEntries AS P2
            > WHERE P2.prod_entry BETWEEN S1.seq AND S2.seq)
            > GROUP BY P1.prod_code, S1.seq)
            > AS X (prod_code, start, finish)
            > GROUP BY prod_code, finish;
            >
            > Another version requires two sentinal values
            > --
            > INSERT INTO ProductEntries VALUES (0, '?', 0);
            > INSERT INTO ProductEntries VALUES (10, '?', 0);
            >
            > SELECT DISTINCT P1.prod_code, S1.seq AS start, S2.seq AS finish,
            > (SELECT AVG(CAST(prod_q uality AS DECIMAL(8,4)))
            > FROM ProductEntries AS P3
            > WHERE P3.prod_entry
            > BETWEEN S1.seq AND S2.seq) AS avg_quality
            > FROM ProductEntries AS P1,
            > (SELECT seq FROM Sequence
            > UNION ALL SELECT 0) AS S1, Sequence AS S2
            > WHERE S1.seq <= S2.seq
            > AND S2.seq <= (SELECT MAX(prod_entry) + 1 FROM ProductEntries)
            > AND P1.prod_code
            > <> (SELECT P3.prod_code
            > FROM ProductEntries AS P3
            > WHERE P3.prod_entry = S1.seq - 1)
            > AND P1.prod_code
            > <> (SELECT P4.prod_code
            > FROM ProductEntries AS P4
            > WHERE P4.prod_entry = S2.seq + 1)
            > AND P1.prod_code
            > = ALL (SELECT P2.prod_code
            > FROM ProductEntries AS P2
            > WHERE P2.prod_entry BETWEEN S1.seq AND S2.seq);[/color]

            Less work? Debatable. Also, this won't work if the product_entry values
            aren't consecutive.

            --
            JAG


            Comment

            • --CELKO--

              #7
              Re: Query to group sequential items

              >> Less work? Debatable. <<

              Fewer nesting levels should be a bit faster. But trying to find the
              start and finish points is going to get really bad as the number of row
              increases.
              [color=blue][color=green]
              >> Also, this won't work if the product_entry values[/color][/color]
              aren't consecutive. <<

              It depends on the sequence of tests having no gaps.

              This is one that might be better done with a cursor and a WHILE loop
              that accumulates a count and total of each quality test to a working
              table.

              --CELKO--
              Please post DDL in a human-readable format and not a machne-generated
              one. This way people do not have to guess what the keys, constraints,
              Declarative Referential Integrity, datatypes, etc. in your schema are.
              Sample data is also a good idea, along with clear specifications.


              *** Sent via Developersdex http://www.developersdex.com ***
              Don't just participate in USENET...get rewarded for it!

              Comment

              • John Gilson

                #8
                Re: Query to group sequential items

                "--CELKO--" <remove.jcelko2 12@earthlink.ne t> wrote in message
                news:1102630139 .1f7348d9b8d0e1 527f37d16587e3c ecc@teranews...[color=blue][color=green][color=darkred]
                > >> Less work? Debatable. <<[/color][/color]
                >
                > Fewer nesting levels should be a bit faster. But trying to find the
                > start and finish points is going to get really bad as the number of row
                > increases.
                >[color=green][color=darkred]
                > >> Also, this won't work if the product_entry values[/color][/color]
                > aren't consecutive. <<
                >
                > It depends on the sequence of tests having no gaps.
                >
                > This is one that might be better done with a cursor and a WHILE loop
                > that accumulates a count and total of each quality test to a working
                > table.[/color]

                You could be right but bite your tongue!

                --
                JAG
                [color=blue]
                > --CELKO--
                > Please post DDL in a human-readable format and not a machne-generated
                > one. This way people do not have to guess what the keys, constraints,
                > Declarative Referential Integrity, datatypes, etc. in your schema are.
                > Sample data is also a good idea, along with clear specifications.
                >
                >
                > *** Sent via Developersdex http://www.developersdex.com ***
                > Don't just participate in USENET...get rewarded for it![/color]


                Comment

                • David Rawheiser

                  #9
                  Re: Query to group sequential items

                  Actually if this was an ongoing query: performance-wise it would be best to
                  have a trigger or "phase shift" grouping id set as part of the insert
                  operation.

                  "John Gilson" <jag@acm.org> wrote in message
                  news:bo4ud.7447 7$Vk6.62153@twi ster.nyc.rr.com ...[color=blue]
                  > "--CELKO--" <remove.jcelko2 12@earthlink.ne t> wrote in message
                  > news:1102630139 .1f7348d9b8d0e1 527f37d16587e3c ecc@teranews...[color=green][color=darkred]
                  >> >> Less work? Debatable. <<[/color]
                  >>
                  >> Fewer nesting levels should be a bit faster. But trying to find the
                  >> start and finish points is going to get really bad as the number of row
                  >> increases.
                  >>[color=darkred]
                  >> >> Also, this won't work if the product_entry values[/color]
                  >> aren't consecutive. <<
                  >>
                  >> It depends on the sequence of tests having no gaps.
                  >>
                  >> This is one that might be better done with a cursor and a WHILE loop
                  >> that accumulates a count and total of each quality test to a working
                  >> table.[/color]
                  >
                  > You could be right but bite your tongue!
                  >
                  > --
                  > JAG
                  >[color=green]
                  >> --CELKO--
                  >> Please post DDL in a human-readable format and not a machne-generated
                  >> one. This way people do not have to guess what the keys, constraints,
                  >> Declarative Referential Integrity, datatypes, etc. in your schema are.
                  >> Sample data is also a good idea, along with clear specifications.
                  >>
                  >>
                  >> *** Sent via Developersdex http://www.developersdex.com ***
                  >> Don't just participate in USENET...get rewarded for it![/color]
                  >
                  >[/color]


                  Comment

                  • --CELKO--

                    #10
                    Re: Query to group sequential items

                    >> Actually if this was an ongoing query: performance-wise it would be
                    best to have a trigger or "phase shift" grouping id set as part of the
                    insert operation. <<

                    My impulse is for a "cluster group number" column as each test is done.
                    Look to see if the current quality test is on the same product as the
                    most recent one, etc.

                    --CELKO--
                    Please post DDL in a human-readable format and not a machne-generated
                    one. This way people do not have to guess what the keys, constraints,
                    Declarative Referential Integrity, datatypes, etc. in your schema are.
                    Sample data is also a good idea, along with clear specifications.


                    *** Sent via Developersdex http://www.developersdex.com ***
                    Don't just participate in USENET...get rewarded for it!

                    Comment

                    Working...