Query to group sequential items

**David Rawheiser** · Jul 23 '05, 07:13 AM

Re: Query to group sequential items

Easy way ... cursor or loop thru as you stated.

WARNING - THE FOLLOWING IS AN UNTESTED HALF BACKED IDEA -
CONSUME AT YOUR OWN RISK

The set oriented way would require the addition of a grouping column,
initially null and populated via update statements from a temp table

use something like this to generate a set of the grouping transition rows.

-- GENERATED GROUP IDS AND GET MAX ENTRY IN GROUP
select
identity(int,1, 1) as groupid
a.product,
a.entry
into #groupings
from mytable a
join mytable b on a.product != b.product and a.entry = b.entry + 1

-- UPDATES BASE TABLE WITH FOR MAX ENTRY IN GROUP
update a
set groupid = g.groupid
from mytable a
join #grouping g on a.entry = g.entry

-- UPDATES PRIOR ENTRIES IN GROUP
update a
set a.groupid = g.groupid
from mytable a
join #grouping g on a.entry < g.entry
where a.groupid is null

-- QUERY TO RETURN RESULTS YOU ARE LOOKING FOR
select groupid , min( product ) , max( entry ) , min( entry) , sum (
quantity ) , count(*) , avg( quantity)
from mytable
group by groupid

-------------

I am not so sure about the 2nd update here, as I am tired and going to bed
soon. you may also need to join to the grouping temp table on the product
and also put a not exists() in the where clause, but you may be covered by
the simple is null to prevent muliple updates.

Let me know how you make out, and if this points you in a good direction or
throws you off track.

<cyrus.kapadia@ us.pm.com> wrote in message
news:1102562637 .046747.292110@ c13g2000cwb.goo glegroups.com.. .[color=blue]
> Let's say I have the following table:
>
> entry product quality
> 1 A 80
> 2 A 70
> 3 A 80
> 4 B 60
> 5 B 90
> 6 C 80
> 7 D 80
> 8 A 50
> 9 C 70
>
> I'm looking for a way to find the average "quality" value for a
> SEQUENTIAL GROUPING of the same Product. For exmple, I need an
> average of Entry 1+2+3 (because this is the first grouping of the same
> product type), but NOT want that average to include row 8 (which is
> also Product A, but in a different "group".)
>
> I'm sure it can be done (because I can describe it!), but I'll be a
> monkey's uncle if I can figure out how. I would imagine it would
> involve some sort of running tally that references the next record as
> it goes... to see if the product type has changed. Perhaps use of a
> temporary table?
>
> Muchas gracias!!
> Cy.
>[/color]

**John Gilson** · Jul 23 '05, 07:13 AM

Re: Query to group sequential items

<cyrus.kapadia@ us.pm.com> wrote in message
news:1102562637 .046747.292110@ c13g2000cwb.goo glegroups.com.. .[color=blue]
> Let's say I have the following table:
>
> entry product quality
> 1 A 80
> 2 A 70
> 3 A 80
> 4 B 60
> 5 B 90
> 6 C 80
> 7 D 80
> 8 A 50
> 9 C 70
>
> I'm looking for a way to find the average "quality" value for a
> SEQUENTIAL GROUPING of the same Product. For exmple, I need an
> average of Entry 1+2+3 (because this is the first grouping of the same
> product type), but NOT want that average to include row 8 (which is
> also Product A, but in a different "group".)
>
> I'm sure it can be done (because I can describe it!), but I'll be a
> monkey's uncle if I can figure out how. I would imagine it would
> involve some sort of running tally that references the next record as
> it goes... to see if the product type has changed. Perhaps use of a
> temporary table?
>
> Muchas gracias!!
> Cy.[/color]

CREATE TABLE ProductEntries
(
product_entry INT NOT NULL PRIMARY KEY,
product_code CHAR(1) NOT NULL,
product_quality INT NOT NULL
)

INSERT INTO ProductEntries (product_entry, product_code, product_quality )
VALUES (1, 'A', 80)
INSERT INTO ProductEntries (product_entry, product_code, product_quality )
VALUES (2, 'A', 70)
INSERT INTO ProductEntries (product_entry, product_code, product_quality )
VALUES (3, 'A', 80)
INSERT INTO ProductEntries (product_entry, product_code, product_quality )
VALUES (4, 'B', 60)
INSERT INTO ProductEntries (product_entry, product_code, product_quality )
VALUES (5, 'B', 90)
INSERT INTO ProductEntries (product_entry, product_code, product_quality )
VALUES (6, 'C', 80)
INSERT INTO ProductEntries (product_entry, product_code, product_quality )
VALUES (7, 'D', 80)
INSERT INTO ProductEntries (product_entry, product_code, product_quality )
VALUES (8, 'A', 50)
INSERT INTO ProductEntries (product_entry, product_code, product_quality )
VALUES (9, 'C', 70)

SELECT PR.product_code AS product_code,
PR.start_produc t_entry AS start_product_e ntry,
MAX(P.product_e ntry) AS end_product_ent ry,
AVG(CAST(P.prod uct_quality AS DECIMAL)) AS avg_product_qua lity
FROM (SELECT MIN(PE.product_ entry) AS start_product_e ntry,
PE.next_product _entry AS end_product_ent ry,
PE.product_code
FROM (SELECT P1.product_entr y, P1.product_code ,
MIN(P2.product_ entry) AS next_product_en try
FROM ProductEntries AS P1
LEFT OUTER JOIN
ProductEntries AS P2
ON P2.product_entr y > P1.product_entr y AND
P2.product_code <> P1.product_code
GROUP BY P1.product_entr y, P1.product_code ) AS PE
GROUP BY PE.product_code , PE.next_product _entry) AS PR
INNER JOIN
ProductEntries AS P
ON P.product_code = PR.product_code AND
P.product_entry >= PR.start_produc t_entry AND
(PR.end_product _entry IS NULL OR
P.product_entry < PR.end_product_ entry)
GROUP BY PR.product_code , PR.start_produc t_entry
ORDER BY start_product_e ntry

product_code start_product_e ntry end_product_ent ry avg_product_qua lity
A 1 3 76.666666
B 4 5 75.000000
C 6 6 80.000000
D 7 7 80.000000
A 8 8 50.000000
C 9 9 70.000000

--
JAG

**David Rawheiser** · Jul 23 '05, 07:13 AM

Re: Query to group sequential items

Sure, that may work as well.

"John Gilson" <jag@acm.org> wrote in message
news:5zQtd.7206 0$Vk6.20781@twi ster.nyc.rr.com ...[color=blue]
> <cyrus.kapadia@ us.pm.com> wrote in message
> news:1102562637 .046747.292110@ c13g2000cwb.goo glegroups.com.. .[color=green]
>> Let's say I have the following table:
>>
>> entry product quality
>> 1 A 80
>> 2 A 70
>> 3 A 80
>> 4 B 60
>> 5 B 90
>> 6 C 80
>> 7 D 80
>> 8 A 50
>> 9 C 70
>>
>> I'm looking for a way to find the average "quality" value for a
>> SEQUENTIAL GROUPING of the same Product. For exmple, I need an
>> average of Entry 1+2+3 (because this is the first grouping of the same
>> product type), but NOT want that average to include row 8 (which is
>> also Product A, but in a different "group".)
>>
>> I'm sure it can be done (because I can describe it!), but I'll be a
>> monkey's uncle if I can figure out how. I would imagine it would
>> involve some sort of running tally that references the next record as
>> it goes... to see if the product type has changed. Perhaps use of a
>> temporary table?
>>
>> Muchas gracias!!
>> Cy.[/color]
>
> CREATE TABLE ProductEntries
> (
> product_entry INT NOT NULL PRIMARY KEY,
> product_code CHAR(1) NOT NULL,
> product_quality INT NOT NULL
> )
>
> INSERT INTO ProductEntries (product_entry, product_code, product_quality )
> VALUES (1, 'A', 80)
> INSERT INTO ProductEntries (product_entry, product_code, product_quality )
> VALUES (2, 'A', 70)
> INSERT INTO ProductEntries (product_entry, product_code, product_quality )
> VALUES (3, 'A', 80)
> INSERT INTO ProductEntries (product_entry, product_code, product_quality )
> VALUES (4, 'B', 60)
> INSERT INTO ProductEntries (product_entry, product_code, product_quality )
> VALUES (5, 'B', 90)
> INSERT INTO ProductEntries (product_entry, product_code, product_quality )
> VALUES (6, 'C', 80)
> INSERT INTO ProductEntries (product_entry, product_code, product_quality )
> VALUES (7, 'D', 80)
> INSERT INTO ProductEntries (product_entry, product_code, product_quality )
> VALUES (8, 'A', 50)
> INSERT INTO ProductEntries (product_entry, product_code, product_quality )
> VALUES (9, 'C', 70)
>
> SELECT PR.product_code AS product_code,
> PR.start_produc t_entry AS start_product_e ntry,
> MAX(P.product_e ntry) AS end_product_ent ry,
> AVG(CAST(P.prod uct_quality AS DECIMAL)) AS
> avg_product_qua lity
> FROM (SELECT MIN(PE.product_ entry) AS start_product_e ntry,
> PE.next_product _entry AS end_product_ent ry,
> PE.product_code
> FROM (SELECT P1.product_entr y, P1.product_code ,
> MIN(P2.product_ entry) AS
> next_product_en try
> FROM ProductEntries AS P1
> LEFT OUTER JOIN
> ProductEntries AS P2
> ON P2.product_entr y >
> P1.product_entr y AND
> P2.product_code <>
> P1.product_code
> GROUP BY P1.product_entr y, P1.product_code ) AS
> PE
> GROUP BY PE.product_code , PE.next_product _entry) AS PR
> INNER JOIN
> ProductEntries AS P
> ON P.product_code = PR.product_code AND
> P.product_entry >= PR.start_produc t_entry AND
> (PR.end_product _entry IS NULL OR
> P.product_entry < PR.end_product_ entry)
> GROUP BY PR.product_code , PR.start_produc t_entry
> ORDER BY start_product_e ntry
>
> product_code start_product_e ntry end_product_ent ry avg_product_qua lity
> A 1 3 76.666666
> B 4 5 75.000000
> C 6 6 80.000000
> D 7 7 80.000000
> A 8 8 50.000000
> C 9 9 70.000000
>
> --
> JAG
>
>[/color]

**--CELKO--** · Jul 23 '05, 07:13 AM

Re: Query to group sequential items

That is too much work! Let's move the average calculation into a
scalar subquery that will be done last, after all the clusters are
found. The little-used = ALL predicate can replace a lot of your
logic. And we pull up the usual Sequence auxiliary table.

SELECT prod_code, MIN(start) AS start, finish,
(SELECT AVG(CAST(prod_q uality AS DECIMAL(8,4)))
FROM ProductEntries AS P3
WHERE P3.prod_entry
BETWEEN MIN(start)
AND X.finish) AS avg_quality
FROM (SELECT P1.prod_code, S1.seq, MAX(S2.seq) AS finish
FROM ProductEntries AS P1, Sequence AS S1, Sequence AS S2
WHERE S1.seq <= S2.seq
AND S2.seq <= (SELECT MAX(prod_entry) FROM ProductEntries)
AND P1.prod_code
= ALL (SELECT P2.prod_code
FROM ProductEntries AS P2
WHERE P2.prod_entry BETWEEN S1.seq AND S2.seq)
GROUP BY P1.prod_code, S1.seq)
AS X (prod_code, start, finish)
GROUP BY prod_code, finish;

Another version requires two sentinal values
--
INSERT INTO ProductEntries VALUES (0, '?', 0);
INSERT INTO ProductEntries VALUES (10, '?', 0);

SELECT DISTINCT P1.prod_code, S1.seq AS start, S2.seq AS finish,
(SELECT AVG(CAST(prod_q uality AS DECIMAL(8,4)))
FROM ProductEntries AS P3
WHERE P3.prod_entry
BETWEEN S1.seq AND S2.seq) AS avg_quality
FROM ProductEntries AS P1,
(SELECT seq FROM Sequence
UNION ALL SELECT 0) AS S1, Sequence AS S2
WHERE S1.seq <= S2.seq
AND S2.seq <= (SELECT MAX(prod_entry) + 1 FROM ProductEntries)
AND P1.prod_code
<> (SELECT P3.prod_code
FROM ProductEntries AS P3
WHERE P3.prod_entry = S1.seq - 1)
AND P1.prod_code
<> (SELECT P4.prod_code
FROM ProductEntries AS P4
WHERE P4.prod_entry = S2.seq + 1)
AND P1.prod_code
= ALL (SELECT P2.prod_code
FROM ProductEntries AS P2
WHERE P2.prod_entry BETWEEN S1.seq AND S2.seq);

**John Gilson** · Jul 23 '05, 07:13 AM

Re: Query to group sequential items

"--CELKO--" <jcelko212@eart hlink.net> wrote in message
news:1102618417 .146169.127100@ c13g2000cwb.goo glegroups.com.. .[color=blue]
> That is too much work! Let's move the average calculation into a
> scalar subquery that will be done last, after all the clusters are
> found. The little-used = ALL predicate can replace a lot of your
> logic. And we pull up the usual Sequence auxiliary table.
>
> SELECT prod_code, MIN(start) AS start, finish,
> (SELECT AVG(CAST(prod_q uality AS DECIMAL(8,4)))
> FROM ProductEntries AS P3
> WHERE P3.prod_entry
> BETWEEN MIN(start)
> AND X.finish) AS avg_quality
> FROM (SELECT P1.prod_code, S1.seq, MAX(S2.seq) AS finish
> FROM ProductEntries AS P1, Sequence AS S1, Sequence AS S2
> WHERE S1.seq <= S2.seq
> AND S2.seq <= (SELECT MAX(prod_entry) FROM ProductEntries)
> AND P1.prod_code
> = ALL (SELECT P2.prod_code
> FROM ProductEntries AS P2
> WHERE P2.prod_entry BETWEEN S1.seq AND S2.seq)
> GROUP BY P1.prod_code, S1.seq)
> AS X (prod_code, start, finish)
> GROUP BY prod_code, finish;
>
> Another version requires two sentinal values
> --
> INSERT INTO ProductEntries VALUES (0, '?', 0);
> INSERT INTO ProductEntries VALUES (10, '?', 0);
>
> SELECT DISTINCT P1.prod_code, S1.seq AS start, S2.seq AS finish,
> (SELECT AVG(CAST(prod_q uality AS DECIMAL(8,4)))
> FROM ProductEntries AS P3
> WHERE P3.prod_entry
> BETWEEN S1.seq AND S2.seq) AS avg_quality
> FROM ProductEntries AS P1,
> (SELECT seq FROM Sequence
> UNION ALL SELECT 0) AS S1, Sequence AS S2
> WHERE S1.seq <= S2.seq
> AND S2.seq <= (SELECT MAX(prod_entry) + 1 FROM ProductEntries)
> AND P1.prod_code
> <> (SELECT P3.prod_code
> FROM ProductEntries AS P3
> WHERE P3.prod_entry = S1.seq - 1)
> AND P1.prod_code
> <> (SELECT P4.prod_code
> FROM ProductEntries AS P4
> WHERE P4.prod_entry = S2.seq + 1)
> AND P1.prod_code
> = ALL (SELECT P2.prod_code
> FROM ProductEntries AS P2
> WHERE P2.prod_entry BETWEEN S1.seq AND S2.seq);[/color]

Less work? Debatable. Also, this won't work if the product_entry values
aren't consecutive.

--
JAG

**--CELKO--** · Jul 23 '05, 07:13 AM

Re: Query to group sequential items

>> Less work? Debatable. <<

Fewer nesting levels should be a bit faster. But trying to find the
start and finish points is going to get really bad as the number of row
increases.
[color=blue][color=green]
>> Also, this won't work if the product_entry values[/color][/color]
aren't consecutive. <<

It depends on the sequence of tests having no gaps.

This is one that might be better done with a cursor and a WHILE loop
that accumulates a count and total of each quality test to a working
table.

--CELKO--
Please post DDL in a human-readable format and not a machne-generated
one. This way people do not have to guess what the keys, constraints,
Declarative Referential Integrity, datatypes, etc. in your schema are.
Sample data is also a good idea, along with clear specifications.

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!

**John Gilson** · Jul 23 '05, 07:13 AM

Re: Query to group sequential items

"--CELKO--" <remove.jcelko2 12@earthlink.ne t> wrote in message
news:1102630139 .1f7348d9b8d0e1 527f37d16587e3c ecc@teranews...[color=blue][color=green][color=darkred]
> >> Less work? Debatable. <<[/color][/color]
>
> Fewer nesting levels should be a bit faster. But trying to find the
> start and finish points is going to get really bad as the number of row
> increases.
>[color=green][color=darkred]
> >> Also, this won't work if the product_entry values[/color][/color]
> aren't consecutive. <<
>
> It depends on the sequence of tests having no gaps.
>
> This is one that might be better done with a cursor and a WHILE loop
> that accumulates a count and total of each quality test to a working
> table.[/color]

You could be right but bite your tongue!

--
JAG
[color=blue]
> --CELKO--
> Please post DDL in a human-readable format and not a machne-generated
> one. This way people do not have to guess what the keys, constraints,
> Declarative Referential Integrity, datatypes, etc. in your schema are.
> Sample data is also a good idea, along with clear specifications.
>
>
> *** Sent via Developersdex http://www.developersdex.com ***
> Don't just participate in USENET...get rewarded for it![/color]

**David Rawheiser** · Jul 23 '05, 07:13 AM

Re: Query to group sequential items

Actually if this was an ongoing query: performance-wise it would be best to
have a trigger or "phase shift" grouping id set as part of the insert
operation.

"John Gilson" <jag@acm.org> wrote in message
news:bo4ud.7447 7$Vk6.62153@twi ster.nyc.rr.com ...[color=blue]
> "--CELKO--" <remove.jcelko2 12@earthlink.ne t> wrote in message
> news:1102630139 .1f7348d9b8d0e1 527f37d16587e3c ecc@teranews...[color=green][color=darkred]
>> >> Less work? Debatable. <<[/color]
>>
>> Fewer nesting levels should be a bit faster. But trying to find the
>> start and finish points is going to get really bad as the number of row
>> increases.
>>[color=darkred]
>> >> Also, this won't work if the product_entry values[/color]
>> aren't consecutive. <<
>>
>> It depends on the sequence of tests having no gaps.
>>
>> This is one that might be better done with a cursor and a WHILE loop
>> that accumulates a count and total of each quality test to a working
>> table.[/color]
>
> You could be right but bite your tongue!
>
> --
> JAG
>[color=green]
>> --CELKO--
>> Please post DDL in a human-readable format and not a machne-generated
>> one. This way people do not have to guess what the keys, constraints,
>> Declarative Referential Integrity, datatypes, etc. in your schema are.
>> Sample data is also a good idea, along with clear specifications.
>>
>>
>> *** Sent via Developersdex http://www.developersdex.com ***
>> Don't just participate in USENET...get rewarded for it![/color]
>
>[/color]

**--CELKO--** · Jul 23 '05, 07:14 AM

Re: Query to group sequential items

>> Actually if this was an ongoing query: performance-wise it would be
best to have a trigger or "phase shift" grouping id set as part of the
insert operation. <<

My impulse is for a "cluster group number" column as each test is done.
Look to see if the current quality test is on the same product as the
most recent one, etc.

--CELKO--
Please post DDL in a human-readable format and not a machne-generated
one. This way people do not have to guess what the keys, constraints,
Declarative Referential Integrity, datatypes, etc. in your schema are.
Sample data is also a good idea, along with clear specifications.

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!

Query to group sequential items

Query to group sequential items

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment