Update - can I do better statement?

  • rcamarda

    Update - can I do better statement?

    I have created a table with buckets that hold enrollment activities for
    each of our admissions officers for each day of an enrollment session.
    I have an UPDATE that builds rolling totals and updates that table,
    counting unique students from a table that contains registration
    information.
    Everything works, but it runs pretty slowly, and I am wondering if
    there is a better approach to writing the SQL. The Database Engine
    Tuning Advisor suggests 0% improvement.
    I'd appreciate any suggestions on better ways to accomplish my goal.
    Or compliments on my genius if there isn't a better way! *grin*

    UPDATE F_BI_ADMISSIONS_OFFICER_SUMMARY_BY_SESSION_RT_CURRENT -- (RT =
    rolling total)
    SET NEW_REGISTRATION_COUNT =
    (SELECT COUNT(DISTINCT STUDENT_ID)
    FROM
    (SELECT AA.SESSION_ID, AA.STUDENT_ID, STUDENT_STATUS
    FROM F_BI_REGISTRATION_TRACKING AA
    LEFT OUTER JOIN F_BI_STUDENT_SUMMARY_SESSION BB
    ON AA.STUDENT_ID = BB.STUDENT_ID AND AA.SESSION_ID = BB.SESSION_ID
    LEFT OUTER JOIN D_BI_STUDENT CC
    ON AA.STUDENT_ID = CC.STUDENT_ID AND CC.CURR_IND = 'Y'
    WHERE AA.SESSION_ID = OA.SESSION_ID
    AND ACTIVITY_DT <= OA.SESSION_DT
    AND CC.TALISMA_AO_ID = OA.ADMREP_ID
    AND STUDENT_STATUS = 'NEW'
    GROUP BY AA.SESSION_ID, AA.STUDENT_ID, STUDENT_STATUS
    HAVING SUM(ACTIVITY_COUNT) > 0
    ) AS PPP
    )
    FROM F_BI_ADMISSIONS_OFFICER_SUMMARY_BY_SESSION_RT_CURRENT OA

    With 50K rows, this takes about 5 minutes. However, I have almost
    9,000,000 rows of history (15 years of 4 quarterly sessions). If this
    build is linear, it would take 1.25 hours.
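    One set-based alternative is to pull the correlated subquery out into a
    single aggregation pass. This is a sketch only - untested, and it makes
    two assumptions: that ACTIVITY_COUNT is never negative, so a student
    "counts" from the date of their first qualifying activity onward, and
    that the HAVING filter can be applied over the whole session rather
    than day by day. Verify both against your definition before relying on
    it:

    ```sql
    -- Step 1: one pass over the detail to find, per officer/session/student,
    -- the date of the first activity (NEW students with a positive total).
    SELECT CC.TALISMA_AO_ID AS ADMREP_ID,
           AA.SESSION_ID,
           AA.STUDENT_ID,
           MIN(AA.ACTIVITY_DT) AS FIRST_ACTIVITY_DT
    INTO #FIRST_ACTIVITY
    FROM F_BI_REGISTRATION_TRACKING AA
    JOIN F_BI_STUDENT_SUMMARY_SESSION BB
      ON AA.STUDENT_ID = BB.STUDENT_ID AND AA.SESSION_ID = BB.SESSION_ID
    JOIN D_BI_STUDENT CC
      ON AA.STUDENT_ID = CC.STUDENT_ID AND CC.CURR_IND = 'Y'
    WHERE BB.STUDENT_STATUS = 'NEW'
    GROUP BY CC.TALISMA_AO_ID, AA.SESSION_ID, AA.STUDENT_ID
    HAVING SUM(AA.ACTIVITY_COUNT) > 0;

    -- Step 2: the rolling total is then just a count of students whose
    -- first activity falls on or before each summary row's session date.
    UPDATE OA
    SET NEW_REGISTRATION_COUNT =
        (SELECT COUNT(*)
         FROM #FIRST_ACTIVITY F
         WHERE F.ADMREP_ID  = OA.ADMREP_ID
           AND F.SESSION_ID = OA.SESSION_ID
           AND F.FIRST_ACTIVITY_DT <= OA.SESSION_DT)
    FROM F_BI_ADMISSIONS_OFFICER_SUMMARY_BY_SESSION_RT_CURRENT AS OA;
    ```

    The point of the split is that the expensive three-table join and
    grouping run once over the detail table, instead of once per summary
    row.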

    My relationships and dependencies are housed in my ETL tool.
    // Connection: Target

    CREATE TABLE
    "dbo"."F_BI_Admissions_Officer_Summary_By_Session_RT_Current"
    (
    "AO_SUMMARY_KEY" VARCHAR(43) NULL,
    "ADMREP_ID" VARCHAR(15) NULL,
    "ADMREP_SKEY" BIGINT NULL,
    "ADMREP_CAPTION" VARCHAR(20) NULL,
    "ADMREP_CAMPUS_ID" VARCHAR(10) NULL,
    "ADMREP_CAMPUS_CAPTION" VARCHAR(30) NULL,
    "ADMREP_REGION_CAPTION" VARCHAR(20) NULL,
    "ADMREP_TEAM_CAPTION" VARCHAR(57) NULL,
    "ADMREP_EXPIRED_FLAG" VARCHAR(5) NULL,
    "SESSION_ID" CHAR(6) NULL,
    "SESSION_DAY" INTEGER NULL,
    "SESSION_DT" DATETIME NULL,
    "LEAD_COUNT" INTEGER NULL,
    "INTERVIEW_COUNT" INTEGER NULL,
    "APPLICATION_COUNT" INTEGER NULL,
    "LY_SESSION_ID" VARCHAR(10) NULL,
    "LY_LEAD_COUNT" SMALLINT NULL,
    "LY_INTERVIEW_COUNT" SMALLINT NULL,
    "LY_APPLICATION_COUNT" SMALLINT NULL,
    "LY_ALL_REGISTRATION_COUNT" SMALLINT NULL,
    "NEW_READMIT_REGISTRATION_COUNT" SMALLINT NULL,
    "ALL_REGISTRATION_COUNT" INTEGER NOT NULL,
    "NEW_REGISTRATION_COUNT" INTEGER NOT NULL,
    "READMIT_REGISTRATION_COUNT" INTEGER NOT NULL,
    "CONTINUING_REGISTRATION_COUNT" INTEGER NOT NULL,
    "CANCELLED_REGISTRATION_COUNT" INTEGER NOT NULL,
    "LY_NEW_REGISTRATION_COUNT" SMALLINT NULL,
    "LY_READMIT_REGISTRATION_COUNT" SMALLINT NULL,
    "LY_CONTINUING_REGISTRATION_COUNT" SMALLINT NULL,
    "LY_CANCELLED_REGISTRATION_COUNT" SMALLINT NULL,
    "TALISMA_CALLS_IN" SMALLINT NULL,
    "TALISMA_CALLS_OUT" SMALLINT NULL,
    "AVAYA_CALLS_IN" SMALLINT NULL,
    "AVAYA_CALLS_OUT" SMALLINT NULL,
    "LEAD_FORECAST" SMALLINT NULL,
    "INTERVIEW_FORECAST" SMALLINT NULL,
    "APPLICATION_FORECAST" SMALLINT NULL,
    "NEW_REGISTRATION_FORECAST" SMALLINT NULL,
    "READMIT_REGISTRATION_FORECAST" SMALLINT NULL,
    "CONTINUING_REGISTRATION_FORECAST" SMALLINT NULL
    )
    ;

    // Connection: Target

    CREATE TABLE "dbo"."F_BI_Registration_Tracking"
    (
    "UNIQUE_KEY" VARCHAR(87) NULL,
    "REGISTRATION_KEY" VARCHAR(40) NULL,
    "REGTRACK_ID" VARCHAR(47) NULL,
    "CLASSES_OFFERED_ID" VARCHAR(23) NULL,
    "STUDENT_ID" CHAR(20) NULL,
    "SESSION_ID" CHAR(6) NULL,
    "FULL_CLASS_ID" VARCHAR(15) NULL,
    "CAMPUS_ID" VARCHAR(10) NULL,
    "ACTIVITY_DT" DATETIME NULL,
    "ACTIVITY_CODE" VARCHAR(1) NULL,
    "LOAD_DT" DATETIME NULL,
    "SOURCE" VARCHAR(4) NULL,
    "ACTIVITY_COUNT" SMALLINT NULL,
    "CLASS_DROP_DT" DATETIME NULL,
    "CLASS_DROP_COUNT" SMALLINT NULL,
    "CLASS_ADD_DT" DATETIME NULL,
    "CLASS_ADD_COUNT" SMALLINT NULL,
    "BEFORE_D0_CLASS_COUNT" SMALLINT NULL,
    "DAY0_CLASS_COUNT" SMALLINT NULL,
    "AFTER_D0_CLASS_COUNT" SMALLINT NULL,
    "ALL_CLASS_COUNT" SMALLINT NULL,
    "BEFORE_D0_ONLINE_CLASS_COUNT" SMALLINT NULL,
    "DAY0_ONLINE_CLASS_COUNT" SMALLINT NULL,
    "AFTER_D0_ONLINE_CLASS_COUNT" SMALLINT NULL,
    "ALL_ONLINE_CLASS_COUNT" SMALLINT NULL,
    "VM_POS" BIGINT NULL,
    "SOURCE_FILE" SMALLINT NULL,
    "BANDED_ID" BIGINT NULL,
    "CLASSES_OFFERED_SKEY" BIGINT NULL,
    "STUDENT_SKEY" BIGINT NULL,
    "SESSION_SKEY" BIGINT NULL,
    "CLASS_CAMPUS_SKEY" BIGINT NULL,
    "COMPUTED_DT" DATETIME NULL,
    "SESSION_DAY" BIGINT NULL,
    "Count_Down" BIGINT NULL,
    "AFTER_DAY0_FIRST_REG_FLAG" SMALLINT NULL
    )
    ;

  • --CELKO--

    #2
    Re: Update - can I do better statement?

    This is a mess. I would start over and get it right.

    1) Why do you use all uppercase letters? It is a screaming bitch to
    read, and we did this because we had no choice in the 1950s with
    punch cards.
    2) Why do you have no keys in any table? Most columns are NULL-able, a
    sure sign that you have a rotten data model (if you have one at all).
    In fact you have more NULLs than the entire database for a major
    automobile company.
    3) Why are you storing computable totals in a table? That is how we
    did it with file systems; in SQL, we use a VIEW. Or we use a front-end
    report application.
    4) Why did you pick the unreadable alphabetic aliases? That is, why
    would CC remind the reader that the base table is D_BI_STUDENT?
    5) Why did you use a "_key" suffix? That tells us HOW an attribute
    is used, not what the data element IS. Also, think about how stupid
    "unique_key" is - are there any non-unique keys???
    6) Is "curr_ind" actually a flag? We do not write with assembly-level
    bit flags in SQL. It is redundant in a properly designed schema;
    the status is computed from the current state of the basic data.
    7) Why did you use so many proprietary data types?
    8) Why did you mix summary data and base data in the same table? Do
    you know what 1NF and the definition of a table are?
    9) Having both "session_dt" and "session_day" is a bit redundant.
    10) Why are you keeping captions in a table with the data? A caption
    is part of a display, and display is done in the front end, never in
    the database.
    11) Why are so many of your codes VARCHAR(n)? Most codes are fixed
    length.
    12) It looks like you need to split up your tables and normalize the
    schema. Think about the basic entities with which you are working -
    students, candidates, admissions personnel, etc. - and then, once
    those are in tables, think about their relationships.


    • rcamarda

      #3
      Re: Update - can I do better statement?

      Celko, sometimes I cringe when I see your replies.
      1. I use an ETL tool; it creates the DDL, and I just copied and
      pasted. My extreme apologies.
      2. I use an ETL tool; it handles the relationships and data
      validation. I haven't put those in the tables and have relied on
      validation within the tool.
      3. This is a data warehouse, not an operational system. I build the
      data for speedy reporting, not data entry.
      4. Valid point; I need better hygiene with aliases.
      5. This might be a weakness in my SQL skills. Example: the business
      keys for the table I'm trying to update are ADMREP_ID, SESSION_ID and
      SESSION_DAY. I concatenate them into a single field so I have a
      unique key. I guess this would be my primary key.
      6. Yes, it is a Y/N field. Tables that are slowly changing dimensions
      have start_date, end_date, update_date and curr_ind. Y is the current
      record as it is in the operational system.
      7. This is an area where I would like to have more control within the
      ETL tool. It looks at the data and determines what type to create. Of
      course I could create the tables ahead of time and not let the tool
      create them for me.
      8. The table I am trying to update is the summary table. Not sure
      what you mean.
      9. Yes, if this were an operational database. This is a data
      warehouse for reporting; I want to keep the on-the-fly computations
      to a minimum.
      Also, the session_day is computed by taking the current date minus
      the last day to add/drop. This allows us to compare a given day
      session by session: "How does day 0 for session 200703 compare to
      day 0 for session 200603?" Day 0 (the last day to add or drop
      classes) varies year to year and session to session.
      10. This data is for a data warehouse. I have 3 databases: SOURCE,
      STAGING and TARGET. Staging is as you describe: my admissions officer
      dimension has the keys and captions, while the student table has the
      admissions officer key but no other information about the admissions
      officer. When I build the final data in Target, I have a lot of
      redundancy and pull captions into the final tables.
      11. Good point; I've meant to convert those to CHAR.
      12. Data warehouse - final data is de-normalized. (The vendor of our
      lead management system says the same thing, and I remind them that I
      am not building data for data entry but for reporting. This is also
      why I can smoke their home-grown reporting with my data and Cognos
      tools.)


      Thanks for the constructive suggestions, Celko. I've learned a lot on
      how I might change the UPDATE to run faster... no, wait a minute, you
      just finished berating me and came just short of calling me stupid...
      Never mind.




      • Ed Murphy

        #4
        Re: Update - can I do better statement?

        Do you have indexes on all appropriate columns?

        Which table contains STUDENT_STATUS?

        Does the join to D_BI_STUDENT need to be left outer? What about the
        join to F_BI_STUDENT_SUMMARY_SESSION? If so, can you eliminate the
        need by adding dummy rows (with some sort of this-is-a-dummy-row
        indicator) to those tables?

        Can you replace this form (boiled down from the original):

        SELECT COUNT(DISTINCT STUDENT_ID) FROM (
        SELECT SESSION_ID, STUDENT_ID, STUDENT_STATUS
        FROM ...
        GROUP BY SESSION_ID, STUDENT_ID, STUDENT_STATUS
        HAVING SUM(ACTIVITY_COUNT) > 0
        )

        with this?

        SELECT COUNT(*) FROM (
        SELECT STUDENT_ID
        FROM ...
        GROUP BY STUDENT_ID
        HAVING SUM(ACTIVITY_COUNT) > 0
        )


        • rcamarda

          #5
          Re: Update - can I do better statement?

          I have indexes built on all fields I use in WHERE clauses or
          joins. Engine tuning could not recommend improvements.
          STUDENT_STATUS lives in the F_BI_STUDENT_SUMMARY_SESSION table;
          ADMREP_ID lives in D_BI_STUDENT.
          You're right about the left outer joins; inner joins would work. I
          should only have registrations for students that exist in the
          student dimension.
          I'll do some testing using the inner join instead.

          Your suggestion for the new SQL: is your point that since I am
          using a WHERE clause, it is redundant to have the additional
          GROUP BYs?
          Thanks for your input.
          Rob



          • Ed Murphy

            #6
            Re: Update - can I do better statement?

            rcamarda wrote:
            Your suggestion for the new SQL: is your point that since I am
            using a WHERE clause, it is redundant to have the additional
            GROUP BYs?
            That, and also
            GROUP BY STUDENT_ID
            HAVING SUM(ACTIVITY_COUNT) > 0
            will pick up the same students as
            GROUP BY SESSION_ID, STUDENT_ID, STUDENT_STATUS
            HAVING SUM(ACTIVITY_COUNT) > 0
            (assuming that ACTIVITY_COUNT is never negative).


            • Erland Sommarskog

              #7
              Re: Update - can I do better statement?

              Query tuning is rarely a trivial game, and when you don't have
              the complete picture it is even more difficult. You posted two
              of the three tables, but you did not include any indexes.

              And it would certainly have helped if you had posted a working
              query. When I add:

              CREATE TABLE D_BI_STUDENT(
              STUDENT_ID char(20) NULL,
              CURR_IND char(1) NULL,
              TALISMA_AO_ID varchar(15) NULL,
              STUDENT_STATUS varchar(19) NULL
              )
              go

              and fix your inconsistent usage of upper- and lowercase, the
              query dies with:

              Msg 209, Level 16, State 1, Line 12
              Ambiguous column name 'ACTIVITY_DT'.
              Msg 209, Level 16, State 1, Line 16
              Ambiguous column name 'ACTIVITY_COUNT'.

              So all I know is that you have a query that is slow, and two
              tables without their indexes, but I don't even know the query.

              I can make two reflections on the query as posted:

              1) The self-join to F_BI_Registration_Tracking with alias BB
              serves no purpose at all. If it appears in the real query,
              there may be some performance to win by removing it.

              2) The outer join to D_BI_STUDENT is in practice an inner
              join, since you have the condition STUDENT_STATUS = 'NEW',
              which appears to relate to this table. I don't know what the
              intention is, but you should probably either move this
              condition to the ON clause, or make the outer join an inner
              join. Not that I think it will boost performance much, but it
              makes the query clearer.

              As for performance, rolling totals are a difficult thing,
              since SQL Server has no query construct to support this well.
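              (Editorial note on that last point: this was true at the time;
              SQL Server 2012 later added ordered window aggregates, which
              express a running total directly. A sketch, where DAILY_COUNTS
              is a hypothetical pre-aggregated table of one row per
              officer/session/day - an illustration only:)

              ```sql
              -- Hypothetical: DAILY_COUNTS(ADMREP_ID, SESSION_ID,
              -- SESSION_DT, DAY_COUNT). The ordered window SUM turns the
              -- per-day counts into a running total in one pass, with no
              -- correlated subquery.
              SELECT ADMREP_ID,
                     SESSION_ID,
                     SESSION_DT,
                     SUM(DAY_COUNT)
                       OVER (PARTITION BY ADMREP_ID, SESSION_ID
                             ORDER BY SESSION_DT
                             ROWS UNBOUNDED PRECEDING) AS RUNNING_COUNT
              FROM DAILY_COUNTS;
              ```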

              For further assistance, I suggest that you post the actual
              query you have now, and that you include the schema for all
              tables, as well as all indexes you have now. It also helps if
              you use aliases for all columns, and if you apply
              upper/lowercase consistently. (I run with a case-sensitive
              collation, and in general I recommend this for development.)

              And, of course, please state which version of SQL Server you
              are using.

              --
              Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

              Books Online for SQL Server 2005 at

              Books Online for SQL Server 2000 at


              • --CELKO--

                #8
                Re: Update - can I do better statement?

                >> 1. I use an ETL tool; it creates the DDL, and I just
                copied and pasted. <<

                There are some open-source "pretty printers" for SQL, which
                would be good to have around the shop. So many of the
                packages generate code for a compiler to use and not for a
                human being to read.

                >> 3. This is a data warehouse, not an operational
                system. <<

                Can you get a DW package, which would handle this stuff much
                better than an SQL aimed at OLTP? Or will the boss scream
                about the budget? DW is a really different game, and that
                explains some of your code decisions.

                >> 5. The business keys for the table I'm trying to update
                are ADMREP_ID, SESSION_ID and SESSION_DAY. I concatenate
                them into a single field [sic] so I have a unique key. <<

                Here we are back to the conceptual problems. Columns are not
                fields, nothing like them at all. The concatenation is a bad
                idea and you should use a three-column key instead.
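                A sketch of that three-column key against the posted DDL
                (untested; the key columns must first be made NOT NULL, and
                the constraint name here is made up):

                ```sql
                -- A PRIMARY KEY requires NOT NULL columns, so tighten the
                -- three business-key columns first.
                ALTER TABLE dbo.F_BI_Admissions_Officer_Summary_By_Session_RT_Current
                  ALTER COLUMN ADMREP_ID VARCHAR(15) NOT NULL;
                ALTER TABLE dbo.F_BI_Admissions_Officer_Summary_By_Session_RT_Current
                  ALTER COLUMN SESSION_ID CHAR(6) NOT NULL;
                ALTER TABLE dbo.F_BI_Admissions_Officer_Summary_By_Session_RT_Current
                  ALTER COLUMN SESSION_DAY INTEGER NOT NULL;

                -- Then declare the composite key instead of concatenating
                -- the three values into one column.
                ALTER TABLE dbo.F_BI_Admissions_Officer_Summary_By_Session_RT_Current
                  ADD CONSTRAINT PK_AO_Summary
                  PRIMARY KEY (ADMREP_ID, SESSION_ID, SESSION_DAY);
                ```

                As a side benefit, the clustered index this creates is
                exactly the access path the UPDATE's correlated lookup
                needs.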
                >> 6. Yes, it is a Y/N field. Tables that are slowly
                changing dimensions have start_date, end_date, update_date
                and curr_ind. Y is the current record [sic] as it is in the
                operational system. <<

                You can get that fact from the dates, can't you? There
                should be no need for flags in a DW. Also, that implies that
                the DW is getting live feeds from operations. Is that true?

                >> 7. This is an area where I would like to have more
                control within the ETL tool. It looks at the data and
                determines what type to create. Of course I could create the
                tables ahead of time and not let the tool create them for
                me. <<

                That sounds like the answer. The cost of casting, the chance
                for errors, etc. is just too much. Can the ETL tool load the
                tables without having to immediately re-index them as it
                goes?

                >> Thanks for the constructive suggestions, Celko. I've
                learned a lot on how I might change the UPDATE to run
                faster... no, wait a minute, you just finished berating me
                and came just short of calling me stupid... Never mind. <<

                LOL!!


                • othellomy@yahoo.com

                  #9
                  Re: Update - can I do better statement?

                  with 50K rows, this takes about 5 minutes. However I have
                  almost 9,000,000 rows of history. (15 years of 4 quarter
                  sessions). If this build is linear, it would take 1.25
                  hours.
                  Your UPDATE clause is way too complex, and I think you
                  would be very lucky to get an answer in 1.5 hours for 9
                  million records (even with all the indexes in the world).
                  My best estimate is that it will take more than 24 hours,
                  which is unrealistic.


                  • othellomy@yahoo.com

                    #10
                    Re: Update - can I do better statement?

                    with 50K rows, this takes about 5 minutes. However I
                    have almost 9,000,000 rows of history. (15 years of 4
                    quarter sessions). If this build is linear, it would
                    take 1.25 hours.
                    Maybe you should do a linear build if you can; then you
                    can add up the counts into a variable. It's not a nice
                    solution, but it just might work.


                    • Tony Rogerson

                      #11
                      Re: Update - can I do better statement?

                      Can you get a DW package, which would handle this
                      stuff much better than an SQL aimed at OLTP? Or will
                      the boss scream about the budget? DW is a really
                      different game, and that explains some of your code
                      decisions.
                      As usual you have not done your research on this.
                      Simply by installing Microsoft SQL Server you will
                      quickly realise (it is part of the set-up) that "SQL
                      Server" is actually a name given to the group of data
                      engine offerings from Microsoft: the relational
                      engine, the dimensional/aggregation/data mining engine
                      (Analysis Services), ETL (Integration Services), and
                      messaging (Service Broker).

                      It is a fundamental principle of a good consultant to
                      do their research; for most of us it comes naturally,
                      for you it obviously doesn't.

                      --
                      Tony Rogerson
                      SQL Server MVP
                      http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL
                      Server Consultant
                      http://sqlserverfaq.com - free video tutorials


