SQL Server for data processing

**Hugo Kornelis** · Jul 23 '05, 07:59 AM

Re: SQL Server for data processing

On Mon, 07 Feb 2005 04:54:43 GMT, William Cleveland wrote:

(snip)[color=blue]
>In several cases, I'm summarizing one table into another by several
>fields, and then updating the original table with an ID for the
>summary row each source row was summarized into (e.g., I summarize
>PlaceAndProduc tSummary into PlaceSummary, and then populate
>PlaceSummary ID in PlaceAndProduct Summary). The update of the
>source table is much faster if the summary table has a clustered
>index on the summarized fields, but all later access will be faster
>if the clustered index is on the identity column. I've been
>including an ORDER BY the summarized fields in the original insert,
>so the identity column is in the same order as the summarized fields,
>but I don't know of any way to take advantage of that in the
>indexing declarations.[/color]

Hi William,

First, you should know that including ORDER BY in an INSERT .. SELECT
statement in meaningless. There is no way to guarantee that the identity
values get handed out in the order you specify. If you really must have
them in a specific order, the only safe way is to use a cursor - but since
IDENTITY is meant to be a **meaningless** unique code, this is a quite
questionable approach. Especially since there are more caveats to using an
identity for this goal. If the generated value has to have some logical
relation to the data, you'll have to calculate it, either on insert or on
retrieval.

I don't really understand the rest of this paragraph, though. It appears
that you are merely looking for maximum speed. If that's the case, there
might be other options to consider. Maybe, the update of the source table
will be equally fast (or even faster) if you use a covering nonclustered
index?

If you could show some more specifics about what exactly you are trying to
do, I might be able to give more specific comments.

[color=blue]
>As another approach to the above situation, if I change the
>clustered index on a table, and the rows happen to be in the
>same order by both indexes, will the table still get rebuilt?[/color]

My guess is that it is - it should! First because SQL Server doesn't know
that the rows are in order, so it will have to be checked anyway. Second
because a clustered index is more than just the table - the table is in
the leaf pages of the index; the root and the intermediate pages have to
be built as well.

To verify, I ran this code with the Show Execution Plan option turned on
(Ctrl-K in Query Analyzer). Both with or without the added nonclustered
index, the rebuilding of the clustered index had exactly the same plan as
the building of the first clustered index.

create table testit (a int not null, b int not null)
go
create unique clustered index i1 on testit(a)
-- create unique nonclustered index i2 on testit(b)
go
insert testit values (1, 1)
insert testit values (2, 2)
insert testit values (3, 3)
go
create unique clustered index i1 on testit(b) with drop_existing
go
drop table testit
go

[color=blue]
>I will never do a roll-back in the process; if an action fails, I
>want to raise an error and halt (and I haven't lost any data).
>Is there any way to completely turn off logging?[/color]

No, there is no way to turn off logging. It's not only used for rollback,
it's also used for trigger execution, for recovery after a system crash
and for backup and restore - and then I'm probably forgetting some.

Besides, *you* might never do a rollback, but for some errors, SQL Server
will issue a rollback.

[color=blue]
>Will I gain anything by marking the database as single-user?[/color]

Yes: you'll save the overhead of taking and releasing locks. The price you
pay is that you can't run multiple connections at once. Without the
single-user setting, you could still inspect some data in table A while a
long running query is updating table B. This is impossible in single-user
mode (in fact, it should have been called single-connection mode).

[color=blue]
>Any indexes that I am not using while I populate the tables, I'm
>adding at the end with FillFactor 100, to keep any slack out.
>Is there a way to remove all the slack from everything else, at
>the end of the process? During a backup operation would be fine.[/color]

The only way I know is to rebuild all indexes with fillfactor 100.

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)

**Erland Sommarskog** · Jul 23 '05, 07:59 AM

Re: SQL Server for data processing

Hugo Kornelis (hugo@pe_NO_rFa ct.in_SPAM_fo) writes:[color=blue]
> First, you should know that including ORDER BY in an INSERT .. SELECT
> statement in meaningless. There is no way to guarantee that the identity
> values get handed out in the order you specify.[/color]

That is definitely not the same as saying that ORDER BY is meaningless.

If you say:

INSERT tbl (...)
SELECT ....
ORDER BY
OPTION (MAXDOP 1)

there is no guarantee that your IDENTITY values match - but your odds are
pretty good.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

Books Online for SQL Server SP3 at

SQL Server 2025 | Microsoft Evaluation Center

http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp

Accelerate your AI application's time to market by harnessing the power of your own data and the built-in AI capabilities of SQL Server 2025, the enterprise database with best-in-class security, performance and availability.

SQL Server for data processing

SQL Server for data processing

Comment

Comment