Particularly Challenging SQL Problem.

**SSK** · Jul 23 '05, 07:55 AM

Re: Particularly Challenging SQL Problem.

Hi

The following solution will work if the inserted records (with
same inFK value ) are in sequential order only.

update test set inSeq = (inid - (select min(inid) from test b where
a.infk = b.infk ) + 1) from test a

Thanks
Ssk

If the Records are
LineVoltageHalo gen wrote:[color=blue]
> Table DDL:
>
> create table test
> (
> inId int primary key identity,
> inFK int not null,
> inSeq int not null,
> dtDate datetime
> )
>
> Data Insert:
> insert into test
> select 1,1, getdate()
>
> WAITFOR DELAY '000:00:01'
>
> insert into test
> select 1,1, getdate()
>
> WAITFOR DELAY '000:00:01'
>
> insert into test
> select 1,1, getdate()
>
> WAITFOR DELAY '000:00:01'
>
> insert into test
> select 2,1, getdate()
>
> WAITFOR DELAY '000:00:01'
>
> insert into test
> select 2,1, getdate()
>
> WAITFOR DELAY '000:00:01'
>
> insert into test
> select 3,1, getdate()
>
> WAITFOR DELAY '000:00:01'
>
> insert into test
> select 4,1, getdate()
>
>
> If we select on this table:
> inId inFK inSeq dtDate
>
> ----------- ----------- -----------
> ------------------------------------------------------
> 1 1 1 2005-02-01 12:54:40.967
> 2 1 1 2005-02-01 12:54:41.967
> 3 1 1 2005-02-01 12:54:42.967
> 4 2 1 2005-02-01 12:54:43.967
> 5 2 1 2005-02-01 12:54:44.967
> 6 3 1 2005-02-01 12:54:45.983
> 7 4 1 2005-02-01 12:54:47.077
>
> (7 row(s) affected)
>
>
> Problem:
> What I would like to do (using SQL and not a cursor) is to update the
> value of inSeq to its ordinal position, this will be based on the #[/color]
of[color=blue]
> occurences of inFK. For Example, I would like to run a sql[/color]
statement[color=blue]
> that would transform the data to:
>
> update test
> set inSEQ = (some sql)
>
> select * from test - This would then produce:
>
> inId inFK inSeq dtDate
>
> ----------- ----------- -----------
> ------------------------------------------------------
> 1 1 1 2005-02-01 12:54:40.967
> 2 1 2 2005-02-01 12:54:41.967
> 3 1 3 2005-02-01 12:54:42.967
> 4 2 1 2005-02-01 12:54:43.967
> 5 2 2 2005-02-01 12:54:44.967
> 6 3 1 2005-02-01 12:54:45.983
> 7 4 1 2005-02-01 12:54:47.077
> (7 row(s) affected)
>
>
> Any help would be greatly appreciated.
>
> TFD[/color]

**LineVoltageHalogen** · Jul 23 '05, 07:55 AM

Re: Particularly Challenging SQL Problem.

I can probably control this by creating a clusterd primary key on inId,
inFk, and sdDate, do you concur?

TFD

**David Portas** · Jul 23 '05, 07:55 AM

Re: Particularly Challenging SQL Problem.

No. A clustered index is irrelevant. It has no impact on the standard,
documented behaviour of an UPDATE statement. The results of certain unusual
and ambiguous UPDATE statements which are undefined in the documentation may
be affected by indexes but the behaviour of those remains the same in each
case: the result is undefined and therefore unreliable.

If inseq is based purely on the values of infk and dtdate then there is no
obvious need to put the column in the table at all. Derive the sequence when
you query the table or add the numbering at client-side:

SELECT T1.inid, T1.infk,
COUNT(*) AS inseq, T1.dtdate
FROM Test AS T1
JOIN Test AS T2
ON T1.infk = T2.infk
AND T1.dtdate >= T2.dtdate
GROUP BY T1.inid, T1.infk, T1.dtdate

--
David Portas
SQL Server MVP
--

**Hugo Kornelis** · Jul 23 '05, 07:55 AM

Re: Particularly Challenging SQL Problem.

On 1 Feb 2005 09:58:01 -0800, LineVoltageHalo gen wrote:

(snip)[color=blue]
>Problem:
>What I would like to do (using SQL and not a cursor) is to update the
>value of inSeq to its ordinal position, this will be based on the # of
>occurences of inFK. For Example, I would like to run a sql statement
>that would transform the data to:[/color]
(snip)

Hi TFD,

Thanks for providing the create table and insert statements and the
expected output to clarify your need.

The following update statement will set the inSeq values as requested:

UPDATE test
SET inSeq = (SELECT COUNT(*)
FROM test AS t
WHERE t.inFK = test.inFK
AND t.dtDate <= test.dtDate)

By the way - using datatype-prefixes to column names is really not a
recommended practice. Would you really want to go over all your code if
the datatype of one of your columns has to be changed?

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)

**LineVoltageHalogen** · Jul 23 '05, 07:55 AM

Re: Particularly Challenging SQL Problem.

David, thank you for your response. Doing stuff client side is not an
option for me, the example I put forth is just the gist of what I need
to do. I am actually implementing an ETL for a BI tool and this
example demonstrates the nature of the real problem I am solving. So,
an update statement is required, I also don't have the luxury of
knowing how many inFk,inSeq pairs will be coming in until after the
load completes. Can your query be modified to accomodate an update?
TFD

**LineVoltageHalogen** · Jul 23 '05, 07:55 AM

Re: Particularly Challenging SQL Problem.

Hugo, if I understand correctly this query does not depend on the
physical order of the data on disk?

**David Portas** · Jul 23 '05, 07:55 AM

Re: Particularly Challenging SQL Problem.

SQL statements never depend on the physical order of the data on disk.
That's a basic principle of an RDBMS.

Why do you ask?

--
David Portas
SQL Server MVP
--

**LineVoltageHalogen** · Jul 23 '05, 07:55 AM

Re: Particularly Challenging SQL Problem.

I just want to understand, that is all. I was thinking that if they
had been sorted when I kicked off the update that SQL would just start
with the first record and then rip through them all from begining to
end. That would seem like the most natural action on SQL's part, how
else would it decide on the order in which is was going to execute the
update? Let's assume we have a simple query such as:

update table
set row = 'new value'

Wouldn't SQL just go to the head of the set and begin updating until it
reaches the end?

L

**Erland Sommarskog** · Jul 23 '05, 07:55 AM

Re: Particularly Challenging SQL Problem.

David Portas (REMOVE_BEFORE_ REPLYING_dporta s@acm.org) writes:[color=blue]
> If inseq is based purely on the values of infk and dtdate then there is
> no obvious need to put the column in the table at all. Derive the
> sequence when you query the table or add the numbering at client-side:
>
> SELECT T1.inid, T1.infk,
> COUNT(*) AS inseq, T1.dtdate
> FROM Test AS T1
> JOIN Test AS T2
> ON T1.infk = T2.infk
> AND T1.dtdate >= T2.dtdate
> GROUP BY T1.inid, T1.infk, T1.dtdate[/color]

The performance of this type of query is not always that fantastic, so
if you need the row numbering often, it may indeed be a good idea to
compute it once.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

Books Online for SQL Server SP3 at

Microsoft SQL Server

http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp

Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

**David Portas** · Jul 23 '05, 07:55 AM

Re: Particularly Challenging SQL Problem.

The Relational Model and set-based SQL keep the logical results of a any
data manipulation abstracted from the way the data is physically stored. A
table is logically unordered and cannot be "sorted" but irrespective of the
order in which data is physically stored the result of any operation on the
data is the same. This principle is called Physical Data Independence. You
should read a book on relational basics.

--
David Portas
SQL Server MVP
--

**LineVoltageHalogen** · Jul 23 '05, 07:55 AM

Re: Particularly Challenging SQL Problem.

David, again thank you for taking the time to inform me. I am quite
aware of Physical Independence and I would bet that the question I am
trying to ask probably won't be answered in a book on the basics of
theory. I am just curious how SQL actually handles this internally,
how it writes to disk, how it chooses an access path, etc. I know it
is not relevant to my problem but sometimes I like to visualize what is
going on at the physical implementation level.

Regards, TFD.

**David Portas** · Jul 23 '05, 07:55 AM

Re: Particularly Challenging SQL Problem.

SQL Server's Query Optimizer chooses the access path to the data based on
analysis of indexes and statistics for your table(s). The Optimizer may
choose a different plan on different occassions, even for the same query and
table structure.

If you want to read more about SQL Server internals I recommend "Inside SQL
Server" by Kalen Delaney.

--
David Portas
SQL Server MVP
--

**Erland Sommarskog** · Jul 23 '05, 07:55 AM

Re: Particularly Challenging SQL Problem.

LineVoltageHalo gen (tropicalfruitd rops@yahoo.com) writes:[color=blue]
> David, again thank you for taking the time to inform me. I am quite
> aware of Physical Independence and I would bet that the question I am
> trying to ask probably won't be answered in a book on the basics of
> theory. I am just curious how SQL actually handles this internally,
> how it writes to disk, how it chooses an access path, etc. I know it
> is not relevant to my problem but sometimes I like to visualize what is
> going on at the physical implementation level.[/color]

The bottom line is that you cannot rely on physical ordering.

In 6.5 and previous version if you had a clustered index on a table
and said

SELECT * FROM tbl

you could be pretty sure that you would get the rows ordered according
to the clustered index. Same if you had an unindexed table, you would
get back the rows in the order they had been inserted.

But from SQL7 this is no longer true. This is because the SQL Server
engine has improved, and now can return data from multiple output
streams. If you have a somewhat large table with, say, 20000 rows, and
say SELECT * FRON tbl, you may contiguous blocks of ids, but there
will be some random variation of the blocks.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

Books Online for SQL Server SP3 at

Microsoft SQL Server

http://www.microsoft.com/sql/techinfo/productdoc/2000/books.asp

Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

**Hugo Kornelis** · Jul 23 '05, 07:55 AM

Re: Particularly Challenging SQL Problem.

On 1 Feb 2005 14:22:13 -0800, LineVoltageHalo gen wrote:
[color=blue]
>I just want to understand, that is all. I was thinking that if they
>had been sorted when I kicked off the update that SQL would just start
>with the first record and then rip through them all from begining to
>end.[/color]

Hi L,

In theory, an UPDATE statement is carried out instantaneously . One moment,
all rows have the "old" value; the next moment, all rows are changed to
the "new" value.

In practice, nothing can be done instantaneously . The computer needs some
time to fetch data from disk or cache, calculate the new values, push the
new values back to the cache and write the log entries to disk. Depending
on table size, available indexes, workload and lots of other factors, your
UPDATE statement will take anything from microseconds to hours or even
more.

During the time the UPDATE statement is processing, the number of changed
rows will be gradually increasing. Each changed row is also locked - this
means that no other process will get access to the data. All these locks
are held until the entire transaction is finished.

What happens if you fire a long-running UPDATE, then try to read some rows
from another connection? Well, there are basically two possibilities:
either the row you read has not yet been affected by the UPDATE - in that
case, you'll get to see the "old" value. But if the row has already been
changed by the UPDATE, you'll see nothing - the execution of your query is
postponed until the lock on the required row is lifted. Once the UPDATE is
completely done, the required row can be read, and you'll get the new data
returned.

The net effect of this is, that until the UPDATE statement has finished,
other people can't ever see the new data - either they see the old data,
or they are "put on hold" (blocked). After the UPDATE statement has
finished, other people will only see the new data. So even though the
UPDATE can't really be carried out instanteneously , the combination of
locking and blocking ensures that the system behaves as if it is.
[color=blue]
> That would seem like the most natural action on SQL's part, how
>else would it decide on the order in which is was going to execute the
>update? Let's assume we have a simple query such as:
>
>update table
>set row = 'new value'
>
>Wouldn't SQL just go to the head of the set and begin updating until it
>reaches the end?[/color]

SQL Server is entirely free to choose it's own strategy for carrying out
the update. The chosen order might depend on many factors, like indexes
available, workload, table statistics or maybe even the weather - but
since the system behaves as if all rows were affected at the same time, it
really doesn't matter what order SQL Server picks.

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)

Particularly Challenging SQL Problem.

Particularly Challenging SQL Problem.

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment