Particularly Challenging SQL Problem.

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • LineVoltageHalogen

    Particularly Challenging SQL Problem.

    Table DDL:

    create table test
    (
    inId int primary key identity,
    inFK int not null,
    inSeq int not null,
    dtDate datetime
    )

    Data Insert:
    insert into test
    select 1,1, getdate()

    WAITFOR DELAY '000:00:01'

    insert into test
    select 1,1, getdate()

    WAITFOR DELAY '000:00:01'

    insert into test
    select 1,1, getdate()

    WAITFOR DELAY '000:00:01'

    insert into test
    select 2,1, getdate()

    WAITFOR DELAY '000:00:01'

    insert into test
    select 2,1, getdate()

    WAITFOR DELAY '000:00:01'

    insert into test
    select 3,1, getdate()

    WAITFOR DELAY '000:00:01'

    insert into test
    select 4,1, getdate()


    If we select on this table:
    inId inFK inSeq dtDate

    ----------- ----------- -----------
    ------------------------------------------------------
    1 1 1 2005-02-01 12:54:40.967
    2 1 1 2005-02-01 12:54:41.967
    3 1 1 2005-02-01 12:54:42.967
    4 2 1 2005-02-01 12:54:43.967
    5 2 1 2005-02-01 12:54:44.967
    6 3 1 2005-02-01 12:54:45.983
    7 4 1 2005-02-01 12:54:47.077

    (7 row(s) affected)


    Problem:
    What I would like to do (using SQL and not a cursor) is to update the
    value of inSeq to its ordinal position, this will be based on the # of
    occurences of inFK. For Example, I would like to run a sql statement
    that would transform the data to:

    update test
    set inSEQ = (some sql)

    select * from test - This would then produce:

    inId inFK inSeq dtDate

    ----------- ----------- -----------
    ------------------------------------------------------
    1 1 1 2005-02-01 12:54:40.967
    2 1 2 2005-02-01 12:54:41.967
    3 1 3 2005-02-01 12:54:42.967
    4 2 1 2005-02-01 12:54:43.967
    5 2 2 2005-02-01 12:54:44.967
    6 3 1 2005-02-01 12:54:45.983
    7 4 1 2005-02-01 12:54:47.077
    (7 row(s) affected)


    Any help would be greatly appreciated.

    TFD

  • SSK

    #2
    Re: Particularly Challenging SQL Problem.

    Hi

    The following solution will work if the inserted records (with
    same inFK value ) are in sequential order only.

    update test set inSeq = (inid - (select min(inid) from test b where
    a.infk = b.infk ) + 1) from test a

    Thanks
    Ssk

    If the Records are
    LineVoltageHalo gen wrote:[color=blue]
    > Table DDL:
    >
    > create table test
    > (
    > inId int primary key identity,
    > inFK int not null,
    > inSeq int not null,
    > dtDate datetime
    > )
    >
    > Data Insert:
    > insert into test
    > select 1,1, getdate()
    >
    > WAITFOR DELAY '000:00:01'
    >
    > insert into test
    > select 1,1, getdate()
    >
    > WAITFOR DELAY '000:00:01'
    >
    > insert into test
    > select 1,1, getdate()
    >
    > WAITFOR DELAY '000:00:01'
    >
    > insert into test
    > select 2,1, getdate()
    >
    > WAITFOR DELAY '000:00:01'
    >
    > insert into test
    > select 2,1, getdate()
    >
    > WAITFOR DELAY '000:00:01'
    >
    > insert into test
    > select 3,1, getdate()
    >
    > WAITFOR DELAY '000:00:01'
    >
    > insert into test
    > select 4,1, getdate()
    >
    >
    > If we select on this table:
    > inId inFK inSeq dtDate
    >
    > ----------- ----------- -----------
    > ------------------------------------------------------
    > 1 1 1 2005-02-01 12:54:40.967
    > 2 1 1 2005-02-01 12:54:41.967
    > 3 1 1 2005-02-01 12:54:42.967
    > 4 2 1 2005-02-01 12:54:43.967
    > 5 2 1 2005-02-01 12:54:44.967
    > 6 3 1 2005-02-01 12:54:45.983
    > 7 4 1 2005-02-01 12:54:47.077
    >
    > (7 row(s) affected)
    >
    >
    > Problem:
    > What I would like to do (using SQL and not a cursor) is to update the
    > value of inSeq to its ordinal position, this will be based on the #[/color]
    of[color=blue]
    > occurences of inFK. For Example, I would like to run a sql[/color]
    statement[color=blue]
    > that would transform the data to:
    >
    > update test
    > set inSEQ = (some sql)
    >
    > select * from test - This would then produce:
    >
    > inId inFK inSeq dtDate
    >
    > ----------- ----------- -----------
    > ------------------------------------------------------
    > 1 1 1 2005-02-01 12:54:40.967
    > 2 1 2 2005-02-01 12:54:41.967
    > 3 1 3 2005-02-01 12:54:42.967
    > 4 2 1 2005-02-01 12:54:43.967
    > 5 2 2 2005-02-01 12:54:44.967
    > 6 3 1 2005-02-01 12:54:45.983
    > 7 4 1 2005-02-01 12:54:47.077
    > (7 row(s) affected)
    >
    >
    > Any help would be greatly appreciated.
    >
    > TFD[/color]

    Comment

    • LineVoltageHalogen

      #3
      Re: Particularly Challenging SQL Problem.

      I can probably control this by creating a clusterd primary key on inId,
      inFk, and sdDate, do you concur?

      TFD

      Comment

      • David Portas

        #4
        Re: Particularly Challenging SQL Problem.

        No. A clustered index is irrelevant. It has no impact on the standard,
        documented behaviour of an UPDATE statement. The results of certain unusual
        and ambiguous UPDATE statements which are undefined in the documentation may
        be affected by indexes but the behaviour of those remains the same in each
        case: the result is undefined and therefore unreliable.

        If inseq is based purely on the values of infk and dtdate then there is no
        obvious need to put the column in the table at all. Derive the sequence when
        you query the table or add the numbering at client-side:

        SELECT T1.inid, T1.infk,
        COUNT(*) AS inseq, T1.dtdate
        FROM Test AS T1
        JOIN Test AS T2
        ON T1.infk = T2.infk
        AND T1.dtdate >= T2.dtdate
        GROUP BY T1.inid, T1.infk, T1.dtdate

        --
        David Portas
        SQL Server MVP
        --


        Comment

        • Hugo Kornelis

          #5
          Re: Particularly Challenging SQL Problem.

          On 1 Feb 2005 09:58:01 -0800, LineVoltageHalo gen wrote:

          (snip)[color=blue]
          >Problem:
          >What I would like to do (using SQL and not a cursor) is to update the
          >value of inSeq to its ordinal position, this will be based on the # of
          >occurences of inFK. For Example, I would like to run a sql statement
          >that would transform the data to:[/color]
          (snip)

          Hi TFD,

          Thanks for providing the create table and insert statements and the
          expected output to clarify your need.

          The following update statement will set the inSeq values as requested:

          UPDATE test
          SET inSeq = (SELECT COUNT(*)
          FROM test AS t
          WHERE t.inFK = test.inFK
          AND t.dtDate <= test.dtDate)

          By the way - using datatype-prefixes to column names is really not a
          recommended practice. Would you really want to go over all your code if
          the datatype of one of your columns has to be changed?

          Best, Hugo
          --

          (Remove _NO_ and _SPAM_ to get my e-mail address)

          Comment

          • LineVoltageHalogen

            #6
            Re: Particularly Challenging SQL Problem.

            David, thank you for your response. Doing stuff client side is not an
            option for me, the example I put forth is just the gist of what I need
            to do. I am actually implementing an ETL for a BI tool and this
            example demonstrates the nature of the real problem I am solving. So,
            an update statement is required, I also don't have the luxury of
            knowing how many inFk,inSeq pairs will be coming in until after the
            load completes. Can your query be modified to accomodate an update?
            TFD

            Comment

            • LineVoltageHalogen

              #7
              Re: Particularly Challenging SQL Problem.

              Hugo, if I understand correctly this query does not depend on the
              physical order of the data on disk?

              Comment

              • David Portas

                #8
                Re: Particularly Challenging SQL Problem.

                SQL statements never depend on the physical order of the data on disk.
                That's a basic principle of an RDBMS.

                Why do you ask?

                --
                David Portas
                SQL Server MVP
                --


                Comment

                • LineVoltageHalogen

                  #9
                  Re: Particularly Challenging SQL Problem.

                  I just want to understand, that is all. I was thinking that if they
                  had been sorted when I kicked off the update that SQL would just start
                  with the first record and then rip through them all from begining to
                  end. That would seem like the most natural action on SQL's part, how
                  else would it decide on the order in which is was going to execute the
                  update? Let's assume we have a simple query such as:

                  update table
                  set row = 'new value'

                  Wouldn't SQL just go to the head of the set and begin updating until it
                  reaches the end?

                  L

                  Comment

                  • Erland Sommarskog

                    #10
                    Re: Particularly Challenging SQL Problem.

                    David Portas (REMOVE_BEFORE_ REPLYING_dporta s@acm.org) writes:[color=blue]
                    > If inseq is based purely on the values of infk and dtdate then there is
                    > no obvious need to put the column in the table at all. Derive the
                    > sequence when you query the table or add the numbering at client-side:
                    >
                    > SELECT T1.inid, T1.infk,
                    > COUNT(*) AS inseq, T1.dtdate
                    > FROM Test AS T1
                    > JOIN Test AS T2
                    > ON T1.infk = T2.infk
                    > AND T1.dtdate >= T2.dtdate
                    > GROUP BY T1.inid, T1.infk, T1.dtdate[/color]

                    The performance of this type of query is not always that fantastic, so
                    if you need the row numbering often, it may indeed be a good idea to
                    compute it once.


                    --
                    Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

                    Books Online for SQL Server SP3 at
                    Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

                    Comment

                    • David Portas

                      #11
                      Re: Particularly Challenging SQL Problem.

                      The Relational Model and set-based SQL keep the logical results of a any
                      data manipulation abstracted from the way the data is physically stored. A
                      table is logically unordered and cannot be "sorted" but irrespective of the
                      order in which data is physically stored the result of any operation on the
                      data is the same. This principle is called Physical Data Independence. You
                      should read a book on relational basics.

                      --
                      David Portas
                      SQL Server MVP
                      --


                      Comment

                      • LineVoltageHalogen

                        #12
                        Re: Particularly Challenging SQL Problem.

                        David, again thank you for taking the time to inform me. I am quite
                        aware of Physical Independence and I would bet that the question I am
                        trying to ask probably won't be answered in a book on the basics of
                        theory. I am just curious how SQL actually handles this internally,
                        how it writes to disk, how it chooses an access path, etc. I know it
                        is not relevant to my problem but sometimes I like to visualize what is
                        going on at the physical implementation level.

                        Regards, TFD.

                        Comment

                        • David Portas

                          #13
                          Re: Particularly Challenging SQL Problem.

                          SQL Server's Query Optimizer chooses the access path to the data based on
                          analysis of indexes and statistics for your table(s). The Optimizer may
                          choose a different plan on different occassions, even for the same query and
                          table structure.

                          If you want to read more about SQL Server internals I recommend "Inside SQL
                          Server" by Kalen Delaney.

                          --
                          David Portas
                          SQL Server MVP
                          --


                          Comment

                          • Erland Sommarskog

                            #14
                            Re: Particularly Challenging SQL Problem.

                            LineVoltageHalo gen (tropicalfruitd rops@yahoo.com) writes:[color=blue]
                            > David, again thank you for taking the time to inform me. I am quite
                            > aware of Physical Independence and I would bet that the question I am
                            > trying to ask probably won't be answered in a book on the basics of
                            > theory. I am just curious how SQL actually handles this internally,
                            > how it writes to disk, how it chooses an access path, etc. I know it
                            > is not relevant to my problem but sometimes I like to visualize what is
                            > going on at the physical implementation level.[/color]

                            The bottom line is that you cannot rely on physical ordering.

                            In 6.5 and previous version if you had a clustered index on a table
                            and said

                            SELECT * FROM tbl

                            you could be pretty sure that you would get the rows ordered according
                            to the clustered index. Same if you had an unindexed table, you would
                            get back the rows in the order they had been inserted.

                            But from SQL7 this is no longer true. This is because the SQL Server
                            engine has improved, and now can return data from multiple output
                            streams. If you have a somewhat large table with, say, 20000 rows, and
                            say SELECT * FRON tbl, you may contiguous blocks of ids, but there
                            will be some random variation of the blocks.


                            --
                            Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

                            Books Online for SQL Server SP3 at
                            Get the flexibility you need to use integrated solutions, apps, and innovations in technology with your data, wherever it lives—in the cloud, on-premises, or at the edge.

                            Comment

                            • Hugo Kornelis

                              #15
                              Re: Particularly Challenging SQL Problem.

                              On 1 Feb 2005 14:22:13 -0800, LineVoltageHalo gen wrote:
                              [color=blue]
                              >I just want to understand, that is all. I was thinking that if they
                              >had been sorted when I kicked off the update that SQL would just start
                              >with the first record and then rip through them all from begining to
                              >end.[/color]

                              Hi L,

                              In theory, an UPDATE statement is carried out instantaneously . One moment,
                              all rows have the "old" value; the next moment, all rows are changed to
                              the "new" value.

                              In practice, nothing can be done instantaneously . The computer needs some
                              time to fetch data from disk or cache, calculate the new values, push the
                              new values back to the cache and write the log entries to disk. Depending
                              on table size, available indexes, workload and lots of other factors, your
                              UPDATE statement will take anything from microseconds to hours or even
                              more.

                              During the time the UPDATE statement is processing, the number of changed
                              rows will be gradually increasing. Each changed row is also locked - this
                              means that no other process will get access to the data. All these locks
                              are held until the entire transaction is finished.

                              What happens if you fire a long-running UPDATE, then try to read some rows
                              from another connection? Well, there are basically two possibilities:
                              either the row you read has not yet been affected by the UPDATE - in that
                              case, you'll get to see the "old" value. But if the row has already been
                              changed by the UPDATE, you'll see nothing - the execution of your query is
                              postponed until the lock on the required row is lifted. Once the UPDATE is
                              completely done, the required row can be read, and you'll get the new data
                              returned.

                              The net effect of this is, that until the UPDATE statement has finished,
                              other people can't ever see the new data - either they see the old data,
                              or they are "put on hold" (blocked). After the UPDATE statement has
                              finished, other people will only see the new data. So even though the
                              UPDATE can't really be carried out instanteneously , the combination of
                              locking and blocking ensures that the system behaves as if it is.
                              [color=blue]
                              > That would seem like the most natural action on SQL's part, how
                              >else would it decide on the order in which is was going to execute the
                              >update? Let's assume we have a simple query such as:
                              >
                              >update table
                              >set row = 'new value'
                              >
                              >Wouldn't SQL just go to the head of the set and begin updating until it
                              >reaches the end?[/color]

                              SQL Server is entirely free to choose it's own strategy for carrying out
                              the update. The chosen order might depend on many factors, like indexes
                              available, workload, table statistics or maybe even the weather - but
                              since the system behaves as if all rows were affected at the same time, it
                              really doesn't matter what order SQL Server picks.

                              Best, Hugo
                              --

                              (Remove _NO_ and _SPAM_ to get my e-mail address)

                              Comment

                              Working...