Can an update statement be used for interpolating missing data?

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Vorpal

    Can an update statement be used for interpolating missing data?

    Here is a small sample of data from a table of about 500 rows
    (Using MSSqlserver 2000)

    EntryTime Speed Gross Net
    ------------------ ----- -----
    21:09:13.310 0 0 0
    21:09:19.370 9000 NULL NULL
    21:09:21.310 NULL 95 NULL
    21:10:12.380 9000 NULL NULL
    21:10:24.310 NULL 253 NULL
    21:11:24.370 8000 NULL NULL
    21:11:27.310 NULL 410 NULL
    21:11:51.320 NULL 438 NULL
    21:11:51.490 NULL NULL 10

    After the first row, every row has only one value of the three.
    I would like to replace all the NULL values with calculated
    interpolations.

    I can do it w/ cursors or while loops.
    I could do it w/ VB (I think)

    Can this be done w/ an Update statement using self joins?
    What would be the best way?

    The value for speed can increase or decrease over time, but can never
    be < 0

    Net is always less than gross, and neither can go below 0.

    TIA for any helpful suggestions.

    Thanks,
    BM

  • Simon Hayes

    #2
    Re: Can an update statement be used for interpolating missing data?

    It's not really clear how you want to calculate the new values, but
    perhaps you can look at CASE and COALESCE in Books Online. If this
    doesn't help, then you should post some more information about how you
    want to calculate the new values.

    Simon

    Comment

    • David Portas

      #3
      Re: Can an update statement be used for interpolating missing data?

      I expect it will be possible with an UPDATE and a join/subquery.

      UPDATE YourTable
      SET speed =
      (SELECT ...
      FROM YourTable
      WHERE entrytime < YourTable.entry time ...)
      WHERE speed IS NULL

      If you need a complete solution then explain the calculation, show the
      result you want and post DDL for the table. Also, it's best to post sample
      data as INSERT statements so that others can more easily test out possible
      solutions. That way you'll get accurate and useful answers more quickly.
      See:


      --
      David Portas
      SQL Server MVP
      --


      Comment

      • Vorpal

        #4
        Re: Can an update statement be used for interpolating missing data?

        Thank you greatly for the FAQ link. I learned a lot just reading it.

        The table:
        if exists (select * from dbo.sysobjects where id =
        object_id(N'[tblProfileTemp]') and OBJECTPROPERTY( id, N'IsUserTable') =
        1)
        drop table [tblProfileTemp]
        GO

        if not exists (select * from dbo.sysobjects where id =
        object_id(N'[dbo].[tblProfileTemp]') and OBJECTPROPERTY( id,
        N'IsUserTable') = 1)
        BEGIN
        CREATE TABLE [tblProfileTempX] (
        --[Item] [int] IDENTITY (1, 1) NOT NULL ,
        [Item] [int] NOT NULL ,
        [EntryTime] [datetime] NULL ,
        [RunTime] [numeric](12, 3) NULL ,
        [Speed] [int] NULL ,
        [gross] [int] NULL ,
        [net] [int] NULL
        ) ON [PRIMARY]
        END

        The insert statements (the code to generate this was an education in
        itself).
        (10 rows should be enough) I commented out the identity contraint so
        inserting w/o the column list would be possible.
        INSERT INTO [tblProfileTempx] VALUES(1,'Jun 21 2005
        9:09:13:310PM', 0.000,0,0,0)
        INSERT INTO [tblProfileTempx] VALUES(2,'Jun 21 2005
        9:09:19:370PM', 6.060,9000,NULL ,NULL)
        INSERT INTO [tblProfileTempx] VALUES(3,'Jun 21 2005
        9:09:21:310PM', 8.000,NULL,95,N ULL)
        INSERT INTO [tblProfileTempx] VALUES(4,'Jun 21 2005
        9:10:12:380PM', 59.070,9000,NUL L,NULL)
        INSERT INTO [tblProfileTempx] VALUES(5,'Jun 21 2005
        9:10:24:310PM', 71.000,NULL,253 ,NULL)
        INSERT INTO [tblProfileTempx] VALUES(6,'Jun 21 2005
        9:11:24:370PM', 131.060,8000,NU LL,NULL)
        INSERT INTO [tblProfileTempx] VALUES(7,'Jun 21 2005
        9:11:27:310PM', 134.000,NULL,41 0,NULL)
        INSERT INTO [tblProfileTempx] VALUES(8,'Jun 21 2005
        9:11:51:320PM', 158.010,NULL,43 8,NULL)
        INSERT INTO [tblProfileTempx] VALUES(9,'Jun 21 2005
        9:11:51:490PM', 158.180,0,NULL, NULL)
        INSERT INTO [tblProfileTempx] VALUES(10,'Jun 21 2005
        9:13:51:310PM', 278.000,NULL,44 6,NULL)

        Explanation of data:
        The data represents the output of a running press. Each data element
        is recorded at EntryTime. RunTime represents the time elapsed since
        the start, and is expressed in seconds. Gross is number of copies
        printed. Net is number of copies not rejected automatically by various
        defect detectors.


        Desired Result:
        Example:
        Gross for item 1 is 0
        Gross for item 2 is null
        Gross for item 3 is 95

        I need to replace the null in item 2 with a value that represents the
        gross count for that time, assuming a constant press speed. It will
        not necesarily be constant, but the error will be slight.

        The formula for that value will be:

        Gross2 = Gross1 + ((Gross3 - Gross1) * ((RunTime2-RunTime1) / (RunTime3
        - RunTime1)))

        Similar interpolations will be calculated for Net and Speed.

        It gets harder where there are two or more nulls between known values.

        I was working along the lines of:

        Update t1
        Set t1.Gross = t0.Gross + ((t2.Gross = t0.Gross) *
        ((t1.runtime-t0.runtime)/(t2.runtime-t0.runtime)))

        from tblProfileTempX t1 inner join tblProfileTempX t0 on t0.item =
        t1.item
        inner join tblProfileTempX t2 on t2.item = t0.item

        where t1.gross is null
        and t0.EntryTime = (select Max(EntryTime) from tblProfileTempX
        where gross is not null and item < t1.item)
        and t2.EntryTime = (select Min(EntryTime) from tblProfileTempX
        where gross is not null and item > t1.item)

        I've reduce the errors to the following:
        Server: Msg 170, Level 15, State 1, Line 2
        Line 2: Incorrect syntax near '='.
        Server: Msg 156, Level 15, State 1, Line 9
        Incorrect syntax near the keyword 'and'.

        Thanks in advance for your time and effort, and apologies for the group
        etiquette breach,

        Regards,
        BM

        David Portas wrote:[color=blue]
        > I expect it will be possible with an UPDATE and a join/subquery.
        >
        > UPDATE YourTable
        > SET speed =
        > (SELECT ...
        > FROM YourTable
        > WHERE entrytime < YourTable.entry time ...)
        > WHERE speed IS NULL
        >
        > If you need a complete solution then explain the calculation, show the
        > result you want and post DDL for the table. Also, it's best to post sample
        > data as INSERT statements so that others can more easily test out possible
        > solutions. That way you'll get accurate and useful answers more quickly.
        > See:
        > http://www.aspfaq.com/etiquette.asp?id=5006
        >
        > --
        > David Portas
        > SQL Server MVP
        > --[/color]

        Comment

        • Vorpal

          #5
          Re: Can an update statement be used for interpolating missing data?

          Groan:
          Tired eyes mistook an = sign for a - sign in line two.
          With that fixed, it runs, but:

          0 rwos affected

          Comment

          • Erland Sommarskog

            #6
            Re: Can an update statement be used for interpolating missing data?

            [posted and mailed, please reply in news]

            Vorpal (brumac@gmail.c om) writes:[color=blue]
            > Here is a small sample of data from a table of about 500 rows
            > (Using MSSqlserver 2000)
            >
            > EntryTime Speed Gross Net
            > ------------------ ----- -----
            > 21:09:13.310 0 0 0
            > 21:09:19.370 9000 NULL NULL
            > 21:09:21.310 NULL 95 NULL
            > 21:10:12.380 9000 NULL NULL
            > 21:10:24.310 NULL 253 NULL
            > 21:11:24.370 8000 NULL NULL
            > 21:11:27.310 NULL 410 NULL
            > 21:11:51.320 NULL 438 NULL
            > 21:11:51.490 NULL NULL 10
            >
            > After the first row, every row has only one value of the three.
            > I would like to replace all the NULL values with calculated
            > interpolations.
            >
            > I can do it w/ cursors or while loops.
            > I could do it w/ VB (I think)
            >
            > Can this be done w/ an Update statement using self joins?[/color]

            Not "an", but a couple. In the below script I get the data into a temp
            table with an IDENTITY column, which has a consecutive number. I then
            find the next and previous row with a non-NULL value for speed, for those
            rows that have a NULL value. Once I have these pointers I can make the
            interpolation. There is no extrapolation for the NULL values at the end.

            The number of UPDATE statements could be reduced if you have three
            sets of pointer columns, but I'm not sure that is worth the pain.

            The script does not include handling of Net. That is left as an exercise
            to the reader. :-)

            CREATE TABLE tbl (entrytime datetime NOT NULL PRIMARY KEY,
            speed int NULL,
            gross int NULL,
            net int NULL)
            go
            INSERT tbl(entrytime, speed, gross, net)
            SELECT '21:09:13.310', 0, 0, 0 UNION
            SELECT '21:09:19.370', 9000, NULL, NULL UNION
            SELECT '21:09:21.310', NULL, 95, NULL UNION
            SELECT '21:10:12.380', 9000, NULL, NULL UNION
            SELECT '21:10:24.310', NULL, 253, NULL UNION
            SELECT '21:11:24.370', 8000, NULL, NULL UNION
            SELECT '21:11:27.310', NULL, 410, NULL UNION
            SELECT '21:11:51.320', NULL, 438, NULL UNION
            SELECT '21:11:51.490', NULL, NULL, 10
            go
            CREATE TABLE #temp (ident int IDENTITY UNIQUE,
            entrytime datetime NOT NULL PRIMARY KEY,
            speed int NULL,
            gross int NULL,
            net int NULL,
            prevval int NULL,
            nextval int NULL)

            INSERT #temp(entrytime , speed, gross, net)
            SELECT entrytime, speed, gross, net
            FROM tbl
            ORDER BY entrytime

            UPDATE #temp
            SET prevval = (SELECT MAX(t2.ident)
            FROM #temp t2
            WHERE t2.ident < t.ident
            AND t2.speed IS NOT NULL)
            FROM #temp t
            WHERE t.speed IS NULL

            UPDATE #temp
            SET nextval = (SELECT MIN(t2.ident)
            FROM #temp t2
            WHERE t2.ident > t.ident
            AND t2.speed IS NOT NULL)
            FROM #temp t
            WHERE t.speed IS NULL

            UPDATE t
            SET speed = p.speed +
            1E0 * (n.speed - p.speed) * (t.ident - t.prevval) /
            (t.nextval - t.prevval)
            FROM #temp t
            JOIN #temp p ON t.prevval = p.ident
            JOIN #temp n ON t.nextval = n.ident
            WHERE t.speed IS NULL

            UPDATE #temp
            SET prevval = NULL, nextval = NULL


            UPDATE #temp
            SET prevval = (SELECT MAX(t2.ident)
            FROM #temp t2
            WHERE t2.ident < t.ident
            AND t2.gross IS NOT NULL)
            FROM #temp t
            WHERE t.gross IS NULL

            UPDATE #temp
            SET nextval = (SELECT MIN(t2.ident)
            FROM #temp t2
            WHERE t2.ident > t.ident
            AND t2.gross IS NOT NULL)
            FROM #temp t
            WHERE t.gross IS NULL

            UPDATE t
            SET gross = p.gross +
            1E0 * (n.gross - p.gross) * (t.ident - t.prevval) /
            (t.nextval - t.prevval)
            FROM #temp t
            JOIN #temp p ON t.prevval = p.ident
            JOIN #temp n ON t.nextval = n.ident
            WHERE t.gross IS NULL

            UPDATE #temp
            SET prevval = NULL, nextval = NULL

            go
            UPDATE tbl
            SET speed = t.speed,
            gross = t.gross,
            net = t.net
            FROM tbl
            JOIN #temp t ON tbl.entrytime = t.entrytime
            go
            SELECT * FROM #temp
            SELECT * FROM tbl ORDER BY entrytime
            go
            DROP TABLE tbl
            DROP TABLE #temp




            --
            Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

            Books Online for SQL Server SP3 at
            SQL Server 2025 redefines what's possible for enterprise data. With developer-first features and integration with analytics and AI models, SQL Server 2025 accelerates AI innovation using the data you already have.

            Comment

            • Erland Sommarskog

              #7
              Re: Can an update statement be used for interpolating missing data?

              Erland Sommarskog (esquel@sommars kog.se) writes:[color=blue]
              > UPDATE t
              > SET speed = p.speed +
              > 1E0 * (n.speed - p.speed) * (t.ident - t.prevval) /
              > (t.nextval - t.prevval)
              > FROM #temp t
              > JOIN #temp p ON t.prevval = p.ident
              > JOIN #temp n ON t.nextval = n.ident
              > WHERE t.speed IS NULL[/color]

              So I did not consider time. This might be better:

              UPDATE t
              SET speed = p.speed +
              1E0 * (n.speed - p.speed) *
              datediff(ms, p.entrytime, t.entrytime) /
              datediff(ms, p.entrytime, n.entrytime)
              FROM #temp t
              JOIN #temp p ON t.prevval = p.ident
              JOIN #temp n ON t.nextval = n.ident
              WHERE t.speed IS NULL




              --
              Erland Sommarskog, SQL Server MVP, esquel@sommarsk og.se

              Books Online for SQL Server SP3 at
              SQL Server 2025 redefines what's possible for enterprise data. With developer-first features and integration with analytics and AI models, SQL Server 2025 accelerates AI innovation using the data you already have.

              Comment

              • Vorpal

                #8
                Re: Can an update statement be used for interpolating missing data?

                I have settled on the following for now:
                1. First, if the final value is a null, set it to the maximum value
                for that field.
                This is necessary so that the intervening values can be calculated.
                This update does that:
                update t1
                set Gross = t0.gross
                from tblProfiletemp t1, tblprofiletemp t0
                where t1.gross is null
                and t1.entrytime = (select max(entrytime) from tblprofiletemp)
                and t0.gross = (select max(gross) from tblprofiletemp)

                Once that is done, then the following update statement fills in all
                intervening values with the correct value:
                Update t1
                set Gross=t0.Gross + ((t2.Gross - t0.Gross) *
                ((t1.runtime-t0.runtime)/(t2.runtime-t0.runtime)))
                from tblProfiletemp t0 ,tblProfiletemp t1,tblProfilete mp t2
                where t1.gross is null
                and t0.entrytime = (select Max(EntryTime) from tblProfiletemp where
                item < t1.item and gross is not null)
                and t2.EntryTime = (select Min(EntryTime) from tblProfiletemp where
                item > t1.item and gross is not null)

                The reason I could not get the update statement to work before is that
                I was erroneously creating self joins.

                The above will work properly on the columns where the data always
                increases. It may need some modification for the speed columns.

                Thanks for all the input.
                BM

                Comment

                • David Portas

                  #9
                  Re: Can an update statement be used for interpolating missing data?

                  Don't rely on the IDENTITY column to drive the sequence. IDENTITY is
                  only supposed to be an arbitrary key. EntryTime should be a better way
                  to do it:

                  UPDATE tblProfileTemp
                  SET gross =
                  (SELECT T0.gross +
                  ((T2.gross - T0.gross) *
                  ((tblProfileTem p.runtime-T0.runtime)/(T2.runtime-T0.runtime)))
                  FROM tblProfileTemp AS T0,
                  tblProfileTemp AS T2
                  WHERE T0.entrytime =
                  (SELECT MAX(entrytime)
                  FROM tblProfileTemp AS T
                  WHERE entrytime < tblProfileTemp. entrytime
                  AND gross IS NOT NULL)
                  AND T2.entrytime =
                  (SELECT MIN(entrytime)
                  FROM tblProfileTemp AS T
                  WHERE entrytime > tblProfileTemp. entrytime
                  AND gross IS NOT NULL))
                  WHERE gross IS NULL

                  Thanks for posting the DDL and sample. It helped.

                  --
                  David Portas
                  SQL Server MVP
                  --

                  Comment

                  • Vorpal

                    #10
                    Re: Can an update statement be used for interpolating missing data?

                    For the speed table, I took a different approach.
                    Once the values for gross copy count have been inserted, then the speed
                    can be calculated backwards as
                    Speed = (Gross - Gross0)/(RunTime - RunTime0).

                    I changed the runtime so it is recorded in seconds, rather than
                    minutes, and round the speed value to the neares 100.
                    These two changes give a smoother graph.

                    Where a speed needs to be calculated from two gross values recorded
                    very short times apart, then the speed sometimes appears anomalously
                    high or low. (usually high).

                    The results of this are now in testing, and I'll see what feedback from
                    the users is before making further changes.

                    Thanks tremendously to all who helped.

                    Comment

                    Working...