Update - can I do better statement?

  • rcamarda

    Update - can I do better statement?

    I have created a table with buckets that hold enrollment activities for
    each of our admissions officers for each day of an enrollment session.
    I have an UPDATE that builds rolling totals and updates that table,
    counting unique students from a table that contains registration
    information.
    Everything works, but it runs pretty slowly, and I am wondering if
    there is a better approach to writing the SQL. The Database Engine
    Tuning Advisor suggests 0% improvement.
    I'd appreciate any suggestions on better ways to accomplish my goal.
    Or compliments on my genius if there isn't a better way! *grin*

    UPDATE F_BI_ADMISSIONS_OFFICER_SUMMARY_BY_SESSION_RT_CURRENT -- (RT =
    rolling total)
    SET NEW_REGISTRATION_COUNT =
    (SELECT COUNT(DISTINCT STUDENT_ID)
    FROM
    (SELECT AA.SESSION_ID, AA.STUDENT_ID, STUDENT_STATUS
    FROM F_BI_REGISTRATION_TRACKING AA
    LEFT OUTER JOIN F_BI_STUDENT_SUMMARY_SESSION BB
    ON AA.STUDENT_ID = BB.STUDENT_ID AND AA.SESSION_ID = BB.SESSION_ID
    LEFT OUTER JOIN D_BI_STUDENT CC
    ON AA.STUDENT_ID = CC.STUDENT_ID AND CC.CURR_IND = 'Y'
    WHERE AA.SESSION_ID = OA.SESSION_ID
    AND ACTIVITY_DT <= OA.SESSION_DT
    AND CC.TALISMA_AO_ID = OA.ADMREP_ID
    AND STUDENT_STATUS = 'NEW'
    GROUP BY AA.SESSION_ID, AA.STUDENT_ID, STUDENT_STATUS
    HAVING SUM(ACTIVITY_COUNT) > 0
    ) AS PPP
    )
    FROM F_BI_ADMISSIONS_OFFICER_SUMMARY_BY_SESSION_RT_CURRENT OA

    With 50K rows, this takes about 5 minutes. However, I have almost
    9,000,000 rows of history (15 years of 4 quarterly sessions). If this
    build is linear, it would take 1.25 hours.
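    One set-based alternative is to pull the correlated subquery out into a
    single aggregation pass. This is a sketch only - untested, and it makes
    two assumptions: that ACTIVITY_COUNT is never negative, so a student
    "counts" from the date of their first qualifying activity onward, and
    that the HAVING filter can be applied over the whole session rather
    than day by day. Verify both against your definition before relying on
    it:

    ```sql
    -- Step 1: one pass over the detail to find, per officer/session/student,
    -- the date of the first activity (NEW students with a positive total).
    SELECT CC.TALISMA_AO_ID AS ADMREP_ID,
           AA.SESSION_ID,
           AA.STUDENT_ID,
           MIN(AA.ACTIVITY_DT) AS FIRST_ACTIVITY_DT
    INTO #FIRST_ACTIVITY
    FROM F_BI_REGISTRATION_TRACKING AA
    JOIN F_BI_STUDENT_SUMMARY_SESSION BB
      ON AA.STUDENT_ID = BB.STUDENT_ID AND AA.SESSION_ID = BB.SESSION_ID
    JOIN D_BI_STUDENT CC
      ON AA.STUDENT_ID = CC.STUDENT_ID AND CC.CURR_IND = 'Y'
    WHERE BB.STUDENT_STATUS = 'NEW'
    GROUP BY CC.TALISMA_AO_ID, AA.SESSION_ID, AA.STUDENT_ID
    HAVING SUM(AA.ACTIVITY_COUNT) > 0;

    -- Step 2: the rolling total is then just a count of students whose
    -- first activity falls on or before each summary row's session date.
    UPDATE OA
    SET NEW_REGISTRATION_COUNT =
        (SELECT COUNT(*)
         FROM #FIRST_ACTIVITY F
         WHERE F.ADMREP_ID  = OA.ADMREP_ID
           AND F.SESSION_ID = OA.SESSION_ID
           AND F.FIRST_ACTIVITY_DT <= OA.SESSION_DT)
    FROM F_BI_ADMISSIONS_OFFICER_SUMMARY_BY_SESSION_RT_CURRENT AS OA;
    ```

    The point of the split is that the expensive three-table join and
    grouping run once over the detail table, instead of once per summary
    row.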

    My relationships and dependencies are housed in my ETL tool.
    // Connection: Target

    CREATE TABLE
    "dbo"."F_BI_Admissions_Officer_Summary_By_Session_RT_Current"
    (
    "AO_SUMMARY_KEY" VARCHAR(43) NULL,
    "ADMREP_ID" VARCHAR(15) NULL,
    "ADMREP_SKEY" BIGINT NULL,
    "ADMREP_CAPTION" VARCHAR(20) NULL,
    "ADMREP_CAMPUS_ID" VARCHAR(10) NULL,
    "ADMREP_CAMPUS_CAPTION" VARCHAR(30) NULL,
    "ADMREP_REGION_CAPTION" VARCHAR(20) NULL,
    "ADMREP_TEAM_CAPTION" VARCHAR(57) NULL,
    "ADMREP_EXPIRED_FLAG" VARCHAR(5) NULL,
    "SESSION_ID" CHAR(6) NULL,
    "SESSION_DAY" INTEGER NULL,
    "SESSION_DT" DATETIME NULL,
    "LEAD_COUNT" INTEGER NULL,
    "INTERVIEW_COUNT" INTEGER NULL,
    "APPLICATION_COUNT" INTEGER NULL,
    "LY_SESSION_ID" VARCHAR(10) NULL,
    "LY_LEAD_COUNT" SMALLINT NULL,
    "LY_INTERVIEW_COUNT" SMALLINT NULL,
    "LY_APPLICATION_COUNT" SMALLINT NULL,
    "LY_ALL_REGISTRATION_COUNT" SMALLINT NULL,
    "NEW_READMIT_REGISTRATION_COUNT" SMALLINT NULL,
    "ALL_REGISTRATION_COUNT" INTEGER NOT NULL,
    "NEW_REGISTRATION_COUNT" INTEGER NOT NULL,
    "READMIT_REGISTRATION_COUNT" INTEGER NOT NULL,
    "CONTINUING_REGISTRATION_COUNT" INTEGER NOT NULL,
    "CANCELLED_REGISTRATION_COUNT" INTEGER NOT NULL,
    "LY_NEW_REGISTRATION_COUNT" SMALLINT NULL,
    "LY_READMIT_REGISTRATION_COUNT" SMALLINT NULL,
    "LY_CONTINUING_REGISTRATION_COUNT" SMALLINT NULL,
    "LY_CANCELLED_REGISTRATION_COUNT" SMALLINT NULL,
    "TALISMA_CALLS_IN" SMALLINT NULL,
    "TALISMA_CALLS_OUT" SMALLINT NULL,
    "AVAYA_CALLS_IN" SMALLINT NULL,
    "AVAYA_CALLS_OUT" SMALLINT NULL,
    "LEAD_FORECAST" SMALLINT NULL,
    "INTERVIEW_FORECAST" SMALLINT NULL,
    "APPLICATION_FORECAST" SMALLINT NULL,
    "NEW_REGISTRATION_FORECAST" SMALLINT NULL,
    "READMIT_REGISTRATION_FORECAST" SMALLINT NULL,
    "CONTINUING_REGISTRATION_FORECAST" SMALLINT NULL
    )
    ;

    // Connection: Target

    CREATE TABLE "dbo"."F_BI_Registration_Tracking"
    (
    "UNIQUE_KEY" VARCHAR(87) NULL,
    "REGISTRATION_KEY" VARCHAR(40) NULL,
    "REGTRACK_ID" VARCHAR(47) NULL,
    "CLASSES_OFFERED_ID" VARCHAR(23) NULL,
    "STUDENT_ID" CHAR(20) NULL,
    "SESSION_ID" CHAR(6) NULL,
    "FULL_CLASS_ID" VARCHAR(15) NULL,
    "CAMPUS_ID" VARCHAR(10) NULL,
    "ACTIVITY_DT" DATETIME NULL,
    "ACTIVITY_CODE" VARCHAR(1) NULL,
    "LOAD_DT" DATETIME NULL,
    "SOURCE" VARCHAR(4) NULL,
    "ACTIVITY_COUNT" SMALLINT NULL,
    "CLASS_DROP_DT" DATETIME NULL,
    "CLASS_DROP_COUNT" SMALLINT NULL,
    "CLASS_ADD_DT" DATETIME NULL,
    "CLASS_ADD_COUNT" SMALLINT NULL,
    "BEFORE_D0_CLASS_COUNT" SMALLINT NULL,
    "DAY0_CLASS_COUNT" SMALLINT NULL,
    "AFTER_D0_CLASS_COUNT" SMALLINT NULL,
    "ALL_CLASS_COUNT" SMALLINT NULL,
    "BEFORE_D0_ONLINE_CLASS_COUNT" SMALLINT NULL,
    "DAY0_ONLINE_CLASS_COUNT" SMALLINT NULL,
    "AFTER_D0_ONLINE_CLASS_COUNT" SMALLINT NULL,
    "ALL_ONLINE_CLASS_COUNT" SMALLINT NULL,
    "VM_POS" BIGINT NULL,
    "SOURCE_FILE" SMALLINT NULL,
    "BANDED_ID" BIGINT NULL,
    "CLASSES_OFFERED_SKEY" BIGINT NULL,
    "STUDENT_SKEY" BIGINT NULL,
    "SESSION_SKEY" BIGINT NULL,
    "CLASS_CAMPUS_SKEY" BIGINT NULL,
    "COMPUTED_DT" DATETIME NULL,
    "SESSION_DAY" BIGINT NULL,
    "Count_Down" BIGINT NULL,
    "AFTER_DAY0_FIRST_REG_FLAG" SMALLINT NULL
    )
    ;

  • --CELKO--

    #2
    Re: Update - can I do better statement?

    This is a mess. I would start over and get it right.

    1) Why do you use all uppercase letters? It is a screaming bitch to
    read, and we did this because we had no choice in the 1950s with
    punch cards.
    2) Why do you have no keys in any table? Most columns are NULL-able, a
    sure sign that you have a rotten data model (if you have one at all).
    In fact you have more NULLs than the entire database for a major
    automobile company.
    3) Why are you storing computable totals in a table? That is how we
    did it with file systems; in SQL, we use a VIEW. Or we use a front-end
    report application.
    4) Why did you pick the unreadable alphabetic aliases? That is, why
    would CC remind the reader that the base table is D_BI_STUDENT?
    5) Why did you use a "_key" suffix? That tells us HOW an attribute
    is used, not what the data element IS. Also, think about how stupid
    "unique_key" is - are there any non-unique keys???
    6) Is "curr_ind" actually a flag? We do not write with assembly-level
    bit flags in SQL. It is redundant in a properly designed schema;
    the status is computed from the current state of the basic data.
    7) Why did you use so many proprietary data types?
    8) Why did you mix summary data and base data in the same table? Do
    you know what 1NF and the definition of a table are?
    9) Having both "session_dt" and "session_day" is a bit redundant.
    10) Why are you keeping captions in a table with the data? A caption
    is part of a display, and display is done in the front end, never in
    the database.
    11) Why are so many of your codes VARCHAR(n)? Most codes are fixed
    length.
    12) It looks like you need to split up your tables and normalize the
    schema. Think about the basic entities with which you are working -
    students, candidates, admissions personnel, etc. - and then, once
    those are in tables, think about their relationships.


    • rcamarda

      #3
      Re: Update - can I do better statement?

      Celko, sometimes I cringe when I see your replies.
      1. I use an ETL tool; it creates the DDL, and I just copied and
      pasted. My extreme apologies.
      2. I use an ETL tool; it handles the relationships and data
      validation. I haven't put those in the tables and have relied on
      validation within the tool.
      3. This is a data warehouse, not an operational system. I build the
      data for speedy reporting, not data entry.
      4. Valid point; I need better hygiene with aliases.
      5. This might be a weakness in my SQL skills. Example: the business
      keys for the table I'm trying to update are ADMREP_ID, SESSION_ID and
      SESSION_DAY. I concatenate them into a single field so I have a
      unique key. I guess this would be my primary key.
      6. Yes, it is a Y/N field. Tables that are slowly changing dimensions
      have start_date, end_date, update_date and curr_ind. Y is the current
      record as it is in the operational system.
      7. This is an area where I would like to have more control within the
      ETL tool. It looks at the data and determines what type to create. Of
      course I could create the tables ahead of time and not let the tool
      create them for me.
      8. The table I am trying to update is the summary table. Not sure
      what you mean.
      9. Yes, if this were an operational database. This is a data
      warehouse for reporting; I want to keep the on-the-fly computations
      to a minimum.
      Also, the session_day is computed by taking the current date minus
      the last day to add/drop. This allows us to compare a given day
      session by session: "How does day 0 for session 200703 compare to
      day 0 for session 200603?" Day 0 (the last day to add or drop
      classes) varies year to year and session to session.
      10. This data is for a data warehouse. I have 3 databases: SOURCE,
      STAGING and TARGET. Staging is as you describe: my admissions officer
      dimension has the keys and captions, while the student table has the
      admissions officer key but no other information about the admissions
      officer. When I build the final data in Target, I have a lot of
      redundancy and pull captions into the final tables.
      11. Good point; I've meant to convert those to CHAR.
      12. Data warehouse - final data is de-normalized. (The vendor of our
      lead management system says the same thing, and I remind them that I
      am not building data for data entry but for reporting. This is also
      why I can smoke their home-grown reporting with my data and Cognos
      tools.)


      Thanks for the constructive suggestions, Celko. I've learned a lot on
      how I might change the UPDATE to run faster... no, wait a minute, you
      just finished berating me and came just short of calling me stupid...
      Never mind.




      • Ed Murphy

        #4
        Re: Update - can I do better statement?

        Do you have indexes on all appropriate columns?

        Which table contains STUDENT_STATUS?

        Does the join to D_BI_STUDENT need to be left outer? What about the
        join to F_BI_STUDENT_SUMMARY_SESSION? If so, can you eliminate the
        need by adding dummy rows (with some sort of this-is-a-dummy-row
        indicator) to those tables?

        Can you replace this form (boiled down from the original):

        SELECT COUNT(DISTINCT STUDENT_ID) FROM (
        SELECT SESSION_ID, STUDENT_ID, STUDENT_STATUS
        FROM ...
        GROUP BY SESSION_ID, STUDENT_ID, STUDENT_STATUS
        HAVING SUM(ACTIVITY_COUNT) > 0
        )

        with this?

        SELECT COUNT(*) FROM (
        SELECT STUDENT_ID
        FROM ...
        GROUP BY STUDENT_ID
        HAVING SUM(ACTIVITY_COUNT) > 0
        )


        • rcamarda

          #5
          Re: Update - can I do better statement?

          I have indexes built on all fields I use in WHERE clauses or
          joins. Engine tuning could not recommend improvements.
          STUDENT_STATUS lives in the F_BI_STUDENT_SUMMARY_SESSION table;
          ADMREP_ID lives in D_BI_STUDENT.
          You're right about the left outer joins; inner joins would work. I
          should only have registrations for students that exist in the
          student dimension.
          I'll do some testing using the inner join instead.

          Your suggestion for the new SQL: is your point that since I am
          using a WHERE clause, it is redundant to have the additional
          GROUP BYs?
          Thanks for your input.
          Rob



          • Ed Murphy

            #6
            Re: Update - can I do better statement?

            rcamarda wrote:
            Your suggestion for the new SQL: is your point that since I am
            using a WHERE clause, it is redundant to have the additional
            GROUP BYs?
            That, and also
            GROUP BY STUDENT_ID
            HAVING SUM(ACTIVITY_COUNT) > 0
            will pick up the same students as
            GROUP BY SESSION_ID, STUDENT_ID, STUDENT_STATUS
            HAVING SUM(ACTIVITY_COUNT) > 0
            (assuming that ACTIVITY_COUNT is never negative).


            • Erland Sommarskog

              #7
              Re: Update - can I do better statement?

              Query tuning is rarely a trivial game, and when you don't have
              the complete picture it is even more difficult. You posted two
              of the three tables, but you did not include any indexes.

              And it would certainly have helped if you had posted a working
              query. When I add:

              CREATE TABLE D_BI_STUDENT(
              STUDENT_ID char(20) NULL,
              CURR_IND char(1) NULL,
              TALISMA_AO_ID varchar(15) NULL,
              STUDENT_STATUS varchar(19) NULL
              )
              go

              and fix your inconsistent usage of upper- and lowercase, the
              query dies with:

              Msg 209, Level 16, State 1, Line 12
              Ambiguous column name 'ACTIVITY_DT'.
              Msg 209, Level 16, State 1, Line 16
              Ambiguous column name 'ACTIVITY_COUNT'.

              So all I know is that you have a query that is slow, and two
              tables without their indexes, but I don't even know the query.

              I can make two reflections on the query as posted:

              1) The self-join to F_BI_Registration_Tracking with alias BB
              serves no purpose at all. If it appears in the real query,
              there may be some performance to win by removing it.

              2) The outer join to D_BI_STUDENT is in practice an inner
              join, since you have the condition STUDENT_STATUS = 'NEW',
              which appears to relate to this table. I don't know what the
              intention is, but you should probably either move this
              condition to the ON clause, or make the outer join an inner
              join. Not that I think it will boost performance much, but it
              makes the query clearer.

              As for performance, rolling totals are a difficult thing,
              since SQL Server has no query construct to support this well.
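              (Editorial note on that last point: this was true at the time;
              SQL Server 2012 later added ordered window aggregates, which
              express a running total directly. A sketch, where DAILY_COUNTS
              is a hypothetical pre-aggregated table of one row per
              officer/session/day - an illustration only:)

              ```sql
              -- Hypothetical: DAILY_COUNTS(ADMREP_ID, SESSION_ID,
              -- SESSION_DT, DAY_COUNT). The ordered window SUM turns the
              -- per-day counts into a running total in one pass, with no
              -- correlated subquery.
              SELECT ADMREP_ID,
                     SESSION_ID,
                     SESSION_DT,
                     SUM(DAY_COUNT)
                       OVER (PARTITION BY ADMREP_ID, SESSION_ID
                             ORDER BY SESSION_DT
                             ROWS UNBOUNDED PRECEDING) AS RUNNING_COUNT
              FROM DAILY_COUNTS;
              ```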

              For further assistance, I suggest that you post the actual
              query you have now, and that you include the schema for all
              tables, as well as all indexes you have now. It also helps if
              you use aliases for all columns, and if you apply
              upper/lowercase consistently. (I run with a case-sensitive
              collation, and in general I recommend this for development.)

              And, of course, please state which version of SQL Server you
              are using.

              --
              Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

              Books Online for SQL Server 2005 at

              Books Online for SQL Server 2000 at


              • --CELKO--

                #8
                Re: Update - can I do better statement?

                >> 1. I use an ETL tool; it creates the DDL, and I just
                copied and pasted. <<

                There are some open-source "pretty printers" for SQL, which
                would be good to have around the shop. So many of the
                packages generate code for a compiler to use and not for a
                human being to read.

                >> 3. This is a data warehouse, not an operational
                system. <<

                Can you get a DW package, which would handle this stuff much
                better than an SQL aimed at OLTP? Or will the boss scream
                about the budget? DW is a really different game, and that
                explains some of your code decisions.

                >> 5. The business keys for the table I'm trying to update
                are ADMREP_ID, SESSION_ID and SESSION_DAY. I concatenate
                them into a single field [sic] so I have a unique key. <<

                Here we are back to the conceptual problems. Columns are not
                fields, nothing like them at all. The concatenation is a bad
                idea and you should use a three-column key instead.
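                A sketch of that three-column key against the posted DDL
                (untested; the key columns must first be made NOT NULL, and
                the constraint name here is made up):

                ```sql
                -- A PRIMARY KEY requires NOT NULL columns, so tighten the
                -- three business-key columns first.
                ALTER TABLE dbo.F_BI_Admissions_Officer_Summary_By_Session_RT_Current
                  ALTER COLUMN ADMREP_ID VARCHAR(15) NOT NULL;
                ALTER TABLE dbo.F_BI_Admissions_Officer_Summary_By_Session_RT_Current
                  ALTER COLUMN SESSION_ID CHAR(6) NOT NULL;
                ALTER TABLE dbo.F_BI_Admissions_Officer_Summary_By_Session_RT_Current
                  ALTER COLUMN SESSION_DAY INTEGER NOT NULL;

                -- Then declare the composite key instead of concatenating
                -- the three values into one column.
                ALTER TABLE dbo.F_BI_Admissions_Officer_Summary_By_Session_RT_Current
                  ADD CONSTRAINT PK_AO_Summary
                  PRIMARY KEY (ADMREP_ID, SESSION_ID, SESSION_DAY);
                ```

                As a side benefit, the clustered index this creates is
                exactly the access path the UPDATE's correlated lookup
                needs.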
                >> 6. Yes, it is a Y/N field. Tables that are slowly
                changing dimensions have start_date, end_date, update_date
                and curr_ind. Y is the current record [sic] as it is in the
                operational system. <<

                You can get that fact from the dates, can't you? There
                should be no need for flags in a DW. Also, that implies that
                the DW is getting live feeds from operations. Is that true?

                >> 7. This is an area where I would like to have more
                control within the ETL tool. It looks at the data and
                determines what type to create. Of course I could create the
                tables ahead of time and not let the tool create them for
                me. <<

                That sounds like the answer. The cost of casting, the chance
                for errors, etc. is just too much. Can the ETL tool load the
                tables without having to immediately re-index them as it
                goes?

                >> Thanks for the constructive suggestions, Celko. I've
                learned a lot on how I might change the UPDATE to run
                faster... no, wait a minute, you just finished berating me
                and came just short of calling me stupid... Never mind. <<

                LOL!!


                • othellomy@yahoo.com

                  #9
                  Re: Update - can I do better statement?

                  with 50K rows, this takes about 5 minutes. However I have
                  almost 9,000,000 rows of history. (15 years of 4 quarter
                  sessions). If this build is linear, it would take 1.25
                  hours.
                  Your UPDATE clause is way too complex, and I think you
                  would be very lucky to get an answer in 1.5 hours for 9
                  million records (even with all the indexes in the world).
                  My best estimate is that it will take more than 24 hours,
                  which is unrealistic.


                  • othellomy@yahoo.com

                    #10
                    Re: Update - can I do better statement?

                    with 50K rows, this takes about 5 minutes. However I
                    have almost 9,000,000 rows of history. (15 years of 4
                    quarter sessions). If this build is linear, it would
                    take 1.25 hours.
                    Maybe you should do a linear build if you can; then you
                    can add up the counts into a variable. It's not a nice
                    solution, but it just might work.


                    • Tony Rogerson

                      #11
                      Re: Update - can I do better statement?

                      Can you get a DW package, which would handle this
                      stuff much better than an SQL aimed at OLTP? Or will
                      the boss scream about the budget? DW is a really
                      different game, and that explains some of your code
                      decisions.
                      As usual you have not done your research on this.
                      Simply by installing Microsoft SQL Server you will
                      quickly realise (it is part of the set-up) that "SQL
                      Server" is actually a name given to the group of data
                      engine offerings from Microsoft: the relational
                      engine, the dimensional/aggregation/data mining engine
                      (Analysis Services), ETL (Integration Services), and
                      messaging (Service Broker).

                      It is a fundamental principle of a good consultant to
                      do their research; for most of us it comes naturally,
                      for you it obviously doesn't.

                      --
                      Tony Rogerson
                      SQL Server MVP
                      http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL
                      Server Consultant
                      http://sqlserverfaq.com - free video tutorials


