Combining data from different Excel sheets

This topic is closed.
  • Ilan

    Combining data from different Excel sheets

    Hi all
    I need to add data from two Excel sheets (both on the same workbook)
    to an existing table in my SQL DB.
    The problem is that each sheet holds different fields for the same
    record, though the records are sorted by row numbers.
    (I had to split the fields to different sheets because Excel has a
    limit of 256 fields in each sheet)
    My sheets are quite large (~55,000 rows and 200 columns each) and I'll
    have to repeat this action many times, so some kind of automation
    would be nice (VB, a DTS package?)

    I am new to SQL Server, so an example would be most helpful.

    Many thanks,
    Ilan
  • Simon Hayes

    #2
    Re: Combining data from different Excel sheets


    "Ilan" <levyi@walla.co.il> wrote in message
    news:78b6d73b.0402260720.530dac71@posting.google.com... [quoted text snipped]

    I don't really understand your description, but you might look at using DTS
    to load the data into staging tables, then use T-SQL to move the data to the
    production tables. That's a common general approach, but without more
    specific details of what you're trying to do, perhaps including a more exact
    description of the data and the destination table structure, it's not easy
    to be more precise.

    Simon



    • Ilan

      #3
      Re: Combining data from different Excel sheets

      Simon,
      Sorry for not being clear enough.
      What I have are two sheets, each holding 200 columns (fields 1-200 at
      the first and fields 201-400 at the second), where each row (or
      record) starts at the first sheet and continues at the second. All the
      data is numeric.
      What I need to do is somehow concatenate the data on the corresponding
      rows from both sheets and (eventually) append them to an existing table
      in MSSQL.
      I could try to export each sheet as a separate CSV file and then work
      with them as text files with some DOS manipulation, but since they are
      very long (50,000 rows) that doesn't look any easier.
      The data in the sheets is arranged according to the target SQL table,
      that is, columns 1-200 in the first sheet are fields 1-200 in the
      table, and columns 1-200 in the second sheet are fields 201-400.
      I am new to MSSQL so I don't know how to work with the tables within
      SQL, but if you show me the path, I'll follow your leads.

      Thanks,
      Ilan

      "Simon Hayes" <sql@hayes.ch> wrote in message news:<403e40b5$1_3@news.bluewin.ch>... [quoted text snipped]


      • DCM Fan

        #4
        Re: Combining data from different Excel sheets

        Here's how I might go about this:

        1) Create the 400-field table on SQL Server. (I don't have an easy way to do that
        w/o knowing the data.) If you don't have real field names, then use Fld001,
        Fld002, Fld003, ... Fld400.

        2) Add a 401st field, RowID, int NOT NULL

        3) Open your Excel workbook, and put the formula "=ROW(A1)" in cell A1 on both
        sheets, and fill it down.

        4) Insert a row at row 1, and put in the same field names you had in your SQL
        table: RowID, Fld001, Fld002, etc.

        5) Place that Excel workbook either on the SQL Server directly, or on a machine
        that the SQL Server can access w/ Admin privs.

        6) Import the data (by right-clicking the destination table and choosing All
        tasks>import data). When the wizard comes up, choose Excel file as source, and
        the destination.

        In the next wizard screen, choose Use a query to copy data.

        In the next screen, add Sheet1 and Sheet2, all fields.

        In the next screen (I think it's the sort screen), you can blow by it.

        In the next screen, specify the condition as Sheet1.RowID = Sheet2.RowID

        The next screen you tell it which table to append to. Pick the destination
        table, and finish (run it!)

        That should do it...assuming your destination table and your Excel file have
        field names exactly the same.
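        Since Ilan mentioned needing to repeat this load many times, steps 2-4
        above (the RowID column and the header row) can also be scripted instead
        of done by hand in Excel. Below is a minimal, hypothetical sketch in
        Python, assuming each sheet has first been saved as a headerless CSV
        export; the function name and the Fld001-style names mirror the ones
        suggested in step 1 and are otherwise invented:

```python
import csv

def add_rowid_and_header(src_path, dst_path, first_field_no):
    """Prepend a RowID column and a header row to a raw CSV export.

    first_field_no: number of the first data field on this sheet
    (1 for the sheet holding Fld001-Fld200, 201 for the second sheet).
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        first_row = next(reader)          # peek at row 1 to count the columns
        header = ["RowID"] + ["Fld%03d" % (first_field_no + i)
                              for i in range(len(first_row))]
        writer.writerow(header)
        writer.writerow([1] + first_row)  # row 1 gets RowID 1
        for rowid, row in enumerate(reader, start=2):
            writer.writerow([rowid] + row)

# Hypothetical usage for the two 200-column sheets:
# add_rowid_and_header("sheet1.csv", "sheet1_prepped.csv", 1)
# add_rowid_and_header("sheet2.csv", "sheet2_prepped.csv", 201)
```

        The same function handles both sheets; only the starting field number
        differs, so the headers line up with the destination table's columns.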


        • Simon Hayes

          #5
          Re: Combining data from different Excel sheets

          levyi@walla.co.il (Ilan) wrote in message news:<78b6d73b.0402261607.49674ae7@posting.google.com>... [quoted text snipped]

          First of all, if you have a 400 column table, then you should probably
          review your data model - there aren't many situations where there is a
          genuine need for such a table. In fact, depending on what your data
          is, you might find that you can't do this anyway, since you are
          limited to 8060 bytes per row.

          But if you really do need to do this, then one possible solution is
          along the following lines:

          1. Add a column in Excel to identify each row. Make sure that this
          correctly 'links' the data on each sheet, ie. row 1 on sheet A is
          'continued' on row 1 on sheet B.

          2. Create two staging tables, each with 201 columns, for the ID column
          plus the 200 data columns.

          3. Load the sheets into the staging tables with DTS

          4. Insert the data into the final table:

          INSERT INTO dbo.Destination
          SELECT a.col1, a.col2, ..., a.col200, b.col201, b.col202, ...,
          b.col400
          FROM dbo.StagingA a
          JOIN dbo.StagingB b
          ON a.RowID = b.RowID

          However, I strongly suggest you review your data model, for the
          reasons I mentioned above.

          Simon
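          Steps 2-4 can be illustrated end to end. The sketch below uses
          Python's built-in sqlite3 as a stand-in for SQL Server, so it runs
          anywhere; the table and column names are invented, and the real
          staging tables would carry 200 data columns each rather than the
          two shown:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Step 2: two staging tables, each RowID plus its slice of the data columns.
cur.execute("CREATE TABLE StagingA (RowID INTEGER, col1 REAL, col2 REAL)")
cur.execute("CREATE TABLE StagingB (RowID INTEGER, col3 REAL, col4 REAL)")
cur.execute("CREATE TABLE Destination (col1 REAL, col2 REAL, col3 REAL, col4 REAL)")

# Step 3: in practice DTS loads the sheets; here we insert sample rows.
cur.executemany("INSERT INTO StagingA VALUES (?, ?, ?)",
                [(1, 1.1, 1.2), (2, 2.1, 2.2)])
cur.executemany("INSERT INTO StagingB VALUES (?, ?, ?)",
                [(1, 1.3, 1.4), (2, 2.3, 2.4)])

# Step 4: join the two halves of each record on RowID and append to the target.
cur.execute("""
    INSERT INTO Destination
    SELECT a.col1, a.col2, b.col3, b.col4
    FROM StagingA a JOIN StagingB b ON a.RowID = b.RowID
""")
conn.commit()
print(cur.execute("SELECT * FROM Destination ORDER BY col1").fetchall())
```

          The INSERT ... SELECT with the join on RowID is the same statement
          Simon gives above, just against toy tables.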


          • Ilan

            #6
            Re: Combining data from different Excel sheets

            sql@hayes.ch (Simon Hayes) wrote in message news:<60cd0137.0402270117.1d1202fd@posting.google.com>... [quoted text snipped]

            Simon,
            I spent the last weekend trying to get Excel to work with this amount
            of data, without success. I guess I've reached its limits (there is a
            good description of Excel memory limits at
            http://www.decisionmodels.com/memlimitsc.htm , FYI).
            So I decided I'll export the data to 10 separate CSV files (each
            holding ~50 columns) and then use another application (e.g. MATLAB)
            to join them back together. Another option is to run all the
            calculations in MATLAB.
            As for the data, what I have is meteorological data every 5 minutes
            for 6 months (hence the 52,560 rows). From this data I'm calculating 4
            other parameters for every 1 hour, 2 hours, ..., 96 hours (hence the
            4x96 plus 20 original columns, giving 404 columns). Then I have a
            total of 4 years of data for each meteorological station (yielding
            420,480 rows) that needs to be multiplied by 15 meteorological
            stations. Altogether I have over 6,000,000 records and 400 fields to
            analyze. The only solution I can think of for this amount of data is
            to put it in an SQL DB and run queries that will be analyzed with a
            statistical program (e.g. SPSS, JMP or another). For the record, this is
            part of my PhD thesis on air pollution and meteorology in a coastal
            region.
            If anyone has any experience analyzing this amount of data that they
            are willing to share, I'll be happy to learn.

            Ilan
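            For what it's worth, the 10-CSV plan doesn't strictly require
            MATLAB: since the exported files line up row by row, a short script
            can paste them together side by side. A hypothetical sketch in
            Python (the function and file names are invented):

```python
import csv

def stitch_csvs(input_paths, output_path):
    """Concatenate CSV files column-wise, pairing line N of every file."""
    files = [open(p, newline="") for p in input_paths]
    readers = [csv.reader(f) for f in files]
    try:
        with open(output_path, "w", newline="") as out:
            writer = csv.writer(out)
            # zip stops at the shortest file; with equal-length exports
            # every row of every part file is consumed.
            for parts in zip(*readers):
                combined = []
                for part in parts:
                    combined.extend(part)
                writer.writerow(combined)
    finally:
        for f in files:
            f.close()

# Hypothetical usage for the ten ~50-column exports:
# stitch_csvs(["part01.csv", "part02.csv", "part03.csv"], "combined.csv")
```

            The result is one wide CSV that can then be bulk-loaded into the
            destination table in a single pass.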


            • DCM Fan

              #7
              Re: Combining data from different Excel sheets

              Ilan,

              You didn't say if the data load procedure that I suggested (and Simon
              reiterated, almost word-for-word) got your data into SQL Server.

              Now you seem to be asking a different question, like "Gee, what tool should I
              be using for this data?"

              We can't make that decision unless we know exactly what kinds of calculations
              you plan to do. If you're doing complex math analysis (using advanced
              statistical models, complex numbers, or hyperbolic trigonometry), then you're
              probably better off with a product like MATLAB.

              If you're just doing normal stats (Avg, StDev, basic trig), then SQL Server
              will shine.

              Now, about your data model:

              <<As for the data, what I have is meteorological data every 5 minutes
              for 6 months (hence the 52,560 rows). For this data I'm calculating 4
              other parameters for every 1 hour, 2 hours, ..., 96 hours (hence the
              4x96 plus 20 original columns, giving 404 columns>>

              OK, so you have 20 values coming in every 5 minutes, and then 4 calculated
              values every hour up to 96 hours. I have to assume that your 4 calculated
              values are "rolling" calculations; otherwise, your model would contain 4x96
              duplicate columns of data every 12 rows...

              ...do I have it right so far?

              Without knowing how the data is going to be used, it's hard to suggest a good
              data model. Maybe the one-table design is good because you know you're always showing
              all 400+ fields in your reports! (I don't know how any human could
              intelligently parse such printed data, but meteorologists might be special...)

              More likely, you're pulling reports like this: Show the 24/48/72/96 hour trends
              for these 3 locations from May to August for [special calculation 1 of the 4].
              That's a 12-cell report...much more manageable, useful and meaningful. If
              that's more like how the data will be used, then you might break the massive
              table into 5 tables:

              Table1: The 20 fields
              Table2: 96 fields for the first calc (+ the key field(s) from Table1)
              Table3: 96 fields for the 2nd calc (+ the key field(s) from Table1)
              Table4: 96 fields for the 3rd calc (+ the key field(s) from Table1)
              Table5: 96 fields for the 4th calc (+ the key field(s) from Table1)

              Another approach would be to ditch the 96-field idea altogether, instead using
              4x96 RECORDS in a related table, using sound relational models, just in case
              your 4 calc fields become 5 some day! And what if your 20 fields become 21,
              22?

              That might be ultra-ugly in your case, because you'd have a parent table with 6
              million records, and a child table with 2.4 BILLION records. (Each of the 6 million
              would have ~400 child records.)

              I only suggest that as a matter of completeness... a true relational approach
              probably ISN'T what you need. Technically, the relational model dictates that
              you'd have 120,000,000 Measurement records (6 Mil x 20), with everything else
              (the 4x96) being calculated as needed! That's the absolute most-flexible way to
              do this, because it doesn't care how many original (20) or calc (4) fields
              there are.

              But it might also be the SLOWEST execution. (if your calcs are just averages,
              that's not a problem)
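              The narrow, fully relational layout described above, with the
              derived values computed on demand, can be shown with a toy
              example. This is a hypothetical sqlite3 sketch; the schema, the
              names, and the plain average all stand in for the real
              parameters and calculations:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One narrow table: one record per station, timestamp and measured parameter.
cur.execute("""
    CREATE TABLE Measurement (
        StationID  INTEGER,
        MeasuredAt TEXT,     -- ISO timestamp of the 5-minute sample
        ParamID    INTEGER,  -- which of the ~20 measured parameters
        Value      REAL
    )
""")
cur.executemany(
    "INSERT INTO Measurement VALUES (?, ?, ?, ?)",
    [(1, "2004-03-01 00:00", 7, 10.0),
     (1, "2004-03-01 00:05", 7, 12.0),
     (1, "2004-03-01 00:10", 7, 14.0)])

# A derived value (here just an average over a time window) is computed
# when asked for, instead of being stored in one of 4x96 extra columns.
avg = cur.execute("""
    SELECT AVG(Value) FROM Measurement
    WHERE StationID = 1 AND ParamID = 7
      AND MeasuredAt BETWEEN '2004-03-01 00:00' AND '2004-03-01 00:10'
""").fetchone()[0]
print(avg)
```

              Adding a 21st parameter or a 5th calculation changes only the
              data and the queries, never the schema, which is the flexibility
              being described, at the cost of recomputing on every query.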

              Anyway, I'm just throwing all of this out as food for thought. This much I
              know:

              SQL Server can certainly load the data from those Excel files...


              • Ilan

                #8
                Re: Combining data from different Excel sheets

                dcmfan@aol.comSPNOAM (DCM Fan) wrote in message news:<20040301041339.02011.00000734@mb-m19.aol.com>...
                > Ilan,
                >
                > You didn't say if the data load procedure that I suggested (and Simon
                > reiterated, almost word-for-word) got your data into SQL Server.

                I did try your procedure with a sample file and it works great,
                except for two problems: First, Excel couldn't handle the calculations
                on files this size (it stopped responding), so obviously I cannot use
                it for my needs (I only found this limit AFTER posting my original
                message). Second, I have to repeat this action many times, so I'm
                hoping for a way to automate the process, like using VB.
                > Now you seem to be asking a different question, like "Gee, what tool should I
                > be using for this data?"
                >
                > We can't make that decision unless we know exactly what kinds of calculations
                > you plan to do. If you're doing complex math analysis (using advanced
                > statistical models, complex numbers, or hyperbolic trigonometry), then you're
                > probably better off with a product like MATLAB.
                >
                > If you're just doing normal stats (Avg, StDev, basic trig), then SQL Server
                > will shine.

                Well, this is why I wanted to use Excel in the first place. I am
                calculating some basic trig (sin, cos) and algebra, both on multiple
                rows (e.g. every hour) and on columns. This is easier to do on a
                spreadsheet or a matrix (hence MATLAB) than on a database.
                > Now, about your data model:
                >
                > <<As for the data, what I have is meteorological data every 5 minutes
                > for 6 months (hence the 52,560 rows). For this data I'm calculating 4
                > other parameters for every 1 hour, 2 hours, ..., 96 hours (hence the
                > 4x96 plus 20 original columns, giving 404 columns>>
                >
                > OK, so you have 20 values coming in every 5 minutes, and then 4 calculated
                > values every hour up to 96 hours. I have to assume that your 4 calculated
                > values are "rolling" calculations; otherwise, your model would contain 4x96
                > duplicate columns of data every 12 rows...
                >
                > ...do I have it right so far?

                Yes.
                > Without knowing how the data is going to be used, it's hard to suggest a good
                > data model. Maybe the one-table design is good because you know you're always showing
                > all 400+ fields in your reports! (I don't know how any human could
                > intelligently parse such printed data, but meteorologists might be special...)

                That's where 3D graphics come in, displaying the data in such a way
                that a human eye can SEE it and make some sense of it all.
                > More likely, you're pulling reports like this: Show the 24/48/72/96 hour trends
                > for these 3 locations from May to August for [special calculation 1 of the 4].
                > That's a 12-cell report...much more manageable, useful and meaningful. If
                > that's more like how the data will be used, then you might break the massive
                > table into 5 tables:
                >
                > Table1: The 20 fields
                > Table2: 96 fields for the first calc (+ the key field(s) from Table1)
                > Table3: 96 fields for the 2nd calc (+ the key field(s) from Table1)
                > Table4: 96 fields for the 3rd calc (+ the key field(s) from Table1)
                > Table5: 96 fields for the 4th calc (+ the key field(s) from Table1)

                Yes, I will also be making these kinds of analyses/queries on the data,
                but I'm not sure I understand why 5 tables are better than putting all
                the data in one. This is another issue where I know I need the advice of
                experts.
                > Another approach would be to ditch the 96-field idea altogether, instead using
                > 4x96 RECORDS in a related table, using sound relational models, just in case
                > your 4 calc fields become 5 some day! And what if your 20 fields become 21,
                > 22?

                Hmmm. Well, this is certainly food for thought, and I see I'll have
                to re-think the way I want to use the database and what kind of data
                I'll need to pull out of it.
                On second thought, this is one of the attractions of doing pure
                research: you don't always know what kind of answers you'll get, and
                what new questions those answers will raise.

                Many thanks,
                Ilan
                PS: I'll be doing this thinking next week in Paris, France, on
                vacation. Please excuse me if I don't post replies.


                • DCM Fan

                  #9
                  Re: Combining data from different Excel sheets

                  <<First, Excel couldn't handle the calculations
                  on files this size (it stopped responding), so obviously I cannot use
                  it for my needs (I only found this limit AFTER posting my original
                  message). Second, I have to repeat this action many times, so I'm
                  hoping for a way to automate the process, like using VB.>>

                  Oh, so you still need the Excel files, but you also need a way to load
                  historical data into SQL Server ultimately. I was under the impression that
                  Excel would go away completely....

                  <<Well, this is why I wanted to use Excel in the first place. I am
                  calculating some basic trig (sin, cos) and algebra, both on multiple
                  rows (e.g. every hour) and on columns. This is easier to do on a
                  spreadsheet or a matrix (hence MATLAB) than on a database.>>

                  Yes, easier for humans because of its 2-dimensional layout. Databases are more
                  3-D-ish, and a bit more difficult to grasp AT FIRST, but once you see the
                  light, you'll wish you had started in SQL. The programming language of SQL
                  Server (T-SQL) will give you every set operation you need...

                  <<That's where 3D graphics come in, displaying the data in such a way
                  that a human eye can SEE it and make some sense of it all.>>

                  No doubt... then you're obviously going to be using some other tool for that!

                  <<Yes, I will also be making these kinds of analyses/queries on the data,
                  but I'm not sure I understand why 5 tables are better than putting all
                  the data in one. This is another issue where I know I need the advice of
                  experts.>>

                  Well, if you're the only person who's going to use this tool, it might not
                  matter. But if you're going to build a web front-end where hundreds of users
                  can query the data and get reports, then it makes sense to break it up. If you
                  ever need data from all 5 tables, great, join them on the key fields. But if
                  you don't, then you can save processing time by limiting the domain to the
                  target calculations.

                  <<On second thought, this is one of the attractions of doing pure
                  research: you don't always know what kind of answers you'll get, and
                  what new questions those answers will raise.>>

                  But it seems that the technology is actually getting in the way of your
                  research...
