Converting Excel to Normalised Database

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • PhilOfWalton
    Recognized Expert Top Contributor
    • Mar 2016
    • 1430

    Converting Excel to Normalised Database

    I am trying to create a normalised database from a linked Excel file as a general purpose database.

    Suppose the Excel File is
    Code:
    Person Number, Name, Address Lines, City, State, Zip Code
    I have already created the following tables.
    Code:
    TblCities
        CityID       		AutoNumber  PK
        City          		Text
        StateID    		Long               FK
    
    TblPersons
        PersonID		AutoNumber  PK
        PerdonNo            Long
        PersonName	        Text
        AddressID		Long		FK
    
    TblStates
        StateID		AutoNumber  PK
        State		Text
    
    TblAddress
        AddressID		AutoNumber  PK
        AddressLines	Text
        ZipCode		Text
        CityID		Long		FK
    Note the arbitrary order of the above tables.

    I also have a table that gives the Excel field names that match the Access Table & Field Names.

    Now in order to populate them, I must do it in the correct order:-
    TblStates, TblCities, TblAddress, TblPersons

    I had hoped that MsysRelationshi ps would give me the correct information, but although I get the relevant table names & linking fields, the Table names seem to be arbitrarily in the szObject column or the szReferenceObje ct column.

    Any thoughts please.

    Phil
  • twinnyfo
    Recognized Expert Moderator Specialist
    • Nov 2011
    • 3653

    #2
    Hey Phili,

    Could you explain this one a bit better? I “thought” that if you have created the tables yourself (and established the relationships yourself) that all would be well during the import. I personally don’t care what MsysRelationshi ps says, because sometimes Access creates its own names—that can always be changed later on if we really need to.

    Is the data not importing correctly?

    Comment

    • PhilOfWalton
      Recognized Expert Top Contributor
      • Mar 2016
      • 1430

      #3
      Hi Twinnyfo,

      Sorry the explanation was not clear.

      I am trying to create a general purpose Db that can be "incorporat ed" into a client's Db to Import / Link one or more Excel files, Scan those files and put the data into the Client's existing tables.

      Now to do that, the tables have to be built up in a certain order, and the first ones to be populated must me those with no Foreign keys (end of a chain).

      Hence in the example above, nothing depends on the TblStates, but the TblCities relies on knowing the State that it is in (StateID).

      Likewise, the TblAddress needs the correct city, so the appropriate CityID must be established before the TblAddress can be populated.


      Here is is an initial attempt, but please ignore the last 3 columns - I think I can pull this information from MsysRelationshi ps.

      Having defined the details of the Excel file in an Access table (Folder, File and Sheet), I link the Excel File and call it "TblExcelPlant" . Also load the Date Last Modified. I also define the Excel Link Field in case there is more than 1 Excel file, and they all have a common unique field that potentially combines the Excel files.

      In the subform, the first column (Excel Field Name) lists all the field names found in the Excel File.
      The second column (Access Table Name) shows all the tables in the database (obtained from MsysObjects)
      The third column (Access Field Name) shows all the fields in the table defined in the second column.

      So we can say for example the field called "Project Manager" in the Excel table has to be loaded to the "LastName" in the TblEmployees.

      Hope that is clearer

      Phil




      Hope
      Attached Files

      Comment

      • twinnyfo
        Recognized Expert Moderator Specialist
        • Nov 2011
        • 3653

        #4
        So, now your project makes sense (which is amazing, as usual), but I am not exactly sure what the problem is, as things look as they should from your pic.

        perhaps an example of:
        the Table names seem to be arbitrarily in the szObject column or the szReferenceObje ct column.
        That might help?

        Comment

        • PhilOfWalton
          Recognized Expert Top Contributor
          • Mar 2016
          • 1430

          #5
          Sorry for the delay in getting back. Getting old is a pain.

          Here are two images

          and

          Can you see the logic?

          Phil
          Attached Files

          Comment

          • PhilOfWalton
            Recognized Expert Top Contributor
            • Mar 2016
            • 1430

            #6
            Replying to my own question, I see it now

            The szObject is the ∞ side of the relationship, and szReferencedObj ect is the 1 side.

            So I have to process the 1 sides before the ∞ sides.

            Take the example tblProjectEmplo yees which need 3 tables - tblYears, tblProjects and tblEmployees. tblYears depends on nothing, so that's not a problem. The other 2 tables have both the 1 side and the ∞ side in them, so I get back to the original question of "which order do I populate the tables? and how do I program that in?

            Phil

            Comment

            • Rabbit
              Recognized Expert MVP
              • Jan 2007
              • 12517

              #7
              Sounds like you need to recurse the relationships until you get to the bottom level nodes (all tables that that are referenced only and does not reference other tables) and then work back up.

              Comment

              • PhilOfWalton
                Recognized Expert Top Contributor
                • Mar 2016
                • 1430

                #8
                @Rabbit

                Not quite sure what you have in mind here. The only experience I have had, is recursively going through a folder to find sub folders, sub sub folders etc, but in that case we have a linear tree structure.

                In my case the twigs and branches get grafted back to the trunk or other twigs and branches. Quite incestuous.

                Can you be more specific, please

                Phil

                Comment

                • Rabbit
                  Recognized Expert MVP
                  • Jan 2007
                  • 12517

                  #9
                  In pseudocode, it would probably look something like this:

                  Code:
                  Sub RecurseRelationship(tableName)
                      arrayRelations = Get relations for tableName
                  
                      For Each item in arrayRelations
                          If item has a many on the tableName side and 1 on the otherTable
                              RecurseRelationship(otherTable)
                          End If
                      Next
                  
                      ' If it gets here, then it's found a bottom level table or a table that has no relationships
                      Call LoadTable(tableName)
                  End Sub
                  The actual code will probably have to account for situations that I didn't think of or include in the pseudocode. For example, you may need to keep an array of parsed tables so you're not thrown into some infinite loop of references.

                  Comment

                  • PhilOfWalton
                    Recognized Expert Top Contributor
                    • Mar 2016
                    • 1430

                    #10
                    Thanks, that That looks feasible.

                    To make things very simple, below is a relationship diagram of 4 tables, assume they all have primary keys and required Foreign keys.

                    Now we can see that we have to populate tables C & D first because it will not appear in the szObjects in MSysRelationshi ps:- There is only the one side of the relationship.

                    So then if we find the table pointed to by the relationship from Table D (Table A) and try to populate that, we will fail.

                    First we have to find the table pointed to by C (Table B), populate that, then find the table pointed to by C (back to A) which, providing we know that D has been populated, we can now populate Table A.

                    So your Pseudo code will probably find the bottom level (which as mentioned is easy to find directly in MSysRelationshi ps) and it might find the next level up, but as in this example, that may not be sufficient information.

                    I need to produce a query that looks something like this:-
                    Code:
                    [B]Table      Sequence[/B]
                       D          1
                       C          2
                       B          3
                       A          4
                    Although C & D could both have the same value or be reversed.

                    Any ideas please?

                    Phil
                    Attached Files

                    Comment

                    • Rabbit
                      Recognized Expert MVP
                      • Jan 2007
                      • 12517

                      #11
                      Should work fine, the code should follow down the left path or right path first, work its way back up, then go down the other path.
                      Last edited by Rabbit; Oct 13 '18, 01:56 AM.

                      Comment

                      • Rabbit
                        Recognized Expert MVP
                        • Jan 2007
                        • 12517

                        #12
                        How are you coming along?

                        Comment

                        • PhilOfWalton
                          Recognized Expert Top Contributor
                          • Mar 2016
                          • 1430

                          #13
                          Hi Rabbit.

                          It's hard going, but I think it's coming along OK

                          This is where I am so far:-
                          I can select a level, and if I have got it right, it shows the fields in a table, where the data comes from, and where it is used. Level 1 is the base line tables, and in the case of this DB there are 6 levels (ignore the 7 - now corrected)

                          In this picture I am selecting the field "Cost Account - Level 2" from the Excel table "TblExcelCo sts" as the equivalent of Field "CostDescriptio n" in the Acess Table "tblCostTyp es"

                          Other problems, but I am starting a new thread.

                          Thanks for your concern

                          Phil
                          Attached Files

                          Comment

                          Working...