Merge Records From Multiple Sources

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Steve

    Merge Records From Multiple Sources

    I have a people table of about 25 fields. The table is initially created each
    year from 5 different sources. The records from each source are appended to the
    people table. Any person may be in one or multiple sources. For those in
    multiple sources, the data for a person may not be complete in any source. For
    example, a person is in three sources; source#2 has the SSN and source#3 has the
    membership#. For those people who are in multiple sources, I need to merge the
    records of the person into one record that contains all the data for that person
    that is available in all the sources that contain that person. From the previous
    example, I need to add the membership# from source#3 to the record with the SSN#
    in source#2.

    I'm looking for some suggestions on a startegy for merging records.

    Thanks!

    Steve


  • John Vinson

    #2
    Re: Merge Records From Multiple Sources

    On Thu, 05 Feb 2004 01:40:23 GMT, "Steve" <santus@penn.co m> wrote:
    [color=blue]
    >I have a people table of about 25 fields. The table is initially created each
    >year from 5 different sources. The records from each source are appended to the
    >people table. Any person may be in one or multiple sources. For those in
    >multiple sources, the data for a person may not be complete in any source. For
    >example, a person is in three sources; source#2 has the SSN and source#3 has the
    >membership#. For those people who are in multiple sources, I need to merge the
    >records of the person into one record that contains all the data for that person
    >that is available in all the sources that contain that person. From the previous
    >example, I need to add the membership# from source#3 to the record with the SSN#
    >in source#2.
    >
    >I'm looking for some suggestions on a startegy for merging records.[/color]

    Update queries on joined tables will do it. Create a Query joining the
    two tables; put a criterion of IS NULL on the SSN in the table you
    want to update with the SSN; and update to

    [othertable].[SSN]

    or similarly for the other fields.

    The problem may be - this presumes that you have some field that you
    can count on to uniquely identify a person's record in all five
    tables! Do you have such a field? I would *not* recommend using names
    to do this; you could have two people who happen to have the same name
    (which Steve Jones do you mean, the janitor or the CFO?); or you could
    have a name in one table as "Steve Jones" and in the other as "Stephen
    Jones".

    John W. Vinson[MVP]
    Come for live chats every Tuesday and Thursday

    Comment

    • Steve

      #3
      Re: Merge Records From Multiple Sources

      John,

      Thanks for responding!

      My example over-simplified the problem. Suppose a person is in 4 sources. The
      record I want is:
      FName, LName, SSN, MembershipNum, SchoolDistrict, Position, City, State, Zip

      The available data in the sources are:
      #1: FName, LName, SSN, City, State, Zip
      #2: FName, LName, MembershipNum, City, State, Zip
      #3: FName, LName, SchoolDistrict, City, State, Zip
      #4: FName, LName, Position, City, State, Zip

      Steve


      "John Vinson" <jvinson@STOP_S PAM.WysardOfInf o.com> wrote in message
      news:56e3201837 fiscabjt2i4i9ot j1ks5i9pm@4ax.c om...[color=blue]
      > On Thu, 05 Feb 2004 01:40:23 GMT, "Steve" <santus@penn.co m> wrote:
      >[color=green]
      > >I have a people table of about 25 fields. The table is initially created each
      > >year from 5 different sources. The records from each source are appended to[/color][/color]
      the[color=blue][color=green]
      > >people table. Any person may be in one or multiple sources. For those in
      > >multiple sources, the data for a person may not be complete in any source.[/color][/color]
      For[color=blue][color=green]
      > >example, a person is in three sources; source#2 has the SSN and source#3 has[/color][/color]
      the[color=blue][color=green]
      > >membership#. For those people who are in multiple sources, I need to merge[/color][/color]
      the[color=blue][color=green]
      > >records of the person into one record that contains all the data for that[/color][/color]
      person[color=blue][color=green]
      > >that is available in all the sources that contain that person. From the[/color][/color]
      previous[color=blue][color=green]
      > >example, I need to add the membership# from source#3 to the record with the[/color][/color]
      SSN#[color=blue][color=green]
      > >in source#2.
      > >
      > >I'm looking for some suggestions on a startegy for merging records.[/color]
      >
      > Update queries on joined tables will do it. Create a Query joining the
      > two tables; put a criterion of IS NULL on the SSN in the table you
      > want to update with the SSN; and update to
      >
      > [othertable].[SSN]
      >
      > or similarly for the other fields.
      >
      > The problem may be - this presumes that you have some field that you
      > can count on to uniquely identify a person's record in all five
      > tables! Do you have such a field? I would *not* recommend using names
      > to do this; you could have two people who happen to have the same name
      > (which Steve Jones do you mean, the janitor or the CFO?); or you could
      > have a name in one table as "Steve Jones" and in the other as "Stephen
      > Jones".
      >
      > John W. Vinson[MVP]
      > Come for live chats every Tuesday and Thursday
      > http://go.compuserve.com/msdevapps?loc=us&access=public[/color]


      Comment

      • John Vinson

        #4
        Re: Merge Records From Multiple Sources

        On Thu, 05 Feb 2004 03:57:38 GMT, "Steve" <santus@penn.co m> wrote:
        [color=blue]
        >John,
        >
        >Thanks for responding!
        >
        >My example over-simplified the problem. Suppose a person is in 4 sources. The
        >record I want is:
        >FName, LName, SSN, MembershipNum, SchoolDistrict, Position, City, State, Zip[/color]

        Ok, let's call this MasterTable... what's its Primary Key?
        MembershipNum? If it has none... it's not a table, it's a random
        collection of data which cannot be updated.
        [color=blue]
        >The available data in the sources are:
        >#1: FName, LName, SSN, City, State, Zip
        >#2: FName, LName, MembershipNum, City, State, Zip
        >#3: FName, LName, SchoolDistrict, City, State, Zip
        >#4: FName, LName, Position, City, State, Zip[/color]

        Ok. Some painful questions here:

        Table1 has a record for Fred Smith, SSN 445-44-4444, in Poughkeepsie;
        and also a record for Fred Smith, SSN 454-44-4444, also in
        Poughkeepsie.

        Are they the same person? Is either one the same person as the Fred
        Smith, SSN unknown, in Schenectady in the master table? If so, which
        one? Or maybe it's the Frederick Smythe, SSN unknown, in Poughkeepsie

        You MUST - NO OPTION!!! - have some way of uniquely identifying which
        record is which. Names *are not such a method*. They're not stable,
        they're not unique, and they're not reliable.

        ASSUMING... and it's a heck of a big assumption - that the names are
        reliable in this case, what do you want to do if Table1 has one
        city/state/zip and Table2 a different City/State/Zip, and MasterTable
        hase these blank? Maybe the person moved; which address should Access
        use?

        ASSUMING... even bigger assumption... that every record has FName,
        LName, City, State and ZIP in all five tables, reliably and
        accurately, you can construct five update queries. Create a unique
        Index on these five fields in MasterTable. Create a Query joining
        MasterTable to Table1 on the five fields and update MasterTable.SSN to

        [Table1].[SSN]

        Run the query by clicking the ! icon.

        If you'll be doing this kind of exercise in the future, you may want
        to look into a technique called "Replicatio n". It's not for the faint
        of heart, but it's designed so that you can make five replicas of a
        database on five laptops, have people indpendently update all five,
        and synchronize the data back into one table. See
        http://www.trigeminal.com and search for Michka's Replication FAQ
        there, it's the definitive word on the subject.

        John W. Vinson[MVP]
        Come for live chats every Tuesday and Thursday

        Comment

        Working...