Match data using first 5 characters

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • Luvyourbag
    New Member
    • Mar 2015
    • 5

    Match data using first 5 characters

    I have two separate tables in access and I need to find providers whose address is the same in both tables. Because the address may be spelled slightly different between the two tables I would like to match on the first 5 characters. how can I perform a query that returns results for providers who have the same license and an address where the first 5 characters match?
  • Stewart Ross
    Recognized Expert Moderator Specialist
    • Feb 2008
    • 2545

    #2
    You can use the Left function to return the first five characters of a string (e.g.
    Left([YourField], 5)) but there are better ways to implement string matching, including the use of Soundex functions amongst other things. Try the use of Left at present, as all the other ways to do so are advanced and require the use of custom VBA functions which will be somewhat beyond your experience at present.

    -Stewart

    Comment

    • jforbes
      Recognized Expert Top Contributor
      • Aug 2014
      • 1107

      #3
      You might also find this interesting: http://bytes.com/topic/access/insigh...tring-matching

      Comment

      • Luvyourbag
        New Member
        • Mar 2015
        • 5

        #4
        Thank you for the replies. I've used the left string function but it gives me address which are not the same. I've tried joining the addresses in both tables and added a left string in my query and its omitting some addresses.

        Comment

        • Rabbit
          Recognized Expert MVP
          • Jan 2007
          • 12517

          #5
          Please show us your code along with a sample of data that isn't matching that should and data that is matching that shouldn't.

          Comment

          • Luvyourbag
            New Member
            • Mar 2015
            • 5

            #6
            Below is the coding I'm using. I'M querying on a name as an example. I should get one match, which I do not, if I remove the join on the address I get one match. Hope this makes sense.

            Code:
            Select distinct Isg_br_info.hcid, left [all pcps].[address],5 AS address1, into isg_Pcpmatch
            From isg_mbr_pcp_info inner join [all pcps] on (isg_pcp_info.license = [all pcps].[license] and (isg_mbr_pcp_info.address = [all pcps ].address)
            Where ((isg_mbr_pcp_info.hcid)=jones) and (off ([isg_mbr_pcp_info].[pcpid]=[all pcps].[pcpid], "yes","no"))="no";
            Last edited by Rabbit; Mar 13 '15, 06:13 PM. Reason: Please use [code] and [/code] tags when posting code or formatted data.

            Comment

            • Luvyourbag
              New Member
              • Mar 2015
              • 5

              #7
              I just figured out that the query is running correctly. Thank you for taking the time to address my question.

              Comment

              • Rabbit
                Recognized Expert MVP
                • Jan 2007
                • 12517

                #8
                One thing you should be wary of is that matching only on the first 5 characters can result in false positives.

                Comment

                • Luvyourbag
                  New Member
                  • Mar 2015
                  • 5

                  #9
                  You're right, I just saw that. I thought I had figured it out but I'm still running into issues. If I query on an ID I get the results I'm expecting but when I run a query and join it by address and use the left string it doesn't show the results even though the address in both tables is the same. Any suggestions?

                  Comment

                  • Rabbit
                    Recognized Expert MVP
                    • Jan 2007
                    • 12517

                    #10
                    Whenever you try to match non-equal data using a fuzzy match, you will always run the risk of false positives. There's no way around that, that is the nature of fuzzy matching.

                    The question is what rate of false positives are you comfortable with? And how do you want to handle those that have multiple hits?

                    That will dictate what fuzzy matching algorithm you use along with complexity of the algorithm you want to use and its performance cost.

                    The levenshtein algorithm linked above is an example of a much more complex and accurate fuzzy matching algorithm that allows you to move the target around until you find a reasonable false positive rate. But the drawback of such power comes at the cost of performance. It's a costly algorithm to run.

                    The soundex algorithm mentioned above often results in more false positives but is much less costly. It is a middle ground between the simple left 5 match and the complex levenshtein match.

                    These are only 2 of the many algorithms available. More info can be found here along with a listing of some algorithms: http://en.wikipedia.org/wiki/Approxi...tring_matching

                    As far as handling conflicting matches, that requires human intervention. Often, a report is created when there can be multiple hits and a human decides which one is the correct match.

                    Comment

                    Working...