Matching field by 5 first characters

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • Yoni Hasid
    New Member
    • Feb 2011
    • 30

    Matching field by 5 first characters

    Hello,
    I am trying to match tables into a query based on 5 first characters in one field. I used expression: =left([field name]), 5) however it does not return matched entries as I needed. Basically, I need to tell query match two fields based on first characters and and show all records from Table 1 and only those records from table 2 where first 5 letters of the fields are matching.

    Here is my SQL from the query:

    Code:
    SELECT [Master - Account by SVP Consolidated].[Scode Description], Master_Missing_Fields.Account, [Master - Account by SVP Consolidated].[Ops SVP], [Master - Account by SVP Consolidated].[AM SVP (Global)]
    FROM Master_Missing_Fields LEFT JOIN [Master - Account by SVP Consolidated] ON (Master_Missing_Fields.Account =left([Master - Account by SVP Consolidated].[Scode Description], 5));
    Last edited by NeoPa; Mar 16 '11, 01:03 AM. Reason: Added CODE tags
  • TheSmileyCoder
    Recognized Expert Moderator Top Contributor
    • Dec 2009
    • 2322

    #2
    From what I can tell the SQL looks good enough. I tried setting up a few test tables and used a join like yours without problems.

    When you say "however it does not return matched entries as I needed." what exactly do you mean? Try to be more speficic.

    Comment

    • Yoni Hasid
      New Member
      • Feb 2011
      • 30

      #3
      The issue I now have is that the fields that I am matching may contain characters that I need not only at the left but in the middle and end. I tried to use =right, and =mid expressions, but is there a better way to match one field to another by any part of the field? Let say I have :

      Catwalk to match to Cat
      And
      Cat Walk to match to Cat
      And
      Ca twalk to match to Cat?

      Makes sense?

      Comment

      • NeoPa
        Recognized Expert Moderator MVP
        • Oct 2006
        • 32645

        #4
        So, essentially you want SQL to recognise records as matched when the values in them don't really match, but a human might be able to recognise similarities between the two values. This sounds like the bones of a doctoral thesis.

        I'm sorry. I think you're out of luck on this one. You could get somewhere with specific checks to handle specific situations, but nothing as general as you stipulate I'm afraid.

        Comment

        • ADezii
          Recognized Expert Expert
          • Apr 2006
          • 8834

          #5
          @Yoni - I'm sure that I can come up with a strictly code-based solution, but I'm still a little hazy as to the exact nature of the request. Can you post some sample, accurate, data as well as what the results should be?

          Comment

          • Yoni Hasid
            New Member
            • Feb 2011
            • 30

            #6
            I suppose I was not clear. I would like to have a most closest match based on the data in fields. The match can contain 3-5 characters that can appear at the beginning or in the middle. I saw some posts on another site where people say it was possible by perhaps combining different expressions but I was not successful to implement them. Are there options to use WHERE vs Like in a query? Thanks

            Comment

            • ADezii
              Recognized Expert Expert
              • Apr 2006
              • 8834

              #7
              Again, kindly post some sample Data along with expected Results, so that your request is crystal clear.

              Comment

              • Yoni Hasid
                New Member
                • Feb 2011
                • 30

                #8
                My apologies.
                I attached both tables:

                Table 1 (SVP_Account_Na me)
                Table2 (Account_Name_I ncomplete)

                I need to match Scode Description field from Table1 with Acct Name in Table2 so it can attach appropriate Ops SVP within the query. You will notice that Account Names in Table2 vary and may appear as exact or partial match based on one of the words in a name string. I guess may be "best possible match" would be a good solution for me?

                Thank you in advance !
                Attached Files

                Comment

                • Rabbit
                  Recognized Expert MVP
                  • Jan 2007
                  • 12517

                  #9
                  Are you seriously trying to match up SCode to Acct Name? Why are these fields free text fields? You're going to have a hard time linking them up. We are talking Google level matching algorithms here.

                  I assume FacilitiesEast matches to East Facilities. But what does Facilities match to? I see two possibilities in there. And who knows what other weird matches you're going to need? I mean, I'm not even sure which one North America is supposed to match up to. It's probably Global North America but there's always the possibility the user meant NA Northeast Admin.

                  You won't be able to use a phoenetic algorithm. Your best bet is probably an n-gram technique, possibly a 2-gram at the word level. But if that's not sensitive enough then you'll need to go down to the character level.

                  Really, the solution is not a stopgap matching algorithm. You need to fix the data.

                  Comment

                  • Yoni Hasid
                    New Member
                    • Feb 2011
                    • 30

                    #10
                    Hi. Agree, the data is very "fuzzy" since we inhereted this database. I am trying to at least get closest match between the two tables and "assign" Ops SVP and still show all non-matching records as well...

                    Comment

                    • Rabbit
                      Recognized Expert MVP
                      • Jan 2007
                      • 12517

                      #11
                      What's the size of the dataset? If it's imperative to match all records, your only sure fire method is manually going through and matching it.

                      If you're just trying to match as many as possible, then there are lots of algorithms. You could use a simple one in which you check on a word by word basis. This wouldn't give many positive matches.

                      If you want as many positive matches as you can, I think your best bet is an n-gram algorithm. But beware that it's a fairly advanced algorithm and would require a lot of work to implement correctly. Unless, of course, you could find an existing library.

                      Comment

                      • ADezii
                        Recognized Expert Expert
                        • Apr 2006
                        • 8834

                        #12
                        @Yoni - What if the following Scenario comprised a Match (<==>):
                        1. [Scode Description] and [Acct Name] are equal:
                          1. Eliminations <==> Eliminations
                          2. DBank <==> Dbank
                          3. East Facilities <==> East Facilities
                          4. Global North America <==> Global North America
                        2. Greater than 50% of the Components in [Scode Description] are also contained in [Acct Name] in some manner:
                          1. East Facilities <==> FacilitiesEast (both East and Facilites are contained in FacilitiesEast (100%), but Facilities would not be a Match since only Facilities in East Facilities in contained within (50%).
                          2. Southfield Facilities ==> No Match in [Acct Name], since both Components are not contained within the [Acct Name].
                          3. Global North America <==> North America (both North and America contained in [Acct Name] 66.6%.

                        P.S. - I do realize that this approach is rather simplistic, but the only other alternative as I see it is what Rabbit pointed out, more involved Algorithms. Matches could be written to a Results Table consisting of [Scode Description], [Acct Name], and [Ops SVP].

                        Comment

                        • Yoni Hasid
                          New Member
                          • Feb 2011
                          • 30

                          #13
                          I think your second approach may work for my purpose. Would you by chance have a code that I can analyze and possibly use for this query?

                          Comment

                          • ADezii
                            Recognized Expert Expert
                            • Apr 2006
                            • 8834

                            #14
                            @Yoni - To whom are you referring to, Rabbit or ADezii, in your last Post?

                            Comment

                            • Rabbit
                              Recognized Expert MVP
                              • Jan 2007
                              • 12517

                              #15
                              I think he's talking to you ADezii. He must have mistook your #2 as an alternate approach rather than a complementary approach.

                              Yoni, since the fields are free form, what do you intend to do about mispellings, typos, abbreviations, etc?

                              Comment

                              Working...