What sort of loop?

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • Squiddley1957
    New Member
    • Jun 2018
    • 22

    #16
    Thanks for all your help

    Comment

    • Rabbit
      Recognized Expert MVP
      • Jan 2007
      • 12517

      #17
      I also would prefer to just use a select query and not update the original data. The query itself should run relatively quickly and it's never a bad idea to keep the original data.

      As far as the fuzzy name matching goes, we have various articles on this forum that go over different methods of approaching that. I've found soundex to be wildly inaccurate.

      Double Metaphone
      Levenshtein
      N-grams

      As far as the birthdates go, you can calculate the difference between the two. Combine that with a name matching score and you can come up with a confidence score of how closely 2 entities match.

      Comment

      • Squiddley1957
        New Member
        • Jun 2018
        • 22

        #18
        Thanks Rabbit - I've had a quick look at the first article - vely interlesting. With regard to my database, as I explained, someone who fastened on a name from the original source would search on that. That is why it is important that all versions of the name are retained. I understand that, in principle, it's better to use a temporary field but I don't think that much is gained in time terms - not on a database of this size. Perhaps I will try both approaches and see what happens.

        Comment

        • Rabbit
          Recognized Expert MVP
          • Jan 2007
          • 12517

          #19
          someone who fastened on a name from the original source would search on that. That is why it is important that all versions of the name are retained.
          No one is saying you shouldn't keep the different versions of the name found in the raw data. In fact, we're saying the same thing you're saying, that you should keep the original data.

          I understand that, in principle, it's better to use a temporary field but I don't think that much is gained in time terms - not on a database of this size.
          We're not saying using a "temporary" field (calculated field is what we are referring to) will gain you time. The opposite, it's a small time cost. We're saying it maintains the fidelity of the original data. Which is what you're saying you want to keep.

          Comment

          • Squiddley1957
            New Member
            • Jun 2018
            • 22

            #20
            Hello Rabbit
            I'm conscious that this might sound ungrateful - Far from it. I get it that one wouldn't want to affect the integrity of data by removing or altering some part - but I don't see the harm in adding to it; after all data rarely comes ready set with a unique id.

            Comment

            • Rabbit
              Recognized Expert MVP
              • Jan 2007
              • 12517

              #21
              Adding to it is fine, no one is saying you can't add to the data. It's your choice whether you add to it using a query or add to it by inserting related entities into an "also known as" table like TwinnyFo suggests. All we are saying is you should avoid updating the original data if possible. Augmenting the original data is perfectly fine and expected, we are not saying you shouldn't do that. It's just that your earlier posts say you want to update the original data from one value to another value.
              Last edited by Rabbit; Dec 4 '18, 11:53 PM.

              Comment

              Working...