Algorithm used by difflib.get_close_match

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Guillermo

    Algorithm used by difflib.get_close_match


    Hi all,

    Does anyone know whether this function uses edit distance? If not,
    which algorithm is it using?

    Regards,

    Guillermo
  • Jon Clements

    #2
    Re: Algorithm used by difflib.get_clo se_match

    On Sep 2, 2:17 pm, Guillermo <guillermo.lis. ..@googlemail.c omwrote:
    Hi all,
    >
    Does anyone know whether this function uses edit distance? If not,
    which algorithm is it using?
    >
    Regards,
    >
    Guillermo
    help(difflib.ge t_close_matches ) will give you your first clue...

    Comment

    • Wojtek Walczak

      #3
      Re: Algorithm used by difflib.get_clo se_match

      On Tue, 2 Sep 2008 06:17:37 -0700 (PDT), Guillermo wrote:
      Does anyone know whether this function uses edit distance? If not,
      which algorithm is it using?
      The following passage comes from difflib.py:

      SequenceMatcher is a flexible class for comparing pairs of sequences of
      any type, so long as the sequence elements are hashable. The basic
      algorithm predates, and is a little fancier than, an algorithm
      published in the late 1980's by Ratcliff and Obershelp under the
      hyperbolic name "gestalt pattern matching". The basic idea is to find
      the longest contiguous matching subsequence that contains no "junk"
      elements (R-O doesn't address junk). The same idea is then applied
      recursively to the pieces of the sequences to the left and to the right
      of the matching subsequence. This does not yield minimal edit
      sequences, but does tend to yield matches that "look right" to
      people.

      HTH.

      --
      Regards,
      Wojtek Walczak,

      Comment

      Working...