metaphone(), levenshtein() and similar_text()

  • willl69

    metaphone(), levenshtein() and similar_text()

    Hi Guys,

    I have been writing a database search for my site. To increase the
    accuracy and the chance of a successful result, I have used metaphone() and
    similar_text() comparisons to find the database entries that contain the
    most words closely resembling the entered search criteria (only words with
    an 80%+ similarity are recorded). The value for each word over 80% is
    stored in an array, and the average of that array is then used to gauge
    the row's ranking in the search results.
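
    A minimal sketch of the scoring described above, for reference. The
    function names word_similarity() and row_score() are hypothetical, and
    comparing the metaphone() keys with similar_text() is an assumption
    about how the two calls are combined; the actual code may differ.

```php
<?php
// Hypothetical helper: similarity (0-100) between two words,
// measured with similar_text() on their metaphone() keys.
function word_similarity(string $a, string $b): float
{
    $percent = 0.0;
    similar_text(metaphone($a), metaphone($b), $percent);
    return $percent;
}

// Average of every word-pair similarity at or above the threshold,
// or 0.0 when no pair qualifies.
function row_score(string $search, string $rowText, float $threshold = 80.0): float
{
    $searchWords = preg_split('/\s+/', trim($search), -1, PREG_SPLIT_NO_EMPTY);
    $rowWords    = preg_split('/\s+/', trim($rowText), -1, PREG_SPLIT_NO_EMPTY);

    $hits = [];
    foreach ($searchWords as $s) {
        foreach ($rowWords as $r) {
            $p = word_similarity($s, $r);
            if ($p >= $threshold) {
                $hits[] = $p; // only matches over the threshold are recorded
            }
        }
    }
    return $hits ? array_sum($hits) / count($hits) : 0.0;
}
```

    Because only the qualifying pairs enter the average, the number of
    non-matching words in the row has no effect on the score.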

    However, the rows being searched contain different numbers of words,
    some with many words and some with very few. This means that a
    column with more words has a greater chance of containing
    words that score higher than 80% against the search criteria.

    I was wondering if anybody knows a mathematical way of making this a more
    even search, or has any tips on how I can make this more accurate. My site
    already searches using fulltext; this is just a backup catering for
    results with similar spellings etc.

    I don't know if any of that made sense, but any input would be
    appreciated.

    Cheers

    Will

  • konsu

    #2
    Re: metaphone(), levenshtein() and similar_text()

    It is difficult to understand from your description what the problem is. Are
    you talking about cases where a database field containing, for example, just
    two words, one of which matches the search word while the other does not,
    is not being reported in the search results? If so, why do you think it should
    be reported? It has just 50% of its words matching the search criteria.
    What do you mean by an "even" search?

    konstantin
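
    The "50% of words matching" measure in the reply above can be sketched
    as the fraction of a field's words that resemble some search word. The
    name matched_fraction() is hypothetical, and the 80% cut-off on
    metaphone() keys is carried over from the original post as an
    assumption; this is one possible reading, not a recommended ranking.

```php
<?php
// Fraction (0.0-1.0) of the row's words that resemble at least one
// search word, using similar_text() on metaphone() keys.
function matched_fraction(string $search, string $rowText, float $threshold = 80.0): float
{
    $searchWords = preg_split('/\s+/', trim($search), -1, PREG_SPLIT_NO_EMPTY);
    $rowWords    = preg_split('/\s+/', trim($rowText), -1, PREG_SPLIT_NO_EMPTY);

    if (!$rowWords) {
        return 0.0; // an empty field matches nothing
    }

    $matched = 0;
    foreach ($rowWords as $r) {
        foreach ($searchWords as $s) {
            $percent = 0.0;
            similar_text(metaphone($s), metaphone($r), $percent);
            if ($percent >= $threshold) {
                $matched++; // count each row word at most once
                break;
            }
        }
    }
    return $matched / count($rowWords);
}
```

    Unlike an average taken over the matches alone, this value drops as a
    field gains words that do not match, so a two-word field with one
    match is rated lower than a one-word field that matches outright.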


