Q: Algorithm/Solution for finding words in non delimited string

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • brock@gunter-smith.com

    Q: Algorithm/Solution for finding words in non delimited string

    I'd like to be able to take a string and search within it for all words
    (of the longest length possible) that are possibly contained within it
    (in sequence, we're not re-ordering the letters in the string).
    Obviously the brute force approach (which may be the only solution) is
    to iterate through a dictionary file searching for occurances of each
    entry within the string.

    If anyone has done anything similar to this, were there any other
    methods used to reduce the number of iterations required like using a
    list of common words that are not generally elements of other words
    that can be quickly broken out from the string? Or are there libraries
    that may be of use in efficiently processing this type of search?

    e.g. given the string "themeether ", possible solutions might be
    {'the','meet',' her'} or {'theme','ether '}

  • Carl Vondrick

    #2
    Re: Q: Algorithm/Solution for finding words in non delimited string

    brock@gunter-smith.com wrote:
    I'd like to be able to take a string and search within it for all words
    (of the longest length possible) that are possibly contained within it
    (in sequence, we're not re-ordering the letters in the string).
    Obviously the brute force approach (which may be the only solution) is
    to iterate through a dictionary file searching for occurances of each
    entry within the string.
    It sounds like you are after a LCS (Longest Common Subsequence)
    implementation. Just google for "longest common subsequence" and you'll
    get a thousand ways to do it. Wikipedia has one that seems to work
    well: http://en.wikipedia.org/wiki/Longest...quence_problem

    LCS is used in diff algorithms.
    If anyone has done anything similar to this, were there any other
    methods used to reduce the number of iterations required like using a
    list of common words that are not generally elements of other words
    that can be quickly broken out from the string? Or are there libraries
    that may be of use in efficiently processing this type of search?
    >
    e.g. given the string "themeether ", possible solutions might be
    {'the','meet',' her'} or {'theme','ether '}
    >

    Comment

    • Chung Leong

      #3
      Re: Q: Algorithm/Solution for finding words in non delimited string

      brock@gunter-smith.com wrote:
      I'd like to be able to take a string and search within it for all words
      (of the longest length possible) that are possibly contained within it
      (in sequence, we're not re-ordering the letters in the string).
      Obviously the brute force approach (which may be the only solution) is
      to iterate through a dictionary file searching for occurances of each
      entry within the string.
      >
      If anyone has done anything similar to this, were there any other
      methods used to reduce the number of iterations required like using a
      list of common words that are not generally elements of other words
      that can be quickly broken out from the string? Or are there libraries
      that may be of use in efficiently processing this type of search?
      >
      e.g. given the string "themeether ", possible solutions might be
      {'the','meet',' her'} or {'theme','ether '}
      That's very similiar to the Thai word-breaking problem. Is that what
      you're trying to do, in fact? There's a link listing some of the
      approaches:



      Comment

      Working...