Spell Checker

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • Ganon11
    Recognized Expert Specialist
    • Oct 2006
    • 3651

    Spell Checker

    Hey all,

    So this was an assignment in my Data Structures and Algorithms class - we were supposed to make a C++ implementation of the spell checker described here. A Java version was also provided, which I pretty much stole line for line (translating as necessary, of course). I got the program working...most of the time. I'll post the code below.

    Basically, my program reads in the big.txt file, and puts all the valid words into a map (key is string, value is int). If the word is already in the map, it increments the int (so that the int reflects how often the word appeared).

    After the file has been read, my driver program asks the user for a misspelled word. Once it has this word, the driver program calls the correct method. This method checks, in this order: (1) If the word is found in the map (in which case it is valid, and we don't need to do anything else), (2) If a word within edit distance 1 (that is, adding 1 letter, deleting one letter, switching two letters, or altering one letter) is in the map (in which case it is a valid substitution, and is returned), and (3) If a word within edit distance 2 is found in the map (in which case it is a valid substitution, and is returned).

    Here's the code:

    [CODE=cpp]#include <iostream>
    #include <vector>
    #include <map>
    #include <fstream>
    #include <cctype>
    using namespace std;

    class SpellCheck {
    public:
    SpellCheck(stri ng filename) {
    ifstream in(filename.c_s tr());
    string temp;
    while (in) {
    getline(in, temp);
    string word;
    while (temp.length() > 0) {
    if (nextWord(temp, word)) {
    map<string, int>::iterator itr = nWords.find(wor d);
    if (itr == nWords.end()) {
    nWords.insert(p air<string, int>(word, 1));
    } else {
    (*itr).second++ ;
    }
    }
    }
    }
    in.close();
    }

    string correct(string word) {
    if (nWords.find(wo rd) != nWords.end()) return word;
    vector<string> edit1 = edits(word);
    map<int, string> candidates;
    for (int i = 0; i < edit1.size(); i++) {
    map<string, int>::iterator itr = nWords.find(edi t1[i]);
    if (itr != nWords.end()) {
    pair<string, int> myPair = *itr;
    candidates.inse rt(pair<int, string>(myPair. second, myPair.first));
    }
    }
    if (candidates.siz e() > 0) {
    map<int, string>::iterat or itr = candidates.begi n();
    pair<int, string> max = *itr;
    itr++;
    while (itr != candidates.end( ))
    if ((*itr).first > max.first) max = *itr;
    return max.second;
    }
    candidates.clea r();
    for (int i = 0; i < edit1.size(); i++) {
    vector<string> edit2 = edits(edit1[i]);
    for (int i = 0; i < edit2.size(); i++) {
    map<string, int>::iterator itr = nWords.find(edi t2[i]);
    if (itr != nWords.end()) {
    pair<string, int> myPair = *itr;
    candidates.inse rt(pair<int, string>(myPair. second, myPair.first));
    }
    }
    }
    if (candidates.siz e() > 0) {
    map<int, string>::iterat or itr = candidates.begi n();
    pair<int, string> max = *itr;
    itr++;
    while (itr != candidates.end( ))
    if ((*itr).first > max.first) max = *itr;
    return max.second;
    }
    return word;
    }

    private:
    vector<string> edits(string word) {
    vector<string> result;
    string temp;
    for (int i = 0; i < word.length(); i++) {
    temp = word.substr(0, i) + word.substr(i+1 );
    result.push_bac k(temp);
    }
    for (int i = 0; i < word.length() - 1; i++) {
    temp = word.substr(0, i) + word[i+1] + word[i] + word.substr(i+2 );
    result.push_bac k(temp);
    }
    for (int i = 0; i < word.length(); i++) {
    for (char c = 'a'; c <= 'z'; c++) {
    temp = word.substr(0, i) + c + word.substr(i+1 );
    result.push_bac k(temp);
    }
    }
    for (int i = 0; i < word.length() + 1; i++) {
    for (char c = 'a'; c <= 'z'; c++) {
    temp = word.substr(0, i) + c + word.substr(i);
    result.push_bac k(temp);
    }
    }
    return result;
    }

    bool nextWord(string & sentence, string& word) {
    while (true) {
    int pos = sentence.find(" ");

    if (pos != string::npos) {
    word = sentence.substr (0, pos);

    sentence = sentence.substr (pos);
    while (sentence.lengt h() > 0 && !isalpha(senten ce[0])) {
    sentence = sentence.substr (1);
    }
    bool isWord = true;
    for (int i = 0; i < word.length(); i++) {
    if (isalpha(word[i])) {
    word[i] = tolower(word[i]);
    } else if (ispunct(word[i])) {
    word = word.substr(0, i) + word.substr(i+1 );
    } else isWord = false;
    }
    if (!isWord) {
    continue;
    }
    return true;
    } else {
    word = sentence;
    sentence = "";
    bool isWord = true;
    for (int i = 0; i < word.length(); i++) {
    if (isalpha(word[i])) word[i] = tolower(word[i]);
    else isWord = false;
    }
    if (!isWord) {
    return false;
    }
    return true;
    }
    }
    }

    map<string, int> nWords;
    };[/CODE]

    And the driver program:

    [CODE=cpp]#include <iostream>
    #include "SpellCheck .h"
    using namespace std;

    int main() {
    string filename = "big.txt";
    SpellCheck checker(filenam e);

    char choice = 'y';
    while (tolower(choice ) == 'y') {
    cout << "Enter a word to check: ";
    string word;
    cin >> word;
    cout << "Did you mean " << checker.correct (word) << "?" << endl;
    cout << endl;
    cout << "Do you wish to continue (Y/N)? ";
    cin >> choice;
    }
    return 0;
    }[/CODE]

    Here are a few sample runs:

    Code:
    $ SCTest
    Enter a word to check: speling
    Did you mean spelling?
    
    Do you wish to continue (Y/N)? y
    Enter a word to check: spleling
    Did you mean spelling?
    
    Do you wish to continue (Y/N)? y
    Enter a word to check: monkee
    Did you mean monkey?
    
    Do you wish to continue (Y/N)? y
    Enter a word to check: freedmo
    Did you mean freedom?
    
    Do you wish to continue (Y/N)? n
    So, as you can see, it's working correctly for several words. I have also tried the word "acommodate " (incorrect spelling of accommodate), and it works correctly (this error was within edit distance 2). However, when I input "teh" and "wierd" (for "the" and "weird", respectively), my program freezes.

    I've already gotten full credit for this assignment (and, as a matter of fact, so did everyone else, even though most people didn't even get it half-working like mine), so there's no grade on the line (and more importantly, no cheating involved) - but does anyone know why it's hanging on these specific words?
    Last edited by Ganon11; Oct 17 '07, 11:15 PM. Reason: Removing some debugging cout statements
  • Ganon11
    Recognized Expert Specialist
    • Oct 2006
    • 3651

    #2
    I found and fixed 2 errors:

    1) Inside the correct() method, I used a set of nested for loops...but tried to use i for both. Whoops. Made it a j.
    2) The word accepted to correct() was not necessarily all lowercase, so I added a method to make it all lowercase before continuing.

    It still hangs for certain words.

    Comment

    • priyajain
      New Member
      • Mar 2016
      • 1

      #3
      Originally posted by Ganon11
      I found and fixed 2 errors:

      1) Inside the correct() method, I used a set of nested for loops...but tried to use i for both. Whoops. Made it a j.
      2) The word accepted to correct() was not necessarily all lowercase, so I added a method to make it all lowercase before continuing.

      It still hangs for certain words.

      what is your header file "spellcheck .h" work

      Comment

      Working...