spell checker for python

**dwblas** · May 20 '10, 04:38 PM

You should strip the newline character(s) from the words you read from the file. You can also read the file one record at a time and add each record to a set, as in the following code, although there is nothing wrong with the way you are doing it.

Code:

fText = open(sys.argv[1], 'r')
for rec in fText:
    rec = rec.strip()  # remove the trailing newline and any surrounding whitespace
    setText.add(rec)
fText.close()
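
The same idea can be wrapped in a function so it serves both the text file and the dictionary file. This is only a minimal sketch: the name `load_lines` and the throwaway demo file are made up for illustration and are not part of the original script.

```python
import os
import tempfile


def load_lines(path):
    """Return the set of non-blank lines in the file at `path`, stripped of whitespace."""
    words = set()
    with open(path, 'r') as handle:
        for rec in handle:
            rec = rec.strip()      # drop the trailing newline and stray spaces
            if rec:                # skip blank lines
                words.add(rec)
    return words


# Demo with a throwaway dictionary file (hypothetical contents).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("hello\nmum\n\nhow\n")
demo_words = load_lines(tmp.name)
os.remove(tmp.name)
print(sorted(demo_words))
```

In the real script the path would come from `sys.argv` instead of a temporary file.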

**vino7** · May 20 '10, 11:48 PM

Ooh, thanks, but the problem I have is that the text I'm comparing with the dictionary is a paragraph, so I need to split it so that there is one word per line. I think I should use splitlines, but I don't know where that code should go.

**Glenton** · May 21 '10, 03:12 AM

Originally posted by vino7

Hey guys, I'm having some trouble with this. I've got two text files: one is a paragraph with some misspelled words, and the other is a dictionary file with one word per line. I'm not sure how to separate the words in the paragraph into one word per line so that I can compare them with the dictionary file. I've put the code up, but if anyone finds anything else wrong with it, please let me know. Thanks.

import sys

# setText holds the words from the sample text file
setText = set()
# setProperDic holds the words from the dictionary text file
setProperDic = set()
# setWrong holds the incorrect words after comparing each set
setWrong = set()

fText = open(sys.argv[1], 'r')
line = fText.readline()
while line != '':
    setText.add(line)
    line = fText.readline()
fText.close()

fProperDic = open(sys.argv[2], 'r')
line = fProperDic.readline()
while line != '':
    setProperDic.add(line)
    line = fProperDic.readline()
fProperDic.close()

# Find the values in setText
# that don't exist in setProperDic
setWrong = setText - setProperDic

# Output each entry separately
# in alphabetical order
for x in sorted(setWrong):
    print(x)

Can you please use code tags when posting code? It's hard to read otherwise.

But it seems the issue is here:

Code:

fText = open(sys.argv[1], 'r')
line = fText.readline()
while line != '':
    setText.add(line)
    line = fText.readline()
fText.close()

Firstly, there's a simpler, more idiomatic way to read files: loop over the file object directly. Secondly, assuming that each line has multiple words, you need to split each line into words. When debugging this kind of thing (which is a big part of programming), it often helps to add a print statement so you can see what the variables hold while the code runs; that will tell you straight away whether it's doing what you expect.
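
As a rough illustration of the print-statement tip, the sketch below runs the original read loop and prints each line with `repr()`. The `io.StringIO` object and its contents are stand-ins I made up for the opened text file.

```python
import io

# Stand-in for the opened text file (hypothetical contents).
fake_file = io.StringIO("hello mum\nhow are yuo\n")

seen = []
line = fake_file.readline()
while line != '':
    # repr() makes invisible characters such as '\n' show up in the output
    print(repr(line))
    seen.append(line)
    line = fake_file.readline()
```

Each printed value is a whole line with its newline still attached, so an entry like that can never match a single dictionary word.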

For now I'll suppose that there's no punctuation, just words separated by spaces in each line.

Code:

with open(sys.argv[1], 'r') as fText:  # the with block closes the file for you
    for line in fText:  # note you can treat the file object as an iterable!
        # strip the newline, split on single spaces, and add every word to the set
        setText.update(line.strip().split(" "))

The key line makes use of three useful methods:
- "update" is a set method that adds all the elements of an iterable to a set (a repeated element is only stored once).
- "strip" is a string method that removes whitespace (tabs, newlines, spaces etc.) from the beginning and end of a string.
- "split" is a string method that cuts a string at each occurrence of the specified separator and returns the pieces as a list of strings.

The interactive session below should help clarify:

Code:

In [22]: t="  hello hello hello mum how are you\n"

In [23]: t
Out[23]: '  hello hello hello mum how are you\n'

In [24]: t.strip()
Out[24]: 'hello hello hello mum how are you'

In [25]: t.split(" ")
Out[25]: ['', '', 'hello', 'hello', 'hello', 'mum', 'how', 'are', 'you\n']

In [26]: t.strip().split(" ")
Out[26]: ['hello', 'hello', 'hello', 'mum', 'how', 'are', 'you']

In [27]: s=set()

In [28]: s.update(t)

In [29]: s
Out[29]: set(['\n', ' ', 'a', 'e', 'h', 'l', 'm', 'o', 'r', 'u', 'w', 'y'])

In [30]: s=set()

In [31]: s.update(t.strip().split(" "))

In [32]: s
Out[32]: set(['are', 'hello', 'how', 'mum', 'you'])
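
Putting the pieces together, a minimal sketch of the whole checker might look like the following. It assumes, as above, text with no punctuation. It uses `split()` with no argument, which is my substitution for `strip().split(" ")`: it splits on any run of whitespace, so double spaces and blank lines do not produce empty strings. The function names and the sample data are hypothetical.

```python
import io


def words_in_text(lines):
    """Collect every whitespace-separated word from an iterable of lines."""
    found = set()
    for line in lines:
        found.update(line.split())   # no argument: split on any run of whitespace
    return found


def words_in_dictionary(lines):
    """Collect one word per line, ignoring surrounding whitespace and blank lines."""
    return set(line.strip() for line in lines if line.strip())


def misspelled(text_lines, dict_lines):
    """Return the sorted words that appear in the text but not in the dictionary."""
    return sorted(words_in_text(text_lines) - words_in_dictionary(dict_lines))


# io.StringIO objects stand in for the two opened files (sample data is made up).
sample_text = io.StringIO("  hello hello  mum how\nare yuo\n\n")
sample_dict = io.StringIO("are\nhello\nhow\nmum\nyou\n")

wrong = misspelled(sample_text, sample_dict)
for word in wrong:
    print(word)
```

With real files, pass the opened file objects instead of the `StringIO` stand-ins. Note that the comparison is case-sensitive, so "Hello" and "hello" count as different words.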
