spell checker for python

  • vino7
    New Member
    • May 2010
    • 2

    spell checker for python

    Hey guys, I'm having some trouble with this. I've got two text files: one is a paragraph containing some misspelled words, and the other is a dictionary file with one word per line. I'm not sure how to separate the words in the paragraph into one word per line so that I can compare them to the dictionary file. I've put the code up, but if anyone finds anything else wrong with it, please let me know. Thanks.


    import sys

    # setText holds the words from the sample text file
    setText = set()
    # setProperDic holds the words from the dictionary text file
    setProperDic = set()
    # setWrong holds the incorrect words after comparing each set
    setWrong = set()

    fText = open(sys.argv[1], 'r')
    line = fText.readline()
    while line != '':
        setText.add(line)
        line = fText.readline()
    fText.close()

    fProperDic = open(sys.argv[2], 'r')
    line = fProperDic.readline()
    while line != '':
        setProperDic.add(line)
        line = fProperDic.readline()
    fProperDic.close()

    # Find the values in setText
    # that don't exist in setProperDic
    setWrong = setText - setProperDic

    # Output each entry separately
    # in alphabetical order
    for x in sorted(setWrong):
        print(x)
  • dwblas
    Recognized Expert Contributor
    • May 2008
    • 626

    #2
    You should strip the newline character(s) from the words you read from the files. You can also read the file one record at a time and add each record to the set, as in the following code, although there is nothing wrong with the way you are doing it.
    Code:
    fText = open(sys.argv[1], 'r')
    for rec in fText:
        rec = rec.strip()
        setText.add(rec)
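To see why the stripping matters, here is a small sketch. The sample lines are made up and simply stand in for what iterating over a dictionary file would yield; they are not the poster's actual data.

```python
# Lines as they come back from iterating over a file: each keeps its "\n".
# (Made-up sample data, not the poster's actual files.)
dictionary_lines = ["apple\n", "banana\n", "cherry\n"]

# Without strip(), every entry still carries the trailing newline
raw = set(dictionary_lines)

# With strip(), the entries are the bare words
stripped = {line.strip() for line in dictionary_lines}

print("apple" in raw)       # False: the set holds "apple\n", not "apple"
print("apple" in stripped)  # True
```

If only one of the two sets were stripped, the set difference would report every word as wrong, so both files should get the same treatment.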


    • vino7
      New Member
      • May 2010
      • 2

      #3
      Ooh, thanks. The problem I have is that the text I'm comparing with the dictionary is a paragraph, so I need to split it into words, one word per line. I think I should use split.lines, but I don't know where that code should go.


      • Glenton
        Recognized Expert Contributor
        • Nov 2008
        • 391

        #4
        Can you please use code tags when posting code? It's hard to read otherwise.

        But it seems the issue is here:
        Code:
        fText = open(sys.argv[1], 'r')
        line = fText.readline()
        while line != '':
            setText.add(line)
            line = fText.readline()
        fText.close()
        Firstly, there's a simpler way to read files: you can loop over the file object directly. And, assuming that each line has multiple words, you need to split each line. When debugging this kind of thing (which is a big part of programming), it's often helpful to include a print statement in your code so you can see what the variables hold while the code is running, which will tell you straight away whether it's doing what you want and expect.
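As a rough illustration of the print-statement tip, something like the following would show the problem right away. The `io.StringIO` object and its sample text are stand-ins I've made up so the snippet runs without a real file.

```python
import io

# io.StringIO stands in for the opened text file (hypothetical sample text).
fText = io.StringIO("the quick brwn fox\njumps over the lazy dog\n")

setText = set()
for line in fText:
    # repr() makes invisible characters such as "\n" show up in the output
    print("read:", repr(line))
    setText.add(line)

# Each set element turns out to be a whole line, newline included,
# rather than a single word.
print("set now holds:", sorted(setText))
```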

        For now I'll suppose that there's no punctuation, just words separated by spaces in each line.

        Code:
        fText = open(sys.argv[1], 'r')
        for line in fText:  #note you can treat the file object as an iterable!
            setText.update(line.strip().split(" "))  #see interactive session below about fText.close()
        The key line makes use of three useful methods:
        - "update" is a set method and it adds all the elements of an iterable to a set (obviously, if there's a repetition only one gets added).
        - "strip" is a string method that gets rid of white space (tabs, returns, spaces etc) at the beginning and end of a string
        - "split" is a string method that returns a list of strings separated by the specified separator.

        The interactive session below should help clarify:
        Code:
        In [22]: t="  hello hello hello mum how are you\n"
        
        In [23]: t
        Out[23]: '  hello hello hello mum how are you\n'
        
        In [24]: t.strip()
        Out[24]: 'hello hello hello mum how are you'
        
        In [25]: t.split(" ")
        Out[25]: ['', '', 'hello', 'hello', 'hello', 'mum', 'how', 'are', 'you\n']
        
        In [26]: t.strip().split(" ")
        Out[26]: ['hello', 'hello', 'hello', 'mum', 'how', 'are', 'you']
        
        In [27]: s=set()
        
        In [28]: s.update(t)
        
        In [29]: s
        Out[29]: set(['\n', ' ', 'a', 'e', 'h', 'l', 'm', 'o', 'r', 'u', 'w', 'y'])
        
        In [30]: s=set()
        
        In [31]: s.update(t.strip().split(" "))
        
        In [32]: s
        Out[32]: set(['are', 'hello', 'how', 'mum', 'you'])
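Putting the pieces together, a minimal sketch of the whole checker might look like this. The function names `words_in` and `misspelled` and the sample data are made up for illustration. Trimming punctuation and lowercasing go beyond the "no punctuation" assumption above, so treat them as optional extras rather than part of the original approach.

```python
import string


def words_in(lines):
    """Collect the distinct words from an iterable of text lines."""
    words = set()
    for line in lines:
        # split() with no argument splits on any run of whitespace
        # and drops the empty strings that split(" ") can produce
        for token in line.split():
            # Assumption: trim surrounding punctuation and ignore case
            word = token.strip(string.punctuation).lower()
            if word:
                words.add(word)
    return words


def misspelled(text_lines, dictionary_lines):
    """Return the words in the text that are not in the dictionary, sorted."""
    return sorted(words_in(text_lines) - words_in(dictionary_lines))


# Made-up sample data standing in for the two files
sample_text = ["Hello mum, how are yuo?\n", "I am  fien today.\n"]
sample_dict = ["hello\n", "mum\n", "how\n", "are\n", "you\n",
               "i\n", "am\n", "today\n"]

print(misspelled(sample_text, sample_dict))  # ['fien', 'yuo']
```

With real files, the same functions should accept open file objects, since a file can be iterated line by line: open both inside a `with` statement (which also closes them for you) and pass them to `misspelled`.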
