How to count line number of incorrect words in a set using a dictionary

  • lightning18
    New Member
    • May 2010
    • 16

    I have a txt file with words in it, and I had to collect the incorrect words into a set, which I have done.
    Now I need to find which lines of the text file the incorrect words are on and print the result as a dictionary.
    For example, it should look like:
    togeher 3 4 # "togeher" being the incorrect word and 3 and 4 the line numbers where it is located in the txt file.

    I know I need to use a line counter but don't know how to use it.

    words = [] # is my txt file
    text1 # is my set of incorrect words
    I have done this so far:
    d = {} # an empty dictionary
    key = text1
    value = linecounter # don't know what to assign the value to
    for
    Last edited by Niheel; May 24 '10, 03:47 AM. Reason: punctuation, please read guidelines on how to post on bytes
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
    Assume you have a function correct(word) that returns False if a word is incorrect. The following untested code would compile a dictionary of the incorrect words where the words are keys and the line numbers are contained in a list associated with the keys.
    Code:
    f = open("words.txt")
    dd = {}
    for i, word in enumerate(f):
        word = word.strip()  # remove the trailing newline so the keys are clean
        if not correct(word):
            dd.setdefault(word, []).append(i)
    f.close()

    • lightning18
      New Member
      • May 2010
      • 16

      #3
      Cheers for your reply. I've done this:
      Code:
      d = {}
      def correct(word):
          for i, word in enumerate(words):
              if not correct(word.strip()):
                  d.setdefault(word, []).append(i)
          print(d)
      But how would I print each word with its line numbers next to it?
      e.g.
      loo: 5 8 # numbers being the line numbers
      So would I create something like
      keys = text1 # text1 being my incorrect words
      Then what, though?
      Last edited by bvdet; May 23 '10, 01:26 PM. Reason: Add code tags

      • bvdet
        Recognized Expert Specialist
        • Oct 2006
        • 2851

        #4
        Please use code tags when posting code.

        To print the contents of a dictionary, iterate on the dictionary and format the printing of the key and value. In this case, the value is a list of line numbers, so...
        Code:
        >>> dd = {"word1": [6,9], "word2": [5], "word3": [0,3,8]}
        >>> for key in dd:
        ... 	print "%s: %s" % (key, ", ".join([str(n) for n in dd[key]]))
        ... 	
        word1: 6, 9
        word3: 0, 3, 8
        word2: 5
        >>>
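        In Python 3, where print is a function, the same idea could be sketched roughly as follows. The helper name format_entries is made up for illustration, and sorting the keys is an added assumption to make the output order predictable.

```python
def format_entries(dd):
    """Return one 'word: n1, n2' string per dictionary entry, sorted by word."""
    lines = []
    for key in sorted(dd):
        # Line numbers are ints, so convert each to str before joining.
        numbers = ", ".join(str(n) for n in dd[key])
        lines.append("%s: %s" % (key, numbers))
    return lines


dd = {"word1": [6, 9], "word2": [5], "word3": [0, 3, 8]}
for entry in format_entries(dd):
    print(entry)
```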

        • lightning18
          New Member
          • May 2010
          • 16

          #5
          I have only been programming for a bit over 2 months, so I'm not fully following you.
          So is the code that I previously posted correct?
          And do I have to add what you said in your post, like this:
          Code:
          for key in d:
          print ....

          • bvdet
            Recognized Expert Specialist
            • Oct 2006
            • 2851

            #6
            Your definition of function correct() is not right. You did not understand my post that explained that you need a function or some code that decides for you if a word is correct or not. If a word is correct, do not add it to the dictionary. If a word is incorrect, add it. In pseudo code:
            Code:
            def correct(word):
                If word is a correct word, return True
                Else, word is incorrect, return False
            My post regarding printing - I used a list comprehension to create a string of the line numbers. It is equivalent to:
            Code:
            >>> dd[key]
            [5, 12, 16]
            >>> tem = []
            >>> for n in dd[key]:
            ... 	tem.append(str(n))
            ... 	
            >>> ", ".join(tem)
            '5, 12, 16'
            >>>
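            As a rough sketch of what such a correct() function could look like in real code (Python 3), assuming the correctly spelled words have already been loaded into a set. KNOWN_WORDS here is a made-up stand-in for your dictionary file, and lowercasing before the lookup is an assumption too.

```python
# Hypothetical stand-in for the words loaded from a dictionary file.
KNOWN_WORDS = {"hello", "mum", "how", "are", "you"}


def correct(word):
    """Return True if word is in the dictionary, False otherwise."""
    # Lowercase so that "Hello" and "hello" are treated as the same word.
    return word.lower() in KNOWN_WORDS


print(correct("Hello"))  # a known word
print(correct("helo"))   # a misspelled word
```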

            • Glenton
              Recognized Expert Contributor
              • Nov 2008
              • 391

              #7
              You don't have to do what he said, but your code is not correct. You have correct called in your definition of correct. Of course this is actually allowed, but I don't think it's what you intend, and it wouldn't work in your situation.

              What @bvdet was asking was how are you going to determine whether a word is correct or not?

              • lightning18
                New Member
                • May 2010
                • 16

                #8
                I have already created a list of incorrect words by comparing the txt file with a dictionary. Now all I want to do is find the line number of each incorrect word in the txt file.

                • lightning18
                  New Member
                  • May 2010
                  • 16

                  #9
                  I have tried another piece of code:
                  Code:
                  from collections import defaultdict
                  d = defaultdict(list)
                  for lineno, word in enumerate(words):
                      if word in text1:
                          d[word].append(lineno)
                  print(d)
                  But it prints each incorrect word with its position in the list, not the line it is located on.

                  • Glenton
                    Recognized Expert Contributor
                    • Nov 2008
                    • 391

                    #10
                    Okay, re-reading, it seems we've been misunderstanding you about the line numbers. Sorry about that.

                    Am I right in saying that your original text file has a bunch of lines, each with a bunch of words, and you're trying to work out which line the incorrectly spelled words are on, but all you have is the words list?

                    I don't see how this is possible. There seems to be no information linking the words in the words list to the line numbers from the original file. So probably the best approach is to do this when you're extracting the information from the file in the first place, i.e. fiddle around with the code that you used to create the list of incorrect words.

                    Regarding getting the line number, something like this will work fine:
                    Code:
                    myfile=open("file.txt")
                    for i,line in enumerate(myfile):
                        print i+1,line  #i starts from 0, so if you don't want that, you need to add 1
                    myfile.close()
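                    A variation on the same idea, sketched in Python 3: enumerate accepts a start argument, so the counter can begin at 1 directly, and a with block closes the file for you. The helper name numbered_lines and the sample file contents are made up for illustration.

```python
import os
import tempfile


def numbered_lines(path):
    """Return a list of (line_number, line_text) pairs, counting from 1."""
    with open(path) as myfile:
        # start=1 makes enumerate count 1, 2, 3, ... instead of 0, 1, 2, ...
        return [(i, line.rstrip("\n")) for i, line in enumerate(myfile, start=1)]


# Write a small sample file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("helo mum\nhow arew you\n")

for lineno, line in numbered_lines(tmp.name):
    print(lineno, line)

os.remove(tmp.name)
```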

                    • lightning18
                      New Member
                      • May 2010
                      • 16

                      #11
                      This is what I have:
                      # text is a list of my txt file
                      # words is a list of my incorrect words
                      I want to find the line numbers of the incorrect words in the txt file.

                      • Glenton
                        Recognized Expert Contributor
                        • Nov 2008
                        • 391

                        #12
                        Oh, so you're just looking for the list's index method.

                        Code:
                        In [5]: text="helo mum how arew you".split(" ")
                        
                        In [6]: text
                        Out[6]: ['helo', 'mum', 'how', 'arew', 'you']
                        
                        In [7]: words=["arew","helo"]
                        
                        In [8]: for w in words:
                           ...:     print w, text.index(w)
                           ...:     
                           ...:     
                        arew 3
                        helo 0
                        A quick browse through the python docs or a text book or whatever is a good idea just to get a feel for what's possible.

                        Unless I'm still not understanding what you're wanting!

                        • Glenton
                          Recognized Expert Contributor
                          • Nov 2008
                          • 391

                          #13
                          Oh, so maybe the word appears multiple times. Similar idea.

                          E.g. this function:
                          Code:
                          def findLineNos(text,word):
                              "returns a list of all the line numbers where word appears"
                              ans=[]
                              reps = text.count(word)
                              n=0
                              for i in range(reps):
                                  pos = text.index(word, n)  # search from position n onwards
                                  ans.append(pos)
                                  n = pos + 1                # continue just past this match
                              return ans
                          
                          text="helo mum how arew you helo mum how arew you".split(" ")
                          words=["arew","helo","false"]
                          
                          for w in words:
                              print w,findLineNos(text,w)
                          prints this:
                          Code:
                          arew [3, 8]
                          helo [0, 5]
                          false []
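                          An alternative sketch (Python 3), in case the list is long: instead of calling count and index once per word, walk the list once and record every position as you go. The function name index_positions is made up for illustration.

```python
from collections import defaultdict


def index_positions(text):
    """Map each item in text to the list of positions where it appears."""
    positions = defaultdict(list)
    for i, item in enumerate(text):
        positions[item].append(i)
    return positions


text = "helo mum how arew you helo mum how arew you".split(" ")
words = ["arew", "helo", "false"]

positions = index_positions(text)
for w in words:
    # .get avoids creating empty entries for words that never appear.
    print(w, positions.get(w, []))
```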

                          • lightning18
                            New Member
                            • May 2010
                            • 16

                            #14
                            Cheers Glenton, that is pretty much what I want: a set of incorrect words and the line numbers where they are located in the txt file. However, I get an error. This is my code.
                            The error is:
                            SyntaxError: invalid syntax
                            Code:
                            import sys
                            import string
                            
                            text = []
                            infile = open(sys.argv[1], 'r').read()
                            for punct in string.punctuation:
                                infile = infile.replace(punct, "")
                                text = infile.split()
                                
                            dict = open(sys.argv[2], 'r').read()
                            dictset = []
                            dictset = dict.split()
                                
                            words = []
                            words = list(set(text) - set(dictset))
                            words = [text.lower() for text in words]
                            words.sort()
                            
                            def findline(text, word):
                                ans = []
                                reps = text.count(word)
                                n = 0
                                for i in range(reps):
                                    ans.append(text[n:].index(word)+n)
                                    n = text[n:].index(word)+1
                                return ans
                            for w in words:
                                print(w,findline(text, w)

                            • Glenton
                              Recognized Expert Contributor
                              • Nov 2008
                              • 391

                              #15
                              You'll need to be more specific than that on the error code. I can't run your file cos I don't have your inputs, so I'm guessing just by reading your code.

                              However, looking at it, it seems that text is a list of words, with no line information. Changing your line 8 to
                              Code:
                              text = infile.split("\n")
                              will mean that text is a list of the lines from the text file, rather than a list of words.

                              This should make it possible.
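                              Pulling the ideas from this thread together, here is a hedged sketch in Python 3 of one way to do the whole job. It is not a fix of the exact script above: the function name find_incorrect_lines, the sample data, and the choice to lowercase both the text and the dictionary are assumptions for illustration. Separately, one visible candidate for the SyntaxError in the posted script is its last line, where print( is never closed with a matching ).

```python
import string
from collections import defaultdict


def find_incorrect_lines(lines, dictionary_words):
    """Map each word not in dictionary_words to the line numbers it appears on.

    Line numbers start at 1. Punctuation is stripped and words are lowercased
    before the lookup.
    """
    known = {w.lower() for w in dictionary_words}
    strip_punct = str.maketrans("", "", string.punctuation)
    found = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        for word in line.translate(strip_punct).lower().split():
            # Record each line once, even if the word repeats on that line.
            if word not in known and lineno not in found[word]:
                found[word].append(lineno)
    return dict(found)


# Sample data standing in for the text file and the dictionary file.
lines = ["Helo mum,", "how arew you?", "helo again"]
dictionary_words = ["mum", "how", "you", "again"]

result = find_incorrect_lines(lines, dictionary_words)
for word in sorted(result):
    print(word + ":", " ".join(str(n) for n in result[word]))
```

                              With a real file, the open file object (or the result of infile.split("\n")) can be passed as lines, since the function only needs something it can iterate over one line at a time.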
