Hey guys,
I am comparing two documents: if a word appears in both documents, it gets added as a new key to a dictionary.
As the dictionary value, I would like to store the document's name and the line number the word was found on.
Here is what I have so far, with comments:
Code:
dic = {}

def matchtermer():
    f3 = open('korpus/avis.txt')
    f4 = open("ordliste_output_kort.txt")
    text3 = f3.read()
    text4 = f4.read()
    ordliste2 = text3.split()
    ordliste3 = text4.split()
    wordlist2 = []
    for word1 in ordliste2:
        # this part removes end characters that aren't part of the word and makes all lowercase
        # last character of each word
        lastchar = word1[-1:]
        # use a list of punctuation marks
        if lastchar in [",", ".", "!", "?", ";"]:
            word2 = word1.rstrip(lastchar)
        else:
            word2 = word1
        # build a wordList of lower case modified words
        wordlist2.append(word2.lower())
    for word in wordlist2:
        # and finally this compares the two documents
        if word in ordliste3:
            if word not in dic.keys():
                dic[word] = []  # if word not in dic, create it
            # dic[word].append(docname, linenumber) - this is what I want to do - obviously this does not work
    return dic
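To make the goal concrete, here is a rough sketch of the kind of structure I am after. It is not a tested solution for my files. The function names `index_words` and `match_terms` are made up for illustration, and I am assuming the text is read line by line (instead of with `read()`) so the line number is still known when a word is seen. Unlike my code above, it strips any run of trailing punctuation and records positions from both documents.

```python
from collections import defaultdict

# Trailing characters to strip from each word (same set as in the code above).
PUNCTUATION = ",.!?;"

def index_words(lines, docname):
    """Map each normalized word to a list of (docname, line_number) tuples.

    `lines` is any iterable of strings, e.g. an open file object.
    """
    index = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        for raw in line.split():
            # Drop trailing punctuation and lowercase the word.
            word = raw.rstrip(PUNCTUATION).lower()
            if word:
                index[word].append((docname, lineno))
    return index

def match_terms(lines_a, name_a, lines_b, name_b):
    """Return {word: [(docname, line_number), ...]} for words found in both documents."""
    index_a = index_words(lines_a, name_a)
    index_b = index_words(lines_b, name_b)
    # Only keep words that occur in both indexes.
    common = index_a.keys() & index_b.keys()
    return {word: index_a[word] + index_b[word] for word in common}
```

Since file objects iterate over their lines, the two opened files could be passed straight in together with their names, for example `match_terms(f3, 'avis.txt', f4, 'ordliste_output_kort.txt')`.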