I have two text files. I need to calculate the unigram (monogram) probabilities and, in the next step, the bigram probabilities of the first file in terms of the word repetitions in the second file. For this, I first have to write a function that counts the total number of words and the number of unique words in the file, because the unigram probability of each word is its number of occurrences divided by the total number of words. Finally, the result has to be written to a new file. The code I wrote (it only computes the unigram so far) doesn't work. How can I change it so that it works correctly? And how can I calculate the bigram probabilities?
Code:
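def CalculateMonoGram(file1, file2):
    # Count how often each word occurs in the first file.
    with open(file1, encoding="utf_8") as f1:
        counts = {}
        s1 = f1.read()
        x1 = s1.split()
        for word in x1:
            counts[word] = counts.get(word, 0) + 1
        total = sum(counts.values())
    # Look up each word of the second file in those counts.
    with open(file2, encoding="utf_8") as f2:
        s2 = f2.read()
        x2 = s2.split()
        monogram = []
        for item in x2:
            # Fails here: counts is a dict, so counts(item) raises TypeError,
            # and monogram is a list, so it cannot be indexed by a word.
            monogram[item] = counts(item) / total
    with open("LexiconMonogram.txt", "w", encoding="utf_8") as f3:
        # Would also fail: write() expects a str, not a list.
        f3.write(monogram)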
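
One possible way to get both probabilities is sketched below. It is not a definitive solution and rests on a few assumptions: following the listing above, the counts come from the first file and the words to look up come from the second file; words are split on whitespace only; and the probabilities are plain relative frequencies with no smoothing, so P(w) = count(w) / total and P(w2 | w1) = count(w1 w2) / count(w1). The helper names `read_words`, `unigram_probs`, `bigram_probs` and `write_probs` are made up for this illustration.

```python
from collections import Counter


def read_words(path):
    """Return the whitespace-separated words of a UTF-8 text file."""
    with open(path, encoding="utf-8") as f:
        return f.read().split()


def unigram_probs(corpus_words, lexicon_words):
    """P(w) = count(w) / total, for every distinct word in the lexicon."""
    counts = Counter(corpus_words)
    total = len(corpus_words)
    if total == 0:
        return {}
    # dict.fromkeys drops duplicates but keeps the order of first appearance.
    # A Counter returns 0 for unseen words, so no KeyError is raised.
    return {w: counts[w] / total for w in dict.fromkeys(lexicon_words)}


def bigram_probs(corpus_words, lexicon_words):
    """P(w2 | w1) = count(w1 w2) / count(w1), for adjacent lexicon pairs."""
    unigram_counts = Counter(corpus_words)
    # zip(words, words[1:]) yields every pair of neighbouring words.
    bigram_counts = Counter(zip(corpus_words, corpus_words[1:]))
    probs = {}
    for w1, w2 in zip(lexicon_words, lexicon_words[1:]):
        if unigram_counts[w1]:
            probs[(w1, w2)] = bigram_counts[(w1, w2)] / unigram_counts[w1]
        else:
            # Assumption: a pair whose first word never occurs gets 0.0.
            probs[(w1, w2)] = 0.0
    return probs


def write_probs(probs, out_path):
    """Write one 'key<TAB>probability' line per entry."""
    with open(out_path, "w", encoding="utf-8") as out:
        for key, p in probs.items():
            label = " ".join(key) if isinstance(key, tuple) else key
            out.write(f"{label}\t{p}\n")
```

With the two input files this could be used as `corpus = read_words(file1)`, `lexicon = read_words(file2)`, then `write_probs(unigram_probs(corpus, lexicon), "LexiconMonogram.txt")`, and the same for `bigram_probs` with a second output file of your choice. If the bigrams should instead be taken from a different pairing of the two files, only the arguments need to be swapped.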