Computing unigram and bigram probability using Python

  • thistle
    New Member
    • Mar 2015
    • 9


    I have two text files. I need to calculate the unigram (monogram) probability and, as the next step, the bigram probability of the first file in terms of the word frequencies of the second file. For this, I first have to write a function that counts the total number of words and the number of occurrences of each unique word in the file, because the unigram probability of a word is its number of occurrences divided by the total number of words. Finally, the results should be written to a new file. The code I wrote (it only computes the unigram) doesn't work. How can I change it to work correctly? And how can I calculate the bigram probabilities?

    Code:
    def CalculateMonoGram (file1, file2):
        with open (file1, encoding="utf_8") as f1:
            counts={}
            s1=f1.read()
            x1=s1.split()
            for word in x1:
                counts[word]=counts.get(word,0)+1
    
            total=sum(counts.values())            
    
        with open (file2, encoding="utf_8") as f2:
            s2=f2.read()
            x2=s2.split()
    
    
        monogram=[]
        for item in x2:
            monogram[item]=counts(item)/total
    
    
        with open ("LexiconMonogram.txt", "w", encoding="utf_8") as f3:
            f3.write(monogram)
    Last edited by bvdet; May 18 '15, 05:33 PM. Reason: Please use code tags when posting code [code]....[/code]
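
One possible corrected version of the unigram function, offered as a sketch rather than a definitive answer. Three things stop the posted code from running: `monogram=[]` creates a list, but it is then indexed by word, which needs a dict; `counts(item)` calls the dict instead of looking the word up; and `f3.write(monogram)` passes a container where a string is required. The sketch assumes, as the posted code does, that counts come from the first file and that probabilities are reported for the words of the second file. The name `calculate_unigram`, the `out_path` parameter, and the tab-separated output format are assumptions, not part of the original post.

```python
from collections import Counter


def calculate_unigram(corpus_path, lexicon_path, out_path="LexiconMonogram.txt"):
    """Return and write P(word) = count(word) / total for each lexicon word."""
    # Count every whitespace-separated token in the first file.
    with open(corpus_path, encoding="utf-8") as f1:
        counts = Counter(f1.read().split())
    total = sum(counts.values())

    # Words whose probabilities should be reported (second file).
    with open(lexicon_path, encoding="utf-8") as f2:
        lexicon = f2.read().split()

    unigram = {}  # a dict, so it can be indexed by word
    for word in lexicon:
        # Counter returns 0 for unseen words; guard against an empty corpus.
        unigram[word] = counts[word] / total if total else 0.0

    # Assumed output format: one "word<TAB>probability" pair per line.
    with open(out_path, "w", encoding="utf-8") as f3:
        for word, prob in unigram.items():
            f3.write(f"{word}\t{prob}\n")
    return unigram
```

Splitting on whitespace keeps punctuation attached to words and is case-sensitive, so "The" and "the" are counted separately. Whether that is acceptable depends on the data.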
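
For the bigram part of the question, a common choice is the maximum-likelihood estimate P(w2 | w1) = count(w1 w2) / count(w1). The sketch below assumes that this is the intended definition, that counts come from the first file's words, and that probabilities are wanted for each adjacent word pair in the second file. The function name `bigram_probabilities` and its arguments are hypothetical. It works on lists of words, which can be obtained with `f.read().split()` as in the posted code.

```python
from collections import Counter


def bigram_probabilities(corpus_words, query_words):
    """Estimate P(w2 | w1) = count(w1 w2) / count(w1) from corpus_words.

    Assumption: probabilities are reported for every adjacent pair
    in query_words; unseen pairs or unseen first words get 0.0.
    """
    unigram_counts = Counter(corpus_words)
    # zip pairs each word with its successor, giving all adjacent bigrams.
    bigram_counts = Counter(zip(corpus_words, corpus_words[1:]))

    probs = {}
    for w1, w2 in zip(query_words, query_words[1:]):
        if unigram_counts[w1]:
            probs[(w1, w2)] = bigram_counts[(w1, w2)] / unigram_counts[w1]
        else:
            probs[(w1, w2)] = 0.0  # w1 never occurs in the corpus
    return probs
```

For example, with the corpus `"the cat sat on the mat".split()` and the query `"the cat".split()`, "the" occurs twice and "the cat" once, so the result is `{('the', 'cat'): 0.5}`. Unseen pairs get probability zero here; smoothing would be needed if zeros are a problem.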