[Python] Read .txt file and analyze

  • thaking
    New Member
    • Dec 2010
    • 1

    Hello all;

    I'm working on Huffman coding of an arbitrary .txt file, so first I need to read the file and analyse its contents.
    I need output like this table:
    *************** *************
    letter    frequency (how many times the same letter is repeated)    Huffman code (this will come later)

    *************** *********

    I started with:
    Code:
    f = open('test.txt', 'r')    # open test.txt
    for line in f:
        print(line)              # check that reading works
    f.close()
    Can anyone help me?
  • Sean Pedersen
    New Member
    • Dec 2010
    • 30

    #2
    Code:
    def countLetters(line, letter):
        ret = 0
        for character in line:
            if character == letter: ret += 1
        return ret
    
    with open("file.txt") as f:
        for line in f:
            line = line.strip()
            print(line)
            print("Has %d of letter a." % countLetters(line, "a"))
            print("Has %d of letter e." % countLetters(line, "e"))
            print("Has %d of letter i." % countLetters(line, "i"))
            print("Has %d of letter o." % countLetters(line, "o"))
            print("Has %d of letter u." % countLetters(line, "u"))
            print("Has %d of letter y." % countLetters(line, "y"))
            print("")
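
For what it's worth, the countLetters helper above does the same job as the built-in str.count method. A minimal sketch, assuming Python 3; vowel_counts and the sample string are made up for illustration and are not part of the code above:

```python
def vowel_counts(line, letters="aeiouy"):
    """Return {letter: count} for the given letters (hypothetical helper)."""
    # str.count scans the string once per letter, like countLetters does.
    return {letter: line.count(letter) for letter in letters}


sample = "huffman coding of any text file"  # illustrative input, not from the thread
for letter, count in vowel_counts(sample).items():
    print("Has %d of letter %s." % (count, letter))
```

To use it on a file, call vowel_counts(line.strip()) inside the loop over the file's lines.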

    • twohot
      New Member
      • Dec 2010
      • 8

      #3
      Modifying Sean's Code:

      Code:
      #!/usr/bin/env python3

      def countLetters(line, letter):
          ret = 0
          for character in line:
              if character == letter:
                  ret += 1
          return ret

      alphabet = 'abcdefghijklmnopqrstuvwxyz'
      # you can automate this using ASCII codes too

      for line in open("file.txt", 'r'):
          line = line.strip()  # remove leading and trailing whitespace
          print(line)
          for letter in alphabet:
              print('Has {0} of letter {1}'.format(countLetters(line, letter), letter))

      output:

      gbfchjwshfcjkwh ndfnxh;iquw;qem iziqmeuzngyegbf yewgybqgzeqzydg lndhlqhd;jkjnen jcejnrcjercbvvb dbggngnjmmnmsnv sdmsnfsfcsf>mNm vbmsdbfluahereg nctfbaxzmasqojm wqi;htrugttgp
      Has 3 of letter a
      Has 9 of letter b
      Has 7 of letter c
      Has 7 of letter d
      Has 10 of letter e
      Has 9 of letter f
      Has 12 of letter g
      Has 8 of letter h
      Has 4 of letter i
      Has 9 of letter j
      Has 2 of letter k
      Has 3 of letter l
      Has 11 of letter m
      Has 13 of letter n
      Has 1 of letter o
      Has 1 of letter p
      Has 8 of letter q
      Has 4 of letter r
      Has 8 of letter s
      Has 4 of letter t
      Has 4 of letter u
      Has 4 of letter v
      Has 5 of letter w
      Has 2 of letter x
      Has 4 of letter y
      Has 5 of letter z
      Last edited by twohot; Dec 27 '10, 06:56 AM. Reason: forgot to wrap the code section in code tags
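
On the comment about automating the alphabet with ASCII codes: one possible way is to build it with ord and chr, and the standard library already ships the same string as string.ascii_lowercase. A rough sketch, assuming Python 3; letter_counts and its ignore_case flag are hypothetical names, not part of the code above:

```python
import string

# Build 'a'..'z' from the ASCII codes 97..122.
alphabet = "".join(chr(code) for code in range(ord("a"), ord("z") + 1))

# The standard library provides the same string ready-made.
assert alphabet == string.ascii_lowercase


def letter_counts(line, ignore_case=False):
    """Return {letter: count} for every lowercase ASCII letter (hypothetical helper)."""
    if ignore_case:
        line = line.lower()  # fold 'N' into 'n' before counting
    return {letter: line.count(letter) for letter in alphabet}


counts = letter_counts("sdmsnfsfcsf>mNm", ignore_case=True)
for letter in alphabet:
    print("Has {0} of letter {1}".format(counts[letter], letter))
```

With ignore_case left at False, uppercase characters such as the 'N' in the sample are simply not counted, which matches the behaviour of the loop above.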

      • Michael Colon
        New Member
        • Dec 2010
        • 2

        #4
        Code:
        #-------------------------------------------------#
        # Set Variables                                   #
        #-------------------------------------------------#
        
        infile = open("file.txt")  # named infile to avoid shadowing the built-in input()
        whitelist = ('a','b','c','d','e','f','g') # whitelist of letters
        letters = {}

        #-------------------------------------------------#
        #  Functions                                      #
        #-------------------------------------------------#

        def count_letter(c):
          if c in letters:
            letters[c] += 1  # character already seen: add one to its count
          else:
            letters[c] = 1   # first occurrence: add the character to the dictionary


        def print_letters(letters):

          for k,v in letters.items():
            if k in whitelist:
              print("Has %s of letter %s" % (v,k)) # print out count for each letter


        #-------------------------------------------------#
        #  Run code                                       #
        #-------------------------------------------------#


        for line in infile:         # for each line in input file
          for letter in line:       # for each character in line
            count_letter(letter)    # tally a count of each character

        print_letters(letters)
        infile.close()
        Here I use a more Pythonic syntax, which means fewer lines of code. If you count everything and whitelist the characters you're concerned with, then your code can be easily modified in the future.
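
The count-everything-then-filter idea can also be sketched with collections.Counter from the standard library, which handles the "seen before or not" bookkeeping itself. This is only one possible variant, assuming Python 3; count_characters, report and the sample lines are illustrative names and data, not part of the code above:

```python
from collections import Counter


def count_characters(lines):
    """Tally every character (including newlines) across an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line)  # adds one per character in the line
    return counts


def report(counts, whitelist):
    """Build one report line per whitelisted character that actually occurred."""
    return ["Has %d of letter %s" % (counts[c], c) for c in whitelist if counts[c]]


sample_lines = ["abcabc\n", "bead\n"]  # stands in for the lines of a file
counts = count_characters(sample_lines)
for row in report(counts, "abcdefg"):
    print(row)
```

Because an open file is itself an iterable of lines, count_characters can be handed the file object directly inside a with open("file.txt") block.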

        Hope this helps!

        • twohot
          New Member
          • Dec 2010
          • 8

          #5
          Very neat Michael
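
Since the original question also asks for a Huffman code column, here is one possible way to get from a frequency table to codes using heapq. It is a sketch of the textbook greedy construction (repeatedly merge the two lightest subtrees), not something posted in this thread. huffman_codes and the sample text are invented for illustration, and ties between equal weights are broken arbitrarily, so other equally valid code tables exist.

```python
import heapq
from collections import Counter


def huffman_codes(frequencies):
    """Map each symbol to a prefix-free bit string based on its frequency."""
    if not frequencies:
        return {}
    if len(frequencies) == 1:
        # A single symbol still needs one bit to be encodable.
        return {symbol: "0" for symbol in frequencies}

    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps tuple comparison from ever reaching the dict.
    heap = [(weight, index, {symbol: ""})
            for index, (symbol, weight) in enumerate(sorted(frequencies.items()))]
    heapq.heapify(heap)
    next_id = len(heap)

    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # lightest subtree
        w2, _, right = heapq.heappop(heap)   # second lightest subtree
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1

    return heap[0][2]


text = "aaaabbc"  # illustrative input
frequencies = Counter(text)
codes = huffman_codes(frequencies)

print("letter  frequency  Huffman code")
for symbol in sorted(frequencies, key=frequencies.get, reverse=True):
    print("%-7s %-10d %s" % (symbol, frequencies[symbol], codes[symbol]))
```

Frequent characters end up with shorter codes, which is the point of building the frequency table first.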
