How to integrate my code to read txt files in a directory (folder) and its subdirectories? Please help

  • alivip
    New Member
    • Mar 2008
    • 17

    How to integrate my code to read txt files in a directory (folder) and its subdirectories? Please help

    How can I extend my code to read the text files in a parent folder that contains subfolders and files? For example, the parent folder is named cars and its subfolders are Toyota, Honda and BMW. Toyota contains files named Camry and Corolla, Honda contains a folder named Accord, and BMW contains a file named X5.

    Is there a way to enter the name of the parent folder (cars) and search all of its subfolders (Toyota, Honda and BMW) and files?

    Please help ASAP.

    The code finds the most frequent words in one text file and prints them in decreasing order of frequency.
    I want it to find the most frequent words in all the text files (together) under a specific folder.

    Code:
    # count words in a text and show the first ten items
    # by decreasing frequency
     
    # sample text for testing
    
    import sys
    import string
    import re
    file = open ("arb.txt", "r")
    text = file.read ( )
    file.close ( )
     
    word_freq = {}
     
    word_list = text.split()
     
    for word in word_list:
        # word all lower case
        word = word.lower()
        # strip any trailing period or comma
        word = word.rstrip('.,/"-_;\[]()')
        # build the dictionary
        count = word_freq.get(word, 0)
        word_freq[word] = count + 1
     
    # create a list of (freq, word) tuples
    freq_list = [(freq, word) for word, freq in word_freq.items()]
     
    # sort the list by the first element in each tuple (default)
    freq_list.sort(reverse=True)
     
    for n, tup in enumerate(freq_list):
        # print the first ten items
        if n < 10:
            freq, word = tup
            print freq, word
  • jlm699
    Contributor
    • Jul 2007
    • 314

    #2
    Originally posted by alivip
    Is there a way to enter the name of the parent folder (cars) and search all of its subfolders (Toyota, Honda and BMW) and files?
    I'm sorry, but it's very difficult to understand what it is that you are asking. I can provide you with some direction however...

    Perhaps something you're looking for is os.walk. Here is a sample:
    [code=python]
    >>> import os
    >>> for root, dirs, files in os.walk(os.getcwd()):
    ...     print 'Looking into %s' % root.split('\\')[-1]
    ...     print 'Found %d dirs and %d files' % (len(dirs), len(files))
    ...     for idx, dir in enumerate(dirs):
    ...         print 'Directory #%d: %s' % (idx + 1, dir)
    ...     for idx, file in enumerate(files):
    ...         print 'File #%d: %s' % (idx + 1, file)
    ...
    Looking into pythtests
    Found 2 dirs and 16 files
    Directory #1: graphics
    Directory #2: Question
    File #1: bckmch.py
    File #2: cmdtest.py
    File #3: cobyla.py
    File #4: elseerr.py
    File #5: fileio.py
    File #6: ldict.py
    File #7: lid
    File #8: mainbody
    File #9: matrixprint.py
    File #10: matrx_print.py
    File #11: test.py
    File #12: test2.py
    File #13: topload
    File #14: totalbottle
    File #15: trivgame.py
    File #16: wxtemplate.py
    Looking into graphics
    Found 0 dirs and 8 files
    File #1: Buttons.py
    File #2: dice_class.py
    File #3: ghostchars.py
    File #4: graphics.py
    File #5: graphics.pyc
    File #6: graphics22.py
    File #7: graphics22.pyc
    File #8: hw6-template.py
    Looking into Question
    Found 0 dirs and 0 files
    >>> [/code]
    Hope that helps a little bit
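
For readers unsure what the loop above does: os.walk visits the starting folder and every folder below it, and on each visit hands back the current folder path plus the names of the subfolders and files directly inside it. Here is a minimal sketch of that behaviour, written for Python 3 (the thread itself uses Python 2). The helper name list_tree and the ".txt" file names are made up for illustration; the folder names come from the "cars" example in the question.

```python
import os
import tempfile


def list_tree(top):
    """Return (folder name, subfolder names, file names) for top and every folder below it."""
    found = []
    for root, dirs, files in os.walk(top):
        # root  : path of the folder currently being visited
        # dirs  : names of the subfolders directly inside root
        # files : names of the files directly inside root
        found.append((os.path.basename(root), sorted(dirs), sorted(files)))
    return found


# Build a throwaway copy of the "cars" layout (hypothetical file names).
top = os.path.join(tempfile.mkdtemp(), "cars")
layout = {"Toyota": ["Camry.txt", "Corolla.txt"], "BMW": ["X5.txt"]}
for brand, names in layout.items():
    os.makedirs(os.path.join(top, brand))
    for name in names:
        with open(os.path.join(top, brand, name), "w") as handle:
            handle.write("sample text\n")

for folder, subfolders, filenames in list_tree(top):
    print(folder, subfolders, filenames)
```

The file names in files are bare names, so they need os.path.join(root, name) before they can be opened.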


    • alivip
      New Member
      • Mar 2008
      • 17

      #3
      I mean that an example of a parent directory (folder) is cars and examples of its subdirectories (folders) are BMW, Honda and Toyota, so I want to traverse the directory and all of its subdirectories
      to find the most frequent words in all the text files (together) under a specific folder.


      I also did not understand what your code means.


      • alivip
        New Member
        • Mar 2008
        • 17

        #4
        Thanks, Mr. jlm699, your reply was helpful,

        but it does not do exactly what I want.

        The modified code is:

        Code:
        # count words in a text and show the first ten items
        # by decreasing frequency
         
        # sample text for testing
        
        import sys
        import string
        import re
        import os.path
        for root, dirs, files in os.walk(os.getcwd()):
          print 'Looking into %s' % root.split('\\')[-1]
          print 'Found %d dirs and %d files' % (len(dirs), len(files))
          for idx, dir in enumerate(dirs):
            print 'Directory #%d: %s' % (idx + 1, dir)
            for idx, file in enumerate(files):
              print 'File #%d: %s' % (idx + 1, file)
              ff = open (file, "r")
              text = ff.read ( )
              ff.close ( )
             
              word_freq = {}
             
              word_list = text.split()
             
              for word in word_list:
                # word all lower case
                 word = word.lower()
                # strip any trailing period or comma
                 word = word.rstrip('.,/"-_;\[]()')
                # build the dictionary
                 count = word_freq.get(word, 0)
                 word_freq[word] = count + 1
             
            # create a list of (freq, word) tuples
              freq_list = [(freq, word) for word, freq in word_freq.items()]
             
            # sort the list by the first element in each tuple (default)
              freq_list.sort(reverse=True)
             
              for n, tup in enumerate(freq_list):
                # print the first ten items
                 if n < 10:
                    freq, word = tup
                    print freq, word
        The output looks like this:

        File #12: listtoDict.py
        14 with
        6 python
        6 for
        File #13: parseAddresses
        3 python
        1 with
        1 will
        I need the frequency of each word across all the text files, not separately per file. For example, the previous output should look like this:

        15 with
        9 python
        6 for
        1 will

        So the frequency of each word in (File #12: listtoDict.py) should be added to its frequency in (File #13: parseAddresses), and everything printed in one list.


        • jlm699
          Contributor
          • Jul 2007
          • 314

          #5
          Originally posted by alivip
          and I need the frequency of each word across all the text files
          Just move your word_freq dictionary declaration to before the for loop, and move the sorting/printing of that structure to after the for loop, and you'll achieve this.
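
The effect of that move can be sketched without touching the disk. This is a Python 3 illustration, not the original poster's program: the two strings stand in for the contents of the two files from post #4 and were made up to reproduce the counts shown there, and count_words is a hypothetical helper name.

```python
def count_words(text, word_freq):
    """Add the words of one text to the shared word_freq dictionary."""
    for word in text.split():
        word = word.lower().rstrip('.,/"-_;\\[]()')
        if word.isalpha():
            word_freq[word] = word_freq.get(word, 0) + 1


# Stand-ins for the contents of two files (made up to match the counts in post #4).
file_texts = [
    "with " * 14 + "python " * 6 + "for " * 6,
    "python " * 3 + "with will",
]

word_freq = {}                      # created once, BEFORE the loop over files
for text in file_texts:
    count_words(text, word_freq)    # every file adds to the same dictionary

# Sorting and printing happen once, AFTER the loop.
freq_list = sorted(((freq, word) for word, freq in word_freq.items()), reverse=True)
for freq, word in freq_list[:10]:
    print(freq, word)
```

If word_freq = {} sat inside the loop instead, the dictionary would be emptied for each file and only per-file counts would ever be printed.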


          • jlm699
            Contributor
            • Jul 2007
            • 314

            #6
            Here are the modifications that I suggested above and the resulting output.[CODE=python]
            import sys, os

            word_freq = {}

            for root, dirs, files in os.walk(os.getcwd()):
                print 'Looking into %s' % root.split('\\')[-1]
                print 'Found %d dirs and %d files' % (len(dirs), len(files))

                for idx, file in enumerate(files):
                    ff = open(os.path.join(root, file), "r")
                    text = ff.read()
                    ff.close()

                    word_list = text.strip().split()

                    for word in word_list:
                        word = word.lower().rstrip('.,/"-_;\\[]()')
                        if word.isalpha():
                            # build the dictionary
                            count = word_freq.get(word, 0)
                            word_freq[word] = count + 1

            # create a list of (freq, word) tuples
            freq_list = [(freq, word) for word, freq in word_freq.items()]

            # sort the list by the first element in each tuple (default)
            freq_list.sort(reverse=True)

            for n, tup in enumerate(freq_list):
                # print the first ten items
                if n < 10:
                    freq, word = tup
                    print freq, word[/CODE]
            Output:
            [CODE=output]
            Microsoft Windows XP [Version 5.1.2600]
            (C) Copyright 1985-2001 Microsoft Corp.

            C:\Documents and Settings\Administrator>cd Desktop\pythtests

            C:\Documents and Settings\Administrator\Desktop\pythtests>python walkncount.py
            Looking into pythtests
            Found 2 dirs and 17 files
            Looking into graphics
            Found 0 dirs and 8 files
            Looking into Question
            Found 0 dirs and 0 files
            46 the
            17 and
            14 of
            14 a
            12 is
            10 to
            10 in
            8 you
            8 this
            8 that

            C:\Documents and Settings\Administrator\Desktop\pythtests>[/CODE]


            • alivip
              New Member
              • Mar 2008
              • 17

              #7
              Thanks a lot,
              but it actually reads all the files and prints the frequencies for only one of them.
              It does not print the frequency of each word across all the files, which is what I want.


              • jlm699
                Contributor
                • Jul 2007
                • 314

                #8
                Originally posted by alivip
                reads all the files and prints the frequencies for only one of them
                Ok... I'm not sure exactly what you mean by that but I think that you're trying to say you only want to display the frequency of words in the file with the highest frequencies?

                [CODE=python]
                import sys, os

                highest_freq = [(0, 'Blank')]
                high_file_name = ''

                for root, dirs, files in os.walk(os.getcwd()):
                    # print 'Looking into %s' % root.split('\\')[-1]
                    # print 'Found %d dirs and %d files' % (len(dirs), len(files))

                    for idx, file in enumerate(files):
                        # print 'File #%d: %s' % (idx + 1, file)
                        ff = open(os.path.join(root, file), "r")
                        text = ff.read()
                        ff.close()

                        word_freq = {}
                        word_list = text.strip().split()

                        for word in word_list:
                            word = word.lower().rstrip('.,/"-_;\\[]()')
                            if word.isalpha():
                                # build the dictionary
                                word_freq[word] = word_freq.get(word, 0) + 1

                        # create a list of (freq, word) tuples
                        freq_list = [(freq, word) for word, freq in word_freq.items()]

                        # sort the list by the first element in each tuple (default)
                        freq_list.sort(reverse=True)
                        if freq_list:
                            if freq_list[0][0] > highest_freq[0][0]:
                                highest_freq = freq_list
                                high_file_name = file

                print 'Highest frequency file: %s' % high_file_name
                for n, tup in enumerate(highest_freq):
                    if n < 10:
                        freq, word = tup
                        print freq, word
                raw_input('\nHit enter to exit')
                [/code]
                Output:
                Code:
                Highest frequency file: graphics.py
                93 def
                44 return
                36 the
                31 in
                29 of
                26 for
                25 if
                23 to
                19 class
                19 a
                
                Hit enter to exit
                This is a crude example so I apologize; however I don't understand what you're trying to do or why. So working with what you've given this is the most I can make of your question.


                • alivip
                  New Member
                  • Mar 2008
                  • 17

                  #9
                  I really appreciate your trying to help,
                  but unfortunately that is not what I meant.

                  What I meant is: read all the files in the directory, combine all the words from all the files and put them in a new file, then find the frequency of each word in that new file.


                  • jlm699
                    Contributor
                    • Jul 2007
                    • 314

                    #10
                    Originally posted by alivip
                    I really appreciate your trying to help,
                    but unfortunately that is not what I meant.

                    What I meant is: read all the files in the directory, combine all the words from all the files and put them in a new file, then find the frequency of each word in that new file.
                    So basically, you're saying you want to combine the contents of all the files into a new file, and then find the frequency of the words in that file?

                    Well to do that w/o creating a new file would be a very slight change from a previous post:
                    [code=python]
                    import sys, os

                    word_freq = {}

                    for root, dirs, files in os.walk(os.getcwd()):
                        print 'Looking into %s' % root.split('\\')[-1]
                        print 'Found %d dirs and %d files' % (len(dirs), len(files))

                        for idx, file in enumerate(files):
                            ff = open(os.path.join(root, file), "r")
                            text = ff.read()
                            ff.close()

                            word_list = text.strip().split()

                            for word in word_list:
                                word = word.lower().rstrip('.,/"-_;\\[]()')
                                if word.isalpha():
                                    # build the dictionary
                                    count = word_freq.get(word, 0)
                                    word_freq[word] = count + 1

                    # create a list of (freq, word) tuples
                    freq_list = [(freq, word) for word, freq in word_freq.items()]

                    # sort the list by the first element in each tuple (default)
                    freq_list.sort(reverse=True)

                    for n, tup in enumerate(freq_list):
                        # print the first ten items
                        if n < 10:
                            print "%s times: %s" % tup
                    raw_input('\nHit enter to exit')
                    [/code]
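
If a combined file really is wanted, as the request literally asks, one possible sketch in Python 3 is below. It is only an illustration under stated assumptions: the function names combine_files and word_frequencies, the UTF-8 encoding, and the demo file names are all made up here and are not part of the thread's code. The combined file is written outside the folder being walked so that the walk does not read it back in.

```python
import os
import tempfile


def combine_files(top, combined_path):
    """Copy the text of every file under top into one new file at combined_path."""
    # combined_path should live outside top, or the walk may pick it up too.
    with open(combined_path, "w", encoding="utf-8") as out:
        for root, dirs, files in os.walk(top):
            for name in files:
                path = os.path.join(root, name)
                with open(path, "r", encoding="utf-8", errors="replace") as src:
                    out.write(src.read())
                    out.write("\n")  # keep the last word of one file apart from the next


def word_frequencies(path):
    """Return (freq, word) tuples for the file at path, most frequent first."""
    word_freq = {}
    with open(path, "r", encoding="utf-8") as handle:
        for word in handle.read().split():
            word = word.lower().rstrip('.,/"-_;\\[]()')
            if word.isalpha():
                word_freq[word] = word_freq.get(word, 0) + 1
    return sorted(((freq, word) for word, freq in word_freq.items()), reverse=True)


# Small demo tree with hypothetical file names and contents.
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "Toyota"))
with open(os.path.join(top, "Toyota", "Camry.txt"), "w", encoding="utf-8") as f:
    f.write("python with python")
with open(os.path.join(top, "notes.txt"), "w", encoding="utf-8") as f:
    f.write("with will")

combined_path = os.path.join(tempfile.mkdtemp(), "combined.txt")
combine_files(top, combined_path)
top_words = word_frequencies(combined_path)
for freq, word in top_words[:10]:
    print("%s times: %s" % (freq, word))
```

The counts come out the same as with the shared dictionary in the post above; the extra file only helps if the combined text is needed for something else afterwards.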


                    • alivip
                      New Member
                      • Mar 2008
                      • 17

                      #11
                      Thank you very much, Mr. jlm699, it works fine now.

                      I turned the program into a GUI using Tkinter, and instead of searching the current directory I am trying to let the user pick the folder in a browse dialog,
                      as you can see:

                      Code:
                      # a look at the Tkinter Text widget
                      
                      # use ctrl+c to copy, ctrl+x to cut selected text,
                      
                      # ctrl+v to paste, and ctrl+/ to select all
                        # count words in a text and show the first ten items
                       # by decreasing frequency
                      
                      import Tkinter as tk
                      import os, glob
                      import sys
                      import string
                      import re
                      import tkFileDialog      
                      def most_frequant_word():    
                       a= tkFileDialog.askdirectory()
                       browser= os.listdir(a)
                      
                      
                       for root, dirs, files in os.walk(browser):
                          print 'Looking into %s' % root.split('\\')[-1]
                          print 'Found %d dirs and %d files' % (len(dirs), len(files))
                       
                          for idx, file in enumerate(files):
                           ff = open (os.path.join(root, file), "r")
                           text = ff.read ( )
                           ff.close ( )
                           
                           word_list = text.strip().split()
                           
                           for word in word_list:
                            word = word.lower().rstrip('.,/"-_;\\[]()')
                      
                            if word.isalpha():
                                      # build the dictionary
                             count = word_freq.get(word, 0)
                             word_freq[word] = count + 1
                       
                             # create a list of (freq, word) tuples
                             freq_list = [(freq, word) for word, freq in word_freq.items()]
                           
                             # sort the list by the first element in each tuple (default)
                             freq_list.sort(reverse=True)
                          
                           for n, tup in enumerate(freq_list):
                          # print the first ten items
                            if n < 15:
                              print "%s times: %s" % tup
                              text1.insert(tk.INSERT, freq)
                              text1.insert(tk.INSERT, word)
                              text1.insert(tk.INSERT, "\n")
                              
                       raw_input('\nHit enter to exit')
                       
                      root = tk.Tk(className = " most_frequant_word")
                      # text entry field, width=width chars, height=lines text
                      v1 = tk.StringVar()
                      text1 = tk.Text(root, width=50, height=20, bg='green')
                      text1.pack()
                      # function listed in command will be executed on button click
                      button1 = tk.Button(root, text='result', command=most_frequant_word)
                      button1.pack(pady=5)
                      text1.focus()
                      root.mainloop()
                      but it gives me this error:

                      Exception in Tkinter callback
                      Traceback (most recent call last):
                        File "C:\Python25\lib\lib-tk\Tkinter.py", line 1403, in __call__
                          return self.func(*args)
                        File "C:\Documents and Settings\Administrator\Desktop\ICS482\hw3\programAli.py", line 21, in most_frequant_word
                          for root, dirs, files in os.walk(browser):
                        File "C:\Python25\lib\os.py", line 285, in walk
                          names = listdir(top)
                      TypeError: coercing to Unicode: need string or buffer, list found
                      Could you please help me solve this problem?
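
The traceback points at the cause: os.listdir returns a list of names, while os.walk expects the folder path itself, which is the string that askdirectory() already returns. A small Python 3 sketch of the difference follows. The helper files_under and its isinstance check are additions made here for illustration, and a temporary folder stands in for the folder a user would pick in the dialog.

```python
import os
import tempfile


def files_under(folder):
    """Return the full path of every file below folder, which must be a path string."""
    if not isinstance(folder, str):
        # Illustrative guard only: os.walk would also reject a list.
        raise TypeError("os.walk needs a folder path, got %s" % type(folder).__name__)
    paths = []
    for root, dirs, files in os.walk(folder):
        for name in files:
            paths.append(os.path.join(root, name))
    return sorted(paths)


# Stand-in for the string that askdirectory() would return.
chosen = tempfile.mkdtemp()
os.makedirs(os.path.join(chosen, "BMW"))
for relative in ("readme.txt", os.path.join("BMW", "X5.txt")):
    with open(os.path.join(chosen, relative), "w") as handle:
        handle.write("sample\n")

names = os.listdir(chosen)    # a list of names: fine to loop over, wrong as an os.walk argument
paths = files_under(chosen)   # the folder path itself is what os.walk expects
print(sorted(names))
print(paths)
```

So in the callback, the value from askdirectory() can go straight into os.walk, with no os.listdir call in between.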


                      • alivip
                        New Member
                        • Mar 2008
                        • 17

                        #12
                        I have fixed the error now,
                        but
                        it does not insert anything into the textbox;
                        it just prints and then hangs.

                        Code:
                        # a look at the Tkinter Text widget
                        
                        # use ctrl+c to copy, ctrl+x to cut selected text,
                        
                        # ctrl+v to paste, and ctrl+/ to select all
                          # count words in a text and show the first ten items
                         # by decreasing frequency
                        
                        import Tkinter as tk
                        import os, glob
                        import sys
                        import string
                        import re
                        import tkFileDialog      
                        def most_frequant_word():    
                         browser= tkFileDialog.askdirectory()
                         #browser= os.listdir(a)
                        
                        
                         for root, dirs, files in os.walk(browser):
                            print 'Looking into %s' % root.split('\\')[-1]
                            print 'Found %d dirs and %d files' % (len(dirs), len(files))
                            #text1.insert(tk.INSERT,'Looking into %s' % root.split('\\')[-1])
                            #text1.insert(tk.INSERT, 'Found %d dirs and %d files' % (len(dirs), len(files)))
                            for idx, file in enumerate(files):
                             print 'File #%d: %s' % (idx + 1, file)
                              #text1.insert(tk.INSERT, 'File #%d: %s' % (idx + 1, file))
                             ff = open (os.path.join(root, file), "r")
                             text = ff.read ( )
                             ff.close ( )
                             word_freq = {}
                             
                             word_list = text.strip().split()
                             
                             for word in word_list:
                              word = word.lower().rstrip('.,/"-_;\\[]()')
                        
                              if word.isalpha():
                                        # build the dictionary
                               count = word_freq.get(word, 0)
                               word_freq[word] = count + 1
                         
                               # create a list of (freq, word) tuples
                               freq_list = [(freq, word) for word, freq in word_freq.items()]
                             
                               # sort the list by the first element in each tuple (default)
                               freq_list.sort(reverse=True)
                            
                             for n, tup in enumerate(freq_list):
                            # print the first ten items
                              if n < 50:
                                print "%s times: %s" % tup
                                text1.insert(tk.INSERT, freq)
                                text1.insert(tk.INSERT, word)
                                text1.insert(tk.INSERT, "\n")
                                
                         raw_input('\nHit enter to exit')
                         
                        root = tk.Tk(className = " most_frequant_word")
                        # text entry field, width=width chars, height=lines text
                        v1 = tk.StringVar()
                        text1 = tk.Text(root, width=50, height=20, bg='green')
                        text1.pack()
                        # function listed in command will be executed on button click
                        button1 = tk.Button(root, text='Brows', command=most_frequant_word)
                        button1.pack(pady=5)
                        text1.focus()
                        root.mainloop()
                        The code that tries to insert is:
                        Code:
                         print "%s times: %s" % tup
                                text1.insert(tk.INSERT, freq)
                                text1.insert(tk.INSERT, word)
                                text1.insert(tk.INSERT, "\n")
                        When I try to insert the file name and directory into the textbox it also hangs.
                        That code is commented out:

                        Code:
                        print 'Looking into %s' % root.split('\\')[-1]
                            print 'Found %d dirs and %d files' % (len(dirs), len(files))
                            #text1.insert(tk.INSERT,'Looking into %s' % root.split('\\')[-1])
                            #text1.insert(tk.INSERT, 'Found %d dirs and %d files' % (len(dirs), len(files)))
                            for idx, file in enumerate(files):
                             print 'File #%d: %s' % (idx + 1, file)
                              #text1.insert(tk.INSERT, 'File #%d: %s' % (idx + 1, file))
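
One likely cause of the hang, offered as a guess since the thread does not resolve it: the callback ends with raw_input('\nHit enter to exit'), which waits for console input, and Tkinter cannot redraw the Text widget until the callback returns. The posted loop also inserts freq and word without unpacking them from tup. Below is a Python 3 sketch of a callback that formats the results and returns immediately. The names format_results, show_results and main are hypothetical, and the counts are the sample numbers from post #4 rather than real results.

```python
def format_results(word_freq, limit=10):
    """Return one 'N times: word' line per entry, most frequent first."""
    freq_list = sorted(((freq, word) for word, freq in word_freq.items()), reverse=True)
    return ["%s times: %s" % (freq, word) for freq, word in freq_list[:limit]]


def main():
    # Imported here so format_results can be used on a machine without a display.
    import tkinter as tk

    root = tk.Tk()
    text1 = tk.Text(root, width=50, height=20)
    text1.pack()

    def show_results():
        # Hypothetical counts; in the real program these come from the os.walk loop.
        word_freq = {"with": 15, "python": 9, "for": 6, "will": 1}
        text1.delete("1.0", tk.END)
        for line in format_results(word_freq):
            text1.insert(tk.END, line + "\n")
        # No raw_input()/input() here: the callback returns and the window redraws.

    tk.Button(root, text="result", command=show_results).pack(pady=5)
    root.mainloop()


# The formatting step can be checked on its own, without opening a window.
print(format_results({"with": 15, "python": 9, "for": 6, "will": 1}, limit=2))
```

Calling main() opens the window; clicking the button should fill the text box and leave the GUI responsive.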
