Searching for more than one word in multiple files

  • grantstech
    New Member
    • Jun 2007
    • 16

    Searching for more than one word in multiple files

    I have successfully created a program that searches for a word in multiple files, but now I need to be able to search for more than one word. I have added code from a previous discussion to my original program, but I am unsure how the two pieces should fit together. Can someone clear this up for me?

    [code=python]
    #!C:\PYTHON25\PYTHON.EXE

    import os
    import re
    dir_name = r'c:\Python25\books\books\books'
    word = raw_input("Enter a word to search for: ")
    word2 = raw_input("Enter a second word to search for: ")
    keyList = ['word', 'word2']
    entryList = [os.path.join(dir_name, fn) for fn in os.listdir(dir_name) if os.path.isfile(os.path.join(dir_name, fn))]
    for file_name in entryList:
        for line in file(file_name).readlines():
            if word in line:
                print line
    patt = re.compile('|'.join(keyList), re.IGNORECASE)
    for fn in dir_name:
        f = open(fn)
        for line in f:
            if patt.search(line.lower()):
                print line
        f.close()
    [/code]
  • bartonc
    Recognized Expert Expert
    • Sep 2006
    • 6478

    #2
    Here's one way:
    [CODE=python]
    import os
    import re
    dir_name = r'c:\Python25\books\books\books'

    ##word2 = raw_input("Enter a second word to search for: ")
    ###removed quotes#
    ##keyList = [word, word2]

    def FindWord(word, fileList):
        for file_name in fileList:
            for line in file(file_name).readlines():
                if word in line:
                    print line

    def FindWords(wordList, fileList):
        patt = re.compile('|'.join(wordList), re.IGNORECASE)
        for fn in dir_name:
            f = open(fn)
            for line in f.readlines(): # added .readlines()
                if patt.search(line.lower()): # probably don't need .lower()
                    print line
            f.close()


    entryList = [os.path.join(dir_name, fn) for fn in os.listdir(dir_name)
                 if os.path.isfile(os.path.join(dir_name, fn))]

    words = raw_input("Enter one or more words to search for: ")
    keyList = words.split()
    if len(keyList) > 1:
        FindWords(keyList, entryList)
    else:
        FindWord(words, entryList)
    [/CODE]


    • grantstech
      New Member
      • Jun 2007
      • 16

      #3
      Here is what I have now. It searches on one word just fine, but when I add a second word it gives me an error. I added a print statement to see if it was splitting the input, and it is. I have listed the error I keep getting at the bottom. I can't figure out what is wrong.
      Thanks for all your help.

      [CODE=PYTHON]

      import os
      import re
      dir_name = r'c:\Python25\books\books\books'


      def FindWord(word, fileList):
          for file_name in fileList:
              for line in file(file_name).readlines():
                  if word in line:
                      print line

      def FindWords(wordList, fileList):
          patt = re.compile('|'.join(wordList), re.IGNORECASE)
          for fn in dir_name:
              f = open(fn)
              for line in f.readlines():
                  if patt.search(line):
                      print line
              f.close()


      entryList = [os.path.join(dir_name, fn) for fn in os.listdir(dir_name)
                   if os.path.isfile(os.path.join(dir_name, fn))]

      words = raw_input("Enter one or more words to search for: ")
      keyList = words.split()
      print keyList
      if len(keyList) > 1:
          FindWords(words, entryList)
      else:
          FindWord(words, entryList)
      [/CODE]

      Enter one or more words to search for: bird goat tree
      ['bird', 'goat', 'tree']

      Traceback (most recent call last):
        File "C:/Python25/searchtest.py", line 29, in <module>
          FindWords(words, entryList)
        File "C:/Python25/searchtest.py", line 15, in FindWords
          f = open(fn)
      IOError: [Errno 2] No such file or directory: 'c'


      • bvdet
        Recognized Expert Specialist
        • Oct 2006
        • 2851

        #4
        You have left out some code. Look at this and then look at your code:
        [code=Python]
        >>> dir_name = r'c:\Python25\books\books\books'
        >>> for fn in dir_name:
        ...     print fn
        ...
        c
        :
        \
        P
        y
        t
        h
        o
        n
        2
        5
        \
        b
        o
        o
        k
        s
        \
        b
        o
        o
        k
        s
        \
        b
        o
        o
        k
        s
        >>> [/code]
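
To make the point concrete: iterating over a string yields one character at a time, which is why open() was handed 'c'. Below is a minimal Python 3 sketch (the thread itself uses Python 2.5) that contrasts this with iterating over a list of paths. The temporary directory and the file names a.txt and b.txt are made up for illustration and are not from the original program.

```python
import os
import tempfile

# Iterating over a string yields its characters, not file names.
dir_name = r'c:\Python25\books'
first_items = [ch for ch in dir_name][:3]
print(first_items)  # the first three characters of the path

# Build a throwaway directory so the example runs anywhere
# (hypothetical file names, for illustration only).
tmp_dir = tempfile.mkdtemp()
for name in ('a.txt', 'b.txt'):
    with open(os.path.join(tmp_dir, name), 'w') as f:
        f.write('sample\n')

# Iterating over a list of full paths is what the search loop needs.
entry_list = sorted(
    os.path.join(tmp_dir, fn)
    for fn in os.listdir(tmp_dir)
    if os.path.isfile(os.path.join(tmp_dir, fn))
)
base_names = [os.path.basename(p) for p in entry_list]
print(base_names)
```

So the loop in FindWords should walk the list of paths passed in as fileList rather than the dir_name string.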


        • bartonc
          Recognized Expert Expert
          • Sep 2006
          • 6478

          #5
          My bad. Sorry. It should be:
          [CODE=python]

          def FindWords(wordList, fileList):
              patt = re.compile('|'.join(wordList), re.IGNORECASE)
              for fn in fileList:
                  f = open(fn)
                  for line in f.readlines():
                      if patt.search(line):
                          print line
                  f.close()
          [/CODE]
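
For anyone reading this on a newer interpreter, here is a rough Python 3 sketch of the same idea. It is not the code from this thread: the name find_words, the use of re.escape, the returned list of hits, and the temporary sample file are all assumptions added for illustration.

```python
import os
import re
import tempfile


def find_words(word_list, file_list):
    """Return (path, line) pairs for lines containing any of the words."""
    # re.escape keeps characters such as '.' or '+' in a word literal.
    patt = re.compile('|'.join(re.escape(w) for w in word_list), re.IGNORECASE)
    hits = []
    for path in file_list:           # iterate the list of paths, not dir_name
        with open(path) as f:        # the with-block closes the file for us
            for line in f:
                if patt.search(line):
                    hits.append((path, line.rstrip('\n')))
    return hits


# Small self-contained demo using a temporary file (hypothetical content).
tmp_dir = tempfile.mkdtemp()
sample = os.path.join(tmp_dir, 'sample.txt')
with open(sample, 'w') as f:
    f.write('A bird sat in the tree.\nNothing here.\nThe GOAT ate grass.\n')

key_list = 'bird goat'.split()       # pass the split list, not the raw string
found = [line for _, line in find_words(key_list, [sample])]
print(found)
```

Note that the function is given the split list. Passing the raw input string instead would make '|'.join put a bar between every single character, so the pattern would match almost any line.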


          • grantstech
            New Member
            • Jun 2007
            • 16

            #6
            Thanks guys.
            I got it to where it is searching for all the words, but I need to fine-tune it some more.
            First of all, when I put in a word like "eat", it finds everything with those letters in it, like "beat". Is there a way to make it only pull up the exact word?

            Also, it is bringing up all of the lines that have any one of the words in them. Is there a way to change it so that it only prints the lines that have all of the words in them?


            • Smygis
              New Member
              • Jun 2007
              • 126

              #7
              #!C:\PYTHON25\PYTHON.EXE

              Should always be

              #!/usr/bin/env python

              And never anything else.

              Unlike Windows, which decides how to run a file from its file extension, *nix systems read the first line of a script when it is executed.

              If that line begins with #!, the system runs the program named on that line and passes it the path of the script as an argument. In our case, that program is python, located through env.
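
As a rough illustration only (this is not how any kernel is actually implemented), the Python 3 sketch below shows the command that a "#!" first line roughly turns into. The function name shebang_command and the script name search.py are made up for the example.

```python
def shebang_command(first_line, script_path):
    """Return the command a '#!' line would roughly expand to, or None."""
    if not first_line.startswith('#!'):
        return None
    # Interpreter path, plus at most one optional argument string.
    parts = first_line[2:].strip().split(None, 1)
    if not parts:
        return None
    # The script's own path is appended as the final argument.
    return parts + [script_path]


env_cmd = shebang_command('#!/usr/bin/env python', 'search.py')
no_cmd = shebang_command('import os', 'search.py')
print(env_cmd)
print(no_cmd)
```

With the env form, env looks up python on the PATH, so the script does not depend on where the interpreter is installed.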


              • bartonc
                Recognized Expert Expert
                • Sep 2006
                • 6478

                #8
                It would be very helpful if you would get in the habit of posting the working code (especially if you still have questions). It helps others figure out this type of problem when they get stuck and it helps us see what the heck you're talking about.

                That said:
                The "fine tuning" comes down to learning the Regular Expression language, and I'm not sure that I'm ready to start calling this the Python/Regex Forum just yet. Regular-Expression.info is a good place to start with that.


                • grantstech
                  New Member
                  • Jun 2007
                  • 16

                  #9

                  It's not much different from what is listed above:

                  [CODE=Python]
                  import os
                  import re
                  dir_name = r'c:\Python25\books\books\books'

                  def FindWord(word, fileList):
                      for file_name in fileList:
                          for line in file(file_name).readlines():
                              if word in line:
                                  print line

                  def FindWords(wordList, fileList):
                      patt = re.compile('|'.join(wordList), re.IGNORECASE)
                      for fn in fileList:
                          f = open(fn)
                          for line in f.readlines():
                              if patt.search(line):
                                  print line
                          f.close()

                  entryList = [os.path.join(dir_name, fn) for fn in os.listdir(dir_name)
                               if os.path.isfile(os.path.join(dir_name, fn))]

                  words = raw_input("Enter one or more words to search for: ")
                  keyList = words.split()
                  if len(keyList) > 1:
                      FindWords(keyList, entryList)
                  else:
                      FindWord(words, entryList)
                  [/CODE]


                  • bvdet
                    Recognized Expert Specialist
                    • Oct 2006
                    • 2851

                    #10
                    Given a file name and a key word list, this function will print any line that contains a word in the key word list:
                    [code=Python]
                    def matchAnyWord(fn, keyList):
                        patt = re.compile('(?<![a-z])%s(?![a-z])' % '(?![a-z])|(?<![a-z])'.join(keyList), re.IGNORECASE)
                        f = open(fn)
                        for line in f:
                            if patt.search(line.lower()):
                                print line
                        f.close()
                    [/code]
                    Given a file name and a key word list, this function will print any line that contains all of the words in the key word list:
                    [code=Python]
                    def matchAllWords(fn, keyList):
                        pattList = [re.compile('(?<![a-z])%s(?![a-z])' % key) for key in keyList]
                        f = open(fn)
                        for line in f:
                            matchList = []
                            for patt in pattList:
                                matchList.append(patt.search(line.lower()))
                            print matchList
                            if None not in matchList:
                                print line
                        f.close()
                    [/code]
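
A possible Python 3 variation on the two functions above, working on a single line at a time so it is easy to try out. This is only a sketch under a few assumptions: it swaps in the \b word-boundary anchor instead of the [a-z] lookarounds, adds re.escape, and uses all(); the function names and sample lines are invented for the example.

```python
import re


def line_has_all_words(line, key_list):
    """True if every word in key_list occurs in line as a whole word."""
    patterns = [
        re.compile(r'\b%s\b' % re.escape(key), re.IGNORECASE)
        for key in key_list
    ]
    # all() stops at the first pattern that fails to match.
    return all(p.search(line) for p in patterns)


def line_has_any_word(line, key_list):
    """True if at least one word in key_list occurs as a whole word."""
    patt = re.compile(
        r'\b(?:%s)\b' % '|'.join(re.escape(k) for k in key_list),
        re.IGNORECASE,
    )
    return patt.search(line) is not None


lines = [
    'Eat the bread',
    'Beat the drum and eat',
    'They beat the drum',
]
all_hits = [ln for ln in lines if line_has_all_words(ln, ['eat', 'the'])]
any_hits = [ln for ln in lines if line_has_any_word(ln, ['eat', 'bread'])]
print(all_hits)
print(any_hits)
```

The third sample line is rejected in both cases because "beat" contains "eat" only as part of a longer word.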


                    • grantstech
                      New Member
                      • Jun 2007
                      • 16

                      #11
                      Great, thanks, bvdet. I'll try to incorporate that.



                      • bvdet
                        Recognized Expert Specialist
                        • Oct 2006
                        • 2851

                        #12
                        Here is an interactive exercise:
                        [code=Python]
                        >>> keyList = ['thread', 'needle']
                        >>> patt = re.compile('(?<![a-z])%s(?![a-z])' % '(?![a-z])|(?<![a-z])'.join(keyList), re.IGNORECASE)
                        >>> patt.search('The thread was threaded through several needles')
                        <_sre.SRE_Match object at 0x00DB6138>
                        >>> print patt.search('The threads were threaded through several needles')
                        None
                        >>> pattList = [re.compile('(?<![a-z])%s(?![a-z])' % key) for key in keyList]
                        >>> for patt in pattList:
                        ...     print patt.search('The thread was threaded through several needles')
                        ...
                        <_sre.SRE_Match object at 0x00DB62F8>
                        None
                        >>> for patt in pattList:
                        ...     print patt.search('The thread was threaded through a needle')
                        ...
                        <_sre.SRE_Match object at 0x00DB6288>
                        <_sre.SRE_Match object at 0x00DB6288>
                        >>>
                        [/code]
                        The re expression was modified to exclude matches if a key word was preceded or followed by any letter in the set '[a-z]'.
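
One detail that may matter when choosing between this lookaround form and the more common \b anchor: the [a-z] set only rules out neighbouring letters, while \b also treats digits and underscores as part of a word. A small Python 3 sketch of the difference, with sample strings invented for the comparison:

```python
import re

word = 'eat'
# Lookaround form from the posts above: no letter directly before or after.
lookaround = re.compile('(?<![a-z])%s(?![a-z])' % word, re.IGNORECASE)
# Word-boundary form: no letter, digit, or underscore directly before or after.
boundary = re.compile(r'\b%s\b' % word, re.IGNORECASE)

samples = ['eat', 'beat', 'eaten', 'eat2', 'to_eat', "don't eat!"]
results = [
    (s, lookaround.search(s) is not None, boundary.search(s) is not None)
    for s in samples
]
for row in results:
    print(row)
```

Both forms reject "beat" and "eaten"; they differ only on "eat2" and "to_eat", which the lookaround form accepts and \b rejects.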
