How to Search for a string pattern in a MS Word doc using python

  • yorlina007
    New Member
    • Jun 2008
    • 2

    Hi All,

    I have been trying to read a Word document and search for a particular string pattern using Python. I have used the pywin32 API.

    I have been successful up to opening the file, like this:

    import win32com.client
    word = win32com.client.Dispatch('Word.Application')
    f = word.Documents.Open("<filename>.doc")

    I am not sure how to proceed from here: how do I read each line and compare it against the pattern?

    If anyone has any ideas, please respond.

    Thanks in advance.
    Anil.
  • bvdet
    Recognized Expert Specialist
    • Oct 2006
    • 2851

    #2
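    You don't have to open the document in Word to search for a string. For example, to find email addresses in a Word document:[code=Python]import re

    fn = 'sample.doc'
    # Read the raw bytes of the file; no Word automation is involved
    with open(fn, 'rb') as f:
        fStr = f.read()
    # Bytes pattern so it matches the bytes read above (Python 2.7 and 3)
    patt = re.compile(br'\b[a-zA-Z0-9.]+@[a-zA-Z0-9]+\.[a-z]{3}\b')
    print(patt.findall(fStr))[/code]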
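
If you would rather stay with the COM approach from the first post and compare the document paragraph by paragraph, something along these lines might work. It is only a sketch: the helper names `find_in_paragraphs` and `read_word_paragraphs` and the sample strings are made up for illustration, and the COM part assumes Windows with Word and pywin32 installed. It relies on the `Paragraphs` collection and `Range.Text` property of the Word object model.

```python
import os
import re

# Same email pattern as in the reply above, as a text (not bytes) pattern
EMAIL_PATTERN = r'\b[a-zA-Z0-9.]+@[a-zA-Z0-9]+\.[a-z]{3}\b'


def find_in_paragraphs(paragraphs, pattern):
    """Return (paragraph_number, matched_text) pairs for every regex hit."""
    regex = re.compile(pattern)
    hits = []
    for number, text in enumerate(paragraphs, start=1):
        for match in regex.finditer(text):
            hits.append((number, match.group(0)))
    return hits


def read_word_paragraphs(path):
    """Yield the text of each paragraph using Word COM automation.

    Hypothetical helper: needs Windows, Word, and pywin32.
    """
    # Imported here so find_in_paragraphs can be used without pywin32
    import win32com.client

    word = win32com.client.Dispatch('Word.Application')
    # An absolute path avoids depending on Word's own working directory
    doc = word.Documents.Open(os.path.abspath(path))
    try:
        for paragraph in doc.Paragraphs:
            # Range.Text includes the trailing paragraph mark ('\r')
            yield paragraph.Range.Text
    finally:
        doc.Close(False)  # close without saving changes
        word.Quit()


if __name__ == '__main__':
    # Plain strings stand in for a real document so this runs anywhere
    sample = ['Contact anil@example.com today.', 'Nothing here.']
    print(find_in_paragraphs(sample, EMAIL_PATTERN))
```

With a real file you would call `find_in_paragraphs(read_word_paragraphs('sample.doc'), EMAIL_PATTERN)`. Keeping the search separate from the COM code means the pattern can be tried on plain strings first.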
