Analysing Word documents (slow) What's wrong with this code please!

This topic is closed.
  • jmdeschamps

    Analysing Word documents (slow) What's wrong with this code please!

    Does anyone have a hint how else to get faster results?
    (This is to find out what was bold in the document, in order to grab
    documents produced in Word and generate HTML (web pages) and XML
    (straight data) versions.)

    # START ========================
    import win32com.client
    import tkFileDialog, time

    # Launch Word
    MSWord = win32com.client.Dispatch("Word.Application")

    myWordDoc = tkFileDialog.askopenfilename()

    MSWord.Documents.Open(myWordDoc)

    boldRanges=[] #list of bold ranges
    boldStart = -1
    boldEnd = -1
    t1= time.clock()
    for i in range(len(MSWord.Documents[0].Content.Text)):
        if MSWord.Documents[0].Range(i,i+1).Bold: # testing for bold property
            if boldStart == -1:
                boldStart=i
            else:
                boldEnd= i
        else:
            if boldEnd != -1:
                boldRanges.append((boldStart,boldEnd))
                boldStart= -1
                boldEnd = -1
    t2 = time.clock()
    MSWord.Quit()

    print boldRanges #see what we got
    print "Analysed in ",t2-t1
    # END =====================================

    Thanks in advance
  • Daniel Dittmar

    #2
    Re: Analysing Word documents (slow) What's wrong with this code please!

    jmdeschamps wrote:
    > Does anyone have a hint how else to get faster results?
    > (This is to find out what was bold in the document, in order to grab
    > documents produced in Word and generate HTML (web pages) and XML
    > (straight data) versions.)
    [...]
    > for i in range(len(MSWord.Documents[0].Content.Text)):
    >     if MSWord.Documents[0].Range(i,i+1).Bold: # testing for bold

    Perhaps you can search for bold text. The Word search dialog allows this.
    And when you use the keyboard macro recording feature of Word, you can
    probably figure out how to use that search feature from Python.

    Daniel
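
One way the search-based approach might look from Python is sketched below. It assumes the document is driven through Word's `Find` object (`Range.Find`, `Find.Font.Bold`, `Find.Execute`, `Range.Collapse`); the function name `find_bold_ranges` and the two constant names are illustrative and not from the thread. It is written as a function over a document object so that Word itself is only needed when the function is actually called, and the stop-if-no-progress guard is a precaution rather than documented behaviour.

```python
WD_COLLAPSE_END = 0  # numeric value of Word's wdCollapseEnd constant
WD_FIND_STOP = 0     # numeric value of Word's wdFindStop constant


def find_bold_ranges(doc):
    """Return (start, end) character offsets of bold runs in a Word document.

    `doc` is expected to behave like a Word Document COM object.
    The end offset is exclusive, as in Word's own Range objects.
    """
    rng = doc.Content
    find = rng.Find
    find.ClearFormatting()       # drop any formatting criteria left over
    find.Text = ""               # match on formatting only, not on text
    find.Format = True
    find.Forward = True
    find.Wrap = WD_FIND_STOP     # do not wrap around at the end
    find.Font.Bold = True

    ranges = []
    last_end = -1
    while find.Execute():
        # On a hit, Word redefines `rng` to cover the matched text.
        if rng.End <= last_end:
            break                # no progress: stop rather than loop forever
        ranges.append((rng.Start, rng.End))
        last_end = rng.End
        rng.Collapse(WD_COLLAPSE_END)  # continue searching after this hit
    return ranges


# Possible usage on a machine with Word and pywin32 installed
# (the path is a placeholder):
#   import win32com.client
#   word = win32com.client.Dispatch("Word.Application")
#   doc = word.Documents.Open(r"C:\path\to\document.doc")
#   print(find_bold_ranges(doc))
#   word.Quit()
```

The gain over the character-by-character loop is that each bold run costs one COM call instead of one call per character.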




    • Eric Brunel

      #3
      Re: Analysing Word documents (slow) What's wrong with this code please!

      jmdeschamps wrote:
      > Does anyone have a hint how else to get faster results?
      > (This is to find out what was bold in the document, in order to grab
      > documents produced in Word and generate HTML (web pages) and XML
      > (straight data) versions.)
      >
      > # START ========================
      > import win32com.client
      > import tkFileDialog, time
      >
      > # Launch Word
      > MSWord = win32com.client.Dispatch("Word.Application")
      >
      > myWordDoc = tkFileDialog.askopenfilename()
      >
      > MSWord.Documents.Open(myWordDoc)
      >
      > boldRanges=[] #list of bold ranges
      > boldStart = -1
      > boldEnd = -1
      > t1= time.clock()
      > for i in range(len(MSWord.Documents[0].Content.Text)):
      >     if MSWord.Documents[0].Range(i,i+1).Bold: # testing for bold property

      Knowing vaguely how pythoncom works, you'd really do better to avoid asking for
      MSWord.Documents[0] at each loop step: pythoncom fetches the COM objects
      corresponding to all the attributes and methods you ask for dynamically, and that may
      cost a lot of time. So doing:

      doc = MSWord.Documents[0]
      for i in range(len(doc.Content.Text)):
          if doc.Range(i,i+1).Bold: ...

      may greatly improve performance.
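
Spelled out as a complete loop, the advice above might look like the following sketch. The function name `bold_ranges_by_char` is illustrative, and two details go beyond the code in the thread: it also records a run that is a single bold character or that reaches the end of the document, and it reports end offsets as exclusive. It takes the document object as a parameter, so the document and the text length are each fetched once instead of on every iteration.

```python
def bold_ranges_by_char(doc):
    """Scan a Word-like document one character at a time.

    `doc` is expected to behave like a Word Document COM object.
    Returns a list of (start, end) offsets with `end` exclusive.
    """
    n_chars = len(doc.Content.Text)  # fetched once, before the loop
    ranges = []
    start = None                     # start of the bold run in progress
    for i in range(n_chars):
        if doc.Range(i, i + 1).Bold:
            if start is None:
                start = i            # a new bold run begins here
        elif start is not None:
            ranges.append((start, i))  # run ended just before i
            start = None
    if start is not None:
        ranges.append((start, n_chars))  # run reaches the end of the text
    return ranges
```

This still makes one COM call per character, so it only removes the repeated lookups; it does not change the overall cost of scanning every character.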
      >         if boldStart == -1:
      >             boldStart=i
      >         else:
      >             boldEnd= i
      >     else:
      >         if boldEnd != -1:
      >             boldRanges.append((boldStart,boldEnd))
      >             boldStart= -1
      >             boldEnd = -1
      > t2 = time.clock()
      > MSWord.Quit()
      >
      > print boldRanges #see what we got
      > print "Analysed in ",t2-t1
      > # END =====================================
      >
      > Thanks in advance


      --
      - Eric Brunel <eric dot brunel at pragmadev dot com> -
      PragmaDev : Real Time Software Development Tools - http://www.pragmadev.com


      • jmdeschamps

        #4
        Re: Analysing Word documents (slow) What's wrong with this code please!

        "Daniel Dittmar" <daniel.dittmar@sap.com> wrote in message news:<bugcnf$d5r$1@news1.wdf.sap-ag.de>...
        > jmdeschamps wrote:
        > > Does anyone have a hint how else to get faster results?
        > > (This is to find out what was bold in the document, in order to grab
        > > documents produced in Word and generate HTML (web pages) and XML
        > > (straight data) versions.)
        > [...]
        ....
        > Perhaps you can search for bold text. The Word search dialog allows this.
        > And when you use the keyboard macro recording feature of Word, you can
        > probably figure out how to use that search feature from Python.
        >
        > Daniel

        Thanks. Paul Prescod suggested this also, and it works great!

        Jean-Marc


        • jmdeschamps

          #5
          Re: Analysing Word documents (slow) What's wrong with this code please!

          Eric Brunel <eric.brunel@N0SP4M.com> wrote in message news:<bugpf4$i7$1@news-reader4.wanadoo.fr>...
          > jmdeschamps wrote:
          > > Does anyone have a hint how else to get faster results?
          > > (This is to find out what was bold in the document, in order to grab
          > > documents produced in Word and generate HTML (web pages) and XML
          > > (straight data) versions.)
          > >
          > > # START ========================
          > > import win32com.client
          > > import tkFileDialog, time
          > >
          > > # Launch Word
          > > MSWord = win32com.client.Dispatch("Word.Application")
          > >
          > > myWordDoc = tkFileDialog.askopenfilename()
          > >
          > > MSWord.Documents.Open(myWordDoc)
          > >
          > > boldRanges=[] #list of bold ranges
          > > boldStart = -1
          > > boldEnd = -1
          > > t1= time.clock()
          > > for i in range(len(MSWord.Documents[0].Content.Text)):
          > >     if MSWord.Documents[0].Range(i,i+1).Bold: # testing for bold property
          >
          > Knowing vaguely how pythoncom works, you'd really do better to avoid asking for
          > MSWord.Documents[0] at each loop step: pythoncom fetches the COM objects
          > corresponding to all the attributes and methods you ask for dynamically, and that may
          > cost a lot of time. So doing:
          >
          > doc = MSWord.Documents[0]
          > for i in range(len(doc.Content.Text)):
          >     if doc.Range(i,i+1).Bold: ...
          >
          > may greatly improve performance.
          ....
          Thanks, it does! And so does using the built-in Find object.

          Jean-Marc
