Parsing a Word Doc using Perl in Windows

  • pramodkh
    New Member
    • Nov 2007
    • 23


    Hi All

    I am parsing a Word document with Perl, using the Win32::OLE module.

    I am able to get the paragraphs, styles and text from the document, but I have a problem with text that has bullets or numbering: the script only displays the text, and I also want the bullet or number of each line.

    I cannot find out which methods are available on a paragraph object (like Range, Style, etc.). How do I get the list of methods? Please help me.
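One way to see what an object offers from Perl: Win32::OLE objects can be used as tied hashes, and as far as I know `keys %$object` returns the property names taken from the object's type information (treat that as an assumption to check on your setup; methods such as `Item` are not part of that list). For methods, the Object Browser in Word's Visual Basic Editor (F2) shows the same object model that Win32::OLE talks to. A minimal sketch; `property_names` is a hypothetical helper and the path is the placeholder from the post.

```perl
use strict;
use warnings;

# Sorted property names of an object used as a hash.
# Assumption: for a Win32::OLE object, keys() walks the properties in its type info.
sub property_names {
    my ($object) = @_;
    return sort keys %$object;
}

sub main {
    my ($file_name) = @_;
    require Win32::OLE;

    my $document = Win32::OLE->GetObject($file_name)
        or die "Cannot open $file_name: " . Win32::OLE->LastError() . "\n";
    # Paragraphs.Item(1) is the first paragraph of the document
    my $paragraph = $document->Paragraphs->Item(1);

    print "Paragraph properties:\n";
    print "  $_\n" for property_names($paragraph);
    print "Range properties:\n";
    print "  $_\n" for property_names($paragraph->{Range});
}

# Drive Word only when run directly on Windows.
main(@ARGV ? $ARGV[0] : 'C:\\FileName.doc') if $^O eq 'MSWin32' && !caller;
```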
    Here is my code:
    use strict;
    use warnings;
    use Win32::OLE;
    use Win32::OLE::Enum;
    
    my $fileName = "C:\\FileName.doc";
    # Open the document through Word
    my $document = Win32::OLE->GetObject($fileName)
        or die "Cannot open $fileName: " . Win32::OLE->LastError() . "\n";
    # Start Excel for writing results later; 'Quit' is called when $xl_app is destroyed
    my $xl_app = Win32::OLE->new('Excel.Application', 'Quit');
    
    my $paragraphs = $document->Paragraphs();
    my $enumerate  = Win32::OLE::Enum->new($paragraphs);
    
    while (defined(my $paragraph = $enumerate->Next()))
    {
        my $style = $paragraph->{Style}->{NameLocal};
        my $text  = $paragraph->{Range}->{Text};
        $text =~ s/[\n\r]//g;   # drop the paragraph mark
        $text =~ s/\x0b/\n/g;   # manual line break -> newline
        $text =~ s/\x07//g;     # drop end-of-cell marks
        print "\nStyle = $style";
        print "\nText = $text";
    }
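The bullet or number that Word draws in front of a list paragraph is generated automatically, so it is not part of `Range.Text`. Word exposes it through the range's `ListFormat` object: its `ListString` property returns the label text (e.g. "1.", "a)" or the bullet character) and is empty for paragraphs outside a list. Below is a minimal sketch of the loop above reading that property, assuming Word (here Office 2003) is installed; `clean_paragraph_text` and `label_paragraph` are hypothetical helper names and the path is the placeholder from the post.

```perl
use strict;
use warnings;

# Strip Word's control characters from Range.Text:
# \r is the paragraph mark, \x0b a manual line break, \x07 an end-of-cell mark.
sub clean_paragraph_text {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/[\r\n]//g;
    $text =~ s/\x0b/\n/g;
    $text =~ s/\x07//g;
    return $text;
}

# Put the list label (e.g. "1." or a bullet) in front of the text, if there is one.
sub label_paragraph {
    my ($list_string, $text) = @_;
    return $text unless defined $list_string && length $list_string;
    return "$list_string $text";
}

sub main {
    my ($file_name) = @_;
    require Win32::OLE;
    require Win32::OLE::Enum;

    my $document = Win32::OLE->GetObject($file_name)
        or die "Cannot open $file_name: " . Win32::OLE->LastError() . "\n";

    my $enum = Win32::OLE::Enum->new($document->Paragraphs());
    while (defined(my $paragraph = $enum->Next())) {
        my $style = $paragraph->{Style}->{NameLocal};
        my $range = $paragraph->{Range};
        # The automatic number/bullet is not in Range.Text; ListFormat.ListString holds it.
        my $label = $range->{ListFormat}->{ListString};
        my $text  = label_paragraph($label, clean_paragraph_text($range->{Text}));
        print "Style = $style\nText  = $text\n";
    }
}

# Drive Word only when run directly on Windows.
main(@ARGV ? $ARGV[0] : 'C:\\FileName.doc') if $^O eq 'MSWin32' && !caller;
```

Headings numbered through a list template should get their chapter number ("1", "1.2", ...) from `ListString` in the same way. Symbol-font bullets come back as a private-use character (the `ChrW(61623)` in a recorded macro), which may print oddly in the default ANSI code page; Win32::OLE's `CP` option (e.g. `CP_UTF8`) is one thing to try there.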
    Thanks & Regards
    Pramod
    Last edited by pramodkh; Mar 12 '08, 07:40 AM. Reason: typo error in Title
  • numberwhun
    Recognized Expert Moderator Specialist
    • May 2007
    • 3467

    #2
    You were given some reading material on this over at Dev Shed. Did that help at all?

    Regards,

    Jeff


    • pramodkh
      New Member
      • Nov 2007
      • 23

      #3
      Yes, I did go through the documentation, but it looks outdated; I use Office 2003.
      After some googling I found that the methods are the same as in VB. So I recorded a macro in MS Word while adding headings, text, bullets, etc., checked the macro's source code (which is in VB), and found the following:
      Sub BulletMacro()
      '
      ' BulletMacro Macro
      ' Macro recorded 2008-03-13 by ing03125
      '
          Selection.TypeParagraph
          Selection.Style = ActiveDocument.Styles("Heading 2,Paragraph Title,l2")
          Selection.TypeText Text:="TE_Testcase: test"
          Selection.TypeParagraph
          Selection.TypeParagraph
          Selection.Style = ActiveDocument.Styles("Heading 2,Paragraph Title,l2")
          Selection.TypeText Text:=" TE_Testcase: Test1"
          Selection.TypeParagraph
          Selection.TypeParagraph
          With ListGalleries(wdBulletGallery).ListTemplates(1).ListLevels(1)
              .NumberFormat = ChrW(61623)
              .TrailingCharacter = wdTrailingTab
              .NumberStyle = wdListNumberStyleBullet
              .NumberPosition = InchesToPoints(0.25)
              .Alignment = wdListLevelAlignLeft
              .TextPosition = InchesToPoints(0.5)
              .TabPosition = InchesToPoints(0.5)
              .ResetOnHigher = 0
              .StartAt = 1
              With .Font
                  .Bold = wdUndefined
                  .Italic = wdUndefined
                  .StrikeThrough = wdUndefined
                  .Subscript = wdUndefined
                  .Superscript = wdUndefined
                  .Shadow = wdUndefined
                  .Outline = wdUndefined
                  .Emboss = wdUndefined
                  .Engrave = wdUndefined
                  .AllCaps = wdUndefined
                  .Hidden = wdUndefined
                  .Underline = wdUndefined
                  .Color = wdUndefined
                  .Size = wdUndefined
                  .Animation = wdUndefined
                  .DoubleStrikeThrough = wdUndefined
                  .Name = "Symbol"
              End With
              .LinkedStyle = ""
          End With
          ListGalleries(wdBulletGallery).ListTemplates(1).Name = ""
          Selection.Range.ListFormat.ApplyListTemplate ListTemplate:=ListGalleries( _
              wdBulletGallery).ListTemplates(1), ContinuePreviousList:=False, ApplyTo:= _
              wdListApplyToWholeList, DefaultListBehavior:=wdWord10ListBehavior
          Selection.TypeText Text:="first"
          Selection.TypeParagraph
          Selection.TypeText Text:="second"
          Selection.TypeParagraph
          Selection.TypeText Text:="third"
          Selection.TypeParagraph
          Selection.TypeBackspace
      End Sub
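The recorded macro does show where Word keeps list settings: the `ListFormat` object of a range, configured with `wd...` constants such as `wdBulletGallery`. When reading a document, `Win32::OLE::Const` can load those same constants from Word's type library, so `ListFormat.ListType` tells bullets from numbering and `ListLevelNumber` gives the nesting level. A sketch under those assumptions; `list_kind` is a hypothetical helper, and counting picture bullets as bullets is my own choice.

```perl
use strict;
use warnings;

# Classify a ListFormat.ListType value as 'none', 'bullet' or 'numbered'.
# $wd is a hash of Word constants, e.g. from Win32::OLE::Const->Load('Microsoft Word').
sub list_kind {
    my ($list_type, $wd) = @_;
    return 'none'   if !defined $list_type || $list_type == $wd->{wdListNoNumbering};
    return 'bullet' if $list_type == $wd->{wdListBullet}
                    || $list_type == $wd->{wdListPictureBullet};
    return 'numbered';
}

sub main {
    my ($file_name) = @_;
    require Win32::OLE;
    require Win32::OLE::Const;
    require Win32::OLE::Enum;

    # Load Word's wd* constants from its type library (Word must be installed)
    my $wd = Win32::OLE::Const->Load('Microsoft Word')
        or die "Cannot load the Word type library\n";
    my $document = Win32::OLE->GetObject($file_name)
        or die "Cannot open $file_name: " . Win32::OLE->LastError() . "\n";

    my $enum = Win32::OLE::Enum->new($document->Paragraphs());
    while (defined(my $paragraph = $enum->Next())) {
        my $list = $paragraph->{Range}->{ListFormat};
        my $kind = list_kind($list->{ListType}, $wd);
        next if $kind eq 'none';    # skip paragraphs that are not in a list
        printf "%-8s level %d: %s\n",
            $kind, $list->{ListLevelNumber}, $list->{ListString};
    }
}

# Drive Word only when run directly on Windows.
main(@ARGV ? $ARGV[0] : 'C:\\FileName.doc') if $^O eq 'MSWin32' && !caller;
```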
      But this info does not seem useful for me either: when I try to print the numbering associated with headings/bullets, Perl prints only the style and not the number that goes with it, like this:

      Style = Heading 1,Chapter Title,l1,TOC,1

      Please let me know if anyone knows how to get it.

      Thanks
      Pramod
