extract repeating text segments

  • mrlambdin
    New Member
    • Apr 2008
    • 3

    extract repeating text segments

    I have an archive file (PIDATA) that contains multiple (>30) segments of text like this:

    Code:
    [b]Archive[0]: d:\archives\piarch.012  (500MB, Used: 9.0%)[/b]
            PIarcfilehead[$Workfile: piarfile.cxx $ $Revision: 114 $]::
              Version: 5 Path: d:\archives\piarch.012
              State: 4 Type: 0 (fixed) Write Flag: 1 Shift Flag: 1
              Record Size: 1024 Count: 512000  Add Rate/Hour: 4118.3
              Offsets: Primary: 25853/128000 Overflow: 491596/512000
                   [B]Start Time: 1-Apr-08 22:02:38[/B]
                     [B]End Time: Current Time[/B]
                  Backup Time: 2-Apr-08 02:01:07
    The program repeats this over and over, naming each segment "Archive[0]", "Archive[1]", "Archive[2]" and so on. I need to extract the bold sections and print them on one line. For example, I'd like THIS:

    Archive[0]: d:\archives\piarch.012 (500MB, Used: 9.0%) ..... Start Time: 1-Apr-08 22:02:38 ..... End Time: Current Time

    ALL on one line.

    Here's my Perl script, but it doesn't seem to work.

    [CODE=perl]#!/usr/bin/perl
    while(<PIDATA>) {

    if (m/Archive.[\d+].*/) {
    $m1 =~ "$MATCH";
    }
    if (m/Start\sTime.*/) {
    $m2 =~ "$MATCH";
    }
    if (m/End\sTime.*/) {
    $m3 =~ "$MATCH";
    }
    print "$m1 \s $m2 \s $m3\n\n";

    }
    [/CODE]

    I tried to loop over the text file, and redirect the output, but the file is empty after running this.

    HELP!
    Last edited by eWish; Apr 3 '08, 02:14 PM. Reason: Please use code tags
  • nithinpes
    Recognized Expert Contributor
    • Dec 2007
    • 410

    #2
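    The square brackets inside the pattern need to be escaped; otherwise they are mistaken for a character class. Also, what is the $MATCH that you are trying to use after matching the desired pattern? It is only defined under "use English", and =~ binds a pattern to a variable rather than assigning to it, so $m1, $m2 and $m3 are never set.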
    From your description, I feel all you need is to print out those emphasised lines. Try this:
    Code:
    while(<PIDATA>){
         print $_ if((/^\s*Archive\[\d+\].*/)||(/^\s*Start\s+Time.*/)||(/^\s*End\s+Time.*/)) ;
    }
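
To illustrate the character-class point, here is a small sketch. The two sample strings and the variable names are made up for demonstration only. Unescaped, [\d+] matches one character that is a digit or a plus sign, so the original pattern also matches text with no brackets in it; with the brackets escaped, only a real segment header matches. Parentheses then capture the matched text, so $MATCH is not needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample strings, for demonstration only.
my $header = 'Archive[0]: d:\archives\piarch.012  (500MB, Used: 9.0%)';
my $decoy  = 'Archive 7 is offline';

# Unescaped: "." is any character and [\d+] is a class (one digit or "+").
my $loose_hits_decoy = ($decoy =~ /Archive.[\d+]/) ? 1 : 0;

# Escaped: \[ and \] match literal square brackets.
my $strict_hits_decoy  = ($decoy  =~ /Archive\[\d+\]/) ? 1 : 0;
my $strict_hits_header = ($header =~ /Archive\[\d+\]/) ? 1 : 0;

# Parentheses capture the matched text; in list context it lands in $line.
my ($line) = $header =~ /^\s*(Archive\[\d+\].*)/;

print "loose pattern matches decoy:    $loose_hits_decoy\n";
print "strict pattern matches decoy:   $strict_hits_decoy\n";
print "strict pattern matches header:  $strict_hits_header\n";
print "captured: $line\n";
```
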


    • mrlambdin
      New Member
      • Apr 2008
      • 3

      #3
      I've removed my code and placed yours into the file piarchive.pl. No errors, but when I redirect the output, the file is empty. Something's wrong...
      I run this:

      perl piarchive.pl > output

      And I get the file "output" but it's empty.


      • nithinpes
        Recognized Expert Contributor
        • Dec 2007
        • 410

        #4
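        For the given sample data, I got the desired output. Make sure the PIDATA filehandle is actually opened before the loop; reading from an unopened filehandle returns nothing, which would explain the empty file.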
        Code:
        open(PIDATA,"data.txt") or die "open failed:$!";
        while(<PIDATA>){
         print $_ if((/^\s*Archive\[\d+\].*/)||(/^\s*Start\s+Time.*/)||(/^\s*End\s+Time.*/)) ;
        }
        On command line, I executed the following line:
        Code:
        perl archive.pl > C:\\output.txt
        The file output.txt contained:
        Code:
        Archive[0]: d:\archives\piarch.012  (500MB, Used: 9.0%)
                       Start Time: 1-Apr-08 22:02:38
                         End Time: Current Time
        If you want this in one line, modify
        Code:
        while(<PIDATA>){
         print $_ if((/^\s*Archive\[\d+\].*/)||(/^\s*Start\s+Time.*/)||(/^\s*End\s+Time.*/)) ;
        }
        to:

        Code:
        while(<PIDATA>){
         chomp;
         print "$_ ..." if((/^\s*Archive\[\d+\].*/)||(/^\s*Start\s+Time.*/)||(/^\s*End\s+Time.*/)) ;
        }
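
Note that the loop above puts the matched lines from every archive on one single output line and keeps their leading spaces. If one line per archive is wanted, a possible sketch is to collect the fields of a segment and print them once its End Time line is seen. The file name data.txt comes from the post above; the helper name summarize and the " ..... " separator are my own choices for illustration, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# summarize() is a hypothetical helper: it takes the lines of the archive
# listing and returns one joined string per Archive[n] segment.
sub summarize {
    my @lines = @_;
    my (@out, @parts);
    for my $line (@lines) {
        chomp(my $copy = $line);
        if ($copy =~ /^\s*(Archive\[\d+\]:.*?)\s*$/) {
            @parts = ($1);                        # a new segment starts here
        }
        elsif (@parts && $copy =~ /^\s*(Start\s+Time:.*?)\s*$/) {
            push @parts, $1;
        }
        elsif (@parts && $copy =~ /^\s*(End\s+Time:.*?)\s*$/) {
            push @parts, $1;
            push @out, join(' ..... ', @parts);   # segment complete
            @parts = ();
        }
    }
    return @out;
}

# Assumes the listing is in data.txt unless a file name is given.
my $file = @ARGV ? $ARGV[0] : 'data.txt';
if (open(my $in, '<', $file)) {
    print "$_\n" for summarize(<$in>);
    close($in);
}
else {
    warn "open failed for $file: $!\n";
}
```

Run it the same way as before, for example: perl archive.pl > output.txt
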


        • nithinpes
          Recognized Expert Contributor
          • Dec 2007
          • 410

          #5
          Alternatively, you can write to the output file from within the script:
          Code:
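          open(PIDATA,"data.txt") or die "open failed:$!";
          # The leading ">" opens output.txt for writing; without it the file is opened read-only
          open(OUT,">output.txt") or die "create failed:$!";
          while(<PIDATA>){
           chomp;
           print OUT "$_ ..." if((/^\s*Archive\[\d+\].*/)||(/^\s*Start\s+Time.*/)||(/^\s*End\s+Time.*/)) ;
          }
          close(OUT) or die "close failed:$!";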
          Last edited by nithinpes; Apr 4 '08, 07:00 AM. Reason: typo


          • mrlambdin
            New Member
            • Apr 2008
            • 3

            #6
            Thanks for the help!
