how to match a paragraph by regexp?

  • poolboi
    New Member
    • Jan 2008
    • 170

    hey guys,

    another regexp problem, which I'm probably not good at. Here goes:
    my text file has this


    Code:
     /* 1 TB VTP-1 SESSION=01547 USERID STARTED 2008-05-13 09:46:11 */
     I AM A GIRL
     /* 3 SESSION=01547 USERID 2008-05-13 09:46:12 */
     /* 4c COMMAND EXECUTED */ 
     /* 4c SESSION=01547 USERID=SASTST 2008-05-13 09:46:12 */
     /* 2 TB HLRi VTP-1 SESSION=01547 USERID ENDED 2008-05-13 09:47:12 */
     /* 1 TB VTP-1 SESSION=01547 USERID STARTED 2008-05-13 09:46:11 */
     I AM A BOY
     /* 3 SESSION=01547 USERID 2008-05-13 09:46:12 */
     /* 4c COMMAND EXECUTED */ 
     /* 2 TB HLRi VTP-1 SESSION=01547 USERID ENDED 2008-05-13 09:47:12 */
    as you can see, it's like 2 paragraphs.
    Each paragraph starts with "1 TB" and ends with "2 TB". How can I match a single paragraph?

    Previously I was using this to match, but it only goes through one line at a time.
    How can I make it match a paragraph?

    [CODE=perl]
    foreach $data1 (@$data)
    {
        chomp ($data1);

        if ($data1 =~ /1 TBHLR.+word/) { ## I replace "word" with anything I want to match
            ## (rest of the loop body omitted in the original post)
        }
    }
    [/CODE]
    Last edited by eWish; May 13 '08, 12:55 PM. Reason: Added code tags
  • nithinpes
    Recognized Expert Contributor
    • Dec 2007
    • 410

    #2
    Since these are not typical paragraphs (they are not separated by blank lines), one way to do it is to process the array holding the file input into a new array in which each element is a whole paragraph rather than a single line.
    Code:
    open(IN, "data.txt") or die "failed: $!";
    @file = <IN>;
    foreach (@file) {
        if (/1 TB/) { $str = $_; next; }   ## look for lines with '1 TB'
        unless (/2 TB/) {
            $str .= $_;                    ## append lines until line with '2 TB' is reached
        }
        else {
            $str .= $_;
            push @data, $str;              ## on reaching '2 TB' push the concatenated string to array
            undef $str;
        }
    }

    #print "\n\n$_\n\n" foreach(@data);
    foreach $data1 (@data)
    {
        ##### $data1 will be a paragraph now

    }
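
Since the thread title asks about matching a paragraph with a regexp, here is a minimal sketch of that alternative, assuming the whole file fits in memory as one string. The sample text is a shortened, made-up version of the log in the question, and the variable names ($text, @paragraphs) are illustrative only.

```perl
use strict;
use warnings;

# Shortened sample modelled on the log in the question (hypothetical data).
my $text = <<'END';
/* 1 TB VTP-1 SESSION=01547 USERID STARTED */
I AM A GIRL
/* 4c COMMAND EXECUTED */
/* 2 TB HLRi VTP-1 SESSION=01547 USERID ENDED */
/* 1 TB VTP-1 SESSION=01547 USERID STARTED */
I AM A BOY
/* 4c COMMAND EXECUTED */
/* 2 TB HLRi VTP-1 SESSION=01547 USERID ENDED */
END

# /m lets ^ match at the start of every line, /s lets . match newlines,
# and .*? is non-greedy, so each match stops at the first "2 TB" line.
# In list context with /g, the match returns every captured paragraph.
my @paragraphs = $text =~ m{(^[^\n]*1 TB.*?2 TB[^\n]*\n)}msg;

printf "%d paragraphs\n", scalar @paragraphs;
print "---\n$_" for @paragraphs;
```

To run this against a real file, the string could be read in one go, for example with `local $/;` before reading from the filehandle. The line-by-line loop above is probably the better choice for very large files.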


    • poolboi
      New Member
      • Jan 2008
      • 170

      #3
      great thanks!

      erm, just some clarification

      the operator ".=" is an overloaded operator, right? Meaning it appends to your previous data?

      and what's the use of putting undef $str?

      correct me if I'm wrong
      :)


      • poolboi
        New Member
        • Jan 2008
        • 170

        #4
        hm... nithinpes,
        sorry, I think there could be a problem.
        Say I want just the paragraph from "1 TB" to "COMMAND EXECUTED".
        I just change it to:

        [CODE=perl]
        foreach (@file)
        {
            if (/1 TB/) { $str = $_; next; }
            unless (/COMMAND EXECUTED/) {
                $str .= $_;
            } else {
                $str .= $_;
                push @data, $str;
                undef $str;
            }
        }

        print "$_" foreach (@data);
        [/CODE]

        but it still prints everything as before;
        it doesn't print just the paragraph from "1 TB" to "COMMAND EXECUTED"
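
Swapping the end marker, as in the loop above, can be sketched as a small reusable routine. This is only an illustration: collect_blocks is a hypothetical helper, not something from the thread, and the sample lines are shortened from the log in the question. Unlike the loop above, it skips lines that fall outside any block instead of appending them.

```perl
use strict;
use warnings;

# collect_blocks is a hypothetical helper name used for illustration.
# It returns one string per block, from a line matching $start up to and
# including the next line matching $end.
sub collect_blocks {
    my ($start, $end, @lines) = @_;
    my (@blocks, $current);
    for my $line (@lines) {
        if ($line =~ $start) { $current = $line; next; }
        next unless defined $current;    # skip lines outside any block
        $current .= $line;
        if ($line =~ $end) {
            push @blocks, $current;
            undef $current;              # here undef marks "not inside a block"
        }
    }
    return @blocks;
}

# Shortened sample lines (hypothetical data based on the question).
my @lines = (
    "/* 1 TB VTP-1 SESSION=01547 USERID STARTED */\n",
    "I AM A GIRL\n",
    "/* 4c COMMAND EXECUTED */\n",
    "/* 4c SESSION=01547 USERID=SASTST */\n",
    "/* 2 TB HLRi VTP-1 SESSION=01547 USERID ENDED */\n",
    "/* 1 TB VTP-1 SESSION=01547 USERID STARTED */\n",
    "I AM A BOY\n",
    "/* 4c COMMAND EXECUTED */\n",
    "/* 2 TB HLRi VTP-1 SESSION=01547 USERID ENDED */\n",
);

my @blocks = collect_blocks(qr/1 TB/, qr/COMMAND EXECUTED/, @lines);
print "---\n$_" for @blocks;
```

Passing qr/2 TB/ as the second argument instead would give the full paragraphs again.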


        • poolboi
          New Member
          • Jan 2008
          • 170

          #5
          sorry, sorry, I defined something wrongly in the earlier part of my script
          alright, it's working now
          thanks :)
          you can go back to my previous question on "undef" and ".="
          many thanks
          :)


          • nithinpes
            Recognized Expert Contributor
            • Dec 2007
            • 410

            #6
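            The operator ".=" is Perl's built-in append operator: it concatenates the right-hand string onto the variable and assigns the result back. It is an ordinary assignment operator like "+=", not an overloaded one. Plain concatenation uses ".". For example: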
            Code:
            $a = "Hello";
            $b= " poolboi";
            $c= $a.$b; 
            print $c;  ## prints "Hello poolboi"
            In this case,
            Code:
            $str.=$_;
            is equivalent to
            Code:
            $str = $str . $_;
            Regarding 'undef $str': this undefines $str (discards the previously stored value) in order to avoid concatenating the next paragraph onto the previous paragraph (string).
            But this line is actually redundant in the above code, as the line:
            Code:
            if(/1 TB/) {$str = $_ ; next;}
            will reassign the value from scratch whenever a line with the '1 TB' pattern is seen.
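
A minimal sketch of both points, using made-up strings ($long, $short and $str are illustrative names only):

```perl
use strict;
use warnings;

my $long  = "Hello";
my $short = "Hello";

$long  = $long . " poolboi";   # explicit concatenation, then assignment
$short .= " poolboi";          # the same thing in one step

print "$long\n";
print "$short\n";

my $str = "first paragraph\n";
undef $str;                    # $str no longer holds the first paragraph
$str .= "second paragraph\n";  # appending to an undefined value starts from empty
print $str;
```

Both print statements for $long and $short show the same text, and the final $str contains only the second paragraph.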


            • poolboi
              New Member
              • Jan 2008
              • 170

              #7
              hm.. thanks for the explanation

              hm.. I just discovered a problem,
              similar to the one above

              I've got this info in my text file now

              [CODE=text]
              /* 3 SESSION=01547 USERID=user 2008-05-13 09:46:12 */
              /* 4 ABC:DEFG=123345 56,BSERV=T22; */
              /* 4c COMMAND EXECUTED */
              /* 1 TBHLR HLRi VTP-11 SESSION=01548 USERID=user STARTED 2008-05-
              [/CODE]

              when I use this code:
              [CODE=perl]
              foreach $data1 (@$data)
              {
                  chomp ($data1);
                  if ($data1 =~ /4 \D\D\D:/ && $data1 !~ /3 SESSION/ && $data1 !~ /4c COMMAND/) {
                      print "$data1\n";
                  }
              }
              [/CODE]
              it returns me

              /* 4 ABC:DEFG=123345 56,BSERV=T22; */

              but when i use the paragraphing code...

              [CODE=perl]
              foreach (@file)
              {
                  if (/3 SESSION/) { $str = $_; next; }
                  unless (/4c COMMAND/) {
                      $str .= $_;
                  } else {
                      $str .= $_;
                      push @data, $str;
                      undef $str;
                  }
              }

              foreach $data1 (@data)
              {
                  if ($data1 =~ /4c \D\D\D:/ && $data1 !~ /3 SESSION/ && $data1 !~ /4c COMMAND/) {
                      print "$data1\n";
                  }
              }
              [/CODE]
              nothing is returned

              is there a problem with the 2nd code? I thought it was supposed to return the same for both


              • nithinpes
                Recognized Expert Contributor
                • Dec 2007
                • 410

                #8
                That's the expected behaviour. In your first code, you are matching against one line at a time. Hence, it returns the line

                /* 4 ABC:DEFG=123345 56,BSERV=T22; */

                which satisfies all three of the matching conditions that you have put.
                But in the second code, you are taking one paragraph at a time. So, for the first iteration $data1 will be
                /* 3 SESSION=01547 USERID=user 2008-05-13 09:46:12 */
                /* 4 ABC:DEFG=123345 56,BSERV=T22; */
                /* 4c COMMAND EXECUTED */

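                which doesn't satisfy all 3 match conditions: the paragraph itself contains both "3 SESSION" and "4c COMMAND", so the two !~ tests are false. Note also that the first test in your second code reads /4c \D\D\D:/ rather than /4 \D\D\D:/, and this paragraph does not match that pattern either.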
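
One way to get the line back while still working paragraph by paragraph is to apply the three conditions to each line inside the paragraph. This is a rough sketch, assuming the /4 \D\D\D:/ pattern from the first snippet is the intended one; $paragraph, $whole_matches and @wanted are illustrative names.

```perl
use strict;
use warnings;

# The first paragraph from the example above, as one string.
my $paragraph = join '',
    "/* 3 SESSION=01547 USERID=user 2008-05-13 09:46:12 */\n",
    "/* 4 ABC:DEFG=123345 56,BSERV=T22; */\n",
    "/* 4c COMMAND EXECUTED */\n";

# Paragraph-level test: false, because the paragraph contains "3 SESSION".
my $whole_matches =
    ($paragraph =~ /4 \D\D\D:/ && $paragraph !~ /3 SESSION/) ? 1 : 0;

# Line-level test: split /^/ breaks the string into lines (newlines kept),
# and grep keeps only the lines that pass all three conditions.
my @wanted = grep { /4 \D\D\D:/ && !/3 SESSION/ && !/4c COMMAND/ }
             split /^/, $paragraph;

print "whole paragraph matches: $whole_matches\n";
print @wanted;
```

Here the paragraph-level test fails, while the line-level filter keeps only the "4 ABC:" line.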
                Last edited by nithinpes; May 14 '08, 08:55 AM. Reason: removed initial quote
