Counting in while loop splitting

  • kumarboston
    New Member
    • Sep 2007
    • 55

    Counting in while loop splitting

    Hi All,
    I have output data from the CHARMM program that I am trying to parse. There are three selections in the output, "HEAD", "TAIL" and "WAT", and for each one I have to count the number of occurrences each time it appears and print the values.
    I have attached the output data of the program and my script file.
    [code=data]
    CHARMM> coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
    MEMB POPC 41 O3 1.0000
    CHARMM> coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
    MEMB POPC 30 C21 2.0000
    MEMB POPC 30 O22 3.0000
    MEMB POPC 30 C22 1.0000
    MEMB POPC 41 C3 3.0000
    MEMB POPC 41 O31 2.0000
    MEMB POPC 41 C31 2.0000
    MEMB POPC 41 O32 3.0000
    CHARMM> coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
    TIP3 TIP3 257 OH2 4.0000
    TIP3 TIP3 524 OH2 3.0000
    TIP3 TIP3 3687 OH2 2.0000
    TIP3 TIP3 3798 OH2 7.0000
    TIP3 TIP3 4038 OH2 3.0000
    TIP3 TIP3 5218 OH2 3.0000
    TIP3 TIP3 7177 OH2 1.0000
    CHARMM> coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
    CHARMM> coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
    MEMB POPC 30 C21 1.0000
    MEMB POPC 30 O22 2.0000
    MEMB POPC 30 C22 2.0000
    MEMB POPC 41 C3 7.0000
    MEMB POPC 41 O31 5.0000
    MEMB POPC 41 C31 3.0000
    MEMB POPC 41 O32 3.0000
    MEMB POPC 41 C32 1.0000
    CHARMM> coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
    TIP3 TIP3 524 OH2 1.0000
    TIP3 TIP3 2474 OH2 1.0000
    TIP3 TIP3 3687 OH2 1.0000
    TIP3 TIP3 3798 OH2 7.0000
    TIP3 TIP3 4038 OH2 4.0000
    TIP3 TIP3 5196 OH2 1.0000
    TIP3 TIP3 5218 OH2 2.0000
    TIP3 TIP3 7177 OH2 2.0000
    CHARMM> coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
    MEMB POPC 41 O3 2.0000
    CHARMM> coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
    MEMB POPC 30 C21 1.0000
    MEMB POPC 30 O22 3.0000
    MEMB POPC 30 C22 2.0000
    MEMB POPC 41 C3 5.0000
    MEMB POPC 41 O31 1.0000
    MEMB POPC 41 C31 2.0000
    MEMB POPC 41 O32 2.0000
    [/code]

    Here is my Perl code:
    [code=perl]
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $i = 0;
    my ($cnt1, $cnt2, $cnt3) = 0;
    my $temp = "temp.dat";

    open (A,"<$temp");
    while(my $line = <A>)
    {
        if($line=~/ MEMB\s+POPC\s+\S+\s+\S+\s+(\S+)/)
        {
            if($1 > 0)
            {
                $cnt1++;
            }
        }
        elsif($line=~/ MEMB\s+POPC\s+\S+\s+\S+\s+(\S+)/)
        {
            if($1 > 0)
            {
                $cnt2++;
            }
        }
        elsif($line=~/ TIP3\s+TIP3\s+\S+\s+\S+\s+(\S+)/)
        {
            if($1 > 0)
            {
                $cnt3++;
            }
        }
        elsif($line=~/coor contact cut 4.5 sele HEAD/)
        {
            printf "%4d %5d %5d\n",$i,$cnt1,$cnt2,$cnt3;
            $cnt1=0;$cnt2=0;$cnt3=0;
            $i++;
        }
        else
        {
            next;
        }
    }
    printf "%4d %5d %5d\n",$i,$cnt1,$cnt2;$cnt3;
    [/code]

    As you can see from the data and code, I am trying to parse the contents of each group (HEAD, TAIL, WAT) and write out the counts. The problem is that I am not able to get counts for the HEAD and TAIL portions of the data, because the regular expression I am using is not correct. Any help on this will be appreciated.

    Thanks
  • RonB
    Recognized Expert Contributor
    • Jun 2009
    • 589

    #2
    The regexes aren't the problem; it's the logic.

    If your data is as consistent as the example, you could go this route. If not, we'd need to make a minor change.

    Code:
    #!/usr/bin/perl
    
    use strict;
    use warnings;
    use Data::Dumper;
    
    $/ = "\n CHARMM>";
    
    my %count;
    while ( <DATA> ) {
        chomp;
        my ($head, @data) = split /\n/;
        my ($key) = $head =~ /(HEAD|TAIL|WAT)/;
        $count{$key}++ for @data;
    }
    
    print Dumper \%count;
    
    
    __DATA__
    CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
     MEMB POPC 41   O3             1.0000
     CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
     MEMB POPC 30   C21            2.0000
     MEMB POPC 30   O22            3.0000
     MEMB POPC 30   C22            1.0000
     MEMB POPC 41   C3             3.0000
     MEMB POPC 41   O31            2.0000
     MEMB POPC 41   C31            2.0000
     MEMB POPC 41   O32            3.0000
     CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
     TIP3 TIP3 257  OH2            4.0000
     TIP3 TIP3 524  OH2            3.0000
     TIP3 TIP3 3687 OH2            2.0000
     TIP3 TIP3 3798 OH2            7.0000
     TIP3 TIP3 4038 OH2            3.0000
     TIP3 TIP3 5218 OH2            3.0000
     TIP3 TIP3 7177 OH2            1.0000
     CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
     CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
     MEMB POPC 30   C21            1.0000
     MEMB POPC 30   O22            2.0000
     MEMB POPC 30   C22            2.0000
     MEMB POPC 41   C3             7.0000
     MEMB POPC 41   O31            5.0000
     MEMB POPC 41   C31            3.0000
     MEMB POPC 41   O32            3.0000
     MEMB POPC 41   C32            1.0000
     CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
     TIP3 TIP3 524  OH2            1.0000
     TIP3 TIP3 2474 OH2            1.0000
     TIP3 TIP3 3687 OH2            1.0000
     TIP3 TIP3 3798 OH2            7.0000
     TIP3 TIP3 4038 OH2            4.0000
     TIP3 TIP3 5196 OH2            1.0000
     TIP3 TIP3 5218 OH2            2.0000
     TIP3 TIP3 7177 OH2            2.0000
     CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
     MEMB POPC 41   O3             2.0000
     CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
     MEMB POPC 30   C21            1.0000
     MEMB POPC 30   O22            3.0000
     MEMB POPC 30   C22            2.0000
     MEMB POPC 41   C3             5.0000
     MEMB POPC 41   O31            1.0000
     MEMB POPC 41   C31            2.0000
     MEMB POPC 41   O32            2.0000
    Outputs:
    Code:
    $VAR1 = {
              'TAIL' => 22,
              'WAT' => 15,
              'HEAD' => 2
            };
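To make the record-separator trick above easier to follow, here is a minimal sketch of what setting `$/` does. It assumes every command line starts with " CHARMM>" (one leading space) as in the sample; `split_records` and the shortened sample text are hypothetical names and data made up for this illustration, and the input is read from an in-memory string rather than a real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# split_records() is a hypothetical helper for this illustration: it reads
# $text through an in-memory filehandle, one "record" per CHARMM> command.
sub split_records {
    my ($text) = @_;
    local $/ = "\n CHARMM>";            # a record ends where the next command begins
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my @records;
    while ( my $record = <$fh> ) {
        chomp $record;                  # chomp strips the whole "\n CHARMM>" separator
        push @records, $record;
    }
    close $fh;
    return @records;
}

# Shortened, made-up sample in the same layout as the posted data.
my $sample = join "\n",
    ' CHARMM> coor contact cut 4.5 sele HEAD end',
    ' MEMB POPC 41 O3 1.0000',
    ' CHARMM> coor contact cut 4.5 sele TAIL end',
    ' MEMB POPC 30 C21 2.0000',
    ' MEMB POPC 30 O22 3.0000',
    '';

for my $record ( split_records($sample) ) {
    # First line of a record is the command, the rest are its data lines.
    my ($head, @data) = split /\n/, $record;
    my ($key) = $head =~ /(HEAD|TAIL|WAT)/;
    printf "%-4s has %d data line(s)\n", $key, scalar @data;
}
```

Because each read returns a whole command plus its data lines, the section name only has to be looked up once per record, and counting the data lines is just the size of `@data`.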


    • kumarboston
      New Member
      • Sep 2007
      • 55

      #3
      Thanks Ron for the reply,
      I agree with you and tried your suggestion in my script. My data file is regular, and I am trying to print the values of HEAD, TAIL and WAT each time the program encounters them as a set of three. The final output should have 3 columns: the 1st for HEAD, the 2nd for TAIL and the 3rd for WAT. The parsing should be done in groups of three: the first time the program matches HEAD, TAIL and WAT gives row 1, the next match gives row 2, and so on. If a section has no entries, it should print zero there.

      I hope what I am describing above is clear.
      Thanks once again.


      • RonB
        Recognized Expert Contributor
        • Jun 2009
        • 589

        #4
        The code I posted will do that. The only thing I left out was the 2 printf statements (the 1st one being inside a conditional block) and setting the initial values to 0.

        Which part are you having trouble accomplishing?

        The more I look at this, the more it looks like your homework assignment and not a real world problem that you need to solve.


        • kumarboston
          New Member
          • Sep 2007
          • 55

          #5
          Hi RonB,
          Thanks for the clarification. This is not a homework assignment; my output file from the CHARMM program is 4.8 GB, and I am extracting the data from it for my research project.
          I am having trouble printing the values for each row of data. I tried to print inside the while loop, and it printed the cumulative sum of the counts.

          Thanks


          • RonB
            Recognized Expert Contributor
            • Jun 2009
            • 589

            #6
            This assumes that 'WAT' is not the last group in the file, which is what your sample and code suggested.

            Code:
            #!/usr/bin/perl
             
            use strict;
            use warnings;
            use Data::Dumper;
             
            $/ = "\n CHARMM>";
            
            printf "%4s  %5s  %5s\n", 'HEAD', 'TAIL', 'WAT';
            
            my %count;
            while ( <DATA> ) {
                chomp;
                my ($head, @data) = split /\n/;
                my ($key) = $head =~ /(HEAD|TAIL|WAT)/;
            
                if ($key eq 'HEAD') {  # assign default value of 0 for each key
            
                    # this can be done in the other if block,
                    # but I think it makes more sense here
                    $count{$_} = 0 for keys %count;
                }
                $count{$key}++ for @data;
            
                if ($key eq 'WAT') {
                    printf "%4d  %5d  %5d\n", $count{HEAD}, $count{TAIL}, $count{WAT};
                }
            }
            printf "%4d  %5d  %5d\n", $count{HEAD}, $count{TAIL}, $count{WAT};
            
            
            __DATA__
            CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
             MEMB POPC 41   O3             1.0000
             CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
             MEMB POPC 30   C21            2.0000
             MEMB POPC 30   O22            3.0000
             MEMB POPC 30   C22            1.0000
             MEMB POPC 41   C3             3.0000
             MEMB POPC 41   O31            2.0000
             MEMB POPC 41   C31            2.0000
             MEMB POPC 41   O32            3.0000
             CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
             TIP3 TIP3 257  OH2            4.0000
             TIP3 TIP3 524  OH2            3.0000
             TIP3 TIP3 3687 OH2            2.0000
             TIP3 TIP3 3798 OH2            7.0000
             TIP3 TIP3 4038 OH2            3.0000
             TIP3 TIP3 5218 OH2            3.0000
             TIP3 TIP3 7177 OH2            1.0000
             CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
             CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
             MEMB POPC 30   C21            1.0000
             MEMB POPC 30   O22            2.0000
             MEMB POPC 30   C22            2.0000
             MEMB POPC 41   C3             7.0000
             MEMB POPC 41   O31            5.0000
             MEMB POPC 41   C31            3.0000
             MEMB POPC 41   O32            3.0000
             MEMB POPC 41   C32            1.0000
             CHARMM>    coor contact cut 4.5 sele WAT end sele RES .and. .not. WAT end
             TIP3 TIP3 524  OH2            1.0000
             TIP3 TIP3 2474 OH2            1.0000
             TIP3 TIP3 3687 OH2            1.0000
             TIP3 TIP3 3798 OH2            7.0000
             TIP3 TIP3 4038 OH2            4.0000
             TIP3 TIP3 5196 OH2            1.0000
             TIP3 TIP3 5218 OH2            2.0000
             TIP3 TIP3 7177 OH2            2.0000
             CHARMM>    coor contact cut 4.5 sele HEAD end sele RES .and. .not. HEAD end
             MEMB POPC 41   O3             2.0000
             CHARMM>    coor contact cut 4.5 sele TAIL end sele RES .and. .not. TAIL end
             MEMB POPC 30   C21            1.0000
             MEMB POPC 30   O22            3.0000
             MEMB POPC 30   C22            2.0000
             MEMB POPC 41   C3             5.0000
             MEMB POPC 41   O31            1.0000
             MEMB POPC 41   C31            2.0000
             MEMB POPC 41   O32            2.0000
            Outputs:
            Code:
            C:\TEMP>kumarboston.pl
            HEAD   TAIL    WAT
               1      7      7
               0      8      8
               1      7      0
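For comparison, the logic problem in the original script can also be fixed while still reading one line at a time. MEMB lines appear under both HEAD and TAIL, so the first `if` catches all of them and the identical `elsif` is never reached; the section has to be remembered from the most recent CHARMM> line instead. The sketch below is only one way to do that. `count_groups` is a hypothetical helper, and it counts every data line (like the code above) rather than only lines whose last column is greater than 0.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# count_groups() is a hypothetical helper: it takes a list of lines and
# returns one [HEAD, TAIL, WAT] array ref per group of three sections.
sub count_groups {
    my @lines = @_;
    my @rows;
    my %count = ( HEAD => 0, TAIL => 0, WAT => 0 );
    my $section;          # section named by the most recent command line
    my $seen_any = 0;     # true once the first command line has been read

    for my $line (@lines) {
        if ( my ($name) = $line =~ /CHARMM>.*?sele\s+(HEAD|TAIL|WAT)\b/ ) {
            # A new HEAD command starts a new group: flush the previous one.
            if ( $name eq 'HEAD' && $seen_any ) {
                push @rows, [ @count{qw(HEAD TAIL WAT)} ];
                %count = ( HEAD => 0, TAIL => 0, WAT => 0 );
            }
            $section  = $name;
            $seen_any = 1;
        }
        elsif ( defined $section && $line =~ /^\s*(?:MEMB|TIP3)\s/ ) {
            $count{$section}++;     # data line belongs to the current section
        }
    }
    # Flush the last group, which has no following HEAD command.
    push @rows, [ @count{qw(HEAD TAIL WAT)} ] if $seen_any;
    return @rows;
}

# Made-up sample: one full group, then a group whose HEAD section is empty.
my @sample = (
    ' CHARMM> coor contact cut 4.5 sele HEAD end',
    ' MEMB POPC 41 O3 1.0000',
    ' CHARMM> coor contact cut 4.5 sele TAIL end',
    ' MEMB POPC 30 C21 2.0000',
    ' CHARMM> coor contact cut 4.5 sele WAT end',
    ' TIP3 TIP3 257 OH2 4.0000',
    ' CHARMM> coor contact cut 4.5 sele HEAD end',
    ' CHARMM> coor contact cut 4.5 sele TAIL end',
    ' MEMB POPC 30 C21 1.0000',
    ' CHARMM> coor contact cut 4.5 sele WAT end',
);
printf "%4d  %5d  %5d\n", @$_ for count_groups(@sample);
```

With a real file the same loop body would run inside `while ( my $line = <$fh> )`, which keeps only one line in memory at a time.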


            • kumarboston
              New Member
              • Sep 2007
              • 55

              #7
              Hi RonB,
              I tried to run the script using your suggestions, but somehow it is printing all zero values.
              I have attached the data file and the script file. The data file will always be in groups of three: HEAD, TAIL and WAT.

              [code=perl]
              #!/usr/bin/perl

              use strict;
              use warnings;
              use Data::Dumper;

              open (DATA, "file.txt") ;

              printf "%4s %5s %5s\n", 'HEAD', 'TAIL', 'WAT';

              my %count;
              while ( <DATA> ) {
              chomp;
              my ($head, @data) = split /\n/;
              my ($key) = $head =~ /(HEAD|TAIL|WAT)/;

              if ($key eq 'HEAD') { # assign default value of 0 for each key this can be done in the other if block, but I think it makes more sense here
              $count{$_} = 0 for keys %count;
              }
              $count{$key}++ for @data;

              if ($key eq 'WAT') {
              printf "%4d %5d %5d\n", $count{HEAD}, $count{TAIL}, $count{WAT};
              }
              }
              printf "%4d %5d %5d\n", $count{HEAD}, $count{TAIL}, $count{WAT};
              [/code]

              Thanks
              Attached Files


              • RonB
                Recognized Expert Contributor
                • Jun 2009
                • 589

                #8
                Compare the code you posted to what I posted and you'll see that you're missing a very important line.

                DATA is one of Perl's special built-in filehandles and it's best not to use it when opening your file.

                Use a lexical var for the filehandle and use the 3 arg form of open.

                When opening a filehandle, you should always check the return code to verify that it was successful and take proper action if it failed.

                Code:
                my $file = "file.txt";
                open my $data_FH, '<', $file or die "failed to open <$file> $!";
                The file you attached has a 'WAT' section as its last block. Will that always be true in your actual complete data file? As it is right now, the last 2 rows of output will be duplicates.
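Putting those points together, a rough sketch of the file-reading part might look like the following. The missing line is the `$/` assignment: without it each read returns a single line, so `@data` is always empty and every count stays at zero. `count_lines_per_section` is a hypothetical helper name, and the temporary file written here is only a made-up stand-in for file.txt so that the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# count_lines_per_section() is a hypothetical helper: it opens $file with a
# lexical filehandle and totals the data lines under each section name.
sub count_lines_per_section {
    my ($file) = @_;
    open my $data_fh, '<', $file or die "failed to open <$file> $!";
    local $/ = "\n CHARMM>";     # one record per CHARMM> command

    my %count;
    while ( my $record = <$data_fh> ) {
        chomp $record;
        my ($head, @data) = split /\n/, $record;
        next unless defined $head;                       # skip empty records
        my ($key) = $head =~ /(HEAD|TAIL|WAT)/ or next;  # skip unknown commands
        $count{$key} += @data;                           # add number of data lines
    }
    close $data_fh;
    return %count;
}

# Write a small made-up stand-in for file.txt so the sketch is self-contained.
my ($tmp_fh, $tmp_name) = tempfile( UNLINK => 1 );
print {$tmp_fh} " CHARMM> coor contact cut 4.5 sele HEAD end\n",
                " MEMB POPC 41 O3 1.0000\n",
                " CHARMM> coor contact cut 4.5 sele TAIL end\n",
                " MEMB POPC 30 C21 2.0000\n";
close $tmp_fh or die "failed to close $tmp_name: $!";

my %count = count_lines_per_section($tmp_name);
printf "%-4s %d\n", $_, $count{$_} for sort keys %count;
```

Using `local $/` inside the sub keeps the changed separator from leaking into any other code that reads files line by line.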


                • kumarboston
                  New Member
                  • Sep 2007
                  • 55

                  #9
                  Yes, the last section will always be a WAT section, whether or not it contains data.
                  Thanks


                  • RonB
                    Recognized Expert Contributor
                    • Jun 2009
                    • 589

                    #10
                    In that case, remove that last printf statement, it's redundant.


                    • RonB
                      Recognized Expert Contributor
                      • Jun 2009
                      • 589

                      #11
                      After thinking about it, the assignment of the default hash values should be moved into the 'WAT' if block, or we need to define the 3 hash keys prior to the while loop.
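As a sketch of how those last adjustments could fit together (one possible arrangement, not the only one): the three keys are defined before the loop, the row is emitted and the counts reset inside the 'WAT' block, and there is no trailing printf. It assumes, as stated earlier in the thread, that every group ends with a WAT section. `group_rows` is a hypothetical helper that reads from an in-memory string so the example runs without a data file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# group_rows() is a hypothetical helper: it returns one formatted row per
# HEAD/TAIL/WAT group found in $text.
sub group_rows {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    local $/ = "\n CHARMM>";

    my %count = ( HEAD => 0, TAIL => 0, WAT => 0 );   # keys defined before the loop
    my @rows;
    while ( my $record = <$fh> ) {
        chomp $record;
        my ($head, @data) = split /\n/, $record;
        next unless defined $head;
        my ($key) = $head =~ /(HEAD|TAIL|WAT)/ or next;
        $count{$key} += @data;                        # number of data lines in this section

        if ( $key eq 'WAT' ) {                        # WAT closes every group
            push @rows, sprintf "%4d  %5d  %5d", @count{qw(HEAD TAIL WAT)};
            $count{$_} = 0 for keys %count;           # reset for the next group
        }
    }
    close $fh;
    return @rows;
}

# Made-up sample: the second group has empty HEAD and WAT sections.
my $sample = join "\n",
    ' CHARMM> coor contact cut 4.5 sele HEAD end',
    ' MEMB POPC 41 O3 1.0000',
    ' CHARMM> coor contact cut 4.5 sele TAIL end',
    ' MEMB POPC 30 C21 2.0000',
    ' MEMB POPC 30 O22 3.0000',
    ' CHARMM> coor contact cut 4.5 sele WAT end',
    ' TIP3 TIP3 257 OH2 4.0000',
    ' CHARMM> coor contact cut 4.5 sele HEAD end',
    ' CHARMM> coor contact cut 4.5 sele TAIL end',
    ' MEMB POPC 30 C21 1.0000',
    ' CHARMM> coor contact cut 4.5 sele WAT end',
    '';

printf "%4s  %5s  %5s\n", 'HEAD', 'TAIL', 'WAT';
print "$_\n" for group_rows($sample);
```

Initializing the hash up front means a section that never has data in the first group still prints as 0 instead of triggering an uninitialized-value warning.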

