Counting Punctuation Characters in a text file

  • watcher00
    New Member
    • Mar 2007
    • 4

    Hi

    I'm a complete newbie at Perl and I was wondering if I could get some help completing an exercise I've come across.

    I need to count the punctuation marks in a text file and then output a list of all the punctuation marks that occur, each with its frequency printed next to it.

    e.g.

    2 hyphens
    3 commas
    4 periods
    etc
    etc

    I'm not even sure where to begin. The exercise does give a hint, but I'm not sure how it helps me. It says: "you will need to have some way of tying in the punctuation characters to its equivalent English name. Think ASCII and hash. Only those punctuation characters within a standard ASCII character set need to be checked for"

    Any help with this would be appreciated.

    Thanks
  • KevinADC
    Recognized Expert Specialist
    • Jan 2007
    • 4092

    #2
    A place to start:

    perlfaq4: How can I count the number of occurrences of a substring within a string?

    From there, maybe you can figure out the "Think ASCII and hash" part.

    • docsnyder
      New Member
      • Dec 2006
      • 88

      #3
      @watcher00

      I closed "http://perldoc.perl.org/perlfaq4.html#How-can-I-count-the-number-of-occurrences-of-a-substring-within-a-string%3f" very fast, because it's not quite a "quick" reference. You should study it, of course, but for immediate help, take this as a hint on how to proceed:

      Code:
      $text  = "Hello! This is a sentence, and an example. Two commas, one exclamation mark and two dots.";
      @marks = ( '\.', ',', '!' );
      
      for $mark ( @marks ) {
        $count{$mark} = () = $text =~ m(($mark))g;
        printf("mark '$mark' occurs %d times\n", $count{$mark});
      }
      But be aware of Perl's regex metacharacters, which must be escaped (like "\.")!
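
      To see why the escaping matters, here is a minimal sketch. The helper name count_matches and the sample string are made up for illustration; quotemeta is Perl's built-in for backslash-escaping metacharacters.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: count matches of a pattern using the same
      # list-assignment idiom as the code above.
      sub count_matches {
          my ($text, $pattern) = @_;
          my $count = () = $text =~ /$pattern/g;
          return $count;
      }

      my $text = "One. Two. Three!";

      # Unescaped, '.' is a metacharacter that matches any character except newline.
      my $unescaped = count_matches($text, '.');

      # quotemeta() backslash-escapes it so that only literal periods match.
      my $escaped = count_matches($text, quotemeta('.'));

      print "unescaped: $unescaped, escaped: $escaped\n";
      # prints "unescaped: 16, escaped: 2"
      ```

      The unescaped pattern counts every character in the string, which is why a hand-written list of marks needs care.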

      Enjoy!

      Greetz, Doc

      • watcher00
        New Member
        • Mar 2007
        • 4

        #4
        Thanks, guys.

        Here is what I've come up with. I would appreciate any comments and suggestions.

        Code:
        #!c:/Perl/bin/perl.exe
        
        print("Please type in the file name\n");
        $file= <STDIN>;
        open(FILE, "$file") || die "Couldn't open file: $!";
        
        
        $text  = join(//,<FILE>);
        @marks = ( '!', '"', '\'', '\(', '\)', ',', '-', '\.', '/', ':', ';', '\?' );
        
        %names = (
        	"!" => "exclamation marks",
        	"\"" => "double quotes",
        	"\'" => "single quotes",
        	"\\(" => "opening parentheses",
        	"\\)" => "closing parentheses",
        	"," => "commas",
        	"-" => "hyphens",
        	"\\." => "periods",
        	"/" => "forward slashes",
        	":" => "colons",
        	";" => "semi-colons",
        	"\\?" => "question marks"
        );
        
        for $mark ( @marks ) 
        
        {
        	$count{$mark} = () = $text =~ m(($mark))g;
        	printf("%d $names{$mark}\n", $count{$mark});
        }

        • miller
          Recognized Expert Top Contributor
          • Oct 2006
          • 1086

          #5
          Looks good, watcher00.

          A few stylistic changes that I would suggest:

          1) Always "use strict;". It's a good habit to follow.
          2) join takes a string separator, not a regular expression. Use an empty string instead of an empty pattern. In your code the pattern match is evaluated first and its true result becomes the separator, so the lines of the file are joined by '1'.
          3) Take advantage of \Q, the in-pattern shorthand for quotemeta, to avoid escaping the regex metacharacters by hand.
          4) This is a personal preference, but I never use () as the delimiter for a pattern. I much prefer // or {}, as I think they are easier to read.
          5) Take advantage of keys so you don't have to define your list more than once.
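
          Point 2 is easy to check with a small experiment. This is only a sketch: the two sample lines are invented, and $_ is set to an empty string purely to keep "use warnings" quiet while the empty pattern is matched against it.

          ```perl
          use strict;
          use warnings;

          my @lines = ("a,\n", "b.\n");

          # Separator is the empty string: the lines are concatenated unchanged.
          my $good = join('', @lines);

          # Separator is the *result* of matching the empty pattern against $_.
          # The empty pattern matches, the match returns 1, and 1 is used as the separator.
          local $_ = '';
          my $bad = join(//, @lines);

          print "good: ", length($good), " characters\n";
          print "bad:  ", length($bad), " characters\n";
          ```

          The second string is one character longer because of the stray '1' between the lines. It does not change the punctuation counts here, but it does corrupt the text.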

          These stylistic changes result in the following code:

          Code:
          #!c:/Perl/bin/perl.exe
          
          use strict;
          
          print("Please type in the file name\n");
          my $file = <STDIN>;
          
          open(FILE, "$file") || die "Couldn't open file: $!";
          my $text = join '', <FILE>;
          close FILE;
          
          my %names = (
          	'!' => "exclamation marks",
          	'"' => "double quotes",
          	"'" => "single quotes",
          	'(' => "opening parentheses",
          	')' => "closing parentheses",
          	',' => "commas",
          	'-' => "hyphens",
          	'.' => "periods",
          	'/' => "forward slashes",
          	':' => "colons",
          	';' => "semi-colons",
          	'?' => "question marks"
          );
          
          my %count;
          
          for my $mark ( keys %names ) {
          	$count{$mark} = () = $text =~ m/(\Q$mark\E)/g;
          	printf("%d $names{$mark}\n", $count{$mark});
          }
          - Miller

          • watcher00
            New Member
            • Mar 2007
            • 4

            #6
            Thanks a lot for your help, guys. As I'm still learning Perl, I'm not sure exactly how this line of code works:

            Code:
            $count{$mark} = () = $text =~ m/(\Q$mark\E)/g
            Specifically, why does the " = () = " part need to be in there, and what does it do?

            • miller
              Recognized Expert Top Contributor
              • Oct 2006
              • 1086

              #7
              OK, let's briefly talk about "$text =~ m/(\Q$mark\E)/g":

              In scalar (or boolean) context, this returns true or false depending on whether the regex matches. Because of the global 'g' modifier, each evaluation resumes where the previous match ended, so in a while loop it keeps returning true until no matches are left.

              Code:
              if ($text =~ m/(\Q$mark\E)/g)
              or
              Code:
              while ($text =~ m/(\Q$mark\E)/g) {
              In list context, this statement returns all of the captured groups from every match.

              Code:
              my @marks = ($text =~ m/(\Q$mark\E)/g);
              Therefore, the statement you're asking about forces the regex to be evaluated in list context by assigning its result to the empty list (), and then assigns that list assignment to a scalar. A list assignment in scalar context returns the number of elements on its right-hand side, in other words the number of matches.

              Code:
              $count{$mark} = () = $text =~ m/(\Q$mark\E)/g
              is equivalent to:

              Code:
              my @array = ($text =~ m/(\Q$mark\E)/g);
              $count{$mark} = scalar(@array);
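
              The scalar-context behaviour described earlier gives a third way to get the same number. The following is a minimal sketch with a made-up sample string, comparing a while-loop count against the " = () = " idiom:

              ```perl
              use strict;
              use warnings;

              my $text = 'a, b, c. d!';
              my $mark = ',';

              # Scalar context: each evaluation of the /g match finds the next
              # occurrence and returns false once none are left, so the counter
              # is incremented once per match.
              my $loop_count = 0;
              $loop_count++ while $text =~ /\Q$mark\E/g;

              # List context forced by "() =", then counted by the scalar assignment.
              my $list_count = () = $text =~ /\Q$mark\E/g;

              print "$loop_count $list_count\n";
              # prints "2 2"
              ```
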
              As always, to better understand the code, try experimenting:

              Code:
              my $test = 'this is a test. of foo. of bar. of baz. boo yay!';
              my @array = $test =~ m/(\Q.\E)/g;
              my $count = @array;
              print "$count\n";
              # Outputs 4
              - Miller
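
              For the "Think ASCII and hash" hint in the original exercise, one possible reading is a single pass over the text that tallies every ASCII punctuation character in a hash. This is a sketch under assumptions, not the exercise's official answer: the sub name count_punctuation, the shortened %names table, and the sample sentence are all made up for illustration, and the character class simply spells out the four ASCII punctuation ranges.

              ```perl
              use strict;
              use warnings;

              # Hypothetical, shortened name table; extend it to cover every mark needed.
              my %names = (
                  '!' => 'exclamation marks',
                  ',' => 'commas',
                  '-' => 'hyphens',
                  '.' => 'periods',
                  '?' => 'question marks',
              );

              # Tally each ASCII punctuation character in one pass.
              # The ranges are ! to /, : to @, [ to `, and { to ~.
              sub count_punctuation {
                  my ($text) = @_;
                  my %count;
                  $count{$1}++ while $text =~ /([\x21-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E])/g;
                  return %count;
              }

              my %count = count_punctuation("Well - yes, no, maybe. Really?!");

              for my $mark (sort keys %count) {
                  # Fall back to the raw character when no English name is known.
                  my $name = exists $names{$mark} ? $names{$mark} : "'$mark' characters";
                  print "$count{$mark} $name\n";
              }
              ```

              Compared with running one regex per mark, this reads the text once and only reports marks that actually occur, at the cost of needing a name for every character in the class.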
