Counting Punctuation Characters in a text file

**KevinADC** · Mar 6 '07, 10:42 PM

A place to start:

perlfaq4: How can I count the number of occurrences of a substring within a string?

From there, maybe you can figure out the "Think ASCII and hash" part.
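One possible reading of the "hash" half of that hint, as a rough sketch only: tally each punctuation character in a hash keyed by the character itself. The helper name `count_punct` is made up, and using the `[[:punct:]]` class to decide what counts as punctuation is an assumption, not something the assignment specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: tally every punctuation character in one pass,
# using a hash keyed by the character itself.
sub count_punct {
    my ($text) = @_;
    my %count;
    # [[:punct:]] is the POSIX class for ASCII punctuation and symbol characters
    $count{$1}++ while $text =~ /([[:punct:]])/g;
    return %count;
}

my %count = count_punct("Hello! This is a sentence, and an example.");
for my $mark (sort keys %count) {
    printf "'%s' occurs %d times\n", $mark, $count{$mark};
}
```

Because the hash is filled as characters are found, there is no need to list the marks in advance; restricting the tally to a fixed set would mean swapping the character class for an explicit one.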

**docsnyder** · Mar 7 '07, 09:06 PM

@watcher00

I closed "http://perldoc.perl.org/perlfaq4.html#How-can-I-count-the-number-of-occurrences-of-a-substring-within-a-string%3f" very quickly, because it's not quite a "quick" reference. You should study it, of course, but for immediate help, take this as a hint on how to proceed:

Code:

$text  = "Hello! This is a sentence, and an example. Two commas, one exclamation mark and two dots.";
@marks = ( '\.', ',', '!' );

for $mark ( @marks ) {
  $count{$mark} = () = $text =~ m(($mark))g;
  printf("mark '$mark' occurs %d times\n", $count{$mark});
}

But be aware of Perl's regex metacharacters, which must be escaped (like "\.")!
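To see why the escaping matters, here is a small sketch. The helper `count_matches` and the sample string are made up for illustration; `quotemeta` is Perl's built-in for escaping metacharacters.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often $pattern matches in $text,
# treating $pattern as a regular expression.
sub count_matches {
    my ($text, $pattern) = @_;
    my $count = () = $text =~ /$pattern/g;
    return $count;
}

my $text = "One. Two.";

# Unescaped: '.' is a metacharacter that matches any character except newline,
# so every character in the string is counted.
print count_matches($text, '.'), "\n";

# Escaped by hand: only literal dots are counted.
print count_matches($text, '\.'), "\n";

# quotemeta() does the escaping for you.
print count_matches($text, quotemeta('.')), "\n";
```

The first call counts all nine characters of the sample string, while the other two count only the two dots.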

Enjoy!

Greetz, Doc

**watcher00** · Mar 7 '07, 11:25 PM

Thanks, guys.

Here is what I've come up with. I would appreciate any comments and suggestions.

Code:

#!c:/Perl/bin/perl.exe

print("Please type in the file name\n");
$file= <STDIN>;
open(FILE, "$file") || die "Couldn't open file: $!";


$text  = join(//,<FILE>);
@marks = ( '!', '"', '\'', '\(', '\)', ',', '-', '\.', '/', ':', ';', '\?' );

%names = (
	"!" => "exclamation marks",
	"\"" => "double quotes",
	"\'" => "single quotes",
	"\\(" => "opening parenthesis",
	"\\)" => "closing parenthesis",
	"," => "commas",
	"-" => "hyphens",
	"\\." => "periods",
	"/" => "forward slashs",
	":" => "colons",
	";" => "semi-colons",
	"\\?" => "question marks"
);

for $mark ( @marks ) 

{
	$count{$mark} = () = $text =~ m(($mark))g;
	printf("%d $names{$mark}\n", $count{$mark});
}

**miller** · Mar 7 '07, 11:42 PM

Looks good watcher00.

A few stylistic changes that I would suggest:

1) Always "use strict;". It's a good habit to follow.
2) join takes a string separator, not a regular expression. Use an empty string instead of an empty pattern; in your code the empty pattern match is evaluated first and returns 1, so the lines of the file are joined by '1'.
3) Take advantage of \Q, the in-pattern form of quotemeta, to avoid having to escape the metacharacters in the regex manually.
4) This is a personal preference, but I never use () as the delimiter for a pattern. I much prefer either // or {}, as I think they are easier to read.
5) Take advantage of keys so you don't have to define your list more than once.

These stylistic changes result in the following code:

Code:

#!c:/Perl/bin/perl.exe

use strict;

print("Please type in the file name\n");
my $file = <STDIN>;

open(FILE, "$file") || die "Couldn't open file: $!";
my $text = join '', <FILE>;
close FILE;

my %names = (
	'!' => "exclamation marks",
	'"' => "double quotes",
	"'" => "single quotes",
	'(' => "opening parentheses",
	')' => "closing parentheses",
	',' => "commas",
	'-' => "hyphens",
	'.' => "periods",
	'/' => "forward slashes",
	':' => "colons",
	';' => "semi-colons",
	'?' => "question marks"
);

my %count;

for my $mark ( keys %names ) {
	$count{$mark} = () = $text =~ m/(\Q$mark\E)/g;
	printf("%d $names{$mark}\n", $count{$mark});
}
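To see point 2 in isolation, here is a minimal sketch. The sample lines are made up, and `local $_ = ''` is there only so the empty pattern has a defined value to match against.

```perl
use strict;
use warnings;

my @lines = ("one\n", "two\n", "three\n");

# join(//, ...) runs an empty pattern match against $_ and uses the
# result of that match (1 on success) as the separator.
local $_ = '';    # defined target for the match, avoids an "uninitialized" warning
my $with_pattern = join(//, @lines);

# join('', ...) uses the empty string as the separator, which is what was intended.
my $with_string = join('', @lines);

print "joined with //:\n$with_pattern";
print "joined with '':\n$with_string";
```

In the first result a stray 1 shows up at the start of every line after the first, and any '1' characters counted later would be inflated by it.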

- Miller

**watcher00** · Mar 8 '07, 02:07 PM

Thanks a lot for your help, guys. As I'm still learning Perl, I'm not sure exactly how this line of code works:

Code:

$count{$mark} = () = $text =~ m/(\Q$mark\E)/g

Specifically, why does the " = () = " part need to be in there, and what does it do?

**miller** · Mar 8 '07, 11:50 PM

Ok, let's briefly talk about "$text =~ m/(\Q$mark\E)/g":

In scalar (or boolean) context, this returns true or false depending on whether the regex matches. In a while loop, because of the global 'g' modifier, each evaluation picks up where the previous match ended, so it keeps returning true until no matches remain.

Code:

if ($text =~ m/(\Q$mark\E)/g)

or

Code:

while ($text =~ m/(\Q$mark\E)/g) {
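A complete version of that loop, as a sketch with a made-up sample string, shows the count being built up one match at a time. `pos()` is the built-in that reports where the last /g match on a string ended.

```perl
use strict;
use warnings;

my $text  = 'a, b, c';
my $mark  = ',';
my $count = 0;

# In scalar context each m//g evaluation finds the next match and
# remembers where it stopped; pos($text) reports that offset.
while ($text =~ m/(\Q$mark\E)/g) {
    $count++;
    printf "found '%s', next search starts at offset %d\n", $1, pos($text);
}

print "$count matches\n";
```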

In list context, the match returns all of the captured groups from every match.

Code:

my @marks = ($text =~ m/(\Q$mark\E)/g);

Therefore, the statement you're asking about forces the regex into list context by assigning its result to the empty list (), and then evaluates that list assignment in scalar context. A list assignment in scalar context returns the number of elements on its right-hand side, in other words the number of matches.

Code:

$count{$mark} = () = $text =~ m/(\Q$mark\E)/g

is equivalent to:

Code:

my @array = ($text =~ m/(\Q$mark\E)/g);
$count{$mark} = scalar(@array);
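The two contexts can also be put side by side. This is only a sketch with a made-up sample string; the `pos()` reset is needed because the first match uses /g in scalar context and would otherwise leave the next search starting after the first dot.

```perl
use strict;
use warnings;

my $text = 'a.b.c.d';

# Plain scalar assignment: the match is in scalar context, so the
# result is only a true/false value, not a count.
my $matched = $text =~ m/\./g;
pos($text) = undef;    # reset the /g position left behind by the scalar match

# Assigning to () forces list context; the outer scalar assignment then
# receives the number of elements produced on the right-hand side.
my $count = () = $text =~ m/\./g;

print "matched: $matched\n";
print "count:   $count\n";
```

Without the `= () =` in the middle, the hash would only ever hold a true or false value for each mark.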

As always, to better understand the code, try experimenting:

Code:

my $test = 'this is a test. of foo. of bar. of baz. boo yay!';
my @array = $test =~ m/(\Q.\E)/g;
my $count = @array;
print "$count\n";
# Outputs 4

- Miller
