Changing Tag case, while ignoring tag attribute values.

**miller** · Jul 12 '07, 04:40 PM

First. Lower case is better than upper case. It's just easier to read.

Second. Your html is malformed. You're missing the closing quote in the image tag.

Third. Always, always, always "use strict;". It will simply require you to declare your variables, but this is a good thing.

Fourth. Don't have your subroutines work on global variables. Always pass parameters to your subroutines. Always. It's the simplest way to document what things are doing.

Fifth. You should probably do a little studying of regular expressions. I laud your attempt at using them to verify your data, but you need a little more knowledge.
perldoc perlrequick

Sixth. Your script won't currently edit the file. To do that you should read this. Specifically the second question is very relevant.
perldoc perlfaq5 Files and Formats

Finally, here is your script so modified. Note, I personally would simplify it more by removing the subroutines. But I left them in there to demonstrate better coding practices with regard to subs.

[CODE=perl]
#!/usr/local/bin/perl

# Upper case all the tags within an html file.

use Tie::File;

use strict;
use warnings;

# Get filename from the command line
my $filename = shift;

# Else prompt for it
if (!$filename) {
    print "Enter name of file you wish to edit.\n";
    chomp($filename = <STDIN>);
}

validate_extension($filename);
uc_tags($filename);

sub validate_extension {
    my $filename = shift or die "Filename required";
    print $filename =~ m{^\w+\.html?$}i ? "pass\n" : "fail\n";
}

sub uc_tags {
    my $filename = shift or die "Filename required";
    tie my @array, 'Tie::File', $filename or die "Can't open $filename: $!";

    foreach my $line (@array) {
        $line =~ s{(</?\w+)}{\U$1}g;
    }
}

1;

__END__
[/CODE]
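To see what the substitution in uc_tags does without touching a file, here is a minimal sketch that applies the same regex to a string in memory. The helper name uc_tag_names and the sample line are made up for illustration. Note that the pattern only looks for "<" or "</" followed by word characters, so it would also change a stray "<" in ordinary text if a word character happened to follow it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): upper-case the
# tag names in a single line of HTML and return the modified copy.
sub uc_tag_names {
    my ($line) = @_;
    # Capture "<" or "</" plus the tag name, then upper-case the capture.
    # Attribute names and values come after the captured part, so they
    # are left alone.
    $line =~ s{(</?\w+)}{\U$1}g;
    return $line;
}

my $sample = '<p class="intro">See <a href="page.html">this page</a>.</p>';
print uc_tag_names($sample), "\n";
# prints: <P class="intro">See <A href="page.html">this page</A>.</P>
```

Because the sub takes its input as a parameter and returns the result, it can be tried on any string before pointing the real script at a file.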

- Miller

**KevinADC** · Jul 12 '07, 04:47 PM

The validation of the file name is not very good:

Code:

if ($filename =~ m/\w{1,}[.]{1}[htmlHTML]{3,4}$/)

You are using a character class incorrectly: [htmlHTML]

Anything inside a character class is not interpreted as a string but as a set of individual characters, matched in any order. So a file extension of .HhH will match as well as .html or any other valid HTML extension. All you really need is:

Code:

if ($filename =~ m/^.*?\.html?$/i)

to see if a file is named with a .htm or .html extension. The "i" modifier ignores case, so upper and lower case will both match.
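A quick way to convince yourself of the difference is to run both patterns over a few names. This is only a sketch: the sub names and the sample file names are invented for the comparison, and the two patterns are copied from the snippets above.

```perl
use strict;
use warnings;

# The two patterns under discussion, copied from above.
my $class_re  = qr/\w{1,}[.]{1}[htmlHTML]{3,4}$/;   # character-class version
my $suffix_re = qr/^.*?\.html?$/i;                  # literal .htm/.html version

# Hypothetical helpers so each pattern can be checked on its own.
sub matches_class  { my ($name) = @_; return $name =~ $class_re  ? 1 : 0; }
sub matches_suffix { my ($name) = @_; return $name =~ $suffix_re ? 1 : 0; }

# Made-up file names: two real extensions, two jumbled ones, one unrelated.
for my $name (qw(index.html INDEX.HTM page.HhH page.lmth notes.txt)) {
    printf "%-12s class=%d suffix=%d\n",
        $name, matches_class($name), matches_suffix($name);
}
```

The character-class version accepts page.HhH and page.lmth because every letter after the dot is in the set; the literal version rejects both.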

Now later you have:

Code:

            $_ =~ tr/[a-z]/[A-Z]$/;

The tr operator does not recognize [] as a character class, so the brackets are treated as literal characters, and the "$" on the end of the replacement side is doing nothing. "tr" has no concept of anchors (^ and $) like "m" and "s" do.

You should just use a "range", which "tr" does understand:

Code:

           $_ =~ tr/a-z/A-Z/;
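If you want to check this yourself, here is a small sketch. The sub names and sample strings are made up. It uses the fact that tr with an empty replacement list leaves the string alone and returns how many characters it matched.

```perl
use strict;
use warnings;

# Count the characters each search list matches. With an empty replacement
# list and no modifiers, tr leaves the string unchanged and just counts.
sub count_bracketed { my ($s) = @_; return ($s =~ tr/[a-z]//); }
sub count_range     { my ($s) = @_; return ($s =~ tr/a-z//); }

# Upper-case a copy of the string with a plain range.
sub upcase { my ($s) = @_; $s =~ tr/a-z/A-Z/; return $s; }

my $sample = 'a[b]';
print count_bracketed($sample), "\n";   # 4: the brackets count as literals
print count_range($sample), "\n";       # 2: only the letters are counted

print upcase('<img src="x.png">'), "\n";
# prints: <IMG SRC="X.PNG">
```

The last line also shows why a bare tr is not enough for this task: it upper-cases every letter on the line, attribute values included.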

Also, you have not attempted to differentiate between html tags and text at all. I realize that is where you are confused, but I would be more comfortable helping you with your course work if I saw some attempt to do so.

**miller** · Jul 12 '07, 05:18 PM

Sorry Kevin. I failed to clue in to the fact that this was most likely homework. I was busy remembering some unnamed obsessive compulsive programmer creating such a script back in the day to change all tag casing to lower case, as it should be.

What do you think? Delete my provided code?

- Miller

**KevinADC** · Jul 12 '07, 05:25 PM

No, don't delete your code. The OP has at least posted some code so appears to be making an effort.

I wonder who this could be: unnamed obsessive compulsive programmer ;)

**miller** · Jul 12 '07, 05:32 PM

Sure thing.

I went ahead and added a little comment at the top of the script to state what it does. The number of times I've reopened a script having no clue what it accomplishes... oi.

- M

**bluemaxx** · Jul 12 '07, 09:16 PM

Thank you for the help gents, much appreciated.
