Analysing text files with Perl

  • Davo1977
    New Member
    • Jun 2008
    • 3

    Analysing text files to obtain statistics on their content

    You are to write a Perl program that analyses text files to obtain statistics on their content. The program should operate as follows:

    1) When run, the program should check if an argument has been provided. If not, the program should prompt for, and accept input of, a filename from the keyboard.

    2) The filename, either passed as an argument or input from the keyboard, should be checked to ensure it is in MS-DOS format. The filename part should be no longer than 8 characters and must begin with a letter or underscore character followed by up to 7 letters, digits or underscore characters. The file extension should be optional, but if given it should be ".TXT" (upper- or lowercase).

    If no extension is given, ".TXT" should be added to the end of the filename. So, for example, if "testfile" is input as the filename, this should become "testfile.TXT". If "input.txt" is entered, it should remain unchanged.

    3) If the filename provided is not of the correct format, the program should display a suitable error message and end at this point.

    4) The program should then check to see if the file exists using the filename provided. If the file does not exist, a suitable error message should be displayed and the program should end at this point.

    5) Next, if the file exists but the file is empty, again a suitable error message should be displayed and the program should end.

    6) The file should be read and checked to display crude statistics on the number of characters, words, lines, sentences and paragraphs that are within the file.
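    Requirements 2 and 3 can be expressed as one regular expression, provided the default extension is appended first. A minimal sketch, assuming a hypothetical helper named normalize_filename (not part of the assignment) that returns the normalised name, or undef if the name is not a valid MS-DOS filename:

```perl
use strict;
use warnings;

# Hypothetical helper: append ".TXT" when no extension was given, then
# check the MS-DOS rule: a letter or underscore, followed by up to 7
# letters, digits or underscores, then ".TXT" in either case.
# Returns the normalised filename, or undef if it is not valid.
sub normalize_filename {
    my ($name) = @_;
    $name .= '.TXT' unless $name =~ /\.TXT\z/i;
    return $name =~ /\A[A-Za-z_][A-Za-z0-9_]{0,7}\.TXT\z/i ? $name : undef;
}

for my $input ('testfile', 'input.txt', '9lives', 'muchtoolong') {
    my $result = normalize_filename($input);
    print "$input => ", (defined $result ? $result : 'invalid'), "\n";
}
```

    Appending the extension before validating means a name such as "input.doc" becomes "input.doc.TXT" and is then rejected by the same pattern, so no separate extension check is needed.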



    Here is the code I have written so far, but it doesn't seem to work. Can anybody see why?

    Code:
    #usr/bin/perl 
    
    use strict; 
    use warnings; 
    
    if ($#ARGV == -1) #no filename provided as a command line argument. 
    { 
    print("Please enter a filename: "); 
    $filename = <STDIN>; 
    chomp($filename); 
    } 
    else #got a filename as an argument. 
    { 
    $filename = $ARGV[0]; 
    } 
    
    #perform the specified checks 
    #check if filename is valid, exit if not 
    if ($filename !~ m^/[a-z]{1,7}\.TXT$/i) 
    { 
    die("File format not valid\n");) 
    } 
    
    if ($filename !~ m/\.TXT$/i) 
    { 
    $filename .= ".TXT"; 
    } 
    
    #check if filename is actual file, exit if it is. 
    if (-e $filename) 
    { 
    die("File does not exist\n"); 
    } 
    
    #check if filename is empty, exit if it is. 
    if (-s $filename) 
    { 
    die("File is empty\n"); 
    } 
    
    my $i = 0; 
    my $p = 1; 
    my $words = 0; 
    my $chars = 0; 
    
    open(READFILE, "<$data1.txt") or die "Can't open file '$filename: $!"; 
    
    #then use a while loop and series of if statements similar to the following 
    while (<READFILE>) { 
    chomp;    #removes the input record Separator 
    $i = $.;    #"$". is the input record line numbers, $i++ will also work 
    $p++ if (m/^$/);   #count paragraphs 
    split (/\s+/);    #split sentences into "words" 
    $words++     #count all characters except spaces and add to $chars 
    $chars += tr/ //c;     #tr/ //c replaces everything in the string with itself, except spaces, and returns the number of such characters replaced 
    } 
    
    
    #display results 
    print "There are $i lines in $data1\n"; 
    print "There are $p Paragraphs in $data1\n"; 
    print "There are $words in $data1\n"; 
    print "There are $chars in $data1\n"; 
    
    close(READFILE);
    Last edited by numberwhun; Jun 25 '08, 12:33 PM. Reason: Please use code tags
  • numberwhun
    Recognized Expert Moderator Specialist
    • May 2007
    • 3467

    #2
    First off, the first line in your program is incorrect. You have:

    Code:
    #usr/bin/perl
    when what you should have is:

    Code:
    #!/usr/bin/perl
    The first two characters must be "#" and "!"; together they make up the "shebang" (also called hashbang). You also need the "/" at the beginning of the path.

    Another issue that I see is:

    Code:
    split (/\s+/);    #split sentences into "words"
    This will not do what you want. You are doing a split, but into what? You need to assign the result to an array, like so:

    Code:
    my @words;
    
    @words = split (/\s+/);
    If you don't have something to put the "words" into, the result of the split is simply thrown away (and "use warnings" will complain about it).
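    One related detail not raised in the thread: split /\s+/ returns an empty first field when a line starts with whitespace, which inflates the word count, whereas the special pattern ' ' (a single-space string, awk-style) skips leading whitespace. A small sketch comparing the two, using a hypothetical count_words helper:

```perl
use strict;
use warnings;

# Hypothetical helper: count whitespace-separated words in one line.
# split ' ' is Perl's awk-style split: it discards leading whitespace,
# so no empty first field is produced.
sub count_words {
    my ($line) = @_;
    my @words = split ' ', $line;
    return scalar @words;
}

my $line  = "   three little words";
my @naive = split /\s+/, $line;    # first element is the empty string ""

print "split /\\s+/ gives ", scalar(@naive), " fields\n";
print "split ' ' gives ", count_words($line), " fields\n";
```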


    Other than that, I have two notes. First, please use code tags any time you include code in your posting. You can see them in your original posting if you edit it. They are required, not optional, in the forums.

    Second, when you say you are "getting an error" or that "the code produces an error", please include the error(s) that you are seeing, as we cannot see them otherwise.

    Regards,

    Jeff

    • Davo1977
      New Member
      • Jun 2008
      • 3

      #3

      I am very new to Perl and have put this code together using examples from various books. Could anyone look over this code and see how it could be improved?

      Code:
      #!/usr/bin/perl 
      
      use strict; 
      use warnings; 
      
      if ($#ARGV == -1) #no filename provided as a command line argument. 
      { 
      print("Please enter a filename: "); 
      $filename = <STDIN>; 
      chomp($filename); 
      } 
      else #got a filename as an argument. 
      { 
      $filename = $ARGV[0]; 
      } 
      
      #perform the specified checks 
      #check if filename is valid, exit if not 
      if ($filename !~ m^/[a-z]{1,7}\.TXT$/i) 
      { 
      die("File format not valid\n");) 
      } 
      
      if ($filename !~ m/\.TXT$/i) 
      { 
      $filename .= ".TXT"; 
      } 
      
      #check if filename is actual file, exit if it is. 
      if (-e $filename) 
      { 
      die("File does not exist\n"); 
      } 
      
      #check if filename is empty, exit if it is. 
      if (-s $filename) 
      { 
      die("File is empty\n"); 
      } 
      
      my $i = 0; 
      my $p = 1; 
      my $words = 0; 
      my $chars = 0; 
      
      open(READFILE, "<$data1.txt") or die "Can't open file '$filename: $!"; 
      
      #then use a while loop and series of if statements similar to the following 
      while (<READFILE>) { 
      chomp; #removes the input record Separator 
      $i = $.; #"$". is the input record line numbers, $i++ will also work 
      $p++ if (m/^$/); #count paragraphs 
      $my @t = split (/\s+/); #split sentences into "words" 
      $words += @t; #add count to $words 
      $chars += tr/ //c; #tr/ //c count all characters except spaces and add to $chars 
      } 
      
      
      #display results 
      print "There are $i lines in $data1\n"; 
      print "There are $p Paragraphs in $data1\n"; 
      print "There are $words in $data1\n"; 
      print "There are $chars in $data1\n"; 
      
      close(READFILE);
      Last edited by numberwhun; Jun 25 '08, 02:16 PM. Reason: again, please use code tags

      • numberwhun
        Recognized Expert Moderator Specialist
        • May 2007
        • 3467

        #4
        First, I distinctly remember asking you to use code tags when posting code in the forums. I even mentioned that they are not optional but a requirement. I am no longer asking; this is your only warning. PLEASE use code tags when posting code in the forums!

        Second, DO NOT start a new thread on the same exact topic as you previously posted. Simply reply to your post and post your additions. I have merged your two threads accordingly.

        Please be sure to read the Guidelines posted at the top of this forum, as they explain the proper way to post in the forums.

        As for your issue, it looks as though this is school work, especially since it is formatted as a homework question. It is against this site's guidelines to post your homework here in the hope of getting us to do it for you. Other than the issue you had before, you did not mention any errors this time, so we don't have anything to fix. I believe that optimizing this code is probably part of your assignment (especially since you copied it out of books). You should learn the basics of Perl, examine the code, and first see if you can find ways to optimize it yourself.

        Please heed the warning(s) I have provided above as well.

        Regards,

        Jeff

        • nithinpes
          Recognized Expert Contributor
          • Dec 2007
          • 410

          #5
          There are many errors in your code. In this line:
          Code:
          open(READFILE, "<$data1.txt") or die "Can't open file '$filename: $!";
          where are you getting $data1 from? You have assigned the filename to $filename.
          The following line is not correct either; "$my" should be just "my":
          Code:
          $my @t = split (/\s+/);
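          The check of the file format should come after the block that appends .TXT; otherwise a name entered without an extension, such as "testfile", is rejected before the extension is ever added. Also, the tests for the existence and size of the file have their logic reversed: -e and -s are true when the file exists and is non-empty, so you should use 'unless' instead of 'if'. Finally, "m^/" uses "^" as the match delimiter by mistake (it should be "m/^"), there is a stray ")" after the die() call, and $filename is never declared with "my", which "use strict" does not allow.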
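          On the 'unless' point: -e is true if the file exists, and -s returns the file's size in bytes, which is false for an empty file. The two checks could also be wrapped in a small function. A sketch, assuming a hypothetical check_file helper that returns the error message, or undef if the file is fine to read:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return an error message if the file is missing
# or empty, or undef if it is fine to read.
sub check_file {
    my ($filename) = @_;
    return "File does not exist\n" unless -e $filename;
    return "File is empty\n"       unless -s $filename;   # -s is false for 0 bytes
    return;
}

# Demonstration with a temporary empty file (removed at program exit).
my ($fh, $tmpname) = tempfile(UNLINK => 1);
close($fh);
print check_file($tmpname);                   # the file exists but is empty
print check_file('no_such_file_here.txt');    # the file does not exist
```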


          Modify the script as below:
          [CODE=perl]
          #!/usr/bin/perl

          use strict;
          use warnings;

          my $filename;
          my $i = 0;       # lines
          my $p = 1;       # paragraphs (crude: one more than the number of blank lines)
          my $words = 0;
          my $chars = 0;
          if ($#ARGV == -1) #no filename provided as a command line argument.
          {
              print("Please enter a filename: ");
              $filename = <STDIN>;
              die("No filename entered\n") unless defined $filename;
              chomp($filename);
          }
          else #got a filename as an argument.
          {
              $filename = $ARGV[0];
          }

          #append the default extension if none was given
          if ($filename !~ m/\.TXT$/i)
          {
              $filename .= ".TXT";
          }

          #check if filename is valid (a letter or underscore, then up to 7
          #letters, digits or underscores, then .TXT), exit if not
          if ($filename !~ m/^[a-z_][a-z0-9_]{0,7}\.TXT$/i)
          {
              die("File format not valid\n");
          }

          #check if filename is an actual file, exit if it is not.
          unless (-e $filename)
          {
              die("File does not exist\n");
          }

          #check if file is empty (-s returns the size in bytes), exit if it is.
          unless (-s $filename)
          {
              die("File is empty\n");
          }

          #three-argument open, so the filename is never parsed as part of the mode
          open(READFILE, '<', $filename) or die "Can't open file $filename: $!";

          while (<READFILE>) {
              chomp;                  #removes the input record separator
              $i = $.;                #$. is the current input line number
              $p++ if (m/^\s*$/);     #count paragraphs; \s* also matches a leftover \r from DOS files
              my @t = split(' ', $_); #split the line into "words", ignoring leading whitespace
              $words += @t;           #add count to $words
              $chars += tr/ //c;      #tr/ //c counts every character except spaces without changing the string
          }

          close(READFILE);

          #display results
          print "There are $i lines in $filename\n";
          print "There are $p paragraphs in $filename\n";
          print "There are $words words in $filename\n";
          print "There are $chars characters in $filename\n";
          [/CODE]
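          The modified script does not count sentences, which requirement 6 also asks for, and its paragraph count grows with every extra blank line. As a sketch only, assuming a hypothetical text_stats helper that reads from any filehandle, both could be counted as below. The sentence rule (a run of '.', '!' or '?' followed by whitespace or the end of a line) is a crude heuristic and will also treat the full stop in an abbreviation such as "Mr." as a sentence end.

```perl
use strict;
use warnings;

# Hypothetical helper: crude statistics for text read from a filehandle.
# A sentence ends at each run of '.', '!' or '?' that is followed by
# whitespace or the end of the line. A paragraph is a run of non-blank
# lines, so several blank lines in a row still separate only two paragraphs.
sub text_stats {
    my ($fh) = @_;
    my %count = (lines => 0, words => 0, chars => 0,
                 sentences => 0, paragraphs => 0);
    my $in_paragraph = 0;

    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;           # strip a Unix or DOS line ending
        $count{lines}++;
        if ($line =~ /^\s*$/) {         # a blank line ends the current paragraph
            $in_paragraph = 0;
            next;
        }
        $count{paragraphs}++ unless $in_paragraph;
        $in_paragraph = 1;
        my @words = split ' ', $line;   # awk-style split: no empty leading field
        $count{words} += @words;
        $count{chars} += ($line =~ tr/ //c);   # every character except spaces
        $count{sentences}++ while $line =~ /[.!?]+(?=\s|\z)/g;
    }
    return \%count;
}

# Usage on an in-memory string; a full script would pass the handle
# opened on $filename instead.
my $text = "Hello there. How are you?\n\nFine!  Thanks.\n";
open(my $in, '<', \$text) or die "Can't open string: $!";
my $stats = text_stats($in);
close($in);
print "$_: $stats->{$_}\n" for qw(lines words chars sentences paragraphs);
```

          Tracking whether the loop is inside a paragraph, rather than starting at 1 and adding one per blank line, is what keeps runs of blank lines from inflating the paragraph count. Reading from an in-memory string keeps the example self-contained (this needs Perl 5.8 or later).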

          • KevinADC
            Recognized Expert Specialist
            • Jan 2007
            • 4092

            #6
            Originally posted by numberwhun
            Second, DO NOT start a new thread on the same exact topic as you previously posted. Simply reply to your post and post your additions. I have merged your two threads accordingly.


            Regards,

            Jeff
            He did the same on the perlguru forum. Obnoxious.
