This is the thread I will use to post the article I have claimed in the Experts forum. If you don't know what that means, don't worry about it. Suffice it to say after I get my working draft completed I will post it in this thread for review. Members of this forum that are "Experts" should visit the "Experts" forum to get up to speed.
Building a File Download Script with Perl
Collapse
X
-
I realize we don't have too many perl coders here, but any and all feedback will be appreciated. Here is my working draft so far. I only need to add the resources section (links to online reference material) to the end of the script and my copyright/license notice (Creative Commons License). Article begins now.
Note: You may skip to the end of the article if all you want is the perl code.
Introduction
Many websites have a form or a link you can use to download a file. You click a form button or click on a link and after a moment or two a file download dialog box pops-up in your web browser and prompts you for some instructions, such as “open” or “save“. I’m going to show you how to do that using a perl script.
This article will not teach you how to write perl programs but will introduce you to CGI scripting with perl and the CGI module that comes with perl. If you already have some experience with perl and CGI scripting you should still find the information in the article useful and hopefully interesting. At the end of the article is a list of online resources you can access for more information concerning some of the details that will be discussed in this article.
What You Need
Any recent version of perl (5.06 or newer should be good) and a server to run the script on. The ability to upload and run perl scripts on the server. A server that allows you to store files above the web root is preferable but not necessary. That’s the safest place to put files you don’t want people or ‘bots’ to be able to access. A little bit of prior HTML knowledge would be helpful but is not necessary.
The Perl Code
Just about all perl scripts that run as a CGI process need to start with what is called the shebang line. The most common shebang line is:
Code:#!/usr/bin/perl
Code:#!/usr/bin/perl -T
Modules
Modules are sort of like separate perl programs you can use in your perl program. Many people have written modules that have become standards that other perl programmers use all the time. We will be using these modules:
Code:use strict; use warnings; use CGI; use CGI::Carp qw/fatalsToBrowser/; use Tie::File;
The next two lines in the program establish some important parameters:
Code:$CGI::POST_MAX = 1024; $CGI::DISABLE_UPLOADS = 1;
Setting Paths and Options
Code:#################################### #### User Configuration Section #### #################################### # The path to where the downloadable files are. # Preferably this should be above the web root folder. my $path_to_files = '/home/users/downloads/'; # The path to the error log file my $error_log = '/home/users/downloads/logs/errors.txt'; # The path to the counter file my $counter_log = '/home/users/downloads/logs/counter.txt'; # Option to log errors: 1 = yes, 0 = no my $log = 1; # Option to count downloads: 1 = yes, 0 = no my $counter = 1; # Checks if someone is trying to hot-link to your script my $url = 'http://www.yoursite.com'; ######################################## #### End User Configuration Section #### ########################################
$error_log is the path to the errors.txt file that logs errors generated by the script.
$counter_log is the path to where the counter.txt file that keeps track of how many times files are downloaded.
$log and $counter turn the logs on or off.
$url should be the name of your website including the “http://” part.
Create the CGI object
Code:my $q = CGI->new;
In reality, the CGI module has many “commands” you can give to the “butler”. We will use but a few of them. Learning to use the CGI module is almost like learning a small programming language. But the beauty is you only need to know what the commands do, not how they do it. Just like a real butler you have to trust that he knows what he is doing and will get the job done efficiently and effectively without looking over his shoulder. I recommend you take the time to read the CGI modules documentation, even if you don’t understand much of it, you should at least be familiar with the basic form processing methods. I leave that up to you.
Security Checkpoint
Never underestimate the need for security when running scripts as a CGI. We are going to use three “checkpoin ts” to detect any suspicious activity. The first is going to check the amount of data sent to the script. We give the cgi_error() command to our trusty butler “q”. “413” indicates the limit we set for $CGI::POST_MAX has been exceeded.
Code:if (my $error = $q->cgi_error()){ if ($error =~ /^413\b/o) { error('Maximum data limit exceeded.'); } else { error('An unknown error has occured.'); } }
Code:if ($ENV{'CONTENT_TYPE'} =~ m|^multipart/form-data|io ) { error('Invalid Content-Type : multipart/form-data.') }
Next we check that the request to use the script comes from your website.
Code:if ($ENV{'HTTP_REFERER'} && $ENV{'HTTP_REFERER'} !~ m|^\Q$url|io) { error('Access forbidden.') }
I am going to use the Vars method to get all the parameters sent to the script into a hash. Once again, we call on “q” to do the actual work.
Code:my %IN = $q->Vars;
Code:my $file = $IN{'file'} or error('No file selected.');
You can’t say it enough, all data sent to a CGI script has to be validated. If we allowed just any thing to be sent to the script someone could send something like this: /foo/bar and depending on the path you append that to, the script will obediently go find the foo directory and download the bar file. There are of course much worse things a person could try, but this is not an article about how to hack into a website using the front door. To prevent the user from getting away with such a dangerous stunt we need to validate the data sent to the script.
Code:if ($file =~ /^(\w+[\w.-]+\.\w+)$/) { $file = $1; } else { error('Invalid characters in filename.'); }
The above code is also “untaintin g” the data. Since the data will be used to open a file on the server we must untaint it to satisfy the –T switch that we are not doing anything insecure. The only way to untaint data is to use a regexp. The parenthesis in the regexp store the pattern match in memory, we get that value using $1. We then assign the value back to our variable $file and now the data we will use to open the file is internal to our script and the –T switch will consider it safe to use. It’s up to you to know that your validation/filtering is sufficient for the task. If, for example, you used this pattern in the: regexp /(.*)/ the –T switch will not complain but the data will be passed into the script just like it was entered in the form or sent via a hyperlink. That would be a silly thing to do.
If the data does not pass the validation routine a message is sent to the error subroutine and the user is alerted.
Ready for Downloading
Code:if (download($file)) { #increments the files download count counter($file) if ($counter); } else { error('An unknown error has occured. Try again.'); }
The download() Subroutine
Code:sub download { my $file = $_[0] or return(0); # Uncomment the next line only for debugging the script #open(my $DLFILE, '<', "$path_to_files/$file") or die "Can't open file '$path_to_files/$file' : $!"; # Comment the next line if you uncomment the above line open(my $DLFILE, '<', "$path_to_files/$file") or return(0); # this prints the download headers with the file size included # so you get a progress bar in the dialog box that displays during file downlaods. print $q->header(-type => 'application/x-download', -attachment => $file, 'Content-length' => -s "$path_to_files/$file", ); binmode $DLFILE; print while <$DLFILE>; undef ($DLFILE); return(1); }
Code:print $q->header(-type => 'application/x-download', -attachment => $file, 'Content-length' => -s "$path_to_files/$file", );
“-attachment” option defines the name of the file being downloaded. You could give the file any name you wanted to, it does not have to be the actual filename. That can be useful if you have a reason to hide the real name of the file or needed to give the downloaded file a name other than the real name. The “Content-length” option uses the –s file test operator to get the size of the file. This allows the file download dialog box to display the file size and a progress bar and estimate the time remaining to complete the file download.
The last four lines of the subroutine complete the download process.
Code:binmode $DLFILE; print while <$DLFILE>; undef ($DLFILE); return(1);
Subroutines
The “error” subroutine is very simple. It uses a few html generating methods to print a basic html document that displays the error messages we send to it. The error message is stored in $_[0]. Each of these methods are discussed in the CGI modules documentation. If you have error logging turned on the “log_error ” function is also called. Anytime the “error” subroutine is called it will display the html document and then terminate the script, which is what exit() does.
Code:sub error { print $q->header(-type=>'text/html'), $q->start_html(-title=>'Error'), $q->h3("Error: $_[0]"), $q->end_html; log_error($_[0]) if $log; exit(0); }
Code:sub log_error { my $error = $_[0]; #open (my $log, ">>", $error_log) or die "Can't open error log: $!"; open (my $log, ">>", $error_log) or return(0); flock $log,2; my $params = join(':::', map{"$_=$IN{$_}"} keys %IN) || 'no params'; print $log '"', join('","',time, scalar localtime(), $ENV{'REMOTE_ADDR'}, $ENV{'SERVER_NAME'}, $ENV{'HTTP_HOST'}, $ENV{'HTTP_REFERER'}, $ENV{'HTTP_USER_AGENT'}, $ENV{'SCRIPT_NAME'}, $ENV{'REQUEST_METHOD'}, $params, $error), "\"\n"; }
Basically the file will look like this:
Frog.jpg,12
Meatloaf.txt,10 000
Babypics.zip,12 34
The filename is one the left of the comma and the count is on the right. The “counter” subroutine should only be called is the download is successful so the counts should be accurate.
Code:sub counter { my $filename = $_[0] or return(0); #my $o = tie my @array, "Tie::File", $counter_log or die "Can't open counter log: $!"; my $o = tie my @array, "Tie::File", $counter_log or return(0); $o->flock; my $flag = 0; if ($array[0]) { foreach my $line (@array) { my ($name,$count) = split(/,/,$line); if ($filename eq $name) { $count++; $line = qq{$name,$count}; $flag = 1; last; } } if ($flag == 0) { push @array, qq{$filename,1}; } } else {$array[0] = qq{$filename,1};} undef $o; untie @array; }
resorces will be here
Complete script
Code:#!/usr/bin/perl -T # Copyright 2008 Kevin Ruggles. All rights reserved. # It may be used and modified freely, but I request that this copyright # notice remain attached to the file. ## Load pragmas and modules use strict; use warnings; use CGI; use Tie::File; # Uncomment the next line only for debugging the script. #use CGI::Carp qw/fatalsToBrowser/; # The next two lines are very important. Do not modify them # if you do not understand what they do. $CGI::POST_MAX = 1024; $CGI::DISABLE_UPLOADS = 1; #################################### #### User Configuration Section #### #################################### #/home/users/web/b706/ipw.beaspart/contacts/pages/error.html # The path to where the downloadable files are. # Prefereably this should be above the web root folder. #my $path_to_files = '/home/users/downloads/'; my $path_to_files = '/home/users/web/b706/ipw.beaspart/downloads/'; # The path to the error log file my $error_log = '/home/users/web/b706/ipw.beaspart/downloads/logs/errors.txt'; # The path to the counter file my $counter_log = '/home/users/web/b706/ipw.beaspart/downloads/logs/counter.txt'; # Option to log errors: 1 = yes, 0 = no my $log = 1; # Option to count downloads: 1 = yes, 0 = no my $counter = 1; # Checks if someone is trying to hot-link to your script my $url = 'http://www.beaspartyponies.com'; #################################### ## End User Configuration Section ## #################################### # Edit below here at your own risk my $q = CGI->new; ###################################### ## This section checks for a number ## ## of possible errors or suspicious ## ## activity. ## ###################################### # check to see if data limit is exceeded if (my $error = $q->cgi_error()){ if ($error =~ /^413\b/o) { error('Maximum data limit exceeded.'); } else { error('An unknown error has occured.'); } } # Check to see if the content-type is acceptable. # multipart/form-data indicates someone is trying # to upload data to the script with a hacked form. # $CGI_DISABLE_UPLOADS prevents uploads. This routine # is to catch the attempt and log it. if ($ENV{'CONTENT_TYPE'} =~ m|^multipart/form-data|io ) { error('Invalid Content-Type : multipart/form-data.') } # Check if the request came from your website, if not # it indicates remote access or hot linking. if ($ENV{'HTTP_REFERER'} && $ENV{'HTTP_REFERER'} !~ m|^\Q$url|io) { error('Access forbidden.') } ################################ ## End error checking section ## ################################ # Get the data sent to the script. my %IN = $q->Vars; # Parse the "file" paramater sent to the script. my $file = $IN{'file'} or error('No file selected.'); # Here we untaint the filename and make sure there are no characters like '/' # in the name that could be used to download files from any folder on the website. if ($file =~ /^(\w+[\w.-]+\.\w+)$/o) { $file = $1; } else { error('Invalid characters in filename.'); } # Check if the download succeeded if (download($file)) { #increments the files download count counter($file) if ($counter); } else { error('An unknown error has occured.'); } ################# ## SUBROUTINES ## ################# # download the file sub download { my $file = $_[0] or return(0); # Uncomment the next line only for debugging the script #open(my $DLFILE, '<', "$path_to_files/$file") or die "Can't open file '$path_to_files/$file' : $!"; # Comment the next line if you uncomment the above line open(my $DLFILE, '<', "$path_to_files/$file") or return(0); # This prints the download headers with the file size included # so you get a progress bar in the dialog box that displays during file downlaods. print $q->header(-type => 'application/x-download', -attachment => $file, 'Content-length' => -s "$path_to_files/$file", ); binmode $DLFILE; print while <$DLFILE>; undef ($DLFILE); return(1); } # This is a very generic error page. You should make a better one. sub error { print $q->header(-type=>'text/html'), $q->start_html(-title=>'Error'), $q->h3("Error: $_[0]"), $q->end_html; log_error($_[0]) if $log; exit(0); } # Log the error to a file sub log_error { my $error = $_[0]; # Uncomment the next line only for debugging the script #open (my $log, ">>", $error_log) or die "Can't open error log: $!"; # Comment the next line if you uncomment the above line open (my $log, ">>", $error_log) or return(0); flock $log,2; my $params = join(':::', map{"$_=$IN{$_}"} keys %IN) || 'no params'; print $log '"', join('","',time, scalar localtime(), $ENV{'REMOTE_ADDR'}, $ENV{'SERVER_NAME'}, $ENV{'HTTP_HOST'}, $ENV{'HTTP_REFERER'}, $ENV{'HTTP_USER_AGENT'}, $ENV{'SCRIPT_NAME'}, $ENV{'REQUEST_METHOD'}, $params, $error), "\"\n"; } # Incrememt the file download counter sub counter { my $filename = $_[0] or return(0); # Uncomment the next line only for debugging the script #my $o = tie my @array, "Tie::File", $counter_log or die "Can't open counter log: $!"; # Comment the next line if you uncomment the above line my $o = tie my @array, "Tie::File", $counter_log or return(0); $o->flock; my $flag = 0; if ($array[0]) { foreach my $line (@array) { my ($name,$count) = split(/,/,$line); if ($filename eq $name) { $count++; $line = qq{$name,$count}; $flag = 1; last; } } if ($flag == 0) { push @array, qq{$filename,1}; } } else {$array[0] = qq{$filename,1};} undef $o; untie @array; }
-
I felt that the article was well written. For me it had a nice flow to it and was informative.
In this section of code you are not closing the filehandle. Is that not necessary?
Code:binmode $DLFILE; print while <$DLFILE>; undef ($DLFILE); return(1);
Comment
-
Originally posted by eWishIn this section of code you are not closing the filehandle. Is that not necessary?
I had originally explained that but removed it because the article seemed to be getting too long and I still have some stuff to add.
I am also a little surprised you don't know that it does close the filehandle. When you "undef" an indirect filehandle, $DLFILE in this case, that closes the file. The file would also be closed automatically once the $DLFILE scalar went out of scope, which is the end of the subroutine block. It could also be written using close():
close $DLFILE;Comment
-
I did not know that by using undef that it would close the file handle. I did know that it would go out of scope and not be an issue because of that. Anytime I open a filehandle I use close() at the end even though it goes out of scope.
--KevinComment
-
Originally posted by eWishI did not know that by using undef that it would close the file handle. I did know that it would go out of scope and not be an issue because of that. Anytime I open a filehandle I use close() at the end even though it goes out of scope.
--Kevin
Another convenient behavior is that an indirect filehandle automatically closes when it goes out of scope or when you undefine it:
Code:sub firstline { open( my $in, shift ) && return scalar <$in>; # no close() required }
Comment
-
Shouldn't
Next I am going to see if someone has altered the form to try and upload a file to the script. “multi-part/form-data” must be used in a CGI forms “encypt” attribute in order to send files.
Next I am going to see if someone has altered the form to try and upload a file to the script. “multi-part/form-data” must be used in a CGI forms “enctype” attribute in order to send files.
Very well documented article, btw.Comment
-
Yes, it might be long still, I did cut it down from the originally planned article, but the user is free to skip to the code at the end of the article and then read any section of the article to understand a section of the code if need be. Does that make sense or seem plausible to you?
And thanks for editing the article to correct that error.Comment
-
I have to agree with Kevin. I usually getting a bit "upset" when people break an article up into multiple parts, especially when it has to do with coding. If I am reading it, then I typically have the time to finish it and prefer it all on one page. But, that's my preference. :)
Great article Kevin! Thanks a bunch for writing it.
Regards,
JeffComment
Comment