Perl regexp: matches this AND not that?

  • SaltyFoam
    New Member
    • Mar 2007
    • 1

    I need a Perl match expression that succeeds if the passed string matches a particular format, excluding some combinations. For example, I need a match expression for filenames that don't start with an underscore and aren't "index.html", so that nobody can upload those files to my site. I tried:

    ^[^_](ndex\.html){0}.*

    which, it seems to me, excludes anything starting with an underscore and anything with ?ndex.html matching 0 times (i.e. NOT matching?), but this seems to allow "index.html".

    Once I get the above working, I need to extend it to disallow files starting with "Icon" and "Thumb" for similar reasons.

    Dominique
  • miller
    Recognized Expert Top Contributor
    • Oct 2006
    • 1086

    #2
    Hello Dominique,

    Before trying to create one complicated regex for a problem like this, try creating a bunch of simple ones. I doubt efficiency matters enough here to justify a single regex, given the risk of leaving loopholes if you're unfamiliar with regexes.

    One thing that you can do is keep a list of precompiled regexes so you don't have to write multiple if statements. This is nice because it lets you add new limitations easily without adding more code. It would look something like this:

    Code:
    my @filenameLimitations = (
    	qr{^_},
    	qr{^index\.html$},
    	qr{^Icon},
    	qr{^Thumb},
    );
    
    if (grep {$filename =~ /$_/} @filenameLimitations) {
    	print "Error: filename not allowed";
    }
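
    A minimal sketch of how that list could be wrapped in a reusable check. The helper name filename_allowed and the sample filenames are assumptions made for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Disallowed patterns, copied from the list above.
my @filenameLimitations = (
    qr{^_},
    qr{^index\.html$},
    qr{^Icon},
    qr{^Thumb},
);

# Hypothetical helper: returns true when no disallowed pattern matches.
sub filename_allowed {
    my ($filename) = @_;
    # grep in scalar context counts the matching patterns; ! inverts that count.
    return !grep { $filename =~ $_ } @filenameLimitations;
}

# Made-up sample filenames to exercise the check.
for my $name ('photo.jpg', '_private.txt', 'index.html', 'index.html.bak', 'Icon.png') {
    printf "%-16s %s\n", $name, filename_allowed($name) ? 'allowed' : 'rejected';
}
```

    With this shape, adding a new restriction is a one-line change to the list, and the check itself never needs to be touched.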

    Nevertheless, what you've described so far would be fairly easy to handle with a single regex. It would look like this:

    Code:
    if ($filename =~ /^(?:_|index\.html$|Icon|Thumb)/) {
    	print "Error: filename not allowed";
    }
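
    If you really want a pattern that matches only the allowed names, as in the original question, one option is a negative lookahead. The sketch below first shows why {0} does not negate anything, then an inverted form of the regex above. The variable names and sample filenames are assumptions made for illustration.

```perl
use strict;
use warnings;

# {0} means "repeat zero times", which always succeeds by matching nothing,
# so this pattern only checks that the first character is not an underscore.
my $original = qr/^[^_](ndex\.html){0}.*/;

# (?!...) is a negative lookahead: it succeeds only when the enclosed
# alternatives do NOT match at the current position (here, the start).
# The trailing dot requires at least one character in the filename.
my $allowed = qr/^(?!_|index\.html$|Icon|Thumb)./;

print "original pattern accepts index.html\n" if 'index.html' =~ $original;

# Made-up sample filenames to exercise the lookahead pattern.
for my $name ('photo.jpg', 'index.html', 'index.html.bak', '_hidden', 'Thumbs.db', 'myIcon.png') {
    printf "%-16s %s\n", $name, $name =~ $allowed ? 'allowed' : 'rejected';
}
```

    The lookahead consumes no characters, so the rest of the pattern can still describe whatever format the filename has to follow.
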

    To read more about regexes, start with perldoc.

    - Miller

    • KevinADC
      Recognized Expert Specialist
      • Jan 2007
      • 4092

      #3
      There is no reason at all to try and do this with one regexp:

      Code:
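      # Both tests match against $_, so the filename is assumed to be in $_.
      if (/^_/) {
         print "Error: filename starts with an underscore\n";
      }
      # The trailing l is optional and /i ignores case, so INDEX.HTM is caught too.
      if (/^index\.html?$/i) {
         print "Error: filename is index.htm or index.html\n";
      }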
