Need a regex to check form submission url format

  • dancerman
    New Member
    • Nov 2008
    • 2

    Need a regex to check form submission url format

    I want the simplest possible regex to check the format of my form submission URL string. I don't care whether it is an actual working URL, just that it is in proper URL format and, if possible, that valid URL characters are the only characters entered in the field.
    That way legitimate users get an error message prompting them to enter their link in the proper format.
    On the other hand, it should not allow spammers to enter tons of text in the URL field.
    I have used the following regex to check that characters are supplied in addition to the http:// value in the form field. It works with simple URLs, but it does not work with URLs such as
    http://somesubdomain-somesitename.org/somepagename.htm
    if ($FORM{'url'} eq 'http://' || $FORM{'url'} !~ /^(f|ht)tp:\/\/\w+\.\w+/) {
    &no_url;
    }
  • KevinADC
    Recognized Expert Specialist
    • Jan 2007
    • 4092

    #2
    There are a number of URI-checking modules on CPAN, but if you don't want to go the module route, you can use this regexp taken from the URI module's documentation:

    Code:
    my $uri = 'http://www.mysite.com:8080/path/index.html?test=foo&foo=bar#internal-link';
    
    my($scheme, $authority, $path, $query, $fragment) =
      $uri =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;
    
    print qq{scheme = $scheme
    authority = $authority
    path = $path
    query = $query
    fragment = $fragment};
    Then you apply tests to each part of the URI individually.
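
    As a rough illustration of testing each part individually, here is a minimal sketch. The helper name url_format_ok, the 255-character cap, the list of accepted schemes, and the allowed character sets are all assumptions made for this example, not something taken from the URI documentation; tighten or loosen them to suit your form.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $uri looks like a well-formed
# http/https/ftp URL, 0 otherwise. The length cap and the character
# sets below are illustrative assumptions.
sub url_format_ok {
    my ($uri) = @_;
    return 0 unless defined $uri && length $uri && length($uri) <= 255;

    # Whitespace or control characters mean free text, not a URL.
    return 0 if $uri =~ /[\s[:cntrl:]]/;

    # Split the string into its five parts with the regexp shown above.
    my ($scheme, $authority, $path, $query, $fragment) =
      $uri =~ m|(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|;

    # Scheme: only the ones the form is meant to accept.
    return 0 unless defined $scheme && $scheme =~ /\A(?:https?|ftp)\z/i;

    # Authority: dot-separated labels that may contain hyphens, optional port.
    return 0 unless defined $authority
      && $authority =~ /\A[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+(?::[0-9]+)?\z/;

    # Path, query and fragment: a conservative set of URL characters.
    return 0 unless $path =~ m{\A[A-Za-z0-9_\-./~%+]*\z};
    return 0 if defined $query    && $query    !~ m{\A[A-Za-z0-9_\-./~%+&=;]*\z};
    return 0 if defined $fragment && $fragment !~ m{\A[A-Za-z0-9_\-./~%+]*\z};

    return 1;
}
```

    In the script from the first post this could be called as: &no_url unless url_format_ok($FORM{'url'}); A bare 'http://' fails because its authority part is empty, and hyphenated host names pass because the host test allows '-' inside each label.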


    • dancerman
      New Member
      • Nov 2008
      • 2

      #3
      Thanks for the response.
      This reply was delayed as I had to go out of town.
      My thought is that what I really need to do is check for basic URL format in the form submissions.
      Since any validly formatted URL has to be accepted, spam has to be handled by manual editing or other spam checks in the script; the key here is just checking for basic URL formatting.

      I came up with if string != (http://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?) {
      & check url;
      }
      which works for simple URLs, but I'd like to add code for more complex URLs such as http://subdomain.somedomain.com/somefilename.html
      or
      http://xyzabd-somedomain.com/somefilename.html
      It seems simple (to me), but I am not able to get the script to process it: it hangs on the URL field data.
      Thanks,
      Mike
