Extract php code from a php file using RegEx

  • rizwan6feb
    New Member
    • Jul 2007
    • 108

    Extract php code from a php file using RegEx

    I am trying to extract the PHP code from a PHP file (the file also contains HTML, CSS and JavaScript). I am using the following regex for this:
    Code:
    <\?[\w\W]*?\?>
    but this doesn't handle quotation marks (single and double quotes) or comments. How can I skip PHP tags that appear inside a string or a comment? Please have a look at the following code:
    Code:
    <?php
    	include("db.php");
        $name=$_REQUEST['name'];
        /* the regular expression to extract code inside php tags (i.e  <? and ?>) is  */
        $str='|<\?[\w\W]*?\?>|';
        # The regex can also be written using double quotes
        $str="|<\?[\w\W]*?\?>|";
    ?>
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    
    <title>Sample PHP File</title>
    </head>
    
    <body>
    <?="<h1>Some output from PHP?>
    </body>
    </html>
    I need a regular expression that extracts the two blocks of PHP code from the sample above.
  • Dormilich
    Recognized Expert Expert
    • Aug 2008
    • 8694

    #2
    Originally posted by rizwan6feb
    Code:
    <\?[\w\W]*?\?>
    but this doesn't cater quotation marks (single and double quotes) and comments,
    Well, when I try it, the regex matches everything. Your problem is that it is extremely difficult to determine whether a "?>" sits inside a comment, inside a string, or really ends a processing instruction (i.e. you more or less need to parse the file).

    Maybe it's easier to do the reverse and discard what's between ?> and <? (though this may also fail in special circumstances).
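
    The "you need to parse it" idea can be sketched with PHP's built-in tokenizer instead of a regex. This is only a minimal sketch: `extract_php_blocks` is a hypothetical helper name, it assumes the tokenizer extension is available (it is enabled by default), and bare `<?` tags are only recognised when `short_open_tag` is on.

```php
<?php
// Hypothetical helper: returns every PHP block in $source, from its
// opening tag through its closing tag, using PHP's own tokenizer.
function extract_php_blocks(string $source): array
{
    $blocks = [];
    $current = null; // null while we are outside PHP code

    foreach (token_get_all($source) as $token) {
        // Tokens are either [id, text, line] arrays or one-character strings.
        [$id, $text] = is_array($token) ? $token : [null, $token];

        if ($id === T_INLINE_HTML) {
            continue; // HTML, CSS and JavaScript outside the PHP tags
        }
        if ($id === T_OPEN_TAG || $id === T_OPEN_TAG_WITH_ECHO) {
            $current = ''; // a new block starts here
        }
        if ($current !== null) {
            $current .= $text;
        }
        if ($id === T_CLOSE_TAG) {
            $blocks[] = $current;
            $current = null;
        }
    }

    if ($current !== null) {
        $blocks[] = $current; // the file ended without a closing tag
    }

    return $blocks;
}
```

    Because the tokenizer already knows what is a string and what is a comment, a "?>" inside either one does not end the block. Note that the closing-tag token can include the newline that directly follows "?>", so trim the result if that matters.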


    • rizwan6feb
      New Member
      • Jul 2007
      • 108

      #3
      I found a solution at http://regexadvice.com/forums/thread/53756.aspx

      The regex below is what I needed:
      <\?(\x22[^\x22]*?\x22|\x27[^\x27]*?\x27|/\*.*?\*/|.)*?\?>


      • Dormilich
        Recognized Expert Expert
        • Aug 2008
        • 8694

        #4
        Are you sure? The regex fails for me (that is, it doesn't get the whole first block, only chunks).

        Translated into characters, it reads: anything that is a string, a comment, or any single character.
        Code:
        <\?("[^"]*?"|'[^\']*?'|/\*.*?\*/|.)*?\?>
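
        One likely reason for getting only chunks: in PCRE, "." does not match a newline unless the `s` (DOTALL) modifier is set, so a block that spans several lines cannot match as a whole. A minimal sketch of the same pattern with that modifier follows; `extract_with_regex` is a hypothetical helper name, and the pattern still ignores escaped quotes and `//` or `#` comments, so it stays a heuristic.

```php
<?php
// Hypothetical helper: applies the thread's pattern with the "s" modifier
// so that "." also matches newlines and multi-line blocks match whole.
function extract_with_regex(string $source): array
{
    // \x22 is a double quote and \x27 a single quote, as in post #3.
    // Alternatives, in order: double-quoted string, single-quoted string,
    // /* ... */ comment, or any single character.
    $pattern = '~<\?(?:\x22[^\x22]*\x22|\x27[^\x27]*\x27|/\*.*?\*/|.)*?\?>~s';

    preg_match_all($pattern, $source, $matches);

    return $matches[0]; // the full text of every match
}
```

        On very large files a pattern like this can run into PCRE's backtracking limit, which is one more reason to treat it as a quick approximation.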
