Tokenizer Difficulties

  • Max B-K

    Tokenizer Difficulties

    I've delved into the usage of the PHP Tokenizer that directly
    interfaces with the Zend engine.

    So far, I have found it incredibly useful when it comes to editing a
    PHP file.

    What I am trying to do is make a PHP5 class compatible with PHP4 by
    running it through a class I made.

    To do this, I decided that I first needed the ability to remove
    __construct() and the visibility declarations. So I have a
    function that runs a switch statement and removes all visibility
    declarations.

    Now here is my problem:
    Visibility keywords appear in 2 places: 1) on variables, 2) on functions.

    If I replace every visibility declaration with "var", it will
    obviously not work for functions. But the visibility keyword comes
    BEFORE the "function" keyword, so when I reach it I do not yet know
    what it belongs to, and I do not know how to remove only the
    visibility RIGHT before a function without affecting ANYTHING else.

    Does anyone know how I can substitute "var" for variable
    visibility and just remove visibility for functions? Many thanks in
    advance. I can dig further into this and post more details if you
    need them.

    Here is my function:

    function TokenizedRetrograde($file_name, $visibility_pointer = '{VIS}')
    {
        $source = file_get_contents($file_name);
        $tokens = token_get_all($source);
        $data = ''; //Rebuilt source, initialised before the first ".=".
        $class_name = '';
        $class_declared = false;
        $function_declared = false;
        $x = 0;
        foreach ($tokens as $token)
        {
            if (is_string($token))
            {
                // simple 1-character token
                $data .= $token;
            }
            else
            {
                // token array--$text stores the data from a specific token.
                list($id, $text) = $token;
                switch ($id) {
                    case T_PROTECTED:
                    case T_PUBLIC:
                    case T_PRIVATE:
                        //Replace private, public, protected keyword (currently always with 'var').
                        $x++;
                        $text = 'var';
                        $visibility_set = true;
                        break;

                    case T_CLASS:
                        //T_CLASS occurs when a class is declared.
                        $class_declared = true;
                        break;

                    case T_VARIABLE:
                        break;

                    case T_OBJECT_OPERATOR:
                        $in_object_reference = true;
                        break;

                    case T_STRING:
                        //If a class was just declared, the string is the class name.
                        if ($class_declared === true)
                        {
                            $class_declared = false;
                            $class_name = $text;
                        }

                        //If __construct is referenced to within the files code, replace
                        //it with the name of the class previously gotten from class
                        //declaration.
                        if ('__construct' == $text)
                        {
                            $text = $class_name;
                        }
                        break;

                    case T_FUNCTION:
                        $function_declared = true;
                        break;

                    case T_WHITESPACE:
                        break;

                    default:
                        break;
                }
                $data .= $text; //Add text previously set and possibly modified.
            }
        }
        return $data; //Hand the rebuilt source back to the caller.
    }
  • Ira Baxter

    #2
    Re: Tokenizer Difficulties


    "Max B-K" <sephiriz@gmail.com> wrote in message
    news:1e2afd5.0408101851.2ec94885@posting.google.com...
    > I've delved into the usage of the PHP Tokenizer that directly
    > interfaces with the Zend engine.
    >
    > So far, I have found it incredibly useful when it comes to editing a
    > PHP file.
    >
    > What I am trying to do is to make a PHP5 class compatible in PHP4 by
    > running it through a class I made.
    >
    > To do this, I decided that first I had to have the ability to remove
    > the __construct(), and the visibility declarations! So I have a
    > function that runs a switch statement, and it removes all visibility
    > declarations.
    >
    > Now here is my problem:
    > There are 2 visibility types, 1) for variables, 2) for functions.
    >
    > If I replace every visibility declaration with "var", it will
    > obviously not work for functions, <<... general troubles with this attack>>

    Most of your problem comes from trying to "edit" a stream of tokens,
    with no memory of the context in which an individual token is found.
    Without the context, you simply can't do the job right.
    Sure, you can build ad hoc machinery to try to remember it,
    but such ad hocness generally turns into a baroque pile of code.
    The general way to collect such context for structured texts is called
    "parsing" (of which tokenizing is just the first step).
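    To make the "ad hoc machinery" concrete for this one case: the token
    loop can look ahead from each visibility keyword to see whether
    "function" or a variable comes first. What follows is only a sketch
    under assumptions, not a full converter. The name stripVisibility() is
    hypothetical, the code assumes the converter itself runs on PHP 7 or
    later, and static properties (which would come out as "var static")
    are not handled.

```php
<?php
// Sketch only: decide what a visibility keyword modifies by scanning ahead.
// stripVisibility() is a hypothetical name, not part of the original post.
function stripVisibility(string $source): string
{
    $tokens = token_get_all($source);
    $count  = count($tokens);
    $out    = '';

    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];

        if (is_string($token)) {
            $out .= $token; // single-character token such as ';' or '{'
            continue;
        }

        list($id, $text) = $token;

        if ($id === T_PUBLIC || $id === T_PROTECTED || $id === T_PRIVATE) {
            // Look ahead: which comes first, "function" or a variable?
            $isMethod = false;
            for ($j = $i + 1; $j < $count; $j++) {
                if (!is_array($tokens[$j])) {
                    break; // punctuation reached, stop looking
                }
                if ($tokens[$j][0] === T_FUNCTION) {
                    $isMethod = true;
                    break;
                }
                if ($tokens[$j][0] === T_VARIABLE) {
                    break;
                }
            }

            if ($isMethod) {
                // Drop the keyword and the whitespace that follows it.
                if ($i + 1 < $count
                    && is_array($tokens[$i + 1])
                    && $tokens[$i + 1][0] === T_WHITESPACE) {
                    $i++;
                }
                continue;
            }

            $text = 'var'; // property declaration: PHP4 spelling
        }

        $out .= $text;
    }

    return $out;
}

// Usage: convert a small PHP5 class held in a string.
$php5 = "<?php\nclass Foo {\n    public \$x;\n    private function bar() {}\n}\n";
echo stripVisibility($php5);
```

    Here "public $x;" becomes "var $x;" and "private function bar()"
    becomes "function bar()". It works, but every new construct needs
    another special case in the look-ahead, which is exactly how the
    baroque pile starts.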

    If you can parse the PHP, and build conventional compiler data structures
    for this, then you could consider walking over the trees and using
    the "parent context" (your visibility declarations are either in variable
    declaration or function declaration context) to make this change
    safely and reliably.
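    As a rough illustration of what "parent context" buys you, here is a
    toy tree walk. The node shape (arrays with 'kind', 'text', 'children')
    and the name rewriteVisibility() are made up for this sketch; the
    trees are built by hand, and a real parser would produce something
    much richer.

```php
<?php
// Hypothetical node shape: ['kind' => string, 'text' => string, 'children' => array].
// rewriteVisibility() is an illustrative name; no real parser is involved.
function rewriteVisibility(array $node, string $parentKind = ''): string
{
    if ($node['kind'] === 'visibility') {
        // The parent node says what the keyword modifies.
        return $parentKind === 'property' ? 'var ' : '';
    }

    // Emit this node's own text (if any), then its children in order.
    $out = $node['text'] ?? '';
    foreach ($node['children'] ?? [] as $child) {
        $out .= rewriteVisibility($child, $node['kind']);
    }

    return $out;
}

// Hand-built trees standing in for parser output.
$property = ['kind' => 'property', 'children' => [
    ['kind' => 'visibility', 'text' => 'public '],
    ['kind' => 'name', 'text' => '$x;'],
]];
$method = ['kind' => 'method', 'children' => [
    ['kind' => 'visibility', 'text' => 'private '],
    ['kind' => 'header', 'text' => 'function bar() {}'],
]];

echo rewriteVisibility($property), "\n";
echo rewriteVisibility($method), "\n";
```

    The visibility node never has to guess: the recursion hands it the
    kind of its parent, so the property case and the method case cannot
    be confused.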

    Even cooler, you could write patterns that express both the change
    you want to make *and* the context in which it applies, e.g.,

    "public \x;" -> "var \x;".
    "private function \fnheader \fnbody" -> "function \fnheader \fnbody".

    A tool that can do this exists, and it can already parse PHP4 and PHP5.
    http://www.semdesigns.com/Products/D...formation.html.
    It would be ideal for implementing the specific task you described,
    and the broader task you imply of converting PHP5 code back
    into executable PHP4.

    I'm not sure exactly why you want to do that, considering you already
    have PHP5 :-}

    --
    Ira D. Baxter, Ph.D., CTO 512-250-1018
    Semantic Designs, Inc. www.semdesigns.com

