Need help with PHP DOMXML - get_elements_by_tagname

  • MegaZone

    Need help with PHP DOMXML - get_elements_by_tagname

    I'm having some issues with PHP DOMXML - in particular the
    get_elements_by_tagname method. Now, the PHP docs on this are, well,
    sparse, so maybe I'm just doing something stupid. I thought this
    method would behave like the 'findnodes' XML method in Perl. Namely
    that you can pass it an xpath statement and it will find nodes that
    match:
    $array = $node->get_elements_by_tagname($xpath);
    This is long so here's a pagebreak:

    And, indeed, this seems to have worked when I've used it in the past.
    But I'm working on a more complex system now which does a lot more
    sub-node access, etc, and it is failing on me. I put together this
    example to demonstrate the failure modes I'm experiencing.

    ----------------------------------------------------------------------
    <?php
    if (PHP_OS == "WIN32" || PHP_OS == "WINNT") {
    define('EOL', "\r\n");
    }
    else {
    define('EOL', "\n");
    }
    header("Content-Type: text/plain");
    print PHP_VERSION . EOL;

    $xml =
    '<TestDoc>
    <level1>
    <level2>
    <level3>
    <level4>
    <fubar>fubar</fubar>
    <fubaz>fubaz</fubaz>
    </level4>
    </level3>
    </level2>
    </level1>
    </TestDoc>';

    if (!$domResponse = domxml_open_mem($xml)) {
    print("failed!");
    }

    print pcGetField(&$domResponse, "fubar") . EOL;

    lookup(&$domResponse);

    function lookup($domResponseRef) {
    print $domResponseRef->dump_mem();
    $node =
    $domResponseRef->get_elements_by_tagname("level4");
    $node = $node[0];
    print_r($node);
    print pcGetField2(&$node, "//fubar") . EOL;
    print pcGetField(&$domResponseRef, "fubar") . EOL;
    print pcGetField(&$node, "fubar") . EOL;
    $node =
    $node->get_elements_by_tagname("fubar");
    print $node[0]->get_content() . EOL;
    $node = $node[0];
    print_r($node);
    }

    function pcGetField($nodeRef, $tag) {
    if(!isset($nodeRef)) {
    print("pcGetField: node is blank [" . $tag . "]");
    return "";
    }
    $node = $nodeRef->get_elements_by_tagname($tag);
    $node = $node[0];
    return $node->get_content();
    }

    function pcGetField2($nodeRef, $tag) {
    if(!isset($nodeRef)) {
    print("pcGetField: node is blank [" . $tag . "]");
    return "";
    }
    $xpath = xpath_new_context($nodeRef);
    $node = &xpath_eval($xpath, $tag);
    $node = $node->nodeset[0];

    return $node->get_content();
    }
    ?>
    ----------------------------------------------------------------------

    With it like this everything is fine:
    ----------------------------------------------------------------------
    4.2.2
    fubar
    <?xml version="1.0"?>
    <TestDoc>
    <level1>
    <level2>
    <level3>
    <level4>
    <fubar>fubar</fubar>
    <fubaz>fubaz</fubaz>
    </level4>
    </level3>
    </level2>
    </level1>
    </TestDoc>
    DomElement Object
    (
    [type] => 1
    [tagname] => level4
    [0] => 3
    [1] => 137395784
    )
    fubar
    fubar
    fubar
    fubar
    DomElement Object
    (
    [type] => 1
    [tagname] => fubar
    [0] => 2
    [1] => 138335176
    )
    ----------------------------------------------------------------------

    You see, I'm jumping down to 'level4' so fubar is an immediate child
    node. But if I start from *anywhere* except the root node or the
    immediate parent, it fails. So if I change:
    $node = $domResponseRef->get_elements_by_tagname("level4");
    to
    $node = $domResponseRef->get_elements_by_tagname("level1");
    I get:
    ----------------------------------------------------------------------
    4.2.2
    fubar
    <?xml version="1.0"?>
    <TestDoc>
    <level1>
    <level2>
    <level3>
    <level4>
    <fubar>fubar</fubar>
    <fubaz>fubaz</fubaz>
    </level4>
    </level3>
    </level2>
    </level1>
    </TestDoc>
    DomElement Object
    (
    [type] => 1
    [tagname] => level1
    [0] => 3
    [1] => 138064304
    )
    fubar
    fubar
    <br />
    <b>Fatal error</b>: Call to a member function on a non-object in
    <b>/home/megazone/scripts/PHP/Kiosk/test.php</b> on line <b>56</b><br />
    ----------------------------------------------------------------------

    Note that it works when checking from the docroot and later when
    using xpath_eval and from the reference to the doc root - but fails
    the first time you try it from the subnode. It will fail with
    TestDoc, level1, level2, and level3. It will fail if I try
    get_elements_by_tagname("fubar"), get_elements_by_tagname("//fubar"),
    etc. It also fails if you're at say level3 and try to look for
    "level4/fubar". I think I've tried every combination I can think of.

    So is this just a limitation that it only works when working with the
    root node of the document or when looking for an immediate child of
    the current node? What the heck am I not seeing?

    Thanks.

    -MZ, RHCE #806199299900541, ex-CISSP #3762
    --
    <URL:mailto:megazoneatmegazone.org> Gweep, Discordian, Author, Engineer, me.
    "A little nonsense now and then, is relished by the wisest men" 508-755-4098
    <URL:http://www.megazone.org/> <URL:http://www.eyrie-productions.com/> Eris

  • Terence

    #2
    Re: Need help with PHP DOMXML - get_elements_by_tagname

    I had a quick skim read of your post (as it was quite long) and it seems
    like you don't check to see if get_elements_by_tagname() is even
    returning at least one node.

    instead of
    $node = $node[0];

    you should have
    if(count($node)) {
    $node=$node[0];
    ...
    $node =
    $node->get_elements_by_tagname("fubar");
    print $node[0]->get_content() . EOL;
    }
    else {
    $node = null;
    print "There was no level4 element!\n";
    }

    and also in your function you are using isset() to test if it is a node.
    not gonna work.

    if(!isset($nodeRef))

    well of course it will exist. it will exist no matter what is in it
    because it is in the function call. The line should be

    if(!is_object($nodeRef))

    this would be an improvement. It would be even better if you tested it
    using the is_a() function inside the object test.



    DOM can be laborious. However, it is definitely the best way to create
    XML data. DOM is fairly low-level in terms of describing a document,
    which is why I've written a library which attempts to provide higher
    level functionality for app designers.


    There is a method to fetch one element based on its name


    There is a method for getting an array of nodes from an XPath query


    These are basically shortcuts to having to write low-level DOM, but they
    also do obligatory/mundane sanity checks that you are missing here.


    XAO is for object oriented programmers who would rather leverage code
    than have to re-invent the basics every time.

    XAO allows you to declare call-back functions based on element names
    and/or XPath queries. This provides a custom-tag facility in addition to
    DOM functionality.






    • MegaZone

      #3
      Re: Need help with PHP DOMXML - get_elements_by_tagname

      Terence <tk.lists@fastmail.fm> shaped the electrons to say:
      >I had a quick skim read of your post (as it was quite long) and it seems
      >like you don't check to see if get_elements_by_tagname() is even
      >returning at least one node.

      In the example I did, no, I didn't bother with any real error
      checking. I wrote it solely to illustrate the problem I'm having with
      a much larger codebase. I stripped most of that stuff to try to keep
      the sample from being monstrous. The overall codebase is a few
      thousand lines.

      >and also in your function you are using isset() to test if it is a node.
      >not gonna work.
      >
      >if(!isset($nodeRef))

      Actually that does work. If you do the lookup and get 0 results, then
      set the node to the 0th element of the array, it is null. And the
      isset check does catch passing in an unset node. I've had it happen
      in the production code. But I'll check is_object and is_a, sounds
      like they may do more appropriate checks.
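
A small sketch of the difference between the two checks, in plain PHP with no XML extension involved. The variables are made-up stand-ins: $empty plays the role of a lookup that matched nothing.

```php
<?php
// $empty stands in for a lookup that returned zero nodes (hypothetical data).
$empty = array();
$node  = isset($empty[0]) ? $empty[0] : null;

$nullIsSet    = isset($node);      // a null value counts as "not set"
$nullIsObject = is_object($node);  // and is not an object either

$text = 'not a node';
$textIsSet    = isset($text);      // any non-null value passes isset()
$textIsObject = is_object($text);  // but a string is rejected here

$obj = new stdClass();
$objIsObject  = is_object($obj);            // any object passes is_object()
$objIsElement = is_a($obj, 'DOMElement');   // is_a() also checks the class

var_dump($nullIsSet, $nullIsObject, $textIsSet, $textIsObject, $objIsObject, $objIsElement);
```

So isset() does catch the "0th element of an empty result" case, but it lets through anything that is merely non-null; is_object() and is_a() narrow that down further.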

      >DOM can be laborious. However, it is definitely the best way to create
      >XML data. DOM is fairly low-level in terms of describing a document,

      I know, I've worked with DOM for a while. This codebase already
      exists in other languages, the PHP version is more recent. One of the
      overriding requirements is keeping the code structures of the various
      implementations of the code similar. The Perl, ASP and CF
      implementations all use DOM. I wrote the Perl code before we added
      PHP as a supported platform. The Perl is using XML::LibXML for this,
      which uses libxml2 - same as DOMXML. CF5 and ASP use MSXML, CF MX
      uses the built in XML handler CF MX provides.

      It was working in our 2.5.5 revision - the 2.6.0 revision restructured
      the functions and introduced more pass-by-reference calls, and more
      work from subnodes instead of always working from the root node. It
      worked fine in all the other languages - but when the changes were
      made to PHP, it stopped working. And it stems from this function.

      Doing more digging since I posted (pretty much what I've been doing
      all day), it looks like I'm going to have to change anyway. While
      get_elements_by_tagname worked with xpath in some situations, the
      response to one bug report at php.net indicates this has been changed
      in newer versions of PHP and the method will *not* support xpath. The
      reply indicated that if you want to use xpath, you need to explicitly
      use the xpath methods. Unfortunate, since the other languages support
      xpath in their equivalent methods (ASP/VB's SelectSingleNode, Perl's
      findnodes, etc). But I think the simplest solution may be replacing
      all the occurrences of get_elements_by_tagname with the xpath
      functions. At least all the occurrences that break at this time.
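
The split between tag-name lookups and explicit XPath calls can be sketched like this. It is an illustration under a stated assumption: it uses the PHP 5+ DOM extension (DOMDocument, DOMXPath), not the PHP 4 DOMXML functions discussed in this thread, so it shows the general pattern rather than the exact calls the framework would need.

```php
<?php
// Sketch only: PHP 5+ DOM extension (an assumption), not PHP 4 DOMXML.
$doc = new DOMDocument();
$doc->loadXML(
    '<TestDoc><level1><level2><level3><level4>'
    . '<fubar>fubar</fubar><fubaz>fubaz</fubaz>'
    . '</level4></level3></level2></level1></TestDoc>'
);

$xpath  = new DOMXPath($doc);
$level1 = $doc->getElementsByTagName('level1')->item(0);

// Tag-name lookup: takes a bare element name and searches all descendants.
$byTag = $level1->getElementsByTagName('fubar');

// Path lookup: goes through the XPath evaluator with $level1 as the
// context node; the leading "." keeps the search inside that subtree.
$byPath = $xpath->query('.//level4/fubar', $level1);

// A path handed to the tag-name method is treated as a literal name,
// so it matches nothing.
$pathAsTag = $level1->getElementsByTagName('level4/fubar');

echo $byTag->item(0)->textContent, "\n";
echo $byPath->item(0)->textContent, "\n";
echo $pathAsTag->length, "\n";
```

Passing the subnode as the context node is what makes relative paths work from anywhere in the tree, which is the case that broke with the tag-name method.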

      Since the code is a sample framework that goes out to customers one of
      the general requirements is to try not to depend on being too current
      on the releases and trying to stick to libraries that are as common as
      possible. XML handling is mandatory since the framework communicates
      with a kiosk system that speaks XML only. Since the requirement for
      XML was there, all of the configs and such are also stored in XML
      since it is a nice format.

      I'm looking forward to PHP5 since the XML support is a fundamental
      feature. I only got back into PHP a few months ago (I had used it in
      PHP3 days, and played with it in the PHP/FI days) when we added PHP to
      our list of supported platforms. The initial port of Perl 2.5.5 to
      PHP 2.5.5 actually went extremely smoothly. I was kind of surprised
      to run into this trouble with 2.6.0 after that experience. There are
      two main contexts for the framework - one is working, the other is not.

      -MZ, RHCE #806199299900541, ex-CISSP #3762
      --
      <URL:mailto:megazoneatmegazone.org> Gweep, Discordian, Author, Engineer, me.
      "A little nonsense now and then, is relished by the wisest men" 508-755-4098
      <URL:http://www.megazone.org/> <URL:http://www.eyrie-productions.com/> Eris


      • Terence

        #4
        Re: Need help with PHP DOMXML - get_elements_by_tagname

        I assumed you were asking why you got the exception about trying to use
        a method on a non-object.

        The fact that xpath doesn't work with get_elements_by_tagname() is
        testament to the fact that a standards-based approach necessitates
        implementing only features of the lowest common denominator. If you are
        writing a cross-platform (i.e. multi-language) framework, you will always
        have these limitations. While PHP5 will be using libxml2, I don't know
        if this means it will support XPath in get_elements_by_tagname(). The
        behaviour is non-standard.


        • MegaZone

          #5
          Re: Need help with PHP DOMXML - get_elements_by_tagname

          Terence <tk.lists@fastmail.fm> shaped the electrons to say:
          >I assumed you were asking why you got the exception about trying to use
          >a method on a non-object.

          That was one symptom - why does get_elements_by_tagname() work if you
          start from the docroot or the immediate parent, but nowhere in
          between? Perhaps this is fixed in a later version of PHP - from the
          docs it sounds like it *should* recurse the structure no matter where
          you start from.

          >The fact that xpath doesn't work with get_elements_by_tagname() is

          It does appear to work some of the time, and one of the comments left
          in the online PHP documentation illustrates using XPath:


          That, and other examples I found when looking for sample code, seemed
          to indicate XPath was valid. It seemed to make sense as that also fit
          with the behavior of other languages.

          But this evening I found this:


          From one of the comments:
          ---
          chregu@php.net

          you are using xpath-expressions and not simple element-names. This may
          had worked with older php versions, but the internal code was changed
          later.

          If you want to use something like "timeopen/year" then use the
          appropriate xpath methods (see manual..)
          ---

          So it sounds like the right thing to do is switch to straight tagnames
          where possible, and use the xpath functions where it isn't.

          >implementing only features of the lowest common denominator. If you are
          >writing a cross-platform (i.e. multi-language) framework, you will always

          It is part of a payment system - the framework exists in different
          languages because it is the piece merchants can integrate with their
          backend. So we provide it in whatever language the merchant is using
          for their site. Before I joined the company a year ago it was
          basically a Windows shop so it was ASP and CF, I did the Perl and PHP
          implementations. JSP is on the roadmap. But I readily admit I'm
          still learning PHP - I tend to implement any new features in Perl
          first since that's my primary language, and then port it to PHP. For
          the most part that's worked rather well, the language structures are
          close enough that a lot of the 'porting' can be done with emacs
          regexps. Looks like this is one of the 'gotchas' though - it looked
          like 'get_elements_by_tagname' was a drop in replacement for
          'findnodes', but that no longer seems to be the case.

          >have these limitations. While PHP5 will be using libxml2, I don't know
          >if this means it will support XPath in get_elements_by_tagname(). The
          >behaviour is non-standard.

          Based on what I found tonight, it sounds like the move is away from
          allowing XPath in the PHP functions - except for the xpath_eval, etc,
          family.

          I've probably made my share of newbie errors in the PHP port - I know
          I used 'isset' in a number of places and some of them I caught and
          made 'is_object', but some are probably still off. Like most things,
          it was done with the "we need it yesterday" mandate, so I didn't have
          a lot of time to refresh my PHP knowledge. I picked up O'Reilly's
          Programming PHP, the PHP.net documentation, and hit the ground
          running. ;-)

          -MZ, RHCE #806199299900541, ex-CISSP #3762
          --
          <URL:mailto:megazoneatmegazone.org> Gweep, Discordian, Author, Engineer, me.
          "A little nonsense now and then, is relished by the wisest men" 508-755-4098
          <URL:http://www.megazone.org/> <URL:http://www.eyrie-productions.com/> Eris
