Parsing Large XML files

  • doug

    Parsing Large XML files

    How can I parse an XML file that is too large to fit in memory? I am
    currently using PHP 5.0.3 and the libxml parser. I would like to read
    it incrementally from a file, but the parser seems to want the entire
    contents as a string?

  • Derek Fountain

    #2
    Re: Parsing Large XML files

    doug wrote:
    > How can I parse an XML file that is too large to fit in memory? I am
    > currently using PHP 5.0.3 and the libxml parser. I would like to read
    > it incrementally from a file, but the parser seems to want the entire
    > contents as a string?

    The standard XML solution to this problem is to use a SAX parser instead of
    a DOM one. However, there doesn't seem to be a SAX parser in the PHP XML
    library. One solution appears to be:


    &id=3628&Itemid=10159

    Google might help find others. Or maybe use an external SAX based tool to
    boil the XML down to something a bit smaller that you can manipulate from
    PHP?

    --
    The email address used to post is a spam pit. Contact me at
    http://www.derekfountain.org


    • Chung Leong

      #3
      Re: Parsing Large XML files

      "doug" <ddalton@shortbus.net> wrote in message
      news:1109671938.655665.192150@g14g2000cwa.googlegroups.com...
      > How can I parse an XML file that is too large to fit in memory? I am
      > currently using PHP 5.0.3 and the libxml parser. I would like to read
      > it incrementally from a file, but the parser seems to want the entire
      > contents as a string?

      Use the expat functions instead.
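
      As a rough sketch of what that looks like: the expat-based xml_*
      functions accept the document in pieces, so it never has to exist as
      one string. The sample document, the 7-byte chunk size and the
      variable names below are made up for illustration, and the closures
      need PHP 5.3 or later (older versions would pass function names
      instead).

```php
<?php
// Hypothetical sample document, held in a string only to keep the sketch
// self-contained; a real script would fread() these pieces from a file.
$xml = '<catalog><book id="1">PHP</book><book id="2">XML</book></catalog>';

$names = array();   // element names seen, in document order

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    // Start-tag handler: record the element name.
    function ($p, $name, $attrs) use (&$names) {
        $names[] = $name;   // names arrive upper-cased (case folding is on by default)
    },
    // End-tag handler: nothing to do in this sketch.
    function ($p, $name) {
    }
);

// Feed the document in 7-byte pieces, so tags get cut in the middle.
$chunks = str_split($xml, 7);
$last = count($chunks) - 1;
foreach ($chunks as $i => $chunk) {
    // The third argument must be true only for the final piece.
    if (!xml_parse($parser, $chunk, $i === $last)) {
        die(sprintf("XML error: %s at line %d\n",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)));
    }
}

echo implode(' ', $names), "\n";   // prints "CATALOG BOOK BOOK"
```

      The parser keeps whatever partial tag it was handed until the next
      call completes it, so the chunk boundaries do not need to line up
      with lines or elements.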





      • doug

        #4
        Re: Parsing Large XML files

        How do I use this SAX parser to read XML directly from a file?

        Can you show me an instantiation example?


        • doug

          #5
          Re: Parsing Large XML files

          So in the example below, every line is parsed individually with no
          contents stored in memory but the current line?


          <?php
          $file = "data.xml";
          $depth = 0; // current nesting level

          // Called at each opening tag: print the name, indented by depth.
          function startElement($parser, $name, $attrs)
          {
              global $depth;
              for ($i = 0; $i < $depth; $i++) {
                  echo "  ";
              }
              echo "$name\n";
              $depth++;
          }

          // Called at each closing tag.
          function endElement($parser, $name)
          {
              global $depth;
              $depth--;
          }

          $xml_parser = xml_parser_create();
          xml_set_element_handler($xml_parser, "startElement", "endElement");
          if (!($fp = fopen($file, "r"))) {
              die("could not open XML input");
          }

          // Feed the parser 4096 bytes at a time; the third argument tells it
          // when the final chunk has been passed.
          while (!feof($fp)) {
              $data = fread($fp, 4096);
              if (!xml_parse($xml_parser, $data, feof($fp))) {
                  die(sprintf("XML error: %s at line %d",
                      xml_error_string(xml_get_error_code($xml_parser)),
                      xml_get_current_line_number($xml_parser)));
              }
          }
          fclose($fp);
          xml_parser_free($xml_parser);
          ?>


          • Derek Fountain

            #6
            Re: Parsing Large XML files

            doug wrote:
            > How do I use this SAX parser to read XML directly from a file?

            Erm, read the docs that come with it! Sounds like you need to read up on SAX
            first.

            > Can you show me an instantiation example?

            No, I've never used it. You're actually going to have to do some work here
            yourself!

            If it were me, and the application allowed it, I'd be writing a separate
            utility to do the XML grunt work. PHP, for all its strengths, isn't ideal
            for heavy-duty XML parsing.



            • Chung Leong

              #7
              Re: Parsing Large XML files

              "doug" <ddalton@shortbus.net> wrote in message
              news:1109724037.385785.217450@l41g2000cwc.googlegroups.com...
              > So in the example below, every line is parsed individually with no
              > contents stored in memory but the current line?

              The expat parser reuses the same buffers in calls to the handlers. If PHP
              garbage collection works correctly, then the XML data should not take up
              memory if your handlers don't save it.
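
              One way to check this on your own setup is a sketch along
              these lines. It writes a throwaway file, parses it in
              4096-byte chunks with handlers that keep nothing but a
              counter, and prints the peak memory next to the file size.
              The file contents, the element names and the count of 50000
              are arbitrary choices for the test, and the closures assume
              PHP 5.3 or later.

```php
<?php
// Build a throwaway input file so the sketch runs on its own.
// The element names and the count (50000) are arbitrary choices.
$path = tempnam(sys_get_temp_dir(), 'xml');
$out = fopen($path, 'w');
fwrite($out, "<items>\n");
for ($i = 0; $i < 50000; $i++) {
    fwrite($out, "<item id=\"$i\">value $i</item>\n");
}
fwrite($out, "</items>\n");
fclose($out);

$count = 0;   // the only state the handlers keep

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    // Start-tag handler: count <item> elements, store nothing else.
    function ($p, $name, $attrs) use (&$count) {
        if ($name === 'ITEM') {   // names are upper-cased by default
            $count++;
        }
    },
    // End-tag handler: nothing is stored when an element closes.
    function ($p, $name) {
    }
);

// Stream the file through the parser one 4096-byte chunk at a time.
$fp = fopen($path, 'r');
while (!feof($fp)) {
    $data = fread($fp, 4096);
    if (!xml_parse($parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d\n",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)));
    }
}
fclose($fp);
$fileSize = filesize($path);
unlink($path);

printf("items: %d, file: %d bytes, peak memory: %d bytes\n",
    $count, $fileSize, memory_get_peak_usage());
```

              If the handlers appended each element to an array instead,
              memory use would grow with the file, which is the case to
              avoid for large inputs.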

