read and parsing an XML file using the content

  • Bob Bedford

    read and parsing an XML file using the content

    Hi all,

    I have to read and parse an XML file and save the data in a database.

    Unfortunately, the data sometimes comes out wrong. It seems it is not
    read correctly: sometimes only part of it is OK.

    Is there any way to read an XML file sequentially based on its content,
    instead of using $data = fread($fp, 4096);, which may be the source of
    the problem?

    I get wrong data when the files are big. When they are small I don't seem
    to have problems.

    I also have a memory limit and an execution time limit.

    Thanks for helping.

    Bob


  • Bob Bedford

    #2
    Re: read and parsing an XML file using the content

    >If you're parsing chunks of data at a time, it's possible your block of
    >4096 bytes is breaking in the middle of a tag. In fact, much more than
    >possible - it'd be highly unlikely it would end right at a 4096 byte
    >boundary.

    So how do I fix it? I can't read the entire file because of my server's
    memory limit (and I can't change servers).



    • Jerry Stuckle

      #3
      Re: read and parsing an XML file using the content

      Bob Bedford wrote:
      >>If you're parsing chunks of data at a time, it's possible your block of
      >>4096 bytes is breaking in the middle of a tag. In fact, much more than
      >>possible - it'd be highly unlikely it would end right at a 4096 byte
      >>boundary.
      >
      >So how do I fix it? I can't read the entire file because of my server's
      >memory limit (and I can't change servers).

      How big is your xml file?

      If you can't read the entire file in, and can't change hosts, you'll
      have to parse it manually. That will be a lot of work.

      Why can't you change servers?

      --
      ==================
      Remove the "x" from my email address
      Jerry Stuckle
      JDS Computer Training Corp.
      jstucklex@attglobal.net
      ==================


      • Bob Bedford

        #4
        Re: read and parsing an XML file using the content

        >How big is your xml file?

        Around 2-3MB. I've created a PHP script that splits bigger files into
        chunks of 2-3MB. I've already checked the split files and they are OK.

        Also, I can't change servers for several reasons, mainly because the
        service offered is much the same as a dedicated server (50GB space,
        unlimited emails, unlimited traffic, with only execution time and memory
        limits). I can't move to a dedicated server since I know nothing about
        managing a dedicated or virtual server and I have no time to learn and
        manage one. It would also be a problem because security is very important
        to me, and my current ISP handles that well.

        Anyway, the problem is that I process the XML content when I reach the
        closing tag, and it seems to process the content even if it doesn't reach
        this closing tag. It's as if it arrives at the end of the file without
        reaching the end tag and executes the code anyway. Is that the case, or
        must the end tag absolutely be reached?

        Thanks for your help.

        Bob

        • Jerry Stuckle

          #5
          Re: read and parsing an XML file using the content

          Bob Bedford wrote:
          >>How big is your xml file?
          >
          >Around 2-3MB. I've created a PHP script that splits bigger files into
          >chunks of 2-3MB. I've already checked the split files and they are OK.
          >
          >Also, I can't change servers for several reasons, mainly because the
          >service offered is much the same as a dedicated server (50GB space,
          >unlimited emails, unlimited traffic, with only execution time and memory
          >limits). I can't move to a dedicated server since I know nothing about
          >managing a dedicated or virtual server and I have no time to learn and
          >manage one. It would also be a problem because security is very important
          >to me, and my current ISP handles that well.

          I would still look for another hosting company. There are others around
          with high limits, and ones which will allow larger memory limits.
          Or, go to a managed dedicated server or VPS.

          >Anyway, the problem is that I process the XML content when I reach the
          >closing tag, and it seems to process the content even if it doesn't reach
          >this closing tag. It's as if it arrives at the end of the file without
          >reaching the end tag and executes the code anyway. Is that the case, or
          >must the end tag absolutely be reached?
          >
          >Thanks for your help.
          >
          >Bob

          If the file is incomplete, the parser will consider it malformed XML.
          It will do its best with the XML, but the results probably will not be
          what you want.

          So you're left with handling the file on another system or parsing the
          file with your own code.
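
To illustrate the point about incomplete files, here is a small sketch, assuming a made-up document that is cut off inside its second <item> (the sample data and the $closed array are hypothetical). It uses PHP's xml_parser_create(), xml_set_element_handler() and xml_parse(). The end handler has already run for the element that was closed before the cut, and the final xml_parse() call then reports failure.

```php
<?php
// Hypothetical truncated input: the second <item> and the root are never closed.
$truncated = '<items><item>complete</item><item>cut off';

$closed = [];   // names of elements whose end tag was actually seen

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) {
        // Start tags are not needed for this check.
    },
    function ($parser, $name) use (&$closed) {
        // Case folding is on by default, so names arrive upper-cased.
        $closed[] = $name;
    }
);

// Passing true as the third argument tells the parser this is the last piece,
// so an unterminated document is reported as an error.
$ok = xml_parse($parser, $truncated, true);

if (!$ok) {
    printf(
        "XML error: %s at line %d\n",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    );
}
```

If the work done in the end handler has to be all-or-nothing, one option is to wrap the database writes in a transaction and commit only after the final xml_parse() call has succeeded.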

          --
          ==================
          Remove the "x" from my email address
          Jerry Stuckle
          JDS Computer Training Corp.
          jstucklex@attglobal.net
          ==================


          • Toby A Inkster

            #6
            Re: read and parsing an XML file using the content

            Bob Bedford wrote:
            >So how do I fix it? I can't read the entire file because of my server's
            >memory limit (and I can't change servers).

            How are you parsing the file?

            Try a stream-based XML parser like expat <http://www.php.net/xml>. This
            reads the XML file as a stream of tags and content, calling your functions
            when it encounters things that you're interested in. It shouldn't use up
            as much memory as DOM-like XML parsers, which read the whole file into RAM
            before they parse it.
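
A minimal sketch of that callback style, for illustration only: the <catalog>/<book> sample data, the $bookIds array and the 16-byte read size are made up here, and an in-memory stream stands in for the real file. It feeds the parser small reads through xml_parse(), so only one chunk is held by the script at a time.

```php
<?php
// Hypothetical sample data in an in-memory stream so the sketch is
// self-contained; a real script would fopen() the XML file instead.
$fp = fopen('php://memory', 'w+');
fwrite($fp, '<catalog><book id="1"/><book id="2"/><book id="3"/></catalog>');
rewind($fp);

$bookIds = [];

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$bookIds) {
        // Case folding is on by default: element and attribute names
        // arrive upper-cased.
        if ($name === 'BOOK') {
            $bookIds[] = $attrs['ID'];
        }
    },
    function ($parser, $name) {
        // Nothing to do on closing tags in this sketch.
    }
);

// Deliberately tiny reads, so that chunk boundaries fall inside tags.
while (!feof($fp)) {
    $data = fread($fp, 16);
    if (!xml_parse($parser, $data, feof($fp))) {
        throw new RuntimeException(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }
}
fclose($fp);

print_r($bookIds);
```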

            The first example at the following URL is a good starting point:

            http://uk.php.net/manual/en/function.xml-set-object.php

            --
            Toby A Inkster BSc (Hons) ARCS
            [Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
            [OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 1 day, 39 min.]

            Best... News... Story... Ever!


            • Jerry Stuckle

              #7
              Re: read and parsing an XML file using the content

              Toby A Inkster wrote:
              >Bob Bedford wrote:
              >>So how do I fix it? I can't read the entire file because of my server's
              >>memory limit (and I can't change servers).
              >
              >How are you parsing the file?
              >
              >Try a stream-based XML parser like expat <http://www.php.net/xml>. This
              >reads the XML file as a stream of tags and content, calling your functions
              >when it encounters things that you're interested in. It shouldn't use up
              >as much memory as DOM-like XML parsers, which read the whole file into RAM
              >before they parse it.
              >
              >The first example at the following URL is a good starting point:
              >
              >http://uk.php.net/manual/en/function.xml-set-object.php

              Hi, Toby,

              Have you found it saves much memory? In my experience the difference
              isn't all that much. It looks like the xml parser caches a lot of the
              file in memory.

              Or maybe it was just the structure of the xml files I was using which
              caused the extra memory usage.

              --
              ==================
              Remove the "x" from my email address
              Jerry Stuckle
              JDS Computer Training Corp.
              jstucklex@attglobal.net
              ==================


              • Bob Bedford

                #8
                Re: read and parsing an XML file using the content


                "Toby A Inkster" <usenet200803@tobyinkster.co.uk> wrote in message
                news:drjsb5-p7v.ln1@ophelia.g5n.co.uk...
                >Bob Bedford wrote:
                >>So how do I fix it? I can't read the entire file because of my server's
                >>memory limit (and I can't change servers).
                >
                >How are you parsing the file?
                >
                >Try a stream-based XML parser like expat <http://www.php.net/xml>. This
                >reads the XML file as a stream of tags and content, calling your functions
                >when it encounters things that you're interested in. It shouldn't use up
                >as much memory as DOM-like XML parsers, which read the whole file into RAM
                >before they parse it.
                >
                >The first example at the following URL is a good starting point:
                >
                >http://uk.php.net/manual/en/function.xml-set-object.php

                Hi Toby,

                the XML parser code I use:

                $xml_parser = xml_parser_create();
                xml_set_element_handler($xml_parser, "startElement", "endElement");
                xml_set_character_data_handler($xml_parser, "FDataHandler");

                while (!feof($fp)) {
                    $data = fread($fp, 4096);
                    if (!xml_parse($xml_parser, $data, feof($fp))) {
                        die(sprintf("XML error: %s at line %d",
                            xml_error_string(xml_get_error_code($xml_parser)),
                            xml_get_current_line_number($xml_parser)));
                    }
                }
                fclose($fp);
                xml_parser_free($xml_parser);

                It's as if it parses and then executes the endElement function once it
                has read the 4096 bytes, even though it hasn't found the closing tag.

                Bob



                • Toby A Inkster

                  #9
                  Re: read and parsing an XML file using the content

                  Bob Bedford wrote:
                  >while (!feof($fp)) {
                  >    $data = fread($fp, 4096);
                  >    if (!xml_parse($xml_parser, $data, feof($fp))) {
                  >        die(sprintf("XML error: %s at line %d",
                  >            xml_error_string(xml_get_error_code($xml_parser)),
                  >            xml_get_current_line_number($xml_parser)));
                  >    }
                  >}

                  Basic idea is to catch the failure of xml_parse(), and if !feof() then
                  read another few bytes from the file, append them to $data and then try
                  parsing again.

                  Untested code:

                  while (!feof($fp))
                  {
                      $data = fread($fp, 4096);
                      if (!xml_parse($xml_parser, $data, feof($fp)))
                      {
                          if (feof($fp))
                              die(sprintf("XML error: %s at line %d",
                                  xml_error_string(xml_get_error_code($xml_parser)),
                                  xml_get_current_line_number($xml_parser)));

                          while (!feof($fp) && !xml_parse($xml_parser, $data, feof($fp)))
                              $data .= fread($fp, 10);
                      }
                  }


                  --
                  Toby A Inkster BSc (Hons) ARCS
                  [Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
                  [OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 1 day, 2:11.]

                  Best... News... Story... Ever!
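
One more thing that may be worth checking, offered as a guess since FDataHandler was not posted: xml_parse() is designed to be fed a document in successive pieces, so a read that stops in the middle of a tag is not in itself an error. The character data handler, however, can be called more than once for the text of a single element (for example when the text straddles a chunk boundary or contains an entity), so a handler that overwrites a variable instead of appending to it would keep only a fragment. The sketch below uses made-up element names and a deliberately tiny 16-byte chunk size; it appends in the character data handler and only uses the collected text in the end handler.

```php
<?php
// Hypothetical sample document; element names are illustrative only.
$xml = '<items><item>first value</item><item>second &amp; third</item></items>';

$buffer = '';   // text collected for the <item> currently open
$items  = [];   // completed <item> contents

$parser = xml_parser_create();

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$buffer) {
        // Case folding is on by default, so names arrive upper-cased.
        if ($name === 'ITEM') {
            $buffer = '';          // start a fresh buffer for this element
        }
    },
    function ($parser, $name) use (&$buffer, &$items) {
        if ($name === 'ITEM') {
            $items[] = $buffer;    // the text is complete only at the end tag
        }
    }
);

xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$buffer) {
        $buffer .= $data;          // append: this may be called several times
    }
);

// Feed the document in 16-byte pieces so boundaries fall inside text,
// tags and the entity.
$chunks = str_split($xml, 16);
$last   = count($chunks) - 1;
foreach ($chunks as $i => $chunk) {
    if (!xml_parse($parser, $chunk, $i === $last)) {
        throw new RuntimeException(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }
}

print_r($items);
```

In a real script the end handler would be the place to write the row to the database, using the buffered text rather than whatever the last handler call happened to deliver.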
