Reading 600Mb XML file

  • Tomislav Bilic

    Reading 600Mb XML file

    Hello,

    I need to parse an XML file that is 600Mb in size. Parsing itself is
    not a problem. The problem occurs while reading a file that large.

    Here is the code:

    function set_file($file2parse) {
        $chunksize = 1 * (1024 * 1024); // how many bytes per chunk
        $buffer = '';
        $handle = fopen($file2parse, 'rb');
        if ($handle === false) {
            return false;
        }
        while (!feof($handle)) {
            $buffer .= fread($handle, $chunksize);
        }
        fclose($handle);
        $this->rdf_content = $buffer;
    }


    After about 5-10 minutes of execution, this error occurs:

    CGI Timeout
    The specified CGI application exceeded the allowed time for processing.
    The server has deleted the process.

    I changed max_execution_time in php.ini from 300 to 30000, but that
    didn't help.

    Does anybody have experience with it?

    --
    Tomislav Bilic
    Escape d.o.o.

    --
    GSM: +385 91 577 1025
    ICQ: 1824223
  • scrunchy2k@yahoo.com

    #2
    Re: Reading 600Mb XML file

    How much memory and paging space do you have?

    • Tomislav Bilic

      #3
      Re: Reading 600Mb XML file

      scrunchy2k@yahoo.com wrote:
      > How much memory and paging space do you have?

      512Mb RAM and 20Gb HDD space, WinXp, PHP5



      • Daniel Tryba

        #4
        Re: Reading 600Mb XML file

        In comp.lang.php Tomislav Bilic <tomislav_removethis@escapestudio.net> wrote:
        > while (!feof($handle)) {
        >     $buffer .= fread($handle, $chunksize);
        > }
        > fclose($handle);
        > $this->rdf_content = $buffer;

        Couldn't you use a SAX parser to avoid reading the whole file into
        memory?
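
A minimal sketch of that suggestion, assuming PHP 7 or later with the bundled xml (expat) extension enabled. It feeds the file to the SAX parser one chunk at a time, so only the current chunk is held in memory instead of the whole 600Mb string. The function name `count_elements_streaming()` and the element-counting handlers are hypothetical and only stand in for whatever the real handlers would do with the RDF content.

```php
<?php
// Hypothetical example: count element names while streaming the file
// through PHP's SAX-style parser. Not the original poster's code.
function count_elements_streaming(string $path, int $chunkSize = 1048576): array
{
    $counts = [];
    $parser = xml_parser_create();
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        // Called for every opening tag.
        function ($parser, $name, $attrs) use (&$counts) {
            $counts[$name] = ($counts[$name] ?? 0) + 1;
        },
        // Called for every closing tag; nothing to do in this sketch.
        function ($parser, $name) {
        }
    );

    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (!feof($handle)) {
            $chunk = fread($handle, $chunkSize);
            if ($chunk === false) {
                throw new RuntimeException("Read error on $path");
            }
            // The third argument tells the parser whether this is the final chunk.
            if (xml_parse($parser, $chunk, feof($handle)) !== 1) {
                throw new RuntimeException(sprintf(
                    'XML error: %s at line %d',
                    xml_error_string(xml_get_error_code($parser)),
                    xml_get_current_line_number($parser)
                ));
            }
        }
    } finally {
        fclose($handle);
    }
    return $counts;
}
```

The read loop is the same as in the quoted code. The difference is that each chunk is handed to `xml_parse()` and then discarded rather than appended to `$buffer`.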

        > After about 5-10 minutes of execution, this error occurs:
        >
        > CGI Timeout
        > The specified CGI application exceeded the allowed time for processing.
        > The server has deleted the process.
        >
        > I changed max_execution_time in php.ini from 300 to 30000, but that
        > didn't help.
        >
        > Does anybody have experience with it?

        Many different timeouts can apply. This one is probably not from PHP,
        since you changed the max_execution_time setting, so maybe the
        webserver decides the script took too long, or maybe it's a limit in
        the OS you are using. It may even be the client.
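
As a rough way to rule PHP in or out, the script can print the limits PHP itself is applying. This is only a sketch: `describe_php_limits()` is a hypothetical helper, and it says nothing about the webserver's own CGI timeout, which is configured outside php.ini.

```php
<?php
// Hypothetical helper: report the limits PHP itself applies to this run.
// A web server's CGI timeout is a separate setting and is not visible here.
function describe_php_limits(): array
{
    return [
        'sapi'               => PHP_SAPI,                             // e.g. "cgi-fcgi" or "cli"
        'max_execution_time' => (int) ini_get('max_execution_time'),  // seconds, 0 = no limit
        'memory_limit'       => (string) ini_get('memory_limit'),     // e.g. "128M", "-1" = no limit
    ];
}

foreach (describe_php_limits() as $name => $value) {
    echo $name, ' = ', $value, PHP_EOL;
}
```

If `max_execution_time` shows the new value and the request is still killed, the limit is being enforced by another layer. Running the same script with the command-line `php` binary takes the webserver out of the picture entirely.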

        FUP to c.l.p

        • scrunchy2k@yahoo.com

          #5
          Re: Reading 600Mb XML file

          Tomislav Bilic wrote:
          > scrunchy2k@yahoo.com wrote:
          > > How much memory and paging space do you have?
          >
          > 512Mb RAM and 20Gb HDD space, WinXp, PHP5

          I'd definitely add 256MB of RAM. I suspect your
          free RAM is only half of the file's size, so
          your system must be doing a lot of paging.

          • Good Man

            #6
            Re: Reading 600Mb XML file

            Tomislav Bilic <tomislav_removethis@escapestudio.net> wrote in
            news:da3bof$os2$1@ss405.t-com.hr:

            > scrunchy2k@yahoo.com wrote:
            >> How much memory and paging space do you have?
            >
            > 512Mb RAM and 20Gb HDD space, WinXp, PHP5

            512MB RAM, 600MB XML File.... you do the math!
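
Spelled out, the math looks like the sketch below. The two helper names are hypothetical. The same comparison also works against PHP's own `memory_limit` setting, which is written in php.ini shorthand such as "128M"; checking that setting is an addition here, not something raised in the thread.

```php
<?php
// Hypothetical helper: convert a php.ini shorthand size ("512M", "1G", "64K")
// to bytes. "-1" (no limit) is returned unchanged as -1.
function ini_size_to_bytes(string $value): int
{
    $value = trim($value);
    $number = (int) $value; // the cast reads the leading digits, "512M" gives 512
    switch (strtoupper(substr($value, -1))) {
        case 'G':
            return $number * 1024 * 1024 * 1024;
        case 'M':
            return $number * 1024 * 1024;
        case 'K':
            return $number * 1024;
        default:
            return $number;
    }
}

// Hypothetical helper: does a buffer of $fileBytes fit under $limitBytes?
// A negative limit means "no limit".
function fits_in_memory(int $fileBytes, int $limitBytes): bool
{
    return $limitBytes < 0 || $fileBytes < $limitBytes;
}

$fileBytes = 600 * 1024 * 1024;           // the 600MB XML file, held as one string
$ramBytes  = ini_size_to_bytes('512M');   // the machine's physical RAM

var_dump(fits_in_memory($fileBytes, $ramBytes)); // bool(false)
```

The single `$buffer` string alone is larger than physical RAM, before counting the operating system, the webserver, or anything the parser builds on top of it.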

            • Tomislav Bilic

              #7
              Re: Reading 600Mb XML file

              Good Man wrote:
              > 512MB RAM, 600MB XML File.... you do the math!

              The problem was the parsing method. Parsing this way eats memory. I'm
              parsing with SAX now and it works quite well...

              Thanks for pointing out the problem :)

