High Performance Xml parser

  • rony

    High Performance Xml parser

    Hi,
    I am looking for a component that lets me parse my XML files.
    I am asking because the files are huge: they can reach about 1 GB.
    Parsing a file of that size takes something like 5 hours.
    At the moment I am using the XmlRead, XmlNode ... (I do not load the
    file into memory).
    Can you suggest better components to use?

    ** I tried SAX but couldn't understand how it works, because there
    are no examples for .NET and the documentation is very poor.
    P.S.: I am writing in C#.

    Regards, Rony

  • Joseph Kesselman

    #2
    Re: High Performance Xml parser

    If parsing a 1GB file is taking 5 hours, the problem isn't the parser --
    it's the fact that the data model (presumably an implementation of the
    DOM?) is becoming so huge that your machine's thrashing itself to death
    swapping data in and out of memory.

    SAX-based processing, when appropriate, is indeed a recommended solution
    for that. Or SAX feeding into a more specialized data model. Or --
    perhaps -- an XML database tool, which has its own specialized models
    and may be able to handle paging of data more intelligently than the
    system's default swapper.

    I don't use C#, so I can't advise you regarding specific tools.

    --
    Joe Kesselman / Beware the fury of a patient man. -- John Dryden


    • Martin Honnen

      #3
      Re: High Performance Xml parser

      rony wrote:
      > I am looking for component which allows me to parse my xml file.
      > the reason i am asking this, is because my xml files are huge it can
      > reach as far as 1GB more or less.
      > the time to parse such a file is something like 5 Hours.
      > Now i am using the XmlRead, XmlNode ... (I do not load the file to the
      > memory).
      > Can you suggest better components to use?
      >
      > ** I tried SAX but i couldn't understand how it works, because there is
      > no examples for .net , and very bad documentation.
      > p.s : I am writing in C#.

      XmlNode in the .NET Framework is part of .NET's DOM implementation, so
      if you use XmlNode your code is loading the XML into memory, or at
      least part of it, depending on what exactly your code does.

      With .NET you have XmlReader for fast, forward-only pull parsing; that
      is the best approach the .NET Framework has to offer for parsing such
      large files. With XmlReader, memory/resource consumption should not
      increase with the size of the XML, as the reader pulls in the XML
      node by node.
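
      As a rough sketch of what that looks like (not your actual code: the
      class name, the CountElements helper, the element name "record" and
      the command-line file path are all made up for illustration), the
      loop below walks a file with XmlReader and counts elements without
      ever building XmlNode objects:

```csharp
using System;
using System.Xml;

class StreamingCount
{
    // Hypothetical helper: counts elements with the given local name
    // while streaming through the file. No DOM (XmlNode/XmlDocument) is built.
    static long CountElements(string path, string elementName)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true; // skip insignificant whitespace nodes
        settings.IgnoreComments = true;   // skip comment nodes

        long count = 0;
        using (XmlReader reader = XmlReader.Create(path, settings))
        {
            // Read() advances to the next node; only the current node is held.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element
                    && reader.LocalName == elementName)
                {
                    count++;
                }
            }
        }
        return count;
    }

    static void Main(string[] args)
    {
        if (args.Length < 1)
        {
            Console.Error.WriteLine("usage: StreamingCount <file.xml>");
            return;
        }
        // "record" is an assumed element name, chosen only for this example.
        Console.WriteLine(CountElements(args[0], "record"));
    }
}
```

      If a loop like this is still slow on your file, the time is probably
      going into whatever the loop body does with each node rather than
      into the reader itself.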

      I think microsoft.public.dotnet.xml is a better place to discuss
      .NET-specific questions on parsing XML.

      --

      Martin Honnen


      • rony

        #4
        Re: High Performance Xml parser

        Hi,
        What I am doing is creating a reader with XmlTextReader
        and then
        while (reader.Read())
        {
        }
        so nothing is loaded into memory.
        But I still think 5 hours for a 1 GB XML file is very slow.
        Are there any SAX-based components that can improve the
        performance?


        Martin Honnen wrote:
        > rony wrote:
        >> I am looking for component which allows me to parse my xml file.
        >> the reason i am asking this, is because my xml files are huge it can
        >> reach as far as 1GB more or less.
        >> the time to parse such a file is something like 5 Hours.
        >> Now i am using the XmlRead, XmlNode ... (I do not load the file to the
        >> memory).
        >> Can you suggest better components to use?
        >>
        >> ** I tried SAX but i couldn't understand how it works, because there is
        >> no examples for .net , and very bad documentation.
        >> p.s : I am writing in C#.
        >
        > XmlNode in the .NET framework is part of .NET's DOM implementation thus
        > if you use XmlNode then your code is loading the XML in memory, or at
        > least part of it depending on what exactly your code does.
        >
        > With .NET you have XmlReader for fast forwards only pull parsing, that
        > is the best approach the .NET framework has to offer for parsing such
        > large files. With the XmlReader the memory/resource consumption should
        > not increase with the size of the XML as the reader pulls in the XML
        > node by node.
        >
        > I think microsoft.public.dotnet.xml is a better place to discuss .NET
        > specific questions on parsing XML.
        >
        > --
        > Martin Honnen
        > http://JavaScript.FAQTs.com/

