Help to Process two very big xml files....

  • fuel

    Hello,
    I have two big XML files (around 50-60 MB each), and I need to
    process the data within each of them. The problem is that I need to
    process each node and compare it with the nodes in the other XML
    file. After iterating through all the nodes, I need to find the
    nodes that have changed or have been newly introduced.
    Assume the following XML structure:

    <?xml version="1.0"?>
    <root>
    <nodeToProcess>

    </nodeToProcess>
    .....
    </root>

    I have two such XML files. I keep one as the reference and
    compare it with the other. To solve this problem, I thought I could
    use XPath. However, for now, only DOM-based XPath processors are
    available. Since the files are very large, I don't think I can
    afford DOM (memory constraint).

    How should I approach this problem? What would be the right way to
    start?

    P.S. I am trying to access these elements through Java.


  • Manuel Collado

    #2
    Re: Help to Process two very big xml files....

    fuel wrote:
    > How can I approach this problem? What would be the right way to
    > start?

    There are ready-to-run tools for differencing XML files. Please
    search for "xml-diff".

    > P.S. I am trying to access these elements through Java.

    Some of the tools are written in Java, and some of them are open source.

    I don't know how these tools perform with big files.

    Hope this helps.
    --
    Manuel Collado - http://lml.ls.fi.upm.es/~mcollado
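
If a ready-made diff tool turns out to be too heavy for the memory constraint raised in the question, one possible alternative is to stream both files with StAX (javax.xml.stream, part of the JDK since Java 6) and keep only a small fingerprint per node. The following is a minimal sketch, not a definitive implementation. It assumes that every <nodeToProcess> element carries a unique "id" attribute; that attribute and the class name StreamingXmlDiff are hypothetical and do not come from the original post.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StreamingXmlDiff {
    // Element name taken from the question; the key attribute "id" is an assumption.
    static final String ELEMENT = "nodeToProcess";
    static final String KEY_ATTR = "id";

    /** Streams the document and returns key -> SHA-256 fingerprint, in document order. */
    static Map<String, String> fingerprints(InputStream in) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent text chunks so chunking cannot influence the fingerprint.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader r = factory.createXMLStreamReader(in);
        Map<String, String> result = new LinkedHashMap<>();
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && ELEMENT.equals(r.getLocalName())) {
                String key = r.getAttributeValue(null, KEY_ATTR);
                result.put(key, fingerprintSubtree(r));
            }
        }
        r.close();
        return result;
    }

    /** Consumes events up to the matching end tag, hashing names, attributes and text. */
    private static String fingerprintSubtree(XMLStreamReader r) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        int depth = 0;
        int event = r.getEventType(); // the reader is positioned on START_ELEMENT
        while (true) {
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
                update(md, "<" + r.getLocalName());
                for (int i = 0; i < r.getAttributeCount(); i++) {
                    update(md, " " + r.getAttributeLocalName(i) + "=" + r.getAttributeValue(i));
                }
            } else if (event == XMLStreamConstants.CHARACTERS
                    || event == XMLStreamConstants.CDATA) {
                update(md, r.getText().trim()); // ignore surrounding whitespace
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                update(md, ">");
                if (--depth == 0) {
                    break;
                }
            }
            event = r.next();
        }
        return new BigInteger(1, md.digest()).toString(16);
    }

    private static void update(MessageDigest md, String s) {
        md.update(s.getBytes(StandardCharsets.UTF_8));
    }

    /** Reports nodes of 'other' that are missing from, or differ from, 'reference'. */
    static List<String> diff(Map<String, String> reference, Map<String, String> other) {
        List<String> report = new ArrayList<>();
        for (Map.Entry<String, String> e : other.entrySet()) {
            String old = reference.get(e.getKey());
            if (old == null) {
                report.add("NEW " + e.getKey());
            } else if (!old.equals(e.getValue())) {
                report.add("CHANGED " + e.getKey());
            }
        }
        return report;
    }

    public static void main(String[] args) throws Exception {
        try (InputStream ref = new FileInputStream(args[0]);
             InputStream other = new FileInputStream(args[1])) {
            diff(fingerprints(ref), fingerprints(other)).forEach(System.out::println);
        }
    }
}
```

Only one key and one hash per node stay in memory, so the footprint depends on the number of nodes rather than on the file size. Note that this sketch treats a different attribute order as a change and does not report nodes that were removed from the reference file.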


    • Martin Honnen

      #3
      Re: Help to Process two very big xml files....

      fuel wrote:
      > How can I approach this problem? What would be the right way to
      > start?

      Considering that current desktop systems have 1, 2, or 3 GB of main
      memory, I don't think you will run into problems performing XPath on
      60 MB files. Just make sure that the Java VM is allowed to allocate
      enough memory: http://java.sun.com/javase/6/docs/te...dows/java.html


      --

      Martin Honnen
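
The DOM route suggested above can be sketched with the XPath API that ships with the JDK (javax.xml.xpath). This is only an illustration under stated assumptions: the class name DomXPathDemo and the expression /root/nodeToProcess (derived from the sample structure in the question) are not from the original posts, and the -Xmx value shown in the comment is an arbitrary example that would need tuning for real files.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomXPathDemo {
    /** Parses the whole document into a DOM tree and evaluates an XPath expression on it. */
    static NodeList select(InputSource source, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(source);
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        // Raise the maximum heap size if the default is too small, for example:
        //   java -Xmx1g DomXPathDemo reference.xml
        InputSource source = new InputSource(new File(args[0]).toURI().toString());
        NodeList nodes = select(source, "/root/nodeToProcess");
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println(nodes.getLength() + " nodes selected, max heap " + maxHeapMb + " MB");
    }
}
```

A DOM tree usually needs several times the size of the file on disk, so printing the maximum heap as done here is a quick way to check that the -Xmx setting actually took effect.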


      • jimmy Zhang

        #4
        Re: Help to Process two very big xml files....

        You should check out vtd-xml, which is ideally suited for the task you
        described...




