Merging 2 different XML files...

  • sakhawn
    New Member
    • Nov 2008
    • 9

    Hi,

    I have two files to merge using Java based on a similar text identifier:

    File 1:
    Code:
    <ListRecords>
    <record> 
    <header> 
    <identifier>oai:triple-l:2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</identifier> 
    <datestamp>2007-05-29T15:55:00Z</datestamp> 
    <datestampasdatetime>2007-05-29T17:55:00+02:00</datestampasdatetime> 
    </header> 
    <metadata> 
    <lom xsi:schemaLocation="http://dpc.uba.uva.nl/schema/lom/triplel http://dpc.uba.uva.nl/schema/lom/triplel/lom.xsd"> 
    <general > 
    <identifier>
    <catalog>oai</catalog>
    <entry>oai:triple-l:2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</entry>
    </identifier>
    <title> 
    <langstring> 
    <value>Graduation mw. S. de Caralt</value> 
    <language>en</language> 
    </langstring> 
    </title> 
    <catalogentry> 
    <catalog>nl.wur.wurtv</catalog> 
    <entry> 
    <langstring> 
    <value>2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</value> 
    <language>x-none</language> 
    </langstring> 
    </entry> 
    </catalogentry> 
    <grouplanguage>en</grouplanguage> 
    <description> 
    <langstring> 
    <value>Sponge Culture: Learning from Biology and Ecology</value> 
    <language>en</language> 
    </langstring> 
    </description> 
    </general> 
    <lifecycle xmlns="" /> 
    <metametadata > 
    <metadatascheme>LORENET</metadatascheme> 
    </metametadata> 
    </lom> 
    </metadata> 
    </record>
    <….More Records here…..!>
    </ListRecords>
    File 2:
    Code:
    <ListRecords>
     <record>
     <header>
      <identifier>some value here</identifier> 
      <datestamp>2008-07-14T09:23:25Z</datestamp> 
      </header>
     <metadata>
     <group xsi:schemaLocation="http://dpc.uba.uva.nl/schema/lom/triplel http://dpc.uba.uva.nl/schema/lom/triplel/lom.xsd">
      <title>User manipulating this</title> 
     <feed>
      <title>My feed</title> 
      <url>http://no.url.available</url> 
     <item>
      <guid>oai:triple-l:2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</guid> 
     <events>
     <event>
      <dateTime>2008-03-26T13:27:49.00</dateTime> 
     <action>
      <actionType>doSomeAtcion</actionType> 
      </action>
      </event>
      </events>
      </item>
      </feed>
      </group>
      </metadata>
      </record>
    <....More Records here....!>
    </ListRecords>
    I want to merge the <metadata> element and all its sub-elements from file 1 into file 2, inside file 2's <metadata> element, matching on the text of <identifier>oai:triple-l:2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</identifier> in file 1 and the same ID in <guid>oai:triple-l:2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</guid> in file 2.

    Any suggestions and guidelines will be highly appreciated.

    Thnx.
  • sakhawn
    New Member
    • Nov 2008
    • 9

    #2
    Sorry, I forgot to mention: I have to use Java to merge them...


    • jkmyoung
      Recognized Expert Top Contributor
      • Mar 2006
      • 2057

      #3
      Need to define:
      • Rows/identifiers
      • Fields to be merged
      • Merging rules



      Please correct if any of the following is wrong.
      Assumptions from looking at the code:

      Fields summarized in xpaths:
      File 1
      rows: /ListRecords/record
      row id: header/identifier

      File 2
      rows: /ListRecords/record/metadata/group/feed/item
      row id: guid
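
One way to check these assumptions before merging is to list the key values from both files and intersect them. The following is only a sketch using the standard `javax.xml.xpath` API. The class and method names (`JoinKeySketch`, `keys`, `sharedKeys`) are made up for illustration, and the File 2 path assumes that `<guid>` sits at `group/feed/item/guid`, as in the sample posted at the top of the thread.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the thread: lists the IDs present in both files.
final class JoinKeySketch {

    /** Evaluates an XPath expression against an XML string and returns the trimmed text of each hit. */
    static Set<String> keys(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            Set<String> result = new LinkedHashSet<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    /** Returns the identifiers of File 1 that also occur as a guid in File 2. */
    static Set<String> sharedKeys(String file1Xml, String file2Xml) {
        Set<String> shared = keys(file1Xml, "/ListRecords/record/header/identifier");
        // Assumed path: item is a child of feed in the posted File 2 sample.
        shared.retainAll(keys(file2Xml, "/ListRecords/record/metadata/group/feed/item/guid"));
        return shared;
    }
}
```

If `sharedKeys` comes back empty for the real files, the row paths or the ID formats probably differ from what is assumed here.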


      Let's look at the separate xml sections to be merged:
      File 2:
      [code=xml]
      <item>
      <guid>oai:triple-l:2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</guid>
      <events>
      <event>
      <dateTime>2008-03-26T13:27:49.00</dateTime>
      <action>
      <actionType>doSomeAtcion</actionType>
      </action>
      </event>
      </events>
      </item>
      [/code]
      Is this technically a 'join'? E.g., are you just adding fields from one file to another, or are you overwriting existing fields?


      Since you're merging into a file I would recommend either:
      1. DOM. Open both files with DOM. Add nodes to File1 DOM. Save back to file.
      2. XSLT. Performance may be less than optimal, but code is much more maintainable.
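
As a rough illustration of the DOM option, here is a minimal Java sketch. It rests on several assumptions that are not settled at this point in the thread: the `<metadata>` of a File 1 record is appended to the File 2 `<record>` that contains a `<guid>` equal to that File 1 record's `header/identifier`, parsing is not namespace-aware, and the class and method names (`DomMergeSketch`, `merge`, `child`) are invented for the example. It works on XML strings so that it stays self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch of the DOM approach; names and merge direction are assumptions.
final class DomMergeSketch {

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the first direct child element with the given name, or null if there is none. */
    static Element child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        return null;
    }

    static String merge(String file1Xml, String file2Xml) {
        try {
            Document doc1 = parse(file1Xml);
            Document doc2 = parse(file2Xml);

            // Index File 1: header/identifier text -> that record's <metadata> element.
            Map<String, Element> metadataById = new HashMap<>();
            NodeList records1 = doc1.getElementsByTagName("record");
            for (int i = 0; i < records1.getLength(); i++) {
                Element record = (Element) records1.item(i);
                Element header = child(record, "header");
                Element metadata = child(record, "metadata");
                Element id = header == null ? null : child(header, "identifier");
                if (id != null && metadata != null) {
                    metadataById.put(id.getTextContent().trim(), metadata);
                }
            }

            // Collect (record, metadata) pairs first: the NodeLists are live,
            // so the tree is not modified while they are being iterated.
            List<Element[]> pending = new ArrayList<>();
            NodeList records2 = doc2.getElementsByTagName("record");
            for (int i = 0; i < records2.getLength(); i++) {
                Element record = (Element) records2.item(i);
                NodeList guids = record.getElementsByTagName("guid");
                for (int j = 0; j < guids.getLength(); j++) {
                    Element match = metadataById.get(guids.item(j).getTextContent().trim());
                    if (match != null) {
                        pending.add(new Element[] {record, match});
                    }
                }
            }
            for (Element[] pair : pending) {
                // importNode makes a deep copy owned by doc2; it is appended as the record's last child.
                pair[0].appendChild(doc2.importNode(pair[1], true));
            }

            // Serialize the modified File 2 document with an identity transformer.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc2), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("merge failed", e);
        }
    }
}
```

For real files, `builder.parse(File)` and `new StreamResult(File)` could replace the string handling. Records without a match are left unchanged.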


      • sakhawn
        New Member
        • Nov 2008
        • 9

        #4
        Thanks a lot for your reply. I was so worried about this as I have a deadline.
        Actually, I want to join the record with the matching ID from file 2 into file 1, after the file 1 record for that ID ends. The output might look like:
        Code:
        <ListRecords>
        <record>
        <header>
        <identifier>some value here</identifier> 
        <datestamp>2008-07-14T09:23:25Z</datestamp> 
        </header>
        <metadata>
        <group xsi:schemaLocation="http://dpc.uba.uva.nl/schema/lom/triplel http://dpc.uba.uva.nl/schema/lom/triplel/lom.xsd">
        <title>User manipulating this</title> 
        <feed>
        <title>My feed</title> 
        <url>http://no.url.available</url> 
        <item>
        <guid>oai:triple-l:2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</guid> 
        <events>
        <event>
        <dateTime>2008-03-26T13:27:49.00</dateTime> 
        <action>
        <actionType>doSomeAtcion</actionType>  
        </action>
        </event>
        </events>
        </item>
        </feed>
        </group>
        <header> 
        <identifier>oai:triple-l:2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</identifier> 
        <datestamp>2007-05-29T15:55:00Z</datestamp> 
        <datestampasdatetime>2007-05-29T17:55:00+02:00</datestampasdatetime> 
        </header> 
        <metadata> 
        <lom xsi:schemaLocation="http://dpc.uba.uva.nl/schema/lom/triplel http://dpc.uba.uva.nl/schema/lom/triplel/lom.xsd"> 
        <general > 
        <identifier>
        <catalog>oai</catalog>
        <entry>oai:triple-l:2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</entry>
        </identifier>
        <title> 
        <langstring> 
        <value>Graduation mw. S. de Caralt</value> 
        <language>en</language> 
        </langstring> 
        </title> 
        <catalogentry> 
        <catalog>nl.wur.wurtv</catalog> 
        <entry> 
        <langstring> 
        <value>2c7ba037-52a6-4323-97dd-b6ea1cdbfd18</value> 
        <language>x-none</language> 
        </langstring> 
        </entry> 
        </catalogentry> 
        <grouplanguage>en</grouplanguage> 
        <description> 
        <langstring> 
        <value>Sponge Culture: Learning from Biology and Ecology</value> 
        <language>en</language> 
        </langstring> 
        </description> 
        </general> 
        <lifecycle xmlns="" /> 
        <metametadata > 
        <metadatascheme>LORENET</metadatascheme> 
        </metametadata> 
        </lom> 
        </metadata> 
        </metadata>
        </record>
        I have to do this for about 10 records with matching IDs in both files.


        • jkmyoung
          Recognized Expert Top Contributor
          • Mar 2006
          • 2057

          #5
          Considering this, I would probably use xslt.

          Driving xslt in Java (sample):

          [code=java]
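          import java.io.File;
          import javax.xml.parsers.DocumentBuilder;
          import javax.xml.parsers.DocumentBuilderFactory;
          import javax.xml.transform.Transformer;
          import javax.xml.transform.TransformerFactory;
          import javax.xml.transform.dom.DOMSource;
          import javax.xml.transform.stream.StreamResult;
          import javax.xml.transform.stream.StreamSource;
          import org.w3c.dom.Document;

          // set file names
          File file1 = new File("Filename1.xml");
          String filename2 = "Filename2.xml";
          File xslt = new File("FileXSLT.xslt");
          File dest = new File("resultFile.xml");

          // build transformer
          TransformerFactory xformFactory = TransformerFactory.newInstance();
          Transformer transformer = xformFactory.newTransformer(new StreamSource(xslt));

          // set file2 filename parameter (read by <xsl:param name="file2"> in the stylesheet)
          transformer.setParameter("file2", filename2);

          // Modularization :( looks stupid, but actually makes it perform better.
          DocumentBuilderFactory docBuildFactory = DocumentBuilderFactory.newInstance();
          DocumentBuilder parser = docBuildFactory.newDocumentBuilder();
          Document document = parser.parse(file1);

          // transform the parsed file 1 document and write the result file
          transformer.transform(new DOMSource(document), new StreamResult(dest));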
          [/code]
          For more info, google "java xslt transformation"

          XSLT: Starting with a copy template, add a template for the fields you need to merge. I'm having trouble seeing which fields need to be merged, so I hope you can figure it out from the example.
          [code=xml]
          <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:param name="file2" select="''"/><!-- defaults to empty string -->
          <xsl:variable name="doc2" select="document($file2)"/><!-- convert to nodes -->

          <xsl:template match="*"><!-- copy template -->
          <xsl:copy>
          <xsl:copy-of select="@*"/>
          <xsl:apply-templates/>
          </xsl:copy>
          </xsl:template>

          <xsl:template match="record">
          <xsl:copy>
          <xsl:copy-of select="@*"/>
          <xsl:apply-templates/>
          <!-- add in other stuff here -->
          <xsl:copy-of select="$doc2/ListRecords/record/metadata/group/feed/item[guid = current()/header/identifier]"/>
          </xsl:copy>
          </xsl:template>
          </xsl:stylesheet>
          [/code]

          Key line in all of this is:
          <xsl:copy-of select="$doc2/ListRecords/record/metadata/group/feed/item[guid = current()/header/identifier]"/>
          Copy the item nodes which match the current node's id.

          Customize this to merge as you need.


          • sakhawn
            New Member
            • Nov 2008
            • 9

            #6
            Thank you very much for the reply. At least I got the idea, but the problem is that I am totally new to XSLT and have no time to work through tutorials because of the deadline. I am still trying and hope to solve it; if I run into problems I will post them.


            • sakhawn
              New Member
              • Nov 2008
              • 9

              #7
              Hi,
              Thanks a lot for your help. I tried (and am still trying) but couldn't manage to write the XSLT file correctly, and because of the deadline I can't start an XSLT tutorial from the beginning. Please help me so that once this first task is done I can read more about it, as tomorrow is the deadline :-(

              The top-level elements are <ListRecords> and then <record> in both files. A <record> is unique based on the <identifier> value in file 1 (line 4) and the <guid> value in file 2 (line 14). The records with matching IDs have different data elements, I mean different fields, in the two files. I want to merge the record with the mentioned ID from file 2 into file 1.

              There is also a <metadata> element in both files, in file 1 (lines 8 to 42) and in file 2 (lines 7 to 26), so I want to simply copy this <metadata> element and the elements in between (sub-elements) till line 43 from file 2 into file 1 after file 1's <metadata> element ends at line 26, and after that the last element would then simply be <record>.
              There are 10 unique records in both files and the final file should contain all of them in the same way, so I hope that if one is merged correctly the others will follow the same template match.
              Please help me as I am really worried, and the first task in a new language is always such a headache.

              Best Regards


              • sakhawn
                New Member
                • Nov 2008
                • 9

                #8
                "There is also this <metadata> element in both files, file1 (line 8 to 42) and in file2 (line 7 to 26) so i want to simply copy this <metadata> element and elements in between (sub-elements) till line 43 from file 2 into file 1 after file1's <metadata> element ends at line 26 and after that last element would be then simply <record>"

                Sorry, a small mistake in the paragraph above: I want to merge the record from file 1 into file 2, not the other way around.


                • jkmyoung
                  Recognized Expert Top Contributor
                  • Mar 2006
                  • 2057

                  #9
                  Could you show us what you have so far? If you can get the first few fields copying correctly, then it'll be easier to figure out mistakes you're making with the rest.


                  • sakhawn
                    New Member
                    • Nov 2008
                    • 9

                    #10
                    Thanks for the reply.
                    Actually, I only changed the XPath you provided, because I made a mistake when saying which file to copy from: I have to copy data from file 1 into file 2 under the record element, based on that unique ID. So I only changed the XPath in the sample you provided (I am not sure I did it right, as I am confused), and it is:

                    [code=xml]
                    <?xml version="1.0"?>
                    <xsl:stylesheet version = '1.0'
                    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
                    <xsl:output method="xml" indent="yes"/>
                    <xsl:param name="file1" select="''"/><!-- defaults to empty string -->
                    <xsl:variable name="doc1" select="document($file1)"/><!-- convert to nodes -->
                    <xsl:template match="*"><!-- copy template -->
                    <xsl:copy>
                    <xsl:copy-of select="@*"/>
                    <xsl:apply-templates/>
                    </xsl:copy>
                    </xsl:template>
                    <xsl:template match="record">
                    <xsl:copy>
                    <xsl:copy-of select="@*"/>
                    <xsl:apply-templates/>
                    <!-- copy data from file 1 into file 2 based on guid in file 2 -->
                    <xsl:copy-of select="$doc1/ListRecords/record/header[identifier = current()/item//feed/guid]"/> <!-- dont know whether where will it copy that data and under which element of file 2 -->
                    </xsl:copy>
                    </xsl:template>
                    </xsl:stylesheet>
                    [/code]
                    So I don't know how to copy all the metadata from file 1 into file 2 exactly after file 2's metadata element ends. I know I didn't do much...
                    Hope you can help me solve it.


                    • jkmyoung
                      Recognized Expert Top Contributor
                      • Mar 2006
                      • 2057

                      #11
                      The easiest way I can think of (not the best programmatically) is to have a template for the last metadata element. Use an XPath like: "metadata[not(following::metadata)]"
                      [code=xml]
                      <xsl:template match="metadata[not(following::metadata)]">
                      <xsl:copy>
                      <xsl:copy-of select="@*"/>
                      <xsl:apply-templates/>
                      </xsl:copy>
                      <!-- add rest from other file -->
                      <xsl:copy-of select="$doc1//metadata"/>
                      </xsl:template>
                      [/code]


                      • sakhawn
                        New Member
                        • Nov 2008
                        • 9

                        #12
                        Thanks a lot for the help.
                        Yes, it works, but only when I have one record in each file. Merging more records would require a more concrete approach...
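
One possible approach for several records, sketched here under assumptions rather than offered as a tested solution for the real files: instead of copying `$doc1//metadata` (which selects every `<metadata>` in file 1), select only the file 1 record whose `header/identifier` equals a `<guid>` inside the file 2 record currently being copied. The Java below embeds such a stylesheet as a string and writes file 1 to a temporary file so that `document()` can load it. The names `PerRecordMergeSketch`, `merge`, and the `guids` variable are hypothetical, and the IDs are assumed to match exactly, with no extra whitespace.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: file 2 is the main input, file 1 is loaded through document().
final class PerRecordMergeSketch {

    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:param name='file1'/>"
        + "<xsl:variable name='doc1' select='document($file1)'/>"
        // identity template: copies everything that has no more specific template
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        // per file 2 record: copy it, then append the metadata of the matching file 1 record
        + "<xsl:template match='record'>"
        + "<xsl:variable name='guids' select='.//guid'/>"
        + "<xsl:copy>"
        + "<xsl:apply-templates select='@*|node()'/>"
        + "<xsl:copy-of select='$doc1/ListRecords/record[header/identifier = $guids]/metadata'/>"
        + "</xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String merge(String file1Xml, String file2Xml) {
        try {
            Path file1 = Files.createTempFile("file1-", ".xml");
            try {
                Files.write(file1, file1Xml.getBytes(StandardCharsets.UTF_8));
                Transformer transformer = TransformerFactory.newInstance()
                        .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
                // Pass file 1 as an absolute file: URI so document() does not depend on a base URI.
                transformer.setParameter("file1", file1.toUri().toString());
                StringWriter out = new StringWriter();
                transformer.transform(
                        new StreamSource(new StringReader(file2Xml)), new StreamResult(out));
                return out.toString();
            } finally {
                Files.deleteIfExists(file1);
            }
        } catch (Exception e) {
            throw new IllegalStateException("merge failed", e);
        }
    }
}
```

Because the predicate is evaluated once per file 2 record, each record picks up only its own counterpart from file 1, and records without a match are copied unchanged.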
