Design Advice: XML and Database Comparison VB.Net

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • AllBeagle
    New Member
    • Sep 2008
    • 13

    Design Advice: XML and Database Comparison VB.Net

    Hello Everyone,

    I'm trying to build a web-based application for a client of mine and I keep finding holes in my design, so I could use some guidance if anyone has any to offer. Let me try to explain what it is I'm trying to do and hopefully someone has an idea that's not going to take me a long time to implement and isn't above my not-so-advanced skill level. My attempts with data tables and arraylists have failed thus far. Looks like my Sunday just got ruined, here we go...

    I have a XML file that is created with real estate data that is pulled from a database. This database feed is through a RETS connector, so I have to set up an "admin" section where a single connection is made, data is read, and the XML file is created. I'm not able to either run the entire website off the RETS connection or create a new database because of connection and compatibility issues, so I'm left with XML.

    Now, currently I have it set up where the RETS connection is made and a new XML file is created with the new data overwriting the existing one. The problem I'm having is with finding only the new or old data so I can either A) download it and the images associated with it if it's new data or B) if it's data that is no longer in the database, remove it and delete the images. Like I said I've tried a bunch of things with data tables and arraylists, but my attempts have not been successful.

    Code:
    XML Layout---
    <properties>
    <details>
    <id>13515</id>
    <type>Single Family</type>
    <bedrooms>2</bedrooms>
    .....
    </details>
    <details>
    <id>1534</id>
    <type>Condo</type>
    <bedrooms>1</bedrooms>
    .....
    </properties>
    My attempt to remedy the issues I'm having was to read the current XML file at start (before writing the new one) and add the unique "id" field to a data table or arraylist. Then when I read the database, create another data table or arraylist as I'm writing the new XML. This would leave me with two sets of data: the old ids (real estate properties that were in the XML before connecting) and the new ids (real estate properties that were found when connecting to the database). Now once I have this data, here is what I need to do...

    1. Remove any records that have duplicates along with their originals. This would leave me with only properties that have changed. (if you've seen my other posts recently, this is what I've been trying to do but last night it hit me that the next 2 tasks are going to be more difficult to implement)

    2. Once I'm left with the rest of ids, I know that either A) they are old data that has been removed from the database or B) new data that has been added to the database since we last updated. Distinguishing between the 2 is my main problem. I already have the code set up to complete the tasks (download/delete images, etc), but my problem is making sure it's doing the correct thing with each id.

    Little more on my RETS connection: It has 1700+ properties and I'm pulling a lot more data than the example XML above. The client wishes to run the app in the morning, updating the XML for the site so it has all the latest properties. I've already spent many hours getting the "base" data into my directories, now I have to make sure it's only finding the new properties and removing old ones so it runs for minutes rather than hours and hours like it took to get the 10,000+ photos.

    One idea that has hit me as I'm typing this out is some kind of XML comparison. Would I be able to create 2 seperate XML files and compare/contrast them efficiently?

    Thanks a million (again) if anyone can help me.
  • mldisibio
    Recognized Expert New Member
    • Sep 2008
    • 191

    #2
    It's been awhile since I've manipulated XML data, but your synchronization task is common enough. I can give some general pointers.
    1. I Hope that whatever searching through new xml or old xml you do you are using XPath queries along with either the .NET Xml libraries or with the MSXML DOM. Given that your xml schema is fairly straightforward , you should be able to read each id from one file and find or not find it in the second file.
    Using System.Xml.Xpat h.XPathNavigato r.Select(...) (Framework) or XmlDomNode.Sele ctNodes(...) (MSXML) should eliminate any need for helper arrays or other structures...no t that it was a bad idea...but simply not necessary. Essentially, selecting nodes with an xpath query returns a node list of id's and/or as many other fields you want to store or compare.

    2. Consider this path for your synchronization :
    a. Remove any nodes from yesterday's file whose ID is not found in today's file.
    b. Create a third XmlDoc of new inserts: any node from today's file not in yesterday's file. As you add these nodes to the new xml doc, remove them completely from today's source file.
    c. At this point, yesterday's file and today's file should have the same nodes and id's. Now I am assuming that you have an option of comparing the data before doing an additional time-consuming photo retrieval? Because if you already have all the data, you are done.
    - If you can compare before retrieving more data:
    You need to figure out how to most efficiently compare them to see if today's data represents an update, but that will need to be done somehow. I suggest that as you finish comparing each node, you remove it from today's file, which reduces search time on that one, and optionally, you remove it from yesterday's file and add it to the third "new" data to reduce the size of yesterday's file as well.
    d. Finally, you simply combine the "new data" xml file with the "updated file". If you had to do a comparison, then yesterday's data is now the updated data (or optionally you already merged it one by one into the new file). If you skipped comparison because everything was already in today's file, then you just merge today's "updates" with the new "inserts".

    I realize this is rather high level, but I hope it helps somewhat. Once you start writing node comparisons, you can tweak performance by using the correct node readers/writers. If any of this is unfamiliar, you can read about it starting at: Process Xml Data Using the XPath Data Model

    Comment

    Working...