Dataset. Most efficient approach.

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • bob

    Dataset. Most efficient approach.

    Hi,
    My app needs to read a text file, compare the data to that already
    stored in the DB and generate a text file of the differences.

    The UI displays the text file data and the db data in a series of
    Datagridviews.
    Piclk a row in the text file master table dgv and the other dgvs move
    to the appropriate records so you can eyeball the differences before
    generating the difference file.

    I have used a dataset that contains two 'arms'
    The text file arm and the database arm.
    BindingSources are used to make the dataset relationships available
    to the code.

    Works OK but when I feed it the full text file it takes a long time to
    do the comparison work.
    Profiling shows the most expensive part is moving through the dataset
    with the bindingSource.M ovenext.

    So I figured that maybe using the threadpool to do the iteration and
    comparison would speed things up.

    Sort of
    while(position + 1 < bindingSource.C ount)
    {
    assign all the bindingsources and the dataset to a helperclass object
    'u'
    ThreadPool.Queu eUserWorkItem(n ew WaitCallback (MyComparisionF unction),
    u);
    MasterBindingSo urce.MoveNext() ;
    position++;
    }

    However the comparision code fails when it tries to get one of the
    datarelaionship s back from one of the bindingsources. "Relation not
    found"
    Intellisense indicates it is there but you can't argue with the
    executing code.
    Maybe it is lost in the casting from object to helper class .

    Anyway I started to get the feeling that maybe my whole approach is
    inefficient.

    So before I go completely off into the weeds what is the general
    opinion on how to handle this task?

    I am starting to think that maybe two collections of widgets maybe the
    way to go. Then use tree views instead of dgvs

    textfile -Collection A.
    db -Collection B. (one B widget for each widget in A)
    for( int i =0;i<A.Count;i+ +)
    {
    if (A.items[i] != B.items[i]
    AddtoDiffFile(A .Items[i])
    }

    Thanks
    Bob

  • Nicholas Paldino [.NET/C# MVP]

    #2
    Re: Dataset. Most efficient approach.

    Bob,

    Are you using a database server for this? You might want to consider
    uploading the contents of the text file into a temp tabl on the server and
    then find the differences using a query. I would say that it has the
    potential of being much faster than you doing all the parsing and comparison
    on the client side.

    This would especially be the case if the text file is a delimited or
    positional file of some sort, as you can use the bulk loader on SQL Server
    (if that is what you are using) to import the data.


    --
    - Nicholas Paldino [.NET/C# MVP]
    - mvp@spam.guard. caspershouse.co m

    "bob" <startatbob_cle gg@cutthis.adri ley.co.nzwrote in message
    news:m1ms83hrks m2bhp97r49pmocc 8v4g3htvn@4ax.c om...
    Hi,
    My app needs to read a text file, compare the data to that already
    stored in the DB and generate a text file of the differences.
    >
    The UI displays the text file data and the db data in a series of
    Datagridviews.
    Piclk a row in the text file master table dgv and the other dgvs move
    to the appropriate records so you can eyeball the differences before
    generating the difference file.
    >
    I have used a dataset that contains two 'arms'
    The text file arm and the database arm.
    BindingSources are used to make the dataset relationships available
    to the code.
    >
    Works OK but when I feed it the full text file it takes a long time to
    do the comparison work.
    Profiling shows the most expensive part is moving through the dataset
    with the bindingSource.M ovenext.
    >
    So I figured that maybe using the threadpool to do the iteration and
    comparison would speed things up.
    >
    Sort of
    while(position + 1 < bindingSource.C ount)
    {
    assign all the bindingsources and the dataset to a helperclass object
    'u'
    ThreadPool.Queu eUserWorkItem(n ew WaitCallback (MyComparisionF unction),
    u);
    MasterBindingSo urce.MoveNext() ;
    position++;
    }
    >
    However the comparision code fails when it tries to get one of the
    datarelaionship s back from one of the bindingsources. "Relation not
    found"
    Intellisense indicates it is there but you can't argue with the
    executing code.
    Maybe it is lost in the casting from object to helper class .
    >
    Anyway I started to get the feeling that maybe my whole approach is
    inefficient.
    >
    So before I go completely off into the weeds what is the general
    opinion on how to handle this task?
    >
    I am starting to think that maybe two collections of widgets maybe the
    way to go. Then use tree views instead of dgvs
    >
    textfile -Collection A.
    db -Collection B. (one B widget for each widget in A)
    for( int i =0;i<A.Count;i+ +)
    {
    if (A.items[i] != B.items[i]
    AddtoDiffFile(A .Items[i])
    }
    >
    Thanks
    Bob
    >

    Comment

    • bob

      #3
      Re: Dataset. Most efficient approach.


      Hi Nicholas,
      Good thought.
      I'll give it a go.
      Thanks
      Bob

      On Fri, 6 Jul 2007 12:00:52 -0400, "Nicholas Paldino [.NET/C# MVP]"
      <mvp@spam.guard .caspershouse.c omwrote:
      >Bob,
      >
      Are you using a database server for this? You might want to consider
      >uploading the contents of the text file into a temp tabl on the server and
      >then find the differences using a query. I would say that it has the
      >potential of being much faster than you doing all the parsing and comparison
      >on the client side.
      >
      This would especially be the case if the text file is a delimited or
      >positional file of some sort, as you can use the bulk loader on SQL Server
      >(if that is what you are using) to import the data.

      Comment

      • mark4asp

        #4
        Re: Dataset. Most efficient approach.

        On Sat, 07 Jul 2007 03:47:33 +1200, bob
        <startatbob_cle gg@cutthis.adri ley.co.nzwrote:
        >Hi,
        >My app needs to read a text file, compare the data to that already
        >stored in the DB and generate a text file of the differences.
        >
        >The UI displays the text file data and the db data in a series of
        >Datagridview s.
        >Piclk a row in the text file master table dgv and the other dgvs move
        >to the appropriate records so you can eyeball the differences before
        >generating the difference file.
        >
        >I have used a dataset that contains two 'arms'
        >The text file arm and the database arm.
        >BindingSourc es are used to make the dataset relationships available
        >to the code.
        >
        >Works OK but when I feed it the full text file it takes a long time to
        >do the comparison work.
        >Profiling shows the most expensive part is moving through the dataset
        >with the bindingSource.M ovenext.
        >
        >So I figured that maybe using the threadpool to do the iteration and
        >comparison would speed things up.
        >
        >Sort of
        >while(positi on + 1 < bindingSource.C ount)
        >{
        >assign all the bindingsources and the dataset to a helperclass object
        >'u'
        >ThreadPool.Que ueUserWorkItem( new WaitCallback (MyComparisionF unction),
        >u);
        >MasterBindingS ource.MoveNext( );
        >position++;
        >}
        >
        >However the comparision code fails when it tries to get one of the
        >datarelaionshi ps back from one of the bindingsources. "Relation not
        >found"
        >Intellisense indicates it is there but you can't argue with the
        >executing code.
        >Maybe it is lost in the casting from object to helper class .
        >
        >Anyway I started to get the feeling that maybe my whole approach is
        >inefficient.
        >
        >So before I go completely off into the weeds what is the general
        >opinion on how to handle this task?
        >
        >I am starting to think that maybe two collections of widgets maybe the
        >way to go. Then use tree views instead of dgvs
        >
        >textfile -Collection A.
        >db -Collection B. (one B widget for each widget in A)
        >for( int i =0;i<A.Count;i+ +)
        >{
        >if (A.items[i] != B.items[i]
        > AddtoDiffFile(A .Items[i])
        >}
        >
        >Thanks
        >Bob
        I've never found datasets to be very efficient in using resources. I
        prefer to use a custom list like a List<T>, but there are better
        Generics such as DataBindingList <T(but I think these are all custom).

        I can't understand Nicholas' solution as you specifically say the
        differences need to be eye-balled by the client.

        How big are these datasets anyway? You could do it entirely on the
        client by downloading the data to javascript arrays or objects and
        posting back only the differences using AJAX.

        Comment

        • mark4asp

          #5
          Re: Dataset. Most efficient approach.

          On Sat, 07 Jul 2007 08:55:34 GMT, mark4asp <mark4asp@gmail .comwrote:
          >Datagridview s
          javascript ... AJAX.
          - Ooops. Sorry. I didn't fully read you post. But I still don't like
          using datasets.

          Comment

          • bob

            #6
            Re: Dataset. Most efficient approach.

            Hi Mark,
            My take on Nicholas's solution is that the database engine is more
            efficent at set based comparisions than client code iterating through
            the dataset. The result set could be displayed giving an ordinary list
            of the differences.
            On the client side I too lean towards lists of custom objects.

            This project started life as a quick and dirty proof of concept.
            The main concern was to see if the resultant 'difference' files would
            be accepted by another app that maintains the database further
            downstream.

            The idea of the client being able to view the differences was based on
            the notion that there wouldn't be too many differences. It was a bit
            of a shock to to find a large disparity between the two sets.
            So the point and click advantages of the linked Datagridviews, while
            pretty, is not much use.
            30000 parent records, approx 70% differences.

            Thanks for your thoughts.
            Regards
            Bob


            >
            >I've never found datasets to be very efficient in using resources. I
            >prefer to use a custom list like a List<T>, but there are better
            >Generics such as DataBindingList <T(but I think these are all custom).
            >
            >I can't understand Nicholas' solution as you specifically say the
            >differences need to be eye-balled by the client.
            >
            >How big are these datasets anyway? You could do it entirely on the
            >client by downloading the data to javascript arrays or objects and
            >posting back only the differences using AJAX.

            Comment

            Working...