Very large dataset

  • eladla

    Very large dataset

    Hi!
    I am currently working on a project where I need to query the database, and
    the query returns around 1M rows that I then need to process.
    I'm currently holding it all in a DataTable, but I have a feeling that this
    type is very inefficient.
    Does anyone have an idea of a more efficient way of holding this large
    amount of data?
  • Marc Gravell

    #2
    Re: Very large dataset

    1M? I'd be keeping them in the database, then... but if your app *must* have
    them on hand, then I'd look at writing my own custom classes populated with
    a data-reader.

    Marc
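The "custom classes populated with a data-reader" idea can be sketched outside .NET as well. The thread is about ADO.NET's SqlDataReader in C#; the following is a minimal Python illustration of the same shape using the stdlib sqlite3 module, with a hypothetical `readings` table and `Reading` record type standing in for the real schema: stream rows from an open cursor into small, purpose-built objects instead of materializing everything in a DataTable.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical lightweight row type: holds only the fields you need,
# the moral equivalent of Marc's "custom classes".
@dataclass
class Reading:
    user_id: int
    value: float

def stream_readings(conn):
    # The cursor streams rows one at a time, like a forward-only data
    # reader, rather than buffering the whole result set in memory.
    cur = conn.execute("SELECT user_id, value FROM readings")
    for user_id, value in cur:
        yield Reading(user_id, value)

# Demo on an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (user_id INTEGER, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [(1, 10.0), (1, 12.5), (2, 7.0)])
total = sum(r.value for r in stream_readings(conn))
print(total)  # 29.5
```

Because `stream_readings` is a generator, peak memory stays flat no matter how many rows the query returns.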



    • sloan

      #3
      Re: Very large dataset

      Youch.

      Choice #1
      Use an IDataReader to loop over them, and do your stuff "per record".

      Choice #2
      Use an IDataReader and push the rows into little objects (as per the
      previous post), and work with them there.

      How do you do this? See this blog entry:

      5/24/2006
      Custom Objects/Collections and Tiered Development
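Choice #1 ("per record") can be sketched briefly. This is a Python/sqlite3 stand-in for the .NET IDataReader loop the post describes, with a hypothetical `readings` table; the per-record work here is just a running sum:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (user_id INTEGER, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [(1, 10.0), (2, 7.0), (2, 3.0)])

# Choice #1: iterate the open cursor and do your stuff "per record".
# Only one row is held at a time, so memory does not grow with the
# size of the result set.
running_total = 0.0
for user_id, value in conn.execute("SELECT user_id, value FROM readings"):
    running_total += value  # per-record work goes here

print(running_total)  # 20.0
```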





      • Kevin Yu [MSFT]

        #4
        RE: Very large dataset

        Hi,

        I agree with Marc that we should keep the data in the database instead
        of filling it into a dataset. In addition, if the processing can be done
        on the server side, you can write some stored procedures and call them
        using a SqlCommand object. That way, no data is transferred from server
        to client, which saves a lot of resources.
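The server-side idea is that the set-based work runs inside the database engine and only a small summary crosses the connection. A minimal sketch, again in Python/sqlite3 with a hypothetical `readings` table (in SQL Server this would be a stored procedure invoked via SqlCommand):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (user_id INTEGER, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [(1, 10.0), (1, 12.5), (2, 7.0)])

# The aggregation runs entirely inside the engine; only the tiny
# per-user summary crosses the connection, not the 1M source rows.
rows = conn.execute(
    "SELECT user_id, SUM(value) FROM readings "
    "GROUP BY user_id ORDER BY user_id"
).fetchall()
print(rows)  # [(1, 22.5), (2, 7.0)]
```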

        If you can give some detailed information about how you're going to process
        these data, we might be able to give some more advice.

        Kevin Yu
        Microsoft Online Community Support


        (This posting is provided "AS IS", with no warranties, and confers no
        rights.)


        • eladla

          #5
          RE: Very large dataset

          Thank you all for your replies.
          I will elaborate on my specific situation...
          I retrieve the data from the database into a DataTable, then iterate
          through it and copy each block of rows (blocks are divided by user id)
          to a struct, and send an array of these structs to a DLL.
          I could get the data and send it block by block... but that would
          drive up the number of calls to the SQL server. I'm afraid this will
          not scale well in a web environment with many concurrent users; on the
          other hand, my current implementation is not working out for me either.
          I would be happy to get your thoughts on this.
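One middle ground between "one DataTable" and "one query per block": run a single query ordered by user id, stream it, and cut it into per-user blocks as the rows go by. A Python/sqlite3 sketch of that pattern, where `send_block` is a hypothetical stand-in for marshalling one user's struct array to the DLL:

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (user_id INTEGER, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [(1, 10.0), (2, 7.0), (1, 12.5), (2, 3.0)])

def send_block(user_id, block):
    # Stand-in for handing one user's struct array to the DLL.
    return user_id, len(block)

# One round trip, ordered by user_id; groupby cuts the stream into
# consecutive per-user blocks, so only one block is in memory at a time.
cur = conn.execute("SELECT user_id, value FROM readings ORDER BY user_id")
results = [send_block(uid, list(rows))
           for uid, rows in groupby(cur, key=itemgetter(0))]
print(results)  # [(1, 2), (2, 2)]
```

Note `groupby` only groups consecutive rows, so the `ORDER BY user_id` is essential.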


          • Marc Gravell

            #6
            Re: Very large dataset

            OK, so what is the purpose... roughly. For what reason does the dll need 1M
            rows? Basically, this isn't a scalable model, and I would be looking to
            understand why this step is required...

            For instance: could the entire thing be done inside the database?
            Still lots of IO (reads), but less network traffic (though it uses
            the CPU of the SQL server, which precludes scale-out).
            Alternatively, would it be possible to dump the data into e.g. a CSV
            file every [time interval determined by scenario], and have the
            callers obtain this file for processing? That still uses network IO
            to transfer the file, but doesn't hammer the database each time...

            By preference, avoid the need to query millions of rows... maybe for
            data import/export, and possibly some reporting scenarios (counter
            with OLAP)... but not for day-to-day processing on a transactional
            database.
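The periodic-CSV-dump idea is a few lines in any language. A Python/sqlite3 sketch (hypothetical `readings` table; a real run would write to a file on a share that callers fetch, rather than an in-memory buffer):

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (user_id INTEGER, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [(1, 10.0), (2, 7.0)])

# Dump the query once on a schedule; callers then read the file
# instead of re-running the big query against the transactional DB.
buf = io.StringIO()  # stand-in for open("dump.csv", "w", newline="")
writer = csv.writer(buf)
writer.writerow(["user_id", "value"])
writer.writerows(conn.execute("SELECT user_id, value FROM readings"))
print(buf.getvalue().splitlines()[0])  # user_id,value
```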

            Marc



            • eladla

              #7
              Re: Very large dataset

              This process is done once at initial system installation and then
              again about once every 30 days.
              The problem is that even running once a month, this process takes
              too long and is too memory- and CPU-intensive for a production DB.
              I'm trying to find ways to make it perhaps longer but less
              resource-consuming, or maybe more intensive but not as long...
              I'm guessing longer but less resource-consuming will be easier to
              achieve.
              I'll try to create my own data-storing class implementing
              IDataReader, as you suggested.


              • Shailen Sukul

                #8
                Re: Very large dataset

                I would just use a DataReader to retrieve the rows, preferably
                with a SP, and call the dll with one row at a time. That will
                certainly improve your memory consumption, but I am afraid that
                the network traffic will be unavoidable.
                Also, couple this with an automatic dispose of the datareader to
                release resources ASAP, like so:

                using (IDataReader dr = ...)
                {
                    while (dr.Read())
                    {
                        // call the dll with each row that was retrieved
                    }
                }

                Hope that helps.



