vb.net Fastest Data Storage/Access Method

  • csin
    New Member
    • Sep 2008
    • 6

    vb.net Fastest Data Storage/Access Method

    I wrote a program in VB.NET that uses a database of 120,000+ records. Each record has 12 fields and can contain quite a bit of data... Currently I have it set up to pull from an Access database, but it is SLOW. Searching and pulling up to 30,000 records at a time can take an hour or more.

    My question is... What is the fastest alternative to using an Access DB? I cannot use a SQL Server DB, as the end users will be pulling the data locally, not from a server. The users need to be able to edit and delete the records without it taking forever as well.

    Any help is greatly appreciated.
  • mldisibio
    Recognized Expert New Member
    • Sep 2008
    • 191

    #2
    Do a search on standalone database file formats that allow users to have a local database, such as SQLite.

    However, the key to any database performance is normalization and indexing.
    120,000 is not a lot of records. If they were indexed, a file routine could probably load, read, and search them in a text file within a few seconds.

    Access is not a great database, but since you are already using it, I would suggest reviewing the table structure, normalization, and indexes. That should increase performance dramatically.

    After that step, then you might look into competing database engines.
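    The indexing point can be seen directly in SQLite, one of the standalone engines mentioned above. Here is a minimal sketch using Python's built-in sqlite3 module; the table and column names are invented for illustration, and the same SQL would apply from VB.NET through an ADO.NET provider:

```python
import sqlite3

# Hypothetical standalone database: one table, searched on strProgramName.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tblPrograms (id INTEGER PRIMARY KEY, strProgramName TEXT, strVersion TEXT)")
con.executemany("INSERT INTO tblPrograms (strProgramName, strVersion) VALUES (?, ?)",
                [(f"App{i}", "1.0") for i in range(1000)])

# Without an index on the searched field, the engine scans the whole table.
before = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM tblPrograms WHERE strProgramName = 'App500'").fetchall()

# With an index on that field, the optimizer seeks directly into it.
con.execute("CREATE INDEX idxProgramName ON tblPrograms (strProgramName)")
after = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM tblPrograms WHERE strProgramName = 'App500'").fetchall()
print(after)  # the plan detail should now mention idxProgramName
```

The difference between a full scan and an index seek is exactly what separates an hour-long search from a sub-second one.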

    Comment

    • Curtis Rutland
      Recognized Expert Specialist
      • Apr 2008
      • 3264

      #3
      I'll just confirm what the previous poster said. 120,000 records is not a lot of records. We work on tables with 2.5 million records, and realistically, that's not a lot. Indexing is likely a large part of your problem.

      Comment

      • csin
        New Member
        • Sep 2008
        • 6

        #4
        They are indexed...


        My issue is most apparent when running a deduplication routine, where the app needs to find records that have fields 2, 4, 5, and 12 the same, but the others can be different...

        IE:

        1 2 3 4 5 6 7 8 9 10 11 12
        2 2 34 4 5 46 7 81 3 1 11 12
        14 15 16 17 18 19 20 21 22 23 24 25

        The data would be much longer, but for simplicity use the above... It would delete the second record above, because there is already one there that has that info in fields 2, 4, 5, and 12, but it would not delete both; it would perform this check to clean out all but one record for each distinct set of fields 2, 4, 5, and 12.

        Would it work to make a 13th field that combines fields 2, 4, 5, and 12 and then do a SELECT DISTINCT(field13) into a new table?
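        The deduplication described above can also be done in a single SQL statement, with no 13th field needed. A sketch run here against SQLite (column names f2, f4, f5, f12 are placeholders for the real field names; the same kind of subquery should work through ADO against Access, though I'd verify it on a copy of the data first):

```python
import sqlite3

# Placeholder table: an id plus the four identifying fields and one other field.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, f2, f4, f5, f12, other)")
con.executemany("INSERT INTO records (f2, f4, f5, f12, other) VALUES (?,?,?,?,?)", [
    (2, 4, 5, 12, "first"),
    (2, 4, 5, 12, "duplicate"),   # same f2, f4, f5, f12 -> should be removed
    (15, 17, 18, 25, "distinct"),
])

# Delete every row whose (f2, f4, f5, f12) combination already appears in an
# earlier row, keeping exactly one record per distinct combination.
con.execute("""
    DELETE FROM records
    WHERE id NOT IN (SELECT MIN(id) FROM records GROUP BY f2, f4, f5, f12)
""")
print(con.execute("SELECT COUNT(*) FROM records").fetchone()[0])  # 2
```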

        Comment

        • balabaster
          Recognized Expert Contributor
          • Mar 2007
          • 798

          #5
          I'd say that SQL Server or Oracle would be the ideal solutions in this case, but as they're not possible, that leaves some form of flat file configuration that would allow random access... It's a good question that raises a whole host of theoretical answers.

          Fixed length rows would allow simulation of this.
          You can query Excel documents using ADO, so using an Excel native document format could be a possibility...
          I believe you can also query CSV using ADO, although I've never done it or seen it done.
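          To illustrate the fixed-length-row idea: if every record is padded to the same byte width, record n starts at byte n * RECORD_SIZE, so a flat file supports random access without reading everything before it. A rough sketch (the record width and layout here are invented):

```python
import io

RECORD_SIZE = 64  # assumed fixed width in bytes per record

def write_record(f, n, text):
    # Pad the record with spaces out to the fixed width, then seek and write.
    f.seek(n * RECORD_SIZE)
    f.write(text.encode("ascii").ljust(RECORD_SIZE, b" "))

def read_record(f, n):
    # Jump straight to record n without touching any other record.
    f.seek(n * RECORD_SIZE)
    return f.read(RECORD_SIZE).decode("ascii").rstrip()

f = io.BytesIO()  # stands in for an open binary file on disk
write_record(f, 0, "Adobe Acrobat|6.0|PC")
write_record(f, 1, "Adobe Photoshop|7.0|PC")
print(read_record(f, 1))  # Adobe Photoshop|7.0|PC
```

The trade-off is wasted space from padding and no built-in indexing or querying, which is why a real embedded engine is usually the better choice.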

          Comment

          • mldisibio
            Recognized Expert New Member
            • Sep 2008
            • 191

            #6
            A 13th field is a possibility.
            But again, a correctly created index would prevent duplicates in the first place.
            If you were to create a unique index composed of the fields in question (fields 2,4,5,12 if I am understanding you correctly that you mean fields and not values) then
            - duplicate records would never be allowed in the first place
            - or, if you needed to allow them for some reason, the index would not be unique, but searching for that combination of fields should be much quicker.

            However, I understand that then you may need to add application code to explain to the user why their entry failed, and maybe you cannot do that now.

            You may also need to do a one time clean-up, in which case the 13th field or temporary pseudo-index tables can help.

            Is a permanent fix such as not allowing such duplicates a possibility for you?

            Again, not to criticize, but if you have data that requires a painfully slow search to eliminate odd combinations of duplicates, I would strongly recommend reviewing your data model to see if it can be normalized. For example, if fields 2, 4, 5, and 12 represent some kind of unique key, why not have a table with just those fields, where all the fields together make up a primary key, and then have your (current) 12-field table (now reduced to 8 fields) simply reference that unique key set with a foreign key?
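            The unique-index idea sketched against SQLite (column names are placeholders): the engine itself rejects the duplicate, and the application only has to catch the error and explain it to the user.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, f2, f4, f5, f12)")
# A unique composite index over the four identifying fields: the database
# now refuses any row duplicating an existing (f2, f4, f5, f12) combination.
con.execute("CREATE UNIQUE INDEX ux_records ON records (f2, f4, f5, f12)")

con.execute("INSERT INTO records (f2, f4, f5, f12) VALUES (2, 4, 5, 12)")
try:
    con.execute("INSERT INTO records (f2, f4, f5, f12) VALUES (2, 4, 5, 12)")
except sqlite3.IntegrityError as err:
    # This is where application code would explain the failed entry to the user.
    print("duplicate rejected:", err)
```

As a bonus, the same index makes searches on that combination of fields fast.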

            Comment

            • csin
              New Member
              • Sep 2008
              • 6

              #7
              Okay, the program I'm working on is an internal app for reviews of other applications. Each month an update will come out with newly reviewed applications.

              The way I have it now is like this:

              tblPrograms
              -id (index)
              -lngPlatformFK
              -strCompany
              -strProgramName
              -strVersion
              -strInfo
              -strURL
              -strLetter
              -strNew
              -strUpdate
              -strType
              -strMU

              tblPlatforms
              -strPlatform
              -strMU

              tblLetters
              -lngPlatformFK
              -strLetter
              -strMU

              tblUpdates
              -strUpdate


              strMU designates "(U)ser Database" or "(M)ain Database".

              The procedure I'm working on needs to make sure there are no duplicates in each of the two databases... Meaning it's okay to have one PC version of Adobe Acrobat 6 in the MainDB and one PC version of Adobe Acrobat 6 in the UserDB, but not two in the same database.

              And even though I have checks in place to make sure duplicates are not imported, I need to create a procedure to remove them in the event they get in there still somehow.
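              Such a clean-up pass can be one DELETE that groups by the identifying fields plus strMU, so one copy survives per database. A sketch using the posted tblPrograms column names (trimmed to a few fields for brevity), run here against SQLite:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE tblPrograms
               (id INTEGER PRIMARY KEY, strCompany, strProgramName, strVersion, strMU)""")
con.executemany(
    "INSERT INTO tblPrograms (strCompany, strProgramName, strVersion, strMU) VALUES (?,?,?,?)", [
        ("Adobe", "Acrobat", "6", "M"),
        ("Adobe", "Acrobat", "6", "M"),   # duplicate inside the Main DB -> removed
        ("Adobe", "Acrobat", "6", "U"),   # same program in the User DB -> kept
    ])

# Grouping includes strMU, so "one per database" is preserved: the lowest id
# in each (company, name, version, M/U) group survives, the rest are deleted.
con.execute("""
    DELETE FROM tblPrograms
    WHERE id NOT IN (SELECT MIN(id) FROM tblPrograms
                     GROUP BY strCompany, strProgramName, strVersion, strMU)
""")
print(con.execute("SELECT strMU FROM tblPrograms ORDER BY strMU").fetchall())
```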

              Comment

              • mldisibio
                Recognized Expert New Member
                • Sep 2008
                • 191

                #8
                Couple of comments:
                1. Having an ID indexed does not enhance any query performance (your original issue) if you are searching non-indexed fields. If you have a particularly slow search with four specific fields, then you need an index with just those four fields and make sure each of the four fields is in your WHERE clause (so the optimizer knows which index to use).
                2. You have strMU in three tables, yet you say this a defining characteristic of a unique record. I don't understand your structure completely, but I would aim for a structure where strMU is only in one table and is part of a unique index.
                3. I am not sure how much control you have over the incoming data. If you are importing some kind of data dumps over which you have no control, and then are responsible for sorting out duplicates, then yes, you need some different data tables and some good duplicate removal procedures.
                If, however, you control the incoming data at its origin, then you would be able to eliminate duplicates with properly created keys/indexes, and there would be no need for clean-up. However, I suspect your case is the first one, am I correct?
                4. One suggestion would be to revamp your import process. Perhaps set up two tables similar to what you have, but one for M and one for U, with a unique index on ALL the fields which constitute a unique record. Then import your data dumps, and handle "duplicate key" exceptions by tossing out those records. Finally, you can merge the M and U tables back into a reporting table similar to what you have. Separate good "normalized" tables which you control from "reporting" tables which your end-users (or apps) want to see.
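                Point 4 might look roughly like this, sketched against SQLite, where INSERT OR IGNORE tosses duplicate-key rows for you; from VB.NET against another engine you would catch the duplicate-key exception per row instead. Table and column names are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Staging table for the "M" dump, with a unique index over ALL the fields
# that constitute a unique record.
con.execute("CREATE TABLE importM (strCompany, strProgramName, strVersion)")
con.execute("CREATE UNIQUE INDEX ux_importM ON importM (strCompany, strProgramName, strVersion)")

dump = [("Adobe", "Acrobat", "6"),
        ("Adobe", "Acrobat", "6"),    # duplicate row in the incoming dump
        ("Adobe", "Photoshop", "7")]

# INSERT OR IGNORE silently drops rows that violate the unique index,
# so only one copy of each record lands in the staging table.
con.executemany("INSERT OR IGNORE INTO importM VALUES (?,?,?)", dump)
print(con.execute("SELECT COUNT(*) FROM importM").fetchone()[0])  # 2
```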

                Comment

                • csin
                  New Member
                  • Sep 2008
                  • 6

                  #9
                  Originally posted by mldisibio
                  strMU is in 3 of the tables because there are letter records for each of the two databases separately, there are programs for each of them separately, and there are platforms for each of them separately...

                  A big part of the reason I need to come up with a dup checker in the first place is because the DB they gave me to start with has 124,000 records, and about 25%+ are duplicates already that I need to filter out.

                  Any other suggestions on the structure?

                  Comment
