VB .NET Read CSV into Access DB Really Slow

  • cellus205
    New Member
    • Sep 2008
    • 13

    VB .NET Read CSV into Access DB Really Slow

    How's it going, everyone. I'm still pretty new to VB .NET and am having a little trouble. I am loading a tab-delimited txt file into an Access database and parsing the text to extract some of the needed information from each field, and while this works, it is really slow right now. Does anyone have any suggestions on how I could speed this up? The file I'm trying to load at the moment has about 100k rows.

    Code:
     Dim fs As New IO.FileStream(openFileDialog1.FileName, IO.FileMode.Open)
     Dim sr As New IO.StreamReader(fs)
     ' Read the whole file and split it into lines on LF
     Dim x() As String = sr.ReadToEnd().Split(ControlChars.Lf)
     sr.Close()
     Dim docno() As String
     Dim doc As String
     Dim file As String
     Dim fileloc As Integer
     Dim J As Integer = 0
     While J < x.Length - 1
         ' TrimEnd strips the CR left over when a CRLF file is split on LF
         docno = x(J).TrimEnd(ControlChars.Cr).Split(ControlChars.Tab)
         ' Find the last backslash so the file name can be isolated
         If docno(2).Contains("\") Then
             fileloc = docno(2).LastIndexOf("\"c)
         Else
             fileloc = 0
         End If
 
         file = docno(2).Substring(fileloc)
 
         ' Strip embedded quote characters from the file name
         While file.Contains(Chr(34))
             file = file.Remove(file.IndexOf(Chr(34)), 1)
         End While
 
         ' Strip any remaining backslashes
         While file.Contains("\")
             file = file.Remove(file.IndexOf("\"), 1)
         End While
 
         doc = docno(3)
         ' Strip quote characters from the document number
         While doc.Contains(Chr(34))
             doc = doc.Remove(doc.IndexOf(Chr(34)), 1)
         End While
         ' One Insert plus one Update per row means a database
         ' round trip for every line -- the likely bottleneck
         Table1TableAdapter.Insert(doc, file, "Member of Txt", "0")
         Table1TableAdapter.Update(TestdbDataSet.Table1)
 
         J = J + 1
     End While
  • stoogots2
    New Member
    • Sep 2007
    • 77

    #2
    I know indexes and not having the table lock set to on (in SQL Server) affect performance. I wrote something using the SqlBulkCopy class with the table-lock option: it read a 64 MB tab-delimited file (200,000+ rows), built a DataTable with all the rows, and finally copied them to the server, all in 13 seconds (running in Debug mode on my laptop against an old SQL Server). How long is yours taking?
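
    For reference, a minimal sketch of that approach against a SQL Server destination; the connection string, file name, table name, and column names here are all hypothetical, not the OP's schema:

    Code:
     Imports System.Data
     Imports System.Data.SqlClient

     Module BulkLoad
         Sub Main()
             ' Build a DataTable matching the destination schema
             ' (column names are assumptions)
             Dim dt As New DataTable()
             dt.Columns.Add("DocNo", GetType(String))
             dt.Columns.Add("FileName", GetType(String))

             For Each textLine As String In IO.File.ReadAllLines("input.txt")
                 Dim fields() As String = textLine.Split(ControlChars.Tab)
                 dt.Rows.Add(fields(3), fields(2))
             Next

             ' TableLock takes one table-level lock for the whole load
             ' instead of locking row by row
             Using bulk As New SqlBulkCopy("Data Source=.;Initial Catalog=Testdb;Integrated Security=True", SqlBulkCopyOptions.TableLock)
                 bulk.DestinationTableName = "Table1"
                 bulk.WriteToServer(dt)
             End Using
         End Sub
     End Module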


    • Plater
      Recognized Expert Expert
      • Apr 2007
      • 7872

      #3
      I used the MS Jet OLE DB provider to read in the CSV file (56k records takes about 3-5 seconds to read into a DataTable).
      Then you can do with the DataTable as you please.
      EDIT: I would be interested in learning about this table-lock and bulk-copy concept, though.
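
      A minimal sketch of that Jet text-driver approach; the folder path and file name are hypothetical, and a schema.ini in the same folder can declare the tab delimiter:

      Code:
       Imports System.Data
       Imports System.Data.OleDb

       Module JetCsvRead
           Sub Main()
               ' With the Jet text driver the "database" is the folder
               ' and the "table" is the file inside it
               Dim connStr As String = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
                   "Data Source=C:\data\;" & _
                   "Extended Properties=""text;HDR=No;FMT=TabDelimited"""

               Dim dt As New DataTable()
               Using da As New OleDbDataAdapter("SELECT * FROM [input.txt]", connStr)
                   da.Fill(dt)   ' one bulk read instead of row-by-row inserts
               End Using

               Console.WriteLine("Rows loaded: " & dt.Rows.Count)
           End Sub
       End Module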


      • cellus205
        New Member
        • Sep 2008
        • 13

        #4
        Thanks for the replies, guys. Well, right now it's actually reading the file into an array using a StreamReader, splitting on each new line, and I'm guessing that's probably what's taking so long? After reading it into an array, it's splitting each array string into what it needs to insert into Access, and inserting the rows into Access through the TableAdapter.
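
        One note on that last step: calling Update after every single Insert forces a database round trip per row. A hedged sketch of batching instead, reusing the variable and designer-generated names from the code above (the column order passed to Rows.Add is an assumption):

        Code:
         ' Sketch: add every parsed row to the in-memory table first...
         Dim J As Integer = 0
         While J < x.Length - 1
             Dim docno() As String = x(J).TrimEnd(ControlChars.Cr).Split(ControlChars.Tab)
             Dim doc As String = docno(3).Replace("""", "")
             Dim file As String = IO.Path.GetFileName(docno(2).Replace("""", ""))
             ' Add to the in-memory DataTable only -- no round trip yet
             TestdbDataSet.Table1.Rows.Add(doc, file, "Member of Txt", "0")
             J = J + 1
         End While
         ' ...then one Update pushes every pending row to Access in a single pass
         Table1TableAdapter.Update(TestdbDataSet.Table1)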


        • cellus205
          New Member
          • Sep 2008
          • 13

          #5
          And I also forgot to add: it's basically taking longer than I'd be able to wait, lol. The longest I waited was about 15 min.


          • cellus205
            New Member
            • Sep 2008
            • 13

            #6
            Anyone have any suggestions?


            • Plater
              Recognized Expert Expert
              • Apr 2007
              • 7872

              #7
              Try using the Jet OLE DB provider to read in the CSV instead?


              • cellus205
                New Member
                • Sep 2008
                • 13

                #8
                Well, I'm actually parsing the data before I submit it to Access, because the Access table is bound to a DataGrid. I think that using OLE DB to read in the CSV would just read the whole CSV into Access, right?


                • Plater
                  Recognized Expert Expert
                  • Apr 2007
                  • 7872

                  #9
                  It would read it into a DataTable or a DataReader object. You can parse the contents as you wish.
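
                  For the reader route, a small sketch that parses each row as it streams in, reusing the hypothetical Jet text connection string from the earlier sketch (with HDR=No the Jet text driver names the columns F1, F2, F3, ...):

                  Code:
                   ' Requires Imports System.Data.OleDb; connStr is the
                   ' hypothetical Jet text connection string from above
                   Using conn As New OleDbConnection(connStr)
                       conn.Open()
                       Using cmd As New OleDbCommand("SELECT * FROM [input.txt]", conn)
                           Using reader As OleDbDataReader = cmd.ExecuteReader()
                               While reader.Read()
                                   ' Parse each field as it streams past;
                                   ' no whole-file array held in memory
                                   Dim rawPath As String = reader(2).ToString()
                                   Dim fileName As String = IO.Path.GetFileName(rawPath.Replace("""", ""))
                               End While
                           End Using
                       End Using
                   End Using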


                  • Curtis Rutland
                    Recognized Expert Specialist
                    • Apr 2008
                    • 3264

                    #10
                    Basically, what Plater is telling you is to treat the flat file as a database and use OLE DB to connect to it. OLE DB has nothing to do with Access per se... the Jet engine does, but it has more uses than just Access. It is also used to access Excel and flat files.

                    I don't see what you are using Access for anyway, unless you are storing the data there. If you are just trying to get some values out, you can keep it all in memory and filter your DataTable object.
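
                    If display and filtering are all that's needed, a DataView over the loaded DataTable skips Access entirely; a sketch assuming dt was filled by the Jet text driver with HDR=No (so the third field is named F3) and a hypothetical grid control:

                    Code:
                     ' Filter in memory instead of round-tripping through Access
                     Dim view As New DataView(dt)
                     view.RowFilter = "F3 LIKE '%.txt'"   ' keep only rows of interest
                     DataGridView1.DataSource = view      ' bind straight to the grid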


                    • cellus205
                      New Member
                      • Sep 2008
                      • 13

                      #11
                      Ok, I understand. I tried it a few other ways, including using this CSV reader class: http://www.codeproject.com/KB/database/CsvReader.aspx?msg=2629921#xx2629921xx

                      There are actually paths and filenames in each row, and I'm removing the path and any characters other than the filename from each row. Do you think performing those operations on every row would slow it down?
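
                      For what it's worth, that per-row string work is cheap next to the per-row database writes, and IO.Path.GetFileName plus Replace can do it in one step; a small sketch (rawField stands in for a quoted path field like docno(2) in the code above):

                      Code:
                       ' Extract the bare file name from a quoted path field
                       Dim rawField As String = """C:\docs\archive\report.txt"""
                       Dim fileName As String = IO.Path.GetFileName(rawField.Replace("""", ""))
                       ' fileName is now "report.txt" -- one call instead of Remove loops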
