Load large files

  • platinumbay

    Load large files

    I am trying to load large flat files (over 250 MB). What is the best
    practice for this? I have already read through everything I could find on
    Google, to no avail.

    I have a SQL DTS process that will load the file in about 7 minutes. My
    code takes about an hour.

    Here is a snippet:

    Dim objSR As StreamReader = System.IO.File.OpenText(objOrigFile)
    Dim curEncoding As System.Text.Encoding = objSR.CurrentEncoding

    Dim loopX As Integer = 0
    Do While objSR.Peek > -1
        loopX += 1

        Dim strRec As String = objSR.ReadLine
        arrA = Split(strRec, coldel)

        Try
            ' Stop at the trailer record; a file holding only a trailer is empty.
            If trailerSigHas Then
                If getRecordData(trailerSigPos - 1).ToString = trailerSig Then
                    If loopX < 2 Then
                        Throw New Exception("Empty extract file")
                    Else
                        Exit Do
                    End If
                End If
            End If

            Dim tmpLngth As Integer = arrA.Length
            Dim tmpColCount As Integer = iNode.Item("columns").ChildNodes.Count ' objDS.Tables(0).Columns.Count
            If tmpLngth > tmpColCount Then
                Throw New Exception("Too many columns in extract row.")
            ElseIf tmpLngth < tmpColCount Then
                Throw New Exception("Not enough columns in extract row.")
            End If

            Dim newRow As DataRow = objDS.Tables(0).NewRow

            For x As Integer = 0 To objDS.Tables(0).Columns.Count - 1
                Dim tmpSqlCol As String = objDS.Tables(0).Columns(x).Caption
                Dim tmpNode As XmlNode = iNode.SelectSingleNode("//columns/column[@sqlcol='" & tmpSqlCol & "']")
                '"//columns/column[sqlcol='" & tmpSqlCol & "']")
                Dim attrID As String = tmpNode.Attributes("id").InnerText
                Dim xmlCol As Integer = Convert.ToInt32(attrID)
                Dim arrVal As Object = getRecordData(xmlCol - 1)
                newRow(x) = GetObjectValue(x, arrVal) ' Does a Convert to the target column type
            Next

            objDS.Tables(0).Rows.Add(newRow)
        Catch ex As Exception
            ' Any failure above routes the raw row into the error table.
            Dim newRow As DataRow = objDS.Tables(1).NewRow
            Dim finalX As Integer = 0
            For x As Integer = 0 To objDS.Tables(1).Columns.Count - 1
                Try
                    Dim strColCaption As String = objDS.Tables(1).Columns(x).Caption
                    Dim tmpINode As XmlNode = iNode.SelectSingleNode("//columns/column[@sqlcol='" & strColCaption & "']")
                    Dim attr2ID As String = tmpINode.Attributes("id").InnerText
                    Dim sqlColInt As Integer = Convert.ToInt32(attr2ID)
                    If x = 0 Then
                        newRow(x) = Left(getRecordData(sqlColInt - 1).ToString, 8000)
                    Else
                        newRow(x) = Left(getRecordData(sqlColInt - 1).ToString, 250)
                    End If
                Catch ' non-existent column
                Finally
                    finalX = x
                End Try
            Next
            newRow(finalX - 4) = Left(ex.Message, 150)
            newRow(finalX - 3) = DateTime.Now
            newRow(finalX - 2) = tblName
            newRow(finalX - 1) = strFileName
            newRow(finalX) = loopX
            objDS.Tables(1).Rows.Add(newRow)
        End Try
    Loop
    objSR.Close()
    objSR = Nothing
    End If ' closes an If begun above this snippet

    objConn.Open()
    Dim rowCount As Integer = objDS.Tables(0).Rows.Count
    Dim rowCount2 As Integer = objDS.Tables(1).Rows.Count
    Dim updCount As Integer

    If objConn.State <> ConnectionState.Closed Then objConn.Close()
    updCount = objDAmain.Update(objDS, tblName)

    If objConn.State <> ConnectionState.Closed Then objConn.Close()
    updCount = objDAerr.Update(objDS, tblName & "_err")

    objConn.Close()

    objDS = Nothing

    Thank you very much,

    Steve


  • Peter Huang [MSFT]

    #2
    RE: Load large files

    Hi,

    I think you may try using a FileStream to see if that helps you. For the
    general approach, see: File Merger and Splitter in C#
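
    A minimal sketch of the idea (the file path and the 1 MB buffer size are
    just example values, not anything from your code):

    ' Wrap the FileStream in a StreamReader with a large buffer so line
    ' reads hit the disk in big sequential chunks.
    Imports System.IO
    Imports System.Text

    Module ReadLargeFile
        Sub Main()
            Using fs As New FileStream("C:\data\extract.txt", FileMode.Open, _
                                       FileAccess.Read, FileShare.Read, 1048576)
                Using sr As New StreamReader(fs, Encoding.Default, True, 1048576)
                    Dim line As String = sr.ReadLine()
                    Do While line IsNot Nothing
                        ' parse/validate the line here
                        line = sr.ReadLine()
                    Loop
                End Using
            End Using
        End Sub
    End Module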


    Best regards,

    Peter Huang
    Microsoft Online Partner Support

    Get Secure! - www.microsoft.com/security
    This posting is provided "AS IS" with no warranties, and confers no rights.


    • platinumbay

      #3
      Re: Load large files

      I am loading the DataTable with the schema so I can validate that the
      input file has the correct datatypes, row by row, column by column.


      "Cor Ligthert" <notmyfirstname @planet.nl> wrote in message
      news:uM5Ay8AFFH A.3940@tk2msftn gp13.phx.gbl...[color=blue]
      > Planinumbay.
      >
      > Maybe I see something wrong, what gives me the question why you are[/color]
      creating[color=blue]
      > a datatable and not just do a batch proccess,
      >
      > In pseudo
      > \\\
      > Create Insert string
      > Set parameters
      > do
      > ReadLine
      > set parameter.value s
      > command.execute nonquery
      > loop
      > ////
      >
      >[/color]
      http://msdn.microsoft.com/library/de...querytopic.asp[color=blue]
      >
      > Cor
      >
      >[/color]
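
      For reference, Cor's pseudocode as concrete VB.NET. A sketch only: the
      table, columns, delimiter, file path, and connection string are all
      placeholder assumptions.

      ' One prepared INSERT; only the parameter values change per line,
      ' so no DataSet is built up in memory.
      Imports System.IO
      Imports System.Data
      Imports System.Data.SqlClient

      Module BatchInsert
          Sub Main()
              Using cn As New SqlConnection("Server=.;Database=Staging;Integrated Security=SSPI")
                  cn.Open()
                  Dim cmd As New SqlCommand("INSERT INTO Extract (Col1, Col2) VALUES (@c1, @c2)", cn)
                  cmd.Parameters.Add("@c1", SqlDbType.VarChar, 50)
                  cmd.Parameters.Add("@c2", SqlDbType.Int)
                  cmd.Prepare()

                  Using sr As New StreamReader("C:\data\extract.txt")
                      Dim line As String = sr.ReadLine()
                      Do While line IsNot Nothing
                          Dim fields() As String = line.Split(","c)
                          cmd.Parameters("@c1").Value = fields(0)
                          cmd.Parameters("@c2").Value = Convert.ToInt32(fields(1))
                          cmd.ExecuteNonQuery() ' one round trip per row
                          line = sr.ReadLine()
                      Loop
                  End Using
              End Using
          End Sub
      End Module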



      • Cor Ligthert

        #4
        Re: Load large files

        Platinumbay,

        > I am loading the DataTable with the schema so I can validate that
        > the input file has the correct datatypes, row by row, column by
        > column.

        Setting the values in the SqlCommand parameters does that as well.

        However, if the schema is what should check for the correct datatype,
        in my opinion you are too late.

        Cor



        • platinumbay

          #5
          Re: Load large files

          Unfortunately, the invalid data could be anything: a field overflow,
          a string in a boolean, too many columns, too few columns, an invalid
          date, etc. Our backend runs legacy code to generate these extracts
          and is prone to failure. Our front end needs to be able to handle
          anything, which is why I am writing this code. I just need to figure
          out the quickest way to read the flat file, validate the data, and
          load it. Validation failure needs to happen on a row-by-row basis so
          that as much good data as possible still loads.
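
          In code terms, something like this per row (the delimiter, the
          error-table columns, and the Convert.ChangeType check here are just
          placeholders for illustration):

          ' Good rows go to the main DataTable; anything that fails
          ' validation goes to an error DataTable with the raw line kept.
          Imports System.IO
          Imports System.Data

          Module ValidateRows
              Sub LoadFile(ByVal path As String, ByVal good As DataTable, ByVal errs As DataTable)
                  Dim lineNo As Integer = 0
                  Using sr As New StreamReader(path)
                      Dim line As String = sr.ReadLine()
                      Do While line IsNot Nothing
                          lineNo += 1
                          Dim fields() As String = line.Split("|"c)
                          Try
                              If fields.Length <> good.Columns.Count Then
                                  Throw New Exception("Wrong column count")
                              End If
                              Dim row As DataRow = good.NewRow()
                              For x As Integer = 0 To good.Columns.Count - 1
                                  ' Throws if the text is not valid for the
                                  ' target type (date, int, boolean, ...).
                                  row(x) = Convert.ChangeType(fields(x), good.Columns(x).DataType)
                              Next
                              good.Rows.Add(row)
                          Catch ex As Exception
                              Dim bad As DataRow = errs.NewRow()
                              bad("RawLine") = line
                              bad("ErrorMsg") = ex.Message
                              bad("LineNo") = lineNo
                              errs.Rows.Add(bad)
                          End Try
                          line = sr.ReadLine()
                      Loop
                  End Using
              End Sub
          End Module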


          "Cor Ligthert" <notmyfirstname @planet.nl> wrote in message
          news:#EFC2SEFFH A.1292@TK2MSFTN GP10.phx.gbl...[color=blue]
          > Platiumbay,
          >[color=green]
          > >I am loading the datatable with the schema, so I can validate the input
          > >file
          > > has the correct datatype, row by row, column by column.
          > >[/color]
          >
          > That does the setting of the value in the SQLcommandparam eters as well.
          >
          > However when the schema or that should check the correct datatype you are[/color]
          in[color=blue]
          > my opinion to late.
          >
          > Cor
          >
          >[/color]



          • Cor Ligthert

            #6
            Re: Load large files

            Platinumbay,

            In my opinion the way stays straightforward.

            I would read your legacy file line by line, create the fields with
            Substring, validate those fields, and take the right decisions
            based on that.

            When the values of a line are correct I would set them as the
            parameter values, as I said, and use command.ExecuteNonQuery to
            insert them.

            I think that what you do, as far as I can understand it, you can
            do as well with the command parameters. However, I would never put
            a Catch with no action in my code, which is what I saw in yours.
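
            In rough VB, something like this (the field offsets and widths
            are just assumptions for illustration):

            ' Cut fixed-position fields out of each line with Substring,
            ' validate them, and only then set the command parameters.
            Module SubstringFields
                Sub ParseLine(ByVal line As String)
                    ' Illustrative layout: name in columns 0-9, date in 10-17.
                    Dim name As String = line.Substring(0, 10).Trim()
                    Dim dateText As String = line.Substring(10, 8).Trim()

                    Dim orderDate As DateTime
                    If Not DateTime.TryParse(dateText, orderDate) Then
                        ' invalid date: route the line to error handling
                        Return
                    End If

                    ' valid: set the parameter values and ExecuteNonQuery here
                End Sub
            End Module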



            Just my thought,

            Cor

