Performance Optimizations

  • Ketchup

    Performance Optimizations

    Hello everyone,

    I am writing a utility. Part of its function is to do a block-mode copy of
    files and generate MD5 / SHA1 hashes on the fly. The functionality is
    similar to that of the Unix dd / dcfldd utilities. I am using API calls to
    get handles to files and FileStreams to perform the copy block by block,
    hashing on the way. The relevant portion of the code is below. I hash each
    block as it is transferred to avoid copying the file and then re-opening it
    to hash it. That would be double the work and result in poor performance.

    I realize that hashing the files as they are being copied will generate
    overhead. I am comparing performance to utilities like robocopy, xcopy,
    plain Windows copy, and dcfldd. I am pretty much on par with dcfldd (on
    Windows), sometimes better. My performance is about 90% to 95% of that of
    command-line utilities like robocopy and xcopy. Windows copy has too much
    overhead and is much slower. Dcfldd on Windows is not very fast and is not
    a good reference point. I would like to get closer to 98% / 99% of the
    performance of robocopy and xcopy.

    I have several questions.

    1. Can anyone recommend a good list of block sizes to use for various
    environments? For example, what should be the block size for HDD to HDD
    copying, LAN to LAN, within the same HDD, LAN to HDD, slow connections, etc?
    I tried setting the block size to 512 bytes when copying from HDD to HDD to
    match the NTFS cluster size, but that resulted in much worse transfer speed
    than a block size of 32K. I experimented with setting the block size to
    match the default LAN MTU size minus the packet header sizes, but that
    resulted in poor performance as well. I am just not sure how to determine
    a good block size other than by trial and error.

    2. Is my code below optimized? Am I wasting any CPU cycles?

    3. Is there a better way of doing this? I would not call myself an
    experienced programmer. I would appreciate any criticism.
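
    The trial-and-error comparison mentioned in question 1 can be sketched as
    a small timing harness. This is only an illustration: TimeCopy and
    TimeCandidates are hypothetical helper names, and the list of candidate
    sizes is an assumption, not a recommendation.

```vbnet
Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.IO

Module BlockSizeTimingSketch

    ' Copies sourcePath to destPath with the given block size and returns
    ' the elapsed time in milliseconds. Hypothetical helper for illustration.
    Function TimeCopy(sourcePath As String, destPath As String, blockSize As Integer) As Long
        Dim block(blockSize - 1) As Byte
        Dim watch As Stopwatch = Stopwatch.StartNew()
        Using src As New FileStream(sourcePath, FileMode.Open, FileAccess.Read, FileShare.Read, blockSize)
            Using dst As New FileStream(destPath, FileMode.Create, FileAccess.Write, FileShare.None, blockSize)
                Dim bytesRead As Integer = src.Read(block, 0, block.Length)
                While bytesRead > 0
                    ' Write only the bytes that Read actually returned.
                    dst.Write(block, 0, bytesRead)
                    bytesRead = src.Read(block, 0, block.Length)
                End While
            End Using
        End Using
        watch.Stop()
        Return watch.ElapsedMilliseconds
    End Function

    ' Times one copy per candidate block size and returns size -> milliseconds.
    Function TimeCandidates(sourcePath As String, destPath As String, sizes As Integer()) As Dictionary(Of Integer, Long)
        Dim results As New Dictionary(Of Integer, Long)()
        For Each size As Integer In sizes
            results(size) = TimeCopy(sourcePath, destPath, size)
        Next
        Return results
    End Function

End Module
```

    A call such as TimeCandidates(src, dst, New Integer() {512, 4096, 32768,
    65536, 1048576}) would give one timing per size. The operating system's
    file cache can make later runs look faster than the first, so the numbers
    are only comparable if each size is measured under similar conditions.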

    Thank you in advance!

    'get MAC times of the source file
    If GetFileTime(hFlHandle, dtCreated, dtAccessed, dtModified) = False Then
        Logger.writeLN("Copy Error: " & APIErrorMessage(GetLastError))
        Throw New Exception("Unable to get MAC times from source file")
    End If

    While SourceStream.Position < SourceStream.Length ' write until EOF

        'clear the buffer block
        ReDim transferBlock(iBlockSize - 1)
        If SourceStream.Length - SourceStream.Position > CLng(iBlockSize) Then

            'read a block of data
            iBytesRead = SourceStream.Read(transferBlock, 0, transferBlock.Length)
            'hash the block
            objMD5.TransformBlock(transferBlock, 0, transferBlock.Length, transferBlock, 0)
            'write to destination file
            DestStream.Write(transferBlock, 0, transferBlock.Length)

        Else

            'read a block of data
            ReDim transferBlock(SourceStream.Length - SourceStream.Position - 1)
            iBytesRead = SourceStream.Read(transferBlock, 0, transferBlock.Length)
            'hash final block
            objMD5.TransformFinalBlock(transferBlock, 0, transferBlock.Length)
            'write to destination file
            DestStream.Write(transferBlock, 0, transferBlock.Length)

        End If

        iTotalBytesRead += iBytesRead
        iCurrentFileCopied += iBytesRead

    End While

    'set MAC times for the destination file
    If SetFileTime(hDestHandle, dtCreated, dtAccessed, dtModified) = False Then
        Logger.writeLN("Copy Error: " & APIErrorMessage(GetLastError))
        Throw New Exception("Unable to set MAC times for destination file")
    End If
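
    With regard to question 2, one possible variation on the loop above is to
    allocate the buffer once and to hash and write only the number of bytes
    that Read reports, since Stream.Read is allowed to return fewer bytes than
    requested. The sketch below assumes plain Stream objects; CopyWithMd5,
    hasher, and block are hypothetical names, and it has not been measured
    against the original loop.

```vbnet
Imports System
Imports System.IO
Imports System.Security.Cryptography

Module CopyAndHashSketch

    ' Copies source to dest block by block and returns the MD5 digest of the
    ' copied data as an uppercase hex string. Hypothetical helper.
    Function CopyWithMd5(source As Stream, dest As Stream, blockSize As Integer) As String
        Dim block(blockSize - 1) As Byte   ' allocated once, reused on every pass
        Using hasher As MD5 = MD5.Create()
            Dim bytesRead As Integer = source.Read(block, 0, block.Length)
            While bytesRead > 0
                ' Hash and write only the bytes that Read actually returned.
                hasher.TransformBlock(block, 0, bytesRead, Nothing, 0)
                dest.Write(block, 0, bytesRead)
                bytesRead = source.Read(block, 0, block.Length)
            End While
            ' Finalize the hash with an empty block once the stream is exhausted.
            hasher.TransformFinalBlock(block, 0, 0)
            Return BitConverter.ToString(hasher.Hash).Replace("-", "")
        End Using
    End Function

End Module
```

    Finalizing with an empty block removes the need for a separate branch for
    the last partial block, so the loop body stays the same on every pass.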

