Need fast ListOf(type comparison

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • !NoItAll
    Contributor
    • May 2006
    • 297

    Need fast ListOf(type comparison

    I have a ListOf(Type that I've created that will often have well over 20-thousand entries.
    The type I am placing into the list looks like this:

    Code:
    Structure MyType
       Dim FileName as String
       Dim FileDate as String
       Dim FileSize as String
       Dim NewPath as String
    End Structure
    Dim MyList as ListOf(MyType)
    Before I add anything to the list I need to check and make sure it's not already in there:

    Code:
    If  MyList.Contains(sNewEntry) = False then
       MyList.Add(sNewEntry)
    End if
    ...but with anything above about 7-thousand entries it takes about 1/4 to 1/2 second for each comparison. With more than 15-thousand it takes nearly a second. If I am adding a total of 20-thousand records I could miss Christmas!

    I *thought* it would be faster to try

    Code:
    If MyList.BinarySearch(sNewEntry) < 0 then
       MyList.Add(sNewEntry)
    End if
    But this throws an exception ("failed to compare") - so either this is not what I want or I am just doing it wrong.

    I also thought I might throw a hash into the structure and then force the CONTAINS or BINARYSEARCH method to only compare hashes - but I just can't figure out how to do that... Also - just how unique is the default hash?

    Any suggestions would be very much appreciated!

    If I stumble across a better solution I will be sure to post it here - I loath threads without answers!

    Thanks all!
  • tlhintoq
    Recognized Expert Specialist
    • Mar 2008
    • 3532

    #2
    So each list element is a struct of 4 items.
    So each test of the compare is actually having to check if all 4 items in the list element are the same as your comparitor.
    Meaning 5,000 element test is 20,000 {internal} comparisons.

    If the struct were a class you could access the members individually.
    MyType.FileName for example
    You could the loop through testing any one element to see if its the same.
    Now you would be doing 5,000 comparisons instead of 20,000

    Comment

    • !NoItAll
      Contributor
      • May 2006
      • 297

      #3
      Here's what I've done - it's not big trick, but it has improved performance dramatically. As tlhintoq pointed out there were way too many compares going on.
      I added an additional element to the structure
      Code:
      Structure MyType
         Dim FileName as String
         Dim FileDate as String
         Dim FileSize as String
         Dim NewPath as String
         Dim FileNameHash as Integer
      End Structure
      Dim MyList as ListOf(MyType)
      Then I wrote my own compare routine
      Code:
      Friend Function CompareHash(ByRef LongList as List(of MyType), NewItem as MyType) as Boolean
      
      For Each item as MyType in LongList
          If item.FileNameHash = NewItem.FileNameHash then
                 If Item.FileName = NewItem.FileName then
                       Return True
                  End if
           End if
      Next
      Return False
      This way I am only looking at the hash (an integer) so the checks will be very fast.
      Notice that I check the filename even after there is a hash match. This is because the hash is not guarenteed unique. If I have two strings they will definately generate the exact same hash. But it is possible for two different strings to generate identical hashes - so we do a second compare after matching hashes are found to be sure. This is still hundreds of times faster than doing all string matches.
      For my routine a list of 22-thousand items went from 5 minutes to 18 seconds.

      Comment

      Working...