Faster search algorithm for lists

**husoski** · Jun 15 '18, 11:03 PM

It's much faster to search a set than a list. Since that list doesn't seem to change during the program run, simply use

Code:

    common_words = set(list_2)  # after building list_2 from the file
    ...
    for word in list_1:
        if word in common_words:
            ...  # do the deletion

If you are just making a reduced list, then you can do that without any explicit looping:

Code:

    uncommon = [w for w in list_1 if w not in common_words]

That's a "list comprehension". It builds a list of every element of list_1 that is not in the common_words set.
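
For anyone who wants to try this out, here's a minimal, self-contained sketch. The word lists are made-up sample data standing in for the ones read from the file, and I'm assuming list_1 is the list being reduced.

```python
# Made-up sample data; in the real program these would come from the files.
list_1 = ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
list_2 = ["the", "over", "a", "an"]

# Build the set once; each "in" test is then a hash lookup, not a scan.
common_words = set(list_2)

# Keep only the words of list_1 that are not common words (order is preserved).
uncommon = [w for w in list_1 if w not in common_words]

print(uncommon)  # ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
```

Note that this builds a new list and leaves list_1 untouched, which avoids deleting from a list while looping over it.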

**rayisalive** · Jun 18 '18, 12:21 PM

Thanks a lot. I'll definitely try it out.

**kiskiller0** · Jun 21 '18, 06:38 PM

I would use this big list to build sub-lists, one for each letter, which cuts the search time by a factor of almost 26 on average. I don't know whether Python's built-in search functions do something like that or just loop through every element.

**husoski** · Jul 1 '18, 07:41 PM

Searching a set is much faster than searching lists, even with the sub-lists you describe. A set lookup is a hash table search. Even if the key-hashing algorithm is a horrible match for the actual key distribution (nearly all "words" happen to hash to the same number, modulo the table size), the worst case degrades to roughly a sequential list search. Get heads at any point and you win; get tails 30 times in a row and you lose a little.
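
If you want to see the difference on your own machine, here's a rough timing sketch using the standard timeit module. The word list is synthetic (random lowercase strings), `by_letter` is only my guess at the sub-list layout described above, and the actual numbers will vary from machine to machine.

```python
import random
import string
import timeit

random.seed(0)  # fixed seed so runs are repeatable

# Synthetic stand-in for the big word list: 5,000 random 6-letter "words".
words = ["".join(random.choices(string.ascii_lowercase, k=6)) for _ in range(5000)]

# 200 words that are present plus 200 lookups of a string that cannot be present.
probes = random.sample(words, 200) + ["zzzzzz1"] * 200

# Approach 1: plain list, searched sequentially.
def in_list(w):
    return w in words

# Approach 2: one sub-list per first letter (hypothetical layout).
by_letter = {}
for w in words:
    by_letter.setdefault(w[0], []).append(w)

def in_sublists(w):
    return w in by_letter.get(w[0], [])

# Approach 3: a set, i.e. a hash table lookup.
word_set = set(words)

def in_set(w):
    return w in word_set

# Time the same batch of lookups with each approach.
for name, fn in [("list", in_list), ("sub-lists", in_sublists), ("set", in_set)]:
    seconds = timeit.timeit(lambda: [fn(p) for p in probes], number=3)
    print(f"{name:10s} {seconds:.4f} s")
```

All three functions return the same answers; only the time taken differs. The sub-lists still require a sequential scan within one bucket, whereas the set lookup does not scan at all in the typical case.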
