Re: Match beginning of two strings
On Mon, 04 Aug 2003 11:56:04 GMT, Alex Martelli <aleax@aleax.it> wrote:
>Ravi wrote:
>
>> Hi,
>>
>> I have about 200GB of data that I need to go through and extract the
>> common first part of a line. Something like this.
>>
>> >>> a = "abcdefghijklmnopqrstuvwxyz"
>> >>> b = "abcdefghijklmnopBHLHT"
>> >>> c = extract(a,b)
>> >>> print c
>> "abcdefghijklmnop"
>>
>> Here I want to extract the common string "abcdefghijklmnop". Basically I
>> need a fast way to do that for any two given strings. For my situation,
>> the common string will always be at the beginning of both strings. I can
>
>Here's my latest study on this:
>
>*** pexa.py:
>
[...]
JFTHOI, if you have the inclination, I'm curious how this slightly
different 2.3-dependent version would fare in your harness on your
system with the rest:
    def commonprefix(s1, s2): # very little tested!
        try:
            for i, c in enumerate(s1):
                if c != s2[i]: return s1[:i]
        except IndexError:
            return s1[:i]
        return s1
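
Since the function above is flagged as very little tested, here is a minimal sketch of a sanity check. It assumes that os.path.commonprefix (which compares character by character) is an acceptable reference, and it is written so that it also runs on a current Python. The test strings and the name check_all are made up for illustration; they are not part of pexa.py.

```python
import os.path

def commonprefix(s1, s2):
    # Same logic as the function above, repeated so this block runs on its own.
    try:
        for i, c in enumerate(s1):
            if c != s2[i]:
                return s1[:i]
    except IndexError:
        # s2 ran out first: everything compared so far was equal.
        return s1[:i]
    return s1

# Hypothetical edge cases: empty strings, equal strings, one string a
# prefix of the other, and no common prefix at all.
CASES = [
    ("abcdefghijklmnopqrstuvwxyz", "abcdefghijklmnopBHLHT"),
    ("", "abc"),
    ("abc", ""),
    ("abc", "abc"),
    ("abc", "abcdef"),
    ("abcdef", "abc"),
    ("xyz", "abc"),
]

def check_all():
    """Compare commonprefix against os.path.commonprefix on every case."""
    for a, b in CASES:
        expected = os.path.commonprefix([a, b])
        assert commonprefix(a, b) == expected, (a, b)
    return len(CASES)
```

Calling check_all() raises AssertionError on the first mismatch and otherwise returns the number of cases checked. It says nothing about speed, only about agreement with the library function.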
[...]
>
>and my measurements give me:
>
>[alex@lancelot exi]$ python -O timeit.py -s 'import pexa' \
>> 'pexa.extract("abcdefghijklmonpKOU", "abcdefghijklmonpZE")'
>100000 loops, best of 3: 2.39 usec per loop
>[alex@lancelot exi]$ python -O timeit.py -s 'import pexa'
>'pexa.extract("abcdefghijklmonpKOU", "abcdefghijklmonpZE")'
>100000 loops, best of 3: 2.14 usec per loop
>[alex@lancelot exi]$ python -O timeit.py -s 'import pexa'
>'pexa.extract2("abcdefghijklmonpKOU", "abcdefghijklmonpZE")'
>10000 loops, best of 3: 30.2 usec per loop
>[alex@lancelot exi]$ python -O timeit.py -s 'import pexa'
>'pexa.extract3("abcdefghijklmonpKOU", "abcdefghijklmonpZE")'
>100000 loops, best of 3: 9.59 usec per loop
>[alex@lancelot exi]$ python -O timeit.py -s 'import pexa'
>'pexa.extract_pyrex("abcdefghijklmonpKOU", "abcdefghijklmonpZE")'
>10000 loops, best of 3: 21.8 usec per loop
>[alex@lancelot exi]$ python -O timeit.py -s 'import pexa'
>'pexa.extract_c("abcdefghijklmonpKOU", "abcdefghijklmonpZE")'
>100000 loops, best of 3: 1.88 usec per loop
>[alex@lancelot exi]$
>
Interesting, but I think I will have to write a filter so I can
see a little more easily what your timeit.py outputs say ;-)
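
A minimal sketch of what such a filter might look like, in current Python syntax rather than 2.3. It assumes the transcript has the shape quoted above: a line containing the quoted call, followed by a "N loops, best of M: X usec per loop" line. The names summarize and format_table are invented for illustration.

```python
import re

# Matches result lines such as "100000 loops, best of 3: 2.39 usec per loop".
TIMING = re.compile(r"(\d+) loops, best of (\d+): ([\d.]+) (usec|msec|sec) per loop")
# Matches the start of a quoted call such as 'pexa.extract_c("...", "...")'
# and captures the dotted function name. 'import pexa' does not match
# because no opening parenthesis follows it.
CALL = re.compile(r"'([\w.]+)\(")
# Conversion factors to microseconds.
SCALE = {"usec": 1.0, "msec": 1e3, "sec": 1e6}

def summarize(lines):
    """Return (function_name, usec_per_loop) pairs from a timeit.py transcript."""
    results = []
    name = None
    for line in lines:
        call = CALL.search(line)
        if call:
            name = call.group(1)
        timing = TIMING.search(line)
        if timing and name is not None:
            usec = float(timing.group(3)) * SCALE[timing.group(4)]
            results.append((name, usec))
            name = None  # wait for the next call before accepting a timing
    return results

def format_table(results):
    """Render the pairs as aligned text, fastest first."""
    width = max([len(name) for name, _ in results] or [0])
    rows = sorted(results, key=lambda pair: pair[1])
    return "\n".join("%-*s %8.2f usec" % (width, name, usec) for name, usec in rows)
```

Feeding the quoted session to summarize (for example via sys.stdin) and printing format_table of the result would give one line per measurement, sorted by time per loop.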
Regards,
Bengt Richter