Re: re.search much slower then grep on some regular expressions
Carl Banks <pavlovevidence @gmail.com>:
My posting wasn't intended to reflect the differences in complexity between
normal GNU grep expressions (which are basically extended POSIX
expressions) and perl-compatible expressions. The latter just _are_ more
complex, having additional features like look aheads or non-greedy
qualifiers.
I just wanted to illustrate, that the speed of the given search is somehow
related to the complexity of the engine.
Btw, other pcre implementation are as slow as Python or "grep -P". I tried
a sample C++-code using pcre++ (a wrapper around libpcre) and saw it
running equally long.
--
Freedom is always the freedom of dissenters.
(Rosa Luxemburg)
Carl Banks <pavlovevidence @gmail.com>:
On Jul 5, 4:12 am, "Sebastian \"lunar\" Wiesner"
<basti.wies...@ gmx.netwrote:
>
This confirms that a particular engine might not be optimized for it,
but it's not necessarily a reflection that the engine is more complex.
<basti.wies...@ gmx.netwrote:
>Paddy <paddy3...@goog lemail.com>:
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>FWIW, grep itself can confirm this statement. The following command
>roughly takes as long as Python's re.search:
>>
># grep -P '[^ "=]*/' input
>>
>-P tells grep to use real perl-compatible regular expressions.
>>
>>
>>
On Jul 4, 1:36 pm, Peter Otten <__pete...@web. dewrote:
>Henning_Thornb lad wrote:
What can be the cause of the large difference between re.search and
grep?
>Henning_Thornb lad wrote:
What can be the cause of the large difference between re.search and
grep?
>grep uses a smarter algorithm ;)
This script takes about 5 min to run on my computer:
#!/usr/bin/env python
import re
#!/usr/bin/env python
import re
row=""
for a in range(156000):
row+="a"
print re.search('[^ "=]*/',row)
for a in range(156000):
row+="a"
print re.search('[^ "=]*/',row)
While doing a simple grep:
grep '[^ "=]*/' input (input contains 156.000 a in
one row)
doesn't even take a second.
grep '[^ "=]*/' input (input contains 156.000 a in
one row)
doesn't even take a second.
Is this a bug in python?
>You could call this a performance bug, but it's not common enough in
>real code to get the necessary brain cycles from the core developers.
>So you can either write a patch yourself or use a workaround.
>real code to get the necessary brain cycles from the core developers.
>So you can either write a patch yourself or use a workaround.
>re.search('[^ "=]*/', row) if "/" in row else None
>might be good enough.
>Peter
It is not a smarter algorithm that is used in grep. Python RE's have
more capabilities than grep RE's which need a slower, more complex
algorithm.
more capabilities than grep RE's which need a slower, more complex
algorithm.
>FWIW, grep itself can confirm this statement. The following command
>roughly takes as long as Python's re.search:
>>
># grep -P '[^ "=]*/' input
>>
>-P tells grep to use real perl-compatible regular expressions.
This confirms that a particular engine might not be optimized for it,
but it's not necessarily a reflection that the engine is more complex.
normal GNU grep expressions (which are basically extended POSIX
expressions) and perl-compatible expressions. The latter just _are_ more
complex, having additional features like look aheads or non-greedy
qualifiers.
I just wanted to illustrate, that the speed of the given search is somehow
related to the complexity of the engine.
Btw, other pcre implementation are as slow as Python or "grep -P". I tried
a sample C++-code using pcre++ (a wrapper around libpcre) and saw it
running equally long.
--
Freedom is always the freedom of dissenters.
(Rosa Luxemburg)
Comment