Inexplicable timings

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • MRAB

    Inexplicable timings

    I'm looking at the implementation of regular expressions in Python and
    wrote a script to test my changes. This is the script:


    import re
    import time

    base = "abc"
    final = "d"

    for n in [100, 1000, 10000]:
    for f in [final, ""]:
    for r in ["+", "+?"]:
    pattern = "(?:%s)%s%s " % (base, r, final)
    text_format = "(?:%s){%d} %s" % (base, n, f)
    regex = re.compile(patt ern)
    text = base * n + f
    start_time = time.clock()
    found = regex.search(te xt)
    finish_time = time.clock()
    duration = finish_time - start_time
    print "%s\t:: %s\t:: %.3f" % (text_format, pattern,
    duration)


    I've found some puzzling behaviour that I can't explain. If I delete
    sre_parse.pyc or sre_compile.pyc before running the script I get
    timings like this:

    (?:abc){100}d :: (?:abc)+d :: 0.000
    (?:abc){100}d :: (?:abc)+?d :: 0.000
    (?:abc){100} :: (?:abc)+d :: 0.000
    (?:abc){100} :: (?:abc)+?d :: 0.000
    (?:abc){1000}d :: (?:abc)+d :: 0.000
    (?:abc){1000}d :: (?:abc)+?d :: 0.000
    (?:abc){1000} :: (?:abc)+d :: 0.022
    (?:abc){1000} :: (?:abc)+?d :: 0.020
    (?:abc){10000}d :: (?:abc)+d :: 0.001
    (?:abc){10000}d :: (?:abc)+?d :: 0.000
    (?:abc){10000} :: (?:abc)+d :: 1.961
    (?:abc){10000} :: (?:abc)+?d :: 1.756

    Subsequent runs give timings like this:

    (?:abc){100}d :: (?:abc)+d :: 0.000
    (?:abc){100}d :: (?:abc)+?d :: 0.000
    (?:abc){100} :: (?:abc)+d :: 0.000
    (?:abc){100} :: (?:abc)+?d :: 0.000
    (?:abc){1000}d :: (?:abc)+d :: 0.000
    (?:abc){1000}d :: (?:abc)+?d :: 0.000
    (?:abc){1000} :: (?:abc)+d :: 0.020
    (?:abc){1000} :: (?:abc)+?d :: 0.020
    (?:abc){10000}d :: (?:abc)+d :: 0.001
    (?:abc){10000}d :: (?:abc)+?d :: 0.000
    (?:abc){10000} :: (?:abc)+d :: 3.953
    (?:abc){10000} :: (?:abc)+?d :: 1.763

    The second-from-last takes twice as long!

    This is repeatable.

    Python 2.5.2, on the other hand, gives timings like this:

    (?:abc){100}d :: (?:abc)+d :: 0.000
    (?:abc){100}d :: (?:abc)+?d :: 0.000
    (?:abc){100} :: (?:abc)+d :: 0.000
    (?:abc){100} :: (?:abc)+?d :: 0.000
    (?:abc){1000}d :: (?:abc)+d :: 0.000
    (?:abc){1000}d :: (?:abc)+?d :: 0.000
    (?:abc){1000} :: (?:abc)+d :: 0.044
    (?:abc){1000} :: (?:abc)+?d :: 0.027
    (?:abc){10000}d :: (?:abc)+d :: 0.002
    (?:abc){10000}d :: (?:abc)+?d :: 0.002
    (?:abc){10000} :: (?:abc)+d :: 4.547
    (?:abc){10000} :: (?:abc)+?d :: 2.858

    repeatably, irrespective of whether I delete the .pyc files first.

    Why is only one timing affected, and why by so much? Python quits
    after each run, so it can't be anything cumulative, surely. I've
    considered things like caching, for example compiling the .pyc files
    on the first run, but can't see why (or how) that would cause what I
    see.

    Can anyone suggest a factor that I might be missing? Any suggestions
    welcome!

    Matthew
  • MRAB

    #2
    Re: Inexplicable timings

    On Aug 2, 12:43 am, MRAB <goo...@mrabarn ett.plus.comwro te:
    I'm looking at the implementation of regular expressions in Python and
    wrote a script to test my changes. This is the script:
    >
    import re
    import time
    >
    base = "abc"
    final = "d"
    >
    for n in [100, 1000, 10000]:
        for f in [final, ""]:
            for r in ["+", "+?"]:
                pattern = "(?:%s)%s%s " % (base, r, final)
                text_format = "(?:%s){%d} %s" % (base, n, f)
                regex = re.compile(patt ern)
                text = base * n + f
                start_time = time.clock()
                found = regex.search(te xt)
                finish_time = time.clock()
                duration = finish_time - start_time
                print "%s\t:: %s\t:: %.3f" % (text_format, pattern,
    duration)
    >
    I've found some puzzling behaviour that I can't explain. If I delete
    sre_parse.pyc or sre_compile.pyc before running the script I get
    timings like this:
    >
    (?:abc){100}d    :: (?:abc)+d    :: 0.000
    (?:abc){100}d    :: (?:abc)+?d    :: 0.000
    (?:abc){100}    :: (?:abc)+d    :: 0.000
    (?:abc){100}    :: (?:abc)+?d    :: 0.000
    (?:abc){1000}d    :: (?:abc)+d    :: 0.000
    (?:abc){1000}d    :: (?:abc)+?d    :: 0.000
    (?:abc){1000}    :: (?:abc)+d    :: 0.022
    (?:abc){1000}    :: (?:abc)+?d    :: 0.020
    (?:abc){10000}d    :: (?:abc)+d    :: 0.001
    (?:abc){10000}d    :: (?:abc)+?d    :: 0.000
    (?:abc){10000}    :: (?:abc)+d    :: 1.961
    (?:abc){10000}    :: (?:abc)+?d    :: 1.756
    >
    Subsequent runs give timings like this:
    >
    (?:abc){100}d    :: (?:abc)+d    :: 0.000
    (?:abc){100}d    :: (?:abc)+?d    :: 0.000
    (?:abc){100}    :: (?:abc)+d    :: 0.000
    (?:abc){100}    :: (?:abc)+?d    :: 0.000
    (?:abc){1000}d    :: (?:abc)+d    :: 0.000
    (?:abc){1000}d    :: (?:abc)+?d    :: 0.000
    (?:abc){1000}    :: (?:abc)+d    :: 0.020
    (?:abc){1000}    :: (?:abc)+?d    :: 0.020
    (?:abc){10000}d    :: (?:abc)+d    :: 0.001
    (?:abc){10000}d    :: (?:abc)+?d    :: 0.000
    (?:abc){10000}    :: (?:abc)+d    :: 3.953
    (?:abc){10000}    :: (?:abc)+?d    :: 1.763
    >
    The second-from-last takes twice as long!
    >
    This is repeatable.
    >
    Python 2.5.2, on the other hand, gives timings like this:
    >
    (?:abc){100}d    :: (?:abc)+d    :: 0.000
    (?:abc){100}d    :: (?:abc)+?d    :: 0.000
    (?:abc){100}    :: (?:abc)+d    :: 0.000
    (?:abc){100}    :: (?:abc)+?d    :: 0.000
    (?:abc){1000}d    :: (?:abc)+d    :: 0.000
    (?:abc){1000}d    :: (?:abc)+?d    :: 0.000
    (?:abc){1000}    :: (?:abc)+d    :: 0.044
    (?:abc){1000}    :: (?:abc)+?d    :: 0.027
    (?:abc){10000}d    :: (?:abc)+d    :: 0.002
    (?:abc){10000}d    :: (?:abc)+?d    :: 0.002
    (?:abc){10000}    :: (?:abc)+d    :: 4.547
    (?:abc){10000}    :: (?:abc)+?d    :: 2.858
    >
    repeatably, irrespective of whether I delete the .pyc files first.
    >
    Why is only one timing affected, and why by so much? Python quits
    after each run, so it can't be anything cumulative, surely. I've
    considered things like caching, for example compiling the .pyc files
    on the first run, but can't see why (or how) that would cause what I
    see.
    >
    Can anyone suggest a factor that I might be missing? Any suggestions
    welcome!
    >
    I've found the cause. It's the memory allocation.

    For future reference, I think what's happening is that when Python
    generates the .pyc files it's using more memory (grabbing it from the
    system) so that when my code needs to allocate Python can do it more
    quickly; if Python didn't have to generate the .pyc files then it has
    to grab more from the system, which takes longer. Or something like
    that.

    Anyway, it's fixed. :-)

    Comment

    Working...