Using time.sleep() in 2 threads causes lockup when hyper-threading is enabled

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • OlafMeding@gmail.com

    Using time.sleep() in 2 threads causes lockup when hyper-threading is enabled

    Below are 2 files that isolate the problem. Note, both programs hang
    (stop responding) with hyper-threading turned on (a BIOS setting), but
    work as expected with hyper-threading turned off.

    Note, the Windows task manager shows 2 CPUs on the Performance tab with
    hyper-threading is turned on.

    Both Python 2.3.5 and 2.4.3 (downloaded from python.org) have this
    problem.
    The operating system is MS Windows XP Professional.

    winmsd.exe shows:
    2CPUs: x86 Family 15 Model 4 Stepping 1 GenuineIntel ~3000 MHz
    Version: 5.1.2600 Service Pack 2 Build 2600

    Could someone with a hyper-threading (or dual/multicore) CPU please
    confirm this bug?

    Many Thanks

    Olaf


    # testsleep.py
    import threading
    import time

    class Task(threading. Thread):
    def __init__(self, n, t):
    threading.Threa d.__init__(self )
    self.n = n
    self.t = t
    def run(self):
    print 'thread %d started' % self.n
    print 'sleep time:', self.t
    print time.clock()
    print time.clock()
    print time.clock()
    print
    count = 0
    printCount = int(10 / self.t)
    while True:
    start = time.clock()
    time.sleep(self .t)
    stop = time.clock()
    if stop - start > 1.0:
    print 'thread', self.n, stop - start

    count += 1
    if count > printCount:
    count = 0
    print self.n,

    def test():
    thread1 = Task(1, 0.01)
    thread2 = Task(2, 0.003)
    thread1.start()
    thread2.start()

    test()

    ------------------------------------------------------------------------

    # testsleep2.py
    import thread
    import time
    import sys

    def run(n, t):
    print 'thread %d started' % n
    print 'sleep time:', t
    print time.clock()
    print time.clock()
    print time.clock()
    print
    count = 0
    printCount = int(10 / t)
    while True:
    start = time.clock()
    time.sleep(t)
    stop = time.clock()
    if stop - start > 1.0:
    print 'thread', n, stop - start

    count += 1
    if count > printCount:
    count = 0
    print n,

    def test():
    thread.start_ne w_thread(run, (1, 0.01))
    thread.start_ne w_thread(run, (2, 0.003))

    # Wait until the user presses the enter key.
    sys.stdin.read( 1)


    test()

  • Serge Orlov

    #2
    Re: Using time.sleep() in 2 threads causes lockup when hyper-threading is enabled

    OlafMeding@gmai l.com wrote:[color=blue]
    > Below are 2 files that isolate the problem. Note, both programs hang
    > (stop responding) with hyper-threading turned on (a BIOS setting), but
    > work as expected with hyper-threading turned off.[/color]

    What do you mean "stop responding"? Not responding when you press
    ctrl-c? They stop printing? If you mean stop printing, try
    sys.stdout.flus h() after each print

    Comment

    • Tim Peters

      #3
      Re: Using time.sleep() in 2 threads causes lockup whenhyper-threading is enabled

      [OlafMeding@gmai l.com][color=blue]
      > Below are 2 files that isolate the problem. Note, both programs hang
      > (stop responding)[/color]

      What does "stop responding" mean?
      [color=blue]
      > with hyper-threading turned on (a BIOS setting), but
      > work as expected with hyper-threading turned off.
      >
      > Note, the Windows task manager shows 2 CPUs on the Performance tab with
      > hyper-threading is turned on.
      >
      > Both Python 2.3.5 and 2.4.3 (downloaded from python.org) have this
      > problem.
      > The operating system is MS Windows XP Professional.
      >
      > winmsd.exe shows:
      > 2CPUs: x86 Family 15 Model 4 Stepping 1 GenuineIntel ~3000 MHz
      > Version: 5.1.2600 Service Pack 2 Build 2600
      >
      > Could someone with a hyper-threading (or dual/multicore) CPU please
      > confirm this bug?[/color]

      I don't see anything unexpected (to me) on a WinXP Pro SP2 box with HT
      enabled, using 2.4.3, but I'm not sure what "not responding" means to
      you.
      [color=blue]
      > # testsleep.py
      > import threading
      > import time
      >
      > class Task(threading. Thread):
      > def __init__(self, n, t):
      > threading.Threa d.__init__(self )
      > self.n = n
      > self.t = t
      > def run(self):
      > print 'thread %d started' % self.n
      > print 'sleep time:', self.t
      > print time.clock()
      > print time.clock()
      > print time.clock()
      > print
      > count = 0
      > printCount = int(10 / self.t)
      > while True:
      > start = time.clock()
      > time.sleep(self .t)
      > stop = time.clock()
      > if stop - start > 1.0:
      > print 'thread', self.n, stop - start[/color]

      You don't _expect_ this print to execute, do you? It should trigger
      very rarely (if ever).
      [color=blue]
      > count += 1
      > if count > printCount:
      > count = 0
      > print self.n,
      >
      > def test():
      > thread1 = Task(1, 0.01)
      > thread2 = Task(2, 0.003)
      > thread1.start()
      > thread2.start()
      >
      > test()[/color]

      This is what happened when I ran it, until I got tired of watching it
      and killed the job:

      C:\Python24>pyt hon test1.py
      thread 1 started
      sleep time: 0.01
      7.8499538238e-007
      4.42770924877e-005
      8.62618455186e-005

      thread 2 started
      sleep time: 0.003
      0.0004793495332 38
      0.0005219042829 16
      0.0005636490373 59

      1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 2 1 1 2 1 2 1
      2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 2
      1 1 2 1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1
      [color=blue]
      > # testsleep2.py
      > import thread
      > import time
      > import sys
      >
      > def run(n, t):
      > print 'thread %d started' % n
      > print 'sleep time:', t
      > print time.clock()
      > print time.clock()
      > print time.clock()
      > print
      > count = 0
      > printCount = int(10 / t)
      > while True:
      > start = time.clock()
      > time.sleep(t)
      > stop = time.clock()
      > if stop - start > 1.0:
      > print 'thread', n, stop - start[/color]

      See above.
      [color=blue]
      > count += 1
      > if count > printCount:
      > count = 0
      > print n,
      >
      > def test():
      > thread.start_ne w_thread(run, (1, 0.01))
      > thread.start_ne w_thread(run, (2, 0.003))
      >
      > # Wait until the user presses the enter key.
      > sys.stdin.read( 1)[/color]

      Do you want someone running this test to hit the ENTER key, or not?
      [color=blue]
      >
      > test()[/color]

      Works much the same for me, except I got bored quicker ;-):

      C:\Python24>pyt hon test2..py
      thread 1 started
      sleep time: 0.01
      1.14999323533e-006
      8.01271757225e-005
      thread 2 started
      sleep time: 0.003
      0.0003958653184 39
      0.0004742598572 95
      0.0005598317068 72

      0.0007106134669 8

      1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 2 1

      At that point I hit the ENTER key, and saw:

      Unhandled exception in thread started by
      Error in sys.excepthook:

      Original exception was:
      Unhandled exception in thread started by
      Error in sys.excepthook:

      Original exception was:


      That's unfortunate, but not unexpected either. The interpreter
      doesn't wait for a `thread` module thread to finish before tearing
      itself down, so the threads keep running after Python has torn so much
      of itself down that weird execptions occur -- and Python is _so_ torn
      down by that point it can't even display a useful error message. The
      interpreter does (by default) wait for `threading` module threads to
      exit before tearing itself down, so those kinds of useless exit
      messages weren't expected in the first test.

      Comment

      • OlafMeding@gmail.com

        #4
        Re: Using time.sleep() in 2 threads causes lockup when hyper-threading is enabled

        > What do you mean "stop responding"?

        Both threads print their thread numbers (either 1 or 2) approximately
        every 10 seconds. However, after a while (minutes to hours) both
        programs (see above) hang!

        Pressing ctrl-c (after the printing stops) causes the threads to "wake
        up" from their sleep statement. And since the sleep took more than 1
        seconds the thread number and the duration of the sleep is printed to
        the screen.

        Do you have a hyper-threading/dual/multi core CPU? Did you try this?

        Olaf

        Comment

        • OlafMeding@gmail.com

          #5
          Re: Using time.sleep() in 2 threads causes lockup when hyper-threading is enabled

          Time
          [color=blue][color=green][color=darkred]
          >>>[/color][/color][/color]
          1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 2 1 1 2 1 2 1
          2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1 2
          1 1 2 1 2 1 2 1 2 1 2 1 1 2 1 2 1 2 1 2 1[color=blue][color=green][color=darkred]
          >>>[/color][/color][/color]

          This is exactly what you should see. The problem I see is that after a
          while (minutes to hours) the printing of 1s and 2s stops! If you press
          ctrl-c at that point the threads will print how many seconds they were
          stuck in the sleep statements (proving that the threads were stuck in
          the sleep statement until you pressed ctrl-c).

          Could you please try again (perhaps let it run overnight)? Many, many
          thanks.

          Olaf

          Comment

          • OlafMeding@gmail.com

            #6
            Re: Using time.sleep() in 2 threads causes lockup when hyper-threading is enabled

            Tim
            [color=blue]
            > Do you want someone running this test to hit the ENTER key, or not?[/color]

            The purpose of the "sys.stdin.read (1)" statement is simply to prevent
            the main thread from exiting and thus ending the test. And yes, I also
            get an exception when I press the enter key.

            Olaf

            Comment

            • Serge Orlov

              #7
              Re: Using time.sleep() in 2 threads causes lockup when hyper-threading is enabled


              OlafMeding@gmai l.com wrote:[color=blue][color=green]
              > > What do you mean "stop responding"?[/color]
              >
              > Both threads print their thread numbers (either 1 or 2) approximately
              > every 10 seconds. However, after a while (minutes to hours) both
              > programs (see above) hang!
              >
              > Pressing ctrl-c (after the printing stops) causes the threads to "wake
              > up" from their sleep statement. And since the sleep took more than 1
              > seconds the thread number and the duration of the sleep is printed to
              > the screen.
              >
              > Do you have a hyper-threading/dual/multi core CPU? Did you try this?[/color]

              I don't have such CPU but I run the first program anyway. It printed

              C:\py>th.py
              thread 1 started
              sleep time: 0.01
              3.63174649292e-006
              8.43682646817e-005
              0.0001648254177 56

              thread 2 started
              sleep time: 0.003
              0.0006752254825 68
              0.0007534477147 24
              0.0008294350259 6

              1 1 1 2 1 1 1 2 1 1 1 2 1 1 1 1 2 1 1 1 2 1

              I got bored and tried to stop it with ctrl-c but it didn't respond and
              kept running and printing the numbers. I had to kill it from task
              manager.

              Comment

              • OlafMeding@gmail.com

                #8
                Re: Using time.sleep() in 2 threads causes lockup when hyper-threading is enabled

                Serge
                [color=blue]
                > I got bored and tried to stop it with ctrl-c ...[/color]

                Yes, you have to use the ctrl-break key to stop the first program. And
                neither program every hangs on a single core CPU. It also does not
                hang on a hyper-threading CPU if hyper-threading is turned off in the
                BIOS.

                Olaf

                Comment

                • Tim Peters

                  #9
                  Re: Using time.sleep() in 2 threads causes lockup whenhyper-threading is enabled

                  >> What do you mean "stop responding"?

                  [OlafMeding@gmai l.com][color=blue]
                  > Both threads print their thread numbers (either 1 or 2) approximately
                  > every 10 seconds. However, after a while (minutes to hours) both
                  > programs (see above) hang![/color]

                  Where "hang" means they stop printing.
                  [color=blue]
                  > Pressing ctrl-c (after the printing stops) causes the threads to "wake
                  > up" from their sleep statement. And since the sleep took more than 1
                  > seconds the thread number and the duration of the sleep is printed to
                  > the screen.
                  >
                  > Do you have a hyper-threading/dual/multi core CPU? Did you try this?[/color]

                  I was using a 3.4 GHz Pentium 4 (single core) with hyper-threading
                  enabled. I didn't run it for hours ;-)

                  But supposing I do and see a hang, it's unlikely that will have
                  anything to do with Python. On Windows, time.sleep() called from any
                  thread other than the main thread just calls the Win32 API Sleep()
                  function, after converting the argument to milliseconds. So it may be
                  more fruitful to recode your test program in C (if there is a bug
                  here, it's most likely in Microsoft's implementation of its Sleep()
                  function).

                  Comment

                  • OlafMeding@gmail.com

                    #10
                    Re: Using time.sleep() in 2 threads causes lockup when hyper-threading is enabled

                    Tim
                    [color=blue]
                    > I didn't run it for hours ;-)[/color]

                    Please try.

                    The sleep statement does not return! And this should not happen. The
                    code above does nothing special or unusual. The problem only occurs if
                    2 threads use the sleep statement and hyper-threading is enabled.

                    We discovered this bug perhaps a year ago. The only solution was to
                    tell our customers to disable hyper-threading (you can imagine they did
                    not like our "solution" very much). It then took many days of hard
                    work to isolate the problem down to the code I posted above.
                    [color=blue]
                    > Where "hang" means they stop printing.[/color]

                    Our Python code just stops running (locking up the entire application).
                    We use Qt for our GUI. We have over a hundred .py files. We use a
                    total of 4 threads (they get created once and stay running). One
                    thread uses sockets.

                    Once the application locks up (getting stuck in a sleep statement) all
                    comes back to live if I pull the network cable out. So perhaps the
                    socket thread returns from the sleep statement and other threads return
                    to live because they were waiting for the socket thread.

                    Our software runs on both Windows and Linux. We are not sure if the
                    problem also happens on Linux.

                    In any case, if someone else can confirm the bug then this is a serious
                    problem meriting further investigation.

                    We have searched the Internet far and wide and were not able to find
                    any information that indicates that someone else has reported a similar
                    problem (neither Python nor Windows Sleep related).

                    Thank you for your help.

                    Olaf

                    Comment

                    • Grant Edwards

                      #11
                      Re: Using time.sleep() in 2 threads causes lockup when hyper-threading is enabled

                      On 2006-05-04, OlafMeding@gmai l.com <OlafMeding@gma il.com> wrote:
                      [color=blue]
                      > Please try.
                      >
                      > The sleep statement does not return![/color]

                      Never, or does it just take a long time?
                      [color=blue]
                      > And this should not happen.[/color]

                      Dude, it's MS Windows.

                      It does all _sorts_ of stuff that it shouldn't.

                      Having sleep() take orders of magnitude longer than it should
                      is not an uncommon complaint for MS Windows users. There was a
                      fairly extensive thread in this group about that problem just a
                      few weeks ago. IIRC, disabling some other program or service
                      fixed it for one MS victem.
                      [color=blue]
                      > The code above does nothing special or unusual. The problem
                      > only occurs if 2 threads use the sleep statement and
                      > hyper-threading is enabled.[/color]

                      That part is new. I'm pretty sure other people who complained
                      about sleep() not returning in a reasonable amount of time saw
                      it all of the time.

                      --
                      Grant Edwards grante Yow! I was giving HAIR
                      at CUTS to th' SAUCER PEOPLE
                      visi.com ... I'm CLEAN!!

                      Comment

                      • OlafMeding@gmail.com

                        #12
                        Re: Using time.sleep() in 2 threads causes lockup when hyper-threading is enabled

                        Grant
                        [color=blue]
                        > Having sleep() take orders of magnitude longer than it should[/color]

                        I seen a few times where sleep returns after some seconds or even after
                        tens of seconds (my code above check for that). But most of the time
                        it gets stuck forever.

                        Olaf

                        Comment

                        • Marc 'BlackJack' Rintsch

                          #13
                          Re: Using time.sleep() in 2 threads causes lockup when hyper-threading is enabled

                          In <125ir8gpofq3i1 c@corp.supernew s.com>, Grant Edwards wrote:
                          [color=blue]
                          > On 2006-05-04, OlafMeding@gmai l.com <OlafMeding@gma il.com> wrote:[color=green]
                          >> The sleep statement does not return![/color]
                          >
                          > Never, or does it just take a long time?
                          >[color=green]
                          >> And this should not happen.[/color]
                          >
                          > Dude, it's MS Windows.
                          >
                          > It does all _sorts_ of stuff that it shouldn't.
                          >
                          > Having sleep() take orders of magnitude longer than it should
                          > is not an uncommon complaint for MS Windows users. There was a
                          > fairly extensive thread in this group about that problem just a
                          > few weeks ago. IIRC, disabling some other program or service
                          > fixed it for one MS victem.[/color]

                          IIRC it was something like an NTP daemon that caused the clock to "jump"
                          a little and (Window's) sleep was confused.

                          Ciao,
                          Marc 'BlackJack' Rintsch

                          Comment

                          • OlafMeding@gmail.com

                            #14
                            Re: Using time.sleep() in 2 threads causes lockup when hyper-threading is enabled

                            Marc
                            [color=blue]
                            > IIRC it was something like an NTP daemon that caused the clock to "jump" a little and (Window's) sleep was confused.[/color]

                            The problem is not a "jump" but a permanet lockup of the sleep
                            statement.

                            To all others, is there nobody out there that could run the test code
                            at the beginning of this post on a hyper-threading CPU?

                            Olaf

                            Comment

                            • Tim Peters

                              #15
                              Re: Using time.sleep() in 2 threads causes lockup whenhyper-threading is enabled

                              [Tim Peters][color=blue][color=green]
                              >> I didn't run it for hours ;-)[/color][/color]

                              [OlafMeding@gmai l.com][color=blue]
                              > Please try.[/color]

                              OK, I let the first test program run for over 24 hours by now. It
                              never hung. Overnight, the box did go into sleep mode, but the test
                              woke itself up after sleep mode ended, and the threads reported they
                              were sleeping for about 8 hours then.

                              I did this under a debug build of Python, so that if it did hang I'd
                              have _some_chance of getting useful info from the VC 7.1 debugger.
                              It's possible (but unlikely) that using a debug-build Python prevented
                              a hang that would have occurred had I used a release-build Python.
                              [color=blue]
                              > The sleep statement does not return! And this should not happen. The
                              > code above does nothing special or unusual. The problem only occurs if
                              > 2 threads use the sleep statement and hyper-threading is enabled.
                              >
                              > We discovered this bug perhaps a year ago. The only solution was to
                              > tell our customers to disable hyper-threading (you can imagine they did
                              > not like our "solution" very much). It then took many days of hard
                              > work to isolate the problem down to the code I posted above.[/color]

                              As before, since Python merely calls the Win32 API Sleep() function,
                              it's extremely unlikely that the problem is due to Python.

                              It's quite possible that the problem is due to a tiny timing hole in
                              MS's implementation of Sleep(). Since I don't have the source code
                              for that, and disassembling is prohibited by my license ;-), I can't
                              pursue that.

                              I've seen software with honest-to-God thread-race bugs that were never
                              reported across years of heavy production use, until a user tried the
                              code on a hyper-threaded or multi-core box. Tiny timing holes can be
                              _extremely_ shy, and running on a box with true, or even just
                              finer-grained (like HT), concurrency can make them much more likely to
                              appear. I've never seen a software failure with (eventually) known
                              cause occur on an HT box that could not have happened on a non-HT box.
                              The statistics of extremely unlikely events aren't a natural fit to
                              the unabused human mind ;-)
                              [color=blue]
                              > ...
                              > Once the application locks up (getting stuck in a sleep statement) all
                              > comes back to live if I pull the network cable out. So perhaps the
                              > socket thread returns from the sleep statement and other threads return
                              > to live because they were waiting for the socket thread.[/color]

                              That's peculiar. time.sleep() called from a thread other than the
                              main thread on Windows is non-interruptable (because the Windows
                              Sleep() function is non-interruptable). time.sleep() called from the
                              main thread on Windows _is_ interruptable: the main thread uses the
                              Win32 API WaitForSingleOb ject() instead, passing a handle to a custom
                              interrupt event; the _intent_ is so that a time.sleep() in the main
                              thread can be aborted by a keyboard interrupt. But it shouldn't be
                              possible for anything other than a keyboard interrupt or a normal
                              (timeout) return to knock that loose.

                              So if unplugging the cable knocks things lose, that just points even
                              stronger at a bug in the Windows kernel.
                              [color=blue]
                              > Our software runs on both Windows and Linux. We are not sure if the
                              > problem also happens on Linux.[/color]

                              Well, I ran your test for a day on my Windows box -- you try running
                              it for a week on your Linux box ;-)
                              [color=blue]
                              > ...
                              > We have searched the Internet far and wide and were not able to find
                              > any information that indicates that someone else has reported a similar
                              > problem (neither Python nor Windows Sleep related).[/color]

                              I'm pretty sure I would have known about it if anyone reported such a
                              Python bug in the last 15 years. But this is the first time I've
                              heard it. I don't keep up with Microsoft bug reports at all.

                              Comment

                              Working...