Re: Synchronization Algorithm Verificator for C++0x

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • Chris M. Thomasson

    Re: Synchronization Algorithm Verificator for C++0x

    "Dmitriy V'jukov" <dvyukov@gmail. comwrote in message
    news:51a36d85-2fc6-4bd6-af64-29950ce5ab6a@59 g2000hsb.google groups.com...
    >I want to announce tool called Relacy Race Detector, which I've
    developed. It's synchronization algorithm verificator for C++0x's
    relaxed memory model. The tool is implemented in the form of header-
    only library for C++03, which can be used for efficient execution of
    unit-tests for synchronization algorithms. The tool executes unit-test
    many times under control of special scheduler, on every execution
    scheduler models different interleaving between threads, at the same
    time every execution is exhaustively analyzed for data races, accesses
    to freed memory, failed assertions etc. If no errors found then
    verification terminates when particular number of interleavings are
    verified (for random scheduler), or when all possible interleavings
    are verified (for full search scheduler). If error is found then tool
    outputs detailed execution history which leads to
    error and terminates.
    The tool was designed for verification of algorithms like memory
    management, memory reclamation for lock-free algorithms, multithreaded
    containers (queues, stacks, maps), mutexes, eventcounts and so on.
    My personal subjective feeling to date is that tool is capable of
    finding extremely subtle algorithmic errors, memory fence placement
    errors and memory fence type errors within a second. And after error
    is detected error fixing is relatively trivial, because one has
    detailed execution history which leads to error.
    [...]


    After you integrate a mutex and condvar into the detection framework, I
    would like to model an eventcount.

    Also, it seems like you could parallelize the testing process somewhat... I
    need to think about this some more.

    Anyway, nice work.


    BTW, you ask if there is any market for this type of tool in an e-mail, well
    I hope you don't mind if I answer that here...



    IMVOH, when C++ 0x is finally released, it should have a fairly big impact.
    I can see it now... You trademark a logo, and give a company the chance to
    stick it on their products to give their customers piece of mind:

    Customer: "Well, the software has the Relacy logo on the front of the box;
    it must be correct concurrent software indeed!"

    Vendor: "Are software lives up to the prerequisites of the Relacy logo. We
    are proud to be able to display it on our software packaging."


    I hope you make tons of $$$ with this tool Dmitriy. Get rich and, then a
    little richer; AFAICT its a winner.


    Any thoughts?


    =^D

  • Chris M. Thomasson

    #2
    Re: Synchronization Algorithm Verificator for C++0x

    "Chris M. Thomasson" <no@spam.invali dwrote in message
    news:gH4mk.1178 5$KZ.2833@newsf e03.iad...
    "Dmitriy V'jukov" <dvyukov@gmail. comwrote in message
    news:51a36d85-2fc6-4bd6-af64-29950ce5ab6a@59 g2000hsb.google groups.com...
    >>I want to announce tool called Relacy Race Detector, which I've
    >developed.
    [...]
    >
    [...]
    >
    After you integrate a mutex and condvar into the detection framework, I
    would like to model an eventcount.
    I would be interested to see how long it takes to detect the following bug
    you found in an older version of AppCore:



    [...]

    Comment

    • Chris M. Thomasson

      #3
      Re: Synchronization Algorithm Verificator for C++0x

      "Dmitriy V'jukov" <dvyukov@gmail. comwrote in message
      news:8488efa1-9383-4232-b5a8-482852cee1f3@a1 g2000hsb.google groups.com...
      On 6 Á×Ç, 02:54, "Chris M. Thomasson" <n...@spam.inva lidwrote:
      I would be interested to see how long it takes to detect the following
      bug
      you found in an older version of AppCore:

      http://groups.google.com/group/comp..../browse_frm/th...
      Ok, let's see. I've just finished basic mutex and condvar
      implementation.
      Can you e-mail the new version to me please?



      Here is eventcount:
      class eventcount
      {
      [...]
      void wait(unsigned cmp)
      {
      unsigned ec = count($).load(s td::memory_orde r_seq_cst);
      if (cmp == (ec & 0x7FFFFFFF))
      {
      guard.lock($);
      ec = count($).load(s td::memory_orde r_seq_cst);
      ^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^ ^^^^^^^^^
      I don't think that you need a memory barrier under the lock because the
      only mutation which will make the following compare succeed is also
      performed under the same lock. The signal-slow path is under the lock, and
      the waiter slow-path is under the lock. However, I don't know if this
      violates some rule in C++ 0x.

      if (cmp == (ec & 0x7FFFFFFF))
      {
      waiters($) += 1;
      cv.wait(guard, $);
      }
      guard.unlock($) ;
      }
      }
      };
      Here is the queue:
      [...]
      Here is the test:
      [...]
      It takes around 40 minutes.
      Let's run it!
      Here is output (note error detected on iteration 3):
      struct spsc_queue_test
      DEADLOCK
      iteration: 3
      Cool; it sure does detect the race-condition in the old AppCore! Very nice
      indeed.



      [...]
      Operations on condition variables are not yet in execution history.
      Let's replace 'ec.signal_rela xed()' with 'ec.signal()'. Here is the
      output:
      struct spsc_queue_test
      [...]
      iterations: 1000000
      total time: 2422
      throughput: 412881
      Test passed. Throughput is around 400'000 test executions per second.
      Beautiful. My eventcount algorihtm works!

      =^D



      Also test reveals some errors with memory fences.
      In signaling function you are using acquire fence in cas, but relaxed
      is enough here:
      void signal_impl(uns igned cmp)
      {
      if (cmp & 0x80000000)
      {
      guard.lock($);
      while (false == count($).compar e_swap(cmp,
      (cmp + 1) & 0x7FFFFFFF, std::memory_ord er_relaxed));
      unsigned w = waiters($);
      waiters($) = 0;
      guard.unlock($) ;
      if (w)
      cv.notify_all($ );
      }
      }
      Right. The current public AppCore atomic API is not fine-grain enough to
      allow for relaxed operations. I need to change that.



      Also. Following naive approach doesn't work (at least in C++0x):
      void signal()
      {
      std::atomic_thr ead_fence(std:: memory_order_se q_cst);
      signal_relaxed( );
      }
      Good thing the MFENCE works in x86!



      You have to execute sequentially consistent atomic RMW on 'ec'
      variable here. Because sequentially consistent fence and sequentially
      consistent atomic RMW operation doesn' t synchronize-with each other.
      Hmmm... This is strange. But I can't find anything about this in
      draft... Anthony Williams said that committee made some changes to
      initial proposal on fences, but they are not yet published...
      This is weird. a seq_cst fence does not work with seq_cst RMW!? Do you
      happen to know the rational for this? IMVHO, it makes no sense at all... I
      must be missing something.

      :^/

      Comment

      • Dmitriy V'jukov

        #4
        Re: Synchronization Algorithm Verificator for C++0x

        On Aug 6, 2:46 am, "Chris M. Thomasson" <n...@spam.inva lidwrote:
        Also, it seems like you could parallelize the testing process somewhat... I
        need to think about this some more.
        Yes, this is in todo list:


        Feel free to comment on todo/feature list, propose new items, and
        prioritize items.
        It's better to do this here:


        Dmitriy V'jukov

        Comment

        • Dmitriy V'jukov

          #5
          Re: Synchronization Algorithm Verificator for C++0x

          On Aug 6, 2:46 am, "Chris M. Thomasson" <n...@spam.inva lidwrote:
          IMVOH, when C++ 0x is finally released, it should have a fairly big impact.
          I can see it now... You trademark a logo, and give a company the chance to
          stick it on their products to give their customers piece of mind:
          >
          Customer: "Well, the software has the Relacy logo on the front of the box;
          it must be correct concurrent software indeed!"
          >
          Vendor: "Are software lives up to the prerequisites of the Relacy logo. We
          are proud to be able to display it on our software packaging."
          >
          I hope you make tons of $$$ with this tool Dmitriy. Get rich and, then a
          little richer; AFAICT its a winner.
          >
          Any thoughts?
          >
          =^D
          :)
          Sounds great!

          I think I can start developing 2 logos:
          [RELACY APPROVED]
          and the second:
          [RELACY BUSTED]
          :)

          Dmitriy V'jukov

          Comment

          • Peter Dimov

            #6
            Re: Synchronization Algorithm Verificator for C++0x

            On Aug 6, 5:31 am, "Dmitriy V'jukov" <dvyu...@gmail. comwrote:
            Also. Following naive approach doesn't work (at least in C++0x):
                    void signal()
                    {
                            std::atomic_thr ead_fence(std:: memory_order_se q_cst);
                            signal_relaxed( );
                    }
            I'm not sure that I understand the algorithm here, but the reason this
            doesn't introduce a synchronize-with relationship is that the fence
            needs to follow the load, not precede it.

            load.relaxed count
            fence.seq_cst

            does synchronize with

            store.seq_cst count, v

            when the load sees the value v written by the store.

            If you do need a leading seq_cst fence, why not just use a seq_cst
            load?

            Comment

            • Dmitriy V'jukov

              #7
              Re: Synchronization Algorithm Verificator for C++0x

              On Aug 6, 5:42 pm, "Dmitriy V'jukov" <dvyu...@gmail. comwrote:
              On Aug 6, 5:08 pm, Peter Dimov <pdi...@gmail.c omwrote:
              >
              On Aug 6, 5:31 am, "Dmitriy V'jukov" <dvyu...@gmail. comwrote:
              >
              Also. Following naive approach doesn't work (at least in C++0x):
              void signal()
              {
              std::atomic_thr ead_fence(std:: memory_order_se q_cst);
              signal_relaxed( );
              }
              >
              I'm not sure that I understand the algorithm here, but the reason this
              doesn't introduce a synchronize-with relationship is that the fence
              needs to follow the load, not precede it.
              >
              I will try to describe the issue in more detail in separate post in
              near future.
              Here:


              Dmitriy V'jukov

              Comment

              • Dmitriy V'jukov

                #8
                Re: Synchronization Algorithm Verificator for C++0x

                On Aug 7, 7:13 pm, "Chris M. Thomasson" <n...@spam.inva lidwrote:
                In terms of the C++MM, the options I see are:
                1. store.seq_cst in push, load.seq_cst in signal, cas.seq_cst in get
                2. fence.seq_cst+l oad.relaxed in signal, cas.relaxed+fen ce.seq_cst in
                get
                3. RMW.seq_cst in signal, cas.seq_cst in get
                and maybe
                4. RMW.acq_rel in signal, cas.acq_rel in get
                >
                For now, I choose number 2 because in the current eventcount algorithm
                as-is, the `signal()' procedure needs a preceding StoreLoad, and the `get()'
                function needs a trailing StoreLoad. AFAICT, this is fine... As for the
                other choices, well, I don't like number 1 because it moves memory
                visibility rules directly into an end-user algorithm. I dislike 3-4 because
                of the dummy RMW; for instance, I could use a `seq_cst' fetch-and-add on the
                eventcount state with an addend of zero. However, IMVHO, that looks like
                fairly dirty hack indeed... I mean, IMVHO, a RMW which does not perform a
                mutation is basically pointless...

                Consider implementation of number 2 on x86.
                Producer:
                1. Interlocked RMW to enqueue element
                2. mfence in signal (even more expensive than Interlocked RMW, and
                absolutely useless)

                Consumer:
                3. Interlocked RMW in get
                4. mfence in get (even more expensive than Interlocked RMW, and
                absolutely useless)

                Maybe compiler can optimize 4 away, because 3 and 4 will be in one
                function. But I am not sure that first versions of C++0x compilers
                will be so smart. And it's questionable that compiler will be able to
                optimize 2 away, because 1 and 2 will be in different functions and
                probably in different source files.

                Do you like this? Don't hasten forgetting your asm skills ;)

                Dmitriy V'jukov

                Comment

                • Peter Dimov

                  #9
                  Re: Synchronization Algorithm Verificator for C++0x

                  On Aug 7, 6:31 pm, "Dmitriy V'jukov" <dvyu...@gmail. comwrote:
                  Consider implementation of number 2 on x86.
                  Producer:
                  1. Interlocked RMW to enqueue element
                  2. mfence in signal (even more expensive than Interlocked RMW, and
                  absolutely useless)
                  It might be interesting to time the actual mfence overhead when the
                  preceding locked RMW instruction has already drained the store queue.

                  Comment

                  • kwikius

                    #10
                    Re: Synchronization Algorithm Verificator for C++0x

                    On Aug 7, 8:31 pm, Peter Dimov <pdi...@gmail.c omwrote:
                    On Aug 7, 6:31 pm, "Dmitriy V'jukov" <dvyu...@gmail. comwrote:
                    >
                    Consider implementation of number 2 on x86.
                    Producer:
                    1. Interlocked RMW to enqueue element
                    2. mfence in signal (even more expensive than Interlocked RMW, and
                    absolutely useless)
                    >
                    It might be interesting to time the actual mfence overhead when the
                    preceding locked RMW instruction has already drained the store queue.
                    You could , I guess try to extract the individual elements of soup, or
                    you could just call it soup.

                    regards
                    Andy Little

                    Comment

                    • Chris M. Thomasson

                      #11
                      Re: Synchronization Algorithm Verificator for C++0x

                      "Dmitriy V'jukov" <dvyukov@gmail. comwrote in message
                      news:490d36a1-28e6-48ae-b5e4-2e9a115edaed@k3 6g2000pri.googl egroups.com...
                      On Aug 7, 7:13 pm, "Chris M. Thomasson" <n...@spam.inva lidwrote:
                      >
                      In terms of the C++MM, the options I see are:
                      1. store.seq_cst in push, load.seq_cst in signal, cas.seq_cst in get
                      2. fence.seq_cst+l oad.relaxed in signal, cas.relaxed+fen ce.seq_cst in
                      get
                      3. RMW.seq_cst in signal, cas.seq_cst in get
                      and maybe
                      4. RMW.acq_rel in signal, cas.acq_rel in get
                      >>
                      >For now, I choose number 2 because in the current eventcount algorithm
                      >as-is, the `signal()' procedure needs a preceding StoreLoad, and the
                      >`get()'
                      >function needs a trailing StoreLoad. AFAICT, this is fine... As for the
                      >other choices, well, I don't like number 1 because it moves memory
                      >visibility rules directly into an end-user algorithm. I dislike 3-4
                      >because
                      >of the dummy RMW; for instance, I could use a `seq_cst' fetch-and-add on
                      >the
                      >eventcount state with an addend of zero. However, IMVHO, that looks like
                      >fairly dirty hack indeed... I mean, IMVHO, a RMW which does not perform a
                      >mutation is basically pointless...
                      >
                      >
                      Consider implementation of number 2 on x86.
                      Producer:
                      1. Interlocked RMW to enqueue element
                      2. mfence in signal (even more expensive than Interlocked RMW, and
                      absolutely useless)
                      >
                      Consumer:
                      3. Interlocked RMW in get
                      4. mfence in get (even more expensive than Interlocked RMW, and
                      absolutely useless)
                      >
                      Maybe compiler can optimize 4 away, because 3 and 4 will be in one
                      function. But I am not sure that first versions of C++0x compilers
                      will be so smart. And it's questionable that compiler will be able to
                      optimize 2 away, because 1 and 2 will be in different functions and
                      probably in different source files.
                      >
                      Do you like this?
                      No.

                      Don't hasten forgetting your asm skills ;)
                      Ouch.


                      Comment

                      Working...