Re: new multicore programming docs for GCC and Visual Studio

  • Chris M. Thomasson

    Re: new multicore programming docs for GCC and Visual Studio

    [comp.lang.c++ added...]

    "Dmitriy V'jukov" <dvyukov@gmail.com> wrote in message
    news:4306b243-2d28-4ec6-a275-883db633b9b8@l42g2000hsc.googlegroups.com...
    On Sep 29, 4:23 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
    >
    >Are you using something like the algorithm in TBB?
    TBB is based on MIT's Cilk work:
    >http://www.cilk.com/multicore-blog/b...ost-Influentia...
    How does the internal low-level impl compare to the following:



    I am looking for raw pseudo-code for atomic deque internal impl
    details...
    AFAICT, this work from SUN would scale better than Cilk. Please correct me
    if I am way off base here. It seems like spawning a successor thread has
    overheads... Hmm. Please try to bear with me here; okay? Correct my
    ignorance on Cilk's work-stealing internal impl... Well, let me pick an impl
    to focus on... Say, DEC Alpha?
    AFAIK, in the early days Cilk's work-stealing deque used a mutex-based
    pop(). But I remember there were some mentions of non-blocking
    algorithms in the Cilk papers, something like "some people pointed out
    to us that it's possible to implement a work-stealing deque in a
    completely non-blocking manner". And I don't know whether a non-blocking
    deque was finally incorporated into Cilk.
    If the mutex is a spin-mutex (i.e. there is only 1 atomic RMW per
    lock/unlock) and stealing is rare, then a mutex-based deque costs nearly
    the same as a non-blocking deque with 1 RMW... provided that push()
    doesn't use the mutex, and provided that an atomic RMW has the same cost
    as a StoreLoad memory fence (x86).
    You mean:

    1 atomic RMW and 1 StoreLoad membar for lock
    1 atomic RMW and 1 LoadStore membar for unlock

    2 atomics, and 2 membars; fairly expensive... Well... You indeed have a good
    point in the case that stealing is rare...
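
To make the comparison above concrete, here is a minimal sketch of a mutex-based work-stealing deque in which push() takes no lock, written with C++11 atomics. It is not the Cilk or TBB implementation. The class name `LockedStealDeque`, the fixed capacity, and the `int` payload are assumptions chosen for illustration. The owner thread calls push() and pop(); any thread may call steal().

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical sketch: owner-side push() is lock-free, while pop() and
// steal() serialize through a tiny spin lock (one atomic RMW to acquire,
// one release store to drop it).
class LockedStealDeque {
    static constexpr std::size_t kCapacity = 1024;  // assumed fixed size
    int buf_[kCapacity];
    std::atomic<std::size_t> top_{0};     // next index to steal
    std::atomic<std::size_t> bottom_{0};  // next free index, written by owner only
    std::atomic<bool> locked_{false};

    void lock() {
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // spin until the holder releases the lock
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

public:
    // Owner only. No lock and no atomic RMW on this path.
    bool push(int value) {
        std::size_t b = bottom_.load(std::memory_order_relaxed);
        std::size_t t = top_.load(std::memory_order_acquire);
        if (b - t >= kCapacity) return false;  // full
        buf_[b % kCapacity] = value;
        // Publish the item: a thief that sees the new bottom also sees the value.
        bottom_.store(b + 1, std::memory_order_release);
        return true;
    }

    // Owner only. Takes the most recently pushed item (LIFO).
    bool pop(int& out) {
        lock();
        std::size_t b = bottom_.load(std::memory_order_relaxed);
        std::size_t t = top_.load(std::memory_order_relaxed);
        bool ok = t < b;
        if (ok) {
            out = buf_[(b - 1) % kCapacity];
            bottom_.store(b - 1, std::memory_order_relaxed);
        }
        unlock();
        return ok;
    }

    // Any thread. Takes the oldest item (FIFO).
    bool steal(int& out) {
        lock();
        std::size_t t = top_.load(std::memory_order_relaxed);
        std::size_t b = bottom_.load(std::memory_order_acquire);
        bool ok = t < b;
        if (ok) {
            out = buf_[t % kCapacity];
            // Release so that push() knows the slot has been read before reuse.
            top_.store(t + 1, std::memory_order_release);
        }
        unlock();
        return ok;
    }
};
```

In this sketch the owner pays one atomic RMW per pop() and none per push(), which is the accounting behind "nearly the same as a non-blocking deque with 1 RMW" when stealing is rare.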

  • Dmitriy V'jukov

    #2
    Re: new multicore programming docs for GCC and Visual Studio

    On Sep 29, 5:02 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
    > You mean:
    >
    > 1 atomic RMW and 1 StoreLoad membar for lock
    > 1 atomic RMW and 1 LoadStore membar for unlock
    >
    > 2 atomics, and 2 membars; fairly expensive... Well... You indeed have a good
    > point in the case that stealing is rare...

    No, I mean for spin-mutex:
    1 atomic RMW + 1 StoreLoad membar for lock
    store-release for unlock

    Which for x86 means:
    1 atomic RMW for lock (StoreLoad is implied, and so basically
    costless)
    plain store for unlock (release is implied)


    Dmitriy V'jukov
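
The spin-mutex cost model above (one atomic RMW to lock, a store-release to unlock) can be sketched with C++11 atomics as follows. The class name `SpinMutex` and its `try_lock()` helper are hypothetical and not taken from the thread. On x86, mainstream compilers typically compile the exchange to an implicitly locked XCHG and the release store to a plain MOV, which is the mapping described in the post.

```cpp
#include <atomic>

// Hypothetical minimal spin-mutex: one atomic RMW per lock attempt,
// one store-release per unlock.
class SpinMutex {
    std::atomic<bool> locked_{false};

public:
    // Single attempt: returns true if the lock was acquired.
    bool try_lock() {
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void lock() {
        // The exchange is the only atomic RMW on the uncontended path.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // While contended, wait on plain loads instead of repeating RMWs.
            while (locked_.load(std::memory_order_relaxed)) {
            }
        }
    }

    // No RMW here: a release store is enough to publish the critical section.
    void unlock() { locked_.store(false, std::memory_order_release); }
};
```

An adaptive mutex, by contrast, would fall back to blocking in the kernel after spinning for a while, and its unlock path usually needs an RMW to detect waiters.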


    • Jean-Marc Desperrier

      #3
      Re: new multicore programming docs for GCC and Visual Studio

      Chris M. Thomasson wrote:

      And why not remove fr.comp.lang.c++?

      Please, everyone posting to this thread!


      • Chris M. Thomasson

        #4
        Re: new multicore programming docs for GCC and Visual Studio

        "Dmitriy V'jukov" <dvyukov@gmail.com> wrote in message
        news:dd05d47f-9502-43a0-ac49-844bbe34c6c7@w7g2000hsa.googlegroups.com...
        On Sep 29, 5:02 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
        >> You mean:
        >>
        >> 1 atomic RMW and 1 StoreLoad membar for lock
        >> 1 atomic RMW and 1 LoadStore membar for unlock
        >>
        >> 2 atomics, and 2 membars; fairly expensive... Well... You indeed have a
        >> good point in the case that stealing is rare...
        >
        > No, I mean for spin-mutex:
        > 1 atomic RMW + 1 StoreLoad membar for lock
        > store-release for unlock

        Ahhh, I fail see spin-mutex and thing adaptive mutex. Sorry for my
        confusion.

        > Which for x86 means:
        > 1 atomic RMW for lock (StoreLoad is implied, and so basically
        > costless)

        IMHO, it's not costless. Not at all...

        > plain store for unlock (release is implied)

        Indeed.


        • Chris M. Thomasson

          #5
          Re: new multicore programming docs for GCC and Visual Studio

          "Chris M. Thomasson" <no@spam.invalid> wrote in message
          news:7ImEk.572$kc.394@newsfe12.iad...
          "Dmitriy V'jukov" <dvyukov@gmail.com> wrote in message
          news:dd05d47f-9502-43a0-ac49-844bbe34c6c7@w7g2000hsa.googlegroups.com...
          On Sep 29, 5:02 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
          >>> You mean:
          >>>
          >>> 1 atomic RMW and 1 StoreLoad membar for lock
          >>> 1 atomic RMW and 1 LoadStore membar for unlock
          >>>
          >>> 2 atomics, and 2 membars; fairly expensive... Well... You indeed have a
          >>> good point in the case that stealing is rare...
          >>
          >> No, I mean for spin-mutex:
          >> 1 atomic RMW + 1 StoreLoad membar for lock
          >> store-release for unlock
          >
          > Ahhh, I fail see spin-mutex and thing adaptive mutex.

          Let me rephrase...

          I see spin-mutex and thought adaptive mutex.

          > Sorry for my confusion.
          :^/

          >> Which for x86 means:
          >> 1 atomic RMW for lock (StoreLoad is implied, and so basically
          >> costless)
          >
          > IMHO, it's not costless. Not at all...

          Bus locking slow-path... Cache locking fast-path...

          >> plain store for unlock (release is implied)
          >
          > Indeed.

          Even this operation has excessive costs when compared to an arch that does
          not have these implied membars... RMO SPARC has its advantages indeed. TSO
          compared to RMO? Which one is more lightweight and able to scale... I
          personally prefer a fine-grained model over TSO any day...
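
The StoreLoad ordering argued about above is the one exercised by the classic store-buffering test: each thread writes its own flag and then reads the other thread's flag. The sketch below uses C++11 atomics, and the function name `count_both_zero` is hypothetical. With a sequentially consistent atomic RMW followed by a sequentially consistent load, the C++ memory model forbids the outcome where both threads read 0. On x86, mainstream compilers typically emit XCHG for the exchange and a plain MOV for the load, so the RMW is what supplies the StoreLoad ordering.

```cpp
#include <atomic>
#include <thread>

// Runs the store-buffering test `iterations` times and returns how many
// runs ended with both threads reading 0. With seq_cst operations, as
// used here, that outcome is forbidden, so the result should be 0.
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = -1;
        int r2 = -1;

        std::thread t1([&] {
            x.exchange(1, std::memory_order_seq_cst);  // atomic RMW
            r1 = y.load(std::memory_order_seq_cst);    // later load
        });
        std::thread t2([&] {
            y.exchange(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();

        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

If the exchanges were replaced by release stores and the loads by acquire loads, the both-zero outcome would be permitted. That is the fine-grained choice at stake in the TSO versus RMO comparison: the program asks for StoreLoad ordering only where it needs it, and what each architecture charges for it differs.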
