Reasoning behind process instead of thread based arch?

  • Jim C. Nasby

    #16
    Re: Reasoning behind process instead of thread based

    On Thu, Oct 28, 2004 at 02:44:55PM +0200, Marco Colombo wrote:
    > I think that it would be interesting to discuss multi (processes/threads)
    > model vs mono (process/thread). Mono as in _one_ single process/thread
    > per CPU, not one per session. That is, moving all the "scheduling"
    > between sessions entirely to userspace. The server gains almost complete
    > control over the data structures allocated per session, and the resources
    > allocated _to_ sessions.

    This is how DB2 and Oracle work. Having scheduling control is very
    interesting, but I'm not sure it needs to be accomplished this way.
    There are other advantages too: in both products you have a single pool
    of sort memory, so you can give sorting as much memory as you want
    without the risk of exceeding the total. PostgreSQL can't do this, and it
    makes writing code that wants a lot of sort memory a real pain. Of course
    this could probably be solved without going to a 'mono process' model.
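
To make the single-pool idea concrete, here is a minimal sketch in Python. The `SortMemoryPool` class and the helper `worst_case_per_sort_limit` are hypothetical names, not DB2, Oracle, or PostgreSQL APIs. The sketch only shows how a shared budget caps the total, while a per-sort limit grows with the number of concurrent sorts.

```python
import threading


class SortMemoryPool:
    """Hypothetical shared budget for sort memory (illustration only)."""

    def __init__(self, total_kb):
        self.total_kb = total_kb
        self.used_kb = 0
        self._lock = threading.Lock()

    def reserve(self, wanted_kb):
        """Grant up to wanted_kb, but never more than what is left in the pool."""
        with self._lock:
            granted = min(wanted_kb, self.total_kb - self.used_kb)
            self.used_kb += granted
            return granted

    def release(self, granted_kb):
        """Return a previous grant to the pool."""
        with self._lock:
            self.used_kb -= granted_kb


def worst_case_per_sort_limit(limit_kb, concurrent_sorts):
    """With a per-sort limit, the worst-case total grows with the sort count."""
    return limit_kb * concurrent_sorts
```

With a 1000 kB pool, two sorts that each ask for 800 kB get 800 kB and 200 kB, so the total never passes the pool size. Ten sorts under an 800 kB per-sort limit could together use up to 8000 kB.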
    --
    Jim C. Nasby, Database Consultant decibel@decibel.org
    Give your computer some brain candy! www.distributed.net Team #1828

    Windows: "Where do you want to go today?"
    Linux: "Where do you want to go tomorrow?"
    FreeBSD: "Are you guys coming, or what?"

    ---------------------------(end of broadcast)---------------------------
    TIP 5: Have you checked our extensive FAQ?




    • Chris Browne

      #17
      Re: Reasoning behind process instead of thread based

      nd02tsk@student.hig.se writes:
      >> Two: If a
      >> single process in a multi-process application crashes, that process
      >> alone dies. The buffer is flushed, and all the other child processes
      >> continue happily along. In a multi-threaded environment, when one
      >> thread dies, they all die.
      >
      > So this means that if a single connection thread dies in MySQL, all
      > connections die?

      Yes, that's right.

      > Seems rather serious. I am doubtful that is how they have
      > implemented it.

      If it's a multithreaded application, then there is nothing to doubt
      about the matter. If any thread dies, the whole process croaks, and
      there's no choice in the matter. If a thread has been corrupted to
      the point of crashing, then the entire process has been corrupted.
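
The process half of this argument can be observed directly. The sketch below assumes a POSIX system. It starts a child Python process that kills itself with SIGSEGV and shows that the parent keeps running. The helper name `run_crashing_child` is made up for the illustration. The thread half is not shown: a real fault in one thread of a multithreaded server would take the whole process down with it.

```python
import signal
import subprocess
import sys

# Child program: send SIGSEGV to itself, the same signal the kernel
# delivers after an invalid memory access.
CHILD_CODE = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"


def run_crashing_child():
    """Run the child and return its exit status as seen by the parent."""
    result = subprocess.run([sys.executable, "-c", CHILD_CODE])
    return result.returncode


if __name__ == "__main__":
    status = run_crashing_child()
    # On POSIX a negative return code means "killed by that signal".
    print("child status:", status, "(parent still alive)")
```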
      --
      let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;

      A VAX is virtually a computer, but not quite.


      • Marco Colombo

        #18
        Re: Reasoning behind process instead of thread based

        On Thu, 28 Oct 2004, Thomas Hallgren wrote:

        > Marco Colombo wrote:
        >> [processes vs threads stuff deleted]
        >>
        >> In any modern and reasonable Unix-like OS, there's very little difference
        >> between the multi-process or the multi-thread model. _Default_ behaviour
        >> is different, e.g. memory is shared by default for threads, but processes
        >> can share memory as well. There are very few features threads have
        >> that processes don't, and vice versa. And if the OS is good enough,
        >> there are hardly performance issues.
        >>
        > Most servers have a desire to run on Windows-NT and I would consider Solaris
        > a "modern and reasonable Unix-like OS". On both, you will find a significant
        > performance difference. I think that's true for Irix as well. Your statement
        > is very true for Linux based OS'es though.

        See the "if the OS is good enough" part... :-)

        AFAIK, many techniques developed under Linux have been included in
        recent releases of other OSes. I haven't seen the source, of course.

        If recent Solaris still has processes which are actually "heavy", well
        I call that "an old legacy (mis-)feature on a modern and reasonable OS"...
        Back in '93, Mr. Gates used to state: "NT is Unix". If it's not the case
        yet, well, it's not _my_ fault.

        >> I think that it would be interesting to discuss multi (processes/threads)
        >> model vs mono (process/thread). Mono as in _one_ single process/thread
        >> per CPU, not one per session. That is, moving all the "scheduling"
        >> between sessions entirely to userspace. The server gains almost complete
        >> control over the data structures allocated per session, and the resources
        >> allocated _to_ sessions.
        >>
        > I think what you mean is user space threads. In the Java community known as
        > "green" threads, Windows call it "fibers". That approach has been more or
        > less abandoned by Sun, BEA, and other Java VM manufacturers since a user
        > space scheduler is confined to one CPU, one process, and unable to balance
        > the scheduling with other processes and their threads. A kernel scheduler
        > might be slightly heavier but it does a much better job.
        >
        > Regards,
        > Thomas Hallgren

        No. I just meant "scheduling" between PG sessions. I'm not interested in
        userspace threads. Those are general purpose solutions, with the drawbacks
        you pointed out.

        I mean an entirely event driven server. The trickiest part is to handle
        N-way. On 1-way, it's quite a clear and well-defined model.

        I'm not going to say it's easy. I'd like to move the discussion away from
        the sterile processes vs threads issue. Most differences there are
        platform specific anyway. The model is the same: one thread of execution
        per session. I'm proposing a new model entirely (well I'm proposing
        a _discussion_ on a model vs. model basis and not implementation vs
        implementation of the same model).

        If you read this thread, you'll notice most people miss the point:
        whether processes or threads, the model is the same: many, many actors
        that share a big part of their memory. The problems are the same, too.
        Should we buy the claim that processes are safer? Of course that's not
        the case when they share such a big memory segment. The chance of a runaway
        pointer trashing some important shared data is almost the same for both
        processes and threads. If one backend crashes with a SIGSEGV, I'd bet
        nothing on the shared mem not being corrupted somehow.
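
The concern about shared memory can be sketched as follows, assuming a POSIX system where an anonymous `mmap` region stays shared across `fork`. The function name `crash_after_scribble` and the byte offsets are invented for the illustration, and this is not how PostgreSQL manages its shared segment. The child overwrites part of the region and then dies from SIGSEGV. The parent survives, but the bad write is still in the memory it shares.

```python
import mmap
import os
import signal


def crash_after_scribble(size=16):
    """Fork a child that corrupts shared memory and then crashes.

    Returns (child_was_killed_by_signal, bytes_seen_by_parent).
    """
    shared = mmap.mmap(-1, size)  # anonymous mapping, shared after fork on Unix
    shared[:] = b"A" * size       # known-good contents

    pid = os.fork()
    if pid == 0:
        # Child: a stray write into shared data, followed by a crash.
        shared[4:8] = b"\x00" * 4
        os.kill(os.getpid(), signal.SIGSEGV)
        os._exit(1)  # not reached; the signal terminates the child

    _, status = os.waitpid(pid, 0)
    data = bytes(shared)
    shared.close()
    return os.WIFSIGNALED(status), data
```

Process isolation protects the parent's private memory, not the segment both sides map.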

        My point being: how about discussing a completely different model
        instead?

        ..TM.
        --
        Marco Colombo
        Technical Manager
        ESI s.r.l.
        Colombo@ESI.it


        • Thomas Hallgren

          #19
          Re: Reasoning behind process instead of thread based

          Marco,

          > I mean an entirely event driven server. The trickiest part is to handle
          > N-way. On 1-way, it's quite a clear and well-defined model.

          You need to clarify this a bit.

          You say that the scheduler is in user-space, yet there's only one thread
          per process and one process per CPU. You state that instead of threads,
          you want it to be completely event driven. In essence that would mean
          serving one event per CPU from start to end at any given time. What is
          an event in this case? Where did it come from? How will this system
          serve concurrent users?

          Regards,
          Thomas Hallgren





              • Marco Colombo

                #22
                Re: Reasoning behind process instead of thread based

                On Thu, 28 Oct 2004, Thomas Hallgren wrote:

                > Marco,
                >
                >> I mean an entirely event driven server. The trickiest part is to handle
                >> N-way. On 1-way, it's quite a clear and well-defined model.
                >
                > You need to clarify this a bit.
                >
                > You say that the scheduler is in user-space, yet there's only one thread per
                > process and one process per CPU. You state that instead of threads, you want
                > it to be completely event driven. In essence that would mean serving one
                > event per CPU from start to end at any given time. What is an event in this
                > case? Where did it come from? How will this system serve concurrent users?

                Let's take a look at the bigger picture. We need to serve many clients,
                that is many sessions, that is many requests (queries) at the same time.
                Since there may be more than one active request, we need to schedule
                them in some way. That's what I meant with "session scheduler".

                The traditional accept&fork model doesn't handle that directly: by
                creating one process per session, it relies on the process scheduler
                in the kernel. I claim this is suboptimal, both for the extra resources
                allocated to each session, and for the kernel policies not being
                perfectly tailored to the job of scheduling PG sessions (*).
                Not to mention that the postmaster has almost no control over these
                policies.

                Now, threads help a bit in reducing the per session overhead. But that's
                more an implementation detail, and it's _very_ platform specific.
                Switching to threads has a great impact on many _details_ of the
                server, the benefits depend a lot on the platform, but the model is
                just the same, with the same essential problems.
                Many big changes for little gain. Let's explore, at least in theory,
                the advantages of a completely different model (that implies a lot
                of changes too, of course - but for something).

                You ask what an event is? An event can be:
                - input from a connection (usually a new query);
                - notification that I/O needed by a pending query has completed;
                - if we don't want a single query to starve the server, an alarm of
                  some kind (I think this is a corner case, but still possible);
                - something else I haven't thought about.

                At any given moment, there are many pending queries. Most of them
                will be waiting for I/O to complete. That's how the server handles
                concurrent users.
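
A minimal sketch of such an event loop, in Python with the standard `selectors` module, might look like this. It covers only the first kind of event (input from a connection). The names `serve_once`, `demo`, and the `handle_query` callback are assumptions made for the example, and socket pairs stand in for client connections. I/O-completion and alarm events are left out.

```python
import selectors
import socket


def serve_once(sel, handle_query):
    """Dispatch every event that is ready right now; return how many ran."""
    handled = 0
    for key, _mask in sel.select(timeout=0):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            # Input from a connection: usually a new query.
            conn.sendall(handle_query(key.data, data))
        else:
            # Peer closed the connection: forget the session.
            sel.unregister(conn)
            conn.close()
        handled += 1
    return handled


def demo():
    """Two sessions in one process and one thread; only one of them is active."""
    sel = selectors.DefaultSelector()
    servers, clients = [], []
    for session_id in range(2):
        server_side, client_side = socket.socketpair()
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ, data=session_id)
        servers.append(server_side)
        clients.append(client_side)

    # Session 1 sends a query; session 0 stays idle and is never visited.
    clients[1].sendall(b"SELECT 1")
    handled = serve_once(sel, lambda sid, q: b"session %d ran %s" % (sid, q))
    reply = clients[1].recv(4096)

    sel.close()
    for sock in servers + clients:
        sock.close()
    return handled, reply
```

The point of the design is that an idle session costs one registered descriptor rather than a process or a thread, and the order in which ready sessions are served is decided by the server's own loop.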

                >
                > Regards,
                > Thomas Hallgren

                (*) They're oriented to general purpose processes. Think of how CPU
                usage affects relative priorities. In a DB context, there may be
                other criteria of greater significance. Roughly speaking, the larger
                the part of the data a single session holds locked, the sooner it should
                be completed. The kernel has no knowledge of this. To the kernel,
                "big" processes are those that are using a lot of CPU, and the policy is
                to slow them down. To a DB, "big" queries are those that force the most
                serialization ("lock a lot"), and they should be completed as soon as
                possible.
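
As a rough illustration of that policy, the following Python sketch picks the next session to run by the number of locks it holds. `SessionScheduler` and its methods are hypothetical, and a real server would weigh other factors as well.

```python
import heapq


class SessionScheduler:
    """Hypothetical ready-queue that runs the session holding most locks first."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker: preserves arrival order

    def add(self, session_id, locks_held):
        # heapq is a min-heap, so negate the lock count to pop the largest first.
        heapq.heappush(self._heap, (-locks_held, self._counter, session_id))
        self._counter += 1

    def next_session(self):
        """Return the id of the session to run next, or None if idle."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

A kernel scheduler has no access to the lock count, which is why this kind of ordering can only be done by the server itself.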

                ..TM.
                --
                Marco Colombo
                Technical Manager
                ESI s.r.l.
                Colombo@ESI.it

