SSE and MMX support in the JIT compiler

  • Andre

    SSE and MMX support in the JIT compiler

    Hi,

    I was wondering if anyone knows whether the JIT compiler supports
    SSE/SSE2 instructions? Thanks

    -Andre

  • David Notario

    #2
    Re: SSE and MMX support in the JIT compiler

    We currently use SSE2 for some things, like the double-to-int cast; it's not
    used for general codegen, though.

    --
    David Notario
    Software Design Engineer - CLR JIT Compiler


    "Andre" <food_crazy@hotmail.com> wrote in message
    news:3F276073.3090707@hotmail.com...
    > Hi,
    >
    > I was wondering if anyone knows whether the JIT compiler supports
    > SSE/SSE2 instructions? Thanks
    >
    > -Andre
    >


      • Andre

        #4
        Re: SSE and MMX support in the JIT compiler

        Thanks David,

        There's one thing I need to ask you - were there many
        features/improvements in v1.1 of the CLR compared to v1.0? I see that the
        total number of bytes JIT'd is noticeably lower in v1.1, and some
        profiling showed that v1.1 gave better MFLOPS when executing some
        benchmarking code. Thanks

        -Andre

        David Notario wrote:
        > We currently use SSE2 for some things, like the double-to-int cast; it's not
        > used for general codegen, though.
        >

          • David Notario

            #6
            Re: SSE and MMX support in the JIT compiler

            No, we didn't do much optimization work in the JIT for v1.1, except for some
            very targeted changes that offered a significant speed boost in exchange for
            little dev work (1.1 was mainly a security-fixes-only release for the CLR),
            such as the double-to-int cast (a 40x speed increase just from using an SSE2
            instruction).

            --
            David Notario
            Software Design Engineer - CLR JIT Compiler


            "Andre" <food_crazy@hotmail.com> wrote in message
            news:3F2B7FF2.1010003@hotmail.com...
            > Thanks David,
            >
            > There's one thing I need to ask you - were there many
            > features/improvements in v1.1 of the CLR compared to v1.0? I see that the
            > total number of bytes JIT'd is noticeably lower in v1.1, and some
            > profiling showed that v1.1 gave better MFLOPS when executing some
            > benchmarking code. Thanks
            >
            > -Andre
            >
            > David Notario wrote:
            > > We currently use SSE2 for some things, like the double-to-int cast; it's not
            > > used for general codegen, though.
            > >
            >



            • David Notario

              #7
              Re: SSE and MMX support in the JIT compiler

              Where I said "security fixes for the CLR", I meant for the JIT; there were
              perf improvements in areas other than the JIT.

              --
              David Notario
              Software Design Engineer - CLR JIT Compiler


              "David Notario" <dnotario@online.microsoft.com> wrote in message
              news:eZt1DbvWDHA.608@TK2MSFTNGP12.phx.gbl...
              > No, we didn't do much optimization work in the JIT for v1.1, except for some
              > very targeted changes that offered a significant speed boost in exchange for
              > little dev work (1.1 was mainly a security-fixes-only release for the CLR),
              > such as the double-to-int cast (a 40x speed increase just from using an SSE2
              > instruction).
              >
              > --
              > David Notario
              > Software Design Engineer - CLR JIT Compiler
              >
              >
              > "Andre" <food_crazy@hotmail.com> wrote in message
              > news:3F2B7FF2.1010003@hotmail.com...
              > > Thanks David,
              > >
              > > There's one thing I need to ask you - were there many
              > > features/improvements in v1.1 of the CLR compared to v1.0? I see that the
              > > total number of bytes JIT'd is noticeably lower in v1.1, and some
              > > profiling showed that v1.1 gave better MFLOPS when executing some
              > > benchmarking code. Thanks
              > >
              > > -Andre
              > >
              > > David Notario wrote:
              > > > We currently use SSE2 for some things, like the double-to-int cast; it's not
              > > > used for general codegen, though.
              > > >
              > >
              >



              • Andre

                #8
                Re: SSE and MMX support in the JIT compiler

                Thanks David,

                David Notario wrote:
                > No, we didn't do much optimization work in the JIT for v1.1, except for some
                > very targeted changes that offered a significant speed boost in exchange for
                > little dev work (1.1 was mainly a security-fixes-only release for the CLR),
                > such as the double-to-int cast (a 40x speed increase just from using an SSE2
                > instruction).

                So does that mean v1.0 didn't use SSE2 at all (and only used SSE)? I
                guess that's why I see an increase in the number of FLOPS using v1.1.

                If optimizations are being targeted at a particular platform, does
                that imply that there are other platforms .NET is being ported to? (I'm
                only aware of Mono, and that's on x86.) Does Microsoft plan on porting
                .NET (or allowing others to port it) to Sun or any other platform, for
                instance?

                You mentioned that there have been some improvements in areas other than
                the JIT. Could you name some? I'm trying to write up a report for my
                company to convince them to switch completely from J2EE/J2SE to .NET, and
                for that I need solid reasoning and accurate measurements
                that show improvements in CLR v1.1 over v1.0. After a month's study I'm
                personally convinced that the CLR will improve (and some very
                interesting features are being added to C# in the next release). I
                can't seem to find anything documented on the current implementation of
                the CLR, and Rotor, for that matter, is simply not worth studying (as the
                optimizing compiler has been stripped from it). It would really help
                me if you could shed a little more light on this, please. Thanks again
                for your time, David,

                -Andre



                • Jon Skeet

                  #9
                  Re: SSE and MMX support in the JIT compiler

                  Andre <food_crazy@hotmail.com> wrote:
                  > > Is this also true for Whidbey, do you know? (And can you say? :)
                  >
                  > What's Whidbey? (is that the code name for the next version of C#?)

                  I believe it's the next version of Visual Studio .NET, including the
                  next version of .NET itself, which will in turn support the features of
                  the next version of C# (such as generics).

                  --
                  Jon Skeet - <skeet@pobox.com>
                  http://www.pobox.com/~skeet/

                  If replying to the group, please do not mail me too


                  • Andre

                    #10
                    Re: SSE and MMX support in the JIT compiler

                    Jon Skeet wrote:
                    > Andre <food_crazy@hotmail.com> wrote:
                    >
                    >>>Is this also true for Whidbey, do you know? (And can you say? :)
                    >>
                    >>What's Whidbey? (is that the code name for the next version of C#?)
                    >
                    >
                    > I believe it's the next version of Visual Studio .NET, including the
                    > next version of .NET itself, which will in turn support the features of
                    > the next version of C# (such as generics).
                    >
                    Ah.. catchy name :) Thanks Jon

                    -Andre


                    • David Notario

                      #11
                      Re: SSE and MMX support in the JIT compiler

                      We've done more perf work in the JIT for our next version than for our
                      previous version, but we still won't be generating SSE2 or MMX code in our
                      codegen.

                      The rationale behind not doing SSE2 was that we didn't have the time to do
                      vectorizing optimizations. If you use SSE2 for scalar operations, it's not
                      always faster than the equivalent x87 code in 'normal' code: adds and muls
                      have different latencies in SSE2 vs x87 (mul has lower latency in SSE2, but
                      add is higher, IIRC), and some operations (casting from doubles to floats
                      or floats to doubles) are quite slow in SSE2 compared to x87. We also have
                      to support processors without SSE2.

                      So, with all these arguments against it, we decided to focus our work on
                      improving our x87 codegen and leaving the door open for an SSE2
                      implementation, instead of putting all our eggs in the SSE2 basket.

                      --
                      David Notario
                      Software Design Engineer - CLR JIT Compiler


                      "Jon Skeet" <skeet@pobox.com> wrote in message
                      news:MPG.19996be441f153a098a26c@news.microsoft.com...
                      > David Notario <dnotario@online.microsoft.com> wrote:
                      > > No, we didn't do much optimization work in the JIT for v1.1, except for some
                      > > very targeted changes that offered a significant speed boost in exchange for
                      > > little dev work (1.1 was mainly a security-fixes-only release for the CLR),
                      > > such as the double-to-int cast (a 40x speed increase just from using an SSE2
                      > > instruction).
                      >
                      > Is this also true for Whidbey, do you know? (And can you say? :)
                      >
                      > --
                      > Jon Skeet - <skeet@pobox.com>
                      > http://www.pobox.com/~skeet/
                      > If replying to the group, please do not mail me too



                      • Austin Ehlers

                        #12
                        Re: SSE and MMX support in the JIT compiler

                        Hello,
                        Is there any work being done on using specific features of a processor
                        to increase performance? For example, on AMD Athlon XPs, there are 4
                        integer execution pipelines. I can get a 500% decrease in time if I
                        do a loop like this:

                        int sums0=0, sums1=0, sums2=0, sums3=0, sums=0;
                        for(x=0;x<nums.Length/4;x+=4)
                        {
                        sums0+=nums[x];
                        sums1+=nums[x+1];
                        sums2+=nums[x+2];
                        sums3+=nums[x+3];
                        }
                        sums=(sums0+sums1)+(sums2+sums3);

                        where nums[] is an array of integers. I know this would be hard to
                        implement in the JIT, but isn't one of the (main) ideas behind the JIT
                        the ability to do run-time optimizations for whatever platform the
                        code is running on?

                        Thanks,
                        Austin Ehlers


                        On Wed, 6 Aug 2003 00:32:41 -0700, "David Notario"
                        <dnotario@online.microsoft.com> wrote:

                        >We've done more perf work in the JIT for our next version than for our
                        >previous version, but we still won't be generating SSE2 or MMX code in our
                        >codegen.
                        >
                        >The rationale behind not doing SSE2 was that we didn't have the time to do
                        >vectorizing optimizations. If you use SSE2 for scalar operations, it's not
                        >always faster than the equivalent x87 code in 'normal' code: adds and muls
                        >have different latencies in SSE2 vs x87 (mul has lower latency in SSE2, but
                        >add is higher, IIRC), and some operations (casting from doubles to floats
                        >or floats to doubles) are quite slow in SSE2 compared to x87. We also have
                        >to support processors without SSE2.
                        >
                        >So, with all these arguments against it, we decided to focus our work on
                        >improving our x87 codegen and leaving the door open for an SSE2
                        >implementation, instead of putting all our eggs in the SSE2 basket.


                        • David Notario

                          #13
                          Re: SSE and MMX support in the JIT compiler

                          What's the original code? I think you made a mistake in your unrolling and
                          you are effectively doing 4 times less work (loop condition should be
                          x<nums.Length)

                          We do take advantage of some processor features and generate different code
                          for different processors. We could get better there, though, but we also
                          have a finite amount of time. Also, any processor specifics add a lot of
                          work to our QA process.
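
                          A minimal sketch of what the corrected unrolling might look like,
                          assuming the intent is simply to sum every element of nums. This is
                          an illustration, not code from the original post: the names
                          UnrollDemo, SumSimple and SumUnrolled are hypothetical, and whether
                          the unrolled form is actually faster depends on the processor and on
                          the JIT in use. The main loop runs up to the largest multiple of 4
                          that fits in nums.Length, and a scalar tail loop picks up the
                          leftover elements.

                          ```csharp
                          using System;

                          static class UnrollDemo
                          {
                              // Straightforward reference loop: one accumulator, one element per iteration.
                              public static int SumSimple(int[] nums)
                              {
                                  int sum = 0;
                                  for (int x = 0; x < nums.Length; x++)
                                  {
                                      sum += nums[x];
                                  }
                                  return sum;
                              }

                              // Four-way unrolled loop with independent accumulators, so the four
                              // additions in each iteration do not depend on one another.
                              public static int SumUnrolled(int[] nums)
                              {
                                  int sums0 = 0, sums1 = 0, sums2 = 0, sums3 = 0;

                                  // Largest multiple of 4 that is <= nums.Length.
                                  int limit = nums.Length - (nums.Length % 4);

                                  int x = 0;
                                  for (; x < limit; x += 4)
                                  {
                                      sums0 += nums[x];
                                      sums1 += nums[x + 1];
                                      sums2 += nums[x + 2];
                                      sums3 += nums[x + 3];
                                  }

                                  int sum = (sums0 + sums1) + (sums2 + sums3);

                                  // Tail loop: handles the 0 to 3 elements left over when
                                  // nums.Length is not a multiple of 4.
                                  for (; x < nums.Length; x++)
                                  {
                                      sum += nums[x];
                                  }
                                  return sum;
                              }
                          }
                          ```

                          Both methods visit every element exactly once, so any timing
                          comparison between them measures the unrolling itself rather than a
                          difference in the amount of work done.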

                          --
                          David Notario
                          Software Design Engineer - CLR JIT Compiler


                          "Austin Ehlers" <the*remove*bored*me*guy16@hotmail.com> wrote in message
                          news:59i3jvknjtnaljk39jui7337eu7bibia92@4ax.com...
                          > Hello,
                          > Is there any work being done on using specific features of a processor
                          > to increase performance? For example, on AMD Athlon XPs, there are 4
                          > integer execution pipelines. I can get a 500% decrease in time if I
                          > do a loop like this:
                          >
                          > int sums0=0, sums1=0, sums2=0, sums3=0, sums=0;
                          > for(x=0;x<nums.Length/4;x+=4)
                          > {
                          > sums0+=nums[x];
                          > sums1+=nums[x+1];
                          > sums2+=nums[x+2];
                          > sums3+=nums[x+3];
                          > }
                          > sums=(sums0+sums1)+(sums2+sums3);
                          >
                          > where nums[] is an array of integers. I know this would be hard to
                          > implement in the JIT, but isn't one of the (main) ideas behind the JIT
                          > the ability to do run-time optimizations for whatever platform the
                          > code is running on?
                          >
                          > Thanks,
                          > Austin Ehlers
                          >
                          >
                          > On Wed, 6 Aug 2003 00:32:41 -0700, "David Notario"
                          > <dnotario@online.microsoft.com> wrote:
                          >
                          > >We've done more perf work in the JIT for our next version than for our
                          > >previous version, but we still won't be generating SSE2 or MMX code in our
                          > >codegen.
                          > >
                          > >The rationale behind not doing SSE2 was that we didn't have the time to do
                          > >vectorizing optimizations. If you use SSE2 for scalar operations, it's not
                          > >always faster than the equivalent x87 code in 'normal' code: adds and muls
                          > >have different latencies in SSE2 vs x87 (mul has lower latency in SSE2, but
                          > >add is higher, IIRC), and some operations (casting from doubles to floats
                          > >or floats to doubles) are quite slow in SSE2 compared to x87. We also have
                          > >to support processors without SSE2.
                          > >
                          > >So, with all these arguments against it, we decided to focus our work on
                          > >improving our x87 codegen and leaving the door open for an SSE2
                          > >implementation, instead of putting all our eggs in the SSE2 basket.
                          >



                          • Andre

                            #14
                            Re: SSE and MMX support in the JIT compiler

                            Did you mean:

                            for(x=0;x<nums.Length/4;x++)
                            {
                            sum+=nums[x];
                            }

                            for(x=nums.Length/4;x<nums.Length;x+=4)
                            {
                            sums0+=nums[x];
                            sums1+=nums[x+1];
                            sums2+=nums[x+2];
                            sums3+=nums[x+3];
                            }

                            sum = sums0 + sums1 + sums2 + sums3;

                            -Andre

                            Austin Ehlers wrote:
                            > Hello,
                            > Is there any work being done on using specific features of a processor
                            > to increase performance? For example, on AMD Athlon XPs, there are 4
                            > integer execution pipelines. I can get a 500% decrease in time if I
                            > do a loop like this:
                            >
                            > int sums0=0, sums1=0, sums2=0, sums3=0, sums=0;
                            > for(x=0;x<nums.Length/4;x+=4)
                            > {
                            > sums0+=nums[x];
                            > sums1+=nums[x+1];
                            > sums2+=nums[x+2];
                            > sums3+=nums[x+3];
                            > }
                            > sums=(sums0+sums1)+(sums2+sums3);
                            >
                            > where nums[] is an array of integers. I know this would be hard to
                            > implement in the JIT, but isn't one of the (main) ideas behind the JIT
                            > the ability to do run-time optimizations for whatever platform the
                            > code is running on?
                            >
                            > Thanks,
                            > Austin Ehlers
                            >
                            >
                            > On Wed, 6 Aug 2003 00:32:41 -0700, "David Notario"
                            > <dnotario@online.microsoft.com> wrote:
                            >
                            >
                            >>We've done more perf work in the JIT for our next version than for our
                            >>previous version, but we still won't be generating SSE2 or MMX code in our
                            >>codegen.
                            >>
                            >>The rationale behind not doing SSE2 was that we didn't have the time to do
                            >>vectorizing optimizations. If you use SSE2 for scalar operations, it's not
                            >>always faster than the equivalent x87 code in 'normal' code: adds and muls
                            >>have different latencies in SSE2 vs x87 (mul has lower latency in SSE2, but
                            >>add is higher, IIRC), and some operations (casting from doubles to floats
                            >>or floats to doubles) are quite slow in SSE2 compared to x87. We also have
                            >>to support processors without SSE2.
                            >>
                            >>So, with all these arguments against it, we decided to focus our work on
                            >>improving our x87 codegen and leaving the door open for an SSE2
                            >>implementation, instead of putting all our eggs in the SSE2 basket.
                            >
                            >
