Performance tuning

  • atlaste

    Performance tuning

    Hi,

    I'm currently developing an application that uses a lot of
    computational power, disk access and memory caching (to be more
    exact: an information retrieval platform). In this kind of
    application, the last thing that remains is bare performance tuning.

    For example, you can branch on a bit with a 'case/switch', with an
    'if/then/else', or as a multiplication against a static buffer. Or
    you can do sorting with an inline delegate, with a regular delegate,
    or with a static delegate. Which is best in terms of raw speed? Or
    should I just use Reflector, fill in the sort myself and save time?
    Perhaps I should mark my critical code 'unsafe'? Should I avoid
    foreach, or rather use it? What about 'yield'? Should I avoid
    exceptions? What are the real performance killers? And so on...

    Since I don't feel like testing every possible optimization, and
    thereby reinventing the wheel, I wonder if someone else has already
    put together a document, book or article containing all the best
    practices concerning low-level optimizations in C#.

    Your help is appreciated.

    Thanks,
    Stefan.

  • Marc Gravell

    #2
    Re: Performance tuning

    Use a decent profiler and find out. Otherwise you are simply guessing,
    and could be having little effect, or making things worse. In most
    cases, the performance isn't down to micro-optimisations such as
    if/else vs switch, but rather down to bulk algorithms e.g. looping
    over a pair of collections rather than using a dictionary for "exists"
    etc.
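
    A minimal sketch of that "bulk algorithm" point (names invented for
    illustration): checking existence via a nested scan is O(n*m), while
    probing a dictionary makes the same check roughly O(n + m).

    ```csharp
    using System;
    using System.Collections.Generic;

    static class ExistsDemo
    {
        // O(n*m): scans 'b' once per element of 'a'.
        static int CountViaScan(List<int> a, List<int> b)
        {
            int hits = 0;
            foreach (int x in a)
                if (b.Contains(x)) hits++;
            return hits;
        }

        // O(n + m): one pass to build the dictionary, one pass to probe it.
        static int CountViaDictionary(List<int> a, List<int> b)
        {
            Dictionary<int, bool> seen = new Dictionary<int, bool>();
            foreach (int x in b) seen[x] = true;
            int hits = 0;
            foreach (int x in a)
                if (seen.ContainsKey(x)) hits++;
            return hits;
        }
    }
    ```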

    For specifics; I recall reading an article demonstrating that switch
    was *marginally* quicker, but barely by enough margin to warrant the
    risks from an edit... "unsafe" would do very little unless you
    actually use unsafe functionality inside the block. For sorting, the
    interface-based approach (IComparer<T>) /may/ be quicker than
    delegates - but don't trust anybody: rig a test harness and find out,
    or profile your app. As for exceptions - well, my code can screw
    things up very quickly indeed if I don't correctly handle error
    conditions ;-p
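
    Such a test harness might look like this (array size and names are
    illustrative; measure on your own data, as the numbers will vary):

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    static class SortHarness
    {
        // Interface-based comparer: no delegate invocation per comparison.
        sealed class LongComparer : IComparer<long>
        {
            public int Compare(long x, long y) { return x.CompareTo(y); }
        }

        static void Main()
        {
            Random rnd = new Random(42);
            long[] a = new long[1000000];
            for (int i = 0; i < a.Length; i++) a[i] = rnd.Next();
            long[] b = (long[])a.Clone();

            Stopwatch sw = Stopwatch.StartNew();
            Array.Sort(a, new LongComparer());  // IComparer<T> path
            Console.WriteLine("comparer: {0} ms", sw.ElapsedMilliseconds);

            sw = Stopwatch.StartNew();
            Array.Sort(b, delegate(long x, long y) { return x.CompareTo(y); });  // delegate path
            Console.WriteLine("delegate: {0} ms", sw.ElapsedMilliseconds);
        }
    }
    ```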

    Marc



    • Nicholas Paldino [.NET/C# MVP]

      #3
      Re: Performance tuning

      To follow up a little, performance tuning is a phased approach as well.
      You have to look at big picture issues before you dig down into
      micro-optimizations like you are asking about now. Like Marc said, use a
      profiler, see what the bottlenecks in your application are, and then address
      those. If you are performance tuning, then you have a goal in mind for what
      is acceptable for your application in terms of performance (if you don't
      have a goal, then performance tuning for the sake of performance tuning is a
      waste here since you are more than likely going to compromise the
      maintainability and design of the code in the process).

      For switch statements, once you have more than a few cases, the C#
      compiler turns it into a Hashtable (possibly a Dictionary in .NET 2.0 and
      beyond) which is used for the switch. For very large switch statements, my
      guess is that this is faster. Also, this should be faster than repeated
      if/then statements as well.

      I agree that unsafe isn't going to do much for you here.

      IIRC, the speed of invoking delegates was increased dramatically with
      .NET 2.0. Making calls on interfaces/class instances is always going to be
      faster, but I think that calling delegates might be much closer to the speed
      of calling a member directly. Of course, if you are doing this over a
      massive number of iterations, this small amount can add up.

      Using "yield" is always going to add overhead. When you use "yield" for
      an IEnumerable implementation, the compiler is generating a state machine
      for you in the background, and that's always going to cost more resources
      than doing it yourself.
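
      To illustrate, here are the two equivalent shapes (a hand-written
      sketch with invented method names):

      ```csharp
      using System;
      using System.Collections.Generic;

      static class YieldDemo
      {
          // The compiler rewrites this method into a hidden state-machine
          // class that implements IEnumerator<int> behind the scenes.
          static IEnumerable<int> Evens(int[] data)
          {
              foreach (int x in data)
                  if ((x & 1) == 0) yield return x;
          }

          // The hand-rolled alternative: no iterator object at all; the
          // results are written straight into a list in one pass.
          static List<int> EvensManual(int[] data)
          {
              List<int> result = new List<int>(data.Length);
              foreach (int x in data)
                  if ((x & 1) == 0) result.Add(x);
              return result;
          }
      }
      ```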

      Hope this helps.


      --
      - Nicholas Paldino [.NET/C# MVP]
      - mvp@spam.guard.caspershouse.com


      • atlaste

        #4
        Re: Performance tuning

        Thanks for the reactions. They do help.

        I do feel like clearing up some things. We work with ANTS Profiler
        here, which of course helps determine the bottlenecks. We've already
        implemented heuristics on algorithms to speed things up; we've even
        implemented a custom compression component to speed up disk writes.
        We've tried different algorithmic approaches to matching, using
        heuristics to determine which algorithm is best in which situation.
        But when you're doing some algorithmic operations a couple of
        million times per CPU, it pays off to do micro-optimizations.

        And of course optimizing for the sake of optimizing is never a good
        idea... unless you're in the position where you're building
        Google-like systems, where it pays off to have fewer servers lying
        around (the trade-off being that the resulting applications
        shouldn't become unstable, which is the main reason we're not using
        C++)... Hmm, and well, yes, that's exactly what we're doing. So to
        put it bluntly, and in the scope of the big picture: the goal is
        cost reduction. Which in our case specifies the "big picture" quite
        clearly.

        About maintainability: we're talking HELL here, where hell of course
        stands for "Highly Efficient Low Level" code. It's not supposed to
        be maintainable; that's why we have the non-optimized version. If it
        stops working, we'll throw it away and work our way up from the
        non-optimized version, taking along the lessons learned (but that's
        unlikely for these particular pieces of code). I clearly want to
        point out that we're not just optimizing every line of code, and
        that we acknowledge the issues concerning maintainability and
        design.

        I already see a lot of good information in these reactions. I'm
        particularly interested in large volumes of (floating point)
        operations on arrays (of structs) - like sorting, but also
        filtering, array resizing, filling these arrays with calculations
        based on other arrays, etc. - and ways of dealing with this. I'm
        mostly using the basic types 'long' and 'double' in these structs,
        where long is used because 32 bits isn't large enough. I'm also
        interested in the overhead caused by inheritance, particularly
        whether a combination with generics poses additional overhead.
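
        For concreteness, the kind of workload I mean looks roughly like
        this (hypothetical struct and field names):

        ```csharp
        using System;

        // A value-type record: a long key plus a double score, stored in
        // plain arrays so the data stays contiguous, with no per-element
        // heap allocation.
        struct Posting
        {
            public long Id;
            public double Score;
        }

        static class BulkOps
        {
            // Fills 'dst' from 'src' with a per-element calculation.
            static void Scale(Posting[] src, Posting[] dst, double factor)
            {
                for (int i = 0; i < src.Length; i++)
                {
                    dst[i].Id = src[i].Id;
                    dst[i].Score = src[i].Score * factor;
                }
            }
        }
        ```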

        I find it funny that switch uses a dictionary, since I expected a
        jump-by-value table. Perhaps I should dig into IL some more...

        Thanks,
        Stefan.


        • Nicholas Paldino [.NET/C# MVP]

          #5
          Re: Performance tuning

          If you are using a good number of longs in your structures, have you
          considered a move to a 64-bit processor (if you are not already on
          one)? It could help if you are performing a large number of
          calculations with that data type. Of course, you would have to do a
          great deal of other testing to make sure that everything else still
          works correctly.

          What kind of inheritance overhead are you thinking of? Also, where do
          you think you are being impacted by generics?

          If you are doing a large number of sheer calculations on a primitive
          type, then unsafe could be of use to you here. The work of moving
          elements in an array could possibly be faster as well. However, it
          could also introduce other bottlenecks into your app, because of
          having to fix (pin) the arrays while you manipulate them in unsafe
          code.
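
          A minimal sketch of that shape (illustrative only; it requires
          compiling with /unsafe, and is worth measuring before committing):

          ```csharp
          static class UnsafeSum
          {
              // Sums a double[] through a pinned pointer. The 'fixed'
              // block pins the array so the GC cannot move it while the
              // loop walks it; that pinning is the cost mentioned above.
              public static unsafe double Sum(double[] data)
              {
                  double total = 0;
                  fixed (double* p = data)
                  {
                      double* cur = p;
                      double* end = p + data.Length;
                      while (cur < end) total += *cur++;
                  }
                  return total;
              }
          }
          ```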

          --
          - Nicholas Paldino [.NET/C# MVP]
          - mvp@spam.guard.caspershouse.com



          • David Madden

            #6
            Re: Performance tuning

            One thing that I like to do before getting into disk access and
            network access tuning is to check my variables. Make sure that
            variables are the minimum size they can be. If you can use Int16
            for elements in an array, then that might help if you have some
            intensity in that area. Make sure you would never have to go
            beyond Int16, of course.
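
            As a rough illustration of the size difference (hypothetical
            structs; note that 'sizeof' on primitives works in safe code):

            ```csharp
            using System;

            struct WideRec { public int A; public int B; }       // 8 bytes of data per element
            struct NarrowRec { public short A; public short B; } // 4 bytes of data per element

            static class SizeDemo
            {
                static void Main()
                {
                    // An array of a million WideRec elements carries twice
                    // the memory and cache traffic of NarrowRec for the
                    // same logical data, if Int16 is big enough.
                    Console.WriteLine(sizeof(int) * 2);   // prints 8
                    Console.WriteLine(sizeof(short) * 2); // prints 4
                }
            }
            ```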

            Variable checking is something that people forget, and it can be a
            cause of performance issues when reading data into the program for
            processing. I cannot recall the name of the program at this time,
            but there are tools out there that help analyze the memory
            allocation and usage of an application when it is launched.

            Also check for objects that might still be in memory when they no
            longer need to be. I hope this helps as a start. It is something
            that I usually try to tackle, since my apps run on PCs that are
            not always up to my standards.


            • atlaste

              #7
              Re: Performance tuning

              David,

              You're absolutely right. Since disk is a factor of 1,000 slower
              than memory, it helps to compact things. Books like "Managing
              Gigabytes" tell this story for the field of information
              retrieval in great detail. Which is why we apply gamma encoding
              and some variant of arithmetic coding to the storage process.
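
              For readers who don't know it, here's a toy sketch of the gamma
              encoding mentioned above (illustrative and simplified, not our
              production code): a positive integer n is written as
              (bit-length(n) - 1) zero bits followed by the binary form of n,
              so small, frequent numbers take very few bits.

              ```csharp
              using System;
              using System.Collections.Generic;

              static class GammaCoding
              {
                  // Elias gamma code for n > 0: a unary length prefix of
                  // (bits - 1) zeros, then the 'bits'-bit binary payload,
                  // most significant bit first. E.g. 5 (binary 101)
                  // encodes as 00101.
                  public static List<bool> Encode(ulong n)
                  {
                      int bits = 0;
                      for (ulong t = n; t > 0; t >>= 1) bits++;
                      List<bool> output = new List<bool>();
                      for (int i = 0; i < bits - 1; i++) output.Add(false);
                      for (int i = bits - 1; i >= 0; i--)
                          output.Add(((n >> i) & 1) == 1);
                      return output;
                  }
              }
              ```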

              Furthermore, I set variables to null, use 'using', and so on,
              because I found this to help the GC. And I use structs rather
              than classes where appropriate.

              Useful suggestions. Have more? :-)

              Cheers,
              Stefan.


              • atlaste

                #8
                Re: Performance tuning

                Hmm good point. To be honest: no I haven't thought about that.

                About generics / inheritance: if you're calling a virtual
                method on a class, you're going through a virtual table, so
                that makes an overhead. If you're writing a generic (with a
                'where T : ISomeInterface' constraint) and call the interface
                methods on T inside the generic, you're solving the problem
                by aggregation, and shouldn't really need the virtual table.
                That is: unless the generic actually uses a vtable to resolve
                the interface methods on the type.

                And then there's the 'class Foo<T> : Bar<T> where T :
                ISomeInterface' construct. I'm interested in what this
                generates in terms of vtables and so on.

                I am indeed doing a sheer amount of calculations on structs
                with basic types.
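
                To make the two call shapes I'm asking about concrete
                (hypothetical interface and names):

                ```csharp
                using System;

                interface IScorer { double Score(double x); }

                // A struct implementing the interface.
                struct FastScorer : IScorer
                {
                    public double Score(double x) { return x * 2.0; }
                }

                static class CallDemo
                {
                    // Interface-typed parameter: the struct gets boxed at
                    // the call site, and Score is dispatched through the
                    // interface's method table.
                    static double ViaInterface(IScorer s, double x)
                    {
                        return s.Score(x);
                    }

                    // Generic constraint: T stays an unboxed struct, and
                    // the runtime specializes this method per value type,
                    // so the call can bind directly instead of going
                    // through a vtable.
                    static double ViaGeneric<T>(T s, double x) where T : IScorer
                    {
                        return s.Score(x);
                    }
                }
                ```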

                Cheers,
                Stefan.

                On May 9, 6:24 pm, "Nicholas Paldino [.NET/C# MVP]"
                <m...@spam.guar d.caspershouse. comwrote:
                If you are using a good number of longs in your structures, have you
                considered a move to a 64 bit processor (if you are not already on one). It
                could help if you are performing a large number of calculations with that
                data type. Of course, you would have to do a great deal of other testing as
                well to make sure that everything else works correctly as well.
                >
                What kind of inheritance overhead are you thinking of? Also, where do
                you think you are being impacted by generics?
                >
                If you are doing a large number of sheer calculations on a primitive
                type, then unsafe could be of use to you here. Also, the work in moving
                elements in an array could possibly be faster. However, could possibly
                introduce other bottlenecks into your app because of fixing the arrays when
                you manipulate them in unsafe code.
                >
                --
                - Nicholas Paldino [.NET/C# MVP]
                - m...@spam.guard .caspershouse.c om
                >
                "atlaste" <atla...@gmail. comwrote in message
                >
                news:1178723762 .282719.70250@o 5g2000hsb.googl egroups.com...
                >
                Thanks for the reactions. They do help.
                >
                I do feel like clearing up some things. We work with ANTS profiler
                here, which of course helps determining the bottlenecks. We've already
                implemented heuristics on algorithms to speed things up; even
                implemented a custom compression component to speed up disk writes.
                Tried different algorithmic approaches to matching, determining which
                algorithm is best in which situation by heuristics. But when you're
                doing some algorithmic operations a couple of million times per cpu,
                then it pais off to do micro-optimizations.
                >
                And of course optimizing for the sake of optimizing is of course never
                a good idea... unless you're in the position where you're making
                Google-alike systems where it pays off to have less servers lying
                around (where the trade-off is that the resulting applications
                shouldn't become unstable, which the main reason we're not using C+
                +)... Hmm and well, yes, that's exactly what we're doing. So to put it
                bluntly and in the scope of the large picture: the goal is cost
                reduction. Which in our case specifies the "large picture" quite
                clearly.
                >
                About maintainability : we're talking HELL here, where hell of course
                stands for "Highly Efficient Low Level" code. It's not supposed to be
                maintainable; that's why we have the non-optimized version. If it
                stops working, we'll throw it away and work our way up from the non-
                optimized version, taking along the lessons learned (but that's
                unlikely for these particular pieces of code). I clearly want to point
                out that we're not just optimizing every line of code and acknowledge
                the issues concerning maintainability and design.
                >
                I already see a lot of good information in these reactions. I'm
                particulary interested in large volumes of (floating point) operations
                on arrays (of structs) - like sorting, but also like filtering, array
                resizing, filling these arrays with calculations based on other
                arrays, etc - and ways of dealing with this. I'm mostly using the
                basic types 'long' and 'double' in these structs; where long is used
                because 32-bit isn't large enough. I'm also interested in the overhead
                caused by inheritance; particulary if a combination with generics pose
                additional overhead.
                >
                I find it funny that switch uses a dictionary, since I expected a jump-
                by-value. Perhaps I should dig into IL some more...
                >
                Thanks,
                Stefan.
                >
                On May 9, 3:09 pm, "Nicholas Paldino [.NET/C# MVP]"
                <m...@spam.guar d.caspershouse. comwrote:
                To follow up a little, performance tuning is a phased approach as
                well.
                You have to look at big picture issues before you dig down into
                micro-optimizations like you are asking about now. Like Marc said, use a
                profiler, see what the bottlenecks in your application are, and then
                address
                those. If you are performance tuning, then you have a goal in mind for
                what
                is acceptable for your application in terms of performance (if you don't
                have a goal, then performance tuning for the sake of performance tuning
                is a
                waste here since you are more than likely going to compromise the
                maintainability and design of the code in the process).
                >
                For switch statements over strings, once you have more than a few
                cases, the C# compiler turns the switch into a Hashtable lookup (a
                Dictionary in .NET 2.0 and beyond); integer switches typically compile
                to a jump table instead. For very large switch statements, my guess is
                that this is faster. Also, this should be faster than repeated
                if/then statements as well.
                >
                I agree that unsafe isn't going to do much for you here.
                >
                IIRC, the speed of invoking delegates was increased dramatically with
                .NET 2.0. Making calls on interfaces/class instances is always going
                to be faster, but I think that calling delegates might be much closer
                to the speed of calling a member directly. Of course, if you are doing
                this over a massive number of iterations, this small amount can add up.
                >
                Using "yield" is always going to add overhead. When you use "yield"
                for an IEnumerable implementation, the compiler is generating a state
                machine for you in the background, and that's always going to cost
                more resources than doing it yourself.
                >
                Hope this helps.
                >
                --
                - Nicholas Paldino [.NET/C# MVP]
                - m...@spam.guard.caspershouse.com
                >
                "Marc Gravell" <marc.grav...@gmail.com> wrote in message
                >
                >news:ONwUPTjkHHA.4624@TK2MSFTNGP04.phx.gbl...
                >
                Use a decent profiler and find out. Otherwise you are simply guessing,
                and could be having little effect, or making things worse. In most
                cases, the performance isn't down to micro-optimisations such as
                if/else vs switch, but rather down to bulk algorithms e.g. looping
                over a pair of collections rather than using a dictionary for "exists"
                etc.
                >
                For specifics; I recall reading an article demonstrating that switch
                was *marginally* quicker, but barely by enough margin to warrant the
                risks from an edit... "unsafe" would do very little unless you
                actually use unsafe functionality inside the block. For sorting, the
                interface-based approach (IComparer<T>) /may/ be quicker than
                delegates - but don't trust anybody: rig a test harness and find out,
                or profile your app. As for exceptions - well, my code can screw
                things up very quickly indeed if I don't correctly handle error
                conditions ;-p
                >
                Marc
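The yield point quoted above can be made concrete with a small sketch (the method and type names here are hypothetical, not from the thread): the compiler rewrites an iterator method into a hidden state-machine class, roughly equivalent to writing the IEnumerator<T> out by hand:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class IteratorDemo
{
    // Iterator form: the compiler generates a state-machine class behind this.
    public static IEnumerable<int> Squares(int count)
    {
        for (int i = 0; i < count; i++)
            yield return i * i;
    }

    // Rough hand-written equivalent of what the compiler emits.
    public sealed class SquaresEnumerator : IEnumerator<int>
    {
        private readonly int count;
        private int i = -1;

        public SquaresEnumerator(int count) { this.count = count; }

        public int Current { get { return i * i; } }
        object IEnumerator.Current { get { return Current; } }

        public bool MoveNext() { return ++i < count; }
        public void Reset() { i = -1; }
        public void Dispose() { }
    }
}
```

The generated class costs an allocation plus a couple of interface calls per element, which is the overhead being discussed; whether that matters depends entirely on how hot the loop is, so profile before rewriting iterators by hand.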


                • Nicholas Paldino [.NET/C# MVP]

                  #9
                  Re: Performance tuning

                  Stefan,

                  Well, if you are using a method in any class, derived or not (and
                  everything derives from object except object itself, so technically,
                  everything is derived), yes, you are using a virtual table.

                  This is definitely a micro-optimization. Also, you don't know if your
                  methods are being inlined or not. If you have a good deal of smaller
                  methods that really aren't doing anything that are being run repeatedly in a
                  loop, then I would manually move that code into the loop if you are
                  concerned with the lookup overhead of the virtual method table.

                  However, if the method is not marked as virtual, it is possible, if the
                  method is small enough, that the JIT compiler is already doing this for you
                  (assuming that the method is not marked with the MethodImpl attribute with
                  the MethodImplOptions.NoInlining value passed to it), so you might not make
                  any gains here (I believe that the body of the method has to be less than 32
                  bytes of IL in order to be a candidate for inlining; you will have to
                  double-check that).

                  With parameterized types (generic classes), IIRC, each constructed type
                  is going to have its own type handle and its own virtual method table, so
                  it's like using a different class: you still have the virtual method table
                  to contend with.

                  With parameterized methods (generic methods), I believe (and I could
                  very well be wrong here) that there is a separate entry in the method table
                  for each combination of type parameters passed to the generic method, based
                  on compatibility. All reference types are considered equal, while different
                  value types are treated differently for the sake of comparison. Then the
                  list of types (with the previous rules applied) is compared when looking in
                  the method table.
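As a sketch of the dispatch difference being discussed (the types here are hypothetical, not from the thread): calling a struct through an interface-typed parameter boxes it and dispatches through the interface method table, while a generic method constrained to the interface lets the JIT specialize a body per value type and call the struct's method directly, with no box:

```csharp
using System;

interface IScorer { double Score(double x); }

struct Doubler : IScorer
{
    public double Score(double x) { return 2.0 * x; }
}

static class Dispatch
{
    // Interface-typed parameter: a struct argument is boxed and the
    // call goes through the interface method table.
    public static double ViaInterface(IScorer s, double x)
    {
        return s.Score(x);
    }

    // Constrained generic: for each value type T the JIT compiles a
    // specialized body that calls Score on the unboxed struct; all
    // reference types share one body, as described above.
    public static double ViaGeneric<T>(T s, double x) where T : IScorer
    {
        return s.Score(x);
    }
}
```

Both calls compute the same result; the difference is purely in boxing and dispatch cost, which only a profiler or micro-benchmark on the real workload can quantify.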

                  Since you are doing a great amount of calcs on your structures, you
                  might want to try using unsafe code, as you can do direct memory management
                  (this is helpful if you want to swap out sections of memory) and you can
                  also try and turn off bounds checking for arrays in unsafe code if you are
                  sure you won't violate an array (buffer overruns are still checked for, and
                  I believe the CLR will tear down the process if you do overrun a buffer).
                  You can also look into stackalloc if you don't want to allocate memory on
                  the heap as well.
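A minimal sketch of the unsafe techniques mentioned above (compile with /unsafe; the method names are hypothetical): fixed pins an array so pointer arithmetic replaces per-element bounds-checked indexing, and stackalloc provides a scratch buffer that never touches the GC heap:

```csharp
using System;

static class UnsafeDemo
{
    // Sum an array through a pinned pointer instead of indexed access.
    public static unsafe long SumFixed(long[] data)
    {
        long total = 0;
        fixed (long* p = data)          // pin the array for the duration
        {
            for (int i = 0; i < data.Length; i++)
                total += p[i];          // pointer access, no per-element bounds check
        }
        return total;
    }

    // Use a stack-allocated scratch buffer for intermediate results.
    public static unsafe double MeanOfSquares(double[] data)
    {
        double* tmp = stackalloc double[data.Length]; // lives on the stack, no GC
        double sum = 0;
        for (int i = 0; i < data.Length; i++)
        {
            tmp[i] = data[i] * data[i];
            sum += tmp[i];
        }
        return sum / data.Length;
    }
}
```

Note the trade-off raised earlier in the thread: pinning arrays can itself interfere with the GC, so this is worth doing only on loops the profiler has shown to be hot.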


                  --
                  - Nicholas Paldino [.NET/C# MVP]
                  - mvp@spam.guard.caspershouse.com


                  "atlaste" <atlaste@gmail.com> wrote in message
                  news:1178779684.671096.216670@e65g2000hsc.googlegroups.com...
                  Hmm good point. To be honest: no I haven't thought about that.
                  >
                  About generics / inheritance. If you're calling a method in a class,
                  you're using a virtual table. So that adds overhead. If you're
                  creating a generic (with a where T : interface constraint) and call
                  the methods in the generic, you're solving the problem by aggregation,
                  but don't really need the virtual table. That is: unless the generics
                  actually use a vtable to resolve the methods in the interface to the type.
                  >
                  And then there's the "class Foo<T> : Bar<T> where T : interface"
                  construct. I'm interested in what this generates in terms of vtables
                  and so on.
                  >
                  I am doing a huge number of calculations on structs with basic types.
                  >
                  Cheers,
                  Stefan.
                  >
                  On May 9, 6:24 pm, "Nicholas Paldino [.NET/C# MVP]"
                  <m...@spam.guard.caspershouse.com> wrote:
                  > If you are using a good number of longs in your structures, have you
                  >considered a move to a 64 bit processor (if you are not already on
                  >one)? It could help if you are performing a large number of
                  >calculations with that data type. Of course, you would have to do a
                  >great deal of other testing as well to make sure that everything
                  >else works correctly.
                  >>
                  > What kind of inheritance overhead are you thinking of? Also, where
                  >do you think you are being impacted by generics?
                  >>
                  > If you are doing a large number of sheer calculations on a primitive
                  >type, then unsafe could be of use to you here. Also, the work in
                  >moving elements in an array could possibly be faster. However, it
                  >could possibly introduce other bottlenecks into your app because of
                  >fixing the arrays when you manipulate them in unsafe code.
                  >>
                  >--
                  > - Nicholas Paldino [.NET/C# MVP]
                  > - m...@spam.guard.caspershouse.com
                  >>
                  >"atlaste" <atla...@gmail.com> wrote in message
                  >>
                  >news:1178723762.282719.70250@o5g2000hsb.googlegroups.com...
                  >>
                  Thanks for the reactions. They do help.
                  >>
                  I do feel like clearing up some things. We work with ANTS Profiler
                  here, which of course helps determine the bottlenecks. We've already
                  implemented heuristics on algorithms to speed things up; even
                  implemented a custom compression component to speed up disk writes.
                  Tried different algorithmic approaches to matching, determining which
                  algorithm is best in which situation by heuristics. But when you're
                  doing some algorithmic operations a couple of million times per CPU,
                  then it pays off to do micro-optimizations.
                  >>
                  And of course optimizing for the sake of optimizing is never a good
                  idea... unless you're in the position where you're making Google-like
                  systems where it pays off to have fewer servers lying around (where
                  the trade-off is that the resulting applications shouldn't become
                  unstable, which is the main reason we're not using C++)... Hmm and
                  well, yes, that's exactly what we're doing. So to put it bluntly and
                  in the scope of the large picture: the goal is cost reduction. Which
                  in our case specifies the "large picture" quite clearly.
                  >>
                  About maintainability: we're talking HELL here, where hell of course
                  stands for "Highly Efficient Low Level" code. It's not supposed to be
                  maintainable; that's why we have the non-optimized version. If it
                  stops working, we'll throw it away and work our way up from the non-
                  optimized version, taking along the lessons learned (but that's
                  unlikely for these particular pieces of code). I clearly want to point
                  out that we're not just optimizing every line of code, and we
                  acknowledge the issues concerning maintainability and design.


                  • atlaste

                    #10
                    Re: Performance tuning

                    Thanks, this was more or less the information I was looking for.

                    Cheers,
                    Stefan.



                    • David Madden

                      #11
                      Re: Performance tuning

                      Also take into consideration the PCs that the software will be running on.
                      You might be able to change the number of times that you write out data to
                      improve system performance.

                      "David Madden" <dmadden2@my.devry.edu> wrote in message
                      news:480E94E8-E253-4AD1-8040-987F460F6C00@microsoft.com...
                      One thing that I like to do before hitting disk access and network
                      access tuning is to check my variables. Make sure that variables are
                      the minimum size they can be. If you can use Int16 for elements in an
                      array, then that might help if you have heavy usage in that area. Make
                      sure you would never have to go beyond Int16, of course.
                      >
                      Variable checking is something that people forget and can be a cause of
                      performance issues when reading data into the program for processing. I
                      cannot recall the name of the program at this time, but there are tools
                      out there that help analyze the memory allocation and usage of an
                      application when it is launched.
                      >
                      Also check for objects that might be in memory that did not need to be at
                      the time. I hope this helps as a start. It is something that I usually
                      try to tackle since my apps run on PCs that are not always up to my
                      standards.
                      >
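A minimal sketch of the "smallest type that fits" point above; the counts and names are illustrative, not from the thread:

```csharp
using System;

class ElementSizeDemo
{
    static void Main()
    {
        const long count = 1000000;

        // For a large array, the element type directly sets the memory cost:
        // Int16 elements take 2 bytes each, Int32 elements take 4 bytes each.
        short[] small = new short[count]; // ~2 MB
        int[] wide = new int[count];      // ~4 MB

        Console.WriteLine(sizeof(short) * count); // 2000000 bytes
        Console.WriteLine(sizeof(int) * count);   // 4000000 bytes
    }
}
```

The trade-off, as the post says, is making sure the values can never exceed the smaller type's range (Int16 tops out at 32767).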
                      "atlaste" <atlaste@gmail. comwrote in message
                      news:1178712571 .866697.112920@ e51g2000hsg.goo glegroups.com.. .
                      >Hi,
                      >>
                      >I'm currently developing an application that uses a lot of
                      >computationa l power, disk access and memory caching (to be more exact:
                      >an information retrieval platform). In these kind of applications the
                      >last thing that remains is bare performance tuning.
                      >>
                      >So for example, you can do an 'if then else' on a bit like a 'case/
                      >switch', an 'if/then/else' and as a multiplication with a static
                      >buffer. Or, you can do sorting with an inline delegate, with a
                      >delegate and with a static delegate. Which is best in terms of raw
                      >speed? Or should I just use reflector, fill the sort in and save time?
                      >Perhaps I should mark my critical code 'unsafe'? Should I avoid for-
                      >each? Or rather use it? What about 'yield'? Should I avoid exceptions?
                      >What are real performance killers? So on...
                      >>
                      >Since I don't feel like testing every possible optimization and
                      >thereby reinventing the wheel I wonder if someone else has already
                      >made up a document, book or article containing all best practices
                      >concerning low-level optimizations in C#.
                      >>
                      >Your help is appreciated.
                      >>
                      >Thanks,
                      >Stefan.
                      >>
                      >


                      • ThunderMusic

                        #12
                        Re: Performance tuning

                        Hi,
                        Here we did a bit of optimizing on a raw data export to a CSV file and
                        cut its running time by about 91%: a process that took about 35 minutes
                        before we optimized now takes about 3 minutes. The only thing we did was
                        write our own ToString methods for DateTime, int and decimal, which is
                        really simple once you know how (there are many examples on the net for
                        this kind of method). If you don't care about localization and use
                        ToString a lot, maybe it could be something to begin with.
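A minimal sketch of what such a culture-free integer formatter might look like; the class and method names here are made up for illustration, not the poster's actual code:

```csharp
using System.Text;

static class FastFormat
{
    // Appends a base-10 representation of value without any CultureInfo
    // lookup, which is where much of int.ToString()'s cost goes.
    public static void AppendInt(StringBuilder sb, int value)
    {
        if (value == 0) { sb.Append('0'); return; }
        if (value < 0) sb.Append('-');

        // Widen to long so int.MinValue negates safely, then collect
        // digits from least significant to most into a small buffer.
        long v = value;
        if (v < 0) v = -v;

        char[] buf = new char[11];
        int pos = buf.Length;
        while (v > 0)
        {
            buf[--pos] = (char)('0' + (int)(v % 10));
            v /= 10;
        }
        sb.Append(buf, pos, buf.Length - pos);
    }
}
```

Usage: call `FastFormat.AppendInt(sb, -42)` on a reused StringBuilder, and the builder ends up containing `-42` with no per-call culture or format-string work.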

                        Working with arrays (<Type>[] myVar = new <Type>[<Count>];) is much
                        faster than using collection objects, even generics. They are a bit more
                        trouble to manage, but then you can optimize the manipulation exactly to
                        your needs.
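A short sketch of that array-versus-collection trade-off; sizes are illustrative:

```csharp
using System.Collections.Generic;

const int count = 100000;

// A preallocated array is contiguous and fixed-size: no growth checks,
// and the JIT can often eliminate bounds checks in the loop below.
double[] values = new double[count];
for (int i = 0; i < count; i++)
    values[i] = i * 0.5;

// The List<T> version pays for Add()'s internal checks, and for buffer
// resizes too unless the capacity is passed up front as it is here.
var list = new List<double>(count);
for (int i = 0; i < count; i++)
    list.Add(i * 0.5);
```

The "more trouble to manage" part is that with a raw array you track the logical length and any growth yourself, which List<T> otherwise does for you.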

                        Throwing a lot of exceptions has a big impact in debug mode, but it
                        tends to matter less when building in release mode.
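Either way, a common way to keep exceptions off a hot path is to use the Try pattern where the framework offers it, for example:

```csharp
using System;

// int.TryParse reports failure through its return value instead of
// throwing FormatException, so malformed tokens in a tight parsing
// loop cost a branch rather than an exception.
string token = "not-a-number";
if (int.TryParse(token, out int parsed))
    Console.WriteLine(parsed);
else
    Console.WriteLine("skipped bad token"); // no exception thrown
```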

                        With the results we got from our tests, it seems foreach is usually
                        slower than the normal for loop mechanism, but for some reason it is
                        sometimes faster in some situations. So I guess it's just a matter of
                        testing both and finding some pattern for when it's faster or slower.
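A minimal sketch of the two forms being compared; over arrays the compiler lowers foreach to an indexed loop, so differences tend to show up on other collection types, where foreach goes through an enumerator:

```csharp
int[] data = { 3, 1, 4, 1, 5 };

// Indexed for loop: explicit counter and element access.
int sum1 = 0;
for (int i = 0; i < data.Length; i++)
    sum1 += data[i];

// foreach over an array: compiled to essentially the same indexed loop.
int sum2 = 0;
foreach (int x in data)
    sum2 += x;

// sum1 == sum2 == 14
```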

                        I hope it helps

                        ThunderMusic


                        "atlaste" <atlaste@gmail. comwrote in message
                        news:1178712571 .866697.112920@ e51g2000hsg.goo glegroups.com.. .
                        Hi,
                        >
                        I'm currently developing an application that uses a lot of
                        computational power, disk access and memory caching (to be more exact:
                        an information retrieval platform). In these kind of applications the
                        last thing that remains is bare performance tuning.
                        >
                        So for example, you can do an 'if then else' on a bit like a 'case/
                        switch', an 'if/then/else' and as a multiplication with a static
                        buffer. Or, you can do sorting with an inline delegate, with a
                        delegate and with a static delegate. Which is best in terms of raw
                        speed? Or should I just use reflector, fill the sort in and save time?
                        Perhaps I should mark my critical code 'unsafe'? Should I avoid for-
                        each? Or rather use it? What about 'yield'? Should I avoid exceptions?
                        What are real performance killers? So on...
                        >
                        Since I don't feel like testing every possible optimization and
                        thereby reinventing the wheel I wonder if someone else has already
                        made up a document, book or article containing all best practices
                        concerning low-level optimizations in C#.
                        >
                        Your help is appreciated.
                        >
                        Thanks,
                        Stefan.
                        >


                        • Marc Gravell

                          #13
                          Re: Performance tuning

                          >Here we have done a bit of optimizing on a raw data export to csv file

                          Care to share any code / references? For instance, I do a lot of
                          custom serialization via IXmlSerializable, which essentially boils
                          down to the same thing; writing strings (representing lots of values)
                          to a stream... I'd be interested in practical things I can do to make
                          things more efficient...
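One practical shape such a change can take (a sketch with made-up names, not the posters' code) is reusing a single scratch StringBuilder per stream instead of concatenating strings for every value:

```csharp
using System.Globalization;
using System.IO;
using System.Text;

static void WriteRow(TextWriter writer, StringBuilder scratch, int id, double score)
{
    scratch.Length = 0; // reuse one builder instead of allocating per row
    scratch.Append(id)
           .Append(',')
           .Append(score.ToString(CultureInfo.InvariantCulture));
    writer.WriteLine(scratch.ToString());
}

var output = new StringWriter();
var scratch = new StringBuilder();
WriteRow(output, scratch, 1, 2.5); // writes the line "1,2.5"
```

Using InvariantCulture up front also sidesteps the per-call culture lookup that the default ToString overloads perform.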

                          Marc


                          • ThunderMusic

                            #14
                            Re: Performance tuning

                            I cannot send you the final code, but here's the page where we found all we
                            needed... ;)



                            I hope it helps

                            ThunderMusic


                            "Marc Gravell" <marc.gravell@g mail.comwrote in message
                            news:1178912878 .794704.135760@ y80g2000hsf.goo glegroups.com.. .
                            Here we have done a bit of optimizing on a raw data export to csv file
                            >
                            Care to share any code / references? For instance, I do a lot of
                            custom serialization via IXmlSerializabl e, which essentially boils
                            down to the same thing; writing strings (representing lots of values)
                            to a stream... I'd be interested in practical things I can do to make
                            things more efficient...
                            >
                            Marc
                            >
