Any good refs on dealing with tightly packed data in C#?

  • _dee

    Any good refs on dealing with tightly packed data in C#?

    I'm working on a port of a legacy app originally written in C. Data
    was compacted into bit fields. Are there any sites or books that cover
    optimized handling of this type of data? I'd need to develop optimized
    functions for reading and writing, and given the volume of source, I
    don't want to end up with pages of obscure code. (Surprisingly, the
    original C was pretty clear).


  • _dee

    #2
    Re: Any good refs on dealing with tightly packed data in C#?

    On Sat, 12 Apr 2008 13:50:37 +0200, Jeroen Mostert
    <jmostert@xs4all.nl> wrote:
    >_dee wrote:
    >>I'm working on a port of a legacy app originally written in C. Data
    >>was compacted into bit fields. Are there any sites or books that cover
    >>optimized handling of this type of data? I'd need to develop optimized
    >>functions for reading and writing, and given the volume of source, I
    >>don't want to end up with pages of obscure code. (Surprisingly, the
    >>original C was pretty clear).
    >
    >Yes, but the resulting assembly code probably wasn't. :-) To access those
    >bit fields, the compiler needs to generate tricky code for masking and
    >shifting the values every time they're accessed. The resulting slowdown can
    >be prohibitive, which is why storage space needs to really be at a premium
    >for bit fields to be worth it.
    Thanks for the followup comments, Jeroen! Some good ideas in your
    reply (logged to file for later).

    After checking into BitVector32, etc, I've decided that the best
    approach for the first version is probably direct mask/shift ops on
    the raw byte stream. I can get the first code version done by
    paralleling the earlier C code. After I've got the first version
    running, a more elegant approach may become obvious.
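For readers following along, the direct mask/shift approach on a raw byte array can be sketched roughly as below. This is only an illustration: the `BitFields` class, its method names, and the (shift, width) parameters are hypothetical, not taken from the original C app, and the sketch assumes each field fits inside a single byte.

```csharp
using System;

// Hypothetical helper; the field positions passed in are illustrative only.
public static class BitFields
{
    // Reads `width` bits starting at bit `shift` (0 = least significant)
    // of buffer[index]. Assumes shift + width <= 8.
    public static int Read(byte[] buffer, int index, int shift, int width)
    {
        int mask = (1 << width) - 1;
        return (buffer[index] >> shift) & mask;
    }

    // Writes the low `width` bits of `value` into the same position,
    // leaving the other bits of the byte untouched.
    public static void Write(byte[] buffer, int index, int shift, int width, int value)
    {
        int mask = ((1 << width) - 1) << shift;
        buffer[index] = (byte)((buffer[index] & ~mask) | ((value << shift) & mask));
    }
}
```

Working byte by byte like this sidesteps word-size and byte-order questions, at the cost of extra code for any field that straddles a byte boundary.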
    >Having to write the masking and shifting code in C# itself is a real pain,
    >and you'll have to maintain separate code for 64-bit (if that's relevant).
    This was part of the motivation for dealing with individual bytes.
    That should not incur any sync problems and it should transport more
    easily.
    >It might be an option to just write those functions in pure C and p/invoke
    >to them, or to write them in C++/CLI for easier integration with managed
    >code. Doing it all in pure C# is possible, but probably not the most
    >effective approach.
    Yes, C++ may make sense here (especially after I get into trouble
    later. <g>). But I was trying to stay with C# for the first prototype.

    The biggest remaining hitch for C# is a simple one: Passing 'pointers'
    to individual tokens within the byte stream. I've done something like
    this in the past with 'unsafe' code, but I was trying to avoid that.

    The goal is to pass refs to tokens (short groups of bytes) so I can
    avoid copying bytes; IOW, passing a pointer to the middle of a byte
    array. This is actually the main place where I miss C++. I don't
    suppose you know any tricks for doing that?


    • Jeroen Mostert

      #3
      Re: Any good refs on dealing with tightly packed data in C#?

      _dee wrote:
      >The biggest remaining hitch for C# is a simple one: Passing 'pointers'
      >to individual tokens within the byte stream. I've done something like
      >this in the past with 'unsafe' code, but I was trying to avoid that.
      >
      >The goal is to pass refs to tokens (short groups of bytes) so I can
      >avoid copying bytes; IOW, passing a pointer to the middle of a byte
      >array. This is actually the main place where I miss C++. I don't
      >suppose you know any tricks for doing that?

      A pointer inside an array is just an unsafe way of using an index. All .NET
      functions that operate on buffers, like Socket.Receive(), use the same pattern:

      public void BufferOperation (byte[] buffer, int offset, int count);

      If you're concerned this is slower than a pointer, don't be. JIT compilation
      makes indexed operations on arrays just about as fast as using pointers, and
      of course only profiling can show you where the real bottlenecks will be. I
      have a good feeling it's not going to be the code that accesses the
      individual bytes. There's also the obvious trade-off that if you're going to
      use managed code, it's a bit of an oxymoron to throw away the safety
      advantage by using pointers.
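As a rough illustration of the (buffer, offset, count) pattern described above, a function that works on a token in place might look like this. `TokenOps` and `Sum` are hypothetical names chosen for the sketch; only the parameter pattern comes from the post.

```csharp
using System;

// Hypothetical example of the (buffer, offset, count) convention.
public static class TokenOps
{
    // Sums `count` bytes starting at `offset`. No bytes are copied;
    // the caller's array is read in place.
    public static int Sum(byte[] buffer, int offset, int count)
    {
        if (buffer == null) throw new ArgumentNullException("buffer");
        if (offset < 0 || count < 0 || offset > buffer.Length - count)
            throw new ArgumentOutOfRangeException("offset");

        int total = 0;
        for (int i = offset; i < offset + count; i++)
            total += buffer[i];
        return total;
    }
}
```

A call such as `TokenOps.Sum(stream, tokenStart, tokenLength)` plays the role that passing a pointer to the middle of the array would play in C.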

      If continuously passing an offset and count doesn't look attractive, you can
      also capture these items in a struct and pass those around, effectively
      using array slices (you could even call them "tokens"). If stream-based
      operations are more convenient you can also wrap the array in a MemoryStream
      and seek on that.
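Both alternatives can be sketched with types that already ship in the base class library: `ArraySegment<T>` is a ready-made struct holding an array, an offset and a count, and `MemoryStream` has a constructor that wraps a sub-range of an array. The `SliceDemo` class and its method names are hypothetical, added only to show the idea.

```csharp
using System;
using System.IO;

// Hypothetical demo class; only ArraySegment<T> and MemoryStream are real BCL types.
public static class SliceDemo
{
    // Sums the bytes covered by a segment without copying them.
    public static int Sum(ArraySegment<byte> token)
    {
        int total = 0;
        for (int i = 0; i < token.Count; i++)
            total += token.Array[token.Offset + i];
        return total;
    }

    // Stream-based alternative: wrap the same range and seek within it.
    // `position` is relative to `offset`, not to the start of the array.
    public static int ByteAt(byte[] buffer, int offset, int count, int position)
    {
        using (MemoryStream stream = new MemoryStream(buffer, offset, count))
        {
            stream.Seek(position, SeekOrigin.Begin);
            return stream.ReadByte(); // -1 if position is at or past the end
        }
    }
}
```

The struct keeps call sites down to one argument per token, while the stream variant suits code that already reads sequentially.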

      --
      J.


      • _dee

        #4
        Re: Any good refs on dealing with tightly packed data in C#?

        On Tue, 15 Apr 2008 08:19:00 +0200, Jeroen Mostert
        <jmostert@xs4all.nl> wrote:
        >_dee wrote:
        >>The biggest remaining hitch for C# is a simple one: Passing 'pointers'
        >>to individual tokens within the byte stream.
        >>
        >>The goal is to pass refs to tokens (short groups of bytes) so I can
        >>avoid copying bytes ...
        >
        >A pointer inside an array is just an unsafe way of using an index. All .NET
        >functions that operate on buffers, like Socket.Receive(), use the same pattern:
        >
        >public void BufferOperation (byte[] buffer, int offset, int count);
        >
        >If you're concerned this is slower than a pointer, don't be. JIT compilation
        >makes indexed operations on arrays just about as fast as using pointers, and
        >of course only profiling can show you where the real bottlenecks will be.
        I was not concerned so much about runtime. It just doesn't look as
        good as the original code (which is really saying something for C++
        code!).

        The original code just pointed struct pointers into the token stream
        and operated on them in place. Very transparent, and it worked well.
        No copying, and bit-ops were done easily.
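One way to get some of that transparency back in safe C# is a small struct that remembers the array and offset and exposes the bit fields as properties. The sketch below is an assumption-laden illustration: `TokenView` and its 2-byte layout (Kind, Flags, Length) are invented here and do not describe the original app's data.

```csharp
using System;

// Hypothetical 2-byte token layout, invented for illustration:
//   byte 0, bits 0-3: Kind;  byte 0, bits 4-7: Flags;  byte 1: Length
public struct TokenView
{
    private readonly byte[] buffer;
    private readonly int offset;

    public TokenView(byte[] buffer, int offset)
    {
        this.buffer = buffer;
        this.offset = offset;
    }

    public int Kind
    {
        get { return buffer[offset] & 0x0F; }
        set { buffer[offset] = (byte)((buffer[offset] & 0xF0) | (value & 0x0F)); }
    }

    public int Flags
    {
        get { return (buffer[offset] >> 4) & 0x0F; }
        set { buffer[offset] = (byte)((buffer[offset] & 0x0F) | ((value & 0x0F) << 4)); }
    }

    public int Length
    {
        get { return buffer[offset + 1]; }
        set { buffer[offset + 1] = (byte)(value & 0xFF); }
    }
}
```

The setters write through to the shared array, so nothing is copied. Because this is a struct, the properties should be set on a local variable (for example `TokenView t = new TokenView(data, pos); t.Kind = 5;`) rather than directly on a value returned from a method.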

        I can't bear to go back to C++ for the rest of the code, so I suppose
        this is the price to be paid. Maybe I will revisit your original
        suggestion for using C++ for just this module later.

        I was hoping that I was overlooking something, but obviously you have
        worked with this type of code a lot. So I'll just accept the penalty
        in this code module. Other than this particular scenario, I can't
        think of anywhere else that I miss C++.

        Thanks again, Jeroen!


        • _dee

          #5
          Re: Any good refs on dealing with tightly packed data in C#?

          On Tue, 15 Apr 2008 08:19:00 +0200, Jeroen Mostert
          <jmostert@xs4all.nl> wrote:
          >_dee wrote:
          >>The biggest remaining hitch for C# is a simple one: Passing 'pointers'
          >>to individual tokens within the byte stream. I've done something like
          >>this in the past with 'unsafe' code, but I was trying to avoid that.
          >>
          >>The goal is to pass refs to tokens (short groups of bytes) so I can
          >>avoid copying bytes; IOW, passing a pointer to the middle of a byte
          >>array. This is actually the main place where I miss C++. I don't
          >>suppose you know any tricks for doing that?
          >
          >A pointer inside an array is just an unsafe way of using an index. All .NET
          >functions that operate on buffers, like Socket.Receive(), use the same pattern:
          >
          >public void BufferOperation (byte[] buffer, int offset, int count);
          >
          >If you're concerned this is slower than a pointer, don't be. JIT compilation
          >makes indexed operations on arrays just about as fast as using pointers, and
          ....

          Jeroen,

          Just wanted to let you know that I've used the
          function(arrayName, offset)
          approach to code a few modules now. While it still seems more obscure
          than a good old-fashioned C++ pointer, it's less aesthetically
          objectionable than I thought.

          My problem is that I have to use this all over the place, so the
          repetition of those params is ugly. It is also more error-prone,
          since I keep forgetting to adjust the offsets correctly.

          One of the few places where I miss C++. But...it works and I'm getting
          used to it. Your comments helped to tip the scales, and I think this
          approach will be better overall than using unsafe code or crossing
          back and forth to C++.

          Thanks again.


          • Jon Skeet [C# MVP]

            #6
            Re: Any good refs on dealing with tightly packed data in C#?

            _dee <_dee@nospam.com> wrote:
            >Just wanted to let you know that I've used the
            >function(arrayName, offset)
            >approach to code a few modules now. While it still seems more obscure
            >than a good old-fashioned C++ pointer, it's less aesthetically
            >objectionable than I thought.

            If you're using it a lot, why not create a class to encapsulate it?
            Something like (untested):

            public class ArraySlice<T>
            {
                readonly T[] original;
                readonly int offset;

                public ArraySlice(T[] original, int offset)
                {
                    this.original = original;
                    this.offset = offset;
                }

                public T this[int index]
                {
                    get { return original[index + offset]; }
                    set { original[index + offset] = value; }
                }
            }

            You could add lots of other functionality, such as implementing
            IList<T> and potentially taking any IList<T> instead of just arrays
            (apart from anything else, that would let you then slice a slice...)

            If you're using C# 3 you might also want to consider adding an
            extension method (or at least a static factory method) to let type
            inference do its magic.
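A minimal sketch of that suggestion, assuming the `ArraySlice<T>` class from this post (repeated here so the block stands alone). The non-generic `ArraySlice` helper class and the `Create` and `Slice` method names are hypothetical choices for the example, and like the class itself this is untested against any particular compiler version.

```csharp
using System;

public class ArraySlice<T>
{
    readonly T[] original;
    readonly int offset;

    public ArraySlice(T[] original, int offset)
    {
        this.original = original;
        this.offset = offset;
    }

    public T this[int index]
    {
        get { return original[index + offset]; }
        set { original[index + offset] = value; }
    }
}

// Hypothetical companion class; a non-generic static class is required
// to host extension methods.
public static class ArraySlice
{
    // Static factory: T is inferred from the argument, so callers can
    // write ArraySlice.Create(data, 4) instead of new ArraySlice<byte>(data, 4).
    public static ArraySlice<T> Create<T>(T[] original, int offset)
    {
        return new ArraySlice<T>(original, offset);
    }

    // Extension method (C# 3 and later): enables data.Slice(4).
    public static ArraySlice<T> Slice<T>(this T[] original, int offset)
    {
        return new ArraySlice<T>(original, offset);
    }
}
```

With either helper the element type never has to be spelled out at the call site, which matters when slices are created all over the code.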

            --
            Jon Skeet - <skeet@pobox.com>
            http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
            World class .NET training in the UK: http://iterativetraining.co.uk
