Optimal Solution

  • spam_list@yahoo.com

    Optimal Solution

    Hi

    I am deciding how to animate some numerical data. So far, I have kept
    the graphics and science (number-crunching) codes separate, which I
    like.
    The main issue confronting me is how to handle lots of data (usually
    6000 data points, 4 quantities at each point, 24,000 numbers total)
    in terms of optimization vs. effort. The 3 current options before me
    are:

    1) Have the science code write data to a file at each time step, and
    then the graphics read it in at each step.

    Advantage
    1a) Minimal coding with just a little modification of existing code
    1b) Maintains the separation between science and graphics
    1c) Straightforward solution
    Disadvantage
    1a) It seems there is a high overhead. I've done some limited tests,
        and it takes about 1 sec to write and then read the data file.
        Each numerical time step takes 1 sec, so I'll be doubling the
        run time: with 3600 steps, that goes from 1 to 2 hours (and 1
        more hour for animation).
    1b) Fragmenting the drive by repeated write/re-write. Actually, I
        don't know if this is an issue, but I worry.

    2) Combine the codes into a big code so that there is no reading or
    writing.

    Advantage
    2a) No input/output overhead
    2b) No time penalty related to 2a)
    Disadvantage
    2a) Lose advantages 1a), 1b), and 1c).

    3) Someone said that if I wrote to stdout and read from stdin, and
    piped one program into the other, the data would stay in memory
    without having to be written out. In other words:

    science-a.out | animate-a.out

    I did some tests though, and it still seems to take a second to
    write and read the data.

    I tried using the code profiler with the "-pg" flag but can't quite
    set it up to get anything.

    So my questions are:

    1) Should 3) be giving me better results than I seem to be getting?
    2) Can you think of anything else I should try?

    My machine is a 1.6 GHz Pentium 4, 1 processor, and I'm coding on
    GNU/Linux.
    Graphics with OpenGL/Mesa, animation with ffmpeg.

    I am guessing that without the read/write overhead, a standard
    problem would take 2 hours to run. With the overhead, it would take
    3 hours.

    Thanks.

    San Le
    slffea.com
  • Walter Roberson

    #2
    Re: Optimal Solution

    In article <30e0b660-30e9-49ca-b2c2-f7c668eb5529@w34g2000prm.googlegroups.com>,
    <spam_list@yahoo.com> wrote:
    >I am deciding how to animate some numerical data. So far, I have kept
    >the graphics and science(number crunching) codes separate which I
    >like.
    >1) Have the science code write data to a file at each time step, and
    >then the graphics read it in at each step.
    >Disadvantage
    >1a) It seems there is a high overhead. I've done some limited tests
    >and it takes about 1 sec to write and then read to the data file.
    >Each numerical time step takes 1 sec. So I'll be doubling the time
    >to run, and with 3600 steps, that goes from 1 to 2 hours(And 1 more
    >hour for animation).
    >My machine is a 1.6 GhZ Pentium 4, 1 processor, and I'm coding on GNU/
    >Linux.
    >Graphics with OpenGL/Mesa, animation with ffmpeg.

    As a practical matter, this may be a case where it makes sense to
    use shared memory. Compute a chunk, writing it into the shared
    array, then send information to the graphics process about
    where in the shared memory the chunk is. In turn, the graphics
    process has to indicate back that it is done with the block of
    data so that the calculation process can reuse the block.

    You would, of course, use at least two buffers, one that you
    are busy writing in, and the other that you wrote in before and
    is farmed out to the graphics process to read from. If the calculation
    side turns out to be faster than the graphics side, then the
    calculation side could (smoothly) run out of buffer space and have to
    wait for the graphics side to finish with a buffer before the
    calculation side could proceed. The communications involved should
    be relatively easy to implement.

    --
    "When a scientist is ahead of his times, it is often through
    misunderstanding of current, rather than intuition of future truth.
    In science there is never any error so gross that it won't one day,
    from some perspective, appear prophetic." -- Jean Rostand


    • James Harris

      #3
      Re: Optimal Solution

      On 23 May, 22:39, spam_l...@yahoo.com wrote:
      >Hi
      >
      >I am deciding how to animate some numerical data. So far, I have kept
      >the graphics and science(number crunching) codes separate which I
      >like.
      >The main issue confronting me is how to handle lots of data (usually
      >6000 data points, 4 quantities at each point, 24,000 numbers total)
      >in terms of optimization vs. effort. The 3 current options before me
      >are:
      >....
      >My machine is a 1.6 GhZ Pentium 4, 1 processor, and I'm coding on GNU/
      >Linux.
      >Graphics with OpenGL/Mesa, animation with ffmpeg.
      >
      >I am guessing that without the read/write overhead, a standard
      >problem
      >would take 2 hours to run. With the overhead, it would take 3 hours.

      This should probably be asked in comp.programming as it doesn't seem
      to be a C question. However, ...

      I'm not sure what about transferring via a file is causing a second's
      delay - CPU power, context switching, physical I/O time, too many
      small writes etc. I take it it is OK to have the graphics happen one
      second after the figures are generated but not to take an extra second
      per 'frame' of processing time. You should be able to overlap
      calculation and writing/reading.

      One option is to look at sockets. On a single machine you can use Unix
      or TCP/IP sockets. If you use the latter the science and graphics can
      be on the same or on different machines with no app changes. I make
      your data rate less than 1 Mbit/s so it's not high.


      • Bartc

        #4
        Re: Optimal Solution


        <spam_list@yahoo.com> wrote in message
        news:30e0b660-30e9-49ca-b2c2-f7c668eb5529@w34g2000prm.googlegroups.com...
        >Hi
        >
        >I am deciding how to animate some numerical data. So far, I have kept
        >the graphics and science(number crunching) codes separate which I
        >like.
        >The main issue confronting me is how to handle lots of data (usually
        >6000 data points, 4 quantities at each point, 24,000 numbers total)
        >in terms of optimization vs. effort. The 3 current options before me
        >are:
        >
        >1) Have the science code write data to a file at each time step, and
        >then the graphics read it in at each step.

        If you write your 24000 numbers as text, then yes it could take a while.

        But if the data could somehow be arranged as an array of 24000 elements, you
        can write the entire array (192000 bytes?) and read it in again very
        quickly.
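
A minimal sketch of the single-block write and read suggested here, assuming the frame already sits in a contiguous array of `double`. `write_frame` and `read_frame` are hypothetical names, not part of the poster's code. With 8-byte doubles, 24000 values come to the 192000 bytes estimated above.

```c
#include <stdio.h>
#include <stddef.h>

/* Write n doubles to path as one raw binary block. Returns 0 on success. */
int write_frame(const char *path, const double *data, size_t n)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL) return -1;
    size_t written = fwrite(data, sizeof *data, n, fp);
    int close_err = fclose(fp);
    return (written == n && close_err == 0) ? 0 : -1;
}

/* Read exactly n doubles back from path. Returns 0 on success. */
int read_frame(const char *path, double *data, size_t n)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return -1;
    size_t got = fread(data, sizeof *data, n, fp);
    fclose(fp);
    return got == n ? 0 : -1;
}
```

The file holds the machine's native representation of `double`, so the reader is assumed to run on the same kind of machine as the writer.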

        Fragmentation of the disk I don't think is an issue, especially if you keep
        writing over the same data.

        Synchronising the 2 programs might be a bit of a problem. I would put the
        graphics in its own module(s), as part of the same program, and call it
        every step as needed. That would simplify a few things.

        --
        Bartc



        • Gene

          #5
          Re: Optimal Solution

          On May 23, 5:39 pm, spam_l...@yahoo.com wrote:
          >I am deciding how to animate some numerical data. So far, I have kept
          >the graphics and science(number crunching) codes separate which I
          >like.
          ><cut>
          >I am guessing that without the read/write overhead, a standard
          >problem
          >would take 2 hours to run. With the overhead, it would take 3 hours.

          File I/O to help code modularity alone is not such a great idea. If
          you design a clean data structure and interface between the data
          generator and the visualizer, that's good enough. It sounds like a
          simple array will suit your purpose.

          On the other hand, it _is_ a good idea to design visualization systems
          with a recorder/viewer pattern. Your "science code" is the data
          recorder. The viewer reads the recording and renders it. This way
          e.g. you can save recordings to make a virtual lab notebook--
          scientists keep notebooks religiously--and run animations backward and
          forward like instant replay in sports, which can be very useful. Make
          sure to save time stamps with the data frames so it is obvious
          where in the replay you are at the moment.

          I take it you're talking about 24,000 numbers per frame of an
          animation with thousands of frames. For graphics you may be able to
          do with 4-byte numbers (floats or scaled ints) if feature ratios
          aren't too big. So we have ~100 KB per frame or ~1 GB for 10,000
          frames. Of course double this if you need 8-byte numbers.

          As has been mentioned, you can make I/O much faster by fwrite()ing
          your data directly rather than printf()ing it, which entails
          conversion to/from text and explosion of the data size. Naturally
          binary data may not be readable by machines other than the kind where
          writing occurred. You can easily implement a text format for transfer
          between different architectures and a conversion to get the data back
          in binary format on the new machine.

          At 1 GB, you are on the fringe between the data size you'd want to read
          into RAM entirely--an "array of frames" approach--and sizes beyond,
          where some fancier buffering/cache scheme would be needed. The usual
          buffer/cache scheme is to read data blocks from file into blocks of
          RAM, kicking out the least recently used when RAM gets scarce. To
          make the animation smoother, use a separate thread that tries to
          "guess" blocks that will be needed soon and read them in advance,
          especially while the user has paused to look at a frame of interest.


          • Malcolm McLean

            #6
            Re: Optimal Solution


            <spam_list@yahoo.com> wrote in message
            >The main issue confronting me is how to handle lots of data (usually
            >6000 data points, 4 quantities at each point, 24,000 numbers total)
            >1a) It seems there is a high overhead. I've done some limited tests
            >and it takes about 1 sec to write and then read to the data
            >file.

            It shouldn't take anything like a second to convert 24,000 numbers to ASCII
            and flush them out a buffer. What you are probably seeing is latency - the
            OS gubbins that manages the disk does sweeps every second or so to see if
            there is anything to write, and writes it out.

            So essentially you need to decouple writes from reads. This may be as simple
            as writing a massive trajectory file and then viewing it. First thing is to
            do that and ascertain that the disk can in fact write your data much faster
            than 24,000 numbers / sec (or about 200 K per sec).

            --
            Free games and programming goodies.





            • spam_list@yahoo.com

              #7
              Re: Optimal Solution


              Thank you all very much for your responses. They have been incredibly
              helpful.

              Gene wrote:
              >As has been mentioned, you can make I/O much faster by fwrite()ing
              >your data directly rather than printf()ing it, which entails
              >conversion to/from text and explosion of the data size.
              Bartc wrote:
              >But if the data could somehow be arranged as an array of 24000 elements, you
              >can write the entire array (192000 bytes?) and read it in again very
              >quickly.
              This is the solution I am going with. The speed-up is phenomenal
              using binary data with fwrite and fread. The I/O file, had my math
              been correct, would indeed have been 192000 bytes rather than
              450000 for text. Thank you both for recommending it.

              Gene wrote:
              >On the other hand, it _is_ a good idea to design visualization systems
              >with a recorder/viewer pattern. Your "science code" is the data
              >recorder. The viewer reads the recording and renders it. This way
              >e.g. you can save recordings to make a virtual lab notebook--
              <cut>
              >I take it you're talking about 24,000 numbers per frame of an
              >animation with thousands of frames. For graphics you may be able to
              >do with 4-byte numbers (floats or scaled ints) if feature ratios
              >aren't too big. So we have ~100Kb per frame or 1Gb for 10,000
              >frames. Of course double this if you need 8-byte numbers.
              This animation is for a computational fluid dynamics (CFD) code.
              Currently, the only data saved is the final time step, supplemented
              with data such as streaklines and pathlines, which somewhat help
              illustrate the flow over time. Best practices probably dictate
              recording the data for every step, but I'm willing to have the
              animation be the full record of the flow to avoid amassing all
              the data.


              Malcolm McLean wrote:
              >It shouldn't take anything like a second to convert 24,000 numbers to ASCII
              >and flush them out a buffer.
              I incorrectly gave the number of doubles. Sorry about that.
              It really should have been 5X that, or 120,000 doubles. I'm not
              sure if it makes sense that writing 120,000 doubles should still
              take about a second though. The profiler says it took .005 seconds
              but I can tell that's not true.

              James Harris wrote:
              >I take it it is OK to have the graphics happen one
              >second after the figures are generated but not to take an extra second
              >per 'frame' of processing time. You should be able to overlap
              >calculation and writing/reading.
              You are right and this hopefully will be another source of
              optimization.
              I got stuck on how the science has to wait while the graphics
              finishes reading before writing again, but of course, I could do
              something like:

              system("mv output_science input_animation");

              and then write out to the file "output_science" from the science code
              while the animation code is reading. The "mv" should be quick and
              can be done for a few steps at a time.
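
The same hand-off can be sketched without spawning a shell by calling the standard `rename` function directly. `publish_frame` is a hypothetical name; the file names would be the ones from the post. On POSIX systems such as GNU/Linux, `rename` replaces an existing destination in one step, so a reader opening the destination sees either the old frame or the new one.

```c
#include <stdio.h>
#include <stddef.h>

/* Write one frame to tmp_path, then move it onto final_path.
   Returns 0 on success, -1 on any error. */
int publish_frame(const char *tmp_path, const char *final_path,
                  const double *data, size_t n)
{
    FILE *fp = fopen(tmp_path, "wb");
    if (fp == NULL) return -1;
    size_t written = fwrite(data, sizeof *data, n, fp);
    if (fclose(fp) != 0 || written != n) {
        remove(tmp_path);          /* do not leave a partial frame behind */
        return -1;
    }
    return rename(tmp_path, final_path) == 0 ? 0 : -1;
}
```

A call such as `publish_frame("output_science", "input_animation", frame, n)` after each time step would mirror the `mv` idea; how the animation side learns that a new frame has arrived is left open here.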

              Walter Roberson wrote:
              >As a practical matter, this may be a case where it makes sense to
              >use shared memory. Compute a chunk, writing it into the shared
              >array, then send information to the graphics process about
              >where in the shared memory the chunk is. In turn, the graphics
              >process has to indicate back that it is done with the block of
              >data so that the calculation process can reuse the block.
              I looked at shared memory and this is also a good direction to
              go in, but it's probably beyond my coding abilities right now
              so I'll probably go with the simple solutions of "fwrite" and
              "fread".

              Again, my deepest thanks for all the great suggestions. I would
              like to thank you all personally on the project pages for my CFD
              code if there aren't any objections.

              San Le
              slfcfd.com


              • Bartc

                #8
                Re: Optimal Solution


                "Malcolm McLean" <regniztar@btinternet.com> wrote in message
                news:Yb6dnbSLff6lWarV4p2dnAA@bt.com...
                >
                ><spam_list@yahoo.com> wrote in message
                >>The main issue confronting me is how to handle lots of data (usually
                >>6000 data points, 4 quantities at each point, 24,000 numbers total)
                >>1a) It seems there is a high overhead. I've done some limited tests
                >>and it takes about 1 sec to write and then read to the data
                >>file.
                >>
                >It shouldn't take anything like a second to convert 24,000 numbers to
                >ASCII

                On my machine it was taking over 200ms to write the 24000 double values to a
                file. Writing each value to a string only took slightly less time, so the
                text conversion is a bottleneck. (Writing as a single block of binary data
                /seemed/ to take about 1.25msec, but certainly much faster)

                Don't know about reading them back in (I can't use fscanf properly), but the
                totals wouldn't be far off the 1 sec reported.

                >First thing is to do that and ascertain that the disk can in fact write
                >your data much faster than 24,000 numbers / sec (or about 200 K per sec).

                I'm sure that is the case. Unless he's using some unusual media.

                --
                Bartc



                • Szabolcs Borsanyi

                  #9
                  Re: Optimal Solution

                  On Sat, May 24, 2008 at 10:19:45AM +0000, Bartc wrote:
                  >
                  >"Malcolm McLean" <regniztar@btinternet.com> wrote in message
                  >>It shouldn't take anything like a second to convert 24,000 numbers to
                  >>ASCII
                  >
                  >On my machine it was taking over 200ms to write the 24000 double values to a
                  >file. Writing each value to a string only took slightly less time, so the
                  >text conversion is a bottleneck. (Writing as a single block of binary data
                  >/seemed/ to take about 1.25msec, but certainly much faster)
                  >
                  >Don't know about reading them back in (I can't use fscanf properly), but the
                  >totals wouldn't be far off the 1 sec reported.

                  If the platform of the viewer and the recorder is the same, you can use binary
                  files. I'd emphasize the fact that the double precision is seldom relevant for
                  visualisation. So I'd stick to a stream of floats. Double to float conversion is
                  mostly quick, and it is usually half the size, and far more efficient than
                  ascii. (sometimes it is enough to save a number in a byte or two, especially if
                  it is a colour component, or a coordinate. Think of the display's resolution and
                  colour space.)
                  It can be important to be able to seek between frames, which is difficult in
                  ascii, unless you save each frame into a separate file (which is, I think,
                  quite reasonable, since you can erase old frames without moving new ones.)
                  If I were you I'd design a binary format (including the passing of all kinds
                  of header and control info), and that interface will be the only coupling
                  between the two parts of your software.
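
A sketch of what such a format might look like, with every detail invented for illustration: a small header giving the number of values per frame, followed by fixed-size frames of `float`, each preceded by a time stamp (an idea borrowed from Gene's post). Fixed-size frames make seeking to frame k a matter of arithmetic. `traj_header`, `TRAJ_MAGIC`, and the `traj_*` functions are hypothetical names.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

#define TRAJ_MAGIC 0x4a415254u   /* arbitrary tag to recognise the file */

typedef struct {
    uint32_t magic;
    uint32_t n_values;           /* floats per frame */
} traj_header;

/* Write the header at the current position. Returns 0 on success. */
int traj_write_header(FILE *fp, uint32_t n_values)
{
    traj_header h = { TRAJ_MAGIC, n_values };
    return fwrite(&h, sizeof h, 1, fp) == 1 ? 0 : -1;
}

/* Append one frame: a time stamp, then n values converted double -> float. */
int traj_append_frame(FILE *fp, float t, const double *data, uint32_t n)
{
    if (fwrite(&t, sizeof t, 1, fp) != 1) return -1;
    for (uint32_t i = 0; i < n; i++) {
        float v = (float)data[i];
        if (fwrite(&v, sizeof v, 1, fp) != 1) return -1;
    }
    return 0;
}

/* Seek to frame k (counting from 0) and read its time stamp and values.
   out must have room for max_n floats. Returns 0 on success. */
int traj_read_frame(FILE *fp, uint32_t k, float *t, float *out, uint32_t max_n)
{
    traj_header h;
    if (fseek(fp, 0L, SEEK_SET) != 0) return -1;
    if (fread(&h, sizeof h, 1, fp) != 1 || h.magic != TRAJ_MAGIC) return -1;
    if (h.n_values > max_n) return -1;
    long frame_bytes = (long)(sizeof(float) * ((size_t)h.n_values + 1));
    if (fseek(fp, (long)sizeof h + (long)k * frame_bytes, SEEK_SET) != 0)
        return -1;
    if (fread(t, sizeof *t, 1, fp) != 1) return -1;
    if (fread(out, sizeof *out, h.n_values, fp) != h.n_values) return -1;
    return 0;
}
```

The offsets here are held in a `long`, which limits the file size on platforms where `long` is 32 bits; a real format would also need to settle byte order if files move between machines.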

                  Szabolcs


                  • Szabolcs Borsanyi

                    #10
                    Re: Optimal Solution

                    On Sat, May 24, 2008 at 03:16:03AM -0700, spam_list@yahoo.com wrote:
                    >
                    >Thank you all very much for your responses. They have been incredibly
                    >helpful.
                    >
                    >Bartc wrote:
                    >>But if the data could somehow be arranged as an array of 24000 elements, you
                    >>can write the entire array (192000 bytes?) and read it in again very
                    >>quickly.
                    >
                    >This is the solution I am going with. The speed-up is phenomenal
                    >using
                    >binary data with fwrite and fread. The I/O file, had my math been
                    >correct,
                    >would indeed have been 192000 bytes rather than 450000 for text.
                    >Thank
                    >you both for recommending it.

                    24000*sizeof(float)=96000; if you want another factor of two, see my
                    previous post.

                    >James Harris wrote:
                    >>
                    >>I take it it is OK to have the graphics happen one
                    >>second after the figures are generated but not to take an extra second
                    >>per 'frame' of processing time. You should be able to overlap
                    >>calculation and writing/reading.
                    >
                    >You are right and this hopefully will be another source of
                    >optimization.
                    >I got stuck on how the science has to wait while the graphics finished
                    >reading before writing again, but of course, I could do something
                    >like:
                    >
                    >system( "mv output_science input_animation ");

                    It would be better to rotate the file names, like this:
                    sprintf(output_science,"science.dat.%d",time_step%10);
                    Then reading and writing can very well overlap. You will have to
                    tell the visualiser about this naming convention, of course.
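
A small sketch of the rotating-name idea, using `snprintf` so the buffer cannot overflow. `frame_filename` and `N_SLOTS` are hypothetical names; the "science.dat.%d" pattern and the modulus of 10 come from the line above.

```c
#include <stdio.h>
#include <stddef.h>

#define N_SLOTS 10   /* number of file names in the rotation */

/* Build the file name for a time step: science.dat.0 ... science.dat.9.
   Returns 0 on success, -1 if buf is too small. */
int frame_filename(char *buf, size_t size, long time_step)
{
    int len = snprintf(buf, size, "science.dat.%ld", time_step % N_SLOTS);
    return (len < 0 || (size_t)len >= size) ? -1 : 0;
}
```

Both programs would call the same function with the same step counter, so the writer can be up to `N_SLOTS - 1` steps ahead of the reader before it starts overwriting a file that has not been read yet.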

                    >Walter Roberson wrote:
                    >>
                    >>As a practical matter, this may be a case where it makes sense to
                    >>use shared memory. Compute a chunk, writing it into the shared
                    >>array, then send information to the graphics process about
                    >>where in the shared memory the chunk is. In turn, the graphics
                    >>process has to indicate back that it is done with the block of
                    >>data so that the calculation process can reuse the block.
                    >
                    >I looked at shared memory and this is also a good direction to
                    >go in, but it's probably beyond my coding abilities right now
                    >so I'll probably go with the simple solutions of "fwrite" and
                    >"fread".

                    fread/fwrite are more portable, anyway. And with shared memory you'll
                    have no record of the past frames, once you have overwritten them.

                    Szabolcs


                    • spam_list@yahoo.com

                      #11
                      Re: Optimal Solution



                      Szabolcs Borsanyi wrote:
                      >It would be better to rotate the file names, like this:
                      >sprintf(output_science,"science.dat.%d",time_step%10);
                      >Then reading and writing can very well overlap. You will have to
                      >tell the visualizer about this naming convention, of course.

                      Thanks. I'll try it.

                      >fread/fwrite are more portable, anyway. And with shared memory you'll
                      >have no record on the past frames, once you have overwritten them.

                      I was wondering how standard these libraries were across the Unices as
                      well as Windows. It's good to know this may be an issue.

                      San Le
                      slfcfd.com
