How to write output on multiple files using multithreading?

  • Man4ish
    New Member
    • Mar 2008
    • 151

    Hi,

    I am new to multithreading. I am working on a problem where I need to process 24 files (about 3 GB each) and write the output to 24 separate files. Each file takes around 1 hour to process. Is it possible to write data to multiple files concurrently using multithreading?
    Thanks in advance.
  • horace1
    Recognized Expert Top Contributor
    • Nov 2006
    • 1510

    #2
    You can process multiple files in a single-threaded program, so it should be no problem in a multithreaded one. You may hit a limit on the number of concurrently open files, though; this depends on the operating system, compiler, link-time configuration, etc.
    When you open the files, check that each one opened OK and, if not, display an error message indicating the problem.
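
As a rough illustration of checking every open, here is a minimal Python sketch. The helper names `open_outputs` and `close_all` are hypothetical and not from this thread. A failed open, including one caused by the per-process limit on open files, is reported instead of being silently ignored.

```python
import sys


def open_outputs(paths):
    """Try to open every path for writing.

    Returns (handles, errors): handles maps path -> open file object,
    errors maps path -> human-readable message for each failed open.
    """
    handles = {}
    errors = {}
    for path in paths:
        try:
            handles[path] = open(path, "w")
        except OSError as exc:
            # OSError covers a missing directory, bad permissions, and
            # running out of file descriptors (too many open files).
            errors[path] = f"cannot open {path}: {exc}"
            print(errors[path], file=sys.stderr)
    return handles, errors


def close_all(handles):
    """Close every file returned by open_outputs."""
    for fh in handles.values():
        fh.close()
```

A caller would check `errors` before starting the long processing run, so a bad path is discovered in the first second rather than an hour in.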

    • Rabbit
      Recognized Expert MVP
      • Jan 2007
      • 12517

      #3
      Won't this make it slower? The hard drive is going to be a bottleneck. It's going to be jumping around like crazy trying to write a bunch of files at the same time instead of writing contiguous blocks.

      • horace1
        Recognized Expert Top Contributor
        • Nov 2006
        • 1510

        #4
        I would assume we are talking about a complex data processing task. I have done similar things with gigabytes of satellite data (not 24 concurrent files, though) and left it processing for days.

        • donbock
          Recognized Expert Top Contributor
          • Mar 2008
          • 2427

          #5
          The C Standard does not require the stdio file access functions to be thread safe. I would not count on being able to access the same file simultaneously from different threads (I would not even have the same file open in different threads at the same time). You could get around this by having multiple processing threads that each enqueue I/O transactions to a single file-access thread; or by using semaphores to make all file access single-threaded.
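
The single file-access thread idea can be sketched as follows. This is a minimal Python illustration, not a definitive implementation: `writer_loop`, `worker`, `run`, and the upper-casing "processing" step are all hypothetical stand-ins. Note also that CPython threads do not run CPU-bound code in parallel, so the sketch shows only the structure (many producers, one thread that owns every file).

```python
import queue
import threading

_STOP = object()  # sentinel telling the writer thread to finish


def writer_loop(jobs, errors):
    """Sole owner of all output files: drains (path, text) jobs from the queue."""
    handles = {}
    try:
        while True:
            job = jobs.get()
            if job is _STOP:
                break
            path, text = job
            try:
                if path not in handles:
                    handles[path] = open(path, "w")
                handles[path].write(text)
            except OSError as exc:
                # Record the failure and keep draining so workers never block forever.
                errors.append(f"{path}: {exc}")
    finally:
        for fh in handles.values():
            fh.close()


def worker(in_lines, out_path, jobs):
    """Placeholder processing step: upper-case each line, then enqueue it."""
    for line in in_lines:
        jobs.put((out_path, line.upper()))


def run(inputs):
    """inputs maps output path -> list of input lines (assumed layout).

    Returns the list of I/O error messages collected by the writer.
    """
    errors = []
    jobs = queue.Queue(maxsize=1000)  # bounded, so workers cannot outrun the disk
    writer = threading.Thread(target=writer_loop, args=(jobs, errors))
    writer.start()
    workers = [threading.Thread(target=worker, args=(lines, path, jobs))
               for path, lines in inputs.items()]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    jobs.put(_STOP)  # all producers are done; let the writer exit
    writer.join()
    return errors
```

Because each output file is fed by exactly one worker and the queue is first-in, first-out, the lines for a given file reach the writer in the order they were produced. The alternative mentioned above, guarding file access with a semaphore or lock, keeps the file calls in the worker threads but serialises them.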

          • Man4ish
            New Member
            • Mar 2008
            • 151

            #6
            Thank you very much for your quick response. I can write to multiple files simultaneously, but the total run time is still about 24 × (processing time for one file). There is not much speed improvement. Is there any way to speed up the process?

            • Rabbit
              Recognized Expert MVP
              • Jan 2007
              • 12517

              #7
              I thought that would happen. Actually, I'm surprised it wasn't slower than sequential writing. It's a limitation of hardware, not software. I don't see a way around it unless you upgrade your hardware: faster-RPM drives coupled with a RAID array.

              • Man4ish
                New Member
                • Mar 2008
                • 151

                #8
                Hi,

                I just noticed that it is slower than sequential writing. My hardware configuration is good (24 cores). Why is it slower than sequential writing?

                Thanks

                • Rabbit
                  Recognized Expert MVP
                  • Jan 2007
                  • 12517

                  #9
                  It doesn't matter how fast your processor is. The choke point is the hard disk. You need multiple hard disks in a RAID array.
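
One way to check whether the disk really is the choke point for a given job is to time the processing and the writing separately. The Python sketch below is only a rough measurement aid: `timed_run` and its `process` callback are hypothetical names, read time is not measured, and the per-line timer calls add overhead of their own.

```python
import os
import time


def timed_run(lines, out_path, process):
    """Return (compute_seconds, write_seconds) for one file's worth of work."""
    compute = 0.0
    write = 0.0
    with open(out_path, "w") as out:
        for line in lines:
            t0 = time.perf_counter()
            result = process(line)      # the data processing step
            t1 = time.perf_counter()
            out.write(result)           # the output step (may only fill a buffer)
            t2 = time.perf_counter()
            compute += t1 - t0
            write += t2 - t1
        t3 = time.perf_counter()
        out.flush()
        os.fsync(out.fileno())          # force buffered data out to the device
        write += time.perf_counter() - t3
    return compute, write
```

If the write time dominates, more threads are unlikely to help and the hardware advice above applies. If the compute time dominates, the processing itself is the place to look.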

                  • horace1
                    Recognized Expert Top Contributor
                    • Nov 2006
                    • 1510

                    #10
                    If sequential writing gives the expected results, why not use it? Is run time critical? Can't you leave it running overnight? It can be very difficult to speed up a complex data processing task. Some ideas:
                    1. Look at the algorithm you are using: can you make it more efficient?
                    2. Check the compiler's optimisation level.
                    3. Use faster disks (see previous contributions).

                    What about using a computing cluster such as Beowulf?


                    have a look at


                    However, the right hardware/software combination depends on what you are doing, and we don't have sufficient details to give explicit advice.
