FFT: Problem with large dataset in memory

  • Simon Lu

    FFT: Problem with large dataset in memory

    Hi,

    I am working on a project dealing with a large amount of data. I have
    a problem doing an FFT on a very long time trace: a signal with over
    300 million sampling points. After testing on my computer, I realise
    that I can store only 2**27 points in memory, which already needs
    2 GB of RAM. With an array of 2**28 Double_t points the program
    crashes ("segmentation violation"). I have tried the CERN ROOT
    framework, GSL and the FFTW3 library; they all need to load the data
    into memory. So the question is: are there some mechanisms or
    algorithms to manage the array in a TTree, or somewhere else on the
    hard disk, and then load the data step by step into a cache?
    Something like a FileArray. Or do you have a better idea?
    This is really urgent. I would be very grateful to hear from you!
    THX!

    --
    [ See http://www.gotw.ca/resources/clcm.htm for info about ]
    [ comp.lang.c++.moderated. First time posters: Do this! ]

  • Christoph Kliemt

    #2
    Re: FFT: Problem with large dataset in memory

    Moin!

    Simon Lu <luqsim@googlemail.com> writes:

    > I am working on a project dealing with a large amount of data. I
    > have a problem doing an FFT on a very long time trace: a signal
    > with over 300 million sampling points. After testing on my
    > computer, I realise that I can store only 2**27 points in memory,
    > which already needs 2 GB of RAM. With an array of 2**28 Double_t
    > points the program crashes ("segmentation violation").
    > [...]
    > And then load the data step by step into a cache? Something like a
    > FileArray. Or do you have a better idea?
    man mmap

    hth,

    Christoph


    • joseph cook

      #3
      Re: FFT: Problem with large dataset in memory

      > This is really urgent. I would be very grateful to hear from
      > you! THX!

      This is off-topic and unrelated to C++.

      You can do the FFT in parts, divide-and-conquer style. Do the even
      elements and the odd elements individually. If that's still too
      much, divide each of those FFTs into odd and even halves, then
      recombine.

      The FFT is a classic divide-and-conquer algorithm. I think the
      book "Numerical Recipes" may be useful here.
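      A minimal in-memory sketch of the even/odd recursion described
      above (standard Cooley-Tukey, radix 2, input length a power of
      two); in the out-of-core variant each half would live in its own
      file rather than a vector:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Recursive radix-2 decimation-in-time FFT (Cooley-Tukey).
std::vector<std::complex<double>>
fft(const std::vector<std::complex<double>>& x)
{
    const std::size_t n = x.size();
    assert(n != 0 && (n & (n - 1)) == 0);  // length must be a power of two
    if (n == 1) return x;

    // Split into even- and odd-indexed samples.
    std::vector<std::complex<double>> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = x[2 * i];
        odd[i]  = x[2 * i + 1];
    }

    // Transform the halves independently -- this is the step that could
    // also be run on separate on-disk chunks of the data.
    std::vector<std::complex<double>> E = fft(even), O = fft(odd);

    // Combine: X[k] = E[k] + w^k O[k], X[k+n/2] = E[k] - w^k O[k].
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> X(n);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const std::complex<double> t =
            std::polar(1.0, -2.0 * pi * double(k) / double(n)) * O[k];
        X[k]         = E[k] + t;
        X[k + n / 2] = E[k] - t;
    }
    return X;
}
```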

      Joe


      • Erik Wikström

        #4
        Re: FFT: Problem with large dataset in memory

        On 2008-09-24 21:28, Simon Lu wrote:

        > I am working on a project dealing with a large amount of data.
        > I have a problem doing an FFT on a very long time trace: a
        > signal with over 300 million sampling points.
        > [...]
        > Are there some mechanisms or algorithms to manage the array in
        > a TTree, or somewhere else on the hard disk, and then load the
        > data step by step into a cache?

        There are many ways to do what you want, and the best way
        depends on your needs. If you need to operate on the whole
        dataset at once you have somewhat of a problem; if you are able
        to work with just a few data points at a time you should be able
        to "stream" the data through your algorithm and can probably get
        away with a fairly small memory footprint.
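        A minimal sketch of such streaming, assuming the samples sit in
        a binary file of raw doubles (the file name, chunk size and the
        summing computation are placeholders; any computation that only
        needs a window of the data at a time fits the same loop):

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Stream a large binary file of raw doubles through a fixed-size
// buffer, here simply accumulating the sum of all samples.
double streamed_sum(const std::string& path, std::size_t chunk = 1 << 20)
{
    std::ifstream in(path, std::ios::binary);
    std::vector<double> buf(chunk);
    double sum = 0.0;
    while (in) {
        in.read(reinterpret_cast<char*>(buf.data()),
                static_cast<std::streamsize>(buf.size() * sizeof(double)));
        // gcount() reports how many bytes the last (possibly partial)
        // read actually delivered.
        std::size_t got =
            static_cast<std::size_t>(in.gcount()) / sizeof(double);
        for (std::size_t i = 0; i < got; ++i)
            sum += buf[i];
    }
    return sum;
}
```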

        None of this is, however, directly topical in this group; it is
        on too high a level and has no C++-specific parts. Try asking in
        a general programming group, such as comp.programming, instead.

        If you are running on Windows you might also be able to increase
        the virtual memory available to your application to 3 GB; for
        more info:


        --
        Erik Wikström



        • nobyjos@gmail.com

          #5
          Re: FFT: Problem with large dataset in memory

          On Sep 24, 8:28 pm, Simon Lu <luq...@googlemail.com> wrote:

          > I am working on a project dealing with a large amount of
          > data. I have a problem doing an FFT on a very long time
          > trace: a signal with over 300 million sampling points.
          > [...]
          > Are there some mechanisms or algorithms to manage the array
          > in a TTree, or somewhere else on the hard disk, and then
          > load the data step by step into a cache? Something like a
          > FileArray. Or do you have a better idea?
          Write the whole array into a file (binary format), then map
          the file into the process with mmap(). You can then use some
          sort of indexing to locate the data in the file.



          • Greg Herlihy

            #6
            Re: FFT: Problem with large dataset in memory

            On Sep 24, 12:28 pm, Simon Lu <luq...@googlemail.com> wrote:

            > I am working on a project dealing with a large amount of
            > data. I have a problem doing an FFT on a very long time
            > trace: a signal with over 300 million sampling points.
            > After testing on my computer, I realise that I can store
            > only 2**27 points in memory, which already needs 2 GB of
            > RAM. With an array of 2**28 Double_t points the program
            > crashes ("segmentation violation").
            Clearly, the most natural solution would be to port your current C++
            program to a 64-bit execution environment. A 64-bit memory model would
            be able to accommodate vastly larger amounts of data in RAM than is
            possible under the program's current, 32-bit memory model.

            Porting a C++ program to a 64-bit environment might not turn
            out to be as straightforward as one might expect, though.
            Issues with C++'s sign promotion of integer types,
            especially, can lead to surprising results. Apple's 64-bit
            transition programmer's guide
            (http://preview.tinyurl.com/3ns3b5) has some examples.
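            A small illustration of the kind of promotion surprise meant
            here (not taken from the guide; the function names are made
            up): a negative int is sign-extended when converted directly
            to a 64-bit unsigned type, but not when it is first
            truncated through a 32-bit unsigned type.

```cpp
#include <cstdint>

std::uint64_t widen_signed(int x)
{
    // Direct conversion: the negative value is sign-extended to 64 bits.
    return static_cast<std::uint64_t>(x);
}

std::uint64_t widen_unsigned(int x)
{
    std::uint32_t u = static_cast<std::uint32_t>(x);  // wraps modulo 2^32
    return static_cast<std::uint64_t>(u);             // zero-extended
}
```

            Code that mixed int and unsigned freely could behave
            identically in a 32-bit build and diverge only after the
            port, which is what makes these bugs hard to spot.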

            Greg



            • Marco Manfredini

              #7
              Re: FFT: Problem with large dataset in memory

              Simon Lu wrote:

              > I am working on a project dealing with a large amount of
              > data. I have a problem doing an FFT on a very long time
              > trace: a signal with over 300 million sampling points.
              > [...]
              > Are there some mechanisms or algorithms to manage the
              > array in a TTree, or somewhere else on the hard disk,
              > and then load the data step by step into a cache?
              > Something like a FileArray. Or do you have a better
              > idea?
              Consider rearranging the input file. The FFT algorithm
              computes the FFT for the even data and the odd data
              separately and then combines the results. Now, what if you
              copied the *even* data points to an "even" data file and
              the *odd* points to an "odd" data file and computed the
              FFT for these separately? If you review the FFT algorithm
              you'll notice how you can easily sweep over these partial
              results to compute the total result.

              You can apply this recursively until you reach a data set size which
              fits into memory.
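              The split step can be sketched as follows, assuming raw
              doubles on disk (file names are placeholders and error
              handling is trimmed); the partial FFTs are then merged
              with the usual butterfly sweep:

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Out-of-core even/odd split: copy the even-indexed samples of a raw
// double file into one file and the odd-indexed samples into another.
// Applied recursively, this shrinks the problem until a piece fits in
// RAM.
void split_even_odd(const std::string& in_path,
                    const std::string& even_path,
                    const std::string& odd_path)
{
    std::ifstream in(in_path, std::ios::binary);
    std::ofstream even(even_path, std::ios::binary);
    std::ofstream odd(odd_path, std::ios::binary);
    double d;
    for (std::size_t i = 0;
         in.read(reinterpret_cast<char*>(&d), sizeof d); ++i)
        (i % 2 == 0 ? even : odd)
            .write(reinterpret_cast<const char*>(&d), sizeof d);
}
```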


