Loopless syntax for 2d in NumPy (or Numarray)

Collapse
This topic is closed.
X
X
 
  • Time
  • Show
Clear All
new posts
  • 2mc

    Loopless syntax for 2d in NumPy (or Numarray)

    I am finding out all kinds of ways to do things in NumPy through the
    many suggestions I have received. It's exciting. Thanks to all who
    have replied on my other threads.

    I'm still having trouble with one thing. Let me set a scenario and
    see if anyone has any ideas.

    Assume a multidimensiona l array (2d). This would be like a
    spreadsheet of rows and columns. Further, assume hundreds of 'rows'
    and 3 columns. Suppose I want a running list of the highest value in a
    single column for 20 'rows'. So, starting at 'row' 19, the answer
    would be the highest value from 'row' 0 to 'row' 19. Then, at 'row'
    20, the answer would be the highest value from 'row' 1 to 'row' 20.
    And, so on. Further, suppose I want this value for each 'column'.
    The result would be a 3 'column' array with 19 less rows than the
    source array and would contain a running list of highest values of
    each column for the last 20 rows.

    How would this be done without loops? Or, at least without looping
    through every row.

    If it can't be done, is this something that numarray could do?

    Thanks a million.

    Matt
  • Terry Reedy

    #2
    Re: Loopless syntax for 2d in NumPy (or Numarray)


    "2mc" <mcrider@bigfoo t.com> wrote in message
    news:500a4565.0 310191108.55e52 b1d@posting.goo gle.com...[color=blue]
    > Assume a multidimensiona l array (2d). This would be like a
    > spreadsheet of rows and columns. Further, assume hundreds of 'rows'
    > and 3 columns. Suppose I want a running list of the highest value in[/color]
    a[color=blue]
    > single column for 20 'rows'. So, starting at 'row' 19, the answer
    > would be the highest value from 'row' 0 to 'row' 19. Then, at 'row'
    > 20, the answer would be the highest value from 'row' 1 to 'row' 20.
    > And, so on. Further, suppose I want this value for each 'column'.
    > The result would be a 3 'column' array with 19 less rows than the
    > source array and would contain a running list of highest values of
    > each column for the last 20 rows.
    >
    > How would this be done without loops? Or, at least without looping
    > through every row.[/color]

    Just curious: is this a real problem, or one you made up to stump
    NumPy? Keep in mind that NumPy was written to do typical array
    operations, linear algebra/analysis, ffts, and even some things not so
    typical. A moving maximum is highly nonlinear, unusual, and likely to
    need explicit looping to be done efficiently.

    I think the way to avoid redoing the max from scratch with each move
    of the window is to use a heap augmented by a circular index that
    enables easy replacement of the departing number with the incoming
    number. After the replacement, re-establish the heap property and
    record the max. (For more on heaps, see heapq.py in the lib or
    various algorithm books.)

    Terry J. Reedy


    Comment

    • 2mc

      #3
      Re: Loopless syntax for 2d in NumPy (or Numarray)

      "Terry Reedy" <tjreedy@udel.e du> wrote in message news:<VfqdnZ2_Z 8mclg6iRVn-vA@comcast.com> ...[color=blue]
      > "2mc" <mcrider@bigfoo t.com> wrote in message
      > news:500a4565.0 310191108.55e52 b1d@posting.goo gle.com...[color=green]
      > > Assume a multidimensiona l array (2d). This would be like a
      > > spreadsheet of rows and columns. Further, assume hundreds of 'rows'
      > > and 3 columns. Suppose I want a running list of the highest value in[/color]
      > a[color=green]
      > > single column for 20 'rows'. So, starting at 'row' 19, the answer
      > > would be the highest value from 'row' 0 to 'row' 19. Then, at 'row'
      > > 20, the answer would be the highest value from 'row' 1 to 'row' 20.
      > > And, so on. Further, suppose I want this value for each 'column'.
      > > The result would be a 3 'column' array with 19 less rows than the
      > > source array and would contain a running list of highest values of
      > > each column for the last 20 rows.
      > >
      > > How would this be done without loops? Or, at least without looping
      > > through every row.[/color]
      >
      > Just curious: is this a real problem, or one you made up to stump
      > NumPy?[/color]

      Yes, this is a real problem. Of course, it involves much more than
      this. But, yes, I would like to get a running list of highest values
      within a range of values.
      [color=blue]
      > Keep in mind that NumPy was written to do typical array operations, linear
      > algebra/analysis, ffts, and even some things not so typical. A moving
      > maximum is highly nonlinear, unusual, and likely to need explicit looping to
      > be done efficiently.
      >
      > I think the way to avoid redoing the max from scratch with each move
      > of the window is to use a heap augmented by a circular index that
      > enables easy replacement of the departing number with the incoming
      > number. After the replacement, re-establish the heap property and
      > record the max. (For more on heaps, see heapq.py in the lib or
      > various algorithm books.)[/color]

      I didn't know there was a function called heapq.py. I have 3 books:
      Learning Python, Python Essential Reference, and Practical Python. I
      also have read the Numerical Python manual on the numerical Python
      website. None of these mentioned it.

      I have a program that was written in a programming language called SPL
      (Smart Programming Language). It is a Pascal-like scripting language
      that worked with an older suite of programs called "Smartware 2000."
      It is like Visual Basic for Applications except that it isn't OO. In
      this language I use multidimensiona l arrays of quite large sizes and
      explicitly declare loops to accomplish a lot of what I'm doing. I
      keep the arrays entirely in RAM for speed purposes. But, because it
      was a program designed to work with a suite of applications (though it
      does not have to use any of them - just like Visual Basic) it has a
      lot of overhead that slows the program down.

      So, I want to port this program to something else that will speed the
      process up. In investigating Python one of the comments I read about
      it was that it was slower than other languages but easier to program -
      that is, it was truly a rapid application developer. Because it was
      slow, an extension module was created to provide true multidimensiona l
      arrays with a computational speed only slightly slower than C++. So,
      I thought to myself that I would sacrifice some speed through the
      normal parts of the program for the gain of computaional speed using
      this multidimensiona l array extension in those parts needing it and
      also for the ease of application design. My concern is that I don't
      inadvertently write the program in such a way as to slow it down
      rather than taking advantage of these modules that provide assistance
      with regard to speed. The language is just enough different that I'm
      having trouble thinking in NumPy.

      Your suggestion to use heapq was interesting. Basically, in my
      current program I use something similar. I sort, enter new data into
      oldest data's spot, resort, find the max, and repeat.

      I have a couple of questions about heapq. The website says that it is
      an array but that it works on lists. So, heapq would not work on
      NumPy arrays, right? And, it is then a normal Python array rather
      than a NumPy array, right? Is there a comparable speed increase using
      heapq over explicit Python loops as there is in NumPy over explicit
      Python loops?

      I guess I'm looking for the fastest way in Python or in any Python
      module to accomplish what I want to do.

      Thanks for your response. I appreciate it.

      Matt

      Comment

      • Terry Reedy

        #4
        Re: Loopless syntax for 2d in NumPy (or Numarray)


        "2mc" <mcrider@bigfoo t.com> wrote in message
        news:500a4565.0 310192044.dce70 5b@posting.goog le.com...[color=blue]
        > "Terry Reedy" <tjreedy@udel.e du> wrote in message[/color]
        news:<VfqdnZ2_Z 8mclg6iRVn-vA@comcast.com> ...[color=blue][color=green]
        > > Just curious: is this a real problem, or one you made up to stump
        > > NumPy?[/color]
        >
        > Yes, this is a real problem. Of course, it involves much more than
        > this. But, yes, I would like to get a running list of highest[/color]
        values[color=blue]
        > within a range of values.[/color]
        [color=blue][color=green]
        > > I think the way to avoid redoing the max from scratch with each[/color][/color]
        move[color=blue][color=green]
        > > of the window is to use a heap augmented by a circular index that
        > > enables easy replacement of the departing number with the incoming
        > > number. After the replacement, re-establish the heap property and
        > > record the max. (For more on heaps, see heapq.py in the lib or
        > > various algorithm books.)[/color]
        >
        > I didn't know there was a function called heapq.py.[/color]

        It is a module added to the standard library perhaps in 2.2. Not
        surprised if not in books.
        [color=blue]
        > Your suggestion to use heapq was interesting. Basically, in my
        > current program I use something similar. I sort, enter new data[/color]
        into[color=blue]
        > oldest data's spot, resort, find the max, and repeat.[/color]

        Same idea. I think best resort method should be binary search for
        insertion spot followed by shift-1 movement of off-by-1 block toward
        empty spot and then insertion of new item in proper place.

        The same operation for heaps would be O(logN) instead of O(N), but
        with a higher constant, so it might or might not be faster for N=19.
        (But I suspect it would be.)
        [color=blue]
        > I have a couple of questions about heapq. The website says that it[/color]
        is[color=blue]
        > an array but that it works on lists. So, heapq would not work on
        > NumPy arrays, right? And, it is then a normal Python array rather
        > than a NumPy array, right?[/color]

        The algorithms, with adjustment, can use any type of random access
        array, but there would not be any obvious benefit of shifting. heapq,
        as implied by the q for queue, is intended for heap use as priority
        queue. It has a replace_max function but not a
        replace-arbitrary-element operation. Regardless of what array type
        you use, you would have to write your own replace function. But it
        should be a straight forward adjustment of the current replace
        function -- once you understand heaps.
        [color=blue]
        > Is there a comparable speed increase using
        > heapq over explicit Python loops as there is in NumPy over explicit
        > Python loops?[/color]

        No, heapq is written in Python and uses normal looping. But do keep
        in mind that pure Python code, especially when using ints, floats, and
        loops, can ofter run several times faster when psyco-ized.
        [color=blue]
        > Thanks for your response. I appreciate it.[/color]

        Welcome,

        Terry J. Reedy


        Comment

        • David Eppstein

          #5
          Re: Loopless syntax for 2d in NumPy (or Numarray)

          In article <5tqdnemUQZWYtQ miRVn-tQ@comcast.com> ,
          "Terry Reedy" <tjreedy@udel.e du> wrote:
          [color=blue][color=green]
          > > I didn't know there was a function called heapq.py.[/color]
          >
          > It is a module added to the standard library perhaps in 2.2. Not
          > surprised if not in books.[/color]

          It was added in 2.3.
          The official home of the Python Programming Language


          --
          David Eppstein http://www.ics.uci.edu/~eppstein/
          Univ. of California, Irvine, School of Information & Computer Science

          Comment

          • 2mc

            #6
            Re: Loopless syntax for 2d in NumPy (or Numarray)

            "Terry Reedy" <tjreedy@udel.e du> wrote in message news:<5tqdnemUQ ZWYtQmiRVn-tQ@comcast.com> ...[color=blue]
            > "2mc" <mcrider@bigfoo t.com> wrote in message
            > news:500a4565.0 310192044.dce70 5b@posting.goog le.com...[color=green]
            > > "Terry Reedy" <tjreedy@udel.e du> wrote in message[/color]
            > news:<VfqdnZ2_Z 8mclg6iRVn-vA@comcast.com> ...[color=green][color=darkred]
            > > > Just curious: is this a real problem, or one you made up to stump
            > > > NumPy?[/color]
            > >
            > > Yes, this is a real problem. Of course, it involves much more than
            > > this. But, yes, I would like to get a running list of highest[/color]
            > values[color=green]
            > > within a range of values.[/color]
            >[color=green][color=darkred]
            > > > I think the way to avoid redoing the max from scratch with each[/color][/color]
            > move[color=green][color=darkred]
            > > > of the window is to use a heap augmented by a circular index that
            > > > enables easy replacement of the departing number with the incoming
            > > > number. After the replacement, re-establish the heap property and
            > > > record the max. (For more on heaps, see heapq.py in the lib or
            > > > various algorithm books.)[/color]
            > >
            > > I didn't know there was a function called heapq.py.[/color]
            >
            > It is a module added to the standard library perhaps in 2.2. Not
            > surprised if not in books.
            >[color=green]
            > > Your suggestion to use heapq was interesting. Basically, in my
            > > current program I use something similar. I sort, enter new data[/color]
            > into[color=green]
            > > oldest data's spot, resort, find the max, and repeat.[/color]
            >
            > Same idea. I think best resort method should be binary search for
            > insertion spot followed by shift-1 movement of off-by-1 block toward
            > empty spot and then insertion of new item in proper place.
            >
            > The same operation for heaps would be O(logN) instead of O(N), but
            > with a higher constant, so it might or might not be faster for N=19.
            > (But I suspect it would be.)
            >[color=green]
            > > I have a couple of questions about heapq. The website says that it[/color]
            > is[color=green]
            > > an array but that it works on lists. So, heapq would not work on
            > > NumPy arrays, right? And, it is then a normal Python array rather
            > > than a NumPy array, right?[/color]
            >
            > The algorithms, with adjustment, can use any type of random access
            > array, but there would not be any obvious benefit of shifting. heapq,
            > as implied by the q for queue, is intended for heap use as priority
            > queue. It has a replace_max function but not a
            > replace-arbitrary-element operation. Regardless of what array type
            > you use, you would have to write your own replace function. But it
            > should be a straight forward adjustment of the current replace
            > function -- once you understand heaps.
            >[color=green]
            > > Is there a comparable speed increase using
            > > heapq over explicit Python loops as there is in NumPy over explicit
            > > Python loops?[/color]
            >
            > No, heapq is written in Python and uses normal looping. But do keep
            > in mind that pure Python code, especially when using ints, floats, and
            > loops, can ofter run several times faster when psyco-ized.[/color]

            Thanks for your responses. They have been most helpful. Thank you
            for your kindness.

            I've looked a little bit at Psyco. I have a couple of questions -
            what else is new? :-)

            1. Would a psyco-ized normal Python program that performs calculations
            be slower, faster, or as fast as the equivalent in NumPy?

            2. Can Numerical Python be psyco-ized? I'm supposing it cannot.

            Thanks again. I appreciate it.

            Matt

            [color=blue]
            >[color=green]
            > > Thanks for your response. I appreciate it.[/color]
            >
            > Welcome,
            >
            > Terry J. Reedy[/color]

            Comment

            • Fernando Perez

              #7
              Re: Loopless syntax for 2d in NumPy (or Numarray)

              2mc wrote:
              [color=blue]
              > How would this be done without loops? Or, at least without looping
              > through every row.[/color]

              For those cases when you have no option but looping, the easiest approach out
              there is probably weave.inline(). Sorry I can't give you too many more
              details right now, but if you go here:



              you should be able to get started quickly. For weave itself, go to

              Why SciPy? Fundamental algorithms. Broadly applicable. Foundational. Interoperable. Performant. Open source.


              weave.inline() is one amazing piece of work.

              Cheers,

              f

              Comment

              • Terry Reedy

                #8
                Re: Loopless syntax for 2d in NumPy (or Numarray)


                "2mc" <mcrider@bigfoo t.com> wrote in message
                news:500a4565.0 310211950.519f3 d00@posting.goo gle.com...[color=blue]
                > 1. Would a psyco-ized normal Python program that performs[/color]
                calculations[color=blue]
                > be slower, faster, or as fast as the equivalent in NumPy?[/color]

                This is not an either/or choice. Psyco is best used on a few
                bottlenect functions. (Currently, it greatly expands the size of what
                is works on.) If you did a running max with heaps, that would be a
                candidate. Large array calcs done in NumPy as it is meant to be used
                should be fastest.

                If speed is a problem, also look at Weave as Fernando suggested. That
                might be alternative for all or part of running max.

                For spicific answers for your problem on your system, you will have to
                test time yourself. New time_it make simple stuff pretty easy. We
                have given you about as much general answer as is possible.
                [color=blue]
                > 2. Can Numerical Python be psyco-ized? I'm supposing it cannot.[/color]

                Correct, only the Python code calling it, but there would be little
                point to that probably.

                Terry J. Reedy


                Comment

                Working...