speed up data extraction from large arrays

  • blackdevil
    New Member
    • Mar 2011
    • 2

    speed up data extraction from large arrays

    hi,

    I have three arrays holding variable values from a netCDF file. xy contains the (x, y) index vectors into the lon and lat arrays. I extract the coordinates from the two arrays (lon, lat) according to the vectors in the xy array. So far I get the correct results, but the extraction process (the for loop) is extremely slow!

    Here's my code:

    Code:
    from netCDF4 import Dataset
    import scipy as scp
    import numpy as np
    
    # open netCDF file and create a dataset
    rootfile = "myfile.nc"
    rootgrp = Dataset(rootfile, "r")
    
    # read variable values into arrays
    xy = rootgrp.variables["zipxy"]
    lon = rootgrp.variables["lon"]
    lat = rootgrp.variables["lat"]
    
    # create a new coordinates array
    coordinates = np.array([[],[]])
    
    # loop through the xy array and write the corresponding lon & lat coordinates
    # into the coordinates array
    # NOTE: the vectors in xy begin with [1,1] BUT the index of the values
    # in lon & lat begins with [0,0] -> therefore: y-1, x-1
    for x, y in xy:
        coordinates = np.append(coordinates, [[lon[y-1, x-1]], [lat[y-1, x-1]]], 1)
    The dimensions are:
    xaxis = 1320
    yaxis = 1482
    zip2 = 1080236

    And the dimensions of each variable:
    zipxy('zip2', 'two')
    lon('yaxis', 'xaxis')
    lat('yaxis', 'xaxis')

    There must be a way to speed up the coordinate extraction (maybe I need a different way of looking up values in the arrays...). So far, I haven't found a solution.

    Thanks for your help!
  • Mariostg
    Contributor
    • Sep 2010
    • 332

    #2
    You should have a read of "Python Patterns - An Optimization Anecdote". That should inspire you.
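
One way to apply the general idea of moving the loop out of Python code is NumPy integer-array ("fancy") indexing, which selects all points in a single call. This is only a minimal sketch, not the poster's method: the tiny synthetic arrays and the helper name `extract_coordinates` are assumptions standing in for the real netCDF variables. It also assumes lon and lat have first been read into memory as NumPy arrays (for example with `lon[:]`), because indexing a netCDF4 variable directly with two index arrays does not follow NumPy's pairwise semantics.

```python
import numpy as np

# Synthetic stand-ins for the netCDF variables (assumed shapes, tiny sizes).
ny, nx = 4, 5
lon = np.arange(ny * nx, dtype=float).reshape(ny, nx)  # shape (yaxis, xaxis)
lat = lon + 100.0
xy = np.array([[1, 1], [5, 4], [3, 2]])  # 1-based (x, y) pairs, like zipxy


def extract_coordinates(xy, lon, lat):
    """Return a (2, n) array: row 0 holds lon, row 1 holds lat per (x, y) pair."""
    xy = np.asarray(xy)
    cols = xy[:, 0] - 1  # x is 1-based -> 0-based column index
    rows = xy[:, 1] - 1  # y is 1-based -> 0-based row index
    # Integer-array indexing picks element (rows[i], cols[i]) for every i at once.
    return np.vstack((lon[rows, cols], lat[rows, cols]))


coordinates = extract_coordinates(xy, lon, lat)
```

The result has the same (2, n) layout that the original loop builds with np.append, but it is allocated once instead of being copied on every iteration.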


    • dwblas
      Recognized Expert Contributor
      • May 2008
      • 626

      #3
      A list/array is slow. Consider using a set or dictionary, as they are hashed. If you just want all of the coordinates, a set made up of tuples such as (x, y) or ([lat_x, lat_y], [lon_x, lon_y]) will work fine. The lookups in the loop below are probably what takes the time, so time the reading/creation of the arrays and the for loop separately to find out where the problem is.
      Code:
      # for x, y in xy:
      #     coordinates = np.append(coordinates,[[lon[y-1,x-1]],[lat[y-1,x-1]]],1)
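
A minimal sketch of timing the two stages separately, using `time.perf_counter` from the standard library. The small random arrays are an assumption made so the example runs without a netCDF file; in the real script the first timed stage would be the reads from `rootgrp.variables`.

```python
import time

import numpy as np

# Assumed small sizes so the example finishes quickly; the real data is far larger.
ny, nx, n = 50, 60, 2000
rng = np.random.default_rng(0)

# Stage 1: create the arrays (stand-in for reading them from the netCDF file).
t0 = time.perf_counter()
lon = rng.random((ny, nx))
lat = rng.random((ny, nx))
xy = np.column_stack((rng.integers(1, nx + 1, n), rng.integers(1, ny + 1, n)))
t_setup = time.perf_counter() - t0

# Stage 2: the original extraction loop, timed on its own.
t0 = time.perf_counter()
coordinates = np.array([[], []])
for x, y in xy:
    coordinates = np.append(coordinates, [[lon[y-1, x-1]], [lat[y-1, x-1]]], 1)
t_loop = time.perf_counter() - t0

print(f"setup: {t_setup:.4f} s, loop: {t_loop:.4f} s")
```

Comparing the two printed numbers shows which stage dominates before any of it is rewritten.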


      • blackdevil
        New Member
        • Mar 2011
        • 2

        #4
        Thanks a lot for both answers!
        I will try a few of the possibilities and post the results.
