Pandas: Merging Sorted Dataframes

  • RockRoll
    New Member
    • Jul 2020
    • 10

    Hi,

    I have a large (N×4, >10 GB) array that I need to sort on column 2.

    I am reading the data in chunks and sorting each chunk with pandas, but I am unable to combine the sorted chunks into a final N×4 array that is sorted on column 2. Here is what I have tried so far:

    Code:
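    import pandas as pd

    # Read the file lazily, 50000 rows at a time.
    chunks = pd.read_csv(ifile[0], chunksize=50000, skiprows=0,
                         names=['col-1', 'col-2', 'col-3', 'col-4'])
    
    for df in chunks:
        # Each chunk is sorted on its own; the chunks are not yet merged.
        df = df.sort_values(by='col-2', kind='mergesort')
        print(df)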
  • SioSio
    Contributor
    • Dec 2019
    • 272

    #2
    To sort a file that is read in chunks, combine the chunks into a single DataFrame first and then sort the result.
    Code:
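    import pandas as pd

    # Read the file in chunks and concatenate them into one DataFrame.
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
    # Note: the combined DataFrame must fit in memory.
    chunks = pd.read_csv(ifile[0], chunksize=50000,
                         names=['col-1', 'col-2', 'col-3', 'col-4'])
    df = pd.concat(chunks, ignore_index=True)
    
    # Stable sort of the full DataFrame on the second column.
    df_s = df.sort_values(by='col-2', kind='mergesort')
    print(df_s)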
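
    If the combined DataFrame does not fit in memory (the question mentions >10 GB), one possible alternative is an external merge sort: write each sorted chunk to a temporary file, then merge the sorted files row by row. The following is only a minimal sketch of that idea, not a tested solution for the original data. The function name sort_large_csv, the temporary file names, and the output path are hypothetical, and it assumes that col-2 is numeric and that the file has no header row.

```python
import csv
import heapq
import os
import tempfile

import pandas as pd

COLUMNS = ['col-1', 'col-2', 'col-3', 'col-4']


def sort_large_csv(in_path, out_path, chunksize=50000, key='col-2'):
    """Hypothetical helper: sort a headerless CSV on `key` via sorted runs on disk."""
    tmp_dir = tempfile.mkdtemp()
    run_paths = []

    # Pass 1: sort each chunk in memory and write it out as a sorted run.
    reader = pd.read_csv(in_path, chunksize=chunksize, names=COLUMNS)
    for i, chunk in enumerate(reader):
        run_path = os.path.join(tmp_dir, f'run_{i}.csv')
        chunk = chunk.sort_values(by=key, kind='mergesort')
        chunk.to_csv(run_path, index=False, header=False)
        run_paths.append(run_path)

    # Pass 2: k-way merge of the sorted runs, one row at a time.
    key_idx = COLUMNS.index(key)
    files = [open(p, newline='') for p in run_paths]
    try:
        runs = [csv.reader(f) for f in files]
        with open(out_path, 'w', newline='') as out:
            writer = csv.writer(out)
            # Assumption: the key column is numeric, so compare as float.
            merged = heapq.merge(*runs, key=lambda row: float(row[key_idx]))
            writer.writerows(merged)
    finally:
        # Close and delete the temporary run files.
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
        os.rmdir(tmp_dir)
```

    A call would look like sort_large_csv(ifile[0], 'sorted.csv'). Only one row per run is held in memory during the merge, but every run file is open at the same time, so a larger chunksize may be needed to stay under the operating system's open-file limit.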

    • madankarmukta
      Contributor
      • Apr 2008
      • 308

      #3
      Follow the standalone syntax for sort_values.

      Thanks
