Joining (merging) two files.

  • sumittyagi
    Recognized Expert New Member
    • Mar 2007
    • 202

    Hi All,

    I am stuck on a tricky problem.

    The situation is as follows:
    I have two files, each with two space-separated columns holding key/value pairs.

    Say the files are f1 and f2; the columns in f1 are f1c1 and f1c2, and the columns in f2 are f2c1 and f2c2.

    f2c1 is a subset of f1c1 (these are the keys).
    f1c2 and f2c2 are different columns (these are the values).

    Now I want to merge f1 and f2 into a file f3 with three space-separated columns:
    f1c1, f1c2, f2c2 (in DB terms, I want to perform a left outer join).

    I have no idea how to do this. Can anybody help me with it?

    Thanks in advance for your valuable inputs.

    Best Regards,
    Sumit
    Last edited by sumittyagi; Sep 20 '09, 05:48 AM. Reason: spell correction
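For reference, a left outer join like the one described (every f1 key, its f1 value, and the matching f2 value) can be sketched with the standard join(1) and sort(1) utilities. This is a minimal sketch, not the poster's eventual solution: it assumes a POSIX shell with join and sort available, uses small made-up sample files, and picks NULL as an arbitrary placeholder for keys that have no value in f2. Both files are sorted in the C locale first because join expects its inputs sorted on the join field in the same collating order it uses itself.

```shell
#!/bin/sh
# Sketch: left outer join of two space-separated key/value files with join(1).
# Assumes POSIX sh plus join and sort; the sample data is hypothetical.

tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Made-up sample data: key 2 exists in f1 but not in f2.
printf '1 51\n2 52\n3 53\n10 60\n' > f1
printf '1 2605\n3 2783\n10 26962\n' > f2

# join needs both inputs sorted on the key in the collation join itself
# uses, so force the C locale for sort and join alike.
LC_ALL=C sort -k1,1 f1 > f1.sorted
LC_ALL=C sort -k1,1 f2 > f2.sorted

# -a 1     also print f1 lines with no partner in f2 (left outer join)
# -e NULL  placeholder text for missing fields (arbitrary choice)
# -o ...   output the key, then f1's value, then f2's value
# sort -n  puts the result back into numeric key order
LC_ALL=C join -a 1 -e NULL -o 0,1.2,2.2 f1.sorted f2.sorted | sort -n -k1,1 > f3

cat f3
```

The final `sort -n -k1,1` only restores numeric key order for readability; it can be dropped if the row order of f3 does not matter.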
  • sumittyagi
    Recognized Expert New Member
    • Mar 2007
    • 202

    #2
    I am halfway there.

    I tried the join command, but it seems to compare the keys in an unexpected way.

    Below are the sample files and the output.
    file1:
    1 51
    2 52
    3 53
    4 54
    5 55
    6 56
    7 57
    8 58
    9 59
    10 60
    11 61
    12 62
    13 63
    15 65
    16 66
    17 67
    18 68
    20 70
    22 72
    24 74
    ...
    ...
    100 150
    file2:
    1 2605
    2 1486
    3 2783
    4 2714
    5 26892
    6 2645
    7 2838
    8 84
    9 143
    10 26962
    11 27068
    12 27168
    13 27250
    15 27330
    16 27425
    17 27507
    18 27594
    20 27693
    22 27785
    24 27878
    ...
    ...
    100 3291
    The join command only pairs lines whose keys are single digits (1 to 9), as shown below:
    Code:
    join -1 1 -2 1 -o 0,1.2,2.2 -t " " file1 file2
    1 51 2605
    2 52 1486
    3 53 2783
    4 54 2714
    5 55 26892
    6 56 2645
    7 57 2838
    8 58 84
    9 59 143
    Note: I don't want to use nested for loops for this, because with 100 rows in each file that means roughly 100 x 100 = 10,000 iterations, which would seriously hurt the script's performance.
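A plausible explanation for the join output above, which the thread itself does not confirm, is that join(1) assumes both inputs are sorted on the join field in character order, not numeric order. In character order "10" sorts before "9", so numerically sorted keys break that assumption, and once one file holds a key the other lacks, the merge can get out of step and skip later matches. The sketch below reproduces this on hypothetical data (the files and values are made up) and shows that sorting both files with `LC_ALL=C sort -k1,1` first lets join pair every shared key. It also avoids nested loops: after sorting, join reads each file only once.

```shell
#!/bin/sh
# Sketch: numerically sorted keys can make join(1) miss matches;
# sorting both inputs in the C locale fixes it. Hypothetical data.

tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Keys in numeric order (1, 2, 9, 10, 11), which is NOT character order.
printf '1 a\n2 b\n9 c\n10 d\n11 e\n' > f1
printf '2 x\n10 y\n11 z\n' > f2

# In the C locale "10" sorts before "9", so this merge gets out of step.
# stderr is discarded because GNU join may warn about unsorted input.
unsorted=$(LC_ALL=C join f1 f2 2>/dev/null)
echo "unsorted input:"
echo "$unsorted"      # only the pair for key 2 is found

# Sort both files on the key with the same collation join uses.
LC_ALL=C sort -k1,1 f1 > f1.sorted
LC_ALL=C sort -k1,1 f2 > f2.sorted
sorted=$(LC_ALL=C join f1.sorted f2.sorted | sort -n -k1,1)
echo "sorted input:"
echo "$sorted"        # keys 2, 10 and 11 are all paired
```

GNU join also accepts `--check-order`, which makes it fail with an error as soon as it finds wrongly ordered input; that option is a GNU extension and may be missing from other join implementations.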

    • sumittyagi
      Recognized Expert New Member
      • Mar 2007
      • 202

      #3
      I think arrays will solve my problem...

      Any other suggestions?
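One alternative along the lines of the arrays idea is an awk hash join: read f2 once into an associative array keyed on the first column, then read f1 once and look each key up. This is a sketch assuming a POSIX awk; the sample data and the NULL placeholder are made up for illustration. It needs no sorting, keeps f1's line order, and reads each line only once, so 100 rows per file means about 200 line reads rather than 100 x 100 comparisons.

```shell
#!/bin/sh
# Sketch: left outer join with awk associative arrays (a hash join).
# f2 is read once into an array, then f1 is streamed once.
# Assumes POSIX awk; sample data and the NULL placeholder are hypothetical.

tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Made-up sample data; key 2 has no value in f2, and f2 is unsorted.
printf '1 51\n2 52\n3 53\n10 60\n' > f1
printf '10 26962\n1 2605\n3 2783\n' > f2

# NR == FNR holds only while awk reads the first file named (f2):
# remember each key's value and skip to the next line.
# For f1: print key, f1 value, and f2 value (NULL if the key is absent).
awk 'NR == FNR { val[$1] = $2; next }
     { print $1, $2, (($1 in val) ? val[$1] : "NULL") }' f2 f1 > f3

cat f3
```

One caveat of the `NR == FNR` idiom: if f2 is empty, awk treats f1 as the first file and the output is empty, so guard against that case if it can occur.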

      • sumittyagi
        Recognized Expert New Member
        • Mar 2007
        • 202

        #4
        Yes, arrays solved it. It took three full passes: with 100 elements that is 3 x 100 = 300 iterations (two passes to fill the arrays and one to print the result).

        Any other suggestions for improvement would be appreciated.
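If the script must stay in bash, the three passes described above can be reduced to two: load only f2 into an associative array, then print each f1 row as soon as it is read, so no separate display pass is needed. This is a sketch, not the poster's code, and it assumes bash 4 or later (for `declare -A`); the array name `val`, the sample data, and the NULL placeholder are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash 4+ for associative arrays): left outer join in
# two passes, one to load f2 and one to stream f1 while printing.
# File names f1, f2, f3 follow the thread; the data is made up.

tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '1 51\n2 52\n3 53\n10 60\n' > f1
printf '10 26962\n1 2605\n3 2783\n' > f2

declare -A val                 # hypothetical name: key -> value from f2

# Pass 1: load f2 into the associative array.
while read -r key value; do
    val[$key]=$value
done < f2

# Pass 2: stream f1 and print each row immediately (no display pass);
# keys missing from f2 get the placeholder NULL.
while read -r key value; do
    printf '%s %s %s\n' "$key" "$value" "${val[$key]-NULL}"
done < f1 > f3

cat f3
```

A bash `while read` loop processes every line inside the shell, which is typically much slower than a single awk or join process on large files; for a few hundred rows the difference hardly matters.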
