reading and merging two files at the same time

  • simplyme7
    New Member
    • Sep 2007
    • 6

    Hi,

    I have two files.
    First file format (4,882 rows):
    userId country operatingSystem


    Second file format (400,000 rows):
    userId time


    I need to merge them as follows:
    If a userId appears in both files, then the corresponding line of the second file should look like this:
    userId time country operatingSystem

    I wrote the program in MATLAB, but it takes a long time, and I'd like to know how to do it in Perl. Is it possible to read from two files at the same time in Perl?

    Here is Matlab code:

    Code:
    for i=1:4882
        for j=1:400000
            if (user(i,:) == Xtrain(j));
               Xtrain(j,:) = user(i,:);
            end
        end 
    end
    Thanks for the help.
    Last edited by eWish; Mar 4 '08, 05:01 PM. Reason: Please use code tags.
  • eWish
    Recognized Expert Contributor
    • Jul 2007
    • 973

    #2
    I would suggest that you open and read the contents of file 2 and store the data in a hash, with the userid as the key and the time as the value. Then read the other file line by line and split each line into its 3 parts. While looping through that file, check whether the first element of the split exists as a key in the hash. If it does, write the merged line to your output file in the format you desire.

    --Kevin

    • eWish
      Recognized Expert Contributor
      • Jul 2007
      • 973

      #3
      If you get stuck or have questions on what I posted just let us know and we will try to help.

      --Kevin

      • mhalder
        New Member
        • Mar 2008
        • 3

        #4
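        [CODE=perl]# Read file2 (userId time) into a hash keyed by userId.
        open(HANDLE2, '<', 'file2.txt') or die "Can't open file2.txt: $!";
        my %hash1=();

        while(<HANDLE2>)
        {
        my ($uid,$time)=split /\s/,$_;
        $hash1{$uid}=$time;
        }
        close HANDLE2;

        open(HANDLE1, '<', 'file1.txt') or die "Can't open file1.txt: $!";

        # Open the output once, before the loop: opening it with '>'
        # inside the loop would truncate the file on every line.
        open(HANDLE, '>', 'output.txt') or die "Can't open output.txt: $!";

        while(<HANDLE1>)
        {
        my ($uid,$country,$osname)=split /\s/,$_;

        if(exists $hash1{$uid})
        {
        print HANDLE "$uid\t$hash1{$uid}\t$country\t$osname\n";
        }
        }
        close HANDLE1;
        close HANDLE;[/CODE]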
        Last edited by eWish; Mar 5 '08, 02:08 PM. Reason: Please use code tags

        • mhalder
          New Member
          • Mar 2008
          • 3

          #5
          [CODE=perl]# Read file2 (userId time) into a hash keyed by userId.
          open(HANDLE2, '<', 'file2.txt') or die "Can't open file2.txt: $!";
          my %hash1=();

          while(<HANDLE2>)
          {
          my ($uid,$time)=split /\s/,$_;
          $hash1{$uid}=$time;
          }
          close HANDLE2;

          open(HANDLE1, '<', 'file1.txt') or die "Can't open file1.txt: $!";

          # Open the output once, before the loop, so it is not truncated on every line.
          open(HANDLE, '>', 'output.txt') or die "Can't open output.txt: $!";

          while(<HANDLE1>)
          {
          my ($uid,$country,$osname)=split /\s/,$_;
          if(exists $hash1{$uid}) {
          # write the merged row to the new file, output.txt
          print HANDLE "$uid\t$hash1{$uid}\t$country\t$osname\n";
          }
          }
          close HANDLE1;
          close HANDLE;

          __END__
          file1:
          3444 india sunsolaris
          3456 japan sun os
          3452 eng windows
          3224 germany ubuntu
          1234 usa linux

          file2:

          1234 12.3.45
          3452 03.2.23[/CODE]


          I think this will work, but I wrote it in a hurry, so please test it with some other combinations before using it.
          Last edited by eWish; Mar 5 '08, 02:09 PM. Reason: Please use code tags

          • minowicz
            New Member
            • Feb 2008
            • 12

            #6
            Given this data:

            Originally posted by mhalder
            _DATA_
            file1:
            3444 india sunsolaris
            3456 japan sun os
            3452 eng windows
            3224 germany ubuntu
            1234 usa linux
            This line is going to cause one minor problem:

            Originally posted by mhalder
            my ($uid,$country, $osname)=split /\s/,$_ ;
            In the case of an $osname that should be "sun os", you will get an $osname of simply "sun"; the "os" is discarded because the 4th value returned by the split is not captured in your assignment.

            There are probably a few ways to solve this problem. Off the top of my head, the one that comes to mind is something like:

            Code:
            my @temparray = split;
            my $uid = shift @temparray;
            my $country = shift @temparray;
            my $osname = join(' ',@temparray);
            Since you've chosen to split on '\s' instead of ' ', this may alter the $osname if you wish to preserve tabs or other forms of whitespace.

            Another possibility (which I prefer) is to do something like:

            Code:
            my ($uid, $country, $osname) = /^(\d+)\s(\w+)\s(.*)$/;
            There are problems with both approaches, depending on how well formed your data is. For instance, I'm still assuming that all countries are expressed with a single-word name, not multiple words as in "united states" (since I see you using "usa" already). I also assume that all instances of $uid are expressed as numbers and that all instances of $country are alphanumeric only. In each case the matches could be made more general by using \S in place of either \d or \w.

            Evaluating the assignment inside a conditional of some sort would also allow you to check the format of the file to some degree as you go.

            Code:
            $linenum++;
            my ($uid, $country, $osname);
            unless ( ($uid, $country, $osname) = /^(\d+)\s(\w+)\s(.*)$/ ) {
                 print STDERR "Skipping line number $linenum as it does not seem to conform to 'uid country osname' format expectations.\n";
                 next;   # assumes this code runs inside the loop that reads the file
            }
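
A third option, not mentioned above, is to give split a field limit, so that everything after the second field stays together in $osname. This is only a sketch: the helper name parse_user_line is made up for illustration, and it assumes the fields are separated by whitespace and that only the last field may contain spaces.

```perl
use strict;
use warnings;

# Split one "uid country osname" line into three fields, keeping a
# multi-word osname such as "sun os" together.
# parse_user_line is a hypothetical helper name, not from the thread.
sub parse_user_line {
    my ($line) = @_;
    $line =~ s/\s+\z//;    # drop the newline and any trailing whitespace

    # A LIMIT of 3 stops splitting after the second separator, so the
    # remainder of the line lands in $osname unchanged.
    my ($uid, $country, $osname) = split ' ', $line, 3;

    return unless defined $osname;    # fewer than three fields: bad line
    return ($uid, $country, $osname);
}

my ($uid, $country, $osname) = parse_user_line("3456 japan sun os\n");
print "$uid|$country|$osname\n";    # prints 3456|japan|sun os
```

Because the third field is never split, any tabs or repeated spaces inside it are preserved as they appear in the file. A country name with more than one word would still break this, just as it would with the regex.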

            • simplyme7
              New Member
              • Sep 2007
              • 6

              #7
              mmm in fact yes I'm stuck!

              It's my first time dealing with hashes and I still don't get it!

              • eWish
                Recognized Expert Contributor
                • Jul 2007
                • 973

                #8
                Here is a simple script that should do what you are wanting. I have added comments to help explain what is happening.

                [CODE=perl]#!/usr/bin/perl -T

                use strict;
                use warnings;

                my $source_file1 = 'path/to/somefile1.txt';
                my $source_file2 = 'path/to/somefile2.txt';
                my $dest_file = 'path/to/somefile3.txt';

                my %hash = ();

                # Open the file that contains the userid and time data.
                open (my $FILE1, '<', $source_file1) || die "Can't open $source_file1: $!\n";
                while(my $line = <$FILE1>) {

                # Get rid of the newline character.
                chomp $line;

                # Split the line into userid and time. Splitting on the tab. Change if not a tab delimited file.
                my ($userid, $time) = split(/\t/, $line);

                # Add the userid to the hash.
                # The key is the userid and the time will be the value.
                $hash{$userid} = $time;
                }
                close($FILE1);

                # Open the file that contains the userid, country and operatingSystem data.
                open (my $FILE2, '<', $source_file2) || die "Can't open $source_file2: $!\n";

                # Open the new file where the data will be stored.
                # This will overwrite the entire file if it exists.
                # If it does not exist it will be created.
                open (my $FILE3, '>', $dest_file) || die "Can't open $dest_file: $!\n";
                while(my $line = <$FILE2>) {

                # Get rid of the newline character.
                chomp $line;

                # Split the line into userid, country, operatingSystem.
                # Splitting on the tab. Change if not a tab delimited file.
                my ($userid, $country, $operatingSystem) = split(/\t/, $line);

                # Only use the userids that also appear in the first file.
                # A direct hash lookup avoids scanning every key for each line.
                if ( exists $hash{$userid} ) {

                # Create the line of data formatted however it is desired.
                my $data = qq{$userid\t$hash{$userid}\t$country\t$operatingSystem\n};

                # Print $data to the file.
                print $FILE3 $data;
                }
                }

                # Close the two files.
                close($FILE3);
                close($FILE2);[/CODE]
                --Kevin
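
One possible variation, in case it helps: the original question asks for one merged line per row of the large userId/time file. If a userId can appear more than once in that file (an assumption, the thread does not say), a hash keyed on userId can only keep one time per user. A rough sketch of the reverse arrangement is below: load the small country/operatingSystem file into the hash and stream the large file past it. The sub name merge_by_userid and the sample rows are made up for illustration, and the fields are assumed to be whitespace separated.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Merge "userId time" rows with "userId country operatingSystem" rows.
# $users_fh and $times_fh are open read handles, $out_fh an open write handle.
# merge_by_userid is a hypothetical name, not from the thread.
sub merge_by_userid {
    my ($users_fh, $times_fh, $out_fh) = @_;

    # Load the small file into a hash: userId => [country, operatingSystem].
    my %user_info;
    while (my $line = <$users_fh>) {
        $line =~ s/\s+\z//;    # drop the newline and trailing whitespace
        # A limit of 3 keeps a multi-word operating system in one field.
        my ($userid, $country, $os) = split ' ', $line, 3;
        next unless defined $os;    # skip blank or short lines
        $user_info{$userid} = [$country, $os];
    }

    # Stream the large file and write one merged row per matching time row.
    while (my $line = <$times_fh>) {
        $line =~ s/\s+\z//;
        my ($userid, $time) = split ' ', $line, 2;
        next unless defined $time && exists $user_info{$userid};
        print {$out_fh} join("\t", $userid, $time, @{ $user_info{$userid} }), "\n";
    }
    return;
}

# Demo with in-memory filehandles; the sample rows are illustrative only.
my $users_text = "3456 japan sun os\n3452 eng windows\n1234 usa linux\n";
my $times_text = "1234 12.3.45\n3452 03.2.23\n1234 12.3.46\n";
open(my $users_fh, '<', \$users_text) or die "Can't open in-memory user data: $!";
open(my $times_fh, '<', \$times_text) or die "Can't open in-memory time data: $!";
merge_by_userid($users_fh, $times_fh, \*STDOUT);
```

With real files, the three handles would come from ordinary open calls like the ones in the script above. Only the 4,882-row file is held in memory, and each of the 400,000 rows costs a single hash lookup.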
