how to reduce the time complexity while reading files

  • Manogna
    New Member
    • Jan 2008
    • 10

    how to reduce the time complexity while reading files

    Hi all,

    In a directory there are nearly 10 zipped files; altogether the files are nearly 15 GB.

    I have to retrieve the lines which don't have the text "ORA" from each file, and write this data to another big file.

    I got it working, but it takes nearly 5 minutes to complete the process. I have to process 7 directories at a time, so in total it is taking too much time.

    I wrote the code as:

    [CODE=perl]#!/usr/bin/perl
    use strict;
    use warnings;

    my @filenames = </home/dir/*.gz>;

    open(OUT, ">", "bigfile") or die "Cannot open bigfile: $!";
    foreach my $file (@filenames) {
        # decompress each archive through a pipe from gzcat
        open(IN, "gzcat $file |") or die "Cannot run gzcat on $file: $!";
        while (my $line = <IN>) {
            # skip lines starting with "ORA", and blank lines
            next if $line =~ /^ORA|^\s*$/;
            print OUT $line;
        }
        close IN;
    } # end foreach

    close OUT;
    [/CODE]
    This is only for one directory; there are seven directories like this.

    If anyone knows a better way to do this, in order to reduce the processing time, please help me, as I am new to Perl.

    Thanks and regards,
    Manogna.
    Last edited by eWish; Mar 5 '08, 11:26 PM. Reason: Please use code tags
  • minowicz
    New Member
    • Feb 2008
    • 12

    #2
    I realize this isn't exactly a Perl answer, but why not simply:

    zgrep -vh ^ORA dirname/*.gz > bigfile

    or if you don't have zgrep:

    gzip -dc dirname/*.gz | grep -v ^ORA > bigfile

    As to having multiple directories, it is not clear if you want each to be processed in sequence and appended to the single bigfile, or if you want them each to be processed in parallel and put into their own bigfile.
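    For the parallel case, one possible sketch is to run one background job per directory, each writing its own big file. The directory names and sample data below are placeholders, not from the original post:

    ```shell
    #!/bin/sh
    # Sketch only: "dir1" and "dir2" stand in for the real directories.
    # Create two tiny sample archives so the sketch runs end to end.
    mkdir -p dir1 dir2
    printf 'ORA-00942 drop me\nkeep me\n' | gzip > dir1/a.gz
    printf 'another line to keep\n'       | gzip > dir2/b.gz

    # One background job per directory, each writing its own big file.
    for dir in dir1 dir2; do
        gzip -dc "$dir"/*.gz | grep -v '^ORA' > "bigfile_$dir" &
    done
    wait    # block until every background job has finished
    ```

    With `zgrep` available, the loop body collapses to `zgrep -vh '^ORA' "$dir"/*.gz > "bigfile_$dir" &`.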


    • Manogna
      New Member
      • Jan 2008
      • 10

      #3
      Thank you very much for your response.

      I want the files of each directory to be processed in parallel, with the data written to their respective big files.

      I tried your code and it works properly, but I am willing to use regular expressions in it. I tried the following, but it is not working properly:

      zegrep -vh "^ORA |^\s*$ |read_time|" dirname/*.gz > bigfile

      Within a fraction of seconds I must write all the data from each directory to its respective big file.

      Also, before writing to the big file, I want to parse the lines and take only the first three fields from the above script.

      Is it possible?


      Please help me.

      Thanks and regards,
      Manogna.
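      The first-three-fields step asked about above could be sketched with awk (assuming whitespace-separated fields; the "ORA", blank-line, and "read_time" filters are taken from the attempted command). The printf line is sample input standing in for the real `gzip -dc dirname/*.gz` pipeline:

      ```shell
      #!/bin/sh
      # Sketch only: drop ORA/blank/read_time lines, then keep the first
      # three whitespace-separated fields of every remaining line.
      printf 'a b c d\nORA-00001 x y z\n\nfoo read_time bar baz\n' \
        | awk '/^ORA/ || /^[[:space:]]*$/ || /read_time/ { next }
               { print $1, $2, $3 }'
      # prints "a b c"
      ```

      Note that awk's `{ next }` skips a line entirely, so the unwanted lines never reach the `print` action.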
