Using hashes or arrays for file parsing

  • HalfCoded
    New Member
    • Mar 2008
    • 4

    Using hashes or arrays for file parsing

    Hi everyone,

    I am kind of stuck and would really appreciate some clues.

    I have to write a script that compares one element per line between two different files, a Blast file and a CDF file.
    I also need to keep the structure of the data intact.
    For this I chose the following strategy:

    - dump the two files into two arrays
    - pattern-match each CDF line against the Blast lines
    - if a line doesn't match, remove it
    - if a line has a different structure, keep it

    Here is the part of my script which takes the most time:
    [CODE=perl]
    foreach my $line (@CDF)
    {
        my $wanted;

        # capture the 12th tab-separated field of the CDF line
        if ($line =~ /^.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t(.*?)\t/)
        {
            print "repeat again\n";
            $wanted = $1;
            print $wanted."\n";
            # scan every Blast line for one that starts with that field
            foreach my $lineB (@Blast)
            {
                if ($lineB =~ /^($wanted)\s/)
                {
                    print $wanted."\n";
                    print OUTPUTFILEHANDLE "$line";
                }
            }
        }
    }

    [/CODE]

    It takes hours to run and produce my output file.

    Here are my questions:
    To work on subsets of the data instead of the complete 90 MB files,
    I tried to address elements by coordinates using an array, like this:

    [CODE=perl]

    my @array;
    print $array[0];

    [/CODE]

    But this ends up printing the first line of the file, whereas I want the 12th element of the line for the comparison.
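    A minimal sketch of why that happens and one way around it, assuming the array holds one whole line per element and the fields are tab-separated: the line itself has to be split before a single field can be indexed. The sample line below is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical tab-separated line with 13 fields, standing in for one CDF line.
my $line = join("\t", map { "f$_" } 1 .. 13) . "\n";

chomp $line;

# A limit of -1 makes split keep trailing empty fields.
my @fields = split /\t/, $line, -1;

# Arrays are zero-based, so the 12th field is at index 11.
my $wanted = $fields[11];
print "$wanted\n";
```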

    I also tried to understand hashes.

    So far I have read that arrays might be faster than hashes, therefore:

    Could anyone give me a clue about how to define my file as a grid, where I could use x,y coordinates to get my subsets and then do my comparison?
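    One possible reading of the grid idea is an array of arrays, where each row is a line and each column is a tab-separated field. This is only a sketch with made-up sample rows. For 90 MB files, holding every field in memory at once may be costly, so splitting each line as it is read could be the cheaper option.

```perl
use strict;
use warnings;

# Hypothetical sample rows; in practice these would be read from the file.
my @lines = ("a1\tb1\tc1\n", "a2\tb2\tc2\n");

my @grid;
for my $line (@lines) {
    chomp(my $copy = $line);                 # work on a copy, leave @lines untouched
    push @grid, [ split /\t/, $copy, -1 ];   # one array reference per row
}

# y selects the row (line), x selects the column (field); both are zero-based.
my ($x, $y) = (2, 1);
print $grid[$y][$x], "\n";
```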

    I also thought about using hashes to link keys to the values that would make up the subsets I need, but I am stuck this way too.
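    A rough sketch of how the hash idea might look. It assumes the ID in a Blast line is its first whitespace-delimited token and the matching ID in a CDF line is its 12th tab-separated field; the sample data is invented. The Blast IDs are stored once as hash keys, so each CDF line needs a single lookup instead of a scan over every Blast line. Unlike the nested loops, this keeps a matching CDF line only once even if several Blast lines share its ID.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the two files.
my @Blast = ("id2 hit info\n", "id5 other hit\n");
my @CDF   = map { join("\t", ("x") x 11, $_, "rest") . "\n" } qw(id1 id2 id5);

# Pass 1: remember the first whitespace-delimited token of every Blast line.
my %in_blast;
for my $lineB (@Blast) {
    my ($id) = split ' ', $lineB;
    $in_blast{$id} = 1 if defined $id;
}

# Pass 2: keep CDF lines whose 12th field is a known Blast ID,
# plus lines with fewer than 12 fields (treated here as "different structure").
my @kept;
for my $line (@CDF) {
    chomp(my $copy = $line);
    my @fields = split /\t/, $copy, -1;
    if (@fields < 12 or exists $in_blast{ $fields[11] }) {
        push @kept, $line;
    }
}
print @kept;
```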

    I know that I could do it the object-oriented way, but after having a look at it I think it is even more difficult, so I would prefer to use one of the two previous methods.

    Any help is very welcome, as I've been stuck on this for a while...
    Last edited by HalfCoded; Jun 10 '08, 04:08 PM. Reason: incorrect html tags