Using hashes or arrays for file parsing

**HalfCoded** · Jun 10 '08, 04:04 PM

hi everyone,

I am kind of stuck and therefore would really appreciate some clues:

I have to run a script that compares elements from two different files, a BLAST file and a CDF file.
I also need to keep the data structure.
For this I chose the following strategy:

-dumping the files into two arrays
-doing pattern matching between the two files
-if a line doesn't match, remove it
-if a line has a different structure, keep it

Here is the part of my script that takes the most time:
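[CODE=perl]
foreach my $line (@CDF)
{
    my $wanted;

    # Capture the 12th tab-separated field of the CDF line.
    if ($line =~ /^.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t(.*?)\t/)
    {
        print "repeat again\n";
        $wanted = $1;
        print $wanted."\n";
        # Every CDF line triggers a full scan of @Blast.
        foreach my $lineB (@Blast)
        {
            # \Q...\E matches $wanted literally, even if it contains regex metacharacters.
            if ($lineB =~ /^(\Q$wanted\E)\s/)
            {
                print $wanted."\n";
                print OUTPUTFILEHANDLE "$line";
            }
        }
    }
}
[/CODE]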

It takes hours to run it and obtain my output file.

Here are my questions:
To work only on subsets of the files instead of the complete 90 MB files, I have tried to address elements by coordinates using an array like this:

[CODE=perl]

my @array;
print $array[0];

[/CODE]

and it ends up printing the first line of the file, whereas I want the 12th element of the line to do the comparison.
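For reference, a minimal sketch of getting at the 12th element of one line with split, assuming the lines are tab-delimited as the regex above suggests. The sample line and the names @fields and $wanted are made up for illustration.

[CODE=perl]
use strict;
use warnings;

# Hypothetical tab-delimited line with 13 fields; real CDF lines will differ.
my $line = join("\t", map { "f$_" } 1 .. 13) . "\n";

chomp $line;
# The -1 limit keeps trailing empty fields instead of dropping them.
my @fields = split /\t/, $line, -1;

# Arrays are zero-based, so the 12th field is at index 11.
my $wanted = $fields[11];
print "$wanted\n";
[/CODE]

Here $array[0] would be the whole first line, while $fields[11] is one field inside a line that has already been split.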

I also tried to understand hashes.

So far I have read that arrays might be faster than hashes, so:

Is there anyone who could give me a clue about how to define my file as a grid where I could use x,y coordinates to get my subsets and then do my comparison?
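One way the grid idea could look is an array of array references, one inner array per line. This is only a sketch under the assumption that the fields are tab-separated; @lines and @grid are illustrative names, and the two sample lines stand in for the real file contents.

[CODE=perl]
use strict;
use warnings;

# Hypothetical in-memory data standing in for the lines of a file.
my @lines = ("a1\tb1\tc1\n", "a2\tb2\tc2\n");

my @grid;
foreach my $line (@lines) {
    chomp(my $copy = $line);                 # work on a copy, leave @lines untouched
    push @grid, [ split /\t/, $copy, -1 ];   # one array reference per row
}

# $grid[$y][$x]: row $y (line) and column $x (field), both zero-based.
print $grid[1][2], "\n";
[/CODE]

Holding every field of a 90 MB file this way may use considerably more memory than the file itself, so splitting each line only when it is needed might be the lighter option.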

I also thought about using hashes to link keys to the values that would constitute the subsets I need, but here too I am stuck.
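A rough sketch of what the hash idea might look like: index the first whitespace-delimited token of each BLAST line once, then test the 12th field of each CDF line against that index. The sample data, %seen, and @kept are assumptions for illustration, not the real files or script.

[CODE=perl]
use strict;
use warnings;

# Hypothetical sample data; the real arrays would hold the file contents.
my @Blast = ("id2 hit info\n", "id7 more info\n");
my @CDF   = map { join("\t", ("x") x 11, $_, "tail") . "\n" } qw(id1 id2 id7);

# Index the first whitespace-delimited token of every BLAST line.
my %seen;
foreach my $lineB (@Blast) {
    my ($id) = split ' ', $lineB;
    $seen{$id} = 1 if defined $id;
}

# Keep the CDF lines whose 12th field (index 11) is a key in the hash.
my @kept;
foreach my $line (@CDF) {
    my @fields = split /\t/, $line;
    next unless defined $fields[11];   # this sketch skips lines with fewer than 12 fields
    push @kept, $line if exists $seen{ $fields[11] };
}
print @kept;
[/CODE]

Each hash lookup takes roughly constant time, so @Blast is scanned once instead of once per CDF line. Unlike the nested loops, this keeps a CDF line once even if several BLAST lines share its ID, and lines with a different structure would need a push of their own if they are to be kept.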

I know that I could use the object-oriented way, but after having a look at it I think it is even more difficult, so I would prefer to use one of the two previous methods.

Any help is very welcome as I've been stuck for a while on this...