Mapping genomic positions within start and end positions

  • haobijam
    New Member
    • Oct 2010
    • 16

    Dear Sir or Madam,
    I am running a Perl script that maps genomic positions: for each SNP, it retrieves the target regions whose start and end positions contain it. The script runs correctly on small text files, but on the full data it has been running for almost 4 days. How can I make it faster by fixing the script below? snp129.txt is 1.60 GB and TARGETSCAN is 2.20 MB. I would be grateful for your help and hope to receive a response soon.

    Regards,
    Rocky
    SNU, College of Medicine
    Seoul

    Code:
    #!/usr/bin/perl -w
    #use strict;
    
    # Read the whole SNP file into memory (snp129.txt is about 1.6 GB)
    open(IN, "/home/haojamrocky/snp129.txt") or die "Cannot open SNP file: $!\n";
    @snp = <IN>; ## assigning snp
    close IN;
    
    # Read the miRNA target regions (TARGETSCAN is about 2.2 MB)
    open(IN2, "/home/haojamrocky/DATA/hg18/miRTarget/TARGETSCAN") or die "Cannot open miRNA file: $!\n";
    @PITA = <IN2>; ## assigning mirna
    close IN2;
    
    for ($i=0; $i <= $#snp; $i++){
    ###################### SNP ############################
            #$geneName = (split(/\t/,$refFlat[$i]))[0];
            #$name = (split(/\t/,$refFlat[$i]))[1];
            # snp129.txt fields 1-3 (0-based): chrom, chromStart, chromEnd
            $chrom1 = (split(/\t/,$snp[$i]))[1];
            $chromStart1 = (split(/\t/,$snp[$i]))[2];
            $chromEnd1 = (split(/\t/,$snp[$i]))[3];
    #######################################################
            # Compare this SNP with every target region
            foreach my $line (@PITA){
    ###################### miRNA ##########################
                    # TARGETSCAN fields 0-2: chrom, start, end
                    $chrom2 = (split(/\t/,$line))[0];
                    $chromStart2 = (split(/\t/,$line))[1];
                    $chromEnd2 = (split(/\t/,$line))[2];
    #######################################################
            #            print "$chrom1 $chrom2\n";
    
                            # Print the pair if the SNP lies entirely inside the target region
                            if( $chrom1 eq $chrom2 && ( $chromStart1 >= $chromStart2 && $chromEnd1 <= $chromEnd2 )){
                            chomp $snp[$i]; chomp $line;
                            print "$snp[$i] \t $line \n";
                            }
                    }
            }
    print "\n";
    exit;
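The nested loop in this script accounts for most of the running time: every SNP line is checked against every target line, and each check re-splits the target line three times. A rough cost model, with symbols introduced here only for illustration: N_s is the number of lines in snp129.txt, N_t the number in TARGETSCAN, and N_{s,c}, N_{t,c} the counts on chromosome c.

```latex
% Nested loop as posted
\text{pair checks} = N_s N_t, \qquad \text{splits} = 3N_s + 3N_s N_t

% Target lines loaded once into a hash keyed by chromosome, one split per line
\text{pair checks} = \sum_{c} N_{s,c}\, N_{t,c} \;\le\; N_s N_t, \qquad \text{splits} = N_s + N_t
```

Since the chromosome-wise sum can never exceed N_s N_t, keying the targets by chromosome only removes work; how much it saves depends on how the lines are spread across chromosomes.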
  • RonB
    Recognized Expert Contributor
    • Jun 2009
    • 589

    #2
    The script you posted won't compile, so there's no possibility that it could run for 4 days.

    1) Uncomment the 'use strict;' line and add a 'use warnings;' line.

    2) Declare all of your vars with the 'my' keyword.

    3) Don't slurp the 1.6GB file into an array. Instead, loop over it line by line.

    4) Load the smaller file into a HoA (Hash of Arrays).

    5) Don't use the C-style for loop. Perl's foreach style is cleaner.

    6) Don't use three separate split statements per line. Use a single split statement to extract all three values.
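A minimal sketch of how these six points could fit together. It assumes the hash is keyed by chromosome and that the column layouts are the ones the original script uses (chrom, start, end in fields 1-3 of snp129.txt and fields 0-2 of TARGETSCAN). The subroutine names `load_targets` and `map_snps` are hypothetical, and the output is plain tab-separated rather than the original's space-padded tabs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load the small target file into a hash of arrays (HoA):
#   chromosome => [ [start, end, original line], ... ]
# Assumption: TARGETSCAN fields 0-2 are chrom, start, end (tab-separated).
sub load_targets {
    my ($fh) = @_;
    my %targets;
    while (my $line = <$fh>) {
        chomp $line;
        my ($chrom, $start, $end) = (split /\t/, $line)[0 .. 2];  # one split per line
        next unless defined $end;                                 # skip short/malformed lines
        push @{ $targets{$chrom} }, [ $start, $end, $line ];
    }
    return \%targets;
}

# Stream the large SNP file line by line instead of slurping it.
# Assumption: snp129.txt fields 1-3 are chrom, chromStart, chromEnd.
# Prints each SNP line joined to every target region that contains it;
# returns the number of pairs printed.
sub map_snps {
    my ($snp_fh, $targets, $out_fh) = @_;
    my $count = 0;
    while (my $snp = <$snp_fh>) {
        chomp $snp;
        my ($chrom, $start, $end) = (split /\t/, $snp)[1 .. 3];   # one split per line
        next unless defined $end;                                 # skip short/malformed lines
        my $regions = $targets->{$chrom} or next;                 # no targets on this chromosome
        for my $region (@$regions) {
            my ($t_start, $t_end, $t_line) = @$region;
            if ($start >= $t_start && $end <= $t_end) {
                print {$out_fh} "$snp\t$t_line\n";
                $count++;
            }
        }
    }
    return $count;
}

# Run only when both file names are given on the command line.
if (@ARGV == 2) {
    my ($snp_file, $target_file) = @ARGV;
    open my $t_fh, '<', $target_file or die "Cannot open $target_file: $!\n";
    my $targets = load_targets($t_fh);
    close $t_fh;
    open my $s_fh, '<', $snp_file or die "Cannot open $snp_file: $!\n";
    map_snps($s_fh, $targets, \*STDOUT);
    close $s_fh;
}
```

Usage (hypothetical script name): `perl map_snps.pl snp129.txt TARGETSCAN > hits.txt`. Within one chromosome this is still a linear scan over that chromosome's target regions; the savings come from streaming the large file, skipping cross-chromosome comparisons, and splitting each line only once.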
