Checking for bad DNA sequences

**eWish** · Dec 15 '07, 10:01 PM

Welcome to TSDN!

Is this homework? If so, please read our Posting Guidelines. Please post the code you have tried. Also, have a look at CPAN for a module to help with what you are doing.

--Kevin

**adriaan** · Dec 15 '07, 10:12 PM

Thanks for the reply.
Your link to

http://serach.cpan.org/

doesn't seem to work, so I don't know what you mean by the module thing.
It's not really homework; it's more of an exercise I was advised to try.
This is the code I've written to get the evil sequences out of the database.
I don't really have any useful code yet for the analyzing section, as I'm still trying to figure out how to do it.

[CODE=perl]sub haalDataBaseOp    # "haal database op" = fetch the database
{
    open(my $database, '<', 'database.txt') or die "Cannot open database.txt: $!";
    my @data = <$database>;
    close($database);

    foreach my $ziekte (@data)    # $ziekte = disease (one line of the file)
    {
        my $code;

        # grab the disease code: everything from the first word made up
        # only of c, t, g and a, up to the end of the line
        if ($ziekte =~ m/(\b[ctga]+\b)(.*)/)
        {
            $code = $1 . $2;
            # print $code . "\n";
        }

        # get the number and the name out of the string
        if ($ziekte =~ m/(\d+)(.*?)(\b[gtac]+\b)/)
        {
            # build a hash with the number as key, pointing to the disease name
            $ziektenaam{$1} = $2;

            print $1 . "\n";
            print $code . "\n";
            print $2 . "\n";

            # also build a hash in which the number points to the disease codes found
            $ziektecode{$1} = $code;
        }
    }
}[/CODE]
Oh yes, I'm Belgian, so I natively speak Dutch and use it in my comments.
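For the analyzing part, one possible approach is sketched below. It is not code from this thread and rests on some assumptions. The bad sequences are assumed to be already loaded into a hash shaped like `%ziektecode` (disease number → sequence). The sample to check is assumed to be a plain string. Spaces, line breaks and line numbers are stripped from both sides before comparing, and `index` is called in a loop to report every offset where a bad sequence occurs. The sub names `normalize` and `find_bad_sequences` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Keep only the residue letters, so spacing, line breaks and line
# numbers do not affect the comparison (note: this also drops n, etc.).
sub normalize {
    my ($seq) = @_;
    $seq = lc $seq;
    $seq =~ s/[^acgt]//g;
    return $seq;
}

# Hypothetical helper: returns a list of [disease number, 0-based offset]
# pairs, one for every place a bad sequence occurs in $sample.
sub find_bad_sequences {
    my ($sample, %bad) = @_;
    my $clean = normalize($sample);
    my @hits;
    for my $nr (sort { $a <=> $b } keys %bad) {
        my $motif = normalize($bad{$nr});
        next unless length $motif;
        my $pos = index($clean, $motif);
        while ($pos != -1) {
            push @hits, [$nr, $pos];
            $pos = index($clean, $motif, $pos + 1);    # allow overlapping hits
        }
    }
    return @hits;
}

# Made-up data for illustration only.
my %ziektecode = (
    1 => 'gcttgtccac atattttatg',
    2 => 'agacgcagcc',
);
my $sample = "ttgcttgtcc\nacatatttta tgagacgcag cc";

for my $hit (find_bad_sequences($sample, %ziektecode)) {
    printf "disease %d found at residue %d\n", $hit->[0], $hit->[1] + 1;
}
```

With the made-up data this prints `disease 1 found at residue 3` and `disease 2 found at residue 23`. Removing the formatting first makes a plain substring search enough, so no regex has to deal with line breaks.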

**eWish** · Dec 15 '07, 10:26 PM

Sorry about the link to CPAN; I have corrected it. There are several bioinformatics modules available that are designed to handle this kind of task. Also, check out BioPerl.org; in the long run it will be a better solution.

--Kevin

**nithinpes** · Dec 24 '07, 09:58 AM

In reply to your initial post, where you wanted to search for a pattern such as
gcttgtccac atattttatg agacgcagcc, which can extend across multiple lines, and return its line number and position, the following code works:
[code=perl]
use strict;
use warnings;

$/ = "//";    ## input record separator: each sequence entry ends with //
open(my $db, '<', 'database.txt') or die "sorry: $!";
while (my $record = <$db>)
{
    ## (?:\s*\n\s*\d+)? optionally skips a line break plus the line number at
    ## the start of the next line, so the pattern may wrap after either word
    ## (a \1 backreference would only repeat the same captured text)
    while ($record =~ /\bgcttgtccac\b(?:\s*\n\s*\d+)?\s+\batattttatg\b(?:\s*\n\s*\d+)?\s+\bagacgcagcc\b/g)
    {
        my $prev = substr($record, 0, $-[0]);   # text preceding the match
        my $line = 1 + ($prev =~ tr/\n//);       # line number within the record
        my $pos  = ($prev =~ tr/atgc//);         # residues preceding the match
                                                 # (lowercase a/t/g/c in header text count too)
        print "\n line: $line";
        print "\n position: $pos";
    }
}
close($db);
[/code]
Regards,
Nithin
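An alternative that avoids writing the line breaks into the regex is sketched below. It assumes the sequence lines look like a GenBank ORIGIN section, with a line number followed by groups of residues. Each record's residues are flattened into one string, an array remembers which line every residue came from, and `index` finds the motif. The sub name `locate_motif` and the sample record are hypothetical. Unlike the code above, it reports a 1-based residue position, counts only residues on sequence lines, and returns only the first occurrence.

```perl
use strict;
use warnings;

# Hypothetical helper. $text is one record (for example one chunk read
# with $/ = "//"), $motif the sequence to look for; spaces in $motif are
# ignored. Returns (line number, residue position), both 1-based, or an
# empty list when the motif does not occur.
sub locate_motif {
    my ($text, $motif) = @_;
    (my $want = lc $motif) =~ s/[^acgt]//g;

    my $flat = '';    # all residues of the record, joined together
    my @line_of;      # $line_of[$i] = line on which residue $i appears
    my $line = 0;
    for my $l (split /\n/, $text) {
        $line++;
        # Assumption: sequence lines look like "   61 gcttgtccac atattttatg ..."
        next unless $l =~ /^\s*\d+\s+([acgt\s]+)$/i;
        (my $res = lc $1) =~ s/\s//g;
        push @line_of, ($line) x length($res);
        $flat .= $res;
    }

    my $i = index($flat, $want);
    return if $want eq '' || $i == -1;
    return ($line_of[$i], $i + 1);
}

# Made-up record for illustration only.
my $record = "LOCUS       TEST\nORIGIN\n"
           . "        1 ttgcttgtcc acatatttta\n"
           . "       21 tgagacgcag cc\n";
my ($ln, $pos) = locate_motif($record, 'gcttgtccac atattttatg agacgcagcc');
print "line: $ln, position: $pos\n" if defined $ln;
```

For the made-up record this prints `line: 3, position: 3`. The design trades a more complex regex for one pass over the lines plus a plain substring search.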

**numberwhun** · Dec 24 '07, 02:14 PM

Originally posted by nithinpes

First, when posting code in the forums, please be sure to use the proper code tags. That way, we moderators don't have to clean up after you and add them to what you posted (as I have done here).

Next, just out of curiosity, have you checked out the BioPerl website? I have seen it pointed out to others working with genomics and such, and they have always said it was very helpful.

Regards,

Jeff

**nithinpes** · Dec 26 '07, 08:41 AM

Hi Jeff,

Sorry about that. I have checked the BioPerl website; it is indeed very helpful in the long run.

Regards,
Nithin
