I think in the Perl solution and this is the code:
#!/usr/bin/perl -w
use strict;
=head1 NAME
mapSeq2Chr.pl
=head1 DESCRIPTION
Perl script to find and get the position of each sequence in a
multifasta file into a big sequence (a chromosome).
The comparation is direct, so it only works with sequences in same
direction (typical 5' -> 3').
Output is a simple text file with the name of the sequence, the name
of the big sequence and the positions where match.
Note we match every line with a fasta comment (defined with ">") but
the last sequence is omitted, I add a ">" as last line with:
echo ">" >> fasta
=cut
$ARGV[2] or die "usage: perl mapSeq2Chr.pl CHR SEQ MAP\n";
# Global variables
my ($chr_file) = shift @ARGV; # Fasta file with chromosome
my ($seq_file) = shift @ARGV; # Fasta file with sequences
my ($map_file) = shift @ARGV; # Output file
my ($seq) = ''; # Sequence to map
my ($name) = ''; # Fasta comment
my ($chr) = ''; # Genomic sequence
my ($first) = 'y'; # Flag for first line
print "Reading genome sequences ...\n";
open C, "$chr_file" or die "Cannot open $chr_file\n";
while (<C>) {
next if (/>/);
chomp;
$chr .= $_;
}
close C;
print " loaded $chr_file with ", length $chr, " bases\n";
open F, "$seq_file" or die "Cannot open $seq_file\n";
open O, ">$map_file" or die "Cannot open $map_file\n";
print "Mapping sequences ...\n";
while (<F>) {
chomp;
if ($first eq 'y') {
s/>//;
$name = $_;
$first = 'n';
}
elsif (/>/) {
s/>//;
if ($chr =~ /$seq/) {
print " $name mapped in $chr_file ";
print O "$name\t$chr_file\t";
# We use @- to look for multiple matches
foreach my $match (@-) {
print O "$match:";
print "$match ";
}
print O "\n";
print "\n";
}
else {
print " $name not match in $chr_file\n";
}
$name = $_;
$seq = '';
}
else {
$seq .= $_;
}
}
close F;
close O;
print "Done\n";
=head1 AUTHOR
Juan Caballero (linxe _at_ glib.org.mx)
=head1 LICENSE
Perl Artistic License (http://dev.perl.org/licenses/artistic.html)
=cut
3 comentarios:
Hey Bro!
Then you are able to find the localization (numerical) of your sequences, but it is possible to obtain also a map or graphical position (draw) of your sequences?
I mean, as in mapviewer in maize, you can check the chromosome and the position (graphical view)
I am also need solve the positions of some sequences against a partial sequence (maize genome), I try your script.
See you Bro!
Cesar
Bro:
Yes, you can draw a view using GD or some BioPerl module.
This script is for a perfect match sequence, I don't recommend it for partial sequences (even one base mismatch can be fatal).
Thinking a little more, you can use this code to map some oligos (for PCR or HT sequencing), if you multifasta file include the direct and reverse sequence.
Publicar un comentario en la entrada