GBK parser

May 30, 2008

Yesterday le Jacou asked me about the GenBank format, because he is writing a parser for it to integrate into his super-shining-colorful-visual man-see annotation interface. He needs the parser to split a file by sequence, with each sequence going into a separate HTML file. Later he showed me his code in C#, and after thinking about how it could be translated into Perl, I suggested:


#!/usr/bin/perl -w

# parseGBK2HTML.pl 
# Juan Caballero @ 2008

$ARGV[0] or die "Usage: parseGBK2HTML.pl GENBANK_FILE\n";

use CGI qw(:standard);
use Bio::SeqIO;

my $file_in = shift @ARGV;
my ($seq_in, $seq_out, $file_out, $seq, $features);

$seq_in = Bio::SeqIO->new(-format=>'genbank', -file=>$file_in);
while ($seq = $seq_in->next_seq() ) {
  $file_out  = $seq->accession_number;
  $file_out .= '.html';
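  my @features = $seq->get_SeqFeatures;  # list context returns the feature objects
  open my $out, '>', $file_out or die "Cannot open $file_out: $!";
  # No header() here: it emits an HTTP header, which does not belong in a static file
  print $out start_html($file_out);
  # One paragraph per feature: type and coordinates
  print $out p($_->primary_tag . ' ' . $_->start . '..' . $_->end), "\n" for @features;
  print $out end_html;
  close $out;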
}

Good: less code, easy to understand, the parsing is handled by BioPerl's Bio::SeqIO, and the HTML output is handled by CGI.
Bad: it requires 2 modules, the modules may slow the routine down, and you cannot control the output of get_SeqFeatures directly, although you can build your own by calling the objects inside $seq. It also needs a few eval blocks to trap errors.
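
To illustrate the "build your own" idea, here is a minimal sketch of a formatter that turns one feature into an HTML table row. The names escape_html and feature_to_row are hypothetical helpers, not part of BioPerl or CGI. The sketch assumes only that each feature object answers primary_tag, start, end, get_all_tags and get_tag_values, as BioPerl feature objects do.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: render one feature object as an HTML table row
# with three cells: feature type, coordinates, and qualifiers.
sub feature_to_row {
    my ($feat) = @_;

    # Collect every qualifier as "tag=value", in alphabetical tag order.
    my @tags = $feat->get_all_tags;
    my @qualifiers;
    for my $tag (sort @tags) {
        push @qualifiers, map { "$tag=$_" } $feat->get_tag_values($tag);
    }

    my @cells = (
        $feat->primary_tag,
        $feat->start . '..' . $feat->end,
        join('; ', @qualifiers),
    );
    return '<tr>'
         . join('', map { '<td>' . escape_html($_) . '</td>' } @cells)
         . '</tr>';
}
```

Inside the while loop, printing feature_to_row($_) for each element of $seq->get_SeqFeatures, wrapped in <table> and </table> tags, could replace the single p(...) line. That way the choice of columns and qualifiers stays in your own hands.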

That's why I love Perl.
