Thursday, July 24, 2008

Bioinfo tips

Yesterday I was pulled out of my coding state to answer a basic question: which is the best font to display a sequence alignment?

First, let me introduce the concept of sequence alignment. If you want to compare two or more sequences by their composition of nucleic bases (for DNA or RNA) or amino acids (for peptides), you must search for similar blocks across all of them and fit the rest as well as possible. Many algorithms have been created to perform this task, with different approaches; if you want to know more, see Wikipedia or a bioinformatics book.

OK, suppose we have our alignment. It is really just a text file with a special structure that makes the matching blocks easy to see. Some people make the mistake of cutting and pasting it into a text document without changing the font. A font is just a set of images representing letters or symbols; some are very artistic, but most are proportional, meaning each character has its own width. So the beautiful alignment loses its columns and the blocks are no longer visible (they are still there, but the characters no longer line up).
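To make the "special structure" concrete, here is a minimal Perl sketch that prints two already-aligned rows with a marker line between them. The helper name match_line and the two rows are made up for illustration; real alignment programs have their own output formats.

```perl
use strict;
use warnings;

# Hypothetical helper: build a marker line for two rows that are already
# aligned. '|' marks a column where both rows hold the same residue;
# mismatches and gap columns get a space.
sub match_line {
    my ($top, $bottom) = @_;
    die "aligned rows must have equal length\n"
        unless length($top) == length($bottom);
    my $line = '';
    for my $i (0 .. length($top) - 1) {
        my $x = substr($top, $i, 1);
        my $y = substr($bottom, $i, 1);
        $line .= ($x eq $y && $x ne '-') ? '|' : ' ';
    }
    return $line;
}

# Made-up rows, for illustration only.
my $top    = 'ACGTT-GCA';
my $bottom = 'ACG-TAGCA';
print "$top\n", match_line($top, $bottom), "\n$bottom\n";
```

The marker line only means something while every character occupies the same width, which is exactly what a proportional font breaks.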

Then we need to decide which font is best, in this case for a web service that displays the information. So, Courier? Monospace? Neither.

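The best way to insert an alignment in an HTML file is to use the pre tag (code alone gives a monospaced font but does not preserve line breaks and runs of spaces, so put it inside pre if you use it). Then you do not depend on a particular font, which matters because you cannot expect the user's system to have the font you chose.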
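As a rough sketch of how a web service could emit the alignment, the Perl below wraps the rows in a pre element. The helper names escape_html and alignment_to_pre, and the sample rows, are assumptions made for this example; the escaping covers only the characters that matter in plain HTML text.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML text.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: wrap alignment rows in <pre> so the browser keeps
# the line breaks and spaces, and therefore the columns.
sub alignment_to_pre {
    my (@rows) = @_;
    return "<pre>\n" . join("\n", map { escape_html($_) } @rows) . "\n</pre>\n";
}

# Made-up rows, for illustration only.
print alignment_to_pre('seq1  ACGTT-GCA', 'seq2  ACG-TAGCA');
```

The browser then picks whatever monospaced font the user's system provides, so the page never has to name one.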

Remember the LaTeX principle: focus on the content; the appearance is handled separately.

Finally, a Perl best practice for working with big sequences. Suppose you need to insert one big sequence into another, where memory and time are precious. For this example, the sequence is in $seq, the insert in $ins, and the position in $pos.

BAD:

my $left = substr ($seq, 0, $pos);
my $right = substr ($seq, $pos);
$seq = $left . $ins . $right;


GOOD:

# Insert in place: assign $ins to a zero-length span at $pos.
substr ($seq, $pos, 0) = $ins;


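With big sequences you will notice the difference: the first version copies the sequence into two temporary strings and then builds a third, while the second modifies $seq in place.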
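If you want to measure this yourself, here is a minimal sketch using the core Benchmark module. The helper names insert_concat and insert_in_place, the sequence sizes, and the iteration count are arbitrary choices for illustration, and the sketch assumes both approaches should place $ins immediately before position $pos. Both helpers receive a copy of the sequence, so the comparison is between the insertion steps only; the numbers depend on your machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical helper: split the sequence and concatenate three pieces.
sub insert_concat {
    my ($seq, $ins, $pos) = @_;
    my $left  = substr($seq, 0, $pos);
    my $right = substr($seq, $pos);
    return $left . $ins . $right;
}

# Hypothetical helper: four-argument substr replaces a zero-length span
# at $pos with $ins, modifying the local copy in place.
sub insert_in_place {
    my ($seq, $ins, $pos) = @_;
    substr($seq, $pos, 0, $ins);
    return $seq;
}

# Made-up data: 1,000,000 bases, a 1,000-base insert, inserted mid-sequence.
my $seq = 'ACGT' x 250_000;
my $ins = 'N' x 1_000;
my $pos = 500_000;

# Sanity check: both approaches must produce the same sequence.
die "results differ\n"
    unless insert_concat($seq, $ins, $pos) eq insert_in_place($seq, $ins, $pos);

# Run each approach 200 times and print a comparison table.
cmpthese(200, {
    concat   => sub { my $r = insert_concat($seq, $ins, $pos) },
    in_place => sub { my $r = insert_in_place($seq, $ins, $pos) },
});
```

In real code you would call substr directly on the original variable, as in the GOOD example, so that not even the initial copy is made.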
