Monday, December 22, 2008

Mac vs PC

The old battle of Mac vs PC: in this video the fight is between Transformers laptops ... cool concept.


By the way, the Linux Foundation is running a contest for an "I'm Linux" TV commercial; full rules here.

Friday, December 19, 2008

Perl Life


Wednesday, December 17, 2008

PS3 cluster

So you have a few PS3s and you're bored of playing games and watching Blu-ray movies?

The PS3 is the most accessible Cell-based machine, so why not use its computing power in areas beyond entertainment?

With this PS3ClusterGuide you can build your own PS3 cluster step by step. The recipe:

1. Take some PS3s.
2. Connect them in a LAN.
3. Install Fedora PPC on each one.
4. Install the cluster utilities.
5. Learn to code with the Cell SDK for a better experience and performance.

I accept PS3 donations for scientific tests ... really, I just want one to play ...

Sunday, December 14, 2008

Tuesday, December 9, 2008

The Seven Deadly Sins of Bioinformatics

This presentation is good food for thought for anyone in this area. Some quotes are fantastic and many points are well focused; with others I can't agree. A must-watch that makes you think.

Thursday, December 4, 2008

Bioinformatics in Web 2.0

DNASIS SmartNote is an online service for integrating bioinformatic analyses. Right now it is like Google Docs, but specific to bioinformatics. It looks good and very expandable, and it includes some basic tools:

  • Blast - NCBI
  • Codon Usage
  • CpG Islands
  • DNA Stats
  • Hetero Dimer - IDT Oligo Analyzer
  • Oligo Analyzer - IDT
  • PCR Primer Stats
  • Protein Stats
  • Reverse Complement
  • Reverse Translation
  • Translate Tool - ExPASy
  • ClustalW - EBI
  • Multiple Alignment Editor - Jalview
  • Pairwise Alignment - EBI EMBOSS
  • T-Coffee - EBI
  • Blast - NCBI
  • CpG Islands
  • DNA Pattern Find
  • Fuzzy DNA Search
  • Fuzzy Protein Search
  • Neural Network Promoter Prediction
  • ORF Finder
  • Primer Map
  • Protein Pattern Find
  • Restriction Digest
  • Restriction Map
  • TMHMM Server v. 2.0
  • VecScreen - DDBJ
  • xTAG Software
  • ASPE - Probe Design Tool for ASPE
  • Hetero Dimer - IDT Oligo Analyzer
  • Oligo Analyzer - IDT
  • PCR Primer Stats
  • Primer Map
  • Primer3
  • PrimerX
  • siRNA Design - DEQOR
Check the video for a demo.

Tuesday, December 2, 2008

Advice for potential graduate students

I found this text and copy-pasted it here because it is good advice that should be posted in every biology laboratory.


We currently have room in the lab for more graduate students.

But before you apply to this lab or any other, there are a few things to keep in mind. First, be realistic about graduate school. Graduate school in biology is not a sure path to success. Many students assume that they will eventually get a job just like their advisor’s. However, the average professor at a research university has three students at a time for about 5 years each. So, over a career of 30 years, this professor has about 18 students. Since the total number of positions has been pretty constant, these 18 people are competing for one spot. So go to grad school assuming that you might not end up at a research university, but instead a teaching college, or a government or industry job. All of these are great jobs, but it’s important to think of all this before you go to school.

Second, choose your advisor wisely. Not only does this person potentially have total control over your graduate career for five or more years, but he/she will also be writing recommendation letters for you for another 5-10 years after that. Also, your advisor will shadow you for the rest of your life. People will always think of you as so-and-so’s student and assume that you two are somewhat alike. Finally, in many ways you will turn into your advisor. Advisors teach very little, but instead provide a role model. Consciously and unconsciously, you will imitate your advisor. You may find this hard to believe now, but fifteen years from now, when you find yourself lining up the tools in your lab cabinets just like your advisor did, you’ll see. My student Alison once said that choosing an advisor is like choosing a spouse after one date. Find out all you can on this date.

Finally, have your fun now. Five years is a long time when you are 23 years old. By the end of graduate school, you will be older, slower, and possibly married and/or a parent. So if you always wanted to walk across Nepal, do it now. Also, do not go to a high-powered lab that you hate assuming that this will promise you long-term happiness. Deferred gratification has its limits. Do something that you have passion for, work in a lab you like, in a place you like, before life starts throwing its many curve balls. Your career will mostly take care of itself, but you can’t get your youth back.

If, after reading this, you want to apply to this lab, we would love to hear from you.

By: Sönke Johnsen is deep-sea biologist and visual ecologist at Duke University and still can’t believe that his background in math, art, and writing got him a paying job, let alone one that lets him go down in submarines. In his spare time, he takes pictures (see and works with his daughter to unlock new levels on Mariokart Wii.

Source: The Science Creative Quarterly.

Monday, December 1, 2008

Slower with age ... 4th and last part

I'm starting to get bored with these comparisons. I didn't report the last one, an "apples with pears" comparison of Ubuntu vs. OpenSolaris vs. FreeBSD benchmarks, but this one is more realistic: take the shiny new Fedora 10 and Ubuntu 8.10 and benchmark them. The results: nearly identical performance in 32 and 64 bits. No surprises here.

Yes, U-fans claim better hardware support and a more user-friendly desktop. I don't believe that, because Fedora is a really good Linux distribution that I have tested and used before.

Anyway, to test and compare distros you can use the Phoronix Test Suite.

I'm still waiting for openSUSE 11.1 and may try gOS, while I save for an Acer Aspire One.

Tuesday, November 25, 2008

Human annotation of sequences

The "Metagenome Annotation Using a Distributed Grid of Undergraduate Students" is a nice article in PLoS Biology. It reminds me that sequence annotation is a common problem that is hard to automate, and that a human can "decide" better than a machine (what about AI for sequence annotation?).

I like that the article title uses the term "distributed grid"; maybe the students also count as "heterogeneous nodes". LOL

So, the strategy is to mix some students, computers with internet access, sequences and a version control system (validated by a supervisor). The workflow is:

Step 1: Screen a sequence

Step 2: Validate the annotation

And the best part is that you solve two problems in one hit: 1. annotate your sequences, 2. teach the students how to use bioinformatic tools and annotation.

Some years ago I participated in a similar "annotathon", simpler because it was based on sequence homology in DBs, but it was a lot of fun.

Monday, November 24, 2008

BASIC Stamp supercomputer

The BASIC Stamp supercomputer is a cool project by humanoido. The system is a small, portable, battery-powered cluster, and he lists its advantages:
  • Smaller
  • Lighter
  • Portable
  • Field operable
  • Runs on batteries
  • Has the greatest number of (I/O)
  • Has the greatest number and variety of sensors
  • Lower power consumption
  • Lower unit cost
  • Easy to program

The nodes are 12 Parallax Basic Stamp microcontrollers, many wires and some LCD screens. I love the motivations:
  • Learning experiences & challenges
  • Expanding education & knowledge
  • Gaining useful background for career
  • Research Benefits
  • Extending Basic Stamp power
  • Creating new inventions, ideas, applications
  • Own your own, prestige
  • School project, credit
  • Involvement, sense of great accomplishment
  • Psychological relaxation, Symbolic Value
  • Sharing, making new friends
And the video is the coolest part (great sound!):

Thursday, November 20, 2008

Perl universe

I know it!

Original in XKCD:
What's your favorite programming cartoon?

Wednesday, November 19, 2008

Perl recursion for oligo creation

#!/usr/bin/perl -w
use strict;

=head1 NAME


Perl script to generate all possible combinations of size k
using an alphabet @a, we use function recursion.

My intention is to create all possible oligonucleotides (DNA
alphabet or ACGT) but can be extended to any other field
using a different alphabet.

Output also can be printed in other forms, you can put other
delimiters in the push function or in the final array printed.


=cut

my ($k) = 4; # definition of the word size
my @a = qw/A C G T/; # definition of the alphabet
my @words = createWords($k, @a); # main function
print join("\n", @words), "\n"; # print output

sub createWords {
    my $k = shift @_; $k--;
    my @old = @_;
    my @new = ();
    if ($k < 1) {
        return @old; # words have reached the requested size
    }
    else {
        foreach my $e (@old) {
            foreach my $n (@a) {
                push @new, "$e$n"; # add new element
            }
        }
        return createWords($k, @new); # recursion call
    }
}

=head1 AUTHOR

Juan Caballero @ 2008

=head1 CONTACT

linxe __a__

=head1 LICENSE

Perl Artistic License v2.0
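
For comparison, here is a minimal Python sketch of the same enumeration. It is not a port of the script above: it swaps the recursion for itertools.product, and the function name create_words is only an illustrative choice.

```python
from itertools import product

def create_words(k, alphabet="ACGT"):
    """Return every word of length k over the alphabet, in alphabet order."""
    return ["".join(letters) for letters in product(alphabet, repeat=k)]

words = create_words(4)
print(len(words))   # 4**4 = 256
print(words[:3])    # ['AAAA', 'AAAC', 'AAAG']
```

As in the Perl version, a different alphabet can be passed in to build words for other fields.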


Tuesday, November 18, 2008


This is an environment like "Minority Report". Some actions are just visually stunning, and the video editing and 3D modeling are fantastic. The gloves look very weird.

g-speak overview 1828121108 from john underkoffler on Vimeo.

Monday, November 17, 2008

I'm a PC ... but I use Linux

The latest Windows commercials show us a "real PC guy", in response to Apple's "I'm a Mac" ads. This Spanish-English video shows the PC guy, but with Linux.

Personally, I'm a complex mix of Linux/Windows/Mac/Solaris.

Wednesday, November 12, 2008

The Cray Returns

This week the new fastest and most powerful computer in the world is the NCCS's Jaguar, a big cluster built by Cray and AMD, with 1.64 petaflops of peak capacity. This marks the return of the king of supercomputers, Cray Inc., which for many years was the reference for the most powerful computing systems.

Now, together with AMD, it has built a cluster system (with Linux, of course) and reclaimed the top spot.

The cabinets also look cool:

Monday, November 10, 2008

The Matrix runs on Windows XP


Thursday, November 6, 2008

Slower with age ... 3rd. part

Once again Phoronix has tested the big U; this time the rival is Mac OS X. They used a Mac mini and installed 32-bit and 64-bit Ubuntu with Boot Camp.

OK, Ubuntu won some rounds, like this:

but in others Mac OS X was the best:

Just one comment: remember that Mac OS X is still 32-bit. Some parts have been migrated to 64-bit, but the kernel and base system are natively 32-bit. Would a native 64-bit Mac OS X make a big difference? Only Snow Leopard knows, and I predict: yes.

Tuesday, November 4, 2008

Simple assembly

Last week my good friend M asked me to write a script to draw some sequences. The main goal is to see differences in transcript orientation (sense and antisense) visually. Some time ago I created similar tools for mapping short sequences such as 454 pyrosequencing reads or siRNAs. The big problem now is getting a good assembly: many regions can extend a "contig" to the left or to the right, so internal coordinates change every time you add a new element, whereas with short sequences you have a well-defined target and the extension is relative to it.
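
To make the coordinate problem concrete, here is a small Python sketch (not part of my script; the add_read function and the dictionary layout are invented for illustration). It assumes 1-based contig coordinates, where a start below 1 means the new read hangs off the left end of the contig.

```python
def add_read(contig, name, start, end):
    """Place a read on the contig using 1-based contig coordinates.

    A read starting before position 1 extends the contig to the left, so
    every member already placed is shifted right by the overhang.
    """
    shift = max(0, 1 - start)
    if shift:
        for member in contig["members"].values():
            member["start"] += shift
            member["end"] += shift
    contig["members"][name] = {"start": start + shift, "end": end + shift}
    contig["length"] = max(m["end"] for m in contig["members"].values())
    return shift

contig = {"length": 0, "members": {}}
add_read(contig, "seq_1", 1, 500)
add_read(contig, "seq_2", 400, 800)   # extends to the right: nothing moves
add_read(contig, "seq_3", -99, 150)   # extends 100 bases to the left
print(contig["members"]["seq_1"])     # {'start': 101, 'end': 600}
print(contig["length"])               # 900
```

Extending to the right only changes the contig length, but a single extension to the left renumbers every member, which is why a real assembler is the better tool here.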

After thinking a little, I decided not to extend my code to do the assembly (yes, I'm lazy) and to use Phrap for the job. A parser reads the output and extracts the alignment information.

The second problem was the FASTA naming convention: the original FASTA file mixes naming styles from different sources, so I decided to create and attach a unique ID, "seq_##", to each sequence.
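
The renaming step can be sketched in Python like this. It is an illustration, not the Perl code below: the rename_fasta function and the sample headers are made up, and it works on a list of lines instead of files.

```python
def rename_fasta(lines):
    """Tag each FASTA header with a unique 'seq_N' ID, keeping the old name.

    Returns the rewritten lines and a dict mapping each new ID to its
    original name and sequence length.
    """
    ids = {}
    out = []
    count = 0
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            current = f"seq_{count}"
            ids[current] = {"name": line[1:], "len": 0}
            out.append(f">{current} | {line[1:]}")
        elif current is not None:
            ids[current]["len"] += len(line)
            out.append(line)
    return out, ids

fasta = [">gi|123 some transcript", "ACGTAC", "GGT", ">clone_7", "TTTT"]
renamed, ids = rename_fasta(fasta)
print(renamed[0])     # >seq_1 | gi|123 some transcript
print(ids["seq_2"])   # {'name': 'clone_7', 'len': 4}
```

Keeping the original name after the new ID means the reports can still show the name people recognize.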

Extending the problem, M asked me to make it possible to add more sequences based on sequence homology, i.e. BLAST output, so I added a small option to blast against another FASTA file and add the matches to the sequences to assemble.

It was a lot of work. The script requires GD for graphics (with 1 base = 1 pixel) and the phrap/blast executables; the output is the assemblies, with reports in GFF format and pretty images in PNG.

An example:

This is the code:

#!/usr/bin/perl -w
use strict;
use Getopt::Long;

=head1 DESCRIPTION

Script for simple assembly: the FASTA input is renamed with unique IDs,
prepared for blast and searched with blastn against itself, the blast
output is parsed, "contigs" are assembled and a GFF3 file is created.

=cut

my ($help); # Help flag
my ($f); # principal Fasta file
my ($a); # secondary fasta file to add
my ($plot); # Plot flag
my ($gff); # Print GFF files flag
my ($phrapdir) = '.'; # where is Phrap ?
my ($blastdir) = '.'; # where is Blast ?
my ($blastprog) = 'megablast'; # What program (megablast | blastall -p blastn)
my ($blastopt) = '-e 0.000001 -D 3 -a 2'; # some Blast options

die "Cannot find Phrap exec !!!\n" unless(-e "$phrapdir/phrap");

# Option names follow the help text printed by usage()
usage() if(!GetOptions(
    'help|h'   => \$help,
    'file|f:s' => \$f,
    'add|a:s'  => \$a,
    'plot|p'   => \$plot,
    'gff|g'    => \$gff,
));
usage() if(defined $help);
usage() unless(defined $f and -e $f);

if (defined $a) {
    print "Preparing files in $a to be blasted\n";
    die "Cannot find Blast binary !!!\n" unless (-e "$blastdir/$blastprog");
    die "Cannot find formatdb binary !!!\n" unless (-e "$blastdir/formatdb");
    system ("$blastdir/formatdb -i $a -p F -o F");

    print "Blasting $f in $a db\n";
    system ("$blastdir/$blastprog -i $f -d $a $blastopt -o blast.out");

    print "Adding new sequences to $f\n";
    my %addme = ();
    open B, "blast.out" or die "cannot open blast.out\n";
    while (<B>) {
        next if(/#/);
        my ($s, $h, @rest) = split (/\t/, $_);
        next unless(defined $h);
        $h =~ s/lcl\|//;
        $addme{$h} = 1; # remember each hit ID (this line is missing in the post, assumed)
    }
    close B;

    local $/ = '>'; # read the FASTA file one record at a time
    open A, "$a" or die "cannot open $a\n";
    open F, ">>$f" or die "cannot open $f\n";
    while (<A>) {
        chomp; # drop the trailing '>' separator
        next unless(/\S/);
        foreach my $m (keys %addme) {
            if ($_ =~ /\Q$m\E/) {
                print F ">$_";
                delete $addme{$m}; # add each hit only once
                last;
            }
        }
    }
    close A;
    close F;
}

# Rename every sequence with a unique ID and record its name and length
open F, "$f" or die "cannot open $f\n";
open T, "> $f.tmp" or die "cannot open $f.tmp\n";
my %ids = ();
my ($id) = 0;
while (<F>) {
    chomp;
    if (/>/) {
        $id++; # next unique ID
        s/>/>seq_$id | /;
        $ids{"seq_$id"}{'name'} = $_;
    }
    else {
        $ids{"seq_$id"}{'len'} += length $_;
    }
    print T "$_\n";
}
close F;
close T;

system ("$phrapdir/phrap $f.tmp > $f.phrap_out");

# Parse the Phrap output: contig headers followed by their reads
open P, "$f.phrap_out" or die "cannot open $f.phrap_out\n";
my ($active) = 0;
my ($contig) = '';
my %contigs = ();
my %contigs_len = ();
while (<P>) {
    if (/^Contig\s+(\d+)\.\s+(\d+)\s+reads;\s+(\d+)/) {
        if ($2 > 1) { # More than one read by contig
            $contig = "contig_$1";
            $contigs_len{$contig} = $3;
            $active = 1; # start capturing reads (missing in the post, assumed)
        }
    }
    elsif ($active >= 1) {
        if (/^\s+(\d+)\s+(\d+)\s+(seq_\d+)/) { # Capture forward sequences
            $contigs{$contig}{$3}{'ini'} = $1;
            $contigs{$contig}{$3}{'end'} = $2;
            $contigs{$contig}{$3}{'dir'} = '+';
            #print "$contig $1 $2 $3 +\n";
        }
        elsif (/^C\s+(\d+)\s+(\d+)\s+(seq_\d+)/) { # Capture reverse sequences
            $contigs{$contig}{$3}{'ini'} = $1;
            $contigs{$contig}{$3}{'end'} = $2;
            $contigs{$contig}{$3}{'dir'} = '-';
            #print "$contig $1 $2 $3 -\n";
        }
        else {
            $active = 0;
        }
    }
}
close P;

createGFF() if(defined $gff);
createPNG() if(defined $plot);


=head2 createGFF

Read the sequence info and print a GFF v3.0 file for each contig

=cut

sub createGFF {
    print "Creating GFF files ...\n";
    foreach my $c (keys %contigs) {
        open G, ">$c.gff" or die "cannot write $c.gff\n";
        my $l = $contigs_len{$c};
        print G "##gff-version 3\n##sequence-region $c 1 $l\n";
        #print "contig=$c len=$l\n";
        foreach my $s (keys %{ $contigs{$c} }) {
            my $i = $contigs{$c}{$s}{'ini'};
            my $e = $contigs{$c}{$s}{'end'};
            my $d = $contigs{$c}{$s}{'dir'};
            my $n = $ids{$s}{'name'};
            #print "contig=$c seq=$s ini=$i end=$e dir=$d name=$n\n";
            print G "$c\tassembly\tblock\t$i\t$e\t.\t$d\t.\t$n\n";
        }
        close G;
    }
}

=head2 createPNG

Read the sequence info and print a PNG map file for each contig

=cut

sub createPNG {
    use GD;
    print "Creating PNG images ...\n";

    # define color schemes
    my $col_f = 'silver'; # String +
    my $col_r = 'gold'; # String -
    my $col_t = 'black'; # Text color
    my $col_l = 'navy'; # Lines
    my $col_b = 'red'; # Bold

    foreach my $c (keys %contigs) {
        open G, ">$c.png" or die "cannot write $c.png\n";
        my $l = $contigs_len{$c};
        # Number of sequences to plot
        my $nseq = 2; # Base sequence (contig) and the ruler
        foreach my $s (keys %{ $contigs{$c} }) { $nseq++; }

        # Create image
        my $im = new GD::Image($l + 20, $nseq * 20);
        my $x = 10;
        my $y = 5;

        # Allocate some colors (the first one becomes the background)
        my %colors= ();
        $colors{'white' } = $im->colorAllocate(255,255,255);
        $colors{'black' } = $im->colorAllocate( 0, 0, 0);
        $colors{'red' } = $im->colorAllocate(255, 0, 0);
        $colors{'blue' } = $im->colorAllocate( 0, 0,255);
        $colors{'grey' } = $im->colorAllocate(190,190,190);
        $colors{'silver' } = $im->colorAllocate(191,191,191);
        $colors{'maroon' } = $im->colorAllocate(128, 0, 0);
        $colors{'purple' } = $im->colorAllocate(128, 0,128);
        $colors{'green' } = $im->colorAllocate( 0,128, 0);
        $colors{'lime' } = $im->colorAllocate( 0,255, 0);
        $colors{'olive' } = $im->colorAllocate(107,142, 35);
        $colors{'yellow' } = $im->colorAllocate(255,255, 0);
        $colors{'gold' } = $im->colorAllocate(255,215, 0);
        $colors{'orange' } = $im->colorAllocate(255,127, 0);
        $colors{'navy' } = $im->colorAllocate( 0, 0,128);
        $colors{'teal' } = $im->colorAllocate( 0,128,128);
        $colors{'aqua' } = $im->colorAllocate( 0,255,255);

        # Print the ruler, with a tick and a label every 100 bases
        $im->line($x, $y, $x + $l, $y, $colors{$col_l});
        for (my $i = $x; $i <= $x + $l ; $i += 100) {
            $im->line($i, $y - 3, $i, $y + 3, $colors{$col_l});
            $im->string(gdTinyFont, $i + 1, $y, $i - 10, $colors{$col_b});
        }
        $y += 20;

        # Print the contig
        $im->filledRectangle($x, $y, $x + $l, $y + 12, $colors{$col_f});
        $im->string(gdTinyFont, $x+5, $y, "$c($l)", $colors{$col_t});
        my $arrow = new GD::Polygon;
        $arrow->addPt($x + $l, $y - 3);
        $arrow->addPt($x + $l + 5, $y + 6);
        $arrow->addPt($x + $l, $y + 15);
        $arrow->addPt($x + $l, $y - 3);
        $im->filledPolygon($arrow, $colors{$col_f});

        # Add each sequence
        foreach my $s (keys %{ $contigs{$c} }) {
            $y += 20;
            my $i = $contigs{$c}{$s}{'ini'};
            my $e = $contigs{$c}{$s}{'end'};
            my $d = $contigs{$c}{$s}{'dir'};
            my $n = $ids{$s}{'name'}; $n =~ s/>//g;

            # define the color
            my $col = $col_f;
            $col = $col_r if ($d eq '-');

            # Main block
            $im->filledRectangle($i, $y, $e, $y + 12, $colors{$col});

            # Create the arrow tip
            my $arw = new GD::Polygon;
            if ($d eq '+') { # tip at the end, pointing right
                $arw->addPt($e, $y - 3);
                $arw->addPt($e + 5, $y + 6);
                $arw->addPt($e, $y + 15);
                $arw->addPt($e, $y - 3);
            }
            else { # tip at the start, pointing left
                $arw->addPt($i, $y - 3);
                $arw->addPt($i - 5, $y + 6);
                $arw->addPt($i, $y + 15);
                $arw->addPt($i, $y - 3);
            }
            $im->filledPolygon($arw, $colors{$col});

            # add the name
            my $flen = $ids{$s}{'len'};
            my $rmat = $e - $i + 1;
            $im->string(gdTinyFont, $i+5, $y, "$n($rmat/$flen)", $colors{$col_t});
        }

        # make sure we are writing to a binary stream
        binmode G;
        # Convert the image to PNG and print it
        print G $im->png;
        close G;
    }
}

=head2 usage

Print help if no values are given or if help is requested

=cut

sub usage {
print <<__HELP__;
Usage: $0 -f FASTA [-plot] [-gff] [-add FASTA]

-help | -h Print this screen
-file | -f Fasta file to use
-add | -a Add the fasta file with blast hits
-plot | -p Create PNG images for each contig
-gff | -g Create GFF3 files for each contig
__HELP__
exit 1;
}

=head1 AUTHOR

Juan Caballero

=head1 CONTACT

linxe __a__

=head1 LICENSE

Perl Artistic License v2.0
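
As a side note on the report format, here is a short Python sketch of the GFF3 lines written for one contig. It mirrors the columns used by createGFF above (source "assembly", type "block"), with one deliberate difference that is my assumption: the ninth column is written as an ID=... attribute, since GFF3 expects tag=value pairs there. The gff3_lines function and the sample data are invented for illustration.

```python
def gff3_lines(contig, length, members):
    """Build GFF3 lines for one contig.

    members maps a sequence ID to a (start, end, strand) tuple.
    """
    lines = ["##gff-version 3", f"##sequence-region {contig} 1 {length}"]
    for name, (start, end, strand) in sorted(members.items()):
        columns = [contig, "assembly", "block", str(start), str(end),
                   ".", strand, ".", f"ID={name}"]
        lines.append("\t".join(columns))
    return lines

members = {"seq_1": (1, 500, "+"), "seq_2": (400, 800, "-")}
for line in gff3_lines("contig_1", 800, members):
    print(line)
```

Each feature line has nine tab-separated columns, with "." for the unused score and phase fields.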


Friday, October 31, 2008

Slower with age ... 2nd. part

Phoronix has prepared a second test, this time benchmarking Fedora 7 to 10 and comparing the results with the big U. In my opinion Fedora has become slower in some areas, but some results show big problems with the U, like this:

So, improving user-friendliness shouldn't mean needing better hardware with every release.

Testing many versions of the same distro can show its evolution. Anyway, I'm happy with my Mandriva 2009.

Tuesday, October 28, 2008


The other day I needed to download some sequences from NCBI GenBank. I had a list of IDs; typically I use BioPerl to connect and get the FASTA sequence of each one, with code like this:

#!/usr/bin/perl -w
use strict;
use Bio::DB::GenBank;
my $gb = new Bio::DB::GenBank;
open F, "genes_list" or die "cannot open genes_list\n";
while (<F>) {
    chomp; # remove the newline before using the line as an ID
    my $seq = $gb->get_Seq_by_id($_);
    print $seq->seq, "\n";
}
close F;

But Broadcast, running Mac OS X 10.5.x, cannot compile the BioPerl modules (I found many problems compiling from source, because many dependencies are broken). So I took a look at BioPython, installed it (with some warnings and missing optional packages), and used the following script:

from Bio import Entrez
Entrez.email = "" # Always tell NCBI who you are
f = open("genes_list", "r")
while True:
    myid = f.readline()
    if not myid: break
    # strip the newline so only the ID is sent
    handle = Entrez.efetch(db="nucleotide", id=myid.strip(), rettype="fasta")
    print(handle.read())
f.close()

and this works well ...

I know this is not very efficient because it requires one call per ID, but sometimes that is better, for example when you are downloading large sequences or a great many of them.
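
If the one-call-per-ID overhead matters, the IDs can be grouped, since Entrez.efetch accepts a comma-separated list of IDs. The sketch below is untested against NCBI: chunk_ids and fetch_fasta are illustrative names, the batch size of 100 is an arbitrary assumption, and the IDs in the example are placeholders.

```python
def chunk_ids(ids, size=100):
    """Split a list of IDs into comma-separated batches of at most `size`."""
    clean = [i.strip() for i in ids if i.strip()]
    return [",".join(clean[n:n + size]) for n in range(0, len(clean), size)]

def fetch_fasta(ids, email, size=100):
    """Yield FASTA text, one Entrez request per batch (needs Biopython and a network)."""
    from Bio import Entrez  # imported here so chunk_ids works without Biopython
    Entrez.email = email
    for batch in chunk_ids(ids, size):
        handle = Entrez.efetch(db="nucleotide", id=batch, rettype="fasta", retmode="text")
        yield handle.read()
        handle.close()

# Placeholder IDs, as they would come from reading genes_list line by line
print(chunk_ids(["ID_1\n", "ID_2\n", "\n", "ID_3\n"], size=2))
# ['ID_1,ID_2', 'ID_3']
```

Smaller batches keep each response manageable when the sequences are large, which is the trade-off mentioned above.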

My two recommendations:
1. Master one programming language but be familiar with others. Better still if you master more than one.
2. Don't waste time when you know there is more than one solution: if one fails, try the next option.

Monday, October 27, 2008

Slower with age ...

I will not comment on this article, since I still want to keep my U-emo-friends, so please read the original at Phoronix: Ubuntu 7.04 to 8.10 Benchmarks: Is Ubuntu Getting Slower? Yes, but it still looks nice in brown/gold ;)

Friday, October 24, 2008

I, computer

Are my computers happier?
Source: Abstruse Goose

Thursday, October 23, 2008

X-rays with a sticky tape

This week one of the most famous scientific journals published this report:

Camara CG, Escobar JV, Hird JR, Putterman SJ, "Correlation between nanosecond X-ray flashes and stick–slip friction in peeling tape.", Nature 455, 1089-1092 (23 October 2008) | doi:10.1038/nature07378

Relative motion between two contacting surfaces can produce visible light, called triboluminescence. This concentration of diffuse mechanical energy into electromagnetic radiation has previously been observed to extend even to X-ray energies. Here we report that peeling common adhesive tape in a moderate vacuum produces radio and visible emission, along with nanosecond, 100-mW X-ray pulses that are correlated with stick–slip peeling events. For the observed 15-keV peak in X-ray energy, various models give a competing picture of the discharge process, with the length of the gap between the separating faces of the tape being 30 or 300 μm at the moment of emission. The intensity of X-ray triboluminescence allowed us to use it as a source for X-ray imaging. The limits on energies and flash widths that can be achieved are beyond current theories of tribology.

If you peel tape in a vacuum, it emits X-rays that can be detected with a Geiger counter or photographic film.

Please watch the video

Wednesday, October 22, 2008

Björk teaches you about electronics


Tuesday, October 21, 2008

Bioinformatics Career Survey 2008

Bioinformatics Zen has released the results of the Bioinformatics Career Survey 2008 as a text file. The survey includes data from ~650 people from academia and industry. It's interesting to take a look at the data, so I summarized it in some graphics:



Bioinformatics area

Computer Language

Wednesday, October 15, 2008

Perl BioGolf

Do you know what a Perl Golf problem is? A general problem is posed and you try to solve it with the minimum number of characters in a Perl script; whoever writes the least wins. Sometimes it's a good habit to look at, admire and think about these beautiful pearls. There are usually a lot of them on the Perl Monks website.

Today I was looking for a simpler and more effective subroutine to translate a DNA/RNA sequence into the corresponding peptide using the standard genetic code. Until now I used the typical solution: a hash storing the code, with the sequence read in blocks of three with substr or pop/shift.
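
The typical hash-based approach can be sketched as follows. I use Python here only for illustration (the golf solutions in this post are Perl). The table is the standard genetic code; the translate name, the '.' for stop codons (as in the first golf solution) and the 'X' for unknown codons are my own choices.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('.' = stop)
AMINO = "FFLLSSSSYY..CC.WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(seq):
    """Translate a DNA or RNA sequence in frame 1; incomplete codons are dropped."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODE.get(codon, "X") for codon in codons)

print(translate("ATGGCCTGA"))   # MA.
print(translate("auggccuga"))   # MA.
```

It is far from golf-sized, but it is the readable baseline the short solutions compete against.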

I found these solutions in a Perl Golf challenge:

# Typical solution hashing the codes:
sub f0 { #by tadman
my %g = (
# . - Stop
# A - Alanine
# C - Cysteine
# D - Aspartic Acid
# E - Glutamic Acid
# F - Phenylalanine
# G - Glycine
# H - Histidine
# I - Isoleucine
# K - Lysine
# L - Leucine
# M - Methionine
# N - Asparagine
# P - Proline
# Q - Glutamine
# R - Arginine
# S - Serine
# T - Threonine
# V - Valine
# W - Tryptophan
# Y - Tyrosine

# Second solution using the non-specific code.
sub f2{ #by MeowChow
my @r = qw(UA[AG]|UGA GC. - UG[UC] GA[UC] GA[AG] UU[UC] GG. CA[UC] AU[^G] - AA[AG] CU.|UU[AG] AUG AA[UC] - CC. CA[AG] CG.|AG[AG] UC.|AG[UC] AC. - GU. UGG - UA[UC] ^);
((my$t=pop)=~s|..?.?|chr 64+(grep$&=~/$r[$_]/,0..26)[0]|eg);$t=~y/@Z/./d;

# Third solution including regex and substitutions
sub f3 { #by no_slogan
s/[a-z]/uc$&x4/eg;@x=/./g;join"",@x[map{$x=0;$x=$x*4|6&ord for/./g;$x/2}pop=~/.../g]

# Fourth solution similar to 3rd.
sub f4 { #by srawls
join"",(/./g)[map{$x=0;$x=$x*4|6&ord for/./g;$x/2}pop=~/.../g]

# Fifth solution inverse from 3rd and 4th
sub f5 { #by tachyon

# Sixth solution and this is the fastest solution
sub f5{ #by tadman
All the solutions originally take fewer bytes, but I added some line breaks to present clearer code (really?).

I use the last solution; I just changed the code to ATGC (DNA) instead of AUGC (RNA).

That's why Perl rules in bioinformatics.

Monday, October 13, 2008

Mandriva 2009

Last week Mandriva released the 2009 version. Because I'm a Mandriva fan, I immediately downloaded the One-KDE ISO, burned it and installed it on my old HP laptop (PIII 1 GHz, 256 MB RAM, 20 GB HD, Wi-Fi card).

The LiveCD ran perfectly, showed me the new KDE4 desktop and made a clean install without problems (many other Linux LiveCDs have trouble just booting on old hardware like this). I like to use a separate partition for /home, so my partition table looks like:
  • / 5GB
  • swap - 512 MB
  • /geexbox - 100MB
  • /home - rest
Yes, I want to install GeeXboX; it's great for watching movies.

Some good points are the new design, the fast boot, the best hardware detection and many friendly menus to configure everything. The improved URPMI is remarkable: it is fast, it now supports simultaneous package downloads, and the best part is the --auto-orphans option, which checks for unused or broken packages and suggests uninstalling them, cleaning the system, even the kernel, by removing unused drivers or modules. I used to do this manually; now it is automatic.

KDE is a little heavy for this laptop, so I installed XFCE and LXDE as alternatives.

sudo urpmi task-xfce
sudo urpmi task-lxde

A bad point is that I need to wait for the x86_64 version to upgrade my other laptop, which has an AMD Athlon 64 X2.

My desktop:

Download Mandriva:
Notes of the release:
A tour of the release:

Wednesday, October 8, 2008

Rules for BioComputing Happiness

Inspired by the article "al3x's rules for computing happiness" by Alex Payne, I want to extend this theme to my areas: bioinformatics, computational biology and systems biology.


  1. Use as little software as possible.
  2. Use software that does one thing well.
  3. Do not use software that does many things poorly.
  4. Try to understand how a piece of software works before using it.
  5. Do not use web applications that should be desktop applications.
  6. Do not use desktop applications that should be web applications.
  7. Do not use software that isn't made specifically for your operating system.
  8. Use a plain text editor that you know well. Not a word processor, a plain text editor.
  9. Do not use your text editor for tasks other than editing text.
  10. Do not use software that's unmaintained.
  11. Do not use unpublished software.
  12. Try to use Open Source code.
  13. Keep in touch with the developers or users through forums, mailing lists, ...
  14. If you don't have a formal IT department, learn to maintain your systems.


  1. Some basic analyses do not require powerful computers; you can run many of them locally or through web services. But if you are in a big project or the data is on the order of GBs or more, consider buying a multicore server or building a Linux cluster.
  2. Use a Mac/Linux for personal computing or development, Windows ... don't waste your time.
  3. Use Linux or BSD on commodity hardware for server computing.
  4. The only peripheral you absolutely need is a hard disk or network drive to put backups on, but get one as big as possible.
  5. Buy as large an external display as you can afford if you'll be working on the computer for more than three hours at a time.
  6. If you'll work with DBs locally, be sure you have an appropriate internet connection.

File Formats

  1. Keep as much as possible in plain text. Not Word or Excel documents, plain text.
  2. For tasks that plain text doesn't fit, store documents in an open standard file format if possible.


  1. Learn and master a programming language (C, Perl, Python, Ruby, Java, Bash), but be open to others.
  2. Learn to use the terminal and its commands; don't be afraid.
  3. Comment your code and try to update it regularly.
  4. Automate tasks so they can be re-run after a DB update.
  5. Debug your code and try to be modular in your designs.
  6. Remember your objectives; don't waste time solving common tasks or reinventing the wheel.
  7. First check that everything works well; put a pretty interface on it later.
Finally, be prepared for parasites or people who don't believe in bioinformatics.

Tuesday, October 7, 2008

Phishing with Free Software

The other day I received this email. Fortunately the spam filter caught it, but the content is different from the usual. I have received similar emails before for proprietary software, especially the MS Office suite, which is expensive, so many people want a cheap (and illegal) version. But this time the bait is a Free Software office suite, OpenOffice, which will release its new version 3.0 in a few days.

This is the infamous message:

From Suite 2009
Date October 6th 2008 18:55
Subject Download Open Office 2009

Open Office Suite 2009
Open, Create & Edit Your Files

Download Office Suite 2009 Here

Edit Word, Excel & Power Point files- 100% MS Office Compatible.
Read and write PDF files just like Adobe.

Here's how to download Open Office 2009:

1. Go to: Download Page
2. Download Open Office 2009
3. Receive access immediately

This software package is the best way to edit your documents.
Publish all of your documents online in the HTML format.

Thank you for choosing us, the worldwide leader in Open Office 2009.

For More Information Visit our Website

Thank You,

David Matthews
Office Solutions

If you want to stop receiving mail, please go to:

or you may contact us at the following address:

Plaza Neptuno, local #7
Via ricardo J Alfaro, Tumba Muerto
Panama Ciudad
Republica de Panama
Of course the message includes some links that call a PHP document and pass it the email account (this is enough to confirm a valid address and add it to a spam list).
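
A quick way to spot this trick is to check whether a link carries your own address in its query string. The Python sketch below is only an illustration: the leaks_address function is my own name for it, and the example.com URLs are placeholders, not the links from this message.

```python
from urllib.parse import urlsplit, parse_qsl

def leaks_address(url, address):
    """True if the recipient's address appears in any query-string value of the link."""
    query = urlsplit(url).query
    needle = address.lower()
    # parse_qsl also decodes percent-encoding, so 'me%40example.org' is caught
    values = [value.lower() for _, value in parse_qsl(query, keep_blank_values=True)]
    return any(needle in value for value in values)

print(leaks_address("http://example.com/track.php?u=me%40example.org&c=7", "me@example.org"))  # True
print(leaks_address("http://example.com/download.html", "me@example.org"))                     # False
```

A plain check like this misses addresses hidden behind an opaque token, so treat a negative result with caution.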

If you want OpenOffice just go to the official website:

Friday, October 3, 2008

More Firefox add-ons

Today I installed 2 more add-ons; both improve my Gmail accounts.

1. Better Gmail 2. This utility tweaks (pimps, probably) the normal look and feel of Gmail: attachments get icons describing their content, the message under the pointer is highlighted, and more options are available. An excellent job by Gina Trapani from

2. Gmail S/MIME. Talking about privacy, this add-on lets us sign and encrypt messages. A must-have for everyone: you never know who's watching up in the cloud.

Tuesday, September 30, 2008


The Defense Advanced Research Projects Agency (DARPA) is offering support for solving 23 mathematical problems. You can read the full rules here (sorry, it's a DOC). These are the problems they want to solve:

  1. The Mathematics of the Brain. Develop a mathematical theory to build a functional model of the brain that is mathematically consistent and predictive rather than merely biologically inspired.
  2. The Dynamics of Networks. Develop the high-dimensional mathematics needed to accurately model and predict behavior in large-scale distributed networks that evolve over time occurring in communication, biology and the social sciences.
  3. Capture and Harness Stochasticity in Nature. Address Mumford’s call for new mathematics for the 21st century. Develop methods that capture persistence in stochastic environments.
  4. 21st Century Fluids. Classical fluid dynamics and the Navier-Stokes Equation were extraordinarily successful in obtaining quantitative understanding of shock waves, turbulence and solitons, but new methods are needed to tackle complex fluids such as foams, suspensions, gels and liquid crystals.
  5. Biological Quantum Field Theory. Quantum and statistical methods have had great success modeling virus evolution. Can such techniques be used to model more complex systems such as bacteria? Can these techniques be used to control pathogen evolution?
  6. Computational Duality. Duality in mathematics has been a profound tool for theoretical understanding. Can it be extended to develop principled computational techniques where duality and geometry are the basis for novel algorithms?
  7. Occam’s Razor in Many Dimensions. As data collection increases can we “do more with less” by finding lower bounds for sensing complexity in systems? This is related to questions about entropy maximization algorithms.
  8. Beyond Convex Optimization. Can linear algebra be replaced by algebraic geometry in a systematic way?
  9. What are the Physical Consequences of Perelman’s Proof of Thurston’s Geometrization Theorem? Can profound theoretical advances in understanding three dimensions be applied to construct and manipulate structures across scales to fabricate novel materials?
  10. Algorithmic Origami and Biology. Build a stronger mathematical theory for isometric and rigid embedding that can give insight into protein folding.
  11. Optimal Nanostructures. Develop new mathematics for constructing optimal globally symmetric structures by following simple local rules via the process of nanoscale self-assembly.
  12. The Mathematics of Quantum Computing, Algorithms, and Entanglement. In the last century we learned how quantum phenomena shape our world. In the coming century we need to develop the mathematics required to control the quantum world.
  13. Creating a Game Theory that Scales. What new scalable mathematics is needed to replace the traditional Partial Differential Equations (PDE) approach to differential games?
  14. An Information Theory for Virus Evolution. Can Shannon’s theory shed light on this fundamental area of biology?
  15. The Geometry of Genome Space. What notion of distance is needed to incorporate biological utility?
  16. What are the Symmetries and Action Principles for Biology? Extend our understanding of symmetries and action principles in biology along the lines of classical thermodynamics, to include important biological concepts such as robustness, modularity, evolvability and variability.
  17. Geometric Langlands and Quantum Physics. How does the Langlands program, which originated in number theory and representation theory, explain the fundamental symmetries of physics? And vice versa?
  18. Arithmetic Langlands, Topology, and Geometry. What is the role of homotopy theory in the classical, geometric, and quantum Langlands programs?
  19. Settle the Riemann Hypothesis. The Holy Grail of number theory.
  20. Computation at Scale. How can we develop asymptotics for a world with massively many degrees of freedom?
  21. Settle the Hodge Conjecture. This conjecture in algebraic geometry is a metaphor for transforming transcendental computations into algebraic ones.
  22. Settle the Smooth Poincare Conjecture in Dimension 4. What are the implications for space-time and cosmology? And might the answer unlock the secret of “dark energy”?
  23. What are the Fundamental Laws of Biology? This question will remain front and center for the next 100 years. DARPA places this challenge last as finding these laws will undoubtedly require the mathematics developed in answering several of the questions listed above.
So 9 of the 23 problems involve biology (marked in blue). I think we'll see a new age in theoretical biology in the coming years, including bioinformatics, computational biology, evolution and systems biology.

Image from:

Friday, September 26, 2008

Learning in the bus

Today on the bus, a girl near me was reading from a small stack of cards. Each card had a hole on one side and they were held together by a key ring, a simple and effective way to keep notes. The notes were chemical formulas, the amino acid charts: on one side was the formula, and on the reverse the name, the 3-letter abbreviation and the polar class.

Yes, I remember learning all this information in college. I also built some molecules with my atomic kit to see the structure, and now I just see the ACGT codes, missing the 3D beauty.

Amino Acid      Short  Abbrev.  Side chain          Polar  pH
Alanine         A      Ala      -CH3                -      -
Cysteine        C      Cys      -CH2SH              -      acidic
Aspartic acid   D      Asp      -CH2COOH            X      acidic
Glutamic acid   E      Glu      -CH2CH2COOH         X      acidic
Phenylalanine   F      Phe      -CH2C6H5            -      -
Glycine         G      Gly      -H                  -      -
Histidine       H      His      -CH2-C3H3N2         X      weak basic
Isoleucine      I      Ile      -CH(CH3)CH2CH3      -      -
Lysine          K      Lys      -(CH2)4NH2          X      basic
Leucine         L      Leu      -CH2CH(CH3)2        -      -
Methionine      M      Met      -CH2CH2SCH3         -      -
Asparagine      N      Asn      -CH2CONH2           X      -
Pyrrolysine     O      Pyl
Proline         P      Pro      -CH2CH2CH2-         -      -
Glutamine       Q      Gln      -CH2CH2CONH2        X      -
Arginine        R      Arg      -(CH2)3NH-C(NH)NH2  X      strongly basic
Serine          S      Ser      -CH2OH              X      -
Threonine       T      Thr      -CH(OH)CH3          X      weak acidic
Selenocysteine  U      Sec      -CH2SeH             -      -
Valine          V      Val      -CH(CH3)2           -      -
Tryptophan      W      Trp      -CH2C8H6N           -      -
Tyrosine        Y      Tyr      -CH2-C6H4OH         X      -

Thursday, September 25, 2008

Firefox add-ons

Every day I use the best browser in the world, Mozilla Firefox, on all my computers (1 laptop dual-booting Vista/Mandriva, another old laptop with Puppy Linux and 1 Mac with Leopard). Some "cannot-live-without" add-ons I commonly install for Firefox are:

As long as Google Chrome doesn't support tools like these (or offer native Linux/Mac versions), I'm unlikely to leave the Fox.

Monday, September 22, 2008

killer tux

Killer Tux from Cenek Strichel on Vimeo.

Thanks to Xbit for the link.

man baby


BABY - create new process from two parent processes

BABY sex [ name ]

/usr/5bin/BABY [ -sex ] [ -name ]

The System V version of this command is available with the
System V software installation option. Refer to Installing
SunOS 4.1 for information on how to install and invoke BABY.

BABY is initiated when one parent process polls another server
process through a socket connection (BSD) or through pipes in the
system V implementation. BABY runs at a low priority for approximately
40 weeks then terminates with heavy system load. Most systems require
constant monitoring when BABY reaches its final stages of execution.

Older implementations of BABY required that the initiating
process not be present at the time of completion. In these versions
the initiating process is awakened and notified of the results upon
completion. Modern versions allow both parent processes to be active
during the final stages of BABY.

example% BABY -sex m -name fred


-sex   option indicating type of process created.

-name  process identification to be attached to the new process.

Successful execution of the BABY(1) results in new process
being created and named. Parent processes then typically
broadcast messages to all other processes informing them of their
new status in the system.

The SLEEP command may not work on either parent processes for some
time afterward, as new BABY processes constantly send interrupts
which must be handled by one or more parent.

BABY processes upon being created may frequently dump
in /tmp requiring /tmp to be cleaned out frequently by one
of the parent processes.

The original AT&T version was provided without instructions
regarding the created process, this remains in current implementations.

cigars(6) dump(5) cry(3)


FSF version of BABY where none of the authors will accept
responsibility for anything.


baby -sex F -name Samantha Lynn Garriques

Completed successfully at the St. Joseph Medical Center on
November 2, at 3:31 P.M. after about 20 hours of labour.
New Mom Lisa is doing fine and will come home in about 2 days.
More information can be gotten from Dad by e-mail or by
calling the new baby hotline @931-XXXX. Celebrations can
probably begin in about 18 years.

Sun Release 4.1 Last change: Just when I got home from the hospital.


Wednesday, September 10, 2008

Change of role

Now I'm the dad of a very cute girl; I'll be off-line for the next 2 weeks.

Thursday, August 28, 2008

Mapping big sequences

For my work I took some 100 kb slices from real chromosomes, but I forgot to put the name and coordinates of each sequence in the FASTA comment. Now I need that information for each one, so I tried to map them with BLAST, but on the server it dies with a weird "Segmentation fault" error (I formatted the full genome, ~3 Gb).

So I thought of a Perl solution, and this is the code:

#!/usr/bin/perl -w
use strict;

=head1 NAME


=head1 DESCRIPTION

Perl script to find and get the position of each sequence in a
multifasta file into a big sequence (a chromosome).

The comparison is direct, so it only works with sequences in the same
direction (typical 5' -> 3').

Output is a simple text file with the name of the sequence, the name
of the big sequence and the positions where it matches.

Note a record is mapped when the next fasta comment (defined with ">")
is read; the last sequence is mapped when the file ends, so no extra
">" line is needed at the end of the fasta file.

=cut

$ARGV[2] or die "usage: perl $0 CHR SEQ MAP\n";

# Global variables
my $chr_file = shift @ARGV; # Fasta file with chromosome
my $seq_file = shift @ARGV; # Fasta file with sequences
my $map_file = shift @ARGV; # Output file
my $seq      = '';          # Sequence to map
my $name     = '';          # Fasta comment
my $chr      = '';          # Genomic sequence

print "Reading genome sequences ...\n";
open C, "$chr_file" or die "Cannot open $chr_file\n";
while (<C>) {
    next if (/>/);
    chomp;                  # drop line breaks so a match can span lines
    $chr .= $_;
}
close C;
print "  loaded $chr_file with ", length $chr, " bases\n";

open F, "$seq_file" or die "Cannot open $seq_file\n";
open O, ">$map_file" or die "Cannot open $map_file\n";
print "Mapping sequences ...\n";
while (<F>) {
    chomp;
    if (/>/) {
        map_seq() if ($name ne ''); # map the previous record
        $name = $_;
        $seq  = '';
    }
    else {
        $seq .= $_;
    }
}
map_seq() if ($name ne '');         # map the last record
close F;
close O;
print "Done\n";

# Look for $seq in $chr and report every 0-based start position.
# A single regex match only fills @- with the offsets of one match,
# so we walk the chromosome with index() to get multiple matches.
sub map_seq {
    my @pos = ();
    my $p   = length($seq) ? index($chr, $seq) : -1;
    while ($p >= 0) {
        push @pos, $p;
        $p = index($chr, $seq, $p + 1);
    }
    if (@pos) {
        print "  $name mapped in $chr_file @pos\n";
        print O "$name\t$chr_file\t", join(':', @pos), "\n";
    }
    else {
        print "  $name not match in $chr_file\n";
    }
}

=head1 AUTHOR

Juan Caballero (linxe _at_

=head1 LICENSE

Perl Artistic License (
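
The script above only looks at the forward strand. As a minimal sketch of how the search could be extended, the snippet below collects every start position with index() and also tries the reverse complement. The helper names revcomp and find_all and the toy sequences are hypothetical, made up for illustration; they are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse complement of a DNA string (upper or lower case ACGT).
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;            # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # swap each base for its complement
    return $rc;
}

# Return every 0-based start position of $query inside $target.
sub find_all {
    my ($target, $query) = @_;
    my @hits;
    return @hits if $query eq '';     # an empty query would never advance
    my $pos = index($target, $query);
    while ($pos >= 0) {
        push @hits, $pos;
        $pos = index($target, $query, $pos + 1);  # +1 keeps overlapping hits
    }
    return @hits;
}

# Toy data (hypothetical, not a real chromosome).
my $chr = 'AAACGTTTACGTAAACGT';
my @fwd = find_all($chr, 'ACGT');
my @rev = find_all($chr, revcomp('AAAC'));
print "forward: @fwd\n";
print "reverse: @rev\n";
```

With real data the same two calls would run once per sequence, writing the forward and reverse positions to separate columns of the map file.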


Wednesday, August 27, 2008

MS Virus in space

If you missed the BSOD at the Olympic Games, today another Windows low point hit the press: a computer virus was found on some laptops on the ISS, specifically the virus known as Gammima.AG.

Apparently the source was a flash memory card from a digital camera; it infected the laptop and tried to infect other computers. Obviously NASA filters every transmission with the ISS and the virus is isolated, but the laptop doesn't have an antivirus. The laptops are used to record nutritional data for the astronauts and as personal computers to send email to Earth, nothing special.

NASA is very paranoid about the materials it sends to space; no dust or particles are allowed, but they need to start checking the computers and other IT devices too.

Or maybe this is revenge from the aliens from "Independence Day"? You know, Jeff Goldblum infected them using a custom virus from his Mac.


PS: Linux isn't completely secure these days either; server attacks are on the rise. Do you remember the Debian problem with the random routines used for certificates? Some people are using this vulnerability to gain a root shell and own the system. Watch out, users of Debian and its derivatives (Memphis, Ubuntu, ...).


Friday, August 22, 2008

Motivational posters

From :

Friday, August 15, 2008

From Ceyusa's blog:

"A Pythagorean triplet is a set of three natural numbers, a < b < c, for which a² + b² = c²

For example, 3² + 4² = 9 + 16 = 25 = 5².

There exists exactly one Pythagorean triplet for which a + b + c = 1000.

Find the product abc."

My Perl solution (OK, it isn't functional programming, but it works):

#!/usr/bin/perl -w
use strict;
my @n = ( 1 .. 1000 );
foreach my $a ( @n ) {
    my $a2 = $a * $a;
    foreach my $b ( @n ) {
        next unless ( $a < $b );    # report the triplet only once
        my $b2 = $b * $b;
        my $c = sqrt ( $a2 + $b2 );

        next unless ( ($a + $b + $c) == 1000 );
        next unless ( ($c / int $c ) == 1 );    # c must be an integer

        print "Solution: a = $a b = $b c = $c\n";
        print "Product: ", $a * $b * $c, "\n";
    }
}
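
A possible variation, sketched here under my own assumptions rather than taken from the post: since a + b + c is fixed, c can be computed as 1000 - a - b, which avoids sqrt and the floating-point integer check. The function name triplets_with_sum is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Find triplets a < b < c with a + b + c = $sum and a^2 + b^2 = c^2.
# Returns a list of [a, b, c] array references.
sub triplets_with_sum {
    my ($sum) = @_;
    my @found;
    for my $x ( 1 .. int( $sum / 3 ) ) {                    # a is the smallest side
        for my $y ( $x + 1 .. int( ( $sum - $x ) / 2 ) ) {  # b stays below c
            my $z = $sum - $x - $y;                         # c is fixed by the sum
            next unless $z > $y;
            push @found, [ $x, $y, $z ] if $x * $x + $y * $y == $z * $z;
        }
    }
    return @found;
}

for my $t ( triplets_with_sum(1000) ) {
    my ( $x, $y, $z ) = @$t;
    print "Solution: a = $x b = $y c = $z product = ", $x * $y * $z, "\n";
}
```

Everything stays in integer arithmetic, and the loop bounds already enforce a < b < c, so each triplet is reported once.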

Wednesday, August 13, 2008

New paradigms in supercomputing

A first law of building large systems for High Performance Computing is that more processors are better. For many years processor development was dominated by the x86 family, which beat even the PowerPC (more on cost than on performance, in my opinion), and new families emerged, like x86_64 and the multicore chips.

Besides the advances in CPU technology, another part of the computer has acquired a new task: the graphics card. In the early stages of computer evolution, its only function was to connect and control the visual output between the system and the monitor. Later, computer games demanded more intensive use of the GPU, mainly for 3D environments, and new technologies were created to satisfy the demand. Today a high-performance graphics card is like a small but very efficient version of a complete system: it includes one or more specialized GPUs and its own memory, and it can share information very fast with the host system or with other graphics cards (SLI).

How can we tap this computing power? Apart from video games, there are other problems that require graphical processing or similar calculations (vector maths), and the GPU is a processor specialized for these problems. Nvidia has released a C extension named CUDA, so now you can write or adapt your code to use the power of the GPU and build really impressive systems like this GPU2 Farm, or buy a Tesla.

The new paradigm makes us think about how to code to use this potential, even for non-3D problems.

I need to practice my C-fu.

Tuesday, August 12, 2008

The Perfect Linux

Ok, last time I judged the Ubuntu distribution pretty harshly; later I started to wonder whether there is a perfect Linux.

The first thing I believe in is Linux's versatility: with the freedom to evolve or change (fork) you can expect everything, so you can choose what to use as desktop, shell, editor, package system, ... Many distributions let you change the default installation, or simply try any program from the repositories or straight from the source code. The options are infinite.

I remember when you needed to decide the parts of your system: in the early days of many distributions the installer asked you which packages to include. Sometimes you ended up with an unstable system, but many times you got a personal system. Many problems could be solved by reading the manual or asking in the LUGs, but sometimes you had to wait until the next release for a programmer or hacker to fix it.

Later, the popularity of LiveCDs made it possible to install a full system that works fine, because many testers have debugged the system and the distribution tries to support as much hardware as possible. Now the user can add or remove packages, and she does not need to configure the system, except for a few simple questions. At this point the user can work and enjoy Linux.

Another point to discuss is release timing. The popular distributions are very active and release every 6 months (Ubuntu, Mandriva, etc.); other, wiser ones wait until the release is stable and functional (Debian, RedHat, Slackware, etc.); a third class doesn't believe in releases, and you can rebuild the distribution anytime (Gentoo, Arch, etc.). This is an important fact to consider, because each release has a lot of bugs.

Besides, many programs are debugged by the distribution's own people, but others are not directly related to it, and the number of available programs is very large, so the probability of something going wrong is high.

Finally, users are not all the same: many people love Gnome over KDE, others prefer WindowMaker, and others like just the command line. Everyone is right, because each has particular preferences; it's impossible to satisfy the whole world.

Conclusion: The Perfect Linux is the Linux you want, even if it has problems.

Saturday, August 9, 2008

Friday, August 8, 2008

I don't use Ubuntu

Many of my Linux friends know well that I don't like Ubuntu, yes, the "so popular and friendly Linux". This week I had to give some buddy support for this bad distro, yet these people are fascinated with it!

Basic problems include (in order of appearance):
  1. Compiling C code. Why doesn't Ubuntu configure the libraries needed for static linking properly? I tried the code in another Un*x environment and it works well, but in Ubuntu I had to delete the static flag and edit the headers in the source code.
  2. Executing a CGI script under Apache (I'm not 100% sure this applies to all Ubuntu systems, but I guess so). If the system installs an httpd.conf file with basic rules, you expect it to work; if it doesn't, I don't like having to read the manual to fix something that many other distros, like Mandriva, get perfectly right the first time.
  3. Installing updates and rebooting. Sorry for the comparison, but this is a Windows feature: why do you need to reboot after installing updates? I like to update when I consider it necessary, but I don't install every new kernel or patch that isn't important for security or better features. Besides, needing many updates to fix things is a bad sign of an immature release (Vista-like).
  4. Keyboard identification. Is it possible, these days, to have a distro where you can't configure your keyboard layout correctly? Bad.
  5. Unexpected crashes. Why does the sound system freeze a computer? Inexplicable.
So, what is the good part of using it if it takes you 5 years back in time?

Please, if you are considering Ubuntu as your Linux, first try a better Linux such as Mandriva, Fedora, OpenSuse or Debian, or ask an Ubuntu fan if you want to avoid my criticism.

Wednesday, August 6, 2008

Linux sizes

Ok, let's draw some conclusions from the image:
  1. The time span is 9 years; Linux was almost 9 years old in 1999.
  2. I don't know the numerical values of the samples.
  3. In 1999 the average is medium (M); in 2008 it is extra large (XL).
  4. I can think of one explanation: many Linux users started out as very active, skinny young people and later gained some weight.
  5. But really few people, like me, used Linux in 1999, and after the Ubuntu boom in 2005 many new users are converted every year. Are they bigger in size?
  6. The anti-MS commentary: if Windows crashes every 5 minutes, the BSOD makes you move around in frustration; Linux's stability keeps you static in front of the screen.
BTW, I've been an L since 1995.

Sunday, August 3, 2008

Large Hadron Collider

"The Large Hadron Collider (LHC), a 27 kilometers (17 miles) long particle accelerator straddling the border of Switzerland and France, is nearly set to begin its first particle beam tests. The European Organization for Nuclear Research (CERN) is preparing for its first small tests in early August, leading to a planned full-track test in September - and the first planned particle collisions before the end of the year. The final step before starting is the chilling of the entire collider to -271.25 C (-456.25 F)."

Here are a few amazing pictures; more on the website: