Thursday, July 31, 2008

Working with large sequences

I have used Perl for my projects for more than 6 years. When I worked with the Arabidopsis genome it was easy to load a full chromosome into a string variable, but now I mine bigger genomes, like human or mouse, and I had the habit of passing variables directly, or references to them, to subroutines. Bad idea.

The point is that when you use many subs to perform calculations inside the sequences, the memory use grows very fast. One of my scripts takes the sequences, divides them into blocks, performs a calculation on each block and stores the results in a hash (O-O programming). The first time I made a mistake and created an infinite loop that sucked up the entire memory, including the swap. Our server has 32 GB of RAM and 32 GB of swap; fortunately it survived my error. Then I fixed it, and the script required about 3 GB, which is still too much ...

After thinking a little I decided not to pass anything to the subs and to declare the main variables globally instead. This reduced the memory needed to 300 MB for the same process. Nice.
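A minimal sketch of that global-variable layout, assuming a GC count as the per-block calculation. The names $SEQ, gc_in_block and gc_per_block are hypothetical and not taken from the real script: the sequence lives in one package variable and the subs receive only coordinates, so the big string is never copied into a sub.

```perl
use strict;
use warnings;

# Package-level variable holding the whole sequence (hypothetical name).
our $SEQ = '';

# Hypothetical helper: count G/C bases in one block.
# Only coordinates are passed in; the sequence is read from $SEQ.
sub gc_in_block {
    my ($start, $size) = @_;
    my $block = substr($SEQ, $start, $size);   # copies one block only
    return scalar($block =~ tr/GCgc//);
}

# Hypothetical driver: walk the global sequence block by block.
sub gc_per_block {
    my ($size) = @_;
    my @counts;
    for (my $start = 0; $start < length $SEQ; $start += $size) {
        push @counts, gc_in_block($start, $size);
    }
    return @counts;
}

$SEQ = 'ACGTGGCCAATT';        # stand-in for a chromosome
my @gc = gc_per_block(4);
print "@gc\n";                # prints: 2 4 0
```

The only temporary copy here is the current block, so the extra memory should stay close to the block size rather than the chromosome size.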

I'm also learning MATLAB; later maybe I will compare it to R and other open-source math programs.

Wednesday, July 30, 2008

Thursday, July 24, 2008

Bioinfo tips

Yesterday I was pulled out of my coding state to answer a basic question: which is the best font to display a sequence alignment?

First, let me introduce the concept of sequence alignment. If you want to compare two or more sequences by their composition of nucleic bases (for DNA or RNA) or amino acids (for peptides), you must search for similar blocks across the whole sequences and fit the rest as well as possible. Many algorithms have been created to perform this task, with different approaches; if you want to know more, please go to Wikipedia or a bioinformatics book.

OK, let's say we have our alignment. It is really just a text file with a special structure that makes it easy to see the blocks found. Some people make the mistake of cutting and pasting it into a text document without changing the font. A font is just a set of images that represent letters or symbols; some are very artistic, but most fonts are proportional, so each character has a different width. The beautiful alignment loses its columns and visually you lose the blocks (they are still there, but no longer lined up).

So we need to decide which font is best, in this case for a web service that shows the information. Courier? Monospace? Neither.

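The best way to insert an alignment in an HTML file is to use the pre or code tags. Then you do not depend on a particular typeface, because you cannot expect the user's system to have your predefined font; the browser renders the block with its own fixed-width font, and pre also preserves the spaces and line breaks.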
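As a rough sketch of this idea in a Perl web script, the alignment text can be escaped and wrapped in a pre element before printing. The helper name alignment_to_html and the toy alignment are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a plain-text alignment in <pre> so the browser
# renders it with its own fixed-width font and keeps the columns aligned.
sub alignment_to_html {
    my ($alignment) = @_;
    # Escape the characters that are special in HTML.
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    $alignment =~ s/([&<>])/$entity{$1}/g;
    return "<pre>\n$alignment</pre>\n";
}

# Toy alignment, invented for this example.
my $alignment = <<'END';
seq1  ACGT-ACGTTAG
      |||| ||| |||
seq2  ACGTAACGATAG
END

print alignment_to_html($alignment);
```

No font name appears anywhere; the choice is left to the browser, which is the point of the tip.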

Remember the LaTeX principle: focus on the content; the appearance is a separate matter.

Finally, a Perl best practice for working with big sequences. Suppose you need to insert a sequence into a bigger one, where memory and time are precious. For this example, consider a sequence in $seq, an insert in $ins and the position in $pos.

BAD:

my $left  = substr ($seq, 0, $pos);
my $right = substr ($seq, $pos);
$seq = $left . $ins . $right;    # two temporary copies plus a new string


GOOD:

# zero-length lvalue substr: inserts $ins at $pos, in place
substr ($seq, $pos, 0) = $ins;


With big sequences you will notice the difference.
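To measure the difference on your own machine, something like the following sketch with the core Benchmark module could be used. The test data, the sub names and the iteration count are assumptions for the example, and the in-place variant uses the four-argument form of substr with a zero length. The timings depend on the Perl version and the hardware, so no numbers are given here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test data: a 1 Mb dummy sequence and a short insert.
my $seq = 'ACGT' x 250_000;
my $ins = 'NNNNNNNNNN';
my $pos = 500_000;

# Copying approach: two temporary halves plus a new string.
sub insert_copy {
    my $s = $seq;
    $s = substr($s, 0, $pos) . $ins . substr($s, $pos);
    return $s;
}

# In-place approach: zero-length substr replacement at $pos.
sub insert_inplace {
    my $s = $seq;
    substr($s, $pos, 0, $ins);
    return $s;
}

# Both approaches must produce the same sequence.
die "results differ\n" unless insert_copy() eq insert_inplace();

# Run each sub 200 times and print a comparison table.
cmpthese(200, {
    copy    => \&insert_copy,
    inplace => \&insert_inplace,
});
```

Both subs start by copying $seq so that every run works on a fresh string; that copy is included in both timings.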

Tuesday, July 22, 2008

War Games



Next July 24th a special event marks the 25th anniversary of the release of WarGames; some theaters will show the 1983 movie.

I noticed that a sequel was filmed; it will be available after the anniversary, but as a direct-to-DVD release. I'm curious to see how they updated the technical material, because now you have wireless networks instead of dial-up access, better security schemes to hide your network, strong encryption, and monitoring systems which are hard to break. OK, Hollywood generally doesn't show reality, but some movies are spectacular (or simply stupid) in how they show the computer world; sometimes it looks like magic ...

I miss the sound of a modem connecting to the internet, or a monochrome screen with plain text (well, not so much the latter, because sometimes my X11 crashes and I need to fix it in terminal mode).

$ logout

Friday, July 18, 2008

Impostor


Bioinformatic:
Which is the best blast E-value?

Thursday, July 17, 2008

When visuals go wrong ...

If you work in the biosciences, you have often seen a fabulous but meaningless image in a scientific paper: the typical error of representing data with misleading visuals. M. E. J. Newman named them "ridiculograms" (you can see the related video on YouTube).

A ridiculogram can be defined as:
  • Visually stunning
  • Scientifically worthless
  • Published in Nature or Science

After a group discussion and a not-so-good seminar about text mining for disease terms, some of us agreed that the last point is not required. I want to extend this concept to other parts of bioinformatics.

Many people work on the user interaction part. Novel technologies have large and heavy outputs and require tools to show the results, but many of these tools are redundant (yes, many people want desktop/web/network interfaces, or programs for Linux/Windows/Mac, or versions in Java/C#/Python/Perl), and while the developers think about "the user experience", they forget the purpose of the tool: to show the data in an interpretable way.

Besides, these endless datasets cannot be analyzed by a human. Who can read thousands of alignments just to say whether a genomic bank is well annotated? Or do you just give the information to an army of slaves/students to do the work of an automatic process? Maybe some bioinformatics tools are following endless paths.

I prefer to code real tools which can give a clue about the biological universe, even if they look ugly.

Wednesday, July 16, 2008

Web reaper


These days I have been working on web interfaces for some Perl scripts. The server is an 8-core Linux box running CentOS with Apache as the web server, but some tests didn't work well and some probably fell into infinite loops. The problem was that I could not kill these processes: I do not know the admin password and I do not have access to the web server account. The scripts are created in my home directory, but when they are activated the process is launched by the apache user. So the solution was a script which asks for the PID and sends a termination signal. Here is the code, maybe someone can use it:



#!/tools64/bin/perl -w
use strict;

=head1 NAME

reaperl.pl

=head1 DESCRIPTION

Web service to stop runaway processes created by Perl CGI scripts.
It simply presents a web form that asks for the PID and sends a termination signal.

=cut

use CGI::Pretty qw/:standard/;

print header;
print start_html(-title=>'reaperl.pl');
print start_form;
print p('Kill this process: ', textfield(-name=>'proc'));
print submit(), reset();
print end_form;

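if (param()) {
    my $proc = param('proc');
    $proc = '' unless defined $proc;
    $proc =~ s/\D//g;                      # keep digits only
    if ($proc ne '') {                     # never run kill without a PID
        my $res = `kill -15 $proc 2>&1`;   # capture kill's error messages too
        print p("Sent TERM to $proc: $res");
    }
}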
print end_html;

=head1 AUTHOR

Juan Caballero (linxe _at_ glib.org.mx)

=head1 LICENSE

Perl Artistic License (http://dev.perl.org/licenses/artistic.html)

=cut


Thanks to G for the idea.
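A possible variation, sketched here under my own assumptions and not part of the script above: validate the PID strictly and send the signal with Perl's built-in kill instead of calling the shell. The helper names clean_pid and terminate are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: accept only a plain positive integer as PID.
# Returns the PID, or undef when the input is not acceptable.
sub clean_pid {
    my ($input) = @_;
    return undef unless defined $input && $input =~ /^\s*(\d+)\s*$/;
    my $pid = $1 + 0;
    return $pid > 1 ? $pid : undef;    # never signal PID 0 or 1
}

# Hypothetical helper: send SIGTERM with Perl's built-in kill.
# Returns a short status message for the web page.
sub terminate {
    my ($input) = @_;
    my $pid = clean_pid($input);
    return 'Not a valid PID' unless defined $pid;
    return kill('TERM', $pid)
        ? "Sent TERM to $pid"
        : "Could not signal $pid: $!";
}

print terminate('12 34'), "\n";    # prints: Not a valid PID
```

Rejecting odd input instead of stripping the non-digits avoids turning a typo like "12 34" into PID 1234, and the built-in kill reports failure through its return value and $!.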

Wednesday, July 9, 2008

Metaverse

I suppose you have heard about Second Life or games like WoW. These sites are known as virtual worlds where you can be anyone, even yourself. A few days ago, Sun announced its solution for virtual meetings, named Project Wonderland, an evolution of their experiments with Looking Glass (I wrote an article about it for GLiB). This approach creates a virtual environment where people can meet as 3D avatars and talk (you can also be a weird floating ball when you are connected by phone).

Today Google launched Lively, a personal "world" (maybe more like a personal room). Lively is like Second Life but small and owned by you; the bad point is that it requires MS Vista/XP at the moment. The concept is good: take the good things in SL and apply them to a minimal space. With Google's support it can be big, and soon we'll have many variations or improvements.

When will we get close to the Metaverse described by Neal Stephenson in his cyberpunk novel Snow Crash?

By the way, Neal said there's no better Spock than the original. I'm not a Trekkie, but I agree.

Wednesday, July 2, 2008

Sad

Oh, life is bigger
It's bigger than you
And you are not me
The lengths that I will go to
The distance in your eyes
Oh no, I've said too much
I set it up

That's me in the corner
That's me in the spotlight, I'm
Losing my religion
Trying to keep up with you
And I don't know if I can do it
Oh no, I've said too much
I haven't said enough
I thought that I heard you laughing
I thought that I heard you sing
I think I thought I saw you try

Every whisper
Of every waking hour I'm
Choosing my confessions
Trying to keep an eye on you
Like a hurt lost and blinded fool, fool
Oh no, I've said too much
I set it up
Consider this
Consider this
The hint of the century
Consider this
The slip that brought me
To my knees failed
What if all these fantasies
Come flailing around
Now I've said too much
I thought that I heard you laughing
I thought that I heard you sing
I think I thought I saw you try

But that was just a dream
That was just a dream

But that was just a dream
Try, cry, why try?
That was just a dream
Just a dream, just a dream
Dream

R.E.M.
Berry/Buck/Mills/Stipe

Yesterday I became the last JC.

Tuesday, July 1, 2008

Linux Powered

Which OS is dominating the SuperComputer world?