Thursday, August 28, 2008

Mapping big sequences

For my work I get some 100 kb slices from real chromosomes, but I forgot to record the name and coordinates of each sequence in the FASTA header. Now I need to know this information for each one, so I tried to map them using BLAST, but on the server a weird error appears, "Segmentation fault" (I formatted the full genome, ~3 Gb).

I thought of a Perl solution, and this is the code:

#!/usr/bin/perl -w
use strict;

=head1 NAME

Perl script to find the position of each sequence in a multifasta
file within a big sequence (a chromosome).

The comparison is direct, so it only works with sequences in the same
direction (the typical 5' -> 3').

Output is a simple text file with the name of the sequence, the name
of the big sequence and the positions where it matches.

Note we process a record every time we hit a FASTA header line
(defined with ">"), so the last sequence would be omitted; I add a
">" as the last line with:

echo ">" >> fasta

=cut

$ARGV[2] or die "usage: perl CHR SEQ MAP\n";

# Global variables
my ($chr_file) = shift @ARGV; # Fasta file with chromosome
my ($seq_file) = shift @ARGV; # Fasta file with sequences
my ($map_file) = shift @ARGV; # Output file
my ($seq)      = '';          # Sequence to map
my ($name)     = '';          # Fasta comment
my ($chr)      = '';          # Genomic sequence
my ($first)    = 'y';         # Flag for first line

print "Reading genome sequences ...\n";
open C, "$chr_file" or die "Cannot open $chr_file\n";
while (<C>) {
    next if (/>/);
    chomp;
    $chr .= $_;
}
close C;
print " loaded $chr_file with ", length $chr, " bases\n";

open F, "$seq_file" or die "Cannot open $seq_file\n";
open O, ">$map_file" or die "Cannot open $map_file\n";
print "Mapping sequences ...\n";
while (<F>) {
    chomp;
    if ($first eq 'y') {
        $name  = $_;
        $first = 'n';
    }
    elsif (/>/) {
        if ($chr =~ /$seq/) {
            print " $name mapped in $chr_file ";
            print O "$name\t$chr_file\t";

            # Rescan with /g to report every match position ($-[0] is
            # the 0-based start of the current match)
            while ($chr =~ /$seq/g) {
                print O "$-[0]:";
                print "$-[0] ";
            }
            print O "\n";
            print "\n";
        }
        else {
            print " $name does not match in $chr_file\n";
        }
        $name = $_;
        $seq  = '';
    }
    else {
        $seq .= $_;
    }
}
close F;
close O;
print "Done\n";
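The core of the script is scanning one long string for every occurrence of a shorter one, like the /g loop above. For anyone who wants to check that logic outside Perl, here is a minimal sketch of the same idea in Python (the sequences are made up for illustration; like Perl's //g, matches do not overlap):

```python
def map_positions(chromosome, query):
    """Return the 0-based start position of every exact, same-strand
    occurrence of query in chromosome (non-overlapping, like //g)."""
    positions = []
    start = chromosome.find(query)
    while start != -1:
        positions.append(start)
        # Resume the scan just past the previous match
        start = chromosome.find(query, start + len(query))
    return positions

# Toy example with an invented 14-base "chromosome":
print(map_positions("ACGTACGTGGACGT", "ACGT"))  # [0, 4, 10]
```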

=head1 AUTHOR

Juan Caballero (linxe _at_

=head1 LICENSE

Perl Artistic License (


Wednesday, August 27, 2008

MS Virus in space

If you missed the BSOD at the Olympic Games, today another Windows embarrassment takes the press: a computer virus was found on some laptops on the ISS, specifically the virus known as Gammima.AG.

Apparently the source was a flash memory card from a digital camera; it infected one laptop and tried to infect other computers. Obviously NASA filters every transmission with the ISS, so the virus is isolated, but the laptop didn't have an antivirus. The laptops are used to record nutritional data for the astronauts and as personal computers to send email to Earth, nothing special.

NASA is very paranoid about the materials it sends to space, no dust or particles are allowed, but they need to start checking the computers and other informatic devices too.

Or maybe it is revenge from the aliens of "Independence Day"? You know, Jeff Goldblum infected them using a custom virus from his Mac.


PS: These days Linux isn't secure either; there is an increase in server attacks. Do you remember the Debian problem with the random routines for certificates? Some people are using this vulnerability to gain a root shell and own the system. Be watchful, Debianists and derivatives (Mepis, Ubuntu, ...).


Friday, August 22, 2008

Motivational posters

From :

Friday, August 15, 2008

From Ceyusa's blog:

"A Pythagorean triplet is a set of three natural numbers, a < b < c, for which a^2 + b^2 = c^2.

For example, 3^2 + 4^2 = 9 + 16 = 25 = 5^2.

There exists exactly one Pythagorean triplet for which a + b + c = 1000.

Find the product abc."

My Perl solution (ok, it isn't functional programming, but it works):

#!/usr/bin/perl -w
use strict;
my @n = ( 1 .. 1000 );
foreach my $a ( @n ) {
    my $a2 = $a * $a;
    foreach my $b ( @n ) {
        my $b2 = $b * $b;
        my $c  = sqrt ( $a2 + $b2 );

        next unless ( ($a + $b + $c) == 1000 );
        next unless ( $c == int $c );

        print "Solution: a = $a b = $b c = $c\n";
    }
}
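A quicker variant of the same brute force, sketched here in Python rather than the post's Perl: since a + b + c = 1000 forces c = 1000 - a - b, the sqrt and the integrality check disappear entirely, and the ordering a < b < c bounds both loops:

```python
def euler9(total=1000):
    """Find the Pythagorean triplet a < b < c with a + b + c == total;
    returns (a, b, c, a*b*c), or None if no triplet exists."""
    for a in range(1, total // 3):              # a is the smallest term
        for b in range(a + 1, (total - a) // 2 + 1):
            c = total - a - b                   # c is forced by the sum
            if a * a + b * b == c * c:
                return a, b, c, a * b * c
    return None

print(euler9())  # (200, 375, 425, 31875000)
```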

Wednesday, August 13, 2008

New paradigms in supercomputing

A first law in building large systems for High Performance Computing is that more processors are better. For many years the development of processors was dominated by the x86 family, beating even the PowerPC (more on cost than on performance, in my opinion), and new families emerged, like x86_64 and the multicore chips.

Besides the CPU technology advances, another part of the computer has acquired a new task: the graphics card. In the early stages of computer evolution, its only function was to connect and control the visual output between the system and the monitor. Later, computer games demanded more intensive usage of the GPU, principally for 3D environments, and new technologies were created to satisfy the demand. Today a high-performance graphics card is like a small but very efficient version of a complete system, including one or more special GPUs and its own memory, and it can share information with the host system or other graphics cards very fast (SLI).

How do we harness this computing power? Because apart from video games there are other problems which require graphical processing or similar calculations (vector maths), and the GPU is a processor specifically built for them. Nvidia has released a C extension named CUDA, so now you can write or adapt your code to use the power of the GPU, and build really impressive systems like the GPU2 Farm, or buy a Tesla.

The new paradigm makes us think about how to code to exploit this potential, even for non-3D work.
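The style this paradigm rewards is data-parallel: one lightweight thread per array element, with no dependencies between them. A sketch in Python of the canonical vector-add, written the way a CUDA kernel would express it (the `launch` helper is a made-up serial stand-in for the real parallel launch, not the CUDA API):

```python
def vector_add_kernel(a, b, out, i):
    """Kernel body: 'thread' i computes one output element,
    independently of all the others."""
    out[i] = a[i] + b[i]

def launch(kernel, n, *args):
    """On a GPU these n invocations would run in parallel;
    here we just emulate the launch serially."""
    for i in range(n):
        kernel(*args, i)

a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]
out = [0.0] * len(a)
launch(vector_add_kernel, len(a), a, b, out)
print(out)  # [11.0, 22.0, 33.0]
```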

I need to practice my C-fu.

Tuesday, August 12, 2008

The Perfect Linux

Ok, last time I judged the Ubuntu distribution harshly; later I started to wonder whether there is a perfect Linux.

The first thing I believe in is Linux's versatility: with the freedom to evolve or change (fork) you can expect everything, so you can choose what to use as desktop, shell, editor, package system, ... Many distributions are open to changing the default installation, or simply allow you to try any program in the repositories, or just the code. The options are infinite.

I remember when you needed to decide the parts of your system: in the beginning of many distributions, the installer asked you which packages to include. Sometimes you obtained an unstable system, but many times you had a personal system. Many problems could be resolved by reading the manual or asking in the LUGs, but sometimes you needed to wait until the next release for a programmer or hacker to fix it.

Later, the popularity of LiveCDs allowed installing a full system which works fine, because many debuggers tested the system and the distribution tries to support as much hardware as possible. Now the user can add or remove packages, and she does not need to configure the system, except for a few simple questions. At this point the user can work and enjoy Linux.

Another point to discuss is release timing: the popular distributions are very active and release every 6 months (Ubuntu, Mandriva, etc.), others more wisely wait until the release is stable and functional (Debian, RedHat, Slackware, etc.), and a third class doesn't believe in releases at all, so you can rebuild the distribution anytime (Gentoo, Arch, etc.). This is an important fact to consider, because each release has a lot of bugs.

Besides, many programs are debugged by the people of the distribution, but others are not directly related, and the quantity of available programs is very large, so the probability of something going wrong is high.

Finally, the users are not all equal: many people love Gnome over KDE, others prefer WindowMaker, others like just the command line. Everyone is right, because each one has particular preferences; it is impossible to satisfy the whole world.

Conclusion: The Perfect Linux is the Linux you want, even if it has problems.

Saturday, August 9, 2008

Friday, August 8, 2008

I don't use Ubuntu

Many of my Linux friends know well that I don't like Ubuntu, yes, the "so popular and friendly" Linux. This week I did some buddy-support for this bad distro, but these people are fascinated by it!

Basic problems include (in order of appearance):
  1. Compiling C code. Why doesn't Ubuntu configure the dependent libraries well for static linking? I tried the code in another Un*x environment and it works well, but in Ubuntu I had to delete the static flag and edit the headers in the source code.
  2. Executing a CGI script under Apache (I'm not 100% sure this is Ubuntu-specific, but I guess so). If the system installs an httpd.conf file with basic rules to work, you expect it to work; if it doesn't, I don't like having to read the manual to fix something when many other distros like Mandriva get it perfectly fine the first time.
  3. Installing updates and rebooting. Sorry for the similarities, but this is a Windows feature: why do you need to reboot after installing updates? I like to update when I consider it necessary, but I don't update for every new kernel or for patches not important to security or features. Besides, many updates to fix something are a bad sign of an immature release (Vista-like).
  4. Keyboard identification. Is it possible these days to have a distro where you can't configure your keyboard layout correctly? Bad.
  5. Unexpected crashes. Why does the sound system freeze a computer? Unexplainable.
So, what is the good part of using it, if it sets you back 5 years in time?

Please, if you are considering using Ubuntu as your Linux, first try another, better Linux such as Mandriva, Fedora, OpenSuse or Debian, or ask an Ubuntu fan, to avoid the criticism.

Wednesday, August 6, 2008

Linux sizes

Ok, let's draw some conclusions about the image:
  1. The time span is 9 years; Linux was almost 9 years old in 1999.
  2. I don't know the numerical values of the samples.
  3. In 1999 the average was medium size (M); in 2008 it is extra large (XL).
  4. One explanation I can think of: many Linux users started as very active, skinny young people and later gained some weight.
  5. But really few people, like me, used Linux in 1999, and after the Ubuntu boom in 2005 many new users have converted every year. Are they bigger in size?
  6. The anti-MS commentary: if Windows crashes every 5 minutes, the BSOD makes you move out of frustration; Linux stability makes you sit more statically in front of the screen.
BTW, I've been L since 1995.

Sunday, August 3, 2008

Large Hadron Collider

"The Large Hadron Collider (LHC), a 27 kilometers (17 miles) long particle accelerator straddling the border of Switzerland and France, is nearly set to begin its first particle beam tests. The European Organization for Nuclear Research (CERN) is preparing for its first small tests in early August, leading to a planned full-track test in September - and the first planned particle collisions before the end of the year. The final step before starting is the chilling of the entire collider to -271.25 C (-456.25 F)."

Here are a few amazing pictures, more on the website: