Wednesday, March 14, 2018

Year 2018

Three years without writing, but I'm taking 5 minutes to de-rust this blog (dusting it off no longer covers it).

Many projects under way, and everything seems to be going well. I'm as much of a geek and nerd as ever (or worse). I no longer teach, but I keep myself busy processing data and programming like crazy (I've even acquired a taste for Python, though Perl is still my strong suit).

Anyway, let's keep going a while longer.

Friday, January 9, 2015


Well, it's 2015 already (congratulations to anyone who wants to be congratulated on the new year). After a long time without posting, and while I wait for several processes to finish, I've set myself the task of dusting off this blog a little.

I can't go into much detail about my ongoing projects, though there are several and they are very interesting. Since I mostly write about Linux on this blog, here is the current state of my systems:

1. My netbook is turning 5 and still putting up a fight. Yesterday I upgraded (or rather reinstalled) to Fedora 21 - XFCE and made several changes to how the hard disk was partitioned, but beyond that everything was normal. Several things even installed more easily than on Fedora 20 (Skype, Dropbox).

2. My workstation runs Ubuntu 14.04. I'm still not very fond of that distro, but the classroom systems run Ubuntu, so it makes it much easier to test the exercises. My main complaints are the unorthodox installation of some packages and a high I/O error rate: I regularly run into corrupt compressed files.

3. I have a brand-new HPC server (20 cores, 128 GB RAM, 120 GB SSD, 1 TB hard disk, an Nvidia Quadro-Pro card and an Nvidia Tesla). For now it runs CentOS 7, but there are things I don't like (old kernel, odd partitioning, unavailable packages, and it has frozen a couple of times for no reason). I may migrate it to Fedora in a few weeks, after finishing some pending jobs. I'm also going to start exploring the Tesla with CUDA.

Friday, September 28, 2012

Real time hacking

This is a really cool (but scary) real-time tracker of hacking attempts worldwide by the HoneyNet Project team:

Wednesday, September 12, 2012

Programming challenge - synthetic whole genome vcf

I found a nice programming challenge on BioStar: produce an alternative VCF file from a complete genome sequence (the motivation for having such a file is a mystery to me). Anyway, I and many others produced solutions in C, Python, Perl and even AWK. As expected, the C solution is the fastest (but has the longest code); surprisingly, Python is really close in speed and really compact. My Perl wasn't bad, but it is still a little slow.

Here is my final code after reducing the initial solution:

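print join "\t",'#CHROM','POS','ID','REF','ALT','QUAL','FILTER','INFO';
print "\n";
# For each reference base, list the three alternative alleles
%a = ('A'=>'C,G,T', 'C'=>'A,G,T', 'G'=>'A,C,T', 'T'=>'A,C,G');
while (<>) {
  chomp;
  if (m/>(.+)/) { $chr = $1; $i = 0; }   # FASTA header: new sequence, reset position
  else {
    @a = split(//, uc $_);
    foreach $b (@a) {
      $i++;   # position counter, assumed 1-based as in VCF
      if ($a{$b}) {
        print join "\t", $chr, $i, '.', $b, $a{$b}, 100, 'PASS', 'DP=100';
        print "\n";
      }
    }
  }
}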

Wednesday, August 29, 2012

Selecting N random lines from a text file (Perl)

In many situations I need to sample a few lines from a big file. The basic approach is to take just the first N lines, but that isn't a proper sample: real sampling needs a random selection of lines to avoid bias.

Here is my solution in the Perl language:


=head1 NAME


Subsample some random lines from a text file.

=head1 USAGE

perl [PARAM]

    Parameter      Description                Value         Default
    -i --in        Input file                 File            STDIN
    -o --out       Output file                File           STDOUT
    -n --num       Number of lines to sample  Integer          1000
    -t --total     Total lines in file        Integer       1000000
    -f --first     Include first line         Bool               No
    -h --help      Print this screen and exit
    -v --verbose   Verbose mode
    --version      Print version number and exit


=head1 EXAMPLES

    1. Sample 100 lines from a 1000-line file
    perl -n 100 -t 1000 < > file.out
    2. Sample 1000 lines and the header (first line)
    perl -f -n 1000 -i -o file.out

=head1 LICENSE

This is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with code.  If not, see .


=cut

use strict;
use warnings;
use Getopt::Long;
use Pod::Usage;

# Default parameters
my $help     = undef;         # Print help
my $verbose  = undef;         # Verbose mode
my $version  = undef;         # Version call flag
my $in       = undef;
my $out      = undef;
my $num      =   1e3;
my $total    =   1e6;
my $first    = undef;

# Main variables
my $our_version = 0.1;        # Script version number
my %sel = ();
my $ln  =  0;

# Calling options
GetOptions(
    'h|help'           => \$help,
    'v|verbose'        => \$verbose,
    'version'          => \$version,
    'i|in:s'           => \$in,
    'o|out:s'          => \$out,
    'n|num:i'          => \$num,
    't|total:i'        => \$total,
    'f|first'          => \$first
) or pod2usage(-verbose => 2);
pod2usage(-verbose => 2) if (defined $help);
printVersion() if (defined $version);

getPos($num, $total, \%sel);

if (defined $in) {
    warn "opening file $in\n" if (defined $verbose);
    open STDIN, '<', $in or die "cannot open file $in\n";
}

if (defined $out) {
    warn "creating file $out\n" if (defined $verbose);
    open STDOUT, '>', $out or die "cannot open file $out\n";
}

if (defined $first) {
    warn "First line will be included\n" if (defined $verbose);
    $sel{0} = 1;
}

warn "Extracting lines\n" if (defined $verbose);
while (<>) {
    print if (defined $sel{$ln});
    $ln++;    # line counter is 0-based, matching the positions from getPos
}

####   S U B R O U T I N E S   ####

sub printVersion {
    print "$0 $our_version\n";
    exit 1;
}

# Fill the hash with $n distinct random positions in the range 0 .. $t-1
sub getPos {
    my ($n, $t, $s_href) = @_;
    warn "Selecting $n positions from $t total\n" if (defined $verbose);
    for (my $i = 1; $i <= $n; $i++) {
        my $p = int(rand $t);
        if (defined $$s_href{$p}) {
            $i--;    # assumed behavior: position already chosen, draw again
        }
        else {
            warn " $p\n" if (defined $verbose);
            $$s_href{$p} = 1;
        }
    }
}
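
The script above needs the total number of lines (-t) before it starts. When that number is unknown, one alternative is reservoir sampling, which is a different technique from the one used in the script. What follows is only a minimal sketch under that assumption: the name reservoir_sample is hypothetical, and unlike the script above it does not return the sampled lines in file order.

```perl
use strict;
use warnings;

# Reservoir sampling (Algorithm R): keep $n lines chosen uniformly at random
# from a filehandle of unknown length, reading it only once.
# reservoir_sample is a hypothetical helper, not part of the script above.
sub reservoir_sample {
    my ($fh, $n) = @_;
    my @keep;
    my $seen = 0;
    while (my $line = <$fh>) {
        $seen++;
        if (@keep < $n) {
            push @keep, $line;              # fill the reservoir first
        }
        else {
            my $j = int(rand $seen);        # uniform in 0 .. $seen-1
            $keep[$j] = $line if $j < $n;   # replace with probability n/seen
        }
    }
    return @keep;
}

# Usage: print 3 random lines from a small in-memory "file".
my $text = join '', map { "line $_\n" } 1 .. 10;
open my $fh, '<', \$text or die "cannot open in-memory file: $!\n";
print reservoir_sample($fh, 3);
close $fh;
```

The trade-off is memory: the reservoir holds all N sampled lines at once, while the script above only stores line numbers.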

Friday, August 17, 2012

Moving to Fedora

After many years using Mandriva (even back when it was Mandrake), I finally moved my personal systems to Fedora. I have always considered Mandriva an excellent Linux distribution, really easy to install and use (even more than Ubuntu), but the current situation of Mandriva (please google it) gave me the signal to move on.

Currently I develop (Perl, C, R) on my Mac with Snow Leopard, but all my heavy work runs on Linux servers (physical and cloud), mainly CentOS, so I decided to stay in the Red Hat family. Besides my iMac, I own an Asus 1000HA netbook and an Acer Aspire 5517 laptop. First I tried Fedora 16 on the netbook, the XFCE remix; it works wonderfully considering the low power of a netbook. This week I installed Fedora 17 on the Acer, and after a quick installation I'm using it to write this post. I have no complaints about the latest Gnome release; actually I like it, it's pretty. I only had one hardware problem, the wireless card, but amazingly, after checking the status with dmesg, I saw a really descriptive line:

[ 593.564077] b43-phy0 ERROR: Firmware file "b43/ucode15.fw" not found
[ 593.564083] b43-phy0 ERROR: Firmware file "b43-open/ucode15.fw" not found
[ 593.564087] b43-phy0 ERROR: You must go to and download the correct firmware for this driver version. Please carefully read all instructions on this website.

WOW, the system is telling me where I can find the solution, something we could only wish for a few years ago. Following the instructions, getting my WiFi to work was really painless.
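
In a long kernel log these messages are easy to miss, so a small Perl filter can help find them. This is only a sketch: missing_firmware is a hypothetical helper, and the pattern is assumed from the b43 messages quoted above, so other drivers may word their errors differently.

```perl
use strict;
use warnings;

# Extract the firmware files the kernel reports as missing.
# missing_firmware is a hypothetical helper; the pattern is assumed from the
# b43 messages quoted above and may not match other drivers' wording.
sub missing_firmware {
    my @lines = @_;
    my @files;
    for my $line (@lines) {
        push @files, $1 if $line =~ /ERROR: Firmware file "([^"]+)" not found/;
    }
    return @files;
}

# Usage with the two log lines from the post; on a live system the input
# could come from the dmesg command instead.
my @log = (
    '[ 593.564077] b43-phy0 ERROR: Firmware file "b43/ucode15.fw" not found',
    '[ 593.564083] b43-phy0 ERROR: Firmware file "b43-open/ucode15.fw" not found',
);
print "$_\n" for missing_firmware(@log);
```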

Now I'm installing my usual packages and preparing the machine for the long running jobs we have ahead. And yes, now I'm using a fedora hat too.

Thursday, June 7, 2012


For the last few weeks I have been working on a user interface to simplify data integration and quick manual annotation of new repeat families.

After spending some time understanding the problem and learning the annotation workflow, I reached the point of deciding which framework to use for a simple GUI. I looked at options like HTML5, Qt, C# and even Java, but none of them appealed to me.

Then I took a look at good old Tk, in particular its Perl binding. I had always wanted to develop something serious with Tk, but never had a suitable project.

Within a few days I had a prototype built in a few lines of code, and after some testing and debugging the GUI was complete in a short time.

Perl/Tk development was really easy; the CPAN documentation and some examples from tutorials were enough to build it. My only complaint is the look and feel: it looks "old".