Friday, September 28, 2012

Real time hacking

This is a really cool (but scary) real-time map of hacking attempts worldwide, made by the HoneyNet Project team: http://map.honeycloud.net/


Wednesday, September 12, 2012

Programming challenge - synthetic whole genome vcf

I found a nice programming challenge on BioStar: produce a VCF file listing every alternative allele for a complete genome sequence (why anyone would need such a file is a mystery to me). Anyway, many others and I posted solutions in C, Python, Perl and even AWK. As expected, the C solution is the fastest (but also the longest code); surprisingly, Python is really close in speed and really compact. My Perl wasn't bad, but it is still a little slow.

Here is my final code after reducing the initial solution:

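use strict;
use warnings;

# VCF column header
print join("\t", '#CHROM','POS','ID','REF','ALT','QUAL','FILTER','INFO'), "\n";
# For each reference base, the three alternative alleles
my %alt = ('A'=>'C,G,T', 'C'=>'A,G,T', 'G'=>'A,C,T', 'T'=>'A,C,G');
my ($chr, $pos) = ('', 0);
while (<>) {
  chomp;
  if (m/^>(.+)/) { $chr = $1; $pos = 0; }   # FASTA header: new sequence, restart the position
  else {
    foreach my $base (split(//, uc $_)) {
      $pos++;
      # Bases other than A, C, G, T (e.g. N) are counted but not printed
      if ($alt{$base}) {
        print join("\t", $chr, $pos, '.', $base, $alt{$base}, 100, 'PASS', 'DP=100'), "\n";
      }
    }
  }
}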
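
To check what the script emits, here is a small sketch that wraps the same logic in a subroutine and runs it on a tiny in-memory FASTA. The name fasta_to_vcf_rows and the toy sequence are only assumptions for this illustration; they are not part of the challenge or of my submitted solution.

```perl
use strict;
use warnings;

# fasta_to_vcf_rows is a hypothetical helper, used only for this demonstration.
# It takes FASTA lines and returns one tab-separated VCF row per A/C/G/T base.
sub fasta_to_vcf_rows {
    my (@lines) = @_;
    my %alt = ('A'=>'C,G,T', 'C'=>'A,G,T', 'G'=>'A,C,T', 'T'=>'A,C,G');
    my ($chr, $pos, @rows) = ('', 0);
    foreach my $line (@lines) {
        chomp $line;
        # Header line: remember the sequence name and restart the position
        if ($line =~ m/^>(.+)/) { ($chr, $pos) = ($1, 0); next; }
        foreach my $base (split //, uc $line) {
            $pos++;                       # every base advances the position
            next unless $alt{$base};      # but only A, C, G, T produce a row
            push @rows, join("\t", $chr, $pos, '.', $base, $alt{$base}, 100, 'PASS', 'DP=100');
        }
    }
    return @rows;
}

# Toy input: one sequence split over two lines, with an N in position 3
my @fasta = (">chr1\n", "acN\n", "GT\n");
print "$_\n" for fasta_to_vcf_rows(@fasta);
```

With this input the rows are for positions 1, 2, 4 and 5: the N is skipped but still counted, and the position keeps growing across sequence lines.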

Wednesday, August 29, 2012

Selecting N random lines from a text file (Perl)

In many situations I need to sample a few lines from a big file. The basic approach is to take just the first N lines of the file, but that isn't correct: a real sample needs a random selection of lines to avoid bias.

Here is my solution in the Perl language:


#!/usr/bin/perl

=head1 NAME

randomLines.pl

=head1 DESCRIPTION

Subsample some random lines from a text file.

=head1 USAGE

perl randomLines.pl [PARAM]

    Parameter      Description                Value         Default
    -i --in        Input file                 File            STDIN
    -o --out       Output file                File           STDOUT
    -n --num       Number of lines to sample  Integer          1000
    -t --total     Total lines in file        Integer       1000000
    -f --first     Include first line         Bool               No
    
    -h --help      Print this screen and exit
    -v --verbose   Verbose mode
    --version      Print version number and exit

=head1 EXAMPLES

    1. Sample 100 lines from a 1000-line file
    perl randomLines.pl -n 100 -t 1000 < file.in > file.out
   
    2. Sample 1000 lines and the header (first line)
    perl randomLines.pl -f -n 1000 -i file.in -o file.out

=head1 LICENSE

This is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this code.  If not, see <http://www.gnu.org/licenses/>.

=cut

use strict;
use warnings;
use Getopt::Long;
use Pod::Usage;

# Default parameters
my $help     = undef;         # Print help
my $verbose  = undef;         # Verbose mode
my $version  = undef;         # Version call flag
my $in       = undef;
my $out      = undef;
my $num      =   1e3;
my $total    =   1e6;
my $first    = undef;

# Main variables
my $our_version = 0.1;        # Script version number
my %sel = ();
my $ln  =  0;

# Calling options
GetOptions(
    'h|help'           => \$help,
    'v|verbose'        => \$verbose,
    'version'          => \$version,
    'i|in:s'           => \$in,
    'o|out:s'          => \$out,
    'n|num:i'          => \$num,
    't|total:i'        => \$total,
    'f|first'          => \$first
) or pod2usage(-verbose => 2);
    
pod2usage(-verbose => 2) if (defined $help);
printVersion() if (defined $version);

getPos($num, $total, \%sel);

if (defined $in) {
    warn "opening file $in\n" if (defined $verbose);
    # Three-argument open: safe with odd file names, and $! reports the reason
    open STDIN, '<', $in or die "cannot open file $in: $!\n";
}

if (defined $out) {
    warn "creating file $out\n" if (defined $verbose);
    open STDOUT, '>', $out or die "cannot open file $out: $!\n";
}

if (defined $first) {
    warn "First line will be included\n" if (defined $verbose);
    $sel{0} = 1;
}

warn "Extracting lines\n" if (defined $verbose);
while (<>) {
    print if (defined $sel{$ln});
    $ln++;
}

###################################
####   S U B R O U T I N E S   ####
###################################

sub printVersion {
    print "$0 $our_version\n";
    exit 0;   # printing the version is not an error
}

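sub getPos {
    my ($n, $t, $s_href) = @_;
    # Without this check the loop below never ends when $n > $t
    die "cannot select $n lines from a total of $t\n" if ($n > $t);
    warn "Selecting $n positions from $t total\n" if (defined $verbose);
    my $i = 0;
    while ($i < $n) {
        my $p = int(rand $t);              # 0-based line number
        next if (defined $$s_href{$p});    # already selected, draw again
        warn " $p\n" if (defined $verbose);
        $$s_href{$p} = 1;
        $i++;
    }
}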
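
The script above needs the total number of lines (-t) in advance. When that number is unknown, a different technique, reservoir sampling, can pick the lines in a single pass. The following is only a minimal sketch of that alternative, not what randomLines.pl does; reservoir_sample is a hypothetical name and the 1000-line in-memory "file" is just demo data.

```perl
use strict;
use warnings;

# Reservoir sampling (Algorithm R): keep $n lines chosen uniformly at random
# from a filehandle of unknown length, reading it only once.
# reservoir_sample is a hypothetical helper, not part of randomLines.pl.
sub reservoir_sample {
    my ($fh, $n) = @_;
    my @keep = ();
    my $seen = 0;
    while (my $line = <$fh>) {
        $seen++;
        if (@keep < $n) {
            push @keep, $line;                 # fill the reservoir first
        }
        else {
            my $j = int(rand $seen);           # integer in 0 .. $seen-1
            $keep[$j] = $line if ($j < $n);    # replace a slot with probability n/seen
        }
    }
    return @keep;
}

# Demo on an in-memory "file" with 1000 numbered lines
my $text = join '', map { "line $_\n" } 1 .. 1000;
open my $fh, '<', \$text or die "cannot open in-memory file\n";
my @sample = reservoir_sample($fh, 10);
close $fh;
print @sample;
```

The trade-off is that the selected lines do not come out in their original file order, and a file shorter than the requested sample is simply returned whole.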

Friday, August 17, 2012

Moving to Fedora

After many years using Mandriva (even when it was Mandrake), I finally moved my personal systems to Fedora. I always considered Mandriva an excellent Linux distribution, really easy to install and use (even more than Ubuntu), but the current situation of Mandriva (please google it) gave me the signal to move on.

My current setup is developing (Perl, C, R) on my Mac with Snow Leopard, but all my heavy work runs on Linux servers (real and cloud), mainly CentOS, so I decided to stay in the Red Hat family. Besides my iMac, I own an Asus 1000HA netbook and an Acer Aspire 5517 laptop. First I tried Fedora 16, the XFCE remix, on the netbook; it works wonderfully considering the low power of a netbook. This week I installed Fedora 17 on the Acer, and after a quick installation I'm using it to write this post. I have no complaints about the latest GNOME release; actually I like it, it's pretty. I only had one hardware problem, the wireless card, but amazingly, after checking the status with dmesg, I saw a really descriptive message:

[ 593.564077] b43-phy0 ERROR: Firmware file "b43/ucode15.fw" not found
[ 593.564083] b43-phy0 ERROR: Firmware file "b43-open/ucode15.fw" not found
[ 593.564087] b43-phy0 ERROR: You must go to http://wireless.kernel.org/en/users/Drivers/b43#devicefirmware and download the correct firmware for this driver version. Please carefully read all instructions on this website.

WOW, the system is telling me where I can find the solution, something we could only wish for a few years ago. Following the instructions, it was really painless to get my WiFi working.

Now I'm installing my usual packages and preparing the machine for the long running jobs we have ahead. And yes, now I'm using a fedora hat too.

Thursday, June 7, 2012

Perl/Tk

For the last few weeks I have been working on a user interface to ease data integration and quick manual annotation of new repeat families.

After spending some time understanding the problem and learning the annotation path, I reached the point of deciding which framework I needed to create a simple GUI. I looked at options like HTML5, Qt, C# and even Java, but I didn't like any of them.

I took a look at the old Tk, in particular at its Perl binding. I always wanted to develop something serious using Tk, but never had a suitable project.

In a few days I could build the prototype in a few lines of code; then, after some testing and debugging, the GUI was complete in a short time.


Perl/Tk development was really easy; the CPAN documentation and some examples from tutorials were enough to build it. My only complaint is the look and feel: it looks "old".

Thursday, March 1, 2012

I didn't forget

I just realized (after GLiB was moved to a new home and I remembered the Planet) that I'm not writing here anymore. Many of my thoughts now go to the social networks, but I'll try to keep this blog going.