Posts

Showing posts from 2012

Real time hacking

This is a really cool (but scary) real-time map of hacking attempts worldwide, by the HoneyNet Project team: http://map.honeycloud.net/

Programming challenge - synthetic whole genome vcf

I found on BioStar a nice programming challenge: produce an alternative VCF file from a complete genome sequence (the motivation for having such a file is a mystery to me). Anyway, I and many others produced solutions in C, Python, Perl and even AWK. As expected, the C solution is the fastest (but has the longest code); surprisingly, Python is really close in speed and really compact. My Perl wasn't bad, but it is still a little slow. Here is my final code after reducing the initial solution:

    # VCF column header
    print join "\t", '#CHROM', 'POS', 'ID', 'REF', 'ALT', 'QUAL', 'FILTER', 'INFO';
    print "\n";
    # alternative alleles for each reference base
    %a = ('A' => 'C,G,T', 'C' => 'A,G,T', 'G' => 'A,C,T', 'T' => 'A,C,G');
    while (<>) {
        chomp;
        if (m/>(.+)/) {
            # FASTA header: new sequence, reset the position counter
            $chr = $1;
            $i = 0;
        }
        else {
            @a = split(//, uc $_);
            foreach $b (@a) {
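For comparison, the idea in the excerpt can be sketched in Python, roughly like this. This is not any of the BioStar solutions, only a minimal illustration. The function names `fasta_to_vcf` and `demo` are hypothetical, and because the Perl excerpt is cut off before it prints a row, the `.` placeholders for ID, QUAL, FILTER and INFO and the skipping of non-ACGT bases (such as N) are assumptions.

```python
import io

# Alternative alleles for each reference base (same mapping as the Perl hash).
ALT = {"A": "C,G,T", "C": "A,G,T", "G": "A,C,T", "T": "A,C,G"}
HEADER = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def fasta_to_vcf(fasta, out):
    """Write one VCF-like row per A/C/G/T base read from a FASTA stream."""
    out.write("\t".join(HEADER) + "\n")
    chrom, pos = None, 0
    for line in fasta:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # FASTA header: start a new sequence and reset the position.
            chrom, pos = line[1:], 0
            continue
        for base in line.upper():
            pos += 1
            alt = ALT.get(base)
            if alt is None:
                # Assumption: bases without a mapping (e.g. N) get no row.
                continue
            # Assumption: '.' fills ID, QUAL, FILTER and INFO.
            row = [chrom, str(pos), ".", base, alt, ".", ".", "."]
            out.write("\t".join(row) + "\n")


def demo():
    """Run the converter on a tiny in-memory FASTA and return the rows."""
    fasta = io.StringIO(">chr1\nacN\nT\n")
    out = io.StringIO()
    fasta_to_vcf(fasta, out)
    return out.getvalue().splitlines()
```

Calling `demo()` returns the header plus one row for each of the three A/C/G/T bases in the toy input; positions keep counting across sequence lines and across the skipped N.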

Selecting N random lines from a text file (Perl)

In many situations I need to sample a few lines from a big file. The basic approach is to take just the first N lines of the file, but that isn't correct: real sampling needs a random selection of points to avoid bias. Here is my solution in Perl:

    #!/usr/bin/perl

    =head1 NAME

    randomLines.pl

    =head1 DESCRIPTION

    Subsample some random lines from a text file.

    =head1 USAGE

    perl randomLines.pl [PARAM]

        Parameter       Description                     Value      Default
        -i --in         Input file                      File       STDIN
        -o --out        Output file                     File       STDOUT
        -n --num        Number of lines to sample       Integer    1000
        -t --total      Total lines in file             Integer    1000000
        -f --first      Include first line              Bool       No
        -h --help       Print this screen and exit
        -v --verbose    Verbose mode
        --version       Print version number and exit

    =head1 EXAMPLES

    1. S
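The script body is cut off in this excerpt, so its actual selection logic is not shown. As a hedged illustration of unbiased line sampling, here is a short Python sketch using reservoir sampling (Algorithm R). This is a different technique from whatever randomLines.pl does: the script takes the total line count as a parameter (`--total`), while reservoir sampling needs no total. The name `sample_lines` and its parameters are hypothetical; `keep_first` only loosely mirrors the `--first` option.

```python
import random


def sample_lines(lines, n, keep_first=False, rng=None):
    """Return up to n lines chosen uniformly at random, in original order.

    Reservoir sampling (Algorithm R): one pass, no line count needed.
    """
    rng = rng if rng is not None else random.Random()
    it = iter(lines)
    head = []
    if keep_first:
        # Always keep the first line (e.g. a header); it is not sampled.
        first = next(it, None)
        if first is not None:
            head.append(first)
    reservoir = []  # (index, line) pairs
    for i, line in enumerate(it):
        if i < n:
            # Fill the reservoir with the first n lines.
            reservoir.append((i, line))
        else:
            # Line i replaces a random slot with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = (i, line)
    reservoir.sort()  # indices are unique, so this restores file order
    return head + [line for _, line in reservoir]
```

Something like `sample_lines(open("big.txt"), 1000)` would then read the file once and keep at most 1000 lines in memory; the file name here is only a placeholder.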

Moving to Fedora

After many years using Mandriva (even back when it was Mandrake), I finally moved my personal systems to Fedora. I have always considered Mandriva an excellent Linux distribution, really easy to install and use (even more so than Ubuntu), but the current situation of Mandriva (please google it) gave me the signal to move on. Currently I develop (Perl, C, R) on my Mac with Snow Leopard, but all my heavy work runs on Linux servers (real and cloud), mainly CentOS, so I decided to stay in the Red Hat family. Besides my iMac, I own an Asus 1000HA netbook and an Acer Aspire 5517 laptop. First I tried Fedora 16 on the netbook, the XFCE remix, and it works wonderfully considering the low power of a netbook. This week I installed Fedora 17 on the Acer, and after a quick installation I'm using it to write this post. I have no complaints about the latest GNOME release; actually I like it, it's pretty. I only had one hardware problem, the wireless card, but amazingly after checking the status with dm

Perl/Tk

For the last few weeks I have been working on a user interface to make data integration and quick manual annotation of new repeat families easier. After spending some time understanding the problem and learning the annotation path, I reached the point of deciding which framework I needed to create a simple GUI. I looked at options like HTML5, Qt, C#, even Java, but I didn't like any of them. Then I took a look at the old Tk, in particular at its Perl binding. I had always wanted to develop something serious using Tk, but never had a nice project for it. In a few days I was able to build the prototype in a few lines of code, and after some testing and debugging the GUI was complete in a short time. Perl/Tk development was really easy; the CPAN documentation and some examples from tutorials were enough to build it. The only "but" I can mention is the look-and-feel: it looks "old".

I didn't forget

I just realized (after GLiB was moved to a new home and I remembered the Planet) that I'm not writing here anymore. Many of my thoughts are now on the social networks, but I will try to keep this blog going.