Tuesday, 27 October 2009

Random lines in a text file

Sometimes I need to sample large data sets, so I randomly select some lines from the file. My files are usually text records, one record per line, so I wrote this small script to do the task:

#!/usr/bin/perl -w
use strict;

=head1 NAME

selectRandomLines.pl

=head1 DESCRIPTION

Select random lines in a file.

=cut

@ARGV == 3 or die "Usage: selectRandomLines.pl TOTAL_LINES_IN_FILE NUM_LINES_WANTED FILE_NAME\n";

my $total = shift @ARGV; # Total lines in the file
my $want = shift @ARGV; # Total lines to select
my $file = shift @ARGV; # The file
my $line = 0; # Line counter (0-based)
my %select = randomSelect($total, $want); # Hash with selected lines

open my $fh, '<', $file or die "cannot open $file: $!\n";
while (<$fh>) { print if exists $select{$line++}; } # print only the selected lines
close $fh;

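=head1 SUBROUTINES

=head2 randomSelect()

CALL: randomSelect(TOTAL_ELEMENTS, TOTAL_WANTED) [NUM, NUM]

RETURN: %s [HASH]

=cut

sub randomSelect {
    my $t = shift @_;
    my $n = shift @_;
    my %s = ();
    # Asking for more distinct lines than exist would loop forever
    $n <= $t or die "cannot select $n distinct lines out of $t\n";
    # Draw random line numbers until $n distinct ones are collected
    while (scalar(keys %s) < $n) {
        $s{ int(rand $t) } = 1;
    }
    return %s;
}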

=head1 AUTHOR

JC @ 2009

=head1 LICENSE

This is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with code. If not, see .

=cut
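
The script above needs the total number of lines in advance. As a rough sketch of an alternative, reservoir sampling (Algorithm R) can pick the lines in a single pass without knowing the file length. This is a different technique from the one used in the script, and the name reservoirSample is a hypothetical helper introduced only for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of selectRandomLines.pl.
# Reservoir sampling (Algorithm R): keeps $k lines chosen uniformly at
# random from a filehandle of unknown length, reading it only once.
sub reservoirSample {
    my ($fh, $k) = @_;
    my @sample;
    my $seen = 0;                            # lines read so far
    while (my $line = <$fh>) {
        $seen++;
        if (@sample < $k) {
            push @sample, $line;             # fill the reservoir first
        } else {
            my $j = int(rand $seen);         # uniform in 0 .. $seen-1
            $sample[$j] = $line if $j < $k;  # replace with probability k/seen
        }
    }
    return @sample;
}

# Usage sketch: perl reservoir.pl NUM_LINES_WANTED FILE_NAME
if (@ARGV == 2) {
    my ($want, $file) = @ARGV;
    open my $fh, '<', $file or die "cannot open $file: $!\n";
    print reservoirSample($fh, $want);
    close $fh;
}
```

Note that, unlike the script above, the selected lines are not printed in file order, and a file with fewer lines than requested is simply returned whole.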

Monday, 12 October 2009

Error in DBD::SQLite

I was writing code to query a SQLite database with Perl's DBI module. I have a small script like this:
#!/usr/bin/perl -w
use strict;
use DBI;
my $sql = "SELECT * FROM some_table"; # placeholder query (hypothetical table name)
my $dbh = DBI->connect("dbi:SQLite:dbname=db_file", "", "") or die "error in connection\n";
my $sth = $dbh->prepare($sql) or die "cannot prepare SQL\n";
$sth->execute() or die "cannot execute SQL\n";
while (my @data = $sth->fetchrow_array()) {
    # process data
}
$sth->finish;
$dbh->disconnect;
But every time I run this script I get this warning:
closing dbh with active statement handles .... blah, blah, blah
I double-checked the code and the documentation and ran some debugging examples. I got the expected results, so why this message? Finally, I found the solution on the PerlMonks website: the module returns a bad status when the connection is closed while the statement handle still exists, so the simple solution is to "undef $sth" before disconnecting:
$sth->finish;
undef $sth;
$dbh->disconnect;

Thursday, 8 October 2009

Reading the DNA

Current DNA sequencing technologies are based on sequencing-by-synthesis or similar techniques, where enzymatic reactions add one or more labeled nucleotide bases. But IBM is working on another way to determine the composition of a DNA sequence, using nanotechnology and electronics to "read" it base by base. No enzymes means a faster and cheaper process. We're closer to the $1,000 genome ...



Source: http://www.engadget.com/2009/10/08/ibms-ultra-cheap-dna-transistor-dream-could-lead-to-personalize/

Friday, 25 September 2009

R.O.B.O.T. Comics


From the author of Ph.D. Comics via WillowGarage.com

Wednesday, 23 September 2009

Tron Tributes

The next video is a compilation of Tron-like scenes in movies, music videos, TV commercials, even Tron on Ice. Turn on the speakers, because the music in the video is a Daft Punk (who else?) remix.



I have the Tron screensaver on my Mac.

Monday, 14 September 2009

The world tweets "good morning"

Amazing video. Please check the full details on its Vimeo page or at blog.blprnt.com.

GoodMorning! Full Render #2 from blprnt on Vimeo.

Thursday, 27 August 2009

Movie Time Travel

Information is Beautiful created this image, amazing and mind-blowing:


After admiring it, please read the full description.