Posts

Showing posts from October, 2009

Random lines in a text file

Sometimes I need to sample large data sets, so I randomly select some lines from the file. My files are generally text records, one record per line, so I wrote this small script to do the task:

    #!/usr/bin/perl -w
    use strict;

    =head1 NAME

    selectRandomLines.pl

    =head1 DESCRIPTION

    Select random lines in a file.

    =cut

    $ARGV[2] or die "Usage: selectRandomLines.pl TOTAL_LINES_IN_FILE NUM_LINES_WANTED FILE_NAME\n";

    my $total  = shift @ARGV;                  # Total lines in the file
    my $want   = shift @ARGV;                  # Total lines to select
    my $file   = shift @ARGV;                  # The file
    my $line   = 0;                            # Line counter (0-based)
    my %select = randomSelect($total, $want);  # Hash with selected lines

    open FILE, "$file" or die "cannot open $file\n";
    while (<FILE>) {
        # Print the line only if its number was selected
        print "$_" if (defined $select{$line++});
    }
    close FILE;

    =head1 SUBROUTINES

    randomSelect()
    CALL: randomSelect(TOTAL_ELEMENTS, TOTAL_WANTED) [NUM, NUM]
    RETURN: %s [HASH]

    =cut

    sub randomSelect {
        my $t = shift @_;
        my $n = shift...
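The randomSelect() subroutine is cut off in the excerpt above, so its real body is unknown. As a rough sketch of what such a helper could look like, the following picks distinct 0-based line numbers by drawing random numbers until enough different ones have been collected. The name pick_line_numbers and the sample values 1000 and 10 are made up for illustration, and this is not the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT the post's randomSelect() (that one is truncated).
# Returns a hash whose keys are $want distinct line numbers in 0 .. $total-1.
sub pick_line_numbers {
    my ($total, $want) = @_;
    die "cannot pick $want lines out of $total\n" if $want > $total;

    my %picked;
    # Keep drawing until we have enough different line numbers;
    # a repeated draw just overwrites the same hash key.
    while (scalar(keys %picked) < $want) {
        $picked{ int(rand($total)) } = 1;
    }
    return %picked;
}

# Example: choose 10 line numbers from a 1000-line file (values are arbitrary)
my %select = pick_line_numbers(1000, 10);
print join(",", sort { $a <=> $b } keys %select), "\n";
```

Drawing with rejection like this is simple and works well when the number of wanted lines is small compared with the total. When the two numbers are close, many draws are repeats, so a shuffle-based approach would probably be a better fit.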

Error in DBD::SQLite

I was writing code to query a SQLite database using Perl's DBI module. I have a small script like this:

    #!/usr/bin/perl -w
    use strict;
    use DBI;

    # $sql holds the query text (not shown here)
    my $dbh = DBI->connect("dbi:SQLite:dbname=db_file", "", "") or die "error in connection\n";
    my $sth = $dbh->prepare($sql) or die "cannot prepare SQL\n";
    $sth->execute();
    while (my @data = $sth->fetchrow_array()) {
        # process data
    }
    $sth->finish;
    $dbh->disconnect;

But every time I execute this script I get this warning:

    closing dbh with active statement handles .... blah, blah, blah

I double-checked the code and the documentation and ran some debug examples, and I got the expected results, so why this message? Finally, I found the solution on the PerlMonks website: the module does not report the statement handle's status correctly when the handle is closed, so the simple solution is to "undef $sth" at the end:

    $sth->finish;
    undef $sth;
    $dbh->disconnect;
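To try the workaround without touching a real database file, here is a minimal self-contained sketch that uses an in-memory SQLite database. It assumes DBI and DBD::SQLite are installed. The table name people, the sample rows and the sub name fetch_names are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical demo: query a throwaway in-memory database and release the
# statement handle before disconnecting, as suggested in the post.
sub fetch_names {
    my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                           { RaiseError => 1, PrintError => 0 });

    # Made-up table and rows, only so the query has something to return
    $dbh->do("CREATE TABLE people (name TEXT)");
    $dbh->do("INSERT INTO people (name) VALUES (?)", undef, $_) for qw(ana luis);

    my $sth = $dbh->prepare("SELECT name FROM people ORDER BY name");
    $sth->execute();

    my @names;
    while (my @row = $sth->fetchrow_array()) {
        push @names, $row[0];
    }

    $sth->finish;
    undef $sth;        # drop the statement handle before closing the connection
    $dbh->disconnect;

    return @names;
}

print join("\n", fetch_names()), "\n";
```

Whether the warning appears without the undef line probably depends on the installed DBD::SQLite version, so on a recent install the script may be quiet either way.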

Reading the DNA

Current DNA sequencing technologies are based on sequencing-by-synthesis or similar techniques, where enzymatic reactions add one or more labeled nucleotide bases. But IBM is working on another way to determine the composition of a DNA sequence, using nanotechnology and electronics to "read" it base by base. No enzymes means a faster and cheaper process. We're closer to the $1,000 genome ... Source: http://www.engadget.com/2009/10/08/ibms-ultra-cheap-dna-transistor-dream-could-lead-to-personalize/