Posts

Showing posts from February, 2009

DNA fractals

Image
Some days ago, I had a talk with I who suggested me to use some mathematical approaches to compare artificial with real sequences (one of my projects). Besides mutual information and SVM classification systems, he told me about fractals presents in the sequences. The method had been called "Chaos game for DNA" ( emboss has a subprogram to do it named chaos ) and the algorithm is very simple: use a square with each corner represents a "letter" (AGCT), start in the middle of the square and read the sequence, you add a point in the halfway of the previous point and the new letter, continue until you finish the sequence This method was previously described in the 90's, recently we have more genome information to continue exploring this characteristic in the DNA sequence, some application include alignment-free comparison and other iterative systems which reproduce this effect. After I plot some sequences, I asked me to test other species to see differences, but a

Tips And Tricks For Bioinformatics Software Engineering

Gus told me about this blog post which is the condensed version of this presentation: Tips And Tricks For Bioinformatics Software Engineering View more presentations from jtdudley . (tags: bioinformaticsprogrammin ... bioinformatics ) Let's discuss some points: Learn UNIX . This is basic, many people comes from the MS-world and wants to do the same tasks as in UNIX, wrong , the UNIX philosophy and design is optimized for heavy task and multiple activities, besides the complete support for programming, servers, administration, ... I say: don't waste your time, use UNIX (Linux, Mac OS, BSD, Solaris, ...). Know many computer languages but be master in ONE . In this part the talk is focused in Java/Python/Ruby, I still suggest Perl. Perl big problem is the object-oriented pragma , but nothing compares the power of regular expressions and text-parsing of Perl. Besides Perl is less grammar strict than Java/Python, it's easy to learn but easy to acquire bad habits. Don’t reinv

Filtering common miRNAs

Last day, M asks me how to compact the microRNA mature fasta-file from miRbase , the problem is many miRNA are conservated across species, so the fasta file with all the mature sequences has a lot of redundancy and he wants unique sequences to screen some sequences. This is a simple Perl solution, the first step is to read and collect the sequences, the trick is to save in a hash using the nucleic sequences as "key" and append the names as "value". After this, simply extract the "keys" and parse a little the "values" to have a common identifier. Code: #!/usr/bin/perl -w use strict; =head1 NAME uniq_mir.pl FASTA =head1 DESCRIPTION Filter the multifasta file of miRbase (mature.fa) into unique miR by sequence. =cut $ARGV[0] or die "Usage: uniq_mir.pl FASTA\n"; my $name = ''; my %seqs = (); open F, "$ARGV[0]" or die "cannot open $ARGV[0]\n"; while ( <F>) { chomp; if (/^>/) { s/>//; my @line =

Command-line Fu

Image
Recently I found this great site Command-line Fu , a site where you can find these beautiful gems for the command line but it's different from other collections: What's this? Command-Line-Fu is the place to record those command-line gems that you return to again and again. Delete that bloated snippets file you've been using and share your personal repository with the world. That way others can gain from your CLI wisdom and you from theirs too. All commands can be commented on and discussed - digg-esque voting is also encouraged so the best float to the top. So, anyone can share their commands, you can look, learn and use or suggests better ways to do it, and beyond the Bash, also include tags (functions) for one-liners scripts in perl, python, etc. or by specific commands like curl, ssh, grep, ... With Twitter and RSS report systems you can follow the latest commands too (but only report the link and the function, not the command). Sweet. Via: Downloa

1234567890

Image
The next February 13th has a special event, not for everyone, just for people like us (yes, you know, accept it ...) because the UNIX time at 11:31:30pm UTC will read in the terminal in curious form: 1234567890 . Why? Check Wikipedia : "Unix time, or POSIX time, is a system for describing points in time, defined as the number of seconds elapsed since midnight Coordinated Universal Time (UTC) of January 1st. 1970, not counting lap seconds. It is widely used not only on Unix-like operation systems but also in many other computing systems. It is neither a linear representation of time nor a true representation of UTC (though it is frecuently mistaken of both) as the times it represents are UTC but it has no way of repesenting UTC leap seconds (e.g. 1998-12-31 23:59:60)." If you want to check it, or see other values for your favorite number, you can use the Perl one-line: perl -e 'print scalar localtime(1234567890),"\n";' Note: image from Wikimedia showing whe

People loves Win7 ... wait it's KDE4!

ZDNet Australia test some people in the street, they showed a Linux (which one? they didn't mention ...) with KDE4, they activate some fancy effects and told the people "this is the new windows 7", ... many people like it! I think they must tell the true, to see if they want to switch to Linux, anyway the cheaper W7 will be $200 usd (I prefer to buy a new computer with this money, I'm still saving for my AAO, donations accepted).

Bigger, Faster, ...

Image
IBM announced the new big project to build the next #1 supercomputer of the world, codename Sequoia, it's impressive the limit of 20 petaflops ... that's more than the combined power of all top 500 supercomputers (#1 is only 1 petaflops). The Sequoia will be powered by 1.6 million cores (specific 45-nanometer chips in development) and 1.6 petabytes of memory. It will be housed in 96 refrigerators spanning roughly 280 square meters. Of course the owner will be the USA gov, who's else? From: Gizmodo Image from: xkcd

Bash scripting tips

Bash is almost a complete programming language, I use every day (switching from the default tcsh in the servers, in Mac OS X is default) for automatic job launching. Extracted from: http://hacktux.com/bash/script/efficient. Avoid full paths to bash built-ins. Except when you require specific commands, when I use systems with 32 or 64b binaries, I need to specify which one to use, or define an specific directory to use in the PATH or as a variable. Avoid external commands for integer math. Because bash has math evaluation. Avoid using cat . Commonly people use cat and send to other commands with a pipe, but many commands can read files in the parameters. Avoid piping grep to awk . Because awk can filter the inputs, use /pattern/ Avoid piping sed to sed . Multiple filters can be applied to the input with the -e option ( sed -e "s/this/that/" -e "s/old/new/" filename ). Use double brackets for compound and RegEx tests. [[ ]] evaluate the operations between