I found a nice programming challenge on BioStar: produce an alternative VCF file from a complete genome sequence (the motivation for having such a file is a mystery to me). Anyway, I and many others produced solutions in C, Python, Perl and even AWK. As expected, the C solution is the fastest (but also the longest code); surprisingly, Python is really close in speed and really compact. My Perl wasn't bad, but it is still a little slow. Here is my final code after reducing the initial solution:

    # VCF header line
    print join "\t", '#CHROM', 'POS', 'ID', 'REF', 'ALT', 'QUAL', 'FILTER', 'INFO';
    print "\n";
    # For each reference base, the other three bases
    %a = ('A' => 'C,G,T', 'C' => 'A,G,T', 'G' => 'A,C,T', 'T' => 'A,C,G');
    while (<>) {
        chomp;
        if (m/>(.+)/) {
            # FASTA header line: remember the sequence name and reset the counter
            $chr = $1;
            $i = 0;
        }
        else {
            # Sequence line: split into single upper-case bases
            @a = split(//, uc $_);
            foreach $b (@a) {
                ...