Programming challenge - synthetic whole genome VCF
I found a nice programming challenge on BioStar: produce a VCF file of alternative alleles from a complete genome sequence (the motivation for having such a file is a mystery to me). Anyway, I and many others produced solutions in C, Python, Perl and even AWK. As expected, the C solution is the fastest (but also the longest code); surprisingly, Python is really close in speed and really compact. My Perl wasn't bad, but it is still a little slow.
Here is my final code after reducing the initial solution:
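print join("\t", '#CHROM', 'POS', 'ID', 'REF', 'ALT', 'QUAL', 'FILTER', 'INFO'), "\n";  # VCF column header
%a = ('A'=>'C,G,T', 'C'=>'A,G,T', 'G'=>'A,C,T', 'T'=>'A,C,G');  # reference base => the three alternative alleles
while (<>) {
    chomp;
    if (m/^>(.+)/) { $chr = $1; $i = 0; }  # FASTA header: new sequence, reset the position counter
    else {
        @a = split(//, uc $_);
        foreach $b (@a) {
            $i++;  # positions are 1-based and count every base, including skipped ones
            if ($a{$b}) {  # only A, C, G and T produce a record (N and other codes are skipped)
                print join("\t", $chr, $i, '.', $b, $a{$b}, 100, 'PASS', 'DP=100'), "\n";
            }
        }
    }
}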
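For anyone who wants to check the logic on a tiny input, here is a minimal sketch of the same idea under `use strict` and `use warnings`. The helper name `fasta_to_vcf` and the sample sequences are made up for illustration, and, unlike the script above, it keeps only the first word of the FASTA header as the chromosome name (an assumption on my part, since the CHROM column should not contain spaces).

```perl
use strict;
use warnings;

# Reference base => the three alternative alleles (same table as the script above).
my %alt = (A => 'C,G,T', C => 'A,G,T', G => 'A,C,T', T => 'A,C,G');

# Hypothetical helper: takes a reference to an array of FASTA lines and
# returns the VCF records as tab-separated strings (no header, no newlines).
sub fasta_to_vcf {
    my ($lines) = @_;
    my ($chr, $pos) = ('', 0);
    my @records;
    for my $line (@$lines) {
        (my $text = $line) =~ s/\r?\n\z//;   # strip the line ending on a copy
        if ($text =~ /^>(\S+)/) {            # header: assumption, keep the first word only
            ($chr, $pos) = ($1, 0);
            next;
        }
        for my $base (split //, uc $text) {
            $pos++;                          # count every base, even the skipped ones
            next unless exists $alt{$base};  # skip N and other ambiguity codes
            push @records, join "\t", $chr, $pos, '.', $base, $alt{$base}, 100, 'PASS', 'DP=100';
        }
    }
    return @records;
}

# Made-up two-sequence input: lowercase bases and an N to exercise the edge cases.
my @vcf = fasta_to_vcf([">chr1 test\n", "acNg\n", ">chr2\n", "T\n"]);
print "$_\n" for @vcf;
```

This prints four records: chr1 at positions 1, 2 and 4 (the N at position 3 is counted but not reported) and chr2 at position 1. Returning the records instead of printing them directly costs memory on a real genome, so for actual use the streaming loop above is the better choice.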