Thursday, February 19, 2009

Tips And Tricks For Bioinformatics Software Engineering

Gus told me about this blog post, which is a condensed version of this presentation.
Let's discuss some points:
  1. Learn UNIX. This is basic. Many people come from the MS world and try to do there the same tasks they could do in UNIX; that's a mistake. The UNIX philosophy and design are optimized for heavy tasks and many simultaneous activities, besides offering complete support for programming, servers, administration, ... My advice: don't waste your time, use UNIX (Linux, Mac OS, BSD, Solaris, ...).
  2. Know many computer languages but be a master of ONE. In this part the talk focuses on Java/Python/Ruby; I would still suggest Perl. Perl's big weakness is its object-oriented support, but nothing compares to Perl's regular expressions and text parsing. Besides, Perl's syntax is less strict than Java's or Python's: it's easy to learn, but it's also easy to acquire bad habits.
  3. Don’t reinvent the wheel. I agree: why spend time and money developing something that does the same as existing tools, just in different colors? Writing less code means faster development.
  4. Learn one text editor really well. You can write code in many text editors, but few really support code development and the other features that enhance your code-fu.
  5. Version control. CVS and similar tools help you track, share and back up your valuable code.
  6. Don’t be afraid to use more than 3 letters to name a variable. This is a common mistake among beginners (and lazy programmers); your code must always be readable.
  7. Balance architecture and accomplishment. Software can be elegant, complete, extensible and well organized, but that takes time, so find a balance.
  8. Automate documentation. Or embed the documentation in the code: I use PerlDoc features in my code, an easy way to write the help files.
  9. Kill the flat file. New high-throughput technologies produce huge files, so it's better to use a database (or something like one) to store and retrieve the data: BerkeleyDB, SQLite, SOAP/REST services, ...
  10. New ways to do parallel computing. Check out MapReduce, cloud systems and technologies that support multiple cores.
  11. Embrace hardware. Read about and understand novel data-crunching technologies: vector-style processing is back with graphics processors (GPUs) and FPGA chips. Many algorithms can be ported to these chips and run faster, but not all of them.
  12. Play nice with others. Support multiple output formats like XML, JSON and YAML so your tools integrate easily with others.
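About point 2: the kind of regex-driven text parsing that makes Perl attractive looks like this in Python's standard `re` module. This is a minimal sketch that assumes UniProt-style FASTA headers; the pattern and the function name `parse_uniprot_header` are my own illustrations, not a complete grammar for every header variant.

```python
from __future__ import annotations

import re

# Assumes UniProt-style FASTA headers such as
# ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1".
# Named groups keep the pattern readable and the result self-describing.
HEADER_PATTERN = re.compile(
    r"^>(?P<database>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)"
    r"\s+(?P<description>.*?)\s+OS=(?P<organism>.+?)(?:\s+[A-Z]{2}=|$)"
)


def parse_uniprot_header(line: str) -> dict[str, str] | None:
    """Return the header fields as a dict, or None if the line does not match."""
    match = HEADER_PATTERN.match(line.strip())
    return match.groupdict() if match else None
```

Named groups return a dict keyed by field name, so downstream code can read `header["accession"]` and skip counting capture positions.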
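A tiny illustration of point 6. Both hypothetical functions below compute the GC fraction of a DNA sequence. The second one only differs in its names, a docstring and an explicit check for empty input, yet it can be read without guessing.

```python
# Hard to read: what are s, n and r?
def gc(s):
    n = s.upper().count("G") + s.upper().count("C")
    r = n / len(s)
    return r


# Same calculation with descriptive names (plus an explicit empty-input check).
def gc_fraction(dna_sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (case-insensitive)."""
    if not dna_sequence:
        raise ValueError("dna_sequence must not be empty")
    upper_sequence = dna_sequence.upper()
    gc_base_count = upper_sequence.count("G") + upper_sequence.count("C")
    return gc_base_count / len(upper_sequence)
```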
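For point 8, Python's docstrings are a rough counterpart to Perl's POD: `python -m pydoc <module>` renders them as help pages, and the `>>>` examples can be run as tests with the standard `doctest` module. This is a sketch with a hypothetical module and helper, not a prescribed layout.

```python
"""Hypothetical module of small sequence helpers.

The docstrings double as help text (python -m pydoc) and as tests (doctest).
"""

# Maps each upper-case DNA base to its Watson-Crick complement.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence.

    Characters other than A, C, G and T raise KeyError.

    >>> reverse_complement("ATGC")
    'GCAT'
    >>> reverse_complement("AAACCC")
    'GGGTTT'
    """
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


if __name__ == "__main__":
    import doctest

    # Runs every >>> example in this file; prints nothing when all of them pass.
    doctest.testmod()
```

Because the examples are executed, the documentation fails loudly when it drifts away from the code.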
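For point 9, here is a minimal sketch of replacing a flat variant file with SQLite through Python's built-in `sqlite3` module. The table layout and function names are hypothetical. The point is that an index on `(chrom, pos)` lets a region query avoid re-reading the whole file.

```python
import sqlite3
from typing import Iterable, List, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def load_variants(connection: sqlite3.Connection, variants: Iterable[Variant]) -> None:
    """Store variants in an indexed table instead of a flat text file."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS variants ("
        "chrom TEXT NOT NULL, pos INTEGER NOT NULL, ref TEXT, alt TEXT)"
    )
    # The index lets region queries avoid scanning every row.
    connection.execute(
        "CREATE INDEX IF NOT EXISTS variants_locus ON variants (chrom, pos)"
    )
    connection.executemany("INSERT INTO variants VALUES (?, ?, ?, ?)", variants)
    connection.commit()


def variants_in_region(
    connection: sqlite3.Connection, chrom: str, start: int, end: int
) -> List[Variant]:
    """Return variants on `chrom` with start <= pos <= end, ordered by position."""
    cursor = connection.execute(
        "SELECT chrom, pos, ref, alt FROM variants "
        "WHERE chrom = ? AND pos BETWEEN ? AND ? ORDER BY pos",
        (chrom, start, end),
    )
    return cursor.fetchall()
```

In practice you would open a file such as `sqlite3.connect("variants.db")` (the name is just an example). `":memory:"` is handy for quick tests.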
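For point 10, the MapReduce idea can be shown in a single process. The sketch below counts k-mers: a map step counts one sequence independently of the others, and a reduce step merges the partial counts. The function names are hypothetical, and the default `mapper` is the built-in sequential `map`.

```python
from collections import Counter
from functools import reduce
from typing import Callable, Iterable


def count_kmers(sequence: str, k: int = 3) -> Counter:
    """Map step: count the overlapping k-mers in a single sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def merge_counts(total: Counter, partial: Counter) -> Counter:
    """Reduce step: add one partial result into the running total."""
    total.update(partial)  # Counter.update adds counts instead of replacing them
    return total


def mapreduce_kmers(sequences: Iterable[str], mapper: Callable = map) -> Counter:
    """Count 3-mers across all sequences, MapReduce style.

    Each map call is independent, so `mapper` can be swapped for a parallel
    map without touching the reduce step.
    """
    partial_counts = mapper(count_kmers, sequences)
    return reduce(merge_counts, partial_counts, Counter())
```

To use several cores, you could create a `multiprocessing.Pool` inside an `if __name__ == "__main__":` guard and pass its `map` method as `mapper`. MapReduce frameworks apply the same map/reduce split across many machines.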
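For point 12, one way to offer several output formats is a small table of writer functions. This sketch uses only the standard library, so it covers JSON and XML. YAML would need a third-party package such as PyYAML. The record layout and names are hypothetical, and the XML writer assumes flat records whose keys are valid XML tag names.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def to_json(records: List[Record]) -> str:
    return json.dumps(records, indent=2, sort_keys=True)


def to_xml(records: List[Record], root_tag: str = "records", record_tag: str = "record") -> str:
    """Serialize flat records; each dict key becomes a child element."""
    root = ET.Element(root_tag)
    for record in records:
        record_element = ET.SubElement(root, record_tag)
        for key, value in record.items():
            ET.SubElement(record_element, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Adding a new output format only means adding one entry here.
FORMATTERS: Dict[str, Callable[[List[Record]], str]] = {"json": to_json, "xml": to_xml}


def export(records: List[Record], output_format: str) -> str:
    try:
        formatter = FORMATTERS[output_format]
    except KeyError:
        raise ValueError(f"unsupported output format: {output_format!r}") from None
    return formatter(records)
```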
