PROGRAMMING and BIOINFORMATICS: Literate and test-driven development, especially for data-processing projects.
One of my concerns about bioinformatics (or computational biology) is the quality of its software efforts. To encourage some thought on this, I’ve outlined below an approach I take for many of my projects.

"Code Monkey" (Source: Wikipedia)
When you’re running bulk data processing, it’s pretty hard to know if the code is really doing what it’s supposed to. It ends up being pretty much a black box, in some cases even for the person who wrote the code.
One informal verification approach is to take some input data you are very familiar with and explore the output you get. While I recommend this, it isn’t terribly strong testing. You’re unlikely to put a lot of effort into it (let’s face it, we all only have so much time), and if you’re the developer you’re likely to focus on the same things that occurred to you whilst developing the code. That means you’re unlikely to test for ‘unexpected’ things, which is… after all… the whole point of testing.
What I’m leading to is more formalised approaches to testing code, regression tests in particular. I’m curious as to how bioinformatics developers are developing their code. I’d like to think, or at least wish, that most have formal testing as part of their development process, but then again you don’t see remarks about how the software was tested in most bioinformatics publications, so how are we to know? (Were I the editor of a bioinformatics journal, I’d consider making it a requirement that the testing regime be explicitly discussed. Maybe that’d make me an overly harsh editor?!)
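To make "regression test" a little more concrete, here is a minimal sketch in Python of one common form: run the processing on a fixed input, store the output once, and flag any later run whose output differs. Everything named here is an assumption for illustration only. `gc_content`, `process`, `check_regression` and the `golden.json` file are hypothetical stand-ins, not code from any particular project.

```python
import json
import tempfile
from pathlib import Path


def gc_content(seq):
    """Fraction of G/C bases in a sequence (hypothetical processing step)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return round(sum(base in "GC" for base in seq) / len(seq), 4)


def process(records):
    """Stand-in for a bulk data-processing run: one result per input record."""
    return {name: gc_content(seq) for name, seq in records.items()}


def check_regression(records, golden_path):
    """Compare the current output with a stored 'golden' output.

    Returns the sorted names of records whose results differ; an empty
    list means the behaviour is unchanged. If no golden file exists yet,
    the current output is saved as the reference and nothing is reported.
    """
    current = process(records)
    if not golden_path.exists():
        golden_path.write_text(json.dumps(current, indent=2, sort_keys=True))
        return []
    golden = json.loads(golden_path.read_text())
    names = current.keys() | golden.keys()
    return sorted(n for n in names if current.get(n) != golden.get(n))


if __name__ == "__main__":
    # Hypothetical, hand-checked input data.
    records = {"seq1": "GGCC", "seq2": "ATAT", "seq3": "ACGT"}
    with tempfile.TemporaryDirectory() as tmp:
        golden = Path(tmp) / "golden.json"
        check_regression(records, golden)          # first run stores the reference
        print(check_regression(records, golden))   # later runs report any differences
```

The point of storing the output is that the check no longer depends on what the developer happens to think of looking at: any change in any record is reported, expected or not. The reference output itself still has to be verified by hand once.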
For some projects[1] my approach is a mix of sort-of literate programming and test-driven development.