Developing bioinformatics methods: by who and how

By Grant Jacobs 10/01/2010

"Code Monkey" (Source: wikipedia)

In my view the best method developers (generalising here) are those who are both advanced users and developers of the method they are developing.

They are “scratching an itch”, developing the method to serve their own needs.

Before elaborating on this, let me quickly point to a couple of earlier posts that brush up against my views on this.

Developers can be scientists too

Some time ago (October 16) Fabiana Kubke wrote “Methods in Neuroscience”, in which she considered that

There are, in my opinion, two types of scientists. Those who adapt the questions to the methods they use, and those who choose the methods based on the questions they need to ask.

She, of course, meant experimental scientists. I just had to respond by adding two categories of “method developers”, in particular:

2. Those that are focused on a [research] question, but will develop new methods if that’s what is needed to address the question.

As I went on to say, this is the fashion I prefer to work in myself. Start with a biological problem and through that develop methods (if needed). I’ll come back to this.

There is more than computing involved

A post and comments by Sandra Porter raise some points worth considering. From the paper her article is examining she quotes:

The success of bioinformatics software is based not on the elegance of the software design, but rather its utility as a tool for driving and answering biological questions. Consequently it is no surprise that many successful bioinformatics apps are written by biologists who lack formal computer science training, as they undoubtedly put scientific utility ahead of architectural elegance and completeness.

(Source: PLoS Computational Biology: A Quick Guide for Developing Bioinformatics Programming Skills. Dudley and Butte.)

I’m not going to review this paper itself. I’d recommend it to those involved, or becoming involved, in bioinformatics. While no one person is going to agree with every portion of advice there, it is well worth reading and most of what is said is worth taking note of. (See also Sandra Porter’s comments on the paper.)

What I’d like to focus on is what skills are needed to develop new methods.

Ada Lovelace, regarded as the first computer programmer (Source: wikipedia)

I absolutely agree with Sandra that methods must have utility. Over the years I’ve seen many methods developed by computer scientists that, to be frank, were worthless because they flew in the face of biology or addressed problems that no-one needed solved.

However, I disagree that biologists should be the people developing new methods, or that in general they will be able to. I don’t think it’s wise to encourage people towards a goal that they’re unlikely to be able to attain.

I do think that there is scope for biologists to develop new data pipelines, as opposed to the methods those pipelines use. The skills for the former can be derived from most computing textbooks with a little effort. Developing new methods, what the pipelines leverage, goes beyond that. (I feel Dudley and Butte’s paper should really have distinguished the two, as I have previously.)

People shouldn’t forget the science that underlies most methods. In the case of informatics-based methods, as Steven Salzberg says well in the comments, you’ll need computer science:

Good points, but I’d be careful not to over-emphasize the lack of training in computer science. For some critical applications, you have to have a deep understanding of algorithms and data structures […]

So sure, for quick and dirty solutions, computer science training is not usually necessary. […] But for those students who want to make major advances in bioinformatics, I would tell them to get some serious CS education.

I would also add that for methods that build on a theoretical background in, say, the physics of protein structures, phylogenetic theory, etc., you’ll need that background too. As I have written about earlier, this specialist background is a characteristic of many computational biologists.

I suspect one reason Sandra picked up on these authors writing that “many successful bioinformatics apps are written by biologists who lack formal computer science training” lies in her own words, “or like me, biologists who’ve gone digital”; that is, she identifies with them because they mention someone with her own background. A subtlety is that computing-based work needs to be distinguished from work based on computer science or theoretical science (or a merger of both).

We’re all guilty of this, me too: I identify with being a computational biologist. I also happen to think that from practical experience I have a few useful words to add!

A developer/user split is over-simplistic

Earlier in her article, Sandra writes:

In many ways, it’s easiest to understand what bioinformatics is, and to choose a bioinformatics-related career, by dividing the field’s participants into two groups: the tool builders and the tool users. The tool builders are the programmers, architects, computational biologists, and computer scientists who write new algorithms, create databases, and build software systems. The tool users are the biologists.

As I’ve previously written, I prefer to distinguish computational biologists and bioinformaticians. (See also the links at the end of this article.)

She’s simplifying for the purpose of her blog, no doubt. I’m forced to do the same in the interests of time, and it’s frustrating sometimes. She obviously doesn’t mean things are so black and white, as she accepts the idea of biologists developing applications.

That said, a simple splitting of those who work in the interface of computing (computer science / mathematics / statistics / etc) and biology into users and developers overlooks the people that do both and what they have to offer. (I’m biased here: that’s my patch.) It might just make some sense if you limited the computer work to computing, and hence to data pipelines and the like, but even there, there is a need for biological input into the design.

To develop new methods well, be both a developer and a user

I’d like to take this further and say that the best method developers, in general, are those people who are both developers and users of their own methods. This is the “kernel of truth” of what the paper refers to. It’s a perfectly good point, but I doubt either a biologist with a little computing or a developer with little biology is well-placed to do this. (With exceptions, as always.)

I think method development is best in the hands of people who have both the specialist biology and computer science background and who, furthermore, are interested in the method in question for their own use.

Given the choice, I prefer to simultaneously:

  • Start with a biological question to address; read the experimental understanding of this, etc.
  • Read the existing bioinformatics methods used to examine this type of data in depth (to pick up the issues and to not re-invent the wheel, something I see too often for my liking)
  • Read the experimental methods behind the data: in order to analyse the data you need to understand what the data actually is in some detail. What are the limitations of the methods? What is actually being measured? And so on.
  • Do one (or several) genuine analyses of data of the kind involved. They must be more than “toy” projects; they must have enough depth and richness that they throw up the issues involved. (This is where the first step comes in: if it is a problem you want to solve, you’ll be more attuned to the issues.)

(Frustratingly, I’m rarely employed in anything like this approach, hence my preference for a research grant. I appreciate it’s rare to do all of the things I list.)

Apropos of nothing. Really. (Source:

You can’t develop tools of any kind properly unless you are an (advanced) user of your own tool.

I’ve seen too many bioinformatics methods developed that are “tested” in, comparatively speaking, “toy” ways or that betray a lack of real understanding of the biological science involved. My gut feeling is that many of the best methods arise from trying to solve a biological problem rather than floundering around looking for “another project” in the sense of using the applicants’ existing toolset of computational techniques.

Too often I see people “playing with their favourite toys”, for example someone keen on Hidden Markov Models (HMMs) applying them to this, that and the other possible application. It’s a sort of “I have a hammer” approach… some of these people need to take care that what they’re trying to hit is in fact a nail.

What I’d like to see more of, and more appreciation for, is people who start with a biological question in mind, wrap their heads around the biological problem thoroughly, determine what theoretical approaches might help address the biological questions at hand, and then move on to development, implementing those theoretical approaches.

I’ll maintain that I think this is best done by computational biologists. (But then I’m biased, so call me out if you like.)

Parallel to this, I would like to see better recognition by biologists that computational biology results can stand on their own. Too often I’m finding an insistence that “only” a wet-lab experimental result is considered “proof”. Really this needs another blog post…

Other bioinformatics posts at Code for life:

More on ‘What is a computational biologist?’ (and related disciplines)

Retrospective–The mythology of bioinformatics

Bioinformatics — computing with biotechnology and molecular biology data

Computational biology: Natural history v. explanatory models

Bibliographies-why can’t research papers self-document what they are?
