Developing bioinformatics methods: by who and how
In my view the best method developers–generalising here–are those who are both an advanced user and developer of the method they are developing.
They are “scratching an itch”, developing the method to serve their own needs.
Before elaborating on this, let me quickly point to a couple of earlier posts that brush up against my views on this.
Developers can be scientists too
Some time ago (October 16) Fabiana Kubke wrote Methods in Neuroscience in which she considered that
There are, in my opinion, two types of scientists. Those who adapt the questions to the methods they use, and those who choose the methods based on the questions they need to ask.
She, of course, meant experimental scientists. I just had to respond by adding two categories of “method developers”, in particular:
2. Those that are focused on a [research] question, but will develop new methods if that’s what is needed to address the question.
As I went on to say, this is the fashion I prefer to work in myself. Start with a biological problem and through that develop methods (if needed). I’ll come back to this.
There is more than computing involved
A post and comments by Sandra Porter raise some points worth considering. From the paper her article is examining she quotes:
The success of bioinformatics software is based not on the elegance of the software design, but rather its utility as a tool for driving and answering biological questions. Consequently it is no surprise that many successful bioinformatics apps are written by biologists who lack formal computer science training, as they undoubtedly put scientific utility ahead of architectural elegance and completeness.
(Source: PLoS Computational Biology: A Quick Guide for Developing Bioinformatics Programming Skills. Dudley and Butte.)
I’m not going to review this paper itself. I’d recommend it to those involved, or becoming involved, in bioinformatics. While no one person is going to agree with every piece of advice there, it is well worth reading and most of what is said is worth taking note of. (See also Sandra Porter’s comments on the paper.)
What I’d like to focus on is what skills are needed to develop new methods.
I absolutely agree with Sandra that methods must have utility. Over the years I’ve seen many methods developed by computer scientists that, to be frank, were worthless because they flew in the face of biology or addressed problems that no-one needed solved.
However, I disagree that biologists should be the people developing new methods, or that in general they will be able to. I don’t think it’s wise to encourage people towards a goal that they’re unlikely to be able to attain.
I do think that there is scope for biologists to develop new data pipelines, as opposed to the methods those pipelines use. The skills for the former can be derived from most computing textbooks with a little effort. Developing new methods, what the pipelines leverage, goes beyond that. (I feel Dudley and Butte’s paper should really have distinguished the two, as I have previously.)
People shouldn’t forget the science that underlies most methods. In the case of informatics-based methods, as Steven Salzberg says well in the comments, you’ll need computer science:
Good points, but I’d be careful not to over-emphasize the lack of training in computer science. For some critical applications, you have to have a deep understanding of algorithms and data structures […]
So sure, for quick and dirty solutions, computer science training is not usually necessary. […] But for those students who want to make major advances in bioinformatics, I would tell them to get some serious CS education.
I would also add that for methods that build on a theoretical background in, say, the physics of protein structures, phylogenetic theory, etc., you’ll need that background too. As I have written about earlier, this specialist background is a characteristic of many computational biologists.
I suspect a reason behind Sandra’s picking up on these authors writing that “many successful bioinformatics apps are written by biologists who lack formal computer science training” lies in her own words “or like me, biologists who’ve gone digital”, i.e. she identifies with them because they mention someone with her own background. A subtlety is that computing-based work needs to be distinguished from work based on computer science or theoretical science (or a merger of both).
We’re all guilty of this, me too: I identify with being a computational biologist. I also happen to think that from practical experience I have a few useful words to add!
A developer/user split is over-simplistic
Earlier in her article, Sandra writes:
In many ways, it’s easiest to understand what bioinformatics is, and to choose a bioinformatics-related career, by dividing the field’s participants into two groups: the tool builders and the tool users. The tool builders are the programmers, architects, computational biologists, and computer scientists who write new algorithms, create databases, and build software systems. The tool users are the biologists.
As I’ve previously written, I prefer to distinguish computational biologists and bioinformaticians. (See also the links at the end of this article.)
She’s simplifying for the purpose of her blog, no doubt. I’m forced to do the same in the interests of time and it’s frustrating sometimes. She obviously doesn’t mean things are so black and white, as she accepts the idea of biologists developing applications.
That said, simply splitting those who work at the interface of computing (computer science / mathematics / statistics / etc) and biology into users and developers overlooks the people who do both and what they have to offer. (I’m biased here: that’s my patch.) It might just make some sense if you limited the computer work to computing, and hence to data pipelines and the like, but even there, there is a need for biological input into the design.
To develop new methods well, be both a developer and a user
I’d like to take this further and say that the best method developers, in general, are those people who are both developers and users of their own methods. This is the “kernel of truth” of what the paper refers to. It’s a perfectly good point, but I doubt either a biologist with a little computing or a developer with little biology is well-placed to do this. (With exceptions, as always.)
I think method development is best in the hands of people who have both the specialist biology and computer science background, and who are furthermore interested in the method in question for their own use.
Given the choice, I prefer to simultaneously:
- Start with a biological question to address; read the experimental understanding of this, etc.
- Read the existing bioinformatics methods used to examine this type of data in depth (to pick up the issues and to not re-invent the wheel, something I see too often for my liking)
- Read the experimental methods behind the data: in order to analyse the data you need to understand what the data actually is in some detail. What are the limitations of the methods? What is actually being measured? And so on.
- Do one (or several) genuine analyses of data of the kind involved. They must be more than “toy” projects; they must have enough depth and richness that they throw up the issues involved. (This is where the first step comes in: if it is a problem you want to solve, you’ll be more attuned to the issues.)
(Frustratingly, I’m rarely employed in anything like this approach, hence my preference for a research grant. I appreciate it’s rare to do all of the things I list.)
You can’t develop tools of any kind properly unless you are an (advanced) user of your own tool.
I’ve seen too many bioinformatics methods developed that are “tested” in, comparatively speaking, “toy” ways or that betray a lack of real understanding of the biological science involved. My gut feeling is that many of the best methods arise from trying to solve a biological problem rather than floundering around looking for “another project” in the sense of using the applicants’ existing toolset of computational techniques.
Too often I see people “playing with their favourite toys”, for example someone keen on Hidden Markov Models (HMMs) applying them to this, that and the other possible application. It’s a sort of “I have a hammer” approach… some of these people need to take care what they’re trying to hit is in fact a nail.
What I’d like to see more of, and more appreciation for, is people who start with a biological question in mind, wrap their heads around the biological problem thoroughly, determine what theoretical approaches might help address the biological questions at hand, then move on to developing methods that implement those theoretical approaches.
I’ll maintain that I think this is best done by computational biologists. (But then I’m biased, so call me out if you like.)
Parallel to this, I would like to see better recognition by biologists that computational biology results can stand on their own. Too often I’m finding an insistence that “only” a wet-lab experimental result is considered “proof”. Really this needs another blog post…
Other bioinformatics posts at Code for life:
More on ’What is a computational biologist?’ (and related disciplines)
Retrospective–The mythology of bioinformatics
Bioinformatics — computing with biotechnology and molecular biology data
Computational biology: Natural history v. explanatory models
Bibliographies-why can’t research papers self-document what they are?
0 Responses to “Developing bioinformatics methods: by who and how”
Great post, I totally agree with you.
Thanks for sharing.
Cheers.
This is a neat post, but I feel like there’s one extra approach that biologist/developers can take – pair up with a friendly professional when you need to.
Case in point – I am what you might call a biologist developer. I am driven by my interest in biological questions, and occasionally the available software just isn’t up to the task. I’m fine at developing new methods and writing scripts, but writing good software is another question.
Recently, I teamed up with a developer friend (ex professional programmer and software engineer, now philosopher). This is a great combo – he takes care of design issues, we collaborate on a lot of the programming, and I lay down the law on what the software has to do, how fast, and how.
The great thing is that we now have a successful piece of software, that thanks to his good design is extensible and useful in a range of cases we didn’t initially envisage.
I’d encourage other biologists to seek out friendly developers when they need to.
Rob
(First, a caveat: I haven’t time to re-read my old post! Bear that in mind in my remarks below.)
I think the answer depends on what you intend to build.
(Just so you can see where I’m coming from, I’m a computational biology consultant available for hire, etc. I trained in both biology and computer science from undergraduate level onwards.)
What you suggest can work well for projects that can be very clearly defined numerically (e.g. work implementing ‘straight’ statistics) or that are essentially entirely infrastructural (e.g. database work). It would also help to have ample time for communication.
I think that there are many projects where biologists would do better working with a good computational biologist, someone who knows both the theoretical aspects of the biology and the computational biology. (One catch being finding someone fitting the bill, of course.)
Quite a bit of computational biology, in general, needs a good understanding of the (theoretical) biology by the person coding and doing the analysis. I’ve seen plenty of efforts in bioinformatics by computer science graduates, for example, where the code pretty much does what they claim but the resulting product is inappropriate for real-world computational biology.
Extensive communication can help, which I suspect is where you’re winning out over problems in your case. Not everyone would have the time, or the inclination, to do that. (Good for you, though! Also, your position may not be typical of biologists – my impression is you are working as someone already familiar with coding and some of the theory to be implemented – ?)
There is also the aspect of developing application software, i.e. considering the differences in research vs. application coding, something I touched on in another post a few years ago following someone’s lead.
I’d elaborate further on this and related things, but I should leave it for another post. This is already getting long enough. It’s a very big topic and I’m running up to a deadline, so I can’t tackle it over the next few days at least.
One thing that is a nuisance, though, is that too many academic projects are poorly coded and/or poorly tested. (Don’t get me started on that…! It’s part of what I was referring to in locating a good computational biologist. There’s an argument for seeing computational biologists raise their game too!)