Bioinformatics – QC, reproducible, statistical and sequence-oriented

By Grant Jacobs 26/10/2011

For bioinformatics geeks, or biologists interested in it: a few ruminative thoughts and one concern – should more attention be paid to different areas of (computational) biology complementing each other?

A few weeks ago I attended the bioinformatics satellite meeting of the annual Queenstown Molecular Biology Meetings, locally known as the QMB meeting.

If there was an over-arching theme, it would have been QC (quality control), with some aspects of reproducibility woven in, but a discussion toward the end left me a little concerned.

As someone who has long been arguing for better QC, it was good to see several talks on this (although the repeating of the theme gave the perhaps misleading impression of a sudden rush of enthusiasm for it).

QC in many ways is simply care and attention to detail over your data: ensuring you understand what it is able to, and not able to, convey. Formal QC helps ensure you catch any gaffes before you ship the results. Good stuff, in other words.

Reproducibility, something I’ve been a fan of since my Ph.D. student days, also featured.[1] It’s particularly important with large datasets being made accessible to later (re)analysis, including by other research groups.

So far, so good, but one concern emerged from listening to an open discussion in which people referred to bioinformatics work: the only examples raised were from sequence analysis, especially high-throughput analyses, and the statistics associated with that.

While it will in part be a reflection of the earlier presentations, in the way that conversations tend to follow earlier leads, I wasn’t especially happy with this as bioinformatics is more than just stats and sequences.

Off the top of my head, some other aspects we might include (I don’t mean this list to be comprehensive, but to illustrate that there are other aspects and areas):

  • computational structural biology, which includes use of computational geometry, physics and rotational and spatial statistics (as opposed to linear statistics), including molecular complexes and interactions
  • cheminformatics (an area I’m not very familiar with)
  • biophysics, in the wider sense, an area I feel is vital to future molecular biology, but seems to be treated as an obscure sideline [2]
  • the wider range of molecules, including carbohydrates and lipids that seem the perennial underdogs in modern biology
  • population-level work
  • systems biology (of course).

I could go on, but you get the drift.

The items in the list are obvious; I don’t expect anyone to be surprised by them. (Or I hope not.)

There’s plenty to be excited about in what is following, and will follow, from the sequencing work, but to bring these results into the wider fold of biology, more needs to be allied to the sequence data than just statistics applied directly to it.

My own concern here comes in part from an interest in genomes as higher-order structures. There are clear links to biophysics, for example. The molecular complexes that form these high-level structures, and those that act upon them, relate to the DNA sequence, to other molecules and to systems biology. And so on.

My concern was that focusing predominantly on one aspect at a time (sequencing in this case) detracts from viewing computational biology as a network of interacting aspects rather than a series of isolated niches, some of which may be more trendy at this time than others.

On that note, perhaps it’s worth remembering, too, that some areas in biology temporarily lose popularity but later find relevance again. As one possible example, molecular cytogenetics might deserve more consideration with respect to the current investigations into the higher-level structure of genomes through the various chromosome conformation capture experiments.[3] Best, to my mind, to keep these potentially complementary explorations alongside each other, if possible.

I suspect it’s easy when in the depths of large-scale projects, such as the big sequencing efforts that have been popular in recent years, to become ‘buried’ in them and lose sight of the associations with other aspects.

Perhaps a focus of a future meeting might be the interactions between these different areas, and how they might work together to create better project outcomes?


1. Before I started my Ph.D. I read about systems biology, or what passed for it at that time (the mid- to late 1980s). One of the systems used was based on Simula, a language I’d encountered as an undergraduate and enjoyed playing with. One feature was the descriptions of all the components (today we’d call them objects and their properties, etc). Some of the concepts for describing systems in this general way I carried over into my own programs, recording all the inputs defining the work in the output.

2. It reminds me a little of how bioinformatics itself used to be treated, in fact, before the genome projects increased the demand-led focus of biological research groups.

3. I’ll try to write an outline of chromosome conformation capture some other time; I recently wrote a book chapter on this topic. The example of molecular cytogenetics may not be the best, but it’s the one I have in my head as I write!
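
As an aside on footnote 1: the idea of recording all the inputs defining the work in the output can be sketched in a few lines of Python. This is only an illustration, not the code referred to in the footnote; the function names (describe_run, write_report, read_provenance), the "# provenance:" header convention and the example parameter are all hypothetical choices made for this sketch.

```python
import hashlib
import json
import sys


def describe_run(params, input_bytes):
    """Build a record of what defined this run (hypothetical layout)."""
    return {
        "parameters": params,
        # A checksum identifies the exact input data without copying it.
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "python_version": sys.version.split()[0],
    }


def write_report(results, params, input_bytes):
    """Return report text whose first line records the run's inputs."""
    header = json.dumps(describe_run(params, input_bytes), sort_keys=True)
    lines = ["# provenance: " + header]
    # One tab-separated "name<TAB>value" line per result.
    lines += [f"{name}\t{value}" for name, value in results]
    return "\n".join(lines) + "\n"


def read_provenance(report_text):
    """Recover the recorded inputs from the first line of a report."""
    first_line = report_text.splitlines()[0]
    prefix = "# provenance: "
    if not first_line.startswith(prefix):
        raise ValueError("report has no provenance header")
    return json.loads(first_line[len(prefix):])


if __name__ == "__main__":
    # Toy data: the sequence and the 'window' parameter are made up.
    report = write_report([("gc_content", 0.5)], {"window": 100}, b"ACGT")
    print(report, end="")
    print(read_provenance(report)["parameters"])
```

The point is simply that anyone handed the output later can see what went in, and can check the checksum against the data they hold before attempting a re-analysis.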

Other articles on Code for life:

Retrospective: The mythology of bioinformatics

Reproducible research and computational biology

Developing bioinformatics methods: by who and how

External (bioinformatics) specialists: best on the grant from the onset

Research project coding v. end-user application coding

2 Responses to “Bioinformatics – QC, reproducible, statistical and sequence-oriented”

  • I’ve found with the bioinformatics that I’ve experienced in NZ (which I’ll admit is not a huge amount – yet) you tend to get islands of people concentrating on single areas. When I was at the Bioinformatics Institute in Auckland, most (not all) of the work being done by the people around me involved constructing phylogenies.
    It does mean that for my PhD I’m drawing on the expertise of people from wildly different speciality areas – it’s not a situation that I’m unhappy with in any way, mind; I quite like drawing strands from different areas together to create a synthesis.
