Models, not just stamp collecting and managing datasets.

It feels good, close to the 150th anniversary of the publication of On the Origin of Species, to note a thread running through some posts that reflects something close to my heart.

A key thing about Darwin’s work and other major achievements in science is that they present models of how something works, frameworks in which observations can be placed.

I’ve nothing against observations, they’re vital, but there is a stage where you collect them without building something to hang them on, as it were.

I sometimes think that science on the grand scale proceeds in a repeated cycle of a mania of collecting until enough has been gathered that someone with a “model” view on the world sees patterns within the collections and suggests a framework for them.

Darwin’s work did both, of course, and the two parts of Browne’s biography can be seen as marking out the two: the first volume the collecting and the second presenting the framework.

Statisticians would express this notion differently I suspect and would probably say that the data needs to reach a quantity such that hypotheses can be tested with significance. (Don’t you just love the dryness of it?) Others, from endeavours that do not quantitate in this manner, might talk of ‘exposing the underlying mechanism’.

I worry that experimental researchers don’t see the “model building” aspect of my field, bioinformatics. In fact it’s a reason I prefer a different title for myself: computational biologist.

Recently I wrote a short submission to a Genomics, medicine and law meeting. In it I included a brief description of computational biology:

Computational biology is the application of theoretical biology using computers and algorithms, statistics, physics–whatever will explain the system at hand–to biological systems. It’s centred on understanding the mechanisms in life. You can think of it as the ’model’ counterpart to data collecting.

Creating computational tools is part of the work, but the focus is on understanding the biological system rather than creating infrastructure per se (which is more associated with bio-IT). Basically, we do what experimental biologists do, but with different tools.

Others will define it differently, but that’s OK.

In the eager rush of the Human Genome Project (HGP), the quiet, almost isolated, little field of bioinformatics was seized upon as the solution to the project’s data management problems. The trouble with this is that it views the field as a sort of service division.

To be fair, there are at least two major activities that go past “mere” data management in the application of bioinformatics in the early HGP: genome assembly and annotation. Both, however, are very much viewed as services.

I’m not saying services are wrong, they are needed and certainly have their place. I’m just saying that they’re not all there is, that bioinformatics shouldn’t be treated as only being about services.

That’s the impression that I guess biologists get when all they see of bioinformatics is the software tools they use and the services (larger) commercial companies offer. They probably don’t see the model-building work because for the most part it’s buried in journals that they don’t read.

This might be an unfair criticism of my colleagues–it probably is–but I’d like to hold onto it, as it illustrates a point that’s important to me.

I’m coloured by what my clients in New Zealand bring to me, of course. I’m particularly coloured given that my original intentions were to stay within one area of computational biology (protein-DNA and protein-protein interactions, with a focus on gene regulation) that I thought had enough application to be spread around several projects (clients).

I’ve long thought there is an (over) emphasis on genetics in this country. (I’ve seen Sir Peter Gluckman express similar sentiments, so I know I’m in good company saying this.) A few things are missing to my mind. There are too many to list here, but a couple that are relevant to my own work include:

  • good methods and work extending genetic data to proteins and their structure and function. In NZ, these are largely considered two nearly isolated areas, with little work extending genomics studies to include the latter.
  • treatment of genomes as structures, in the 3-D sense. Not insertions or deletions in sequence or copy number variations per se, but the chromatin and nuclear structure of the genome inside the nucleus.

My research interest over the last 5+ years has been the latter. I’d be working on it if it weren’t for the lack of funding. (I still maintain interests in the former, mostly in developing new algorithms.)

In essence, I am interested in the mechanism of genomes. They’re not static things that are “read”, in the way they’re often described in the simple linear models of them as “strings of bases”. Genomes inside the nucleus are dynamic structures that are manipulated to expose or hide regulatory elements and genes.

Those familiar with the biology will realise I view the chromatin proteins as an integral part of genomes. This is not to say that I don’t distinguish the hereditary portion from that which is assembled on it, but that the functioning unit within the nucleus composed of these different components is too tightly integrated for them to be taken separately.

I want to take the masses of genome sequence and other genomics data and work towards a structural model we can place them on.

It’s a grand vision, I know. I’m not expecting to be Charles Darwin and change the world (!), but rather that this larger long-term vision be a focus for my smaller efforts.

To get funding in NZ I can’t help but think that, among other things, I have to overcome a mindset in which bioinformatics† is seen largely as a service division seconded to collecting, with its role in developing models overlooked.

Over the next few weeks or months, I hope to present an introductory series on epigenetics, my main research interest; how epigenetics relates to genomics and biology in general; what computational biology does; and how all these play together.

Don’t expect these in a hurry. First I have the genomics + medicine posts to complete (the first and second parts are already available), then far too many loose ends.

General readers and science communicators, don’t worry: there will still be science communication posts and general stuff. The new material will probably only come to one post a week, if that.


† Outside of phylogenetics, which is often treated as a more-or-less separate subfield (I personally wish it were a little more integrated with the rest), and one or two other little niches. Nothing is ever as black-and-white as this…