What good is a genome anyway?

By Hilary Miller 30/09/2009

I read an interesting post by Olivia Judson at the New York Times blog a few weeks ago, which asked: if you could sequence any genome, which would you choose?  Olivia’s choice was the coelacanth – a worthy choice, given that the coelacanth may represent the ancestor of all tetrapods (the amphibians, reptiles, birds and mammals).  No prizes for guessing what my choice would be…

Anyway, this got me thinking.  What good is a genome sequence?  What is it going to tell us about our favourite organism that good old-fashioned biological enquiry and lab work hasn’t been able to tell us so far?  Whenever the idea of sequencing the tuatara genome is discussed, one of the major questions that comes back (especially from non-geneticists) is “why?”  Even though genome sequencing is getting faster and cheaper by the day, it still requires huge resources of time and money and it’s not always obvious why it’s worth going to the effort.

By now you may be wondering what sequencing a genome actually means.  The genome of an organism refers to all the DNA that makes up one set of its chromosomes – this includes all the genes, and all the pieces of DNA in between the genes.  DNA is made up of 4 nucleotides, or bases – Adenine (A), Guanine (G), Cytosine (C) and Thymine (T).  So sequencing a genome means determining the entire sequence of As, Gs, Cs and Ts for all of the chromosomes.  Because genomes are so large (most mammalian genomes are a few billion bases long), they have to be sequenced in pieces (usually of a few hundred bases each) then put back together.  Sequencing all the pieces is quick and easy, but putting them together takes huge amounts of computing power – like doing one giant jigsaw puzzle with several million pieces. 
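To make the jigsaw analogy concrete, here is a toy sketch in Python of putting overlapping pieces back together. It is only an illustration under simplifying assumptions: the reads are error-free, all come from the same strand, and overlap unambiguously. Real assembly software has to cope with sequencing errors, repeated regions and millions of reads, and uses far more sophisticated methods. The function names `overlap` and `greedy_assemble` are made up for this example.

```python
def overlap(a, b, min_len=3):
    """Return the length of the longest suffix of `a` that matches a prefix of `b`."""
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap.

    Toy assumption: reads are error-free and overlaps are unambiguous.
    Returns the list of remaining sequences (one, if everything joins up).
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining pieces overlap
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three short made-up "reads" that overlap end to end
pieces = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
print(greedy_assemble(pieces))
```

Even in this tiny example every piece is compared with every other piece on each round, which hints at why the computing cost grows so quickly when there are millions of pieces.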

DNA packaged into chromosomes makes up the genome. (Source: Wikimedia commons http://commons.wikimedia.org/wiki/File:Genome.jpg)

Having the DNA sequence is only just the beginning, however.  The DNA sequence itself is meaningless until it is annotated – this involves figuring out which bits are the genes, what these genes are and eventually, what the genes do and how they interact.  Annotation can take years and require huge amounts of bioinformatics manpower (and computer power).  To give you some idea of what we’re talking about here, the cow genome sequence was finished a few months ago and involved more than 300 researchers in 25 countries, including 15 analysis teams to turn the raw data into meaningful knowledge. 
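As a very rough illustration of the first step of annotation, "figuring out which bits are the genes", the Python sketch below scans a DNA string for open reading frames: stretches that begin with a start codon (ATG) and run to a stop codon in the same reading frame. This is a deliberately naive assumption about what a gene looks like. It checks only one strand and ignores introns, regulatory regions and everything else real annotation pipelines have to handle. The name `find_orfs` and the example sequence are invented for this sketch.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end) positions of simple open reading frames.

    Toy assumptions: forward strand only, no introns, and an ORF is just
    ATG ... stop codon in one reading frame. `end` is exclusive.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # remember where this candidate gene begins
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next start codon
    return orfs


sequence = "CCATGAAATTTTAGCC"  # made-up example
for start, end in find_orfs(sequence):
    print(start, end, sequence[start:end])
```

Finding candidate stretches like this is the easy part. Working out what each gene is and what it does is where the years of effort go.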

So what do we get out of all this investment of time and money?  Once we have an annotated whole genome sequence we theoretically know all the genes an organism has, and where they are found on the chromosomes in relation to each other.  And even if our genome sequence is only partially annotated, we still have a huge head start on finding the genes we are interested in.  Most of the power in a genome sequence comes with being able to compare its structure with other genomes already sequenced, enabling us to work on a much larger scale than was previously possible and without having to “guess” which genes may be important to investigate in advance. 

This kind of scaling up has revolutionised evolutionary biology, enabling us to spot patterns in genome evolution that wouldn’t be apparent if we were only studying individual genes.  For instance, we can identify parts of the genome that are conserved across distantly related species – these may have some important functional role which means their DNA sequence hasn’t changed much over millions of years, and may even be regions of the genome previously regarded as “junk” or nonfunctional DNA.  We can also study how genes are formed and lost, or identify genes that appear to have evolved faster or taken on a new function in a particular species.  Genome sequences can also help us understand how species and populations are related to each other.  Instead of just having a handful of genes or markers available to build phylogenetic trees we now have literally thousands of markers at our fingertips, enabling more powerful comparisons to be made. 
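The idea of spotting conserved regions can be sketched in a few lines of Python. The example assumes two sequences that have already been aligned to the same length (with "-" marking gaps) and slides a window along them, reporting where the proportion of identical bases is high. The function name `conserved_windows`, the window size and the identity threshold are all arbitrary choices for illustration. Real comparative genomics works with many species at once and uses statistical models of how fast sequences are expected to change.

```python
def conserved_windows(seq_a, seq_b, window=10, min_identity=0.9):
    """Return start positions of windows where two aligned sequences agree.

    Toy assumption: `seq_a` and `seq_b` are already aligned, so position i
    in one corresponds to position i in the other. Gaps ("-") never count
    as matches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for start in range(len(seq_a) - window + 1):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        matches = sum(x == y and x != "-" for x, y in zip(a, b))
        if matches / window >= min_identity:
            hits.append(start)
    return hits


# Two made-up aligned sequences: identical at the start, different at the end
species_1 = "ACGTACGTAAAA"
species_2 = "ACGTACGTTTTT"
print(conserved_windows(species_1, species_2, window=8, min_identity=1.0))
```

With a whole genome in hand, the same kind of scan can be run across every chromosome rather than across a handful of genes chosen in advance.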

Whole genome sequences can also facilitate research where we only want to look at individual genes or particular regions of the genome to understand a particular biological trait, for instance resistance to a particular disease.  For this type of more traditional genetics research having a whole genome is unnecessary, but it does provide a very convenient shortcut.  Finding a particular gene or region of a chromosome and then sequencing it can be a laborious task requiring many hours in the lab.  With genome sequencing becoming faster and cheaper by the day, it is fast becoming more cost-effective to sequence an entire genome upfront than to pick off small pieces of it to sequence as individual projects.  

So there are a couple of good reasons to sequence whole genomes.  Genome sequences allow researchers to work on a much larger scale than was previously possible to answer some fundamental questions about evolution, and they circumvent the need for a whole lot of laborious lab work to isolate specific genes or genomic regions.  So perhaps the question should be “which genome should you sequence?”  That might have to be a topic for another post…

0 Responses to “What good is a genome anyway?”

  • Nice piece, not quite sure about this sentence though

    A worthy choice, given that the Coelacanth may represent the ancestor of all tetrapods

    Coelacanths came after the first fish-a-pods, and more to the point a living organism can’t be an ancestor! I know what you mean though, looking at a genome from this sister group to tetrapods would likely tell us a lot about tetrapod-specific evolution.

    The other question is whether you need whole genomes to attack the sorts of questions you’re talking about. It seems to me you can do a lot of what is still called genomics at a fraction of the cost by building up a library of 454 sequences and resequencing the bits you are interested in with other technologies. Of course that relies on the sort of “upscaling” you talked about – if you don’t have a close relative to do the annotations from then you are going to be in the dark (not much good for the Coelacanth).

    • more to the point a living organism can’t be an ancestor

      Yes, quite right (I was trying to get this across using the vague word “represent” instead of “is” – but it still probably gives the wrong impression!)

      That’s a good point you make that “genomics” doesn’t have to mean sequencing whole genomes. With the next-gen sequencing technology that’s around nowadays you can get a whole lot of information from one sequencing run and this has been a huge boon for doing genomics on non-model organisms. But ultimately sequencing a whole genome might become just as easy. The hard bit (whether it’s just one 454 run, or a whole genome) will still be the annotation regardless of how cheap sequencing is. This is a big problem for lots of NZ species in particular where there are no close relatives with much existing genome data available. Tuatara is particularly problematic, being 250 million years or so removed from its nearest relative!

  • I’ll try very hard not to say too much (!), but as a computational biologist (aka bioinformaticist), I agree that annotation is a big issue, as is extracting the real “knowledge” from the data coming out of genome projects and other high-throughput techniques. (This knowledge is more than just annotating a genome, which really is only a starting point for further work, in my view.) It’s a key reason that I chose a tagline of “from data to knowledge” for my consultancy when I started it about ten years ago.