Genomes galore on the horizon

By Hilary Miller 10/11/2009

When the first human genome sequence was published in 2003, it represented the culmination of 13 years of work and cost nearly 3 billion dollars to complete.  In the six years since then, an additional 55 vertebrate genome sequences have been produced, and the technology has moved on to the extent that sequencing genomes is bordering on routine.  Now an ambitious proposal to sequence 10,000 vertebrate genomes has been launched with an article in the Journal of Heredity:

With the same unity of purpose shown for the Human Genome Project, we can now contemplate reading the genetic heritage of all species, beginning today with the vertebrates. The feasibility of a “Genome 10K” (G10K) project to catalog the genomic diversity of 10 000 vertebrate genomes, approximately one for each vertebrate genus, requires only one more order of magnitude reduction in the cost of DNA sequencing, after the 4 orders of magnitude reduction we have seen in the last 10 years. The approximate number of 10 000 is a compromise between reasonable expectations for the reach of new sequencing technology over the next few years and adequate coverage of vertebrate species diversity. It is time to prepare for this undertaking.

The goal of the Genome 10K project is to provide a window into vertebrate evolution by making large-scale comparisons across genomes possible.  Over the last 500-600 million years, vertebrates have evolved into a diversity of forms, occupying a vast range of habitats and exhibiting a huge array of differing lifestyles.  A number of biological innovations are unique to the vertebrate line, including the adaptive immune system, the multichambered heart, cartilage, bones, teeth and the internal skeleton.  The project aims to enable researchers to investigate the genomic basis of this diversity and innovation:

DNA sequencing has ushered in a new era of investigation in the biological sciences, allowing us to embark for the first time on a truly comprehensive study of vertebrate evolution, the results of which will touch nearly every aspect of vertebrate biological enquiry.

The proposal is still just that – a proposal, put together by a group of museum curators and genomics experts.  Moving forward on the plan will require a large injection of cash (around US$50 million), and will require the cost of sequencing a genome to drop below US$5,000 apiece – a price tag which is feasible but still some way off.  However, the “Genome 10K Community of Scientists” (G10KCOS) has already begun preparing for the project by cataloguing the DNA and tissue specimens already available in labs and museums around the world.
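The two dollar figures above fit together with simple arithmetic, sketched below in Python. This assumes the US$50 million is simply the per-genome price multiplied by the number of genomes (the proposal may budget it differently), and the variable and function names are illustrative only.

```python
# Figures quoted in this article; names are illustrative, not from the proposal.
GENOMES = 10_000                       # target number of vertebrate genomes
COST_PER_GENOME_USD = 5_000            # target price per genome
HUMAN_GENOME_COST_USD = 3_000_000_000  # approximate cost of the first human genome


def total_sequencing_cost(n_genomes: int, cost_per_genome: int) -> int:
    """Total cost, assuming every genome is sequenced at the same flat price."""
    return n_genomes * cost_per_genome


total = total_sequencing_cost(GENOMES, COST_PER_GENOME_USD)
fold_cheaper = HUMAN_GENOME_COST_USD // COST_PER_GENOME_USD

print(f"Total for {GENOMES:,} genomes: US${total:,}")
print(f"Target price is {fold_cheaper:,}-fold cheaper than the first human genome")
```

At the target price the whole project would cost a small fraction of what the first human genome did, which is why the per-genome cost, rather than the headline budget, is the figure to watch.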

A potentially bigger problem to overcome will be processing and analysing the huge amounts of data that will be produced.  Much of the increase in sequencing power and drop in cost has come from the advent of new short-read but massively parallel sequencing technologies.  These technologies produce several orders of magnitude more DNA sequence in a single run than the traditional technology used to sequence the human genome, but at the expense of the number of consecutive bases sequenced in each read.  The bioinformatics and computing power required to assemble this data has lagged behind the technology, so that today the major bottleneck in genome sequencing comes at the sequence assembly stage, not the sequence generation stage.  In an article about the project published in Science, Webb Miller, a computer scientist at Penn State University, points out that assembling 10,000 genomes in 5 years will require processing a genome a day. “There’s a real problem here”, he notes.  Even the imminent arrival of “third-generation” sequencers, which promise longer reads and even more sequence output, will only go some way to simplifying the huge task of assembling and annotating the sequence.

So what does this mean for New Zealand?  The Genome 10K project aims to sample as widely as possible across the diversity of vertebrates, meaning that many of our iconic fauna will make it onto the sequencing list.  As the sole representative of an entire order of reptiles, tuatara is likely to be high on the list.  For birds, the project proposes to sequence one species from every genus, meaning kiwi, kakapo, a NZ wren (e.g. rifleman), wattlebird (e.g. tui), kea or kaka, stitchbird (hihi) and others will be in line for sequencing.  Similarly our native frogs, geckos, and the lesser short-tailed bat are sufficiently phylogenetically and biologically distinctive that they may be potential candidates for inclusion.  

The Genome 10K project promises to provide a veritable gold mine of data for evolutionary and conservation studies of our native species, and many New Zealand researchers will be closely watching developments.  But it also raises some issues:  Where will the samples from our native species come from?  How will iwi concerns about samples being sent overseas, data being released into the public domain and potential exploitation of their taonga be addressed?  Many NZ species are already in zoos elsewhere in the world (tuatara and kiwi particularly), and samples from these animals could conceivably be used for genome sequencing without any input from NZ researchers at all.  No New Zealand-based researchers are included in the G10KCOS, and the list of participating institutions from which specimens will be sourced includes only one New Zealand site – the University of Auckland.  The G10KCOS proposes a network of around 20 sequencing sites, coordinated by a large data centre, to produce the raw sequence data and draft assemblies.  It seems unlikely that New Zealand will house one of these sites, but we should take the lead in assembly and annotation of genomes produced from our native species.

Reference: Genome 10K Community of Scientists (2009) Genome 10K: A Proposal to Obtain Whole-Genome Sequence for 10 000 Vertebrate Species. J Hered. DOI: 10.1093/jhered/esp086.