Mary Gray, PhD Student, Clinical Genetics Research Group
The Human Genome Project began in 1990 and took 13 years and at least $3 billion USD to complete. A biotechnology company called Illumina has recently announced that you can have your own genome sequenced for $10,000 USD, and it will only take around eight days. The cost of genome sequencing will come down further; the goal so far is the $1,000 genome, which will probably be available in three to five years. So what has happened to the technology since the first heroic attempt to sequence the human genome?
The human genome contains approximately 3 billion nucleotides, or bases. The nucleotides in our DNA come in four chemical forms: adenine (A), thymine (T), guanine (G) and cytosine (C). Your DNA is a double helix, so these nucleotides pair up across the two strands: A pairs with T and G pairs with C. The order of the nucleotides A, T, G and C makes up the DNA sequence, which ultimately determines your genes, and this order is what sequencing decodes.
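The pairing rule can be written down in a few lines of code. This is only a minimal sketch: the function name `complement` is made up for illustration, and it ignores the direction in which each strand is read.

```python
# Base pairing as described above: A pairs with T, and G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Return the partner base for every position of `strand`.

    `complement` is a hypothetical helper name, not a standard function.
    Any character other than A, T, G or C raises a KeyError.
    """
    return "".join(PAIRS[base] for base in strand.upper())


print(complement("AGAC"))  # prints TCTG
```

Because each base has exactly one partner, knowing one strand is enough to work out the other, and applying `complement` twice returns the original sequence.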
The Human Genome Project mostly used a technology called Sanger Sequencing. Sanger Sequencing was first publicly reported as a technique in 1977 and is still used today. It has of course been refined and is now an automated process. The sequencing reaction uses amplified copies of your DNA fragment of interest (usually a maximum of 1000 nucleotides long), an enzyme called a polymerase which adds nucleotides to a DNA strand, some free nucleotides, and modified nucleotides. The modified nucleotides are ‘chain terminating’ chemical variants of the normal A, T, G and C nucleotides, each with a chemically attached fluorescent label. The polymerase adds nucleotides along your melted DNA fragment (the double helix is unravelled and the DNA is now single stranded) until it inserts a modified nucleotide. The modified nucleotide does not have the right chemical linker, so no more nucleotides can be added to the chain once it has been inserted. So if you run four mixtures that differ only in which modified nucleotide you added (A, T, G or C), you can deduce the sequence of the DNA from the sizes of the resulting fragments when they are separated on a gel. For example, suppose that in your four lane gel the A lane has fragments 1 and 3 bases long, the T lane has no fragments, the G lane has a fragment 2 bases long and the C lane has a fragment 4 bases long. Reading from the shortest fragment to the longest, the sequence would be AGAC. This is now much more automated, but as you can imagine there are limits on how many fragments of DNA can be sequenced at once.
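The gel-reading logic in the example above can be sketched in code. This is an illustration only: the function name `read_gel` and the dictionary layout are assumptions made for this sketch, and a real gel gives band positions that must first be converted to fragment lengths.

```python
def read_gel(lanes):
    """Rebuild a sequence from the fragment lengths seen in each lane.

    `lanes` maps the modified nucleotide added to a mixture (A, T, G or C)
    to the list of fragment lengths found in that lane. A fragment of
    length n in the G lane means position n of the sequence is a G.
    (`read_gel` is a hypothetical helper, not part of any library.)
    """
    base_at = {}
    for base, lengths in lanes.items():
        for length in lengths:
            if length in base_at:
                raise ValueError(f"two lanes both claim position {length}")
            base_at[length] = base

    longest = max(base_at, default=0)
    missing = [n for n in range(1, longest + 1) if n not in base_at]
    if missing:
        raise ValueError(f"no fragment found for positions {missing}")

    # Read from the shortest fragment to the longest.
    return "".join(base_at[n] for n in range(1, longest + 1))


# The four lane gel from the example above.
gel = {"A": [1, 3], "T": [], "G": [2], "C": [4]}
print(read_gel(gel))  # prints AGAC
```

Each fragment length appears in exactly one lane, which is why the four lanes together spell out the sequence one position at a time.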
What technology now offers us is highly parallel sequencing: many thousands of unique fragments of amplified DNA can be sequenced simultaneously. This technology has been dubbed ‘Next Generation Sequencing.’ Currently there are three main companies that offer this service; they are Applied Biosystems (SOLiD), Illumina (Solexa) and Roche (454). They all use different chemistries to perform the sequencing reaction. Roche, the first company to commercially provide ‘Next Gen Seq’, uses pyrosequencing, a light based system that measures light emitted by an enzyme called luciferase every time a known nucleotide is added to the reaction. Illumina uses a different technique where modified fluorescent nucleotides are added and detected with a camera. Applied Biosystems uses a labelled dual nucleotide and a ‘sequence primer’, a chain of nucleotides of known length, to interrogate each of the DNA nucleotides twice. All three systems have their advantages: Roche gives you longer reads (more nucleotides decoded in a row), Illumina gives you more reads and SOLiD may give you more accurate reads. They do of course have limitations, which are specific to the chemistries used to amplify DNA fragments, a step required so that light signals are strong enough to be measured.
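To give a feel for how a light signal becomes a sequence in pyrosequencing, here is a rough sketch. It assumes that one known nucleotide is washed over the DNA at a time in a repeating order, and that the light reading has been scaled so that about 1.0 means one nucleotide was added and about 2.0 means two in a row. The function name, the flow order and the numbers are all made up for illustration; real instruments need much more careful signal processing.

```python
def decode_flows(flow_order, signals):
    """Turn a list of light readings into a DNA sequence.

    flow_order: the repeating order in which nucleotides are offered,
                e.g. "TACG" (an assumed order for this sketch).
    signals:    one light reading per flow, scaled so that roughly 1.0
                corresponds to a single nucleotide being added.
    (`decode_flows` is a hypothetical helper, not a vendor API.)
    """
    sequence = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]  # which nucleotide was offered
        count = max(0, round(signal))           # how many were added
        sequence.append(base * count)
    return "".join(sequence)


# Flows offered: T, A, C, G, T, A
readings = [1.0, 0.1, 0.0, 2.1, 0.9, 0.0]
print(decode_flows("TACG", readings))  # prints TGGT
```

A reading close to zero means the nucleotide offered in that flow was not the next one in the sequence, so nothing was added and no light was given off.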
So where to from here? New technology is being designed so that a single molecule of DNA can be sequenced, so there is no longer any need to amplify fragments of DNA to get a strong enough signal. This will have several implications. Firstly, errors in sequencing due to ‘amplification bias’ will disappear. Some parts of the genome are easier to copy than others, so some regions can be missed using current methods. Secondly, imagine being able to use the single DNA molecule from a single cell. The applications for this technique could provide more information about cancer development and cell differentiation, allow less invasive genetic screens of unborn children, assist criminal investigations, make ancient DNA sequencing more accurate, and support non-human work such as sequencing a single bacterial cell, or a rare insect without the need to kill it. This technology isn’t too far away from being commercialised.
But what do you do with your human genome once it’s sequenced? You can add yourself to the list of already sequenced genomes if you want. If you include incomplete genomes (where only genes or only a single chromosome was sequenced), you’d be looking at ~4000 human genomes. You could find out information about your genetic origins and even get a vague idea of what diseases you may be susceptible to. Currently not enough is known about what the individual differences in genome sequences really do. Some things are known: if you have two copies of a mutation in the HERC2 gene, you probably have blue eyes, though this one you can see in the mirror! Most genetic variants have not been assigned a function, and the ones that do have a known function generally contribute only a small amount to your risk for certain diseases, the way you look or the way you metabolise your food. The whole genome does provide the big picture, but until we find out what it all means we are still far away from, say, finding a single cell from a criminal’s eyelash, sequencing their genome and being able to come up with a picture of what they look like. For now, the human application of Next Generation Sequencing allows scientists to find causes for rare genetic diseases where a change in a single DNA nucleotide, or a small number of them, has a big negative functional impact on a gene. The technology is cutting the time frame for finding these mutations from months or years to weeks or months, increasing the information about human genetics and, most importantly, providing answers to families affected by genetic disease. The $10,000 genome, and soon the $1,000 genome, will make this technology more widely available for those kinds of projects.