Your questions – epigenetics David Winter Oct 02


It’s time for us to answer another of your questions about tuatara, evolution and the science of genomics. This time, a question from Sciblogs’ own Darcy Cowan:

“What about epigenetic markers? How important are these when considering if all the genomic information has been captured?”

We had better start answering Darcy’s question by saying what epigenetics is.

What, if anything, is epigenetics?

So far, this blog has focused on discussing how and why we are trying to determine the sequence of the tuatara genome. Of course, there is much more to an organism than its DNA. Indeed, every cell in a tuatara has the same genome* but those cells come in many different forms, and combine in various ways to produce the different organs that keep the tuatara’s day-to-day biology ticking over. Clearly, DNA alone can’t tell us how those differences arise.

In fact, if we want to know how sequences in the tuatara genome translate into tuatara biology we need to know not just what genes tuatara have, but how those genes are used. All organisms have to control the way in which their genes are expressed, in order to direct their development, maintain each cell’s function and react to environmental inputs. Given the importance of gene regulation, it is not surprising that organisms have developed an impressive set of tools to control the process. For genes that make proteins, every stage from the unpacking of chromosomal DNA down to the final production of a protein can be subject to control:

Mechanisms by which gene expression is controlled

The term “epigenetics” is a slippery one. In its modern sense it refers to the control of gene expression, but depending on who is talking it can refer to different subsets of the processes illustrated above. Increasingly, the term is used to describe any process that alters the way a gene is expressed. Most scientists still prefer definitions that are more restrictive. Some save the word for processes that chemically alter DNA and the proteins that pack DNA into chromosomes. Others limit it to those changes in gene expression that can be passed on through cell division or even from parent to offspring.

The last definition, restricting epigenetic effects to those that can be inherited from one cell division to the next, is the most popular one in the scientific literature. Such changes can be brought about by almost any of the processes illustrated above. Proteins and RNAs that control the expression of a gene can set up feedback loops that survive into daughter cells after cell division. Similarly, some changes in the way in which DNA is packaged into chromosomes can also be maintained. The most widely talked about epigenetic process, however, is DNA methylation. In this case, a small molecular “tag” is added to DNA bases in regulatory regions of a gene, resulting in the down-regulation of that gene.


That picture makes DNA methylation look simple, but it’s important to remember these tags are only one part of the complex way in which gene expression is controlled. The presence of DNA methylation is itself altered and maintained by proteins, which are derived from genes, the expression of which is in turn controlled by all of the mechanisms depicted above.

Why is there a buzz around epigenetics?

Epigenetics is a popular topic in biology these days. Hardly a week passes without someone making a bold claim for the power of epigenetic research to explain disease, change our understanding of evolution or finally tell us whether nature or nurture provides the most important input to our development. Some articles on epigenetics would even lead you to believe that scientists have only recently considered the control of gene expression an important topic, or that the concept of an environmental input to the process is a new idea. Of course, that’s not true. Quantitative geneticists had already developed methods to handle the interaction of genes and environmental influences in the 1920s, long before they knew what a gene was made of. Similarly, the control of gene expression was a hot topic in molecular biology in the 1960s, and has been continuously studied since that time.

There is something new, and genuinely exciting, about modern epigenetic research. It is now abundantly clear that DNA methylation is an important player in the control of gene expression in some organisms, including mammals like us. The gene expression changes brought about by changes in DNA methylation are also relatively stable, so they can help to explain long-term changes in the way organisms use their genes. Just as importantly, modern sequencing technologies make it easy to measure the presence or absence of methylation on DNA bases across whole genomes. That means biologists can directly measure differences in methylation between individuals, between different tissues in the same individual, or between healthy and cancerous cells in the same tissue.

Epigenetic research has shown that cancerous cells often have aberrant patterns of DNA methylation, including tags that silence the genes usually responsible for overseeing the repair of DNA. It’s also shown us how honey bees can switch careers in the middle of their lives and how some plants can direct their development to match the environment they find themselves in. It remains to be seen if epigenetics lives up to the hype that surrounds the topic, but, to finally answer Darcy’s question, there is no doubt that epigenetics is important.

So, what are we doing about epigenetics?

Darcy wanted to know how important epigenetic markers would be in recording “all the genomic information” for tuatara. If the goal of a genome project was to completely explain the molecular basis of an organism’s biology, then epigenetic data would be extremely important. The control of gene expression is the key mechanism by which organisms direct their development, and one means by which all organisms react to their environment. Epigenetics, however defined, is an important contributor to gene expression and DNA methylation is one part of that puzzle that can be directly measured.

So, if a genome project was about trying to completely explain an organism’s biology then recording epigenetic data would be very important. But no study, and certainly no genome study, can answer every question about an animal. Indeed, one of the main goals of the project is to provide a resource that will help scientists ask and answer new questions about tuatara for decades to come. We will not directly measure any epigenetic markers**, but the draft sequence we are producing will allow other researchers to do so at a whole-genome scale. Even better, we will work to annotate the draft genome with information on the function of the different genes we sequence. Thus, anyone performing epigenetic studies on tuatara will be able to connect their results to the biological function of the genes they relate to.

If our brief description of epigenetics has left you wanting more details, be sure to check out Grant Jacobs’ post on the same topic (and answering the same question). Grant is going to feature a series of posts on the same topic, so stay tuned for more.

*It’s hard to find hard and fast laws in biology, and even this isn’t quite true. Mutations accrue over the course of an organism’s life, making the DNA in some cells different from others. Some organisms also go out of their way to alter the structure of the genome in some cells, particularly in order to create variation that might help fight diseases.

**Although we may use transcriptome studies to compare expression of particular genes in different tissues.

What we already know – a tuatara transcriptome David Winter Aug 14


We are not starting from scratch in our mission to understand the genetics of tuatara. Scientists have been working on these creatures for more than a hundred years, and in that time plenty of researchers have used tuatara DNA to try to understand the world. For the most part, these studies have used DNA sequences as witnesses to evolutionary history, rather than data from which to understand the day-to-day biology of tuatara.

Hilary Miller is one researcher who has taken a genetic approach to understanding how tuatara work. In her postdoctoral research at Victoria University, Hilary sequenced and analysed tuatara MHC genes. These genes play an important role in the immune system of vertebrates, helping their carriers develop immunity to diseases they encounter during their lives. Populations that contain many different variants for each of the MHC genes are well placed to deal with new diseases, so maintaining MHC diversity is of great interest to conservation biologists and managers. Hilary and her colleagues showed that some tuatara populations have relatively low MHC diversity, and that what variation there is does not contribute much to mating behaviour in tuatara (as opposed to some mammals and birds, which actively seek out mates with different MHC genes).

Up until last year, there were a few hundred tuatara DNA sequences known to science. That included all of Hilary’s research, a few other studies that focused on the functions of particular genes and all those studies that used tuatara DNA as a witness to the evolution of reptiles. Then Hilary, working with colleagues at Massey University, sequenced 33,000 more. Hilary’s study was the first to use modern sequencing technologies on tuatara genes, gave us our first look at what the tuatara genome will be like, and will be a great help in making sense of the sequences created in the tuatara genome project. So we asked Hilary a few questions about her work.

The sequences you published make up what’s called a “transcriptome” – what does that mean?

A transcriptome is a set of expressed genes in a given cell type.  Every cell contains a copy of the genome in its nucleus, but in any given cell only a subset of the genes in the genome will be active – transcribed into mRNA and then translated into protein.  A transcriptome is built from the mRNA, so it only contains sequences of genes that are expressed, not all the non-coding DNA that makes up a large part of the full genome.



Is there a particular reason you used an embryo for the first transcriptome sequence?

We were aiming to find genes involved in sex determination (which in tuatara is regulated by temperature, not by sex chromosomes), so we used an embryo from the approximate stage when sex is determined.  It was also one of the only ways we could get tuatara tissue that was in good enough condition to extract mRNA without sacrificing an adult animal, which obviously we didn’t want to do.

Before your study, what did we know about tuatara genetics? How much more do we know now?

Tuatara had mostly only been studied from a population genetics/phylogenetics perspective, so we really only had genetic markers like microsatellites and mitochondrial genes that were useful for those studies.  Only a small number of functional genes had previously been isolated from tuatara – there were something like 60 gene sequences from tuatara in the Genbank database, including the immune genes I previously isolated for earlier postdoc work.   With the transcriptome, we now have about 33,000 gene sequences for tuatara, so we’ve increased the genomic information we have for tuatara 500-fold.  We don’t know what all these sequences are, as about half of them don’t match to any known genes from other species in Genbank, but for about 15,000 of the sequences we have a pretty good idea of what genes they are from.  So this dataset has really improved our knowledge of what the tuatara genome looks like.

 One of the things we’ve talked about on this blog is the challenge of building an unknown sequence from short fragments. You used really short reads in your study – did you have much trouble building up your large sequences?

The gene sequences that make up a transcriptome are usually much shorter and easier to assemble than genome sequences, so short reads aren’t so much of a problem, as long as you have enough of them and have read-pairing information.  A bigger problem was verifying that the sequences I’d assembled were correct.  Most transcriptome studies rely on a closely related genome to verify sequences are assembled correctly, and of course for tuatara we don’t have anything closely related enough for comparison.  So we had to do a lot of careful checking of the assemblies to make sure the reads had been put together properly, and compared them as best we could to more distantly related species for which there is a lot of genomic information.
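For readers who like to see an idea in code, here is a minimal sketch of what "putting reads together" can mean. It is a toy greedy overlap assembler written for this post, not the software used in Hilary's study; the function names, the three short reads and the minimum overlap of 3 bases are all assumptions made for illustration. Real assemblers also have to cope with sequencing errors, repeats and read-pairing information, none of which is handled here.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left that overlaps
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Three made-up reads taken from the made-up sequence ATGGCGTACGTTAGC
reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
print(greedy_assemble(reads))
```

With these three reads the overlaps are unambiguous, so the sketch rebuilds the original sequence. The checking Hilary describes matters because real data is rarely this tidy: two reads can overlap by chance, and a greedy merge will happily join them.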

What was the most exciting result to come from the transcriptome study?

Well, we didn’t find many sex-determining genes that we’d hoped for, but we did find some genes that could be involved in regulating genes in response to temperature changes, like heat-shock proteins and a cold-inducible RNA-binding protein.  Now that we have these gene sequences, we can begin to study their function.  For example, looking at whether there is a difference in their expression when eggs are incubated at different temperatures.  There’s a whole raft of studies just waiting to be done using the transcriptome data – from investigating the evolution of various gene families, to looking for genes that might be involved in local adaptation of different tuatara populations.  And the transcriptome sequences will hopefully also help with assembling and annotating the full genome sequence.


Hilary wrote the Chicken or Egg blog at Sciblogs, in which she detailed her own work on tuatara and other topics in conservation genetics. She now works for Biomatters, a bioinformatics company in Auckland, which is most famous for its popular sequence-managing software Geneious.

Your questions – How many species? David Winter Jul 19


Here’s the first in our series answering your questions about the project. We start with a question from Maggy Wassilieff, who wants to know about the tuatara population living on North Brother Island in the Cook Strait:

Could you comment on the current understanding of the Brothers Is Tuatara? Is it a separate species? How long has the Brothers Is tuatara been isolated from other nearby populations?

Just what is a species anyway?

If we want to know how many tuatara species there are, and whether the North Brother Island population is distinct from other tuatara, we first need to know just what we mean by “species”. It may come as some surprise to learn that there isn’t a simple answer to this question. Biologists have spent at least 150 years grappling with what we now call the “species problem”, and in that time we have managed to define at least 26 sets of rules for determining where one species ends and another starts.

Thankfully, we don’t have to review all 26 of these “species concepts”, or the endless arguments they elicit, to know whether the tuatara living on North Brother Island are a distinct species. Instead, we can use our knowledge of where species come from to help us define what a species is. We have been studying the origin of species (speciation) for a long time now, and we have a pretty good idea how it works. New species start to form when populations stop sharing genes with each other. Once that process starts, changes in one population can’t affect what’s going on in others. That is, each population has its own evolutionary trajectory and is free to change and, in time, become distinct from all other populations.

If we take “species” to mean a population that is capable of maintaining its own evolutionary trajectory, then the job for biologists is to find evidence for this sort of independent evolution. There are lots of different sources of data we can use for this process, and the history  of the North Brother Island tuatara is a lovely example of the way new data has changed the way scientists understand the world.


In the beginning there was one

It took scientists a while to realise just how important tuatara are. They were first introduced to the scientific world in 1831, after John E. Gray found a tuatara skull sitting on display at the Royal College of Surgeons in London (it’s not clear how the skull made it to the UK). Gray recognised the skull was interesting, but didn’t fully grasp what he was looking at. He dedicated all of 9 lines in the first volume of his Zoological Miscellany to its “peculiar structure” while confusing the tuatara for a lizard. Crucially, Gray decided the skull represented a new genus (the taxonomic rank above species) and gave it the name Sphenodon* (meaning wedge-toothed).

A tuatara skull, like the one Gray examined. I guess the teeth are pretty wedge-like. Image is courtesy of Te Papa Collections Online, and provided under a CC ND-NC license.

Ten years after he described the tuatara skull, Gray was sent a complete tuatara specimen by Johann Dieffenbach. Again, Gray didn’t quite get what he was looking at. As he didn’t dissect the specimen,  he didn’t see the unique skull and, so, didn’t realise he was looking at another Sphenodon. Instead, the poor old tuatara was again confused for a lizard and given a new name: Hatteria punctatus. Even if science hadn’t yet worked out where tuatara fitted in the diversity of life, scientists had at least recognised one species.

It wasn’t until 1867 that Albert Gunther connected the skull called Sphenodon to the animal called Hatteria punctatus and realised the tuatara represented a distinct order of reptiles separate from lizards. By the rules of taxonomy, the first name given to a creature is always the preferred one, so in this case the one tuatara species was moved into the older genus and called  Sphenodon punctatus.

Gray, Gunther and other European taxonomists had described tuatara based on the few specimens that made their way to the Old World. Naturalists in New Zealand, who could see tuatara on many different off-shore islands, started noticing that some of the animals they found seemed a bit different from the one described under the name S. punctatus. In particular, in 1877 Walter Buller decided the tuatara living on North Brother Island in the Cook Strait were different enough that they represented a distinct species and named them S. guntheri, in honour of Albert Gunther. There were also names given to the endemic population on Hauturu/Little Barrier Island and to an apparently extinct species known only from sub-fossil bones in Northland.

It seems, however, that no one paid much notice to these other species. When tuatara were legally protected in 1895 only S. punctatus was named in the law. For the next hundred years or so, most New Zealand biologists and conservationists treated tuatara as a single species.


The first wave of molecular data

Charles Daugherty headed the team that first used genetic data to determine how many tuatara species there are

As genes are passed down from one generation to the next, they record the evolutionary history of the populations through which they move. Over time, the forces of mutation, natural selection and plain old chance combine within populations to make genetic change inevitable. If two populations subject to these changes are not connected by gene flow, those changes will take populations in different directions and each will end up with a distinct genetic make-up. For this reason, genetic data is invaluable for scientists trying to determine whether apparently distinct populations represent separate species with separate histories and diverging futures.
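To make that idea concrete, here is a minimal sketch in Python of two isolated populations drifting apart by chance alone. It is a toy Wright-Fisher-style simulation, not anything taken from the tuatara studies: the starting frequency, population size, number of generations and the function name `drift` are all assumptions chosen for illustration, and the model ignores mutation and selection entirely.

```python
import random


def drift(freq, pop_size, generations, rng):
    """Resample one allele's frequency each generation (chance only)."""
    for _ in range(generations):
        # Each of the pop_size gene copies in the next generation is drawn
        # at random from the current generation.
        copies = sum(rng.random() < freq for _ in range(pop_size))
        freq = copies / pop_size
    return freq


rng = random.Random(1)  # fixed seed so the run is repeatable

# Two populations start identical and never exchange genes.
pop_a = drift(freq=0.5, pop_size=50, generations=100, rng=rng)
pop_b = drift(freq=0.5, pop_size=50, generations=100, rng=rng)

print("Population A allele frequency:", pop_a)
print("Population B allele frequency:", pop_b)
```

Because nothing ties the two populations together, each wanders on its own path, and in small populations an allele is often lost or fixed outright. Gene flow would show up in a model like this as a step that pulls the two frequencies back towards each other every generation.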

It wasn’t until 1990 that biologists could get enough genetic data to test the idea that tuatara should be considered a single species. Charles Daugherty from Victoria University headed up a team that collected blood samples from tuatara populations on 24 off-shore islands (Daugherty et al, 1990, doi: 10.1038/347177a0).

In those days, the genetic data evolutionary and conservation biologists could get their hands on came from a technique called isozyme electrophoresis. Many of the enzymes that keep the chemical processes in your cells ticking over come in several subtly different forms. As these forms have different shapes, and different electrical charges, these variants can be separated by loading samples of blood or tissue onto a matrix (like a gel), and applying a charge that moves each enzyme-variant according to its own charge, size and shape.

Daugherty and his colleagues looked at 25 different enzymes, and found a striking pattern. The North Brother population, the one that Buller had given a special name to, stuck out as being different from all the others. Given this genetic difference, and a number of subtle morphological differences, Daugherty and his team recommended that the North Brother population should be recognised as a distinct species and the name S. guntheri was resurrected.


Summary of the 1990 study – in which the N. Brother population stood apart from all others.


The second wave of molecular data

For 20 years or so the North Brother tuatara were recognised and managed as a separate species. In that time, the ability of geneticists to create datasets grew enormously. So, in 2010 a new team, including Daugherty and headed up by Jennifer Hay, who was also an author on the 1990 paper, got together to apply new data to the old question of tuatara taxonomy (Hay et al, 2010. doi: 10.1007/s10592-009-9952-7). This time, the biologists could determine the sequence of specific parts of tuatara DNA, and use a type of short repetitive DNA marker called “microsatellites” to compare populations.
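A microsatellite is a short motif repeated back to back, and individuals tend to differ in how many copies they carry, which is what makes these markers useful for comparing populations. The snippet below is a rough sketch of that idea only: the two sequences, the "CA" motif and the function name `longest_repeat_run` are made up for illustration and are not data or methods from the 2010 paper.

```python
import re


def longest_repeat_run(sequence, motif):
    """Largest number of back-to-back copies of `motif` found in `sequence`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


# Hypothetical alleles: the same flanking DNA around a different
# number of "CA" repeats.
allele_a = "GGTA" + "CA" * 5 + "TTG"
allele_b = "GGTA" + "CA" * 7 + "TTG"

print(longest_repeat_run(allele_a, "CA"))
print(longest_repeat_run(allele_b, "CA"))
```

Scoring repeat counts like this at many markers, in many individuals, gives a table of alleles per population that can then be compared.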

When all this new data was added to the picture, it became clear that the North Brother Island tuatara weren’t quite as distinct as the isozyme study suggested. If there is a major split in the genetics of tuatara it corresponds to the geographic split between populations in Northern New Zealand and those in the Cook Strait. The North Brother population fits within the Cook Strait group.


Relationships among tuatara populations, as revealed by mitochondrial DNA sequences. With these data, the N. Brother population falls within the diversity of the Cook Strait population

Faced with new evidence, Hay and her colleagues did what scientists do, and updated their picture of the world. They concluded the North Brother tuatara population doesn’t represent a distinct species. That’s not to say the earlier work on these populations was wrong – it’s clear from both studies that the Cook Strait tuatara contain genetic diversity not present in populations from Northern New Zealand. This difference probably arose in part because the many populations that used to exist between Cook Strait and Northern New Zealand are now extinct. Even so, it is important for the conservation of tuatara that the genetic diversity of these species is maintained. The 2010 paper removes species-status from S. guntheri, but makes the case that the conservation of the Cook Strait populations should be managed separately from that of the Northern tuatara.

A third wave?

Between the isozyme study, and the DNA sequences and microsatellites examined in the 2010 paper, biologists have now used about 0.001% of the tuatara genome to uncover the history of these populations. Just as different populations have their own histories, different genes can take slightly different paths through the populations that they move through. The tuatara reference genome sequence that we are creating will make it much easier for researchers to study many more genes from each remaining tuatara population, and perhaps from ancient DNA in sub-fossil bones. In addition to helping researchers to build a better picture of the recent history of tuatara in New Zealand, the reference sequence should enable research on genes relating to immune response, sex determination and other functions directly related to survival in tuatara.

*In fact, to compound his other mistakes, Gray misspelled the word as “Sphænodon”. Pretty much everyone realised Gray’s error, so the correct spelling took over and is now officially recognised.

Hilary Miller, who is a tuatara biologist herself, wrote about the taxonomic history of the tuatara on her blog. You should check it out, and stay tuned for an update from Hilary on her most recent tuatara genetics paper.





Interview with Graeme Hill David Winter Jul 18


Neil’s take-over of the New Zealand media scene continues apace.

This weekend Neil and Graeme Hill from Radio Live had a chat about the project, Allan Wilson’s impact on evolutionary biology and where the tuatara fits into the New Zealand fauna.

Tuatara genome project in the news David Winter Jul 05


Neil was on Radio New Zealand National this morning, discussing the tuatara genome with Kathryn Ryan.

That’s not the first bit of media interest in the project, so here, for those of you who just can’t get enough tuatara news in your life, is a list of stories on the project from other sites:

We’ll keep an updated list of media stories here, and point everyone to them as they come in via this blog’s twitter account.

Any questions? David Winter Jul 03


This blog is very much for you, the reader.

We’ve been really pleased with the reaction to our opening few posts, and especially happy that readers have asked us questions about the project. I’ve forwarded those questions on to the people that can best answer them, and hope to dedicate a post to those answers next week. There is still time for more. If there is something you’ve always wanted to know about tuatara, some aspect of genetics or genomics you’ve never quite understood, or if one of our opening posts left you wondering about something, let us know. You can comment on this post, use the contact form to send a message privately or tweet to us at @tuataragenome. However you send your question, we’ll do our best to have an expert provide you an answer.

An update from the boss David Winter Jul 01


It’s about time this blog moved from the abstract to the concrete. Now you know why we are sequencing the tuatara genome, and have an idea about how we’ll go about doing it, it’s time to meet the boss and hear how the project is going so far.

Neil Gemmell is the leader of the tuatara genome project. Neil is a professor in the Department of Anatomy at Otago University, where he and his lab study the biology of reproduction from the level of genes all the way up to the consequences of reproductive biology on ecology, evolution, conservation and economics. I asked him a few questions about the project, and how it’s going.

How did you end up heading the tuatara genome project?

This happened through a rather fortuitous sequence of events. I was at an international meeting of scientists in Santa Cruz, California in 2011, with about 100 other scientists who were proposing to sequence the genomes of 10,000 vertebrates; the Genome10K consortium. An initial list of priority species had been produced, the top 100 if you like, and there at the top was tuatara. I innocently asked who was leading the project, whether they had samples or sequence already, and importantly whether they had had discussion with iwi. It turned out that although a team were willing to commit resource to getting the sequence, none of that crucial work had been undertaken. I offered to find out what would be needed, talked to a variety of people and slowly but surely became more involved in the project as I realised that it could, and indeed should, be undertaken in New Zealand. Via a collaboration with Ngatiwai iwi, the Allan Wilson Centre for Molecular Ecology and Evolution, together with initial support from New Zealand Genomics Limited and Illumina, we are well on our way to producing a draft sequence of tuatara.

Is there any one thing you’d really like to learn from the genome sequence, or are you happy to be surprised?

I’m hoping to be surprised. There are a few things I’m particularly interested in though. These include why the genome is so big, roughly 5 Gbp, thus 70% bigger than human. Often such large genomes contain numerous repetitive elements, but the only work undertaken prior to the genome sequencing project suggested tuatara might have relatively few repeats. If the large genome size is not due to repeats this would be unusual. Another area of considerable interest to me is around the genes involved in sex determination. In humans and almost all other mammals the presence of a Y chromosome containing a gene called Sry is the trigger for male sexual development. In tuatara sex is governed by the incubation temperature of the eggs; when the temperature is high embryos develop as males, while embryos develop as females when the temperature is lower. Lots of reptiles have this form of sex determination, but we remain rather ignorant of the mechanism. It may be that the tuatara genome, together with comparisons to those of other animals, will enable us to start to identify likely mechanisms of sex determination that we can subsequently test.


What stage is the project up to? What’s been done and what’s happening at the moment?

The project breaks into three main steps: sequencing (collecting the raw data), assembly (putting it together, ideally in the right order) and annotation (trying to determine the function of each component of the sequence). The first step is, as you’d imagine, obtaining a vast quantity of raw sequence and we’ve done plenty of this over the past year. We now have roughly 70 fold coverage of the genome, which means that on average each base pair of the genome has been sequenced 70 times. What we are now trying to do is to produce a genome assembly. This involves taking the equivalent of paper confetti, perhaps created from a great work like Darwin’s Origin of Species, and trying to piece together every word, sentence, paragraph, page and chapter in the right order to reproduce the book. Thus far we have managed to rebuild the equivalent of ‘pages’ of the tuatara genome, but we don’t yet fully know how these are ordered. We are now using some clever sequencing technology, called mate-pair jump libraries, to help us figure out how many letters and words lie between two points in our book to help us better order or scaffold these sequences.
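To give a feel for what "70 fold coverage" means in raw numbers, here is a back-of-the-envelope sketch in Python. The genome size and coverage are the rough figures Neil quotes in this interview; the read length of 100 bp is purely an assumption for illustration, as are the variable and function names.

```python
GENOME_SIZE_BP = 5_000_000_000  # "roughly 5 Gbp", as quoted above
COVERAGE = 70                   # "roughly 70 fold coverage"
READ_LENGTH_BP = 100            # assumed read length, for illustration only


def coverage(total_bases_sequenced, genome_size_bp):
    """Average number of times each base pair has been sequenced."""
    return total_bases_sequenced / genome_size_bp


# Coverage is total sequenced bases divided by genome size, so the
# amount of raw sequence is just the two numbers multiplied together.
total_bases = GENOME_SIZE_BP * COVERAGE
reads_needed = total_bases // READ_LENGTH_BP

print(f"{total_bases:,} bases of raw sequence")
print(f"{reads_needed:,} reads of {READ_LENGTH_BP} bp each")
print(f"check: {coverage(total_bases, GENOME_SIZE_BP):.0f}x coverage")
```

Under those assumptions the confetti pile runs to billions of pieces, which goes some way to explaining why assembly is the hard part.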


How many people are working on the project at the moment, and where do they come from?

We have had a team of four working on collecting the sequence data: Becky Laurie, Rob Day and Aaron Jeffs (NZGL Otago) and Lorraine Berry (NZGL Massey). On the assembly side we have local, national and international collaborators: Kim Rutherford (Otago), Ross Crowhurst (Plant and Food) and Steven Salzberg (Johns Hopkins), who have led the majority of this work together with members of Ross and Steven’s teams. We also have had support from Nicky Nelson (VUW), Scott Edwards (Harvard), Bob Macey (Berkeley), and Pieter de Jong (CHORI), who have contributed resources and know-how to the project thus far, plus numerous other offers for collaboration and assistance. Last, we have had excellent support of this endeavour from Ngatiwai, through the efforts of Clive Stone.


What will the next steps be?

The next step is to complete the genome assembly. This is a complex task requiring lots of computer resource and expertise and is highly iterative. Thus to speed up this process we will be establishing a genome assembly challenge with the view that we might be able to encourage some of the best labs in the world at assembly, including Steven Salzberg’s, to have a go at producing the best assembly they can from our data with their favourite programs and algorithms. We’d hope to start that in the next month or two. In addition we will be looking to obtain data on RNA expression patterns from a variety of tissues to help us with the final task of annotation as well as start to explore the genome sequence data in more detail.

First find your tuatara (or how to sequence a genome) David Winter Jun 25


So, now we’ve told you why we’re so keen on sequencing the tuatara genome, you might want to know exactly how we are going to do it. As the project goes on we will get the experts working on each stage to describe exactly what they’re up to, and how it’s going. But we also want to give you a broad overview of how the project will proceed. Here then, in five simple steps, we present a guide to sequencing a genome:

1. First, find a tuatara


Genome sequencing projects are easier if you use DNA from a single representative of the species you are studying. For tuatara, getting that representative isn’t entirely easy. Although they once lived pretty much everywhere in New Zealand, tuatara are now almost entirely restricted to offshore islands (the only tuatara living on the mainland these days are in sanctuaries or, like the fine-looking fellow above, in museums).

Quite a few of the islands that retain tuatara populations are in the rohe of Ngatiwai, who act as kaitiaki or guardians of the animals on these islands. Ngatiwai are partners in the tuatara genome project, so in 2011 representatives of the iwi, along with Dr Nicky Nelson and PhD student Lindsay Mickelson from Victoria University, collected a small blood sample from a large male tuatara on Motumuka (Lady Alice Island) in the Hen and Chickens group.

2. Put your tuatara back

It should go without saying, but as tuatara are an endangered species, none will be harmed in this project. Sequencing a genome only requires a little bit of blood, and reptiles like the tuatara require even less blood than most since their red blood cells contain DNA (unlike mammals, where red blood cells lose their DNA before they circulate). Once we had taken about two mL of blood, and about two minutes of time, from our tuatara we let him get back to his life on Motumuka.

3. Prepare your DNA

Once you have your blood sample, you need to prepare DNA from that sample for sequencing. This really is just a matter of following a recipe very carefully. By adding chemicals, heating your sample and spinning it at great speed you can break the cells in your sample down into their chemical components, and isolate DNA from that mixture. This, like almost everything you ever do in a molecular biology lab, will get you a tube with a very small volume of clear, almost colourless liquid*.

That small volume of liquid will have millions of DNA molecules, each one being a long chain built from four different nucleotide ‘bases’, which we usually refer to using the abbreviated names ‘A’, ‘C’, ‘T’ and ‘G’. Your goal in sequencing the genome is to work out the order of those chemical bases in the DNA molecules.



The first step toward determining the base sequence is to smash the long chains of DNA into shorter, more manageable chunks. There are lots of different ways of doing this. For the tuatara DNA we blasted the DNA with very high frequency sound. Once your DNA fragments are sufficiently small you need to prepare them for the sequencing reaction. In this case we had to add some extra DNA bases to each of the fragments; these so-called “adapter sequences” are crucial for the next step.

4. Sequence at room temperature for two weeks

The machines that work out the base sequence of DNA molecules are incredible wonders of modern engineering. For the tuatara genome, all the steps described below are going to be controlled by an Illumina HiSeq 2000 which looks like this:



All the DNA prepared above will be loaded in one of these flow cells, which is about the size of a microscope slide.



Each of the eight lanes in those flow cells is coated with millions of very short DNA molecules, which are designed to catch on to the adapter sequences you added to your DNA fragments in Step 3. Once your DNA fragments attach themselves to the flow cell, they are copied to create a tiny population of cloned molecules. In time, millions of DNA fragments will attach to the flow cell and create clusters of identical DNA molecules:


To work out the base sequence of each fragment, the DNA is copied one more time. This time, the replication is carefully controlled so that only one base is added to each fragment at a time. The bases added at this step have fluorescent labels, with a different colour for each of the four possible bases.



At the end of each single-base step of this reaction, the flow cell is scanned by a laser and the light shining out from each cluster on the flow cell is recorded as an image. By keeping track of the colour (and therefore the base being added) at each cluster, the sequence of hundreds of millions of DNA fragments can be determined in a single run.**
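The bookkeeping described above can be sketched in a few lines. This is a toy illustration only: the colour names, the colour-to-base mapping and the cluster labels are invented for the example, and real base-calling software has to deal with noisy images rather than clean colour labels.

```python
# Hypothetical mapping from observed colour to base (an assumption for this sketch)
COLOUR_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}


def call_bases(cycles):
    """Build one read per cluster from a series of per-cycle colour observations.

    cycles is a list with one dict per sequencing cycle, mapping a cluster
    label to the colour recorded for that cluster in that cycle.
    """
    reads = {}
    for image in cycles:
        for cluster, colour in image.items():
            # Append the base for this cycle to the read growing at this cluster
            reads[cluster] = reads.get(cluster, "") + COLOUR_TO_BASE[colour]
    return reads


# Three cycles observed at two made-up clusters
cycles = [
    {"cluster_1": "green", "cluster_2": "red"},
    {"cluster_1": "blue", "cluster_2": "red"},
    {"cluster_1": "yellow", "cluster_2": "green"},
]
reads = call_bases(cycles)
print(reads)
```

Each extra cycle adds one more base to every read, which is why the length of a read depends on how many cycles the machine runs.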



5. Assemble your genome

Once your sequencing machine has finished its run you will have all the data you’ll need to build your genome. But none of it will make any sense. Because the fragments that are sequenced are so small (about 100 bases) and the genome is so big (about 5,000,000,000 bases) it’s very hard to know how to put all the fragments together.

Imagine trying to piece together a sentence in English if all you had were these fragments and no context as to what they might mean:


'ave postulat', ' have postul', 'aped our not', 'not escaped ', 
'airing we ha', 'escaped our ', 'ave postulat', 'It has not e', 
' immediately', 't escaped ou'


If you look carefully, you might notice a few of those fragments seem to overlap with each other. By linking up the overlapping segments, you can reconstruct some of the words in the original sentence. In this example we can reconstruct two chunks of the original sentence:

airing we ha
          have postul
           ave postulat
           ave postulat
It has not e
       not escaped
         t escaped ou
           escaped our
              aped our not
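The overlap-and-merge idea can be sketched as a short program. This is a simple “greedy” approach chosen for illustration (repeatedly join the two pieces with the longest overlap), not a description of the software used for the tuatara genome; the function names and the minimum overlap of three characters are assumptions made for this example.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that matches a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of pieces with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left that overlaps
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


fragments = [
    'ave postulat', ' have postul', 'aped our not', 'not escaped ',
    'airing we ha', 'escaped our ', 'ave postulat', 'It has not e',
    ' immediately', 't escaped ou',
]
chunks = greedy_assemble(fragments)
for chunk in chunks:
    print(repr(chunk))
```

Run on the ten fragments above, this produces the same two chunks as the hand alignment, plus the fragment ' immediately', which overlaps nothing else and is left on its own. Note that a greedy strategy can be fooled by repeated text ('not' appears in both “not escaped” and “notice”), which is one reason real assemblers are far more sophisticated.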

Now you have a new problem. You have two fragments of the original sentence, but no idea how they relate to each other. This is a common problem in assembling genomes too. One way to get around this problem is to sequence some longer DNA fragments. Although we can’t sequence all the way across these larger fragments, we can sequence both ends of them and, if we prepare them carefully, we can know how long they are. Combining these bits of information can help us join up unconnected sections of DNA. Here are the two sentence-chunks connected by some of these “paired-end” fragments:

         ot escaped o---------------the specific
            escaped our ---------------specific pa
                       notice that ---------------iring we hav
                                    the specific---------------e postulated
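Here is a minimal sketch of how a pair of sequenced ends with a known fragment length can tell us the size of the gap between two chunks. The pair used below ('escaped our ' and 'iring we hav', from a fragment 51 characters long) is a made-up example chosen so that each end lands inside one of the two chunks; it is not one of the pairs drawn above, and the function name is hypothetical.

```python
def estimate_gap(left_contig, right_contig, left_end, right_end, fragment_length):
    """Estimate how many characters separate two contigs.

    left_end and right_end are the two sequenced ends of one longer fragment
    whose total length (fragment_length) is known from how it was prepared.
    """
    left_pos = left_contig.find(left_end)
    right_pos = right_contig.find(right_end)
    if left_pos < 0 or right_pos < 0:
        raise ValueError("each end must be found in its contig")
    # Part of the fragment that lies inside the left contig
    in_left = len(left_contig) - left_pos
    # Part of the fragment that lies inside the right contig
    in_right = right_pos + len(right_end)
    # Whatever is left over must sit in the gap between the contigs
    return fragment_length - in_left - in_right


left = "It has not escaped our not"
right = "airing we have postulat"
gap = estimate_gap(left, right, "escaped our ", "iring we hav", 51)
print(gap)
```

Knowing the size of the gap does not tell us what letters fill it, but it does tell us the order of the two chunks and how far apart they sit.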

With enough short and long fragments you should be able to cover the whole genome (or sentence!) multiple times and reconstruct what you started out with. In our example, it’s the line with which Watson and Crick famously described the importance of their discovery of the structure of DNA:

It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.

Of course, assembling a genome is much harder than putting together a sentence. You will have millions of reads, each written in an alphabet of only four bases, and there will be no sentence structure, syntax, or even words to provide clues as to how they might go together. Because assembling a genome is such a hard task, lots of different software packages have been developed to do just this, and the scientists building the tuatara genome plan to test different assembly software to find the one that does the best job with this dataset.

It’s important to note that, even though the software (and the scientists) that build up genomes are very good at what they do, there will always be some uncertainty in a genome sequence. The sequence you produce will be a draft, with most regions very well understood and a few others that will remain murky for some time.

Steps 6 – infinity

Congratulations. By the time you finish step 5 you will have a draft genome sequence. In many ways, this is just the start of the project. Now you will want to know what all those bases mean. Which genes do tuatara share with other reptiles, and which seem to be unique to the species? Can we work out the genetic basis of the tuatara’s unique biology? Which genes might be of interest to conservation biologists trying to manage the species?

To answer these questions you’ll need to compare the tuatara genome sequence to DNA sequences from other organisms. We will answer some of these questions as part of the tuatara genome project. More importantly, the draft sequence we produce will be available to any scientist who wants to work on any question.

*Rob Day, who prepared the sample, tells me it had a yellow-ish tinge.
**There are a couple of nice animations showing how the sequencing process works on YouTube. One from Illumina themselves and another from Aiden Flynn.
The large DNA image is a composite of “Chromosome” by Wikimedia user KES47 and “difference DNA RNA” by Sponk. Our image is provided under a CC-BY-SA license.
The image of the flow cell is courtesy of the DOE Joint Genome Institute and is CC-BY-ND-NC.
Other images produced for this post are CC-BY.

Why sequence the tuatara genome? David Winter Jun 17



Sequencing and making sense of the tuatara genome is going to be a big project. It will take the skill and dedication of many scientists, help from our partner organisations and probably several hundred thousand dollars on top of that.

We can put this much time and effort into the project only because we know a tuatara genome sequence will provide a unique insight into how evolution works, and serve as a valuable resource to other scientists. So, why are we so confident that the tuatara genome is going to repay our investment?

The tuatara’s unique position in evolutionary history

To understand why a tuatara genome is such a tantalising prospect for scientists you need to know how the tuatara relates to other reptiles. All life on earth is connected by a shared evolutionary history. When biologists try to organise the diversity of life on earth, we reconstruct that history by finding groups of species that all descend from a shared common ancestor. Here’s what the history of the reptiles looks like:


Evolutionary relationships among reptiles. See footnote for attributions for these figures.


You sometimes hear people mistakenly call tuatara “living dinosaurs”. In fact, as you can see in the figure above, tuatara are much more interesting than that. If you want to study a living dinosaur you only need to look out the nearest window. Modern birds descend from one branch in the diverse group we call dinosaurs, so each of those ten thousand species is a dinosaur. Tuatara, on the other hand, are the only living members of a lineage that separated from other reptiles more than 200 million years ago.

By placing modern organisms in the context of their evolutionary history, we can work out which traits were present in ancestral species, and reconstruct the changes that gave rise to modern ones. As the tuatara is the only living witness to hundreds of millions of years of evolution, its genome sequence will be immensely valuable in understanding the genetic changes that have allowed reptiles to evolve and diversify.

In fact, even the tiny amount that we already know about tuatara genetics has helped us understand not just the evolution of reptiles, but how mammals (like us) have evolved.


Doing it in New Zealand

Given the unique position the tuatara has in the history of life, it was always going to be a target for a genome sequencing project. Indeed, when our partners the Genome10k Project drew up a list of the most important genomes to sequence they put tuatara at the top.

The tuatara also has a unique position in New Zealand culture. It is considered a taonga or treasure by Maori and is an important part of a natural heritage that all New Zealanders are justifiably proud of. Given the spiritual and cultural importance of tuatara, we thought it was important that its genome be sequenced in New Zealand and with the help of those iwi who have special relationships with the species. Ngātiwai, kaitiaki (guardians) of tuatara within their rohe in Northern New Zealand, are partners in the project and hope the knowledge this project generates can inform their conservation efforts.

Doing the genome project in New Zealand also means we can focus on the tuatara themselves, and not just their importance in understanding evolution more generally. The tuatara is an endangered species, and learning more about its biology may help us protect it. Populations with lots of genetic diversity are better able to overcome threats, such as climate change and disease, which may put species at greater risk in the future. By producing a single reference genome we will make it much easier to get genetic data from tuatara populations, and help the Department of Conservation and Ngātiwai manage the species in a way that increases its chances of survival.

First the tuatara, then the world

As well as providing us with new knowledge, the tuatara genome project will help New Zealand scientists develop the skills required to deal with the massive amounts of data created in genome projects. The capacity to undertake such initiatives has recently been developed in New Zealand through the establishment of New Zealand Genomics Limited, a collaborative infrastructure that provides genomics technology and bioinformatics services to underpin research in a broad range of areas, including medicine, agriculture and the environment. Much of the work will be carried out by PhD students and postdoctoral researchers (recent PhD graduates), many of them part of the Allan Wilson Centre for Molecular Ecology and Evolution, which supports the tuatara genome project as one of its strategic initiatives. Over the course of the project, the knowledge they and others develop will help to build the skills and infrastructure required to tackle future genome projects in New Zealand, whether they focus on species of economic, ecological or evolutionary importance.

Why blog about sequencing the tuatara genome?

For all the reasons described above, we are excited to be working on the tuatara genome project. We hope this blog will help share our enthusiasm, and give readers an insight into how a genome sequence is produced.

The tree diagram from this post is derived from “Tuatara Cladogram”, by Wikipedia user Benchill. Two extra lineages are illustrated thanks to PhyloPic. The (non-avian) dinosaur is Dryptosaurus aquilunguis as illustrated by contributor Conty, and the turtle is from Scott Hartman. Our figure is released under a CC BY-NC-SA 3.0 license.

Stand by David Winter Jun 14

You’re too quick!

We are going to use this blog to tell you all about the sequencing of the tuatara genome, but we still have a few ‘i’s to dot and ‘t’s to cross before we launch. Come back on Monday June 17th, when you can learn why we are so excited about being able to sequence these amazing animals.
