Answering a question from the Tuatara genome blog and starting a new series, Not Just DNA.
We are entering the age of the genome.
Our genomes tell us about ourselves. Genome sequences show our evolutionary relationships with other species. They reveal the origin and migration of humans. Unwelcome changes to our genomes can bring disability or illness; understanding genetic variation may bring personalised medicine. We can peer at the genetics of a yet-to-be-born child. DNA sequences are evidence for courtrooms and can track the source of an infectious disease outbreak.
But do we need to know more than the DNA sequence of a genome to understand it? It may be a question with a self-evident answer, but it’s important with so much effort being put into genome sequencing.
Plans are well advanced to sequence over a hundred thousand human genomes and the genomes of thousands of different species. Genomics England are set to sequence the genomes of 100,000 Brits over the next five years. (The 1000 genomes project was started in 2008 and published last year.) Personal genome sequencing is to be found internationally. Individuals whose genomes have been sequenced include an Australian Aborigine, people from various tribes in Africa, from the southern tip of India and from different Asian groups, among many others; there is also the Neanderthal genome. In New Zealand, we have a pilot project for the use of genomics in medical diagnosis. Organised efforts are working through a long list of species whose genomes are being sequenced: the Genome 10K project aims to sequence 10,000 vertebrates, roughly one for every vertebrate genus; the i5K initiative is to do the same for 5,000 insects and arthropods.
There are many more projects like this – and this is just the beginning. It’s much more than the human genome sequence and a few things that happened afterwards.
Knowing our genome
Our genomes are central to who we are and are often described as a blueprint for us.
A better description would be that our genomes code for a collection of parts that our biology can choose from to build us.
As our bodies grow, it’s the choice of what parts to use, when, and the interaction of those parts that make us.
Those interactions include interacting with the genome itself.
How are these bits used together to make us?
Does the DNA sequence by itself code for all of this?
Not just for our genomes, but also the genomes of the thousands of different species being sequenced.
“What about epigenetic markers? How important are these when considering if all the genomic information has been captured?”
The wider question is what makes a genome ‘work’?
It’s more than just DNA.
We grow from a single fertilised egg into a complex body of many cells.
Different types of cells use different subsets of the components our genomes code for. Which components each cell uses is determined by systems that control which of our genes are used in each cell.
A genome sequence, in and of itself, won’t tell us which genes will be used in what types of cells, ‘just’ that particular genes (and their control regions) exist.
Part of what controls genes are epigenetic marks, chemical modifications of the DNA or chemical modifications of the proteins that package DNA in our cells like those in the cartoon to the left. (Don’t worry about the details, we can turn to these in later posts in the Not Just DNA theme.)
All of our cells carry the same genome (with a few exceptions); by contrast the epigenetic marks and structures that control what genes are used differ in different cell types.
This is a central point in thinking about Darcy’s question: with a genome sequence in hand you have the genome for all cells in the body; the epigenetic state of a genome, however, is particular to the cell type or mixture of cell types it was recorded from.
Remembering that there are mixtures of cells is important when researchers interpret data about genomes. If a mixture of different types of things is recorded, the results are either an ensemble of the different things or, when taken as one value, the average or the most common (modal) value. There are efforts to study single cells in response to this.
While epigenetic marks are particular to each cell type, surveying all the epigenetic marks found in the genomes of a mixture of cells can still be useful, for example to try to identify locations in the DNA that proteins that regulate genes are likely to bind to. If the cells are a mixture of types, researchers won’t know which cell types use what regulatory region, but they will know that a region of DNA is likely to be a regulatory site in some cell types.
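For readers who like to see an idea in code, here is a minimal sketch of why a single value taken from a mixed sample can mislead. Everything in it is invented for illustration: the cell types, the cell counts and the all-or-nothing methylation state at one DNA site are assumptions, not real data, and the function names are hypothetical.

```python
from statistics import mean, mode

# Hypothetical methylation state (1 = methylated, 0 = unmethylated) at one
# DNA site, recorded for every cell in a mixed sample. Invented numbers.
cells = {
    "liver": [1] * 60,   # 60 liver cells, site methylated in all of them
    "muscle": [0] * 40,  # 40 muscle cells, site unmethylated in all of them
}


def bulk_summary(cells_by_type):
    """Summarise the pooled sample, ignoring which cell each value came from."""
    pooled = [state for states in cells_by_type.values() for state in states]
    return {"mean": mean(pooled), "mode": mode(pooled)}


def per_type_summary(cells_by_type):
    """Summarise each cell type separately, as a sorted-cell study could."""
    return {cell_type: mean(states) for cell_type, states in cells_by_type.items()}


bulk = bulk_summary(cells)
per_type = per_type_summary(cells)

print("pooled sample:", bulk)
print("per cell type:", per_type)
```

In this toy sample the pooled average says the site is 60% methylated, a state that no individual cell is in, while the most common value hides the muscle cells altogether. Only the per-type summary recovers what is actually going on.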
In fair Verona, where we lay our scene,
Thus runs the second line of the prologue for Romeo and Juliet, laying out the scene and action for one of Shakespeare’s best-known plays.
At this point many writers would define the modern notion of epigenetics, describe the chemical modifications of DNA and the proteins that package DNA in the nucleus, and dive into other elements that are considered to be part of epigenetics.
Described by their elements in isolation, accounts of genomes and epigenetics are a bit like the story of the blind men describing an elephant from the parts each felt, made worse by the fact that we don’t yet know all the parts.
Like studying an animal’s behaviour, or its special adaptations, if we want to understand our genomes in action we need to look at their natural habitat and how they interact with, and are acted on by, the things there.
Let’s first tour the natural setting of our genomes, the scene Not Just DNA will explore.
Inside our cells, our genomes are stored within the nucleus, an organelle devoted to storing and processing that precious cargo.
To the right are three oblong-shaped HeLa cells stained with the dye Hoechst 33258, which colours nuclear DNA bright blue. You can see how the nucleus holds the DNA of each cell.
You’ll have seen DNA shown as a string of letters, repeated lines of A, C, G and T like those in the book pages shown earlier.
It’s easy to think of DNA as a linear thing, encouraged by the linear DNA sequence. In reality DNA in a cell is arranged in the space of the nucleus, associated with different bits of molecular machinery found there and packaged around proteins.
Not only that, DNA is not static: it is moved about within the nucleus, with inactive genes held near the outer surface and active genes moved towards the interior. On a finer scale, the packaging that holds our DNA is re-organised depending on how a gene is to be used in each kind of cell in our bodies. (The chemical modifications Darcy referred to are part of this re-packaging.)
Look inside the nuclei of our cells and you’ll find they are crammed full of things besides our genomes. You can see some of the larger or better-known ones in the illustration below.
What affects our genomes is more than the chemical modifications Darcy asked about. If you were to look in the research literature you’d find that pretty much everything inside the nucleus has an impact on gene regulation one way or another.
Our genomes are organised within the space of our cell nuclei.
Each of our 46 chromosomes occupies its own (more-or-less) non-overlapping region of the space of the nucleus, with active use of genes mostly occurring at the edges of these chromosome territories. Specific stretches of DNA are anchored to fibres near the inside of the membrane of the nucleus. (These fibres are known as the nuclear lamina; diseases from ‘broken’ lamina are called laminopathies.)
Inactive genes are packed away and held near the periphery of the nucleus. Genes that become active have their packing relaxed and are pulled into the interior of the nucleus. The control of the packing of the DNA, tight for unused genes and more accessible for those being used, is dynamic, changing in response to stimuli that encourage use or disuse of particular genes.
There are organised structures or specialised regions within our cell nuclei. Not all of these contain DNA, but they all contribute to how our genomes work. (Don’t panic at the details; just take in the flavour of the thing.)
Genes coding for the RNAs that are used when reading genes coding for proteins (tRNAs) are within the nucleolus, the location used to make ribosomes, the large RNA and protein ‘machines’ used to translate the RNA copies of protein-coding genes into proteins. Just what paraspeckles do isn’t entirely known but they are likely to be related to making long RNAs that are involved in gene regulation. Cajal bodies, sometimes known as stress bodies, are typically found in cells that are especially active (have high metabolism). PML bodies, named for the tumour cells they were first found in – promyelocytic leukemia cells – are still something of a mystery, too, but are associated with a chemical modification of proteins inside the nucleus called sumoylation. Nuclear pores serve as gateways, controlling what goes in and out of the nucleus.
Many regulatory factors and bits of molecular machinery are involved in making a genome work. Some proteins or RNAs control the rate at which a gene is used. Whether or not the DNA bases that a protein (or RNA) would bind to have a methyl group chemically added can affect whether the regulatory factor can bind its intended site on the DNA. In this way chemical modifications of the DNA can control how a gene is used.
DNA is almost never ‘naked’ in a cell. (Yes, biologists call DNA on its lonesome naked DNA.)
Our DNA is wrapped around small bobbin-like protein complexes (nucleosomes) made of histone proteins. These histone proteins can be chemically modified, affecting how they pack against one another, in turn affecting the compactness and accessibility of a gene. Even prokaryote (bacterial) DNA is wrapped on proteins.
Molecular machinery of all kinds acts on our DNA. There are molecular ‘machines’ to release stress built up in the DNA (topoisomerases); to load histones onto DNA to form nucleosomes (the bobbin-like structures our DNA is wrapped around) or shift nucleosomes to a new position; and to transcribe a gene to an RNA copy. Other proteins enzymatically modify the DNA or the histone proteins DNA is wrapped around.
This is just a small sampling of some of the things inside a cell nucleus. There is a truly enormous amount of activity going on in the nucleus of your cells. The nucleus isn’t a place where DNA is stored as a passive thing lying in there; our genomes are actively pushed and pulled around and modified to make them work. It’s more than just the DNA that makes our genomes work; it’s also all the stuff around them, interacting with them.
The Not Just DNA theme
As part of this topical series we’ll drop in on some of the things involved in making our genomes work and see what they get up to, sometimes looking at something that is very much about the DNA itself, other times at things that work with our DNA for different purposes.
Some articles may take a wider view. What happens to genomes in cancer? Do our cells have mixtures of different genomes (chimerism, mosaicism)? How do regulatory proteins (or RNAs) get to the DNA sequences they use to regulate genes?
A related theme is to think about genomes physically, both in the sense of ‘things’ (a molecule, the environment of the genome, complexes of molecules) and in the sense of physics. Just as we are entering a genome era in terms of what genomes can tell us about ourselves and other species, (some!) genome scientists might consider we are entering an era of the physical genome.
The Not Just DNA theme will extend to things outside the immediate biology of DNA: how genetic diagnostics are being put to use; possibly legal issues; how computational biologists build up a genome from the pieces of DNA they sequence. Some articles will talk about other species.
If you are interested in following this theme, a tip: read the footnotes! These will contain points that are relevant but do not easily fit in the flow of text for a reader wanting a quicker read of the topic at hand.
Genomes are a theme of our times. They are part of what defines us and what makes each species different. With Not Just DNA we can explore part of what makes us, us.
David Winter has posted his answer to Darcy Cowan on the Tuatara Genome Project blog.
Considering the role of the nucleus and its architecture in thinking about regulating (mammalian) genomes isn’t new (it’s been around a very long time), but it still feels like a perspective that has never been fully integrated into the wider genome and genetics research community. I rarely see aspects of it discussed in genome sequencing papers.
For those wishing to pursue a deeper reading, the CSH Perspectives collection of nucleus-related papers is one useful all-at-one-site resource (now ~3 years old). I believe this collection is gathered in Misteli & Spector’s book The Nucleus. A word about the age of reference material: this field is very active, so material can date quite quickly as better understanding is developed.
(For those interested in the book, I note the printed book is currently heavily discounted (75%), but for US buyers only. [No hope for New Zealanders!] Similarly, the book Epigenetics is currently 50% discounted. These two books together would provide a decent introduction to the field.)
There are focus issues in a number of the better journals, for example the March 2013 edition of Nature Structural and Molecular Biology.
There are too many sources for individual aspects, hence my not listing them: you’ll have to do the usual thing of drawing them from the sources they’re found in.
An older paper (2005) by Tom Misteli, Concepts in nuclear architecture, is available on-line (PDF file). For research purposes this is now getting long in the tooth, but it lays out many elements well.
1. As introduced earlier, it’s probably better to think of this as a topic rather than a series. I won’t be writing a serialised book (as I might if someone was paying me to!), but approach topics as they appeal.
2. Scientists working on genomes would say they have already entered the age of the genome; I’m talking about the general public and the medical community who’ve only seen the beginning of what routine genome analysis might offer.
3. For one source of completed genomes try GOLD, the Genomes On-Line Database. As I write they report over 28,000 genomes at various stages of completion. (This will be an underestimate of the total number of genomes being done worldwide.)
“What about epigenetic markers? How important are these when considering if all the genomic information has been captured? Should we get to the point of creating artificial sequences for organisms larger than microbes will we need to concern ourselves with these?”
David Winter, the author of the Tuatara genome project blog, replied,
“If you don’t mind, I might split into two sub-questions for answers. (a) How much information does the genome sequence itself give us, and will understanding epigenetic tags (and other data on top of the genome sequence) be important?”
I asked if David minded me jumping in and making a few comments on the first aspect.
My research interests include the detailed structure and bioinformatics of epigenetics and gene regulation.
You’ll see that I have only addressed the question briefly and have not explained the epigenetic ‘tags’ at all. As I want Not Just DNA to be an on-going theme I feel obliged to lay out the larger scene first. This is particularly important to me as I’d like to explore the physical nature of genomes, whereas the question refers to only one part of this.
Just as I was adding the final illustrations, David posted his article on epigenetics on the Tuatara Genome Project blog.
5. There are around 200 types of cells identified in the human body. A short discussion and a list of 210 cell types can be found in The Molecular Biology of the Cell. Another list can be found at Wikipedia; note the provisos in defining cell types in the previous reference, though.
6. Some cells rearrange a few genes, combining them to form the one used. The best known example of this is the immunoglobulin genes that code for how to make antibodies, the proteins that our immune systems use to recognise foreign matter in our body. Immunoglobulin genes start as an array of gene snippets that are reorganised to produce a single gene coding for one of many kinds of antibodies. (It’s this that this year’s plenary speaker at the Queenstown Molecular Biology meetings, Susumu Tonegawa, won his Nobel Prize for.)
A few cells have no DNA at all. Our red blood cells—that carry the oxygen in our blood—are like that. No DNA. Other species have more unusual examples, but I can tell you about these in another article.
7. You don’t have to work only from epigenetic marks or from directly measuring where proteins bind a genome (ChIP-seq analysis) to look for gene regulatory regions. Because genes that are actively being used are copied into RNA, researchers can measure what genes are being used in a cell by identifying what RNA copies of genes have been made, then study the DNA sequences around the genes that are being used to try to identify regulatory features shared by genes used under similar conditions or in similar cell types.
The computational biology of gene regulation is a long-established area, one I am involved in. I may try to write a little about this under the Not Just DNA theme – time and distractions permitting. (Expressions of interest, or disinterest, welcome.) In brief, gene regulatory proteins (transcription factors) recognise short DNA sequences. Researchers can either use catalogs of previously-identified transcription factor binding sites or take their collection of genes known to be used in a common situation and look for short DNA sequences before (or after) those genes. Like all niches in biology, there are many subtle issues in what otherwise might seem a simple task.
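To give a flavour of the two approaches just described, here is a toy sketch in Python. It is a deliberately naive illustration, not how real motif-finding tools work: the gene names and upstream sequences are invented, the function names are hypothetical, and exact string matching stands in for the statistical models used in practice. The one real-world detail is the site TGACGTCA, the consensus cAMP response element, used here as an example of a "previously-identified" binding site.

```python
from collections import Counter

# Hypothetical upstream sequences for four genes. The first three are
# imagined to be used under the same conditions; all sequences are invented.
upstream = {
    "geneA": "TTGACGTCAAGGCT",
    "geneB": "CCATGACGTCATTA",
    "geneC": "GGTGACGTCACCAA",
    "geneD": "ATATCCGGAATTGC",
}


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(seqs, k, min_genes):
    """Discovery: find length-k sequences present upstream of >= min_genes genes."""
    counts = Counter()
    for seq in seqs.values():
        counts.update(kmers(seq, k))  # a set, so each gene counts once per k-mer
    return {kmer: n for kmer, n in counts.items() if n >= min_genes}


def find_site(seqs, site):
    """Catalogue lookup: report where a known site occurs (0-based position)."""
    return {gene: seq.find(site) for gene, seq in seqs.items() if site in seq}


print(shared_kmers(upstream, k=8, min_genes=3))
print(find_site(upstream, "TGACGTCA"))
```

Real binding sites are short and variable, so exact matching like this produces both missed sites and chance hits, which is one of the subtle issues that makes the real task harder than it looks.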
8. For those who must—and, yes, I must—here’s the full prologue as delivered wonderfully by Mark Williams (as Philip Henslowe’s tailor, Wabash) in the movie Shakespeare in Love, struggling to overcome his character’s stutter on the opening ‘T’:
Two households, both alike in dignity,
In fair Verona, where we lay our scene,
From ancient grudge break to new mutiny,
Where civil blood makes civil hands unclean.
From forth the fatal loins of these two foes
A pair of star-cross’d lovers take their life;
Whose misadventured piteous overthrows
Do with their death bury their parents’ strife.
The fearful passage of their death-mark’d love,
And the continuance of their parents’ rage,
Which, but their children’s end, nought could remove,
Is now the two hours’ traffic of our stage;
The which if you with patient ears attend,
What here shall miss, our toil shall strive to mend.
Mark Williams will probably be better known to most as Arthur Weasley in the Harry Potter movies; he also stars in the BBC series, Father Brown.
9. I don’t even pretend this represents all that is in the nucleus; there are just too many things you could point to. The idea is to just give the general nature of the beast, so that non-scientist readers might have some general idea of the environment genomes sit within.
On the subject of defining epigenetics, there are many definitions of epigenetics with some bending themselves out of shape trying to define the thing. A number of these, even from otherwise worthy sources, are wrong-headed and some of these inaccuracies are spreading to places you wish they wouldn’t. This needs an article to itself; it’d just muddy the waters here.
A very short introduction to chemical modifications of DNA and histone proteins can be found on The Scientist website as a large JPEG image.
10. HeLa cells are named after the person they were obtained from, Henrietta Lacks. HeLa cells are an ‘immortal’ cell line derived from cervical cancer and have been widely used in biology. Their provenance, and the fact that no approval was sought at the time, has been the source of some attention, especially on the sequencing of the HeLa cell genome earlier this year. Rebecca Skloot has written a best-selling book, The Immortal Life of Henrietta Lacks (recommended). I’ve presented some of Skloot’s thoughts on creative non-fiction and the book in an earlier post, Rebecca Skloot on writing creative non-fiction.
11. Chromosome territories have been argued for a long time, for example from the work of Cremer & Cremer (brothers Thomas and Christoph) showing that localised damage to the nucleus tends to affect the same chromosome, rather than a mixture of several chromosomes. A brief account is given in Box 1 of this review, Chromosome territories (PDF file).
12. Recent work examining the structure and use of genomes in single cells suggests active use of genes (transcription) occurs at the edge of chromosome territories. With luck I may cover this paper in an up-coming edition of Not Just DNA.
13. There’s some fantastic work slowly building up a high-resolution model of the nuclear pore complex. One page to browse is the University of Illinois Nuclear Pore Complex page. (Links to other research groups involved are at the end of this page.) Other large-scale efforts on complexes within the nucleus include the work on the initiation complex of the polymerases, the large complexes that transcribe genes. Similarly, the work on the ribosome, the protein and RNA machinery that translates an RNA copy of a gene (messenger RNA) into a protein, is a major effort, one that has been rewarded with a Nobel Prize.
14. I confess to an ulterior motive here. I’m a big fan of the physical genome and believe this is a key part of the future of genomics, of understanding how genomes work.