
Aside from the fun of peering at the stuff that codes for the parts that make up us, knowing how a chromosome is arranged inside a cell nucleus might tell us a lot about how our genes work. Understanding the structures of genomes might well be what is needed to make sense of the genetics of complex diseases.

So just what does a chromosome look like?

Are our genomes stuffed into the cell nucleus like loose string jammed into a bag or are they arranged in an organised way? Is the arrangement of genes that are being used in the cell different from those that are not being used? Are genes that are controlled in similar ways near each other? Do different chromosomes interact with each other?

Mitotic chromosomes (via: The Pavellas Perspective)


You’ll have seen images of chromosomes looking like two rods pinched together in the middle, like in the electron micrograph to the right, or perhaps as part of a karyotype (karyogram or idiogram), showing the collection of chromosomes in your cell stained to reveal a banding pattern, as shown below.


Karyotype of person with Down Syndrome - note the extra chromosome 21 (See Footnote 1 for more).


Karyotypes, like that of a person with Down syndrome above, are typically used to investigate loss of large parts of chromosomes (deletions), or swaps of large portions between chromosomes (rearrangements), that can be associated with different diseases or syndromes. Small changes can happen and affect genetics, too, but are too small to be seen this way.

Chromosomes can also change shape, depending on what genes are being used and what stage in the cell’s lifecycle they are in.

Just like us, our cells have a lifecycle, growing and producing offspring (daughter cells[2]). The chromosomes seen in karyotypes and the classic ‘X’ are from cells about to divide into two daughter cells. In this stage of a cell’s lifecycle, chromosomes aren’t doing the work of a growing cell, but are tightly packed and lined up against each other ready to be pulled apart into the two daughter cells.[3] (Also, these cells have been broken apart so that the chromosomes can float free.)

During the ‘growth’ part of the cell cycle, the cell does the chemistry that type of cell gets up to. If you were to look down a microscope into these cells, you couldn’t see how chromosomes are organised, even if you added chemicals that stain DNA or the packaging proteins (histones) that our DNA is wrapped around. What you’d see is an opaque mess that you couldn’t make much sense of.

Staining for specific features can tell scientists if those features are near the edge (periphery) of the nucleus or the middle, or if the features seem to be clustered together inside the nucleus, but you couldn’t really make sense of a whole chromosome this way; it’s just too confusing.

This is a classic problem in science: trying to create a picture of something you can’t see or make sense of directly.

What to do then?

One idea is to measure which points of the genome are close to one another, then use computers to build models consistent with that collection of points being close together.

Methods to apply this approach to ever-larger parts of genomes have been developed, starting from looking at single genes to averaged results of whole genomes from large collections of cells. Some scientists (including me) have been concerned that large collections of cells might have many differently-shaped genomes within them and that trying to find averaged structures within them may not be that meaningful.[4]

Just recently a variant of this approach has been developed to look at the genomes of individual cells, one cell at a time.


A model of an array of nucleosomes. The ‘tails’ that are chemically modified are indicated with dashed lines. (Original from Luger lab; sourced from Biomedical Beat.)

DNA is a very long, skinny molecule. DNA in cells doesn’t dangle around on its own, but is wrapped around histone protein cores packed together to form bobbin-like structures (nucleosomes), like those shown to the right. The ends (‘tails’) of histones are flexible strands stretching out from the bobbin-like core.[5]

Proteins are amino acids linked together in a chain. The order of the amino acids is set by the genetic code in the gene coding for the protein. One amino acid found in proteins is lysine. The fixative formaldehyde reacts particularly with nitrogen atoms, like those found at the end of the lysine amino acids found on the flexible ‘tails’ of the histone proteins, and the start of each histone protein.[6]

Outline of steps in a Hi-C method to locate portions of two regions of chromosomes (red, blue) that are close together inside the cell (original from draft materials for Jacobs, 2013, Exploring the interactions and structural organisation of genomes, in Springer Handbook of Bio-/Neuroinformatics).

By adding formaldehyde to cells, scientists can chemically bond together portions of your chromosomes that are very close together in space. The DNA is then cut so that each chemically-linked complex is left with short(er) pieces of DNA attached to it.

The cut DNA ends are joined (ligated) together, linking the ends of the DNA in each protein-DNA complex to form a circle of DNA.[7]

The histone proteins are removed from the DNA circles by unlinking the formaldehyde-induced bonds, then the DNA is pulled out of the mixture and sequenced.[8]

Sequencing the joined ends can be read as identifying that ‘this first piece of DNA is close in space to this second DNA sequence inside the cell nucleus’.

The short DNA sequences can be compared to the DNA sequence of the genome to find out what parts of the genome were close together and trapped by the chemical links that formaldehyde formed.
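As a rough illustration of this last step, here is a minimal Python sketch (mine, not the code used in the study) that takes pairs of positions already matched to the genome and tallies how often each pair of regions was found joined together. The `contact_counts` function, the one-megabase bin size and the example positions are all made up for illustration.

```python
from collections import Counter

BIN_SIZE = 1_000_000  # one-megabase bins; an illustrative choice, not from the paper


def bin_of(chrom, pos, bin_size=BIN_SIZE):
    """Return the (chromosome, bin index) region that a position falls in."""
    return (chrom, pos // bin_size)


def contact_counts(read_pairs, bin_size=BIN_SIZE):
    """Count how often each pair of regions was captured joined together.

    read_pairs is a list of ((chrom, pos), (chrom, pos)) tuples, one per
    sequenced junction whose two ends have been matched to the genome.
    """
    counts = Counter()
    for (chrom_a, pos_a), (chrom_b, pos_b) in read_pairs:
        a = bin_of(chrom_a, pos_a, bin_size)
        b = bin_of(chrom_b, pos_b, bin_size)
        # Sort so that (a, b) and (b, a) are counted as the same contact.
        counts[tuple(sorted((a, b)))] += 1
    return counts


# Made-up example junctions (hypothetical positions).
pairs = [
    (("chrX", 1_200_000), ("chrX", 5_700_000)),
    (("chrX", 5_100_000), ("chrX", 1_900_000)),
    (("chrX", 3_000_000), ("chr2", 40_500_000)),
]
counts = contact_counts(pairs)
for (region_a, region_b), n in sorted(counts.items()):
    print(region_a, region_b, n)
```

Each entry reads as 'these two regions were seen close together this many times'; that table of contacts is the raw material for the modelling.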

Computational biologists can create 3-D models of the genome consistent with the collection of locations in the genome identified as close together in space, so we have a 3-D picture of what a genome might look like in a growing cell.[9]
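To give a flavour of what building a model 'consistent with' a set of contacts can mean, here is a toy Python sketch. It is a deliberate simplification and not the modelling procedure used by any of the groups mentioned: the chromosome is treated as a string of beads, neighbouring beads and beads reported as being in contact are each nudged towards one unit apart, and everything else is left free. The `build_model` function, its parameters and the example contact are hypothetical.

```python
import math
import random


def build_model(n_beads, contacts, steps=2000, rate=0.05, seed=0):
    """Place beads in 3-D so that restrained pairs end up about 1 unit apart.

    contacts is a list of (i, j) bead index pairs observed to be close.
    Neighbouring beads along the chain are restrained as well.
    """
    rng = random.Random(seed)
    coords = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_beads)]
    restraints = [(i, i + 1) for i in range(n_beads - 1)] + list(contacts)
    for _ in range(steps):
        for i, j in restraints:
            delta = [coords[j][k] - coords[i][k] for k in range(3)]
            dist = math.sqrt(sum(d * d for d in delta)) or 1e-9
            # Positive when the beads are too far apart, negative when too close.
            shift = rate * (dist - 1.0) / dist
            for k in range(3):
                coords[i][k] += shift * delta[k]  # move bead i along the line to j
                coords[j][k] -= shift * delta[k]  # move bead j the opposite way
    return coords


# Ten beads, with the two ends reported as being in contact (made-up data).
model = build_model(10, [(0, 9)])
print(round(math.dist(model[0], model[9]), 2))
```

Different random starting points (the `seed`) give different shapes that satisfy the same contacts equally well, which is one reason a single model should not be over-read.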

Earlier attempts used data from many cells at once.

A key problem is that there may be many arrangements, differing in different cells because of the structure of a genome changing over time, varying with different environmental conditions and in different cell types. (For the curious I’ve mentioned a few of the issues in Footnote 9.)

Peter Fraser’s group, who work at the Babraham Institute that sits amongst open fields of a former manor farm a little south of Cambridge (UK), developed a variant of the Hi-C method that works on single cells, allowing them to explore the variation of genome structure in different cells of the same cell type and between cells of a different cell type (or different environmental conditions).

They chose to look at mouse helper T cells, a type of white blood cell that is part of the immune system.[10]

Rather than model the structure of the whole genome, they have first tried to model the structure of the X chromosome.

Our cells contain two of every non-sex chromosome (the autosomes); you can see these in the karyotypes shown near the beginning of this article.

As for the sex chromosomes, you’ll know that women have two X chromosomes, like in the karyotype shown above, and men one X chromosome along with a male-only Y chromosome. Like us, mice have X and Y sex chromosomes, but have 19 pairs of autosomes rather than the 22 pairs we have.[11]

One advantage of studying the X chromosome is that in males there’s only one, so the data isn’t confused by information from the ‘other’ counterpart of the same chromosome. (You can also compare with what the data is like when you have both, as females do.)

Genes have several states depending on the extent to which they are being used. They found that parts of the genome with the same gene expression status tended to be found near to one another.

Chromosomes have territories, regions of the cell nucleus to themselves. Each territory tended to interact with only a few neighbours. This raises a few questions, one being how a chromosome can interact with perhaps a dozen others, if each territory tends to only interact with a few others. One possibility is that the edges of each territory are ‘rough’, allowing parts to interact with different things. Another is that the interacting chromosome surfaces might change in response to a signal, like a hormone.

Media reports have focused on the fact that the structure of the chromosome in the working cell is not the classic ‘X’ shape of the dividing cell, but that isn’t surprising or new.


The ‘real’ results of this work are more subtle. That might seem disappointing, but this type of work is exploratory; refinements and improvements will come later.

A key thing they are looking at is variability between different cells. How consistent is the structure of the X chromosome from cell to cell?

Fig 3 of extended data, Nagano et al., 2013.


The X chromosome differs in shape from cell to cell, as you can see in the six models shown to the right.

The X chromosome is about 160 million DNA bases (160 megabases) long. When you compare regions about a megabase or so long, they show similar organisation between cells; the differences in the chromosomes are in flexible arrangements of smaller domains that seem to keep some similarity.

Genes that are actively being used were mostly found on the edges of chromosome territories. (It may be that the specific overall shape isn’t so important as moving the active genes to the ‘surface’ of the chromosome.)

It’s been known for a while that active genes are moved to the inside of the nucleus.[12] This work refines this a little, putting active genes on the edges of the chromosome territories and offering an early (tentative) model of what the X chromosome looks like.

Research is usually like this.

The media reports may tout that some ‘discovery’ has been made (often offering something that is either wrong, or was previously known and has been re-affirmed, and hence is not really a discovery) when in practice the work is extending previous findings, and often in a tentative way.

This work continues earlier work examining what chromosomes in working cells look like and introduces a new technique, demonstrated first on the X chromosome, that looks promising. It shows that the overall structure varies from cell to cell. Recent research from others has shown that which genes are used differs from cell to cell, too. These things are likely to be related.

Like others following this field, I’ve been feeling that single-cell methods are needed to examine what genomes look like and I’m looking forward to what will come next.


The three-dimensional structures of genomes bring together several of my interests, closing a circle in many ways. I started out in computational biology focusing on protein structure and function in DNA-binding proteins, including molecular simulations. That work is very much about three-dimensional aspects and interactions through space. Since then I’ve worked on genome sequence oriented work. The 3-D structures of genomes neatly pull together old themes in a new setting. (I also have an interest in a few of the families of proteins involved in organising the spatial arrangement of genomes in the nucleus and computer algorithms in these areas.)

Looking at a whole genome structure draws in all the things that genomes interact with within the nucleus. That, in turn, is part of the motivation of the Not Just DNA series — looking to the wider scene of how genes work within the cell nucleus.

As part of their press release information, the Babraham Institute put out a short video, What a chromosome really looks like.

(For the record, any similarity with my article is a coincidence: I wrote my article before viewing this!)

More details on the methods used to investigate the three-dimensional arrangements of genomes and some of the issues can be found in my review, Exploring the interactions and structural organisation of genomes, chapter 8, Springer Handbook of Bio-/Neuroinformatics (Ed. Nikola Kasabov) 2013. (Springer-Verlag, 01/2013; ISBN: 978-3-642-30573-3.)

1. The Down Syndrome karyotype is taken from the Palomar website. They also offer a short summary of Down Syndrome:

The most well known and most common autosomal abnormality is Down syndrome. This is a mild to severe form of mental retardation accompanied by distinctive physical traits. People with Down syndrome have an irregularity with autosome pair 21. In most cases, there is an extra chromosome (i.e., trisomy 21). More rarely (3-5%), there is a structural modification in this chromosome. Specifically, there is a translocation of all or part of chromosome 21 to chromosome 14 or 15. The actual genes on chromosome 21 that are responsible for Down syndrome are now being identified. It is thought that there are at least 350 genes involved. About 2-4% of the people with Down syndrome are genetically mosaic. That is, some of their cells have chromosome 21 trisomy while others do not, resulting in generally milder symptoms. The translocational type of Down syndrome also usually has less severe symptoms.

One use of karyotyping is prenatal testing, using DNA from the amniotic fluid. A recent very interesting alternative is a Down Syndrome test that draws on the unborn child’s DNA in the mother’s blood. A patent for this has been turned down by the US patent office, following a similar ruling on Myriad’s patents for testing breast cancer. (If I can find time, I’d like to explore this story as it’s one I have followed since the first attempts, in part as it requires different bioinformatics than other genome sequencing efforts. For the impatient, Science-Based Medicine have covered this and a review article (PDF file) on the approach is available.)

2. To biologists, the offspring of cell division are always daughters. Sorry guys.

3. If you want to learn more about the cell cycle and how our chromosomes are replicated there are plenty of websites explaining this. As just a couple of examples, the University of Arizona has an older tutorial page with a simple account with nice illustrations; for a slightly more complex explanation including some of how scientists worked this out, Nature Education’s Scitable page looks at the five stages of mitosis and cell division (click on the figures to see larger versions of them).

A paper has just come out, yesterday, on the structure of the mitotic chromosome using the same type of techniques I describe here (pay-walled, but you can read the abstract). This and perhaps a dozen papers of closely related studies would be interesting to cover, as well as yet another paper just out on bacterial chromosome structure (also pay-walled), if I ever found that sort of time.

4. You can try to learn some general architectural features within a genome or chromosome; it’s just that taking a big mixed collection and trying to hold up a single model risks producing something that is a bit meaningless, especially given that these models aren’t really built from that many interactions given the size and flexibility of the thing (DNA in chromosomes) being modelled. (See also Footnote 9.)

5. These tails are chemically modified, reflecting the activity state of the gene. Epigenetics features chemical modifications of the DNA and also of the proteins that DNA is wrapped around (as well as chemical modifications of other proteins that interact with DNA). There are a large number of chemical modifications of the histone tails, informally known as ‘the histone code’ in the hope that this might represent a ‘code’ defining the state of the neighbouring gene, in a similar way to how the genetic code defines each amino acid of the protein that a gene codes for. This is a very intricate story; for those that want to explore, try searching for ‘histone code’.

6. Ligation / cross-linking.

Proteins always start with a nitrogen group at the very beginning.

Histone tails have several lysine amino acids. Lysine amino acids have a long tail ending in a nitrogen.

Formaldehyde (HCHO) can cross-link protein-protein and protein-DNA interactions within ~3 Å. (Open questions include the rate of cross-link formation and whether it favours, say, the periphery of the nucleus.)

It forms methylene linkages between lysines and also (less stable) linkages to DNA, in particular to DNA guanine bases.

e.g. (Representative open-access references only; technical readers are welcome to suggest better sources! Study of the chemistry of formaldehyde goes back a good number of years; I’ve chosen two more recent examples.)

Molecular binding of formaldehyde to DNA and proteins

ProQuest Dissertations and Theses, 2009 Dissertation Author: Kun Lu


J Am Chem Soc. 2010 March 17; 132(10): 3388–3399. doi: 10.1021/ja908282f PMCID: PMC2866014 NIHMSID: NIHMS181814 Structural Characterization of Formaldehyde-induced Cross-links Between Amino Acids and Deoxynucleosides and Their Oligomers Lu et al.


7. This is an important ‘trick’ that’s central to all of this.

The joining (ligation) of the DNA ends must favour DNA from the same histone-DNA complex, and not join together different histone-DNA complexes. This is done by carrying out the ligation reaction under very dilute conditions. Low concentrations should favour joining DNA ends that are close together in space more often than DNA ends that are further apart and hence more likely to be part of different histone-DNA complexes.

8. The thing that lets the Hi-C method look at all the places a genome comes close together is the ‘trick’ of adding a chemical, biotin, to the ends of the cuts. Biotin can be used to filter out molecules by washing the mixture through a filter that traps biotin-containing molecules, letting the rest pass through.

9. If readers are interested (or perhaps even if they’re not!), I may write about this in more detail in another post as I’m interested in the methods used to construct these models. (It’s also why I’ve written a review paper on this topic.)

A couple of quick points worth mentioning here are that the experimental data only records a small fraction of the total number of spatially-close regions of the genome and that the interaction data for these models are very sparse, unlike data used to build models of proteins in a similar way from nuclear magnetic resonance (NMR) data. While there are parallels with work on building models of proteins, there are major differences too. It’s also worth remembering that the data is still a collection of different conformations, despite efforts to restrict the data to cells from similar genomes, so that the data really needs to be treated as an ensemble of models, rather than a single model.

A related issue is that looped-out portions of the model won’t have much information about how they relate to the rest of the genome, but must occupy some space. Because they have no modelling constraints, they are free to vary or move, but they still have to occupy space in the model.
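To illustrate the ensemble point in this footnote, here is a small Python sketch, again a toy of my own rather than anything from the paper. Given several alternative models of the same chain, it reports the average distance between two points and how much that distance varies from model to model. The `pair_spread` function and the three four-bead 'models' are invented for illustration.

```python
import math
from statistics import mean, pstdev


def pair_spread(models, i, j):
    """Mean and spread of the distance between beads i and j across models."""
    dists = [math.dist(m[i], m[j]) for m in models]
    return mean(dists), pstdev(dists)


# Three made-up 'models' of the same four-bead chain.
models = [
    [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)],  # stretched out
    [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)],  # folded back on itself
    [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0)],  # bent at one end
]

neighbour_mean, neighbour_sd = pair_spread(models, 0, 1)
ends_mean, ends_sd = pair_spread(models, 0, 3)
print(neighbour_mean, neighbour_sd)
print(round(ends_mean, 2), round(ends_sd, 2))
```

Pairs that are well constrained (here, neighbours on the chain) show little spread, while poorly constrained pairs (the two ends) vary a lot between models. Reporting that spread is more honest than quoting one distance from one model.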

10. I need a cell biologist to help me out here, but I strongly suspect a reason for choosing helper T cells was to get cells they could easily isolate and collect one particular kind of cell. Blood samples are straightforward and the cells in blood can be ‘sorted’ and particular cell types isolated (e.g. using antibodies to CD4, which is on the surface of T-helper cells). It’s important for this work that the cells be really of a particular type, not a mixture of cells of different types.

11. The mouse X chromosome has some regions flipped around (inverted) compared to the human X chromosome, as you can see in this dot plot. While most genes are conserved between man and mouse X chromosomes consistent with the genes of the X chromosome being the same in mammals, as predicted by Susumu Ohno in 1967, there are regions with genes with multiple copies – some of these look to be specialised to each species as examined in the full paper (pay-walled, but there appears to be a copy on-line).

12. Where a gene is in the cell nucleus when it is expressed or not has been explored for quite a while, for example in a research paper titled Relation of chromosome structure and gene expression by Mirkovitch, Gasser and Laemmli published in 1987 (Philos Trans R Soc Lond B Biol Sci. 1987 Dec 15;317(1187):563-74). They conclude that there may be “a structure-function relation between chromosome organization and gene expression.”


Nagano T, Lubling Y, Stevens TJ, Schoenfelder S, Yaffe E, Dean W, Laue ED, Tanay A, & Fraser P (2013). Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature, 502 (7469), 59-64 PMID: 24067610
