Boney lumps, linkage analysis and whole genome sequencing

By Grant Jacobs 06/07/2010

We all have our lumps, the quirky features we develop with time.

Some of these are bone spurs, extra growths of bone.

These can be caused by damage to joints, like the lumpy joints seen in elderly people with arthritis. Bone spurs from differing causes can develop in many parts of the body, including the spine, toes, heels and hands.

Most bone spurs are associated with damage and old age, but some have genetic origins.

Figure 1A from Sobreira et al. (see References)

Metachondromatosis is a rare disorder that affects bone growth, where benign bone tumours produce lumps, mostly on the hands and feet.*

These lumps develop in children, with some of them reducing or resolving over time, others persisting.

Nara Sobreira and her colleagues set out to find genes that might cause this disease using a new approach that exploits sequencing of the whole genome of one patient.

Genetic changes that cause a disease can be as small as changing a single base in the roughly three billion bases in our DNA.

We have many, many differences that make us unique.

The art of locating the cause of a genetic disease is to determine which of the many differences in all that DNA is the one that has a role in causing the disease.

It’s a big job.

Locating genes that cause a disease is usually a long, slow process** and only a small number of genetic diseases have been resolved.

Gregor Mendel, 1822-1884 (Source: Wikimedia Commons.)

Diseases that are inherited in the classical genetic fashion are called Mendelian disorders, after the monk Gregor Mendel, famous for recording the inheritance of traits in peas. As the authors (Sobreira and her colleagues) note, the genetic causes of around 2,400 Mendelian disorders have been identified, slightly more than one-tenth of the genes we have (you might expect most genes to have variants associated with a disease). Thousands of genetic disorders have yet to be resolved.

Progress is held back by how hard it is to locate the genetic cause of a disease.

Traditionally it involved locating many people from many large families with the disease.

Family pedigrees are then drawn up, carefully noting the disease status of each person. (These pedigrees are a type of genogram using a standard notation for sexes, twins, affected individuals, carriers, and so on.)

Figure 2B, Sobreira et al (see Reference)

DNA is collected from the families and stored.

This DNA is then screened using genetic markers to locate regions that might be associated with the disease.

Throughout the genome are DNA sequences that are unique to one part of the genome. They aren’t found anywhere else.

They can be used as markers to track the inheritance of the DNA around them through the pedigree (family tree) compared to the presence of the disease.

Linkage analysis – looking for what regions of chromosomes are ‘linked’ with a disease – calculates the statistical likelihood of a region being associated with the disease given the kind of genetic inheritance involved and a modern understanding of genetics.
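To make the statistics a little more concrete, here is a minimal sketch of the classic textbook statistic used in linkage analysis, the two-point LOD score. The post does not say which statistic or software the authors used, so treat this as an illustration only. It assumes the simplest possible case: every meiosis can be scored unambiguously as recombinant or non-recombinant between a marker and the disease. The function names are made up for this example.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known meioses.

    theta is the assumed recombination fraction between marker and
    disease locus; 0.5 means 'unlinked'.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    # log10 likelihood of the data if marker and disease are linked at theta
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1.0 - theta))
    # log10 likelihood if they are unlinked (theta = 0.5)
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


def best_theta(recombinants: int, meioses: int) -> tuple[float, float]:
    """Scan theta = 0.01 .. 0.50 and return (theta, LOD) at the maximum."""
    grid = [i / 100 for i in range(1, 51)]
    return max(((t, lod_score(recombinants, meioses, t)) for t in grid),
               key=lambda pair: pair[1])


theta, lod = best_theta(recombinants=1, meioses=10)
print(f"max LOD {lod:.2f} at theta {theta:.2f}")
```

By convention a LOD score of 3 or more is taken as good evidence of linkage. With only ten informative meioses the score cannot reach that threshold by much, if at all, which is the small-family problem in a nutshell.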

The markers are quite far apart, so this only tells you that quite a large region of a chromosome might contain the small genetic change that causes the disease.

Figure 1B, Sobreira et al (see Reference)

Because of this, traditional linkage analysis often involves a second round, looking closer at promising regions using a series of markers that are increasingly close together until a region small enough to be DNA sequenced is found. By sequencing the DNA, the exact changes can be seen. Was it the loss of a few bases, changing a single ‘A’ to a ‘G’, or some other change?

Today much larger pieces of DNA can be sequenced easily. In fact, routine sequencing of the whole genomes of people isn’t far away.

Scientists are anticipating this, and new techniques for locating the cause of genetic diseases that exploit high-speed DNA sequencing are changing the game.

One limitation with the traditional approach is that you need large families or a lot of families to reveal the regions in the chromosomes that might be associated with the disease.

Sobreira and her colleagues complemented a cut-down version of the traditional process with sequencing the whole genome of a proband, a person with the disease.***

Next they compared the proband’s genome to human genomes that have already been sequenced, from people who don’t have the disease, and noted what differences were found.

Figure 1C, Sobreira et al (see Reference)

This gives a very large number of differences, so these were narrowed down by estimating which changes are likely to make a functional difference.****
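The narrowing step can be pictured as a pair of filters. The sketch below is only a guess at the general shape, not the authors' actual pipeline: it drops any difference that is also seen in the already-sequenced genomes and keeps only those whose annotation suggests a functional effect. The `Variant` class, the effect labels and all the positions are made-up toy values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    effect: str  # hypothetical annotation label, not from any real tool


# Assumed set of labels treated as 'likely to make a functional difference'.
LIKELY_FUNCTIONAL = {"missense", "nonsense", "frameshift", "splice_site"}


def candidate_variants(proband, control_keys):
    """Keep proband variants absent from controls and likely functional."""
    return [
        v for v in proband
        if (v.chrom, v.pos, v.ref, v.alt) not in control_keys
        and v.effect in LIKELY_FUNCTIONAL
    ]


# Toy data: positions and alleles are invented for illustration.
proband = [
    Variant("1", 100, "A", "G", "missense"),
    Variant("1", 200, "C", "T", "missense"),    # also seen in controls
    Variant("2", 300, "G", "A", "intergenic"),  # not predicted functional
    Variant("3", 400, "T", "-", "frameshift"),
]
controls = {("1", 200, "C", "T")}

candidates = candidate_variants(proband, controls)
print([(v.chrom, v.pos) for v in candidates])
```

A real genome would start with millions of differences rather than four, which is why each filter matters so much.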

Small families on their own are not enough to detect what regions are statistically associated with the disease, but by combining this with the whole genome sequence, the differences associated with the disease could be identified.

The weaker linkage statistics were enough to focus attention on particular genetic changes seen in the whole genome sequence, which were then checked by sequencing that region in all members of the family.
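Here is a rough sketch of how the two sources of evidence might be combined: keep only the candidate changes that fall inside a region flagged by linkage, then check that the change tracks with the disease in the family. It assumes a simplified model in which every affected person carries the change and no unaffected person does; real pedigrees are messier. The region coordinates, family labels and function names are all invented.

```python
def in_linked_region(chrom, pos, regions):
    """True if (chrom, pos) lies inside any (chrom, start, end) interval."""
    return any(c == chrom and start <= pos <= end
               for c, start, end in regions)


def segregates_with_disease(carriers, affected, family):
    """Simplified check: carrier status matches disease status for everyone."""
    return all((person in carriers) == (person in affected)
               for person in family)


# Toy inputs, invented for illustration.
linked_regions = [("1", 50, 150)]
family = {"I-1", "I-2", "II-1", "II-2", "II-3"}
affected = {"I-1", "II-1", "II-3"}
carriers_by_variant = {
    ("1", 100): {"I-1", "II-1", "II-3"},  # in region, tracks with disease
    ("1", 120): {"I-1", "II-1", "II-2"},  # in region, does not track
    ("3", 400): {"I-1", "II-1", "II-3"},  # tracks, but outside the region
}

survivors = [
    key for key, carriers in carriers_by_variant.items()
    if in_linked_region(key[0], key[1], linked_regions)
    and segregates_with_disease(carriers, affected, family)
]
print(survivors)
```

Neither filter is decisive on its own in a small family, but together they can cut the list to something short enough to check by hand.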

They did this twice, on two different families, finding two different genetic changes affecting the same gene, PTPN11 (protein tyrosine phosphatase SHP-2). Research can now focus on this gene, the protein it makes and the molecules that work with this protein.

The approach they developed may help find the cause of other genetic diseases.

New bioinformatics tools might support this general type of approach, such as better methods to work out if a genetic change is likely to affect the function of a neighbouring gene. Databases of gene function can suggest whether the function of a neighbouring gene is something that might be associated with the disease. For example, you might expect a gene associated with metachondromatosis to be active in bone tissue.

I am left pondering if this type of approach might find use in locating the different sub-types of diseases that have many different genetic causes, such as autism, by looking for causes family by family.


* Those on the hands and feet are osteochondromas, outgrowths of cartilage and bone. Those that occur where long bones develop into knobs at their ends (the metaphysis, where new bone growth takes place) and at the outer edges of the pelvis bones (the iliac crest) are enchondromas, originating from cartilage.

** I should know; I was involved in a linkage analysis project looking for regions of chromosomes associated with vesicoureteric reflux, where squeezing the bladder (e.g. when urinating) can cause urine to be squeezed back into the kidneys, damaging the kidneys.

*** Recent approaches use the exome, the entire protein-coding portion of the genome. These authors sequenced the whole genome. Many genetic alterations that cause disease result from changes in regions around genes that control the use of the genes (gene expression), rather than the protein-coding portions of the genes.

**** The details of how they did this are not yet published, but no doubt they will involve the frequency of the changes (SNPs) in the normal population and whether the change affects a gene directly.

On chromosome pairs, alleles and segregation

You will have learnt in school that we get half of our genes from our mother and the other half from our father. (I’m ignoring what happens in sex chromosomes for simplicity.)

Recombination (Source: Wikimedia Commons.)

We also have two copies of each chromosome: we are diploid, with 22 pairs of non-sex chromosomes (autosomes).

Each chromosome has a matching set of genes. Variants of genes are called alleles. Each chromosome of a pair will usually have the same genes, but they may have different variants of a gene. One allele is on one chromosome, its counterpart on the matching chromosome of the pair.

Occasionally portions of matching pairs of chromosomes swap: DNA on one is swapped for DNA on the other.

This swaps the alleles on this portion of the chromosome: the DNA from Dad gets swapped with that from Mum. The DNA of the two chromosomes has been recombined.

If we were now to go along that portion of that chromosome, we’d see a new order of alleles. Those that remained side-by-side co-segregated.
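A toy model may help here. The sketch below treats each chromosome of a pair as a list of alleles and swaps everything after a single breakpoint. This is a deliberate simplification (real chromosomes can have several crossovers, and the allele names are invented), but it shows why alleles that sit close together tend to stay together.

```python
def crossover(maternal, paternal, breakpoint):
    """Swap the portions of two matching chromosomes after breakpoint."""
    if len(maternal) != len(paternal):
        raise ValueError("chromosomes must carry the same number of genes")
    if not 0 <= breakpoint <= len(maternal):
        raise ValueError("breakpoint out of range")
    recombinant_1 = maternal[:breakpoint] + paternal[breakpoint:]
    recombinant_2 = paternal[:breakpoint] + maternal[breakpoint:]
    return recombinant_1, recombinant_2


# Four genes A-D; '1' marks Mum's alleles, '2' marks Dad's.
mum = ["A1", "B1", "C1", "D1"]
dad = ["A2", "B2", "C2", "D2"]

new_1, new_2 = crossover(mum, dad, breakpoint=2)
print(new_1)  # ['A1', 'B1', 'C2', 'D2']
print(new_2)  # ['A2', 'B2', 'C1', 'D1']
```

In this example A1 and B1 are still side by side after the swap, so they co-segregated, while B1 and C1 were separated. The closer two positions are, the fewer breakpoints can fall between them.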

Linkage analysis looks for regions of co-segregation that are associated with the presence of the disease.


Sobreira NL, Cirulli ET, Avramopoulos D, Wohler E, Oswald GL, Stevens EL, Ge D, Shianna KV, Smith JP, Maia JM, Gumbs CE, Pevsner J, Thomas G, Valle D, Hoover-Fong JE, & Goldstein DB (2010). Whole-genome sequencing of a single proband together with linkage analysis identifies a Mendelian disease gene. PLoS Genetics, 6 (6). PMID: 20577567
