Two-time Nobel Prize winner and pioneer of techniques that brought on modern biology, Frederick Sanger, died yesterday, aged 95.

His DNA and protein sequencing techniques were early steps towards the world of high-speed DNA sequencing we have today, with whole genomes reported frequently and diagnosis of genetic components of disease, to mention just two applications. Knowing the sequences of molecules lets scientists learn what changes are associated with disease and build evolutionary trees showing the ancestry of present-day species. Molecular sequences are a central part of modern biology.[1]

Fred Sanger is one of the few people to hold two science Nobel Prizes and one of the ‘greats’ of the MRC Laboratory of Molecular Biology.[2] (The others to have held two science Nobel Prizes are Marie Curie in physics, for work on radiation,[3] and chemistry, for her discovery of radium and polonium, and John Bardeen in physics for the invention of the transistor and for superconductivity theory.)

His first Nobel Prize was awarded in 1958 “for his work on the structure of proteins, especially that of insulin”, involving a method to sequence proteins; his second Nobel Prize was awarded in 1980 for the DNA sequencing method named after him, ‘Sanger sequencing’. This is the sequencing method that was used, in automated form, to sequence the human genome.[4]

It might surprise non-biologist readers, but molecular biology didn’t start with DNA. A lot of the focus of early work was on proteins. This work on proteins is a key to the start of my field, bioinformatics or computational biology.[5]

A common theme between the two sequencing methods he developed is reading the order of the units in molecules made of chains of repeated units.

Three of the key molecules of life (DNA, RNA and proteins) are polymers, chains of repeated units. Each repeated unit is one of a small collection of possible units of that type found in living organisms.

You’ll know DNA best, and its repeated units, the DNA bases A, C, G and T (adenine, cytosine, guanine and thymine).

CGCATTCCGTTTCGCGAAGATAGCGCGAACGGCGAACGC

A DNA sequence is the string of bases that make up a portion of DNA, like the one above. (Those familiar with the genetic code might like to apply it to this sequence. For the ‘solution’ see Footnote 6.)

Fred Sanger’s methods involve working out what was at the end of the chain, combined with a way of making pieces of different sizes. The relative sizes of the fragments can be found by placing the molecules in an electric field, which moves them at different speeds. The mix of repeatedly cutting at the end, sorting the molecules by size and working out what unit was at the end can be used to work out the sequence of the units in the molecule.

It sounds very simple put like that, but it takes some cleverness to work out a possible method to read chemical chains like that, and patience to test possible methods.

J Schmidt, source Wikipedia.

For DNA, two main ‘tricks’ were used: chain termination and electrophoresis.

DNA usually has two strands, the famous double helix. Each DNA base in the two strands is a complementary match to the one opposite in the other strand. Base ‘A’ always complements base ‘T’; base ‘G’ complements base ‘C’.
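The pairing rule is simple enough to write down as a lookup table. The short Python sketch below is only an illustration: the function name `complement_strand` is my own, and it pairs bases position by position without worrying about the direction in which each strand is conventionally written.

```python
# Base-pairing rule from the text: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand: str) -> str:
    """Return the base sitting opposite each base of `strand`, in the same order."""
    return "".join(COMPLEMENT[base] for base in strand)


print(complement_strand("CGCATTCCG"))  # GCGTAAGGC
```

Applying the function twice gives back the strand you started with, which is the sense in which each strand carries the information needed to rebuild the other.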

If you split the two strands of DNA apart to form single strands, then add to the mixture a small piece of DNA that binds to part of a long single strand, this can be used to start the enzyme DNA polymerase filling in the missing second strand. The polymerase draws each next matching base from the mixture and adds it to the matching strand it is building. If the mixture of DNA bases also includes a small amount of DNA bases that the polymerase cannot add further bases to, the growing matching strand will be stopped at random points. Across all the DNA in the sample, it will be stopped at each base in turn, leaving DNA fragments of different sizes.

The DNA sequence is ‘read’ by separating the ‘stopped’ fragments by size using an electric current that draws the molecules through a gel at different speeds depending on their size, like the one shown on the right. By adding each of the ‘stopping’ bases separately (the four columns in the picture to the right) and separating the stopped matching strands by size (the vertical rise in the picture to the right), the DNA sequence can be read.
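For those who think in code, the logic of reading the gel can be sketched in a few lines of Python. This is a toy model under loud assumptions, not a simulation of the chemistry: the function names `run_lanes` and `read_gel` are invented for illustration, every possible stopping point is assumed to occur (rather than stopping at random), and the primer and strand direction are ignored.

```python
def run_lanes(new_strand: str) -> dict[str, list[int]]:
    """For each 'stopping' base, list the lengths of the fragments ending in it.

    `new_strand` is the strand the polymerase is building, in order of synthesis.
    Toy assumption: a fragment is produced for every position in the strand.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the bands from shortest to longest fragment, noting each band's lane."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = run_lanes("CGCATTCCG")
print(lanes["T"])       # [5, 6]
print(read_gel(lanes))  # CGCATTCCG
```

Each lane on its own only tells you where one kind of base sits; it is sorting the bands from all four lanes by size that recovers the order.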

Fred Sanger’s earlier sequencing method was for proteins, like insulin – the small protein hormone important to controlling the absorption of glucose from our blood.

The molecules that carry out many of the chemical reactions in our bodies, act as hormones and help structure parts of our cells are proteins, chains of amino acids joined together.

Proteins start with an amine group. The Sanger method adds a small chemical, DNFB (or other similar compounds), to amine groups in the protein, then the protein chain is broken up, leaving the end-most amino acid attached to a DNFB compound. In this way the end-most amino acid is ‘tagged’ so that it can be identified separately from the other amino acids in the protein so that, in turn, the amino acid attached to it can be identified. By repeating this, and matching it against the size of the protein fragment, you can eventually work out the order of the amino acids that make up the protein.

Footnotes

While I was writing this, a formal acknowledgement of his passing was published on the MRC LMB website. Their article closes with a nod to modesty, “He declined a knighthood as he preferred not to be called ‘Sir’.”

Other reports can be found in the Cambridge News and the Nature News blog.

I should emphasise that there were several methods as Sanger and others (e.g. Maxam and Gilbert) explored different approaches to sequencing DNA. The one I’ve briefly described is formally called the dideoxy chain termination method. Walter Gilbert shared the 1980 Nobel Prize in Chemistry with Fred Sanger and Paul Berg.

1. Three-dimensional structures of molecules matter too — these help understand how a molecule works. It’s also useful to know what molecules interact, when and do what. There’s more to life than just DNA sequences. (Or protein sequences. Hence also the title of my series, Not Just DNA.)

2. Sanger retired in 1983.

3. It’s interesting to read in the context of today’s science that the Nobel Prize enabled the Curies to hire their first laboratory assistant. Today it’s rare that PIs (at universities, at least) do hands-on work, something that the nature of grant application criteria reinforces and that I dislike.

4. Large-scale DNA sequencing today uses a variety of ‘NextGen’ (next generation) approaches.

5. The sheer success of DNA sequencing has given some people the impression that bioinformatics (computational biology) started with the genome projects, but in practice it has much older roots, as I examined in an early article, The mythology of bioinformatics.

6. Ed Yong offered this as a fine tribute: “CGCATTCCGTTTCGCGAAGATAGCGCGAACGGCGAACGC :-(” Breaking this into triplets you get: CGC ATT CCG TTT CGC GAA GAT AGC GCG AAC GGC GAA CGC. These triplets translate to amino acids, in order: Arg Ile Pro Phe Arg Glu Asp Ser Ala Asn Gly Glu Arg. In single-letter code, we have: RIP FRED SANGER (H/T to Ed Yong for this one; I’ve done similar in the past in a Christmas card, in turn inspired by a card I received as a graduate student at the MRC LMB.)
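If you would rather let a computer do the decoding, here is a minimal Python sketch. The helper name `translate` is my own invention; the table is the standard genetic code written in the common compact form, with bases ordered T, C, A, G and stop codons shown as ‘*’.

```python
from itertools import product

# Standard genetic code in compact form: the 64 codons are enumerated with
# each position cycling through T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA, three bases at a time, into single-letter amino acid codes."""
    triplets = (dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3))
    return "".join(CODON_TABLE[triplet] for triplet in triplets)


print(translate("CGCATTCCGTTTCGCGAAGATAGCGCGAACGGCGAACGC"))  # RIPFREDSANGER
```

Any leftover one or two bases at the end of the sequence are simply ignored, which is a choice made for this sketch rather than anything biological.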


Other articles on Code for life:

Loops to tie a knot in proteins?

What does a chromosome look like? (Not Just DNA #2)

Coiling bacterial DNA

Is a genome enough? (Not Just DNA #1)

How to spot a badly-drawn DNA helix