The cellular phone book

By Vic Arcus 30/09/2009


The phone book for a city is a dull read: the surnames, initials and addresses of everyone who lives there. If you are truly bored, you might find some prurient entertainment in some of the more descriptive surnames. It’s not even complete, given that a significant number of the inhabitants choose not to have their names published. However, with a bit of analysis, a phone book turns out to be quite interesting. If you were a demographer or a town planner, you could combine the data in a phone book with a map and make some inferences about population density. You might even plot the growth of various suburbs by tracking this data over time – compare the 1990 phone book with today’s. If you collate first names with time and watch them change, it is even possible to write a chapter in a freakishly successful book! So there are hidden gems in a collection of names for those who choose to spend their time mining such information.

So it is with the genome of an organism, which at first glance is even more boring than a phone book. Here is a section of the genome of a small bacterium which lives in hot salty pools on Italian beaches – I know, it’s a difficult life being Pyrobaculum aerophilum, which is the name of this organism:

ATGCCCGTTGAGTACCTAGTGGACGCCTCCGCGCTATACGCCCTCGCGGCCCATTACGAC
AAGTGGATCAAACATAGGGAGAAACTGGCCATTCTGCACTTGACCATATACGAGGCAGGC
AACGCGTTGTGGAAAGAGGCGAGGCTCGGGAGAGTGGACTGGGCCGCCGCGTCTCGGCAT
TTGAAAAAGGTGTTGTCCAGCTTCAAGGTGTTGGAGGACCCGCCCCTAGACGAGGTCTTG
AGGGTGGCCGTGGAGCGGGGCTTGACCTTCTACGACGCCAGCTACGCCTACGTGGCGGAG
TCCTCCGGACTAGTCTTGGTGACGCAAGACCGCGAGCTACTGGCCAAGACGAAAGGCGCT
ATAGACGTCGAAACTTTACTGGTAAGGCTGGCGGCACAATAA

These are 402 letters of the bacterial genome, which has a total of 2,222,430 letters carrying all of the genetic information for Pyrobaculum. Adding to the boredom and drudgery is the fact that the biological alphabet has just four letters – A, T, G and C – which denote the four “nucleotides”, the molecules which are linked together head-to-tail to make up the genome. Thus, the genome is just one long string of letters. But just as the phone book for a city has hidden treasures for genealogists and demographers, so too the genome gives up its secrets to those inclined to spend their time analysing this code. If we split the sequence above into groups of three, the first line reads like this:

ATG CCC GTT GAG TAC CTA GTG GAC GCC TCC GCG CTA TAC GCC CTC GCG GCC CAT TAC GAC

And then we can use a table to translate the code into amino acids (each group of three nucleotides codes for one amino acid):

M P V E Y L V D A S A L Y A L A A H Y D
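For anyone who wants to try the decoding themselves, here is a minimal sketch in Python. It assumes the standard genetic code table, and the names `CODON_TABLE`, `split_codons` and `translate` are just labels invented for this illustration. The table is built from the usual compact 64-letter listing of the code, with `*` standing for a stop signal.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
# Each letter in AMINO_ACIDS belongs to one codon; "*" marks a stop signal.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(dna):
    """Split a DNA string into groups of three letters, dropping any leftover."""
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def translate(dna):
    """Look up each codon in the table and join the one-letter amino acid codes."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(dna))


# The first line of the Pyrobaculum sequence quoted above.
first_line = "ATGCCCGTTGAGTACCTAGTGGACGCCTCCGCGCTATACGCCCTCGCGGCCCATTACGAC"

print(" ".join(split_codons(first_line)))
print(" ".join(translate(first_line)))
```

The two printed lines should match the codon groups and the amino acid letters shown above.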

These amino acids are then linked together to form a protein (402 nucleotides make 134 codons, the last of which is a stop signal rather than an amino acid) which has a beautiful structure…

[Image: a protein structure]
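The arithmetic for the whole 402-letter stretch is easy to check too. The sketch below simply joins the seven lines quoted earlier, cuts them into threes and looks at the final codon; the variable names are made up for this example, and the only outside knowledge assumed is that TAA, TAG and TGA are the stop signals of the standard genetic code.

```python
# The 402-letter stretch quoted earlier, joined into one string.
GENE = (
    "ATGCCCGTTGAGTACCTAGTGGACGCCTCCGCGCTATACGCCCTCGCGGCCCATTACGAC"
    "AAGTGGATCAAACATAGGGAGAAACTGGCCATTCTGCACTTGACCATATACGAGGCAGGC"
    "AACGCGTTGTGGAAAGAGGCGAGGCTCGGGAGAGTGGACTGGGCCGCCGCGTCTCGGCAT"
    "TTGAAAAAGGTGTTGTCCAGCTTCAAGGTGTTGGAGGACCCGCCCCTAGACGAGGTCTTG"
    "AGGGTGGCCGTGGAGCGGGGCTTGACCTTCTACGACGCCAGCTACGCCTACGTGGCGGAG"
    "TCCTCCGGACTAGTCTTGGTGACGCAAGACCGCGAGCTACTGGCCAAGACGAAAGGCGCT"
    "ATAGACGTCGAAACTTTACTGGTAAGGCTGGCGGCACAATAA"
)

# Stop signals in the standard genetic code (they end the protein).
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Cut the gene into consecutive groups of three letters.
codons = [GENE[i:i + 3] for i in range(0, len(GENE), 3)]

# A trailing stop codon does not add an amino acid to the protein.
ends_with_stop = codons[-1] in STOP_CODONS
residues = len(codons) - 1 if ends_with_stop else len(codons)

print(len(GENE), "letters")
print(len(codons), "codons")
print("last codon:", codons[-1])
print(residues, "amino acids")
```

So the 134 groups of three are really 133 amino acids followed by a full stop, a bit like the last entry on the page being the printer's mark rather than a resident.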

Thus, from the ostensibly boring collection of 402 letters (above), and given some analysis (genetics plus biochemistry), we have arrived at the structure of the protein which is encoded by these letters. This structure tells us about the function which this protein performs in the cell, and the function of a protein is the work it does to keep the cell alive (in the hot salty waters at the beach). It is a bit like the yellow pages of the city phone book, which give the work addresses for people in the city.

Via this decoding process we can take the Pyrobaculum aerophilum genome of 2,222,430 letters and divide it into 2,706 genes which encode 2,605 proteins and voilà! We have the Pyrobaculum phone book. I guess we should say that Pyrobaculum is really a village in this context, as it only has 2,605 inhabitants.

In the last post, I talked about the phenomenal complexity of the cell, but this all seems quite straightforward – take the genome, split it up into its various genes and decode the genes into proteins. Each protein has a specific function and collectively, the 2,605 proteins work together to keep the cell alive, just as the inhabitants of a village work on specific tasks and the sum of this work makes the village thrive. However, if your aim were to describe the city or village, its rich history, its buildings, its evolution, its people and the principles by which it thrives (see the last post), could you achieve this armed with just a phone book? Obviously not. The phone book gives up some secrets but is just a very small component of the picture which constitutes a complete description of the village or city.

Modern genomic technology has been extraordinarily successful and, to date, we have the complete genomes (phone books) for more than 1,000 organisms including bacteria, fungi, plants and mammals. But the phone book is just the beginning, and to describe a cell, we need much more than just a list of its protein inhabitants. Ekistics is the science of human settlement, which spans the beautiful and complex history, evolution, physical description, social interaction and hierarchical organisation of villages and cities. Molecular biology is the ekistics of the cell. By all means start with the phone book and see what you can glean from it, but then take up history, cartography, evolution and economics and see where that leads you.