Scientists are surveying the genomes found in environments, a modern way to find what micro-organisms live there.

There has been an explosion of “-omics” over the last decade. Metagenomics is the moniker given to genomic, or DNA sequence, surveys of environmental samples.

Basically, metagenomics involves researchers collecting DNA samples from, say, the ocean, beach sand, bioreactor sludge (ugh!), a lake, or someone’s mouth, gut or lungs. The list of metagenomes at NCBI includes some unusual, and rather clever, examples. There’s even a fossil metagenome (a topic for another article, hopefully).

A related term is the microbiome: the microbes, their genetic elements (genomes), and their interactions in a particular environment. The Human Microbiome Project aims to find out about the microbes that live on and in us.

Some places microbes live in and on us


The organisms collected are all the little things that you usually need a microscope to see: microbes like bacteria, fungi, archaea and protists; microscopic algae; tiny animals such as plankton; and so on. (I imagine one source of contamination will be the seeds or eggs of larger organisms.)

Having taken their environmental sample, the researchers then extract the DNA from it. They don’t try to isolate each microbe and get the DNA of each species separately; they simply extract the DNA of all the organisms in the sample at once.

By doing this, there is no need to culture the microbes, that is, grow the microbes in a laboratory. Culturing microbes can be difficult. Metagenomics provides another way to find new species of organisms, one that doesn’t involve painstakingly culturing micro-organisms, trying to find what conditions (think: temperature, food, etc.) the organisms need to grow. It has been estimated that fewer than 1% of microbes can currently be cultured; by avoiding culturing microbes the researchers are able to find many microbes that they haven’t been able to study before.

The DNA is sequenced to give a survey of parts of the genomes of the microbial community sampled. Metagenomics exploits the power of high-throughput sequencing, where large amounts of DNA can be sequenced very quickly. If you have time, this animation explains DNA and DNA sequencing. There are many other animations explaining DNA sequencing, such as those listed in reference 1.

We have several centres providing fast DNA sequencing in New Zealand such as the one at Otago University and the Allan Wilson Centre.

Bioinformatics scientists like myself (PDF link) compare these DNA sequences to DNA sequences where it is known what species the DNA is from. Most of the organisms sampled are new; that is, the DNA sequences obtained have no significant similarity to DNA sequences from known species. We really only know about a fraction of the microbes that exist.
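For readers who like to see ideas as code, here is a toy sketch of that comparison step. It is not how real tools work (they use far more sophisticated alignment and statistics); it simply counts how many short "words" of DNA (k-mers) a sequenced fragment shares with each known reference, and reports "unknown" when nothing is similar enough. The reference sequences, the species names, the word length and the threshold are all made up for illustration.

```python
def kmers(seq, k=4):
    """Return the set of all overlapping k-letter words in a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Shared words divided by total distinct words (0 = nothing shared)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(read, references, k=4, threshold=0.3):
    """Return (name, score) for the most similar reference.

    If the best score is below the (arbitrary) threshold, the read is
    reported as "unknown", i.e. no significant similarity to a known species.
    """
    read_kmers = kmers(read, k)
    best_name, best_score = "unknown", 0.0
    for name, ref_seq in references.items():
        score = jaccard(read_kmers, kmers(ref_seq, k))
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return "unknown", best_score
    return best_name, best_score


# Hypothetical reference sequences, invented for this example.
references = {
    "species_A": "ATGGCGTACGTTAGCCGATCGATCGGCTA",
    "species_B": "TTGACCAGTTTGACCAAGGTTCCAAGGTT",
}

# A fragment taken from species_A, and one that resembles neither reference.
print(classify("GCGTACGTTAGCCGATCG", references))
print(classify("CCCCCCCCGGGGGGGG", references))
```

In a real metagenome most fragments would land in the "unknown" bin, which is exactly the point made above.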

We can apply this approach to all sorts of questions. What microbes are typically found in good soils compared to poor soils? Do obese people typically have different gut microbes? (e.g. this article in Nature.) Can we identify difficult-to-culture viruses associated with particular diseases?

There are some excellent explanations of metagenomics on the WWW. An outstanding example of science communication is the overview of metagenomics (with great graphics) and a free PDF booklet Understanding our microbial planet (4.3 Mb PDF link) provided by the National Academies of Science (USA). There are also some free posters available for download.

A recent blog article on metagenomics can be found at one of the blogs on my blogroll, Mystery Rays from Outer Space. Despite the title, this is a “straight science” blog and a good one at that. In a recent article Ian York looks at viruses found in lakes and sea turtles by metagenomic studies.

I have to poke a little at one remark he makes, however:

In the next few years, there’s going to be yet another data explosion, as metagenomics turns up new things in astronomical numbers.

While he’s right that this will grow, it has already well and truly started. For many years I worked on the Transterm database. (See also reference 2.) A large part of the work creating the data involves processing the DNA (or RNA) sequence data for all the species stored in GenBank, “the” international collection of DNA sequences from all species. This involves processing a large taxonomy of species. In this taxonomy there is already a very large collection of “species” representing the DNA of these new unidentified species from environmental samples, with “species names” such as “unidentified soil organism”, “uncultured marine (Click on “Unclassified” in the first list in the NCBI Taxonomy Browser web page for example. The list is considerably smaller than it used to be, as they are now treating entire metagenomes as a “species”; nonetheless, it will give you some feel for how much data is being gathered.)


1. Several other examples of animations explaining DNA sequencing can be found using Google, for example:



You’ll have to work out which is best for you; I don’t have time to survey them all!

2. Jacobs GH, et al. Transterm: a database to aid the analysis of regulatory sequences in mRNAs. Nucleic Acids Res. 2009 Jan;37(Database issue):D72-6. Open access; you can download the paper for free.