By Genomics Aotearoa 03/09/2021

By Michael Hoggard and Carmen Astudillo Garcia, University of Auckland

The rapid development and extraordinary cost reductions of next-generation sequencing technologies have greatly increased the data available on the functioning of complex microbial communities.

Although relatively new, metagenomics has already dramatically influenced our understanding of the microbial tree of life and the known virosphere by revealing the genomic traces of organisms and viruses that were not previously observable.

What is metagenomics?

Techniques for exploring microorganisms have evolved over the past decades, from microbial culture techniques to single-gene amplification and sequencing, and now to whole-genome analysis (genomics), which provides a more comprehensive, integrated approach to understanding the function and physiology of a single microbe.

More recently, metagenomic approaches have made it possible to directly analyse all genomes contained in an environmental sample as a way to understand the functional gene composition of entire microbial communities. This gives a much broader description than single gene- or genome-based studies.

The process of metagenomics involves the extraction, preparation, and sequencing of DNA from a sample (for example, a water sample, or a human faecal sample), and the bioinformatic processing and analysis of generated metagenomic sequencing reads.

During this process, quality-filtered reads are often assembled into longer contigs (a set of overlapping DNA segments that together represent a consensus region of DNA) that may span one or several genes. At this level, metagenomics can provide invaluable insight into the genetic and metabolic potential of entire complex communities of organisms at the same time.
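To make the idea of a contig concrete, the following is a minimal sketch of how overlapping reads can be merged into one longer sequence. It is a toy greedy overlap merger written for illustration only: the function names, the `min_len` parameter, and the example reads are assumptions, and it ignores sequencing errors, reverse complements, and repeats. Production metagenome assemblers typically rely on de Bruijn graphs rather than this pairwise approach.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that matches a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap.

    Returns the sequences left once no pair overlaps by at least `min_len`.
    """
    seqs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return seqs
        # Join the two sequences, writing the shared region only once.
        merged = seqs[best_i] + seqs[best_j][best_n:]
        seqs = [s for k, s in enumerate(seqs) if k not in (best_i, best_j)]
        seqs.append(merged)


# Hypothetical example reads that tile across a short region.
example_reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
contigs = greedy_assemble(example_reads)
print(contigs)
```

With these three example reads, each consecutive pair shares four bases, so the reads collapse into a single contig.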

Metagenomic approaches using next-generation sequencing technologies such as the popular Illumina platform or Oxford Nanopore Technologies allow the researcher to not only choose from a wide range of taxonomic marker genes to improve classification (taxonomic annotation), but also to use a combination of markers to increase resolution when reconstructing phylogenies (the evolutionary history of a group of organisms).

Taken at this scale, we might think of the whole community, such as a soil microbial community, as a single functional unit. We can then examine the genetic potential contained within the community as a whole, including the diversity of organisms, metabolic potential, and relationships with broader ecosystem processes.

The most recent developments
Bioinformatics tools have been developed to partition these pools of mixed-organism assembled contigs into ‘bins’ representing the individual organism (or group of very closely related organisms broadly equivalent to a species) that the genetic material came from.

This has allowed for the large-scale construction of individual high-quality metagenome-assembled genomes, providing genomic insights that are comparable to traditional genomics of individual organisms, but across numerous (potentially hundreds or thousands of) individual genomes from the same sample.
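As a rough illustration of the binning idea, the sketch below groups contigs by two signals that binning tools commonly draw on: sequence composition (reduced here to GC content) and sequencing coverage. Everything in it is an assumption made for illustration, including the function names, the `max_dist` threshold, and the example contigs. Real binning software uses richer features, such as tetranucleotide frequencies and coverage across multiple samples, and more robust clustering.

```python
import math


def gc_fraction(seq: str) -> float:
    """Fraction of bases in `seq` that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def bin_contigs(contigs: dict[str, tuple[str, float]], max_dist: float = 0.15) -> list[list[str]]:
    """Assign each contig to the nearest existing bin, or start a new bin.

    `contigs` maps a contig name to (sequence, mean coverage). Each contig is
    described by (GC fraction, log10 coverage); it joins the closest bin if the
    Euclidean distance to that bin's centroid is at most `max_dist`.
    """
    bins: list[dict] = []
    for name, (seq, coverage) in contigs.items():
        feat = (gc_fraction(seq), math.log10(coverage))
        closest, closest_dist = None, float("inf")
        for b in bins:
            d = math.dist(feat, b["centroid"])
            if d < closest_dist:
                closest, closest_dist = b, d
        if closest is not None and closest_dist <= max_dist:
            closest["members"].append(name)
            closest["features"].append(feat)
            # Recompute the centroid as the mean of the member features.
            n = len(closest["features"])
            closest["centroid"] = tuple(sum(f[k] for f in closest["features"]) / n for k in range(2))
        else:
            bins.append({"centroid": feat, "members": [name], "features": [feat]})
    return [b["members"] for b in bins]


# Hypothetical contigs: two GC-rich, high-coverage and two GC-poor, low-coverage.
example_contigs = {
    "contig_a": ("GCGCGGCCGA", 50.0),
    "contig_b": ("GGCCGCGCTC", 55.0),
    "contig_c": ("ATATTAGCAT", 8.0),
    "contig_d": ("TTAACATGAT", 10.0),
}
print(bin_contigs(example_contigs))
```

Contigs from the same organism tend to share composition and abundance, which is why even this crude two-feature distance separates the two example groups.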

Why are these techniques important?
Using metagenomics, genomes have been constructed for microbial organisms that have not yet been grown in isolation, and that we know exist by their genetic signature alone.

Metagenomics has been successfully applied to the study of complex microbial communities associated with a diverse array of environments, including soil, permafrost, the open ocean, deep sea hydrothermal vents, rivers and lakes, plants, and host-associated microbiomes – including that of the human body.

At the same time, metagenomic data have advanced our knowledge of microbial roles in key biological and environmental processes, such as animal (including human) and plant health and disease; environmental nutrient cycling pathways and ecosystem function; responses to pollution, environmental disasters, and land use change and/or degradation; and potential biotechnology and bioremediation applications.

Metagenomics is also advancing our understanding of the species- and strain-level diversity of microbial communities and of the environmental prevalence and spread of antibiotic resistance, and is increasingly of interest as a tool to guide individualised patient treatment in clinical settings.

Notably, metatranscriptomics (a comparable method for whole RNA sequencing to investigate active gene transcription and RNA viruses) was also at the heart of the first characterisation and publication of the SARS-CoV-2 virus genome in early 2020. These data were essential for the subsequent development of testing methods for the emerging COVID-19 pandemic, and they underpinned the strain-level genomic monitoring programme that has been invaluable in New Zealand’s response to the pandemic.

Here in Aotearoa New Zealand, Dr Kim Handley (The University of Auckland / Genomics Aotearoa) and her group have been applying metagenomics and metatranscriptomics to better understand the diversity, ecology, evolution, and ecosystem functioning (such as environmental nutrient cycling) of microbial communities and viruses in aquatic environments, including groundwater, hot springs, rivers and estuary systems spanning the gradient from freshwater to marine.

Upskilling in metagenomics
Dr Handley’s group (including the Genomics Aotearoa post-doctoral research fellows Carmen Astudillo-Garcia and Michael Hoggard) hosted a specialised four-day metagenomics summer school at the University of Auckland – a joint venture between the University, Genomics Aotearoa, and the New Zealand eScience Infrastructure (NeSI).

The workshop aimed to provide participants across a wide range of bioinformatics experience and academic levels, from graduate students right up to principal investigators, with the background knowledge and practical bioinformatics skill sets required for a complete workflow to process metagenomics data.

This included quality processing and assembly of raw sequence reads, partitioning assembled contigs into individual metagenome-assembled genomes, generating and investigating gene annotations of the acquired genomes, and example exercises to investigate, analyse, and present biological insights from these data.
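The first step in that list, quality processing of raw reads, can be illustrated with a short sketch. It assumes reads are already parsed into (header, sequence, quality) tuples and that quality strings use the standard Phred+33 encoding found in modern FASTQ files. The function names and the thresholds are illustrative assumptions, not the settings used in the workshop, and dedicated trimming tools do considerably more (for example adapter removal and per-base trimming).

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 by default)."""
    if not quality:
        return 0.0
    return sum(ord(c) - offset for c in quality) / len(quality)


def filter_reads(records, min_mean_q: float = 20.0, min_length: int = 50):
    """Keep reads that are long enough and whose mean quality is high enough.

    `records` is an iterable of (header, sequence, quality) tuples. Records
    whose sequence and quality strings differ in length are dropped as malformed.
    """
    kept = []
    for header, seq, qual in records:
        if len(seq) != len(qual):
            continue  # malformed record
        if len(seq) >= min_length and mean_phred(qual) >= min_mean_q:
            kept.append((header, seq, qual))
    return kept


# Hypothetical reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
example_records = [
    ("read_1", "ACGTACGT", "IIIIIIII"),  # long enough, high quality
    ("read_2", "ACGTACGT", "########"),  # long enough, low quality
    ("read_3", "ACG", "III"),            # high quality, too short
]
passed = filter_reads(example_records, min_mean_q=20.0, min_length=5)
print([header for header, _, _ in passed])
```

Filtering on both length and mean quality is a deliberately simple choice here; the point is only that low-confidence or very short reads are removed before assembly.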

The tools, steps and code needed to conduct all the metagenomics analyses covered are now available in a GitHub repository.

Dr Handley’s group has its own repository covering the environmental metagenomics analysis pipeline used in the lab. This repository can also be found among the Genomics Aotearoa GitHub repositories.

Genomics Aotearoa is providing regular training workshops in different aspects of genomic analysis. For information on upcoming training, see or subscribe here