David Winter

Geek of many colours, kiwi postdoc-ing in the US. Doesn't quite understand how anyone can do science without wanting to tell the world about it. David is on Twitter @TheAtavism

All the media! - The Atavism

Sep 25, 2012

Oh, hi there. Yeah, it's been a while h'uh? Just been crazy busy lately you know - one thing after another with manuscripts and datasets to analyse, then I got a whole bunch of lab reports to mark. We should totally like, get back to writing/reading ab...

Measuring population differentiation in R - The Atavism

Aug 09, 2012


This is a little bit different from most posts here. I have a paper out today in Molecular Ecology Resources: "mmod: an R library for the calculation of population differentiation statistics" (doi: 10.1111/j.1755-0998.2012.03174.x). Looking around the web, there aren't many simple expositions of just what a "differentiation statistic" might be, and why the "modern measures of differentiation" my little R package can calculate might improve on the more traditional ones. So, I thought I'd have a go here.

Biologists often want to be able to measure the degree to which a population is divided into smaller sub-populations. This can be an important thing to quantify, because sub-populations within highly structured populations are, to some extent, genetically distinct from other sub-populations and therefore have their own evolutionary histories (and perhaps futures).

To illustrate this point I've run some simulations. Imagine we had 5 sub-populations, each with a thousand individuals. In each population we will follow the fate of a locus with two alleles, R and r, that have no effect on survival or reproduction and start with frequencies of 0.8 and 0.2 respectively (these numbers were motivated by this post). In the absence of gene flow between these populations (Panel 1), the frequency of the r allele bounces around due to genetic drift (evolutionary change, after all, is inevitable). Crucially, though, changes in one population can't affect other populations, so we end up with substantial among-population differences in allele frequency. In the next two panels, in each generation a proportion of each population's individuals (0.001 and 0.01 respectively) are drawn from the other populations in the simulation. Now that the populations are sharing genes, the lines that represent their allele frequencies pull together (that is, the among-population variation is reduced).
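The post doesn't say how these simulations were coded, so here is a minimal Python sketch of the same kind of experiment under some clearly labelled assumptions: a Wright-Fisher model with diploid individuals (so 2N gene copies per sub-population), an island model in which a fraction m of each sub-population's genes comes from the pooled gene pool of the other sub-populations every generation, and a run length of 100 generations. The function `simulate` and all of its defaults are hypothetical and illustrative, not the code behind the figure.

```python
import random
from statistics import pvariance


def simulate(n_pops=5, n_ind=1000, p0=0.2, m=0.0, generations=100, seed=1):
    """Hypothetical Wright-Fisher / island-model sketch.

    Returns the final frequency of allele r in each sub-population.
    Assumes diploids (2N gene copies per population) and that a fraction
    m of each population's gene pool comes from the other populations
    every generation.
    """
    rng = random.Random(seed)
    copies = 2 * n_ind  # assumption: diploid, so 2N gene copies
    freqs = [p0] * n_pops
    for _ in range(generations):
        new_freqs = []
        for i, p in enumerate(freqs):
            others = [q for j, q in enumerate(freqs) if j != i]
            # Migration: mix local genes with the mean of the other populations
            p_mix = (1 - m) * p + m * sum(others) / len(others)
            # Drift: draw 2N gene copies at random (binomial sampling)
            n_r = sum(rng.random() < p_mix for _ in range(copies))
            new_freqs.append(n_r / copies)
        freqs = new_freqs
    return freqs


if __name__ == "__main__":
    # The three migration rates used in the three panels
    for m in (0.0, 0.001, 0.01):
        final = simulate(m=m)
        print(f"m={m}: P(r) = {[round(f, 2) for f in final]}, "
              f"variance = {pvariance(final):.4f}")
```

Larger values of m should generally shrink the among-population variance in P(r), although any single run with only five sub-populations is noisy.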


One way to quantify the among-population variation displayed in these simulations is to look at the number of heterozygotes you expect to observe across the entire population. The final values for P(r) in the first simulation were {0.33, 0.47, 0.88, 0.10, 0.33}, with a mean frequency of 0.42 (so the frequency of the R allele would be 0.58). Knowing our Hardy-Weinberg, if we had one big population with two alleles, one at a frequency of 0.42, we'd expect 2pq = 2 × 0.42 × 0.58 ≈ 0.49 of individuals to be heterozygotes. We can call that number HT, for expected total heterozygosity. But that's not what we'd actually see in this case. The sub-populations that make up this larger population have their own allele frequencies; when we calculate the expected proportion of heterozygotes for each of these populations by themselves, we end up with {0.44, 0.49, 0.21, 0.18, 0.44}, for a mean within-population expected heterozygosity (HS) of 0.35*. This deficit of heterozygotes within sub-populations, compared with the total-population expectation, will always arise when genetic drift makes sub-populations distinct from each other. Masatoshi Nei used this pattern to propose a statistic to quantify population divergence called GST, which he defined like this:

$$G_{ST} = \frac{H_T - H_S}{H_T}$$


Nei's motivation with GST was to generalise Sewall Wright's FST**, which was defined for diploid organisms and two-allele systems, so that it could be used for any genetic data. But there's a problem with this formulation. Because HT is always at least as large as HS and can't be greater than one, the maximum possible value of GST is 1 - HS. This dependency on within-population genetic diversity makes comparisons between studies, and even between loci in one study, difficult (since HS will likely be different in each case). This is particularly worrying for highly polymorphic markers like microsatellites, which can give values of HS as high as 0.9, severely constraining the possible values of GST.
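As a quick check on the arithmetic, the sketch below computes HT, HS and GST from the five final P(r) values quoted above, along with the 1 - HS ceiling on GST. It uses the plain plug-in formulas rather than the nearly unbiased estimators mmod uses (see the footnote on HS), and the helper name `expected_het` is hypothetical.

```python
from statistics import mean

# Final P(r) values from the first (no gene flow) simulation, as quoted in the text
p_r = [0.33, 0.47, 0.88, 0.10, 0.33]


def expected_het(p):
    """Hardy-Weinberg expected heterozygosity (2pq) for a two-allele locus."""
    return 2 * p * (1 - p)


h_t = expected_het(mean(p_r))               # one big, randomly mating population
h_s = mean(expected_het(p) for p in p_r)    # average within-population value
g_st = (h_t - h_s) / h_t                    # Nei's G_ST
g_st_max = 1 - h_s                          # ceiling on G_ST, since H_T <= 1

print(f"H_T = {h_t:.3f}, H_S = {h_s:.3f}, "
      f"G_ST = {g_st:.3f}, max G_ST = {g_st_max:.3f}")
```

With these numbers GST comes out at about 0.27, well below its ceiling of about 0.65.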

Although the problem of GST's dependence on HS has been known for a while, it's taken some time for new statistics that get around it to be developed. Philip Hedrick (doi: 10.1554/05-076.1), along with Patrick Meirmans (doi: 10.1111/j.1755-0998.2010.02927.x), introduced G''ST, a version of GST that is corrected for the observed value of HS as well as the number of sub-populations being considered. Meirmans used a similar trick to define φ'ST (doi: 10.1111/j.0014-3820.2006.tb01874.x), another FST analogue that partitions genetic distances into within- and between-population components. Lou Jost, meanwhile, introduced an entirely separate statistic, D, that directly measures allelic divergence (doi: 10.1111/j.1365-294X.2008.03887.x).
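To make the microsatellite problem concrete, here is a sketch of the plug-in forms of Nei's GST, Meirmans and Hedrick's G''ST, k(HT - HS) / [(k·HT - HS)(1 - HS)], and Jost's D, [(HT - HS) / (1 - HS)] × k / (k - 1), where k is the number of sub-populations. The function names are hypothetical and are not mmod's API; φ'ST is left out because it needs genetic distances between individuals rather than just allele frequencies. The toy data are two sub-populations that share no alleles at all, each carrying ten equally common alleles, so HS = 0.9.

```python
from statistics import mean


def het_stats(freqs):
    """Return (H_S, H_T, k) for a list of per-population allele-frequency lists.

    freqs[i][a] is the frequency of allele a in sub-population i; every
    population lists the same alleles in the same order.
    """
    k = len(freqs)
    h_s = mean(1 - sum(p * p for p in pop) for pop in freqs)
    pooled = [mean(col) for col in zip(*freqs)]  # equal-weight pooled frequencies
    h_t = 1 - sum(p * p for p in pooled)
    return h_s, h_t, k


def gst_nei(freqs):
    h_s, h_t, _ = het_stats(freqs)
    return (h_t - h_s) / h_t


def gst_hedrick(freqs):
    """G''_ST (Meirmans & Hedrick), plug-in form."""
    h_s, h_t, k = het_stats(freqs)
    return k * (h_t - h_s) / ((k * h_t - h_s) * (1 - h_s))


def d_jost(freqs):
    """Jost's D, plug-in form."""
    h_s, h_t, k = het_stats(freqs)
    return (h_t - h_s) / (1 - h_s) * k / (k - 1)


# Two completely differentiated populations at a highly polymorphic locus:
# ten private, equally common alleles each.
pop_a = [0.1] * 10 + [0.0] * 10
pop_b = [0.0] * 10 + [0.1] * 10
data = [pop_a, pop_b]

print(f"G_ST   = {gst_nei(data):.3f}")
print(f"G''_ST = {gst_hedrick(data):.3f}")
print(f"D      = {d_jost(data):.3f}")
```

GST comes out at about 0.05 even though the two populations share no alleles, while G''ST and D both reach their maximum of 1. As with HS above, mmod itself uses bias-corrected estimators, so its values from real samples will differ from these plug-in numbers.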

The statistical programming language R is becoming increasingly popular among biologists. Although there is a strong suite of tools for performing population genetic analyses in R, code to calculate these "new" measures of population divergence has not been available. My package, mmod, fills this gap. I won't give too many details of the package here, as they're covered in the paper and the package is well documented. Briefly, mmod has functions to calculate the three statistics described above (and Nei's GST), as well as pairwise versions of each statistic for every pair of populations in a dataset. It also allows users to perform bootstrap and jackknife re-sampling of datasets, the results of which are returned as user-accessible objects that can be examined with any R function (there is also a helper function to easily apply differentiation statistics to bootstrap samples and summarise the results). The library is on CRAN, so installation is as easy as typing install.packages("mmod"), and the source code is up on GitHub. If you want to use the package, I'd suggest reading the vignette ("mmod-demo") before you dive in.
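A pairwise statistic simply applies the two-population version of a statistic to every pair of sub-populations. Here is a minimal sketch of that idea for the plug-in form of Jost's D; `pairwise_d` is a hypothetical helper rather than the mmod function, and the allele frequencies are made up for illustration.

```python
from itertools import combinations
from statistics import mean


def d_jost(freqs):
    """Plug-in Jost's D for a list of per-population allele-frequency lists."""
    k = len(freqs)
    h_s = mean(1 - sum(p * p for p in pop) for pop in freqs)
    pooled = [mean(col) for col in zip(*freqs)]
    h_t = 1 - sum(p * p for p in pooled)
    return (h_t - h_s) / (1 - h_s) * k / (k - 1)


def pairwise_d(freqs, names):
    """Hypothetical helper: D for every pair of populations, keyed by name."""
    return {
        (names[i], names[j]): d_jost([freqs[i], freqs[j]])
        for i, j in combinations(range(len(freqs)), 2)
    }


# Toy two-allele data: frequencies of (R, r) in three populations
freqs = [[0.8, 0.2], [0.7, 0.3], [0.1, 0.9]]
for pair, d in pairwise_d(freqs, ["A", "B", "C"]).items():
    print(pair, round(d, 3))
```

The two similar populations (A and B) get a D close to zero, while each of them is strongly differentiated from C.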

I'm keen to hear about bugs or feature requests from users, just email them to david.winter@gmail.com


Winter, D.J. (in press). MMOD: an R library for the calculation of population differentiation statistics. Molecular Ecology Resources. dx.doi.org/10.1111/j.1755-0998.2012.03174.x

* mmod actually uses nearly unbiased estimators for these parameters, to deal with the way small population samples can misrepresent the actual allele frequencies in populations.

** I don't want to write an entire history of F-statistics here, because it's a big and murky topic, but I did want to make the point that the formulation I gave for GST is often presented as "Wright's FST" in genetics courses. Wright was certainly aware that his statistic was related to the proportion of heterozygotes you expect to get in a population, but when he introduced F-statistics in general, and FST in particular, he was really dealing with correlation among gametes at various levels of population structure. Unfortunately, there are now many, many definitions of FST floating around, and it's probably pointless to argue about a "right one". If you use my package, I encourage you to be explicit about, and cite, the particular statistic that you are using. For each of the FST analogues that the package calculates, the in-line help contains the correct reference.

Sunday Spinelessness – How snails conquered the land (again and again) - The Atavism

Aug 05, 2012

Christie Wilcox wrote a nice article this week on how one small group of organisms called "vertebrates" first evolved to live on land. Since you are a vertebrate who lives on land, you should probably go and read Christie's piece. I wouldn't want you, however, to go around thinking those first fish to leave the ocean behind were pioneers making a uniquely difficult transition. By my figuring, onychophorans (velvet worms like peripatus), tardigrades, annelids, nematodes, nemerteans (ribbon worms) and quite a few arthropod lineages have also taken up a terrestrial lifestyle. Many of those lineages were already breathing air before Tiktaalik, Ichthyostega and your other long-lost relatives came along to join them on land. But if you want to talk about transitions from marine to terrestrial lifestyles then you really want to talk about snails. You can find snails living in almost every habitat between the deep ocean and the desert, and snails have adapted to life on land many different times. In fact, a litre of leaf litter taken from a New Zealand forest can contain snails representing three separate transitions from water to land.

Almost all the land snails I've talked about here at The Atavism are descendants of just one invasion of the land. We call these species the stylommatophorans, and you can tell them from other landlubber-snails because they have eyes on stalks (as modeled here by Thalassohelix igniflua):

These snails are part of a larger group of air-breathing slugs and snails (including species living in fresh water, estuaries and even the ocean) called pulmonates or "lung snails". As both the common and the scientific names suggest, pulmonates breathe with lungs. Specifically, the mantle cavity, which contains gills in sea snails, is perfused with fine veins that allow oxygen to permeate the snail's blood. In relatively thin-shelled species you can often see this "vasculated" tissue in living animals:

Blacklight photo of Cepaea nemoralis showing 'vascularised' lung. Photo is CC BY-SA via Wikipedian Every1Blowz
The pulmonates can also regulate the amount of air entering their lungs with the help of an organ called the pneumatostome or breathing pore - an opening to the mantle cavity that the snail can open or close at will:

A leaf-veined slug from my garden - the small opening near the "centre line" of the slug is the pneumatostome. Interestingly, leaf-veined slugs don't have lungs; the pneumatostome opens to a series of blind tubes not unlike an insect's respiratory system.

So that, along with a whole load of adaptations that prevent a fundamentally wet animal from drying out, is your basic land snail. But those little leaf-litter snails I've been talking about for the last couple of weeks provide a good reminder that other snail lineages have left the life aquatic. Here's a species you find almost everywhere there is native forest in Otago, Cytora tuarua:

Holotype of Cytora tuarua B. Marshall and Barker, 2007. Photo is from Te Papa Collections Online, and provided under a CC BY-NC-ND license
Cytora is from the superfamily Cyclophoroidea, a group of snails that have independently adapted to life on (relatively) dry land. (The weirdly untwisted Opisthostoma in this post is another cyclophoroid.) Cyclophoroids share some stylommatophoran adaptations to life on land: they've lost their gills and replaced them with a heavily vascularised mantle cavity. Slightly oddly, cyclophoroids also breathe with their kidneys. Or, at least, the nephridium, an organ which does the same job as a vertebrate kidney, includes "vascular spaces" that the snail can use to collect oxygen from the air. Cyclophoroids don't have an organ equivalent to the breathing pore to control the flow of air into the mantle cavity. Instead, the mantle cavity is open and air enters by diffusion or, in larger species, as the result of movements of the animal's head.

For the most part, the respiratory and excretory systems in cyclophoroids are not as well adapted to life on land as those in their stylommatophoran cousins. For this reason, most cyclophoroids are only active in very humid conditions. In my limited experience, Cytora species are usually found deep in moist leaf litter and soil samples, and I've never seen one crawling about. Nevertheless, some species can survive in drier situations, and these are certainly terrestrial snails.

Local leaf litter samples reveal a third move from the water to land. I don't have a nice photo of Georissa purchasi, and I can't find anything else on the web either, so you're stuck with a crumby drawing from my notebook:

I did warn you that it was a crumby drawing. In life, G. purchasi has an orange-red sort of hue, and you can often see patches of pigment from the animal through the shell. Georissa species are from the family Hydrocenidae and are quite closely related to a group of predominantly freshwater snails called nerites. Just like the other lineages discussed, the Hydrocenidae have given up their gills and breathe through a vasculated mantle cavity. Very little is known about the biology of these snails. G. purchasi is sometimes said to be limited to very wet conditions, but I've collected (inactive) specimens from the back of fern fronds well above ground, so it can't be completely allergic to dry conditions.

So, in a handful of leaf litter collected from a Dunedin park you might have cyclophoroids, hydrocenids and stylommatophorans - descendants of three different moves from sea to land. If we look a little more broadly, there are many more examples of this transition. I've written about the helicinids before, and then there are terrestrial littorines (periwinkle relatives), some of which have both gills and lungs. Plenty of other pulmonate lineages have also taken up an entirely terrestrial lifestyle. Because some of these groups have adapted to life on land multiple times, there have probably been more than 10 invasions of the land by snails.

Most of the description of Cyclophoroids here is taken from:

Barker, G.M. (2001). Gastropods on land: phylogeny, diversity and adaptive morphology. In: Barker, G.M. (Ed.), The biology of terrestrial molluscs (pp. 1–146). CABI Publishing.