By Guest Author 15/04/2020


Professor Alexei Drummond and Dr David Welch

A mathematical model attempts to describe a system in terms of the elements that make up the system, their states and the interactions between these elements. The detail of the model is chosen to allow certain behaviours of the system to be well-understood without being bogged down by unnecessary details.

Mathematical modelling of infectious diseases

Most infectious disease models fall into the category of “compartmental” models. In these models, a host population is divided into different types, or “compartments”, that describe each host’s disease status. A basic distinction among compartmental models is between SIS models and SIR models.

SIS stands for susceptible-infected-susceptible. SIS models are used for endemic diseases in which infection does not confer immunity. In such a disease, individuals return to the susceptible pool after an infection and can be reinfected multiple times.

SIR stands for susceptible-infected-removed. SIR models are used for epidemic diseases in which infected individuals do not become susceptible again, but instead move to the “removed” compartment after an infectious period (removal could be through either immunity or death).

The population can be modelled as well-mixed (everybody can infect everybody else) or structured (for example, into households where hosts in the same household infect each other at higher rates than hosts in different households). Using different age categories is another common way to structure the population.

Along with the mathematical structure of the model come many rate parameters that describe how quickly hosts move between compartments. These include the rate at which an infected individual infects susceptible individuals, and the rate at which infected individuals recover. In simple models the ratio of these two rates (infection rate/recovery rate) is known as the “reproductive number” (R). In general, R is defined as the average number of secondary cases that a primary case causes. So long as R > 1, the epidemic will continue to grow and will not die out (except by chance, if the number of infections is very low).
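To make the role of the two rates concrete, here is a minimal sketch of a well-mixed, deterministic SIR model in Python. The function names, the parameter names `beta` (infection rate) and `gamma` (recovery rate), and the simple Euler time-stepping are our own illustrative choices, not a description of any particular published model.

```python
def reproductive_number(beta, gamma):
    """Ratio of infection rate to recovery rate in the simple SIR model."""
    return beta / gamma


def simulate_sir(beta, gamma, s0, i0, removed0=0.0, days=200, dt=0.1):
    """Euler integration of a well-mixed deterministic SIR model.

    beta  : infection rate per infected host per day (assumed name)
    gamma : recovery/removal rate per day (assumed name)
    Returns a list of (time, susceptible, infected, removed) tuples.
    """
    n = s0 + i0 + removed0          # closed population: total stays constant
    s, i, r = float(s0), float(i0), float(removed0)
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i / n * dt   # S -> I
        new_removals = gamma * i * dt            # I -> R
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((k * dt, s, i, r))
    return history


if __name__ == "__main__":
    growing = simulate_sir(beta=0.5, gamma=0.25, s0=999, i0=1)
    peak_time, _, peak_infected, _ = max(growing, key=lambda row: row[2])
    print("R =", reproductive_number(0.5, 0.25))
    print("peak infections:", round(peak_infected), "on day", round(peak_time))
```

With `beta` larger than `gamma` (R > 1) the number of infected hosts initially rises; with `beta` smaller than `gamma` it falls from the start.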

Interests and nuisances

Generally speaking, there will be some parameters or quantities that we care about, and some that we don’t really care about. The ones we care about are called “parameters of interest” or “quantities of interest”, and the ones we don’t are sometimes called “nuisance parameters”.

In the case of COVID-19, a quantity of interest would be the number of deaths we expect, and that is connected to a parameter of interest called the “infection fatality ratio”. A nuisance parameter would be the number of people who have very mild symptoms. We don’t really know how many people have very mild symptoms (because by definition they are hard to detect) but it might be necessary to know that number well in order to get a good estimate of the infection fatality ratio and therefore the number of deaths to expect.

Another important quantity is the “final size” of the epidemic: how many people would ultimately become infected. Under a simple closed SIR model, the final size is a simple function of R0, the value of R at the start of the epidemic when the whole population is still susceptible. But under more complex modelling assumptions, including population structure, relating the final size to R0 is more complicated. This could have a major bearing on predictions of the total number of infections and deaths.
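As an illustration, for the well-mixed closed SIR model with a negligible number of initial infections, the standard final-size relation says the fraction z of the population eventually infected satisfies z = 1 - exp(-R0 z). The sketch below solves that equation numerically; the function name and the fixed-point iteration are our own choices for illustration, and the relation does not hold for structured populations.

```python
import math


def final_size(r0, tol=1e-12, max_iter=10_000):
    """Fraction of a closed, well-mixed population eventually infected.

    Solves z = 1 - exp(-r0 * z) by fixed-point iteration starting from z = 1,
    which converges to the largest root. Assumes a negligible initial
    fraction infected; returns 0.0 when r0 <= 1 (no large epidemic).
    """
    if r0 <= 1.0:
        return 0.0
    z = 1.0
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r0 * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


if __name__ == "__main__":
    for r0 in (1.5, 2.0, 3.0):
        print(f"R0 = {r0}: final size = {final_size(r0):.3f}")
```

Under these assumptions an R0 of 2 corresponds to roughly 80% of the population eventually being infected.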

Addressing uncertainty

A key distinction among models of infectious diseases is whether the model is stochastic or deterministic. Stochastic models can address the uncertainty in predictions due to random chance. Since stochastic models generally deal directly with the absolute numbers of infections (rather than, say, the fraction of the population infected), they are also better at handling questions like extinction and elimination of diseases (i.e., predicting the time of removal of the last infected individual). Deterministic models are used to deal with situations where the average behaviour is of interest, or the number of infections is so large that we can assume the effects of randomness are negligible. Deterministic models are typically easier to deal with mathematically, but are problematic for understanding dynamics at the beginning or end of the epidemic when random chance can have a large role to play in outcomes.
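The role of chance at the start of an outbreak can be sketched with a small stochastic SIR simulation that tracks whole individuals. The code below is an illustrative assumption on our part rather than any specific published model: it follows only the sequence of infection and removal events (the event order of a Gillespie-style simulation, without the waiting times) and counts how often an outbreak started by a single case dies out while still small. The population size, number of runs and the threshold of 50 cases are arbitrary choices.

```python
import random


def stochastic_outbreak_size(beta, gamma, n, i0, rng):
    """Run one stochastic SIR outbreak and return the total number removed.

    Each event is either an infection (probability proportional to
    beta * S * I / n) or a removal (probability proportional to gamma * I).
    The outbreak ends when no infected hosts remain.
    """
    s, i, removed = n - i0, i0, 0
    while i > 0:
        infection_rate = beta * s * i / n
        removal_rate = gamma * i
        if rng.random() < infection_rate / (infection_rate + removal_rate):
            s -= 1
            i += 1
        else:
            i -= 1
            removed += 1
    return removed


def early_extinction_fraction(beta, gamma, n=500, runs=1000, threshold=50, seed=1):
    """Fraction of outbreaks, each seeded by one case, that end below `threshold` cases."""
    rng = random.Random(seed)
    small = sum(
        stochastic_outbreak_size(beta, gamma, n, 1, rng) < threshold
        for _ in range(runs)
    )
    return small / runs


if __name__ == "__main__":
    print("R = 2.0:", early_extinction_fraction(beta=0.5, gamma=0.25))
    print("R = 0.8:", early_extinction_fraction(beta=0.2, gamma=0.25))
```

Even with R = 2, a substantial share of simulated outbreaks fizzle out by chance (a standard branching-process approximation suggests about 1/R of them in a large population), which a deterministic model cannot capture.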

All models make many assumptions, but many decades of work have gone into validating when these assumptions can be made and which inferences are robust to these simplifications. Good modellers don’t just understand the models and related assumptions, but also understand how the different parts of the models interact, and what aspects of the biology of a given infectious disease need to be carefully considered.

We use data to estimate the parameters of a model. For compartmental models, the main type of data we use is counts of how many people are infected, recovered, etc., through time. The parameters of the model are chosen so that the model can accurately recreate the curve that these counts make when charted. We can then use the model to extrapolate beyond the observed data. But it is very hard or impossible to figure out where an infection came from, or who infected whom, using only count data. This is where genetic data obtained from the virus is very useful. When one person transmits a virus to another, the viruses in those two people will be very similar genetically. Conversely, viruses sampled from unrelated cases will be genetically different from each other.
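The idea of choosing parameters so that the model recreates the observed curve of counts can be sketched as follows. Everything here is a simplified assumption for illustration: the “observed” counts are synthetic (generated from the model itself with a known infection rate, not real case data), the model is a daily-step SIR, the recovery rate is treated as known, and the fit is a plain least-squares grid search rather than the statistical methods used in practice.

```python
def infected_curve(beta, gamma, n, i0, days):
    """Number of currently infected hosts each day under a daily-step SIR model."""
    s, i = float(n - i0), float(i0)
    curve = [i]
    for _ in range(days):
        new_infections = beta * s * i / n
        new_removals = gamma * i
        s -= new_infections
        i += new_infections - new_removals
        curve.append(i)
    return curve


def sum_squared_error(model, observed):
    """Least-squares mismatch between the model curve and the observed counts."""
    return sum((m - o) ** 2 for m, o in zip(model, observed))


def fit_beta(observed, gamma, n, i0, candidates):
    """Pick the candidate infection rate whose curve best matches the counts."""
    days = len(observed) - 1
    return min(
        candidates,
        key=lambda b: sum_squared_error(infected_curve(b, gamma, n, i0, days), observed),
    )


# Synthetic "observations": 30 days of counts from a known infection rate.
TRUE_BETA, GAMMA, POPULATION, INITIAL_CASES = 0.4, 0.2, 10_000, 5
observed = infected_curve(TRUE_BETA, GAMMA, POPULATION, INITIAL_CASES, 30)

candidates = [k / 100 for k in range(10, 101)]       # 0.10, 0.11, ..., 1.00
best_beta = fit_beta(observed, GAMMA, POPULATION, INITIAL_CASES, candidates)

# Extrapolate beyond the observed window using the fitted parameter.
forecast = infected_curve(best_beta, GAMMA, POPULATION, INITIAL_CASES, 60)
print("fitted infection rate:", best_beta)
print("projected infections on day 60:", round(forecast[60]))
```

Because the synthetic counts contain no noise, the search recovers the infection rate that generated them; with real data the fit would be approximate and would come with uncertainty.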

Mapping the evolution of a disease

To model genetic data, we need to use phylogenetic models. Phylogenetic models are used to understand the evolution of an infectious disease. Rapidly evolving diseases such as those caused by RNA viruses (e.g. Coronaviruses, Dengue, Ebola, Hepatitis C, HIV, Influenza, Zika) create a large amount of genetic diversity over the course of an epidemic. In these cases, phylogenetic models can tell us something about who infected whom, as well as helping to estimate the same parameters of interest that are dealt with by traditional mathematical models of infectious disease.

Using the basic idea that infections that are closely related to each other are more similar genetically than infections that are distantly related, the differences between the genome sequences of different infections can be compared to reconstruct an evolutionary history that shows the relationships between all the samples. This evolutionary history is a tree, like a family tree, but for viruses. The technical term for the tree is a phylogeny, hence the name phylogenetics. Two individuals who were infected in the same cluster will have viruses with very similar or identical genomes and will appear clustered closely together on the tree. This information can be crucial in verifying whether two infections came from the same source, thereby supporting more traditional epidemiological approaches (such as contact tracing).
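The idea of grouping samples by genetic similarity can be sketched with a toy example. The code below is a deliberately simple stand-in, not how phylogenies are estimated in practice: it counts the differences between aligned sequences and repeatedly joins the two closest groups (average-linkage clustering, also known as UPGMA), whereas real phylogenetic models use explicit models of evolution. The sample names and the short sequences are invented for illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def upgma(sequences):
    """Build a tree (nested tuples of sample names) by average-linkage clustering.

    sequences: dict mapping sample name -> aligned sequence string.
    """
    # Each cluster is keyed by its subtree and stores the samples it contains.
    clusters = {name: [name] for name in sequences}

    def average_distance(c1, c2):
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        return sum(hamming(sequences[a], sequences[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        # Join the two clusters whose members are most similar on average.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average_distance(*p))
        members = clusters.pop(c1) + clusters.pop(c2)
        clusters[(c1, c2)] = members
    return next(iter(clusters))


# Hypothetical aligned sequences from four cases (toy data, not real genomes).
samples = {
    "case_1": "AAAAAAAAAA",
    "case_2": "AAAAAAAAAT",
    "case_3": "CCGGAAAAAA",
    "case_4": "CCGGTTAAAA",
}

tree = upgma(samples)
print(tree)
```

In this toy data, cases 1 and 2 differ at a single position and end up as neighbours on the tree, as do cases 3 and 4, mirroring how infections from the same cluster sit close together in a phylogeny.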

Learning from the virus family tree

By studying where the genetic samples obtained in New Zealand fall in a tree that also includes samples taken from around the world, we can answer many questions. We can determine how many introductions of the disease occurred and from where. We can see how many independent clusters exist within New Zealand, and provide potential source information for cases where contact tracing has failed to determine the source.

Phylogenetic models also offer an alternative route to estimating the total number of infections in an epidemic. The key insight here is that a large number of infections will contain more genetic diversity than a small number of infections. So if we find sampled infections in a region that are all quite genetically distinct from each other, we can deduce that there is a large number of undetected infections in that region (so long as multiple introductions can be ruled out by the aforementioned approach). This feeds into questions like what proportion of the total number of infections is being detected as confirmed cases.

Increasingly, researchers working in this area are moving towards combining mathematical and phylogenetic approaches. Since genetic diversity in the virus and the number of cases are related, estimates of genetic diversity through time obtained from phylogenetic models lead to estimates of the number of cases through time, which is what the mathematical models are designed to handle. Using these models together, with common parameters and as much data as possible at once, leads to more robust estimates and a fuller understanding of the epidemic.

Professor Alexei Drummond is Director of the Centre for Computational Biology at the University of Auckland. Dr David Welch is a Senior Lecturer in the School of Computer Science at the University of Auckland. Both are currently involved in the National Crisis Management Centre response to the New Zealand COVID-19 epidemic.