By Grant Jacobs 17/08/2020

New Zealand has an advantage in managing COVID-19: all our cases come from the border, one way or other.[1] We can exploit this in how we shut down outbreaks. One way is to have at the ready the viral genomes of all border cases. With this we can quickly find the starting point of any outbreaks inside the country.

If the genome sequence of a case within the country very closely matches a case at the border, that’s likely where it started. Because our cases come from the border, viral genome sequences can be a shortcut to find the index case, a key to finding all people in an outbreak.

To be in with a chance of finding a match this way, we need to collect as many viral genomes as we can from positive cases at our borders. Joep de Ligt, bioinformatics and genomics lead scientist at ESR, wrote,

We’ll have a new breakdown after today’s sequencing run finishes. But everything we have received we have attempted to sequence. As mentioned above these cases typically have low viral loads meaning low chances of getting a good quality genome.

“Everything we have received”: the sequencing teams (not just at ESR) are putting in a huge effort sequencing at short notice,[2] but don’t have samples for all cases at the border.

New Zealand has received a lot of praise for how well we’ve done.[3] ‘Winners’ of the COVID-19 game will not (just) be nations with periods of no cases, but those nations that find ways to quickly put down outbreaks that follow.

Part of any game is spotting weaknesses at half-time, and adapting your play. The viral genomes of all border cases can be part of that for us.

Missing data

Joep implied they’re not getting samples for all cases at the border.

Perhaps some cases were simply missed. Other samples may have too little RNA to get a reliable genome sequence. Perhaps these people need to be re-tested (this might not be being done at the moment). The low level of RNA in a sample might be the sampling process not collecting enough RNA. It might also be well sampled, but from a person with such a small viral load they’re unlikely to be infectious – in that case they’re unlikely to be a start of an outbreak. In a few cases we may simply luck-out. It happens. Also, for a few cases a test may not be considered appropriate.

Whatever the issues are, we ought to make our best effort to get the genome sequence of as many of these border cases as we can. The better the coverage, the more likely we can match cases in the country back to their border ‘index’ cases.

Finding the index case

The index case is the case that started an outbreak. They are a critical case in tracing outbreaks.

Until you find the index case, you’re trying to work both ‘backwards’ (who infected this person) and ‘forwards’ (who did that person infect). Once you have the index case, you only work forwards to find all the people infected.

Matching the index case works because viral genomes change over time.

Viral genomes accumulate small changes when they replicate. These variations are called mutations. We owe the nasty image of mutations to science fiction. In reality most don’t change how the virus functions, many make the virus a dud (in which case we won’t know about it as that line will die out); only a few changes make meaningful differences, and most of those differences are subtle.

Because the viruses in each person slightly differ we can track the infections. A person will have gotten their infection from a person with a virus very similar to the one in their body, not a person with a very different variant.

If NZ has an outbreak, we should be able to find which person at the border started the outbreak by comparing the virus genome sequence of those infected in the country against the virus genome sequence of everyone who tested positive at the border. The closest match is our likely index case.
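The matching idea can be sketched in a few lines of code. This is only a toy illustration, assuming the genomes have already been aligned to the same length; the case names and the 12-letter "genomes" are made up, and real analyses use full-length genomes and specialist tools rather than a bare count of differences.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences differ.

    Positions marked 'N' (base unknown, e.g. from a low-quality read)
    are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a != "N" and b != "N"
    )


def rank_border_cases(community_seq: str, border_seqs: dict) -> list:
    """Return (differences, case_id) pairs, closest match first."""
    scored = [
        (count_differences(community_seq, seq), case_id)
        for case_id, seq in border_seqs.items()
    ]
    return sorted(scored)


# Hypothetical border cases with made-up toy sequences.
border = {
    "border_A": "ACGTACGTACGT",
    "border_B": "ACGTACGAACGT",
    "border_C": "TCGTACGAACGA",
}
community_case = "ACGTACGAACGT"  # hypothetical case found inside the country

ranking = rank_border_cases(community_case, border)
likely_index_case = ranking[0][1]  # the border case with the fewest differences
```

A close match is a lead for the contact tracers to follow up, not proof on its own. If the true index case was never sequenced, the top of the ranking will simply be the least-bad match.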

To do a proper job we really need the virus genome sequence of every person who tested positive at the border. If we don’t have the genome sequence of an index case we later need, we’ll have a harder job of finding everyone who is part of an outbreak from that starting point.

So far the genomes of the current outbreak haven’t been matched to anyone at the border.

Tracking clusters

We can learn other things from virus genome sequences.

Genome sequences can also tell us if there is more than one cluster in an outbreak. Each cluster will have similar virus genome sequences. Unrelated clusters will have different variants of the virus genome. (Related clusters are trickier!)
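As a rough sketch of how cases might be grouped by genome similarity: link any two cases whose genomes differ at no more than a few positions, then treat each connected group as one cluster. The threshold, the case names and the toy sequences below are all assumptions for illustration, not how any particular lab does it.

```python
from itertools import combinations


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned, equal-length sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


def group_into_clusters(genomes: dict, max_differences: int = 2) -> list:
    """Group cases whose genomes are linked by a chain of close matches.

    Two cases are linked if their genomes differ at no more than
    `max_differences` positions (an illustrative cut-off).
    """
    parent = {case_id: case_id for case_id in genomes}

    def find(case_id):
        # Follow links up to the representative of this case's group.
        while parent[case_id] != case_id:
            parent[case_id] = parent[parent[case_id]]
            case_id = parent[case_id]
        return case_id

    for a, b in combinations(genomes, 2):
        if count_differences(genomes[a], genomes[b]) <= max_differences:
            parent[find(a)] = find(b)  # merge the two groups

    clusters = {}
    for case_id in genomes:
        clusters.setdefault(find(case_id), set()).add(case_id)
    return sorted(clusters.values(), key=sorted)


# Made-up toy sequences for five hypothetical cases.
cases = {
    "case_1": "ACGTACGTACGT",
    "case_2": "ACGTACGTACGA",
    "case_3": "ACGTACGTACCA",
    "case_4": "TTTTACGGGCGT",
    "case_5": "TTTTACGGGCGA",
}

clusters = group_into_clusters(cases)
```

Here cases 1 to 3 fall into one cluster and cases 4 and 5 into another. Related clusters, where the genomes differ by only a change or two, are exactly where a fixed cut-off like this starts to struggle.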

As I write, there are reports that the first four genomes are all of the same type, B.1.1.1. These four cases are part of a close contact-traced cluster (a family), so this is what we expect. As more data comes in we’ll be able to tell if the new cases are likely part of the same cluster, not just from contact tracing, but also from the genomes of the virus.

Placing cases on the tree

Another use we can make of virus genome sequences is to place an infection on the very large ‘family tree’ of virus infections around the world. This isn’t matching one person to one person, but working out where a person fits in the big pattern of virus transmission.

If we look at a large family tree of virus sequences over time, subgroups of subgroups of viruses can be seen as they are passed on from one person to another.

These can also tell us a bit about how the virus is moving around the world, and how a local line of infection fits into that. It’s clearer in the early stage of the pandemic. We could track the virus making its way to ski fields in central Europe, then up to Scandinavia, for example.

The B.1.1.1 line in the NZ family is mainly seen in the UK, but it has been found in many countries: South Africa, Switzerland, Australia, Spain, and so on.

Weeks (or months) later…

Unfortunately, it’s not always easy to identify what country an infection came from using the virus genome sequence.

SARS-CoV-2, the virus causing COVID-19, has been pinging across the globe, taken by travellers to every country you can think of. It’s everywhere.

For virus groups that have been around for a while, that line of infection can come from many different places.

It gets about. The viral genome from a zoo tiger in the Bronx was also found in a 42-year-old Australian, and a 21-year-old Taiwanese.

For some varieties we have a patchy record of their genomes around the world. Some just aren’t sampled as well as others.

Just matching genomes might be confusing. You need to add where that variety has been seen recently, and how it might travel to NZ.

We can say it’s unlikely to have come from our lockdown cases.

Not from our lockdown

The cases in New Zealand at the moment have viral genome sequences different to any of the genome sequences seen in cases from our lockdown – for those that we have genome sequences of.

If we had sequenced the genome of every person who tested positive within the country, we’d know for sure if it came from within the country and not the border, or vice versa. But we’re getting a good heads-up that it’s unlikely.

It’s unlikely a chain of infection would last ~100 days since the last documented case without being noticed. Very unlikely. But it’d be nice to formally rule it out. We likely[4] can’t quite do this, as there are infected people that we don’t have virus genomes for. Still, the fact that the new cases don’t match the ones we know of adds weight to the idea that it didn’t come from inside the country, that it hasn’t been hanging around all that time. It’s more likely that we’re looking at a more recent introduction from our borders.

Finding chains

In principle, viral genome sequences could also be used to track chains of infection within a country. Here we’d be trying to build up the same chain of infections that contact tracing would, but using genome sequences.

For some viruses this would be workable, but for SARS-CoV-2,[5] the virus that causes COVID-19, we can’t do this using genome sequences alone.

The rate at which changes accumulate varies from one virus to another. SARS-CoV-2 doesn’t accumulate changes as quickly as other viruses: changes only happen once every several transmissions. As we track a line of infection for several people in a row the virus genome will look much the same. That makes it hard to track infections of this virus in detail using just the genome sequences.[6]
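A little arithmetic shows why slow change makes chains hard to resolve. The sketch below assumes a very simple model in which each transmission independently has the same chance of adding a change to the genome. The 0.25 figure is purely illustrative, standing in for "once every several transmissions"; it is not a measured rate.

```python
def chance_genomes_identical(transmissions: int, change_chance: float) -> float:
    """Chance that no change at all appears over a run of transmissions.

    Assumes each transmission independently adds a change with
    probability `change_chance` (a simplification for illustration).
    """
    return (1.0 - change_chance) ** transmissions


def expected_changes(transmissions: int, change_chance: float) -> float:
    """Average number of transmissions in the run that add a change."""
    return transmissions * change_chance


ASSUMED_CHANGE_CHANCE = 0.25  # illustrative only, not a measured value

for links in (1, 3, 6):
    identical = chance_genomes_identical(links, ASSUMED_CHANGE_CHANCE)
    changes = expected_changes(links, ASSUMED_CHANGE_CHANCE)
    print(f"{links} transmissions: {identical:.0%} chance of identical genomes, "
          f"{changes:.2f} changes expected")
```

Under these assumptions, people several steps apart in a chain will often carry identical genomes, so the sequences alone cannot say who infected whom. Contact tracing has to fill in that detail.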

Genome sequences can also help vaccine and drug design, by letting us know what parts of the virus often have variants, what the variants are, and what parts of the virus typically stay the same in all copies of the virus. There are yet more uses, like trying to trace back the origins of the virus.[7]

Borders are not perfect

This is all about trying to keep the virus out of NZ until we have a good vaccine. A lot of the communication says things work or they don’t.[6] In practice, border measures are a matter of probabilities.

Most people staying in isolation for 14 days are not going to bring an infection into the country, but border measures are not ‘100%’.

Our border measures are good, but they’ll always be imperfect: winning the COVID-19 game is about strong borders and efficiently squashing outbreaks.

Part of the game is using layered approaches. Epidemiologists talk of the ‘Swiss cheese model’. Each layer of measures has holes, but layered up appropriately the holes are covered. (Think of stacking slices of holey cheese.)
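The layering idea can be put into numbers with a small sketch. It assumes each layer misses an infection independently of the others, which real layers will not quite do, and the layer names and miss chances are invented for illustration.

```python
import math


def chance_of_slipping_through(layer_miss_chances: dict) -> float:
    """Chance an infection gets past every layer.

    Assumes the layers fail independently, so the individual
    miss chances simply multiply together.
    """
    return math.prod(layer_miss_chances.values())


# Hypothetical layers with made-up chances of missing an infection.
layers = {
    "layer_1": 0.2,
    "layer_2": 0.1,
    "layer_3": 0.05,
}

overall = chance_of_slipping_through(layers)
print(f"Chance of getting past all layers: {overall:.3%}")
```

No single layer here is close to perfect, yet stacked together they let through far less than the best layer does alone. The result is still not zero, which is why being able to squash outbreaks quickly matters too.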

The government has acted quickly (a good thing), but it’d have been great if we’d had a genome sequence for every positive border case at the ready. We’d stand a better chance of finding that index case by matching against what’s on file.

(Keep in mind it’s not trivial. As I wrote earlier, in some cases samples won’t work, people may need retesting, and so on.)

Also in the wider mix of things is testing border workers for infection, at the sea ports as well as the airports.

I’ve no doubt more effort will be put in these directions. That effort should include viral genome sequences for all positive cases at the border.

About the author

I’m a scientist (a computational biologist) and a science writer. I’ve been tracking this outbreak from early on by fortuitous accident. From early last year I have been researching zoonoses for writing projects. Part of that was looking out for examples of outbreaks. In early December 2019, a ‘pneumonia of unknown cause’ from China drew my attention, and I’ve followed the science and specialists’ discussions since. Consider this an opinion piece, but one founded on research science and specialists’ thoughts. You can follow me on Twitter.

Other articles in Code for life

(A selection of things unrelated to COVID-19 for readers wanting a change of pace!)

Not cow farts (Or, burps away!)

Temperature-induced hearing loss (A fascinating intersection of genetics and disability; for these people their hearing varies depending on the temperature!)

The sheep-leaf nudibranch (A very cute sea creature.)

Monkey business, or is my uncle also my Dad? (For male pygmy marmosets, their genetic father could be their uncle.)

Political parties and GMOs: we all need to move on (We have an election coming up…)

1000 of these now (Nearly endless reading here! – scroll down to the list about half-way down.)


I’ve cut out a lot of complexity and specifics to focus on the principal ideas. Any subject experts reading this ought to know my explanation isn’t for you!

My record here draws on many conversations online. Thanks to those in the team who shared tidbits of information. Apologies for not naming you all here for the rather pathetic reason that this has been a particularly slow effort without documenting it further.

1. I’m putting aside the idea of infection from imported goods. If you can’t find the index case, and we haven’t yet, you want to consider other alternatives if there are any. It’s much less likely than person-to-person spread, though. (And besides I want to keep this short!)

2. They’re using a fast sequencing technique developed in NZ during our lockdown.

3. There are other countries that have done very well, arguably better than us. Vietnam and Sarawak are two places I’d love to cover if I find time and headspace to.

4. I have to speculate here; I don’t have data to say firmly one way or other. There are certainly positive cases we have no genome sequence for, but it’s also possible that we’ve covered all the clusters with at least one person sequenced, and all the non-cluster cases.

5. Paradoxically (to me) the disease is named for the virus family, and the virus partly named for a disease. The disease (COVID-19) is ‘coronavirus disease 2019’, named for the coronavirus family that the virus (SARS-CoV-2) belongs to; the virus is named for SARS, severe acute respiratory syndrome.

6. I’d like to expand on this in another post.

7. There are several reasons for this. It’s a topic in itself. There’s unfortunately a lot of nonsense about mutations online and in media.


©Grant H. Jacobs, August 2020-

This is one of several pieces I’ve wanted to write for some time (some for weeks, others longer). This topic has been overrun by recent media accounts, but I’ve stuck mostly to my original score, as it were. (The main difference is Joep’s quote, and setting a lot of detail aside.) If I’m up to it, I’ll try get some of the other material out over the next wee while. Encouragement and support is always welcome!

Featured image

Screenshot (cropped) of the analysis of SARS-CoV-2 genome data at Nextstrain at the time of writing.

One Response to “COVID-19: sequence the viral genomes of all border cases”