In Depth: Infectious Diseases

Genome analyses help track coronavirus' moves

Science  13 Mar 2020:
Vol. 367, Issue 6483, pp. 1176-1177
DOI: 10.1126/science.367.6483.1176

Italy's COVID-19 outbreak has led to empty tables in St. Mark's Square in Venice.


Immediately after Christian Drosten published a genetic sequence of the novel coronavirus online on 28 February, he issued a warning on Twitter. As the virus has raced around the world, more than 350 genome sequences have been shared on GISAID, an online platform. They offer clues to how the virus, named SARS-CoV-2, is spreading and evolving. But because the sequences represent a tiny fraction of cases and show few telltale differences, they are easy to overinterpret, as Drosten realized.

A virologist at the Charité University Hospital in Berlin, Drosten had sequenced the virus from a German patient infected in Italy. The genome looked similar to that of a virus found in a patient in Munich, the capital of Bavaria, more than 1 month earlier; both shared three mutations not seen in early sequences from China. Drosten realized the similarity could suggest the Italian outbreak was “seeded” by the one in Bavaria, which state public health officials said they had quashed by tracing and quarantining all contacts of the 14 confirmed cases. But he thought it was just as likely that a Chinese variant carrying the three mutations had taken independent routes to both countries. The newly sequenced genome “is not sufficient to claim a link between Munich and Italy,” Drosten tweeted.

His warning went unheeded. A few days later, Trevor Bedford of the Fred Hutchinson Cancer Research Center, who analyzes the stream of viral genomes, tweeted that the pattern “suggested” that the outbreak in Bavaria had not been contained after all, and had touched off the Italian outbreak. The analysis spread widely on Twitter and elsewhere—this Science correspondent retweeted the thread as well—and some Twitter users called on Germany to apologize.

Virologist Eeva Broberg of the European Centre for Disease Prevention and Control agrees with Drosten that there are more plausible scenarios for how the disease reached northern Italy than undetected spread from Bavaria. Other scientists agree. “I have to kick [Bedford's] butt a bit for this,” says Richard Neher, a computational biologist at the University of Basel who works with Bedford. “It's a cautionary tale,” says Andrew Rambaut, a molecular evolutionary biologist at the University of Edinburgh. “There is no way you can make that claim just from the phylogeny alone.” Bedford now acknowledges as much. “I think I should have been more careful with that Twitter thread.”

It was a case study in the power and pitfalls of real-time analysis of viral genomes. “This is an incredibly important disease. We need to understand how it is moving,” says Bette Korber, a biologist at the Los Alamos National Laboratory who is also studying the genome of SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2). But for now, scientists who analyze genomes can only make “suggestions,” she says.

The very first SARS-CoV-2 sequence, in early January, answered the most basic question about the disease: What pathogen is causing it? The genomes that followed were almost identical, suggesting the virus, which originated in an animal, had crossed into the human population just once. If it had jumped the species barrier multiple times, the first human cases would show more variety.

Some diversity is now emerging. Over the length of its 30,000-base-pair genome, SARS-CoV-2 accumulates an average of about one to two mutations per month, Rambaut says. Using these little changes, researchers draw up phylogenetic trees, much like family trees, make connections between cases, and gauge whether there might be undetected spread of the virus.
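As a rough illustration of the arithmetic behind such comparisons, the sketch below counts the positions at which two aligned sequences differ and estimates how many mutations a lineage would be expected to accumulate over a given time. The short sequences are made up for the example, the function names are hypothetical, and the default rate of 1.5 mutations per month is simply an assumed midpoint of the "one to two" range Rambaut cites. Real analyses work on full-length alignments with dedicated phylogenetics software.

```python
def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def expected_mutations(months: float, rate_per_month: float = 1.5) -> float:
    """Expected mutations along one lineage, assuming a constant rate.

    The default rate is an assumption: the midpoint of the one to two
    mutations per month quoted in the article.
    """
    return months * rate_per_month


if __name__ == "__main__":
    # Toy 8-base fragments, not real SARS-CoV-2 data.
    earlier = "ACGTACGT"
    later = "ACGTTCGA"
    print("observed differences:", hamming_distance(earlier, later))
    print("expected after 2 months:", expected_mutations(2))
```

With only a handful of differences expected across 30,000 bases in the first months, two genomes can look nearly identical whether or not one case directly infected the other, which is part of why the trees are easy to overinterpret.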

For example, the second virus genome sequenced in Washington—from a teenager diagnosed on 27 February—looked like a direct descendant of the first genome, from a case found 6 weeks earlier. Bedford tweeted that he considered it “highly unlikely” that the two genomes came from separate introductions, and said the virus must have been circulating undetected in Washington. Both patients came from Snohomish County, making the link far more persuasive than the one Bedford drew between Bavaria and Italy, Rambaut says: “It's very unlikely that this highly related virus would travel to exactly the same town in Washington.” By now the state has reported more than 160 cases, and genomes from additional patients have bolstered the link Bedford suspected.

Still, the wealth of genomes is just a tiny sample of the more than 100,000 cases worldwide, and it's uneven. On 9 March, Chinese scientists uploaded 50 new genome sequences—some of them partial—from COVID-19 patients in Guangdong province; most previous ones were from Hubei province. But overall, less than half of the published genomes are from China, which accounts for 80% of all COVID-19 cases. And sequences from around the world are still very similar, which makes drawing firm conclusions hard. “As the outbreak unfolds, we expect to see more and more diversity and more clearly distinct lineages,” Neher says. “And then it will become easier and easier to actually put things together.”

Scientists will also be scouring the genomic diversity for signs that the virus is getting more dangerous. There, too, caution is warranted. An analysis of 103 genomes published by Lu Jian of Peking University and colleagues on 3 March in the National Science Review argued they fell into one of two distinct types, named S and L, distinguished by two mutations. Because 70% of sequenced SARS-CoV-2 genomes belong to L, the newer type, the authors concluded that this type has evolved to become more aggressive and to spread faster.

“What they've done is basically seen these two branches and said, that one is bigger, [so that virus] must be more virulent or more transmissible,” Rambaut says. But other factors could be at play. “One of these lineages is going to be bigger than the other just by chance.” Some researchers have called for the paper to be retracted. “The claims made in it are clearly unfounded and risk spreading dangerous misinformation at a crucial time in the outbreak,” four scientists at the University of Glasgow wrote. In a response, Lu wrote that the four had misunderstood his study.
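Rambaut's point about chance can be illustrated with a toy simulation. The sketch below is an assumption-laden model, not the method used in the paper or by its critics: it starts with one case of each type, lets every new case inherit the type of a randomly chosen earlier case (so both types spread equally well), and stops at 103 cases to mirror the number of genomes analyzed. It then reports how often one type ends up with at least 70% of the total. All function names and parameters are hypothetical.

```python
import random


def simulate_neutral_split(total=103, rng=None):
    """Grow two equally transmissible lineages until `total` cases exist.

    Each new case copies the type of a uniformly chosen existing case,
    so neither type has any built-in advantage.
    """
    rng = rng or random.Random()
    counts = {"S": 1, "L": 1}
    while counts["S"] + counts["L"] < total:
        current = counts["S"] + counts["L"]
        parent = "L" if rng.random() < counts["L"] / current else "S"
        counts[parent] += 1
    return counts


def share_of_lopsided_runs(runs=2000, total=103, threshold=0.7, seed=0):
    """Fraction of simulated outbreaks in which one type reaches `threshold`."""
    rng = random.Random(seed)
    lopsided = 0
    for _ in range(runs):
        counts = simulate_neutral_split(total, rng)
        if max(counts.values()) / total >= threshold:
            lopsided += 1
    return lopsided / runs


if __name__ == "__main__":
    print("share of runs with a 70% or larger majority:",
          share_of_lopsided_runs())
```

In this simplified model a lopsided split turns up in a majority of runs even though the two types are identical in behavior, so a 70% share on its own says little about transmissibility. Sampling bias and founder effects in real outbreaks add further ways for one branch to dominate.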

Most genomic changes don't alter the behavior of the virus, Drosten says. The only way to confirm that a mutation has an effect is to study it in the lab and show, for instance, that it has become better at entering cells or transmitting, he says. So far, the world has been spared that piece of bad news.
