Metagenomic sequencing at the epicenter of the Nigeria 2018 Lassa fever outbreak

Science  04 Jan 2019:
Vol. 363, Issue 6422, pp. 74-77
DOI: 10.1126/science.aau9343

Mobile detection of Lassa virus

Lassa fever is a hemorrhagic viral disease endemic to West Africa. Usually, each year sees only a smattering of cases reported, but hospitalized patients risk a 15% chance of death. Responding to fears that a 10-fold surge in cases in Nigeria in 2018 signaled an incipient outbreak, Kafetzopoulou et al. performed metagenomic nanopore sequencing directly from samples from 120 patients (see the Perspective by Bhadelia). Results showed no strong evidence of a new strain emerging nor of person-to-person transmission; rather, rodent contamination was the main source. To prevent future escalation of this disease, we need to understand what triggers the irruption of rodents into human dwellings.

Science, this issue p. 74; see also p. 30


The 2018 Nigerian Lassa fever season saw the largest ever recorded upsurge of cases, raising concerns over the emergence of a strain with increased transmission rate. To understand the molecular epidemiology of this upsurge, we performed, for the first time at the epicenter of an unfolding outbreak, metagenomic nanopore sequencing directly from patient samples, an approach dictated by the highly variable genome of the target pathogen. Genomic data and phylogenetic reconstructions were communicated immediately to Nigerian authorities and the World Health Organization to inform the public health response. Real-time analysis of 36 genomes and subsequent confirmation using all 120 samples sequenced in the country of origin revealed extensive diversity and phylogenetic intermingling with strains from previous years, suggesting independent zoonotic transmission events and thus allaying concerns of an emergent strain or extensive human-to-human transmission.

Lassa fever is an acute viral hemorrhagic illness, first described in 1969 in the town of Lassa, Nigeria (1). It is contracted primarily through exposure to urine or feces of infected Mastomys spp. rodents or, less frequently, through the bodily fluids of infected humans. Lassa virus (LASV) is endemic in parts of West Africa, including Nigeria, Benin, Côte d’Ivoire, Mali, Sierra Leone, Guinea, and Liberia (2). The upsurge of Lassa fever cases during the 2018 endemic season in Nigeria—referred to here as the 2018 Lassa fever outbreak—has been the largest on record, reaching 1495 suspected cases and 376 confirmed cases and affecting more than 18 states by 18 March (fig. S1). This notably exceeds the 102 confirmed cases reported during the same period in 2017 (fig. S1) (3). The unprecedented scale of the outbreak raised fears of the emergence of a strain with a higher rate of transmission. Because of these concerns, on 28 February the Nigeria Centre for Disease Control (NCDC) and the World Health Organization (WHO) urgently requested sequencing information and preliminary results from our pilot-scale study, in which we used a metagenomic approach with the Oxford Nanopore MinION device (Oxford Nanopore Technologies) to conduct in-country, mid-outbreak viral genome sequencing. This instigated a major uptick in sequencing efforts, leading to the sequencing of 120 samples.

Nanopore sequencing is an emerging technology with great potential. The MinION is a small, robust sequencing device suited for the genetic analysis of pathogens in remote or resource-limited settings (4). Nanopore sequencing of polymerase chain reaction (PCR) amplicons of Ebola virus genomes provided important data from the field in real time during the 2014–2016 Ebola virus disease outbreak in West Africa (5), and a more sophisticated multiplex amplicon sequencing methodology (6) has been used effectively during recent Zika and yellow fever outbreaks in Brazil (7, 8). However, highly variable pathogens such as LASV present a substantial challenge for this type of amplicon-based approach. Owing to an interstrain nucleic acid sequence variation of up to 32 and 25% for the L (large segment encoding the RNA polymerase and the zinc-binding protein) and S (small segment encoding the glycoprotein and the nucleoprotein) segments, respectively (9), even PCR-based laboratory diagnosis poses a serious challenge. Designing targeted whole-genome sequencing approaches, such as those using PCR amplicons or bait-and-capture probes, without prior knowledge of the targeted LASV lineage is therefore cumbersome. Random reverse-transcription (RT) and amplification by sequence-independent single primer amplification (SISPA) for metagenomic sequencing to identify RNA viruses has been demonstrated to work on the MinION (10), and our previous work highlighted the feasibility of retrieving complete viral genomes directly from patient samples at clinically relevant viral titers using this approach for dengue and chikungunya viruses (11). We describe here the application of field metagenomic sequencing of LASV at the Irrua Specialist Teaching Hospital (ISTH), Edo State, during the 2018 Lassa fever season.

A total of 120 LASV-positive samples were sequenced during a 7-week mission; these were selected on the basis of cycle threshold value and location of the 341 cases reported by ISTH between 1 January and 18 March 2018 (figs. S1 and S2). The majority of samples originated from Edo State followed by Ondo and Ebonyi (fig. S2). Selected samples covered the wide range of clinical viral loads observed, including several samples testing negative in one of the two real-time RT-PCR assays used (fig. S3 and data S1). Up to six samples were run in multiplex per MinION flow cell, along with a negative control. To produce high-confidence consensus sequences for phylogenetic inference, we chose to map both basecalled reads and raw signal data to a reference sequence and call variants using Nanopolish software, as developed for the West African Ebola virus disease outbreak (5); basecalled reads were then remapped to the consensus and a further round of correction was applied (fig. S4). Owing to the diversity of LASV, selection of an individual reference genome for read alignment was required for each sample. To select the closest existing LASV reference genome, nonhuman reads from each sample were assembled de novo using Canu (12). A notable proportion of reads generated per sample were LASV at an average frequency of 4.26% with a maximum of 42.9%, allowing for sufficient genomic sequence (>70%) for phylogenetic comparison of at least one segment in 91 of the samples tested (figs. S3 to S6).
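The two summary figures used above, the share of reads assigned to LASV and the fraction of a segment recovered, can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the per-position depth list, and the `min_depth` cutoff are hypothetical, and only the >70% threshold comes from the text.

```python
from typing import Sequence


def read_percentage(target_reads: int, total_reads: int) -> float:
    """Percentage of all reads in a sample assigned to the target virus."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * target_reads / total_reads


def coverage_fraction(depths: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of reference positions covered by at least min_depth reads.

    `depths` holds one read-depth value per reference position.
    `min_depth` is an illustrative cutoff, not a value taken from the study.
    """
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def usable_for_phylogeny(depths: Sequence[int],
                         min_fraction: float = 0.70,
                         min_depth: int = 1) -> bool:
    """True when more than min_fraction of the segment is recovered.

    The 0.70 default mirrors the >70% threshold quoted in the text.
    """
    return coverage_fraction(depths, min_depth) > min_fraction


# Toy depth profile for a 10-position "segment" (hypothetical data).
toy_depths = [0, 5, 5, 5, 5, 5, 5, 5, 5, 0]
print(coverage_fraction(toy_depths))      # 8 of 10 positions covered
print(usable_for_phylogeny(toy_depths))   # above the 70% threshold
```

In practice the depth profile would come from the read alignment against the reference selected for each sample; the toy list here only shows the arithmetic.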

Additionally, sequences were validated by Illumina resequencing of 14 SISPA preparations, which matched with their Oxford Nanopore counterparts with little to no divergence, confirming the accuracy of the Oxford Nanopore approach (table S1).

Metagenomic classification using the Centrifuge software system (13) identified 0.10% of reads from sample 110 as originating from hepatitis A virus, providing 74% genome coverage at 20-fold depth. LASV accounted for 0.83% of reads in the same sample, providing 96% genome coverage. These findings demonstrate the potential of this simple approach to identify multiple RNA viruses, including those present as co-infections. In all other samples tested, LASV was the sole pathogen identified despite a small number of reads classified as other viruses (fig. S7 and data S1).

To dissect the molecular epidemiology of the 2018 Lassa fever outbreak in Nigeria, we performed phylogenetic analysis of all newly generated LASV sequences together with unpublished sequences from previous years (data S2) and sequences available in GenBank. We used this as a frame of reference to document how the genomic data generated in real time (made publicly available) provided valuable epidemiological insights into the unfolding outbreak dynamics.

Maximum likelihood phylogenetic reconstruction of the S segment sequences indicates that all 2018 viruses fall within the Nigerian LASV diversity, specifically within genotypes II and III, and they are phylogenetically interspersed with Nigerian LASV sequences from previous years (Fig. 1). This phylogenetic pattern is mimicked by the L segment reconstruction (fig. S8). Only seven viruses in the entire genome dataset (n = 348) were identified as clustering significantly differently in the L and S segments (supplementary methods), which is in line with the small number of potential LASV reassortments identified previously (9). The phylogenetic pattern implicates independent spillover from rodent hosts as the major driver of Lassa fever incidence during the outbreak (Fig. 1 and fig. S8).

Fig. 1 Phylogenetic reconstruction of the S segment data.

The circular tree includes 96 sequences from 2012 to 2017, 88 sequences from 2018, and sequences available from GenBank. The rectangular tree focuses on the genotype II clade (in blue in the circular tree), which includes most of the 2018 sequences. The six genotypes are indicated with different colors and roman numerals. Bootstrap support >90% is indicated with a small gray circle at the middle of their respective branches. The color strip highlights the human LASV sequences obtained from previous years (light gray); sequences obtained from rodent samples (dark gray); and, for 2018, the first seven sequences generated in Nigeria (light pink), the remaining 28 sequences analyzed on-site (medium pink), and the remaining sequences finalized in Europe (dark pink). The same color code is used in the genotype II rectangular tree. Bootstrap values >80% are shown for the major genotype II lineages.

However, a number of sequences from the 2018 outbreak clustered as pairs in the phylogenetic reconstructions, raising concerns over human-to-human transmission. We illustrate such cluster pairs in a Bayesian time-measured tree estimated from genotype II S (Fig. 2) and L segment sequences (fig. S9). These analyses resulted in highly similar evolutionary rate estimates for both segments (mean, ~1.2 × 10⁻³ substitutions per site per year) (Fig. 2 and figs. S9 and S10), in agreement with previous estimates (9). We used these rate estimates together with an estimate of the time between successive cases in a transmission chain to assess how many substitutions can be expected between directly linked infections. We compared conservative to more liberal expectations, the latter accommodating an independent upper estimate of potential sequencing errors (Fig. 2 and fig. S9). In the S segment, for example, more than two substitutions between sequences from directly linked infections is highly unlikely (P < 0.01 and P = 0.03, respectively, for the conservative and liberal probability estimates). This expectation is consistent with the low number of substitutions observed in the coding region of human-to-human LASV transmission (14). Four clusters of sequences showing ≤4 and ≤12 nucleotide differences in the S and L segments, respectively, were identified (035-045, 035-058, 137-138, and 053-089-106; for some of them, only the S or L segment sequence was available). Retrospective tracing revealed that the sequences for pairs 137-138 and 035-058 were derived from the same patients. Epidemiological investigation of the remaining clusters did not provide evidence for transmission chains, though direct linkage cannot be excluded. Even when applying liberal assumptions for the number of mutations during human-to-human transmission, the vast majority of cases during the 2018 outbreak resulted from spillover from the natural reservoir.
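The reasoning behind these expectations can be illustrated with a simple Poisson calculation. The sketch below is not the procedure described in the supplementary methods. It assumes that substitutions accumulate as a Poisson process, uses the mean rate quoted above, and plugs in two illustrative values that are not taken from the text: an approximate S segment length of 3400 nucleotides and a 21-day interval between linked cases.

```python
import math


def expected_substitutions(rate_per_site_per_year: float,
                           segment_length: int,
                           years: float) -> float:
    """Expected number of substitutions over a segment in a given time."""
    return rate_per_site_per_year * segment_length * years


def prob_more_than(k: int, lam: float) -> float:
    """P(X > k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i)
              for i in range(k + 1))
    return 1.0 - cdf


RATE = 1.2e-3                       # substitutions/site/year, from the text
S_SEGMENT_LENGTH = 3400             # assumption: approximate S segment length
GENERATION_TIME_YEARS = 21 / 365    # assumption: illustrative 21-day interval

lam = expected_substitutions(RATE, S_SEGMENT_LENGTH, GENERATION_TIME_YEARS)
p_gt2 = prob_more_than(2, lam)

print(f"expected substitutions: {lam:.3f}")
print(f"P(more than 2 substitutions): {p_gt2:.4f}")
```

Under these illustrative inputs the expected number of substitutions is well below one and the probability of more than two falls below 0.01. A longer assumed interval, or an added allowance for sequencing error, raises that probability, which is the direction of the more liberal estimate described above.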

Fig. 2 Assessing the potential for direct linkage between pairs of 2018 sequences in the S segment.

The maximum clade credibility tree summarizes a Bayesian evolutionary inference for the genotype II sequences in the S segment. A time scale and a marginal posterior distribution for the time to the most recent common ancestor are shown to the left. The size of the internal node circles reflects posterior probability support values. 2018 sequences clustering as pairs are indicated in dark pink; the number of substitutions between them is indicated at their respective tips. A posterior estimate of the evolutionary rate and probability distributions for observing a given number of substitutions during a human-to-human transmission event are shown as insets. The distribution represented by gray bars is based on the mean evolutionary rate estimate and a mean estimate for the generation time, whereas the light blue distribution is based on upper estimates and also incorporates an upper estimate for the MinION sequencing error (supplementary methods). At the bottom of the tree, clusters of sequences for which human-to-human transmission cannot be excluded according to the upper estimates of generation time are indicated. A pair of identical sequences (137-138) that was retrospectively found to be derived from the same patient is marked with a gray box. One pair (096-115) was disregarded as a potential transmission chain because of 21 differences in the L segment (fig. S9). The temporal signal before BEAST inference was explored in fig. S10.
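The pairwise comparison that underlies such clusters amounts to counting nucleotide differences between aligned sequences and keeping the pairs at or below a threshold (≤4 for the S segment and ≤12 for the L segment in the text). The following Python sketch illustrates this under stated assumptions: the sequences are toy data, the sample names are hypothetical, and ignoring gaps and ambiguous bases is a choice made here rather than one documented in the source.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count differing positions between two aligned sequences.

    Positions where either sequence has a gap ('-') or an ambiguous
    base ('N') are skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ignore = {"-", "N"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a not in ignore and b not in ignore and a != b
    )


def candidate_pairs(aligned: Dict[str, str],
                    max_diff: int) -> List[Tuple[str, str, int]]:
    """Return all sample pairs with at most max_diff differences."""
    pairs = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(sorted(aligned.items()), 2):
        n_diff = count_differences(seq_a, seq_b)
        if n_diff <= max_diff:
            pairs.append((name_a, name_b, n_diff))
    return pairs


# Toy alignment with hypothetical sample names (not data from the study).
toy_alignment = {
    "sample_a": "ACGTACGTAC",
    "sample_b": "ACGTACGTAT",
    "sample_c": "TTTTTTTTTT",
}
S_SEGMENT_MAX_DIFF = 4  # threshold quoted in the text for the S segment
print(candidate_pairs(toy_alignment, S_SEGMENT_MAX_DIFF))
```

A pair flagged this way is only a candidate for direct linkage. As the text notes, epidemiological tracing is still needed, and a pair can be set aside when the other segment shows too many differences.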

A request for information on circulating strains was made on 28 February at the height of the outbreak; within 10 days, our pilot study was expedited and the initial analysis completed. The fact that the 2018 outbreak was fueled by the circulating LASV diversity and not by transmission of a new or divergent lineage was already evident from the first seven genomes generated by 10 March (fig. S1). This information was promptly communicated to the NCDC, forming the basis of its report released on 12 March 2018 (15). Whereas this small sample was restricted to genotype II, the final collection of 36 LASV genome sequences generated on-site also included a representative of genotype III (Fig. 1 and fig. S9), further supporting the spillover of long-standing LASV diversity in the outbreak. The conclusions drawn from the first set of genome sequences immediately eased fears of extensive human-to-human transmission and allowed public health resources to be allocated appropriately. The response was focused on intensified community engagement on rodent control, environmental sanitation, and safe food storage. Further research is needed to evaluate whether improved diagnostics and disease awareness and/or ecological and climate factors promoting transmission are the drivers behind the changing epidemiology of Lassa fever in Nigeria.

Portable metagenomic sequencing of genetically diverse RNA viruses on the MinION, direct from patient samples without the need to export material outside of the country of origin and with no pathogen-specific enrichment, is shown to be a feasible methodology enabling a real-time characterization of potential outbreaks in the field.

Supplementary Materials

Materials and Methods

Figs. S1 to S10

Table S1

References (17–30)

Data S1 and S2

References and Notes

Acknowledgments: We thank the health authorities of Nigeria for their cooperation during the outbreak response. Funding: L.E.K., S.T.P., R.H., R.V., M.W.C., and J.A.H. acknowledge funding by the National Institute for Health Research Health Protection Research Unit (NIHR HPRU) in Emerging and Zoonotic Infections at the University of Liverpool in partnership with Public Health England (PHE), in collaboration with Liverpool School of Tropical Medicine. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, the Department of Health, or Public Health England. L.E.K. has received travel expenses and accommodation from Oxford Nanopore to speak at conferences regarding this work. L.E.K. has received some reagents free of charge from Oxford Nanopore in support of her Ph.D. project. M.W.C. has received reagents free of charge from Oxford Nanopore in support of previous projects not related to the work presented in this manuscript. L.E.K. and M.W.C. have not received other financial compensation nor hold shares. P.L. and M.A.S. acknowledge funding from the European Research Council under the European Union’s Horizon 2020 research and innovation program (grant 725422-ReservoirDOCS) and from the Wellcome Trust Collaborative Award, 206298/Z/17/Z. P.L. acknowledges support by the Special Research Fund, KU Leuven (“Bijzonder Onderzoeksfonds,” KU Leuven, OT/14/115), and the Research Foundation–Flanders (“Fonds voor Wetenschappelijk Onderzoek – Vlaanderen,” G066215N, G0D5117N, and G0B9317N). M.A.S. acknowledges support under National Science Foundation grant DMS 1264153. 
This study was supported by the German Federal Ministry of Health through support of the WHO Collaborating Centre for Arboviruses and Hemorrhagic Fever Viruses at the Bernhard Nocht Institute for Tropical Medicine (agreements ZMV I 1-2517WHO005 and ZMV I 1-2517WHO010) and through the Global Health Protection Program (agreement ZMVI1-2517-GHP-704), the German Federal Ministry for Economic Cooperation and Development through the Rapid Deployment Expert Group to Combat Threats (SEEG), the European Union’s Horizon 2020 research and innovation program to S.G. (grant 653316-EVAg), and the German Research Foundation (DFG) to S.G. and D.U.E. (GU 883/4-1). D.U.E. acknowledges fellowships from Alexander von Humboldt Foundation and Kirmser Foundation. The funders had no role in the design and interpretation of the data and preparation of the manuscript. Author contributions: L.E.K., S.G., S.D., S.T.P., and P.L. conceptualized the study; L.E.K., S.T.P., and P.L. set up the methodology; L.E.K., J.H., A.T., S.D., and D.U.E. performed sequencing and data validation; L.E.K., P.L., M.A.S., S.T.P., D.S., F.K., J.M., and S.Lo. performed the formal sequencing data analysis; L.E.K., S.D., J.H., A.T., M.P., and L.O. performed sample selection, data collection, and organization of sequencing datasets; D.M.W., K.E., D.S., F.K., and J.M. set up and assisted with the bioinformatics pipeline; M.A.S., D.U.O., M.P., L.O., Y.I., D.I.A., T.O., E.O., R.O., J.Ag., B.E., J.Ai., P.E., B.O., S.E., P.A., M.A., R.Es., E.M., R.G., A.E., G.I., G.Od., G.Ok., R.En., J.O., E.O.Y., I.O., C.A., M.O., R.A., E.T., D.A., N.A., P.O.O., M.O.R., K.O.I., C.O.I., P.A., C.E., G.A., and E.I. performed diagnostic analysis; L.E.K., S.T.P., P.L., and S.D. visualized data presentation; L.E.K., S.T.P., P.L., and S.D. wrote the manuscript; all authors reviewed and edited the manuscript; S.G., M.W.C., J.A.H., R.H., and R.V. supervised the study; M.P., R.V., A.T., C.I., P.F., D.N., S.O., E.O.E., S.G., S.D., and S.Lu. performed project administration and implementation; S.G., P.L., M.W.C., R.V., R.H., J.A.H., L.E.K., and D.U.E. were involved in funding acquisition. Competing interests: C.I. is a member of the WHO Strategic Technical Advisory Group on Infectious Diseases; D.A. serves as an expert for the WHO R&D Blueprint for action to prevent epidemics (the Blueprint); S.G. is a member of the Scientific Advisory Group (SAG) to advise WHO on the implementation of the Blueprint, including a plan for international coordination of the R&D effort in the event of a highly infectious pathogen epidemic; S.O. serves as an expert for the Blueprint. All other authors declare no competing interests. Data and materials availability: LASV sequences from 2018 are deposited in GenBank under BioProject PRJNA482058 (data S1); sequences from 2012 to 2017 are deposited under BioProjects PRJNA482054 and PRJNA482058 (data S2). Alignments, trees, and BEAST xml files are available in (16).