Spatio-Temporal Analysis of Surveillance Data

Jon Wakefield, Tracy Qi Dong, Vladimir N. Minin

arXiv

November 1, 2017

ABSTRACT

In this chapter, we consider space-time analysis of surveillance count data. Such data are ubiquitous and a number of approaches have been proposed for their analysis. We first describe the aims of a surveillance endeavor, before reviewing and critiquing a number of common models. We focus on models in which time is discretized to the time scale of the latent and infectious periods of the disease under study. In particular, we focus on the time series SIR (TSIR) models originally described by Finkenstadt and Grenfell in their 2000 paper and the epidemic/endemic models first proposed by Held, Hohle, and Hofmann in their 2005 paper. We implement both of these models in the Stan software and illustrate their performance via analyses of measles data collected over a 2-year period in 17 regions in the Weser-Ems region of Lower Saxony, Germany.
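As a rough illustration of what "time discretized to the time scale of the latent and infectious periods" can look like, the following is a minimal sketch of a discrete-time susceptible-infectious update in the spirit of a TSIR model. It is not the chapter's Stan implementation. The function name, the parameter values, the binomial draw for new infections, and the use of the initial population as the mixing denominator are all assumptions made for illustration.

```python
import math
import random


def simulate_discrete_sir(s0, i0, beta, alpha, births, n_steps, seed=0):
    """Simulate susceptible and infectious counts, one disease generation per step."""
    rng = random.Random(seed)
    n_pop = s0 + i0  # assumption: initial population size is the mixing denominator
    s, i = s0, i0
    path = [(s, i)]
    for _ in range(n_steps):
        # Per-susceptible probability of infection over one generation.
        p_inf = 1.0 - math.exp(-beta * (i ** alpha) / n_pop)
        # Binomial draw as a sum of Bernoulli trials, so new cases never exceed s.
        new_i = sum(rng.random() < p_inf for _ in range(s))
        s = s - new_i + births
        i = new_i  # assumption: each generation is removed after one time step
        path.append((s, i))
    return path


# Hypothetical parameter values, chosen only to make the example run.
path = simulate_discrete_sir(s0=500, i0=5, beta=2.0, alpha=0.97,
                             births=2, n_steps=20, seed=1)
```

Each entry of `path` is a `(susceptible, infectious)` pair, so consecutive entries satisfy the susceptible balance: next susceptibles equal current susceptibles minus new infections plus births.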

Evolution-informed forecasting of seasonal influenza A (H3N2)

Xiangjun Du, Aaron A. King, Robert J. Woods, Mercedes Pascual

Science Translational Medicine

October 25, 2017

ABSTRACT

Interpandemic or seasonal influenza A, currently subtypes H3N2 and H1N1, exacts an enormous annual burden both in terms of human health and economic impact. Incidence prediction ahead of season remains a challenge largely because of the virus’ antigenic evolution. We propose a forecasting approach that incorporates evolutionary change into a mechanistic epidemiological model. The proposed models are simple enough that their parameters can be estimated from retrospective surveillance data. These models link amino acid sequences of hemagglutinin epitopes with a transmission model for seasonal H3N2 influenza, also informed by H1N1 levels. With a monthly time series of H3N2 incidence in the United States for more than 10 years, we demonstrate the feasibility of skillful prediction for total cases ahead of season, with a tendency to underpredict monthly peak epidemic size, and an accurate real-time forecast for the 2016/2017 influenza season.

Resource-driven encounters among consumers and implications for the spread of infectious disease

Rebecca K. Borchering, Steve E. Bellan, Jason M. Flynn, Juliet R. C. Pulliam, Scott A. McKinley

Royal Society Interface

October 11, 2017

ABSTRACT

Animals share a variety of common resources, which can be a major driver of conspecific encounter rates. In this work, we implement a spatially explicit mathematical model for resource visitation behaviour in order to examine how changes in resource availability can influence the rate of encounters among consumers. Using simulations and asymptotic analysis, we demonstrate that, under a reasonable set of assumptions, the relationship between resource availability and consumer conspecific encounters is not monotonic. We characterize how the maximum encounter rate and associated critical resource density depend on system parameters like consumer density and the maximum distance from which consumers can detect and respond to resources. The assumptions underlying our theoretical model and analysis are motivated by observations of large aggregations of black-backed jackals at carcasses generated by seasonal outbreaks of anthrax among herbivores in Etosha National Park, Namibia. As non-obligate scavengers, black-backed jackals use carcasses as a supplemental food resource when they are available. While jackals do not appear to acquire disease from ingesting anthrax carcasses, changes in their movement patterns in response to changes in carcass abundance do alter jackals' conspecific encounter rate in ways that may affect the transmission dynamics of other diseases, such as rabies. Our theoretical results provide a method to quantify and analyse the hypothesis that the outbreak of a fatal disease among herbivores can potentially facilitate outbreaks of an entirely different disease among jackals. By analysing carcass visitation data, we find support for our model's prediction that the number of conspecific encounters at resource sites decreases with additional increases in resource availability. Whether or not this site-dependent effect translates to an overall decrease in encounters depends, unexpectedly, on the relationship between the maximum distance of detection and the resource density.

Efficient Data Augmentation for Fitting Stochastic Epidemic Models to Prevalence Data

Jonathan Fintzi, Xiang Cui, Jon Wakefield, Vladimir Minin

Journal of Computational and Graphical Statistics

October 9, 2017

ABSTRACT

Stochastic epidemic models describe the dynamics of an epidemic as a disease spreads through a population. Typically, only a fraction of cases are observed at a set of discrete times. The absence of complete information about the time evolution of an epidemic gives rise to a complicated latent variable problem in which the state space size of the epidemic grows large as the population size increases. This makes analytically integrating over the missing data infeasible for populations of even moderate size. We present a data augmentation Markov chain Monte Carlo (MCMC) framework for Bayesian estimation of stochastic epidemic model parameters, in which measurements are augmented with subject-level disease histories. In our MCMC algorithm, we propose each new subject-level path, conditional on the data, using a time-inhomogeneous continuous-time Markov process with rates determined by the infection histories of other individuals. The method is general, and may be applied to a broad class of epidemic models with only minimal modifications to the model dynamics and/or emission distribution. We present our algorithm in the context of multiple stochastic epidemic models in which the data are binomially sampled prevalence counts, and apply our method to data from an outbreak of influenza in a British boarding school.
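To make the data setting concrete, the sketch below simulates a stochastic SIR epidemic with the standard Gillespie algorithm and then records binomially sampled prevalence counts at discrete observation times. It illustrates only the kind of data described above, not the authors' data augmentation MCMC algorithm. All function names, the detection probability `rho`, and the parameter values are hypothetical.

```python
import random


def gillespie_sir(s0, i0, beta, mu, t_end, rng):
    """Exact simulation of a stochastic SIR model; returns event times and infected counts."""
    t, s, i = 0.0, s0, i0
    times, infected = [0.0], [i0]
    while i > 0:
        rate_inf = beta * s * i  # infection events
        rate_rec = mu * i        # recovery events
        total = rate_inf + rate_rec
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        if rng.random() < rate_inf / total:
            s, i = s - 1, i + 1
        else:
            i -= 1
        times.append(t)
        infected.append(i)
    return times, infected


def prevalence_at(times, infected, t_obs):
    """Number infected at t_obs: the state after the last event at or before t_obs."""
    current = infected[0]
    for t, i in zip(times, infected):
        if t > t_obs:
            break
        current = i
    return current


def observe(times, infected, obs_times, rho, rng):
    """Binomial sampling: each infected individual is detected with probability rho."""
    counts = []
    for t_obs in obs_times:
        prev = prevalence_at(times, infected, t_obs)
        counts.append(sum(rng.random() < rho for _ in range(prev)))
    return counts


# Hypothetical values, chosen only to make the example run.
rng = random.Random(42)
times, infected = gillespie_sir(s0=200, i0=3, beta=0.01, mu=0.5, t_end=14.0, rng=rng)
obs_times = [float(day) for day in range(1, 15)]
counts = observe(times, infected, obs_times, rho=0.8, rng=rng)
```

Only `counts` would be available to an analyst. The full event history in `times` and `infected` is the missing data that makes inference difficult.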

A Surrogate Function for One-Dimensional Phylogenetic Likelihoods

Brian C Claywell, Vu Dinh, Mathieu Fourment, Connor O McCoy, Frederick A Matsen IV

Molecular Biology and Evolution

September 26, 2017

ABSTRACT

Phylogenetics has seen a steady increase in data set size and substitution model complexity, which require increasing amounts of computational power to compute likelihoods. This motivates strategies to approximate the likelihood functions for branch length optimization and Bayesian sampling. In this article, we develop an approximation to the 1D likelihood function as parametrized by a single branch length. Our method uses a four-parameter surrogate function abstracted from the simplest phylogenetic likelihood function, the binary symmetric model. We show that it offers a surrogate that can be fit over a variety of branch lengths, that it is applicable to a wide variety of models and trees, and that it can be used effectively as a proposal mechanism for Bayesian sampling. The method is implemented as a stand-alone open-source C library for calling from phylogenetics algorithms; it has proven essential for good performance of our online phylogenetic algorithm sts.

The RAPIDD Ebola forecasting challenge: Model description and synthetic data generation

Marco Ajelli, Qian Zhang, Kaiyuan Sun, Stefano Merler, Laura Fumanelli, Gerardo Chowell, Lone Simonsen, Cecile Viboud, Alessandro Vespignani

Epidemics

September 20, 2017

ABSTRACT

The Ebola forecasting challenge organized by the Research and Policy for Infectious Disease Dynamics (RAPIDD) program of the Fogarty International Center relies on synthetic disease datasets generated by numerical simulations of a highly detailed spatially-structured agent-based model. We discuss here the architecture and technical steps of the challenge, leading to datasets that mimic as much as possible the data collection, reporting, and communication process experienced in the 2014–2015 West African Ebola outbreak. We provide a detailed discussion of the model's definition, the epidemiological scenarios’ construction, synthetic patient database generation and the data communication platform used during the challenge. Finally we offer a number of considerations and takeaways concerning the extension and scalability of synthetic challenges to other infectious diseases.

Preliminary results of models to predict areas in the Americas with increased likelihood of Zika virus transmission in 2017.

Jason Asher, Christopher Barker, Grace Chen, Derek Cummings, Matteo Chinazzi, Shelby Daniel-Wayman, Marc Fischer, Neil Ferguson, Dean Follman, M. Elizabeth Halloran, Michael Johansson, Kiersten Kugeler, Jennifer Kwan, Justin Lessler, Ira M. Longini, Stefano Merler, Andrew Monaghan, Ana Pastore y Piontti, Alex Perkins, D. Rebecca Prevots, Robert Reiner, Luca Rossi, Isabel Rodriguez-Barraquer, Amir S. Siraj, Kaiyuan Sun, Alessandro Vespignani, Qian Zhang

bioRxiv

September 18, 2017

ABSTRACT

Numerous Zika virus vaccines are being developed. However, identifying sites to evaluate the efficacy of a Zika virus vaccine is challenging due to the general decrease in Zika virus activity. We compare results from three different modeling approaches to estimate areas that may have increased relative risk of Zika virus transmission during 2017. The analysis focused on eight priority countries (i.e., Brazil, Colombia, Costa Rica, Dominican Republic, Ecuador, Mexico, Panama, and Peru). The models projected low incidence rates during 2017 for all locations in the priority countries but identified several sub-national areas that may have increased relative risk of Zika virus transmission in 2017. Given the projected low incidence of disease, the total number of participants, number of study sites, or duration of study follow-up may need to be increased to meet the efficacy study endpoints.

The RAPIDD ebola forecasting challenge: Synthesis and lessons learnt

Cécile Viboud, Kaiyuan Sun, Robert Gaffey, Marco Ajelli, Laura Fumanelli, Stefano Merler, Qian Zhang, Gerardo Chowell, Lone Simonsen, Alessandro Vespignani, the RAPIDD Ebola Forecasting Challenge group

Epidemics

August 26, 2017

ABSTRACT

Infectious disease forecasting is gaining traction in the public health community; however, limited systematic comparisons of model performance exist. Here we present the results of a synthetic forecasting challenge inspired by the West African Ebola crisis in 2014–2015 and involving 16 international academic teams and US government agencies, and compare the predictive performance of 8 independent modeling approaches. Challenge participants were invited to predict 140 epidemiological targets across 5 different time points of 4 synthetic Ebola outbreaks, each involving different levels of interventions and “fog of war” in outbreak data made available for predictions. Prediction targets included 1–4 week-ahead case incidences, outbreak size, peak timing, and several natural history parameters. With respect to weekly case incidence targets, ensemble predictions based on a Bayesian average of the 8 participating models outperformed any individual model and did substantially better than a null auto-regressive model. There was no relationship between model complexity and prediction accuracy; however, the top performing models for short-term weekly incidence were reactive models with few parameters, fitted to a short and recent part of the outbreak. Individual model outputs and ensemble predictions improved with data accuracy and availability; by the second time point, just before the peak of the epidemic, estimates of final size were within 20% of the target. The 4th challenge scenario, mirroring an uncontrolled Ebola outbreak with substantial data reporting noise, was poorly predicted by all modeling teams. Overall, this synthetic forecasting challenge provided a deep understanding of model performance under controlled data and epidemiological conditions. We recommend such “peace time” forecasting challenges as key elements to improve coordination and inspire collaboration between modeling groups ahead of the next pandemic threat, and to assess model forecasting accuracy for a variety of known and hypothetical pathogens.

Dependency of Vaccine Efficacy on Pre-Exposure and Age: A Closer Look at a Tetravalent Dengue Vaccine

Yang Yang, Ya Meng, M. Elizabeth Halloran, Ira M. Longini, Jr.

Clinical Infectious Diseases

August 24, 2017

ABSTRACT

Background

A recombinant, live-attenuated, tetravalent dengue vaccine (CYD-TDV) was licensed for children 9 years of age or older in a few countries, but the dependence of vaccine efficacy on baseline immunity status and age group has not been fully characterized.

Methods

Combining the two phase III trials, CYD14 and CYD15, we estimated the vaccine efficacy for each of the four serotypes of dengue virus (DENV), as well as all serotypes combined, simultaneously stratified by baseline immunity status and age group, while accounting for uncertainty in the baseline immunity status of subjects.

Results

Baseline seropositive subjects showed high efficacy for all serotypes, 70.2% (95% confidence interval [CI]: 57.4, 80.1) for dengue 1 (DENV-1), 67.9% (95% CI: 49.9, 82.0) for DENV-2, 77.5% (95% CI: 64.3, 90.2) for DENV-3, 89.9% (95% CI: 79.8, 99.9) for DENV-4, and 75.4% (95% CI: 68.3, 81.6) overall. In contrast, baseline seronegative subjects showed moderate efficacy against DENV-4, 51.2% (95% CI: 20.0, 72.8), but no significant efficacy against other serotypes. Among seropositive children, the overall efficacy tended to increase with age, 35.9% (95% CI: -7.6, 69.3) for children ≤5 years old, 65.6% (95% CI: 40.3, 84.2) for 6–8 years old, 73.4% (95% CI: 62.6, 82.1) for 9–11 years old, and 80.6% (95% CI: 72.9, 87.3) for 12 years or older.

Conclusions

The CYD-TDV vaccine was highly efficacious for all dengue serotypes among children older than 5 years who have acquired baseline immunity from previous exposure. Increasing vaccine efficacy with age was not fully explained by increasing prevalence of baseline immunity with age.

Birth/birth-death processes and their computable transition probabilities with biological applications

Lam Si Tung Ho, Jason Xu, Forrest W. Crawford, Vladimir N. Minin, Marc A. Suchard

Journal of Mathematical Biology

July 24, 2017

ABSTRACT

Birth-death processes track the size of a univariate population, but many biological systems involve interaction between populations, necessitating models for two or more populations simultaneously. A lack of efficient methods for evaluating finite-time transition probabilities of bivariate processes, however, has restricted statistical inference in these models. Researchers rely on computationally expensive methods such as matrix exponentiation or Monte Carlo approximation, restricting likelihood-based inference to small systems, or indirect methods such as approximate Bayesian computation. In this paper, we introduce the birth/birth-death process, a tractable bivariate extension of the birth-death process, where rates are allowed to be nonlinear. We develop an efficient algorithm to calculate its transition probabilities using a continued fraction representation of their Laplace transforms. Next, we identify several exemplary models arising in molecular epidemiology, macro-parasite evolution, and infectious disease modeling that fall within this class, and demonstrate advantages of our proposed method over existing approaches to inference in these models. Notably, the ubiquitous stochastic susceptible-infectious-removed (SIR) model falls within this class, and we emphasize that computable transition probabilities newly enable direct inference of parameters in the SIR model. We also propose a very fast method for approximating the transition probabilities under the SIR model via a novel branching process simplification, and compare it to the continued fraction representation method with application to the 17th century plague in Eyam. Although the two methods produce similar maximum a posteriori estimates, the branching process approximation fails to capture the correlation structure in the joint posterior distribution.

Monte Carlo profile confidence intervals for dynamic systems

E. L. Ionides, C. Breto, J. Park, R. A. Smith, A. A. King

Royal Society Interface

July 5, 2017

ABSTRACT

Monte Carlo methods to evaluate and maximize the likelihood function enable the construction of confidence intervals and hypothesis tests, facilitating scientific investigation using models for which the likelihood function is intractable. When Monte Carlo error can be made small, by sufficiently exhaustive computation, then the standard theory and practice of likelihood-based inference applies. As datasets become larger, and models more complex, situations arise where no reasonable amount of computation can render Monte Carlo error negligible. We develop profile likelihood methodology to provide frequentist inferences that take into account Monte Carlo uncertainty. We investigate the role of this methodology in facilitating inference for computationally challenging dynamic latent variable models. We present examples arising in the study of infectious disease transmission, demonstrating our methodology for inference on nonlinear dynamic models using genetic sequence data and panel time-series data. We also discuss applicability to nonlinear time-series and spatio-temporal data.

Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples

Joshua Quick, Nathan D Grubaugh, Steven T Pullan, Ingra M Claro, Andrew D Smith, Karthik Gangavarapu, Glenn Oliveira, Refugio Robles-Sikisaka, Thomas F Rogers, Nathan A Beutler, Dennis R Burton, Lia Laura Lewis-Ximenez, Jaqueline Goes de Jesus, Marta Giovanetti, Sarah C Hill, Allison Black, Trevor Bedford, Miles W Carroll, Marcio Nunes, Luiz Carlos Alcantara Jr., Ester C Sabino, Sally A Baylis, Nuno R Faria, Matthew Loose, Jared T Simpson, Oliver G Pybus, Kristian G Andersen, Nicholas J Loman

Nature Protocols

May 24, 2017

ABSTRACT

Genome sequencing has become a powerful tool for studying emerging infectious diseases; however, genome sequencing directly from clinical samples (i.e., without isolation and culture) remains challenging for viruses such as Zika, for which metagenomic sequencing methods may generate insufficient numbers of viral reads. Here we present a protocol for generating coding-sequence-complete genomes, comprising an online primer design tool, a novel multiplex PCR enrichment protocol, optimized library preparation methods for the portable MinION sequencer (Oxford Nanopore Technologies) and the Illumina range of instruments, and a bioinformatics pipeline for generating consensus sequences. The MinION protocol does not require an Internet connection for analysis, making it suitable for field applications with limited connectivity. Our method relies on multiplex PCR for targeted enrichment of viral genomes from samples containing as few as 50 genome copies per reaction. Viral consensus sequences can be achieved in 1–2 d by starting with clinical samples and following a simple laboratory workflow. This method has been successfully used by several groups studying Zika virus evolution and is facilitating an understanding of the spread of the virus in the Americas. The protocol can be used to sequence other viral genomes using the online Primal Scheme primer designer software. It is suitable for sequencing either RNA or DNA viruses in the field during outbreaks or as an inexpensive, convenient method for use in the lab.

Zika virus evolution and spread in the Americas

Hayden C. Metsky, Christian B. Matranga, Shirlee Wohl, Stephen F. Schaffner, Catherine A. Freije, Sarah M. Winnicki, Kendra West, James Qu, Mary Lynn Baniecki, Adrianne Gladden-Young, Aaron E. Lin, Christopher H. Tomkins-Tinch, Simon H. Ye, Daniel J. Park, Cynthia Y. Luo, Kayla G. Barnes, Rickey R. Shah, Bridget Chak, Giselle Barbosa-Lima, Edson Delatorre, Yasmine R. Vieira, Lauren M. Paul, Amanda L. Tan, Carolyn M. Barcellona, Mario C. Porcelli, Chalmers Vasquez, Andrew C. Cannons, Marshall R. Cone, Kelly N. Hogan, Edgar W. Kopp, Joshua J. Anzinger, Kimberly F. Garcia, Leda A. Parham, Rosa M. Gélvez Ramírez, Maria C. Miranda Montoya, Diana P. Rojas, Catherine M. Brown, Scott Hennigan, Brandon Sabina, Sarah Scotland, Karthik Gangavarapu, Nathan D. Grubaugh, Glenn Oliveira, Refugio Robles-Sikisaka, Andrew Rambaut, Lee Gehrke, Sandra Smole, M. Elizabeth Halloran, Luis Villar, Salim Mattar, Ivette Lorenzana, Jose Cerbino-Neto, Clarissa Valim, Wim Degrave, Patricia T. Bozza, Andreas Gnirke, Kristian G. Andersen, Sharon Isern, Scott F. Michael, Fernando A. Bozza, Thiago M. L. Souza, Irene Bosch, Nathan L. Yozwiak, Bronwyn L. MacInnis, Pardis C. Sabeti

Nature

May 24, 2017

ABSTRACT

Although the recent Zika virus (ZIKV) epidemic in the Americas and its link to birth defects have attracted a great deal of attention, much remains unknown about ZIKV disease epidemiology and ZIKV evolution, in part owing to a lack of genomic data. Here we address this gap in knowledge by using multiple sequencing approaches to generate 110 ZIKV genomes from clinical and mosquito samples from 10 countries and territories, greatly expanding the observed viral genetic diversity from this outbreak. We analysed the timing and patterns of introductions into distinct geographic regions; our phylogenetic evidence suggests rapid expansion of the outbreak in Brazil and multiple introductions of outbreak strains into Puerto Rico, Honduras, Colombia, other Caribbean islands, and the continental United States. We find that ZIKV circulated undetected in multiple regions for many months before the first locally transmitted cases were confirmed, highlighting the importance of surveillance of viral infections. We identify mutations with possible functional implications for ZIKV biology and pathogenesis, as well as those that might be relevant to the effectiveness of diagnostic tests.

Transmission Bottleneck Size Estimation from Pathogen Deep-Sequencing Data, with an Application to Human Influenza A Virus

Ashley Sobel Leonard, Daniel Weissman, Benjamin Greenbaum, Elodie Ghedin, Katia Koelle

Journal of Virology

May 3, 2017

ABSTRACT

The bottleneck governing infectious disease transmission describes the size of the pathogen population transferred from donor to recipient host. Accurate quantification of the bottleneck size is particularly important for rapidly evolving pathogens such as influenza virus, as narrow bottlenecks reduce the amount of transferred viral genetic diversity and, thus, may slow the rate of viral adaptation. Previous studies have estimated bottleneck sizes governing viral transmission using statistical analyses of variants identified in pathogen sequencing data. These analyses, however, did not account for variant calling thresholds and stochastic viral replication dynamics within recipient hosts. Because these factors can skew bottleneck size estimates, we introduce a new method for inferring bottleneck sizes that accounts for these factors. Through the use of a simulated dataset, we first show that our method, based on beta-binomial sampling, accurately recovers transmission bottleneck sizes, whereas other methods fail to do so. We then apply our method to a dataset of influenza A infections for which viral deep-sequencing data from transmission pairs are available. We find that the IAV transmission bottleneck size estimates in this study are highly variable across transmission pairs, while the mean bottleneck size of 196 virions is consistent with the previous estimate for this dataset. Further, regression analysis shows a positive association between estimated bottleneck size and donor infection severity, as measured by temperature. These results support findings from experimental transmission studies showing that bottleneck sizes across transmission events can be variable and in part influenced by epidemiological factors.

Spread of Zika virus in the Americas

Qian Zhang, Kaiyuan Sun, Matteo Chinazzi, Ana Pastore y Piontti, Natalie E. Dean, Diana Patricia Rojas, Stefano Merler, Dina Mistry, Piero Poletti, Luca Rossi, Margaret Bray, M. Elizabeth Halloran, Ira M. Longini, Jr., Alessandro Vespignani

Proceedings of the National Academy of Sciences

April 25, 2017

ABSTRACT

We use a data-driven global stochastic epidemic model to analyze the spread of the Zika virus (ZIKV) in the Americas. The model has high spatial and temporal resolution and integrates real-world demographic, human mobility, socioeconomic, temperature, and vector density data. We estimate that the first introduction of ZIKV to Brazil likely occurred between August 2013 and April 2014 (90% credible interval). We provide simulated epidemic profiles of incident ZIKV infections for several countries in the Americas through February 2017. The ZIKV epidemic is characterized by slow growth and high spatial and seasonal heterogeneity, attributable to the dynamics of the mosquito vector and to the characteristics and mobility of the human populations. We project the expected timing and number of pregnancies infected with ZIKV during the first trimester and provide estimates of microcephaly cases assuming different levels of risk as reported in empirical retrospective studies. Our approach represents a modeling effort aimed at understanding the potential magnitude and timing of the ZIKV epidemic and it can be potentially used as a template for the analysis of future mosquito-borne epidemics.

Controlling cholera in the Ouest Department of Haiti using oral vaccines

Alexander Kirpich, Thomas A. Weppelmann, Yang Yang, John Glenn Morris Jr., Ira M. Longini Jr.

PLOS Neglected Tropical Diseases

April 14, 2017

ABSTRACT

Following the 2010 cholera outbreak in Haiti, a plan was initiated to provide massive improvements to the sanitation and drinking water infrastructure in order to eliminate cholera from the island of Hispaniola by 2023. Six years and a half billion dollars later, there is little evidence that any substantial improvements have been implemented, and there is increasing evidence that cholera has become endemic. Thus, it is time to explore strategies to control cholera in Haiti using oral cholera vaccines (OCVs). The potential effects of mass administration of OCVs on cholera transmission were assessed using dynamic compartment models fit to cholera incidence data from the Ouest Department of Haiti. The results indicated that interventions using an OCV that was 60% effective could have eliminated cholera transmission by August 2012 if started five weeks after the initial outbreak. A range of analyses on the ability of OCV interventions started January 1, 2017 to eliminate cholera transmission by 2023 were performed by considering different combinations of vaccine efficacies, vaccine administration rates, and durations of protective immunity. With an average of 50 weeks for the waiting time to vaccination and an average duration of three years for the vaccine-induced immunity, all campaigns that used an OCV with a vaccine efficacy of at least 60% successfully eliminated cholera transmission by 2023. The results of this study suggest that even with a relatively wide range of vaccine efficacies, administration rates, and durations of protective immunity, future epidemics could be controlled at a relatively low cost using mass administration of OCVs in Haiti.

Maternal pertussis immunisation: clinical gains and epidemiological legacy

AI Bento, AA King, P Rohani

Eurosurveillance

April 13, 2017

ABSTRACT

The increase in whooping cough (pertussis) incidence in many countries with high routine vaccination coverage is alarming, with incidence in the US reaching almost 50,000 reported cases per year, reflecting incidence levels not seen since the 1950s. While the potential explanations for this resurgence remain debated, we face an urgent need to protect newborns, especially during the time window between birth and the first routine vaccination dose. Maternal immunisation has been proposed as an effective strategy for protecting neonates, who are at higher risk of severe pertussis disease and mortality. However, if maternally derived antibodies adversely affect the immunogenicity of the routine schedule, through blunting effects, we may observe a gradual degradation of herd immunity. ‘Wasted’ vaccines would result in an accumulation of susceptible children in the population, specifically leading to an overall increase in incidence in older age groups. In this Perspective, we discuss potential long-term epidemiological effects of maternal immunisation, as determined by possible immune interference outcomes.

Infectious Disease Dynamics Inferred from Genetic Data via Sequential Monte Carlo

R. A. Smith, E. L. Ionides, A. A. King

Molecular Biology and Evolution

April 8, 2017

ABSTRACT

Genetic sequences from pathogens can provide information about infectious disease dynamics that may supplement or replace information from other epidemiological observations. Most currently available methods first estimate phylogenetic trees from sequence data, then estimate a transmission model conditional on these phylogenies. Outside limited classes of models, existing methods are unable to enforce logical consistency between the model of transmission and that underlying the phylogenetic reconstruction. Such conflicts in assumptions can lead to bias in the resulting inferences. Here, we develop a general, statistically efficient, plug-and-play method to jointly estimate both disease transmission and phylogeny using genetic data and, if desired, other epidemiological observations. This method explicitly connects the model of transmission and the model of phylogeny so as to avoid the aforementioned inconsistency. We demonstrate the feasibility of our approach through simulation and apply it to estimate stage-specific infectiousness in a subepidemic of HIV in Detroit, Michigan. In a supplement, we prove that our approach is a valid sequential Monte Carlo algorithm. While we focus on how these methods may be applied to population-level models of infectious disease, their scope is more general. These methods may be applied in other biological systems where one seeks to infer population dynamics from genetic sequences, and they may also find application for evolutionary models with phenotypic rather than genotypic data.

Deep mutational scanning identifies sites in influenza nucleoprotein that affect viral inhibition by MxA

Orr Ashenberg, Jai Padmakumar, Michael B. Doud, Jesse D. Bloom 

PLOS Pathogens

March 27, 2017

ABSTRACT

The innate-immune restriction factor MxA inhibits influenza replication by targeting the viral nucleoprotein (NP). Human influenza virus is more resistant than avian influenza virus to inhibition by human MxA, and prior work has compared human and avian viral strains to identify amino-acid differences in NP that affect sensitivity to MxA. However, this strategy is limited to identifying sites in NP where mutations that affect MxA sensitivity have fixed during the small number of documented zoonotic transmissions of influenza to humans. Here we use an unbiased deep mutational scanning approach to quantify how all single amino-acid mutations to NP affect MxA sensitivity in the context of replication-competent virus. We both identify new sites in NP where mutations affect MxA resistance and re-identify mutations known to have increased MxA resistance during historical adaptations of influenza to humans. Most of the sites where mutations have the greatest effect are almost completely conserved across all influenza A viruses, and the amino acids at these sites confer relatively high resistance to MxA. These sites cluster in regions of NP that appear to be important for its recognition by MxA. Overall, our work systematically identifies the sites in influenza nucleoprotein where mutations affect sensitivity to MxA. We also demonstrate a powerful new strategy for identifying regions of viral proteins that affect inhibition by host factors.

Complete mapping of viral escape from neutralizing antibodies

Michael B. Doud, Scott E. Hensley, Jesse D. Bloom

PLOS Pathogens

March 13, 2017

ABSTRACT

Identifying viral mutations that confer escape from antibodies is crucial for understanding the interplay between immunity and viral evolution. We describe a high-throughput approach to quantify the selection that monoclonal antibodies exert on all single amino-acid mutations to a viral protein. This approach, mutational antigenic profiling, involves creating all replication-competent protein variants of a virus, selecting with antibody, and using deep sequencing to identify enriched mutations. We use mutational antigenic profiling to comprehensively identify mutations that enable influenza virus to escape four monoclonal antibodies targeting hemagglutinin, and validate key findings with neutralization assays. We find remarkable mutation-level idiosyncrasy in antibody escape: for instance, at a single residue targeted by two antibodies, some mutations escape both antibodies while other mutations escape only one or the other. Because mutational antigenic profiling rapidly maps all mutations selected by an antibody, it is useful for elucidating immune specificities and interpreting the antigenic consequences of viral genetic variation.