Synonymous and nonsynonymous distances help untangle convergent evolution and recombination

Peter B. Chi, Sujay Chattopadhyay, Philippe Lemey, Evgeni V. Sokurenko, Vladimir N. Minin

Statistical Applications in Genetics and Molecular Biology

June 10, 2015

Abstract

When estimating a phylogeny from a multiple sequence alignment, researchers often assume the absence of recombination. However, if recombination is present, then tree estimation and all downstream analyses are affected, because different segments of the sequence alignment support different phylogenies. Convergent selective pressures at the molecular level can likewise produce phylogenetic tree incongruence across the sequence alignment. Current methods for detecting phylogenetic incongruence cannot distinguish between these two mechanisms and assume that the incongruence results from recombination or other horizontal transfer of genetic information. We propose a new recombination detection method, based on synonymous codon substitution distances, that can make this distinction. Although some power is lost by discarding the information contained in the nonsynonymous substitutions, our new method has lower false positive probabilities than the comparable recombination detection method when the phylogenetic incongruence signal is due to convergent evolution. We apply our method to three empirical examples, in which we analyze: (1) sequences from a transmission network of the human immunodeficiency virus, (2) tlpB gene sequences from a geographically diverse set of 38 Helicobacter pylori strains, and (3) hepatitis C virus sequences sampled longitudinally from one patient.
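To make the synonymous/nonsynonymous split concrete, the following minimal sketch classifies codon differences between two aligned coding sequences under the standard genetic code. It is an illustration only and not the method proposed here: it reports raw counts rather than model-based substitution distances, it sets aside codons that differ at more than one position, and the function name `count_codon_differences` and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering
# (third codon position varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a, seq_b):
    """Return (synonymous, nonsynonymous, skipped) codon difference counts.

    Assumption for this sketch: only codon pairs differing at exactly one
    nucleotide are classified. Pairs with several differences, gaps, or
    ambiguous bases are counted as skipped, because classifying them would
    require choosing among substitution pathways.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and of equal length")

    syn = nonsyn = skipped = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue  # identical codons carry no difference
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            skipped += 1  # gap or ambiguity code
            continue
        n_diff = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diff > 1:
            skipped += 1  # multiple hits: pathway is ambiguous
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1  # same amino acid: synonymous difference
        else:
            nonsyn += 1  # amino acid changes: nonsynonymous difference
    return syn, nonsyn, skipped


if __name__ == "__main__":
    # CTT -> CTC keeps leucine; AAA -> AGA changes lysine to arginine;
    # GGG -> GCC differs at two positions and is skipped.
    print(count_codon_differences("ATGCTTAAAGGG", "ATGCTCAGAGCC"))
```

In this toy comparison, only the CTT/CTC difference would contribute to a synonymous measure. Restricting attention to such changes is what shields the analysis from selection acting on amino acids, at the cost of discarding the nonsynonymous signal.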