VETERINARY RESEARCH. Virginie Dupuy 1,2*, Axel Verdier 1,2, François Thiaucourt 1,2 and Lucía Manso-Silván 1,2 - PDF

Dupuy et al. Veterinary Research (2015) 46:74
DOI /s x

VETERINARY RESEARCH
RESEARCH ARTICLE
Open Access

A large-scale genomic approach affords unprecedented resolution for the molecular epidemiology and evolutionary history of contagious caprine pleuropneumonia

Virginie Dupuy 1,2*, Axel Verdier 1,2, François Thiaucourt 1,2 and Lucía Manso-Silván 1,2

* Correspondence:
1 CIRAD, UMR CMAEE, F Montpellier, France
2 INRA, UMR1309 CMAEE, F Montpellier, France

Abstract
Contagious caprine pleuropneumonia (CCPP), caused by Mycoplasma capricolum subsp. capripneumoniae (Mccp), is a devastating disease of domestic goats and of some wild ungulate species. The disease is currently spreading in Africa and Asia and poses a serious threat to disease-free areas. A comprehensive view of the evolutionary history and dynamics of Mccp is essential to understand the epidemiology of CCPP. Yet, analysing the diversity of genetically monomorphic pathogens such as Mccp is complicated by their low variability. In this study, the molecular epidemiology and evolution of CCPP were investigated using a large-scale genomic approach based on next-generation sequencing technologies, applied to a sample of strains representing the global distribution of this disease. A highly discriminatory multigene typing system was developed, allowing the differentiation of 24 haplotypes among 25 Mccp strains distributed in six genotyping groups, which showed some correlation with geographic origin. A Bayesian approach was used to infer the first robust phylogeny of the species and to date the principal events of its evolutionary history. The emergence of Mccp was estimated to have occurred only about 270 years ago, which explains the low genetic diversity of this species despite its high mutation rate, evaluated at substitutions per site per year. Finally, plausible scenarios were proposed to elucidate the evolution and dynamics of CCPP in Asia and Africa, although these are limited by the paucity of available Mccp strains, particularly from Asia. This study shows how combining large-scale genomic data with spatial and temporal data makes it possible to obtain a comprehensive view of the epidemiology of CCPP, a precondition for the development of improved disease surveillance and control measures.

Introduction
Contagious caprine pleuropneumonia (CCPP) is a severe respiratory disease affecting goats and some wild ruminant species. The disease, listed by the World Organisation for Animal Health (OIE) [1], has a great economic impact on livestock production in fragile rural economies and poses a serious threat to disease-free areas. CCPP is caused by Mycoplasma capricolum subsp. capripneumoniae (Mccp), a member of the Mycoplasma mycoides cluster [2]. This cluster comprises five mycoplasmas that are pathogenic for ruminants, including Mycoplasma capricolum subsp. capricolum (Mcc), the closest relative of Mccp, and Mycoplasma mycoides subsp. mycoides (Mmm), the agent of contagious bovine pleuropneumonia (CBPP). Since it was first isolated in 1976 [3], Mccp has been isolated in only 17 countries, mainly because it is fastidious in culture. However, clinical descriptions have been published from nearly 40 countries in Africa and Asia, suggesting a much wider distribution [4]. The disease is present in the Arabian Peninsula, in North, Central and East Africa, and in Asia, but its boundaries are still uncertain, particularly in western and southern Africa and in Asia. In the last decade, an increasing number of outbreaks have been reported in both domestic and wild ruminants [5]. New detections were often the result of improved diagnosis, confirming the presence of the disease in suspected regions [6-8], but in certain cases they indicated that the disease had spread to new territories [9,10].
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

A comprehensive view of the evolutionary history and the dynamics of Mccp strains is essential to understand the epidemiology of CCPP. Molecular epidemiology investigations require robust genotyping tools with sufficient resolution, while evolutionary analyses are further constrained by the need for a reliable molecular clock to infer evolutionary timescales. Various molecular methods have been developed for the analysis of Mccp strains. The first study on the molecular evolution of Mccp was based on the 16S rRNA gene [11], which has provided a basis for bacterial phylogeny and systematics since the evolutionary studies of Woese [12]. Mccp strains showed a surprisingly high degree of polymorphism between their two 16S rRNA gene copies [13], which allowed this housekeeping gene to be used for combined epidemiological and evolutionary analyses. Still, that study was limited by the low discriminatory power of this molecular marker.

Despite the heterogeneity observed between the sequences of its two rRNA operons, the Mccp genome has been shown to be rather monomorphic. Indeed, a phylogenetic study of the M. mycoides cluster based on five partial housekeeping gene sequences showed very low distances between Mccp strains [14]. The low diversity of such a monomorphic bacterium precluded the use of standard multilocus sequence typing (MLST), which relies on housekeeping genes [15].
Approaches based on alternative sequences, regardless of their coding capacity, were therefore preferred. Analysis of the H2 locus allowed the discrimination of four groups showing a good correlation with geographic origin [16]. This system was improved by the addition of seven loci, which increased the discriminatory power and led to the description of five groups [4]. Although this multilocus sequence analysis (MLSA) scheme targets polymorphic loci by focusing on non-housekeeping sequences (e.g., intergenic regions, pseudogenes), it interrogates less than 1% of the genome and therefore has limited discriminatory power for epidemiological investigations. Moreover, it is not reliable for phylogenetic analyses because the molecular evolutionary clock differs among the target sequences.

These limitations can now be overcome thanks to the recent development of high-throughput methods [17]. Even though single nucleotide polymorphism (SNP) frequencies are low in monomorphic pathogens, the number of SNPs detected can be dramatically increased by enlarging the scale of the analysis. High-throughput data can thus enhance genetic investigations by revealing genome-wide variation. Combining large-scale genomic data with spatial and temporal data has already provided a comprehensive view of the molecular epidemiology and evolution of bacterial pathogens such as Salmonella typhi [18], Yersinia pestis [19], Mmm [20] and Mycobacterium tuberculosis [21]. With the increasing use and affordability of next-generation sequencing (NGS) technologies, complete annotated genomes of several Mccp strains have become available [22-24], making large-scale genomic investigations possible. In this study, we investigated the molecular epidemiology and evolution of CCPP using a large-scale genomic approach based on NGS data, applied to a sample of strains representing the global distribution of this disease.
Our main objective was to develop a discriminatory genotyping method to investigate the genetic diversity and population structure of Mccp. A robust phylogeny was also inferred from a large phylogenomic data set, and the divergence times of Mccp strains were estimated to reconstruct the evolutionary history of CCPP.

Materials and methods

Sampling
The 25 strains analysed in this study are summarised in Table 1. This collection includes 15 strains isolated in Central (4) and East (11) Africa, four strains from the Arabian Peninsula, four strains from the Mediterranean Basin and two strains from Central/East Asia, in an attempt to encompass the known global diversity of this pathogen. All are epidemiologically unrelated isolates, and most of them were previously analysed with traditional genotyping systems [4,11,16].

Sample preparation and sequencing
The twenty-one strains for which no genome sequence was available were cultured in modified Hayflick's medium [14] at 37 °C under 5% CO2. Culture purity was ensured by phenotypic control on solid medium and by Mccp-specific qPCR amplification [25]. DNA was extracted using a standard phenol/chloroform method [26]. DNA purity, quality and quantity were checked using a NanoDrop ND-1000 spectrophotometer (Thermo Fisher Scientific, MA, USA), gel electrophoresis and a Qubit 2.0 fluorometer (Invitrogen, USA), respectively. Then, 21 tagged standard genomic libraries were constructed and pooled for sequencing as 100-bp single reads on an Illumina HiSeq 2000 (GATC, Konstanz, Germany).

Gene selection
The choice of Mccp genes was based on the set previously selected for the analysis of the evolutionary history of Mmm [20], which consisted of 62 rigorously selected genes. Pseudogenes and duplicated genes had been excluded from this set, as well as genes coding for membrane proteins or restriction enzymes and genes known to be involved in horizontal transfer.
The annotated, circularised genome sequence of strain 9231-Abomsa [23] (Table 1) was used as the reference. Four genes were absent from the Mccp genome (guaC, gntR, suk, bgl), and dnaC was duplicated and was therefore excluded. As a result, a subset of 57 genes, comprising 47 coding sequences and 10 pseudogenes of the core genome, was used (Additional file 1). The 57 genes are evenly distributed along the chromosome of strain 9231-Abomsa (Additional file 2).

Table 1 List of Mycoplasma capricolum subsp. capripneumoniae strains analysed and corresponding MLSA types

Strain | Year | Country | Location | World region | Supplier | Accession number | MLSA type
Tigray | 1995 | Ethiopia | Tigray | East Africa | NVI-E | |
P1 | | Ethiopia (a) | Tigray | East Africa | - | GenBank:JMJI |
 | | Qatar | Doha, Al Wabra | Arabian Peninsula | AWWP | |
M79/93 | | Uganda | East | East Africa | NVI-S | |
ILRI181 | | Kenya | NK | East Africa | - | GenBank:LN |
 | | Chad | Karal, Dandi | Central Africa | LRVZF | |
 | | Chad | N'Djamena | Central Africa | LRVZF | |
 | | Sudan | Darfur, Nyala | Central Africa | VRA | |
 | | Niger | Goure | Central Africa | LABOCEL | |
F | | Turkey | Thrace | Mediterranean Basin | VLA | |
 | | Tajikistan | Farkhor | Central Asia | CIRAD | |
C550/ | | UAE | Dubai | Arabian Peninsula | CVRL | |
M1601 | | China | Gansu | East Asia | - | GenBank:CM |
Gabes | 1980 | Tunisia | Gabes | Mediterranean Basin | CIRAD | |
P | 1990 | Oman | NK | Arabian Peninsula | MAF-O | |
 | | Turkey | Elazig | Mediterranean Basin | FU | |
7/2 | | Turkey (a) | NK | Mediterranean Basin | MRI | |
Errer | 1997 | Ethiopia | Errer | East Africa | NVI-E | |
Yatta B | 1997 | Kenya | Yatta | East Africa | NVI-S | |
AMRC-C758 | | Sudan | NK | East Africa | AU | |
F | | Kenya | NK | East Africa | Type strain | |
C | | Oman | NK | Arabian Peninsula | AVS | |
C | | Ethiopia | NK | East Africa | NVI-E | |
9231-Abomsa | 1982 | Ethiopia | Godjam | East Africa | - | GenBank:LM |
CLP | | Ethiopia | NK | East Africa | NVI-E | |

(a) Strain P1 was isolated in Eritrea and strain 7/2 in Oman [56], but the animals came from the locations indicated in the table.
Abbreviations: AU, Aarhus University, Denmark; AVS, Agriculture and Veterinary Services, Oman; AWWP, Al Wabra Wildlife Preservation, Qatar; CIRAD, France; CVRL, Central Veterinary Research Laboratory, United Arab Emirates; FU, Firat University, Turkey; LABOCEL, Laboratoire Central de l'Elevage de Niamey, Niger; LRVZF, Laboratoire de Recherches Vétérinaires et Zootechniques de Farcha, Chad; MAF-O, Ministry of Agriculture and Fisheries, Oman; MRI, Moredun Research Institute, UK; NVI-E, National Veterinary Institute, Ethiopia; NVI-S, National Veterinary Institute, Sweden; VRA, Veterinary Research Administration, Sudan; NK, not known.

Data set collection
The sequences of the 57 selected genes from strain 9231-Abomsa, including flanking regions (up to 350 bp), were concatenated and annotated, resulting in an enlarged sequence of bp (Additional file 3). This sequence was used as a reference to automatically retrieve the entire gene set from the whole-genome sequence data of the 21 Mccp strains, by mapping raw reads with SeqMan NGen 2.0 (DNASTAR, Madison WI, USA). First, thanks to the flanking regions, this procedure allowed reads to be mapped correctly over the entire length of each gene. Second, it allowed visual verification of the sequencing depth and identification of any incongruities in all coding sequences in the SeqMan genome browser. On average, a read depth of 500× was obtained. SNPs and indels were called when more than 85% of the reads supported the change.

In-house software was developed at CIRAD Montpellier to extract and concatenate the gene sequences of each strain from the enlarged reference sequence. Sequence searches are based on the Needleman-Wunsch algorithm, using tags to frame the sequences of interest. These tags are short sequences homologous to the extremities of each target gene. The algorithm is implemented in C++ as a stand-alone program, SelectRegion, and the source code is freely available from the authors on request. The process can also be automated through graphical interfaces within the web-based Galaxy platform [27].
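The tag-based extraction and the 85% read-support rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not SelectRegion itself: it assumes a semi-global variant of Needleman-Wunsch (the whole tag must be aligned, while reference bases before and after it cost nothing) and a hypothetical linear scoring scheme (MATCH, MISMATCH, GAP). The function names locate_tag, extract_region and call_site are illustrative only.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical scoring scheme; SelectRegion's actual parameters are not given.
MATCH, MISMATCH, GAP = 2, -1, -2


def locate_tag(tag: str, ref: str) -> Tuple[int, int, int]:
    """Semi-global Needleman-Wunsch: every tag base is aligned, but reference
    bases before and after the hit are free. Returns (start, end, score) such
    that ref[start:end] is the best-matching stretch of the reference."""
    m, n = len(tag), len(ref)
    # score[i][j]: best score for tag[:i] aligned so that it ends at ref[:j]
    score = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 = free leading ref
    # origin[i][j]: reference column where that best alignment started
    origin = [list(range(n + 1)) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * GAP  # tag bases aligned to nothing
        origin[i][0] = 0
        for j in range(1, n + 1):
            diag = score[i - 1][j - 1] + (MATCH if tag[i - 1] == ref[j - 1] else MISMATCH)
            up = score[i - 1][j] + GAP    # tag base opposite a gap
            left = score[i][j - 1] + GAP  # reference base opposite a gap
            best = max(diag, up, left)
            score[i][j] = best
            if best == diag:
                origin[i][j] = origin[i - 1][j - 1]
            elif best == up:
                origin[i][j] = origin[i - 1][j]
            else:
                origin[i][j] = origin[i][j - 1]
    # Free trailing reference: take the best column of the last row
    end = max(range(n + 1), key=lambda j: score[m][j])
    return origin[m][end], end, score[m][end]


def extract_region(ref: str, left_tag: str, right_tag: str) -> str:
    """Return the stretch of ref framed by the two tags (tags included)."""
    start, _, _ = locate_tag(left_tag, ref)
    _, rel_end, _ = locate_tag(right_tag, ref[start:])  # search after the left tag
    return ref[start:start + rel_end]


def call_site(ref_allele: str, read_alleles: List[str],
              threshold: float = 0.85) -> Optional[str]:
    """Report a SNP/indel allele only if more than `threshold` of reads carry it."""
    if not read_alleles:
        return None
    allele, count = Counter(read_alleles).most_common(1)[0]
    if allele != ref_allele and count / len(read_alleles) > threshold:
        return allele
    return None  # keep the reference allele
```

For example, extract_region("TTTTATGAAACCCGGGTAAGGGG", "ATGAAA", "GGGTAA") returns "ATGAAACCCGGGTAA", and call_site("A", ["G"] * 90 + ["A"] * 10) returns "G", whereas exactly 85% support (["G"] * 85 + ["A"] * 15) is not enough. Carrying the start column alongside each score avoids a separate traceback when only the boundaries of the hit are needed.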
For the other strains, whose genomes had already been published, sequence data were retrieved from the genome sequences of strains M1601, and ILRI181 (Table 1).

Genotyping analyses
For diversity analyses, the 57 genes (comprising both coding sequences and pseudogenes) of each of the 25 strains analysed were selected and concatenated as described above. Sequences were aligned using Clustal W with default parameters (Additional file 4). Haplotypes were estimated with DnaSP [28], taking all sites into account, including gaps, and removing invariant sites. A median-joining network was reconstructed using NETWORK V4.6 [29]. The discriminatory power of the genotyping system was calculated using Simpson's index of diversity [30], which expresses the probability that two unrelated strains sampled from the test population are placed into different genotyping groups. 95% confidence intervals (CI) were determined as previously described [31].

Phylogeny and molecular dating analyses
From the 57 genes initially selected, a subset of 47 coding sequences was retained for evolutionary analyses. The 47 genes of the 25 Mccp strains were extracted from each corresponding enlarged reference sequence using the in-house software SelectRegion. The type strain of Mcc (California Kid), the closest relative of Mccp, was chosen as the outgroup, and the corresponding sequence data were obtained from its published genome sequence [GenBank:CP000123]. Sequences were aligned using Clustal W with default parameters (Additional file 5). The Tamura-Nei 1993 (TN93) model was selected as the best-fitting model by Modeltest V3.7 [32]. A maximum-likelihood phylogenetic tree was inferred using PhyML V3.0 [33] on the Galaxy web platform. A bootstrap resampling procedure with 1000 replicates was used to assess the reliability of key tree nodes.
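Simpson's index of diversity as used for typing methods (the Hunter-Gaston form) is D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of strains assigned to type j and N is the total number of strains. The sketch below computes D and an approximate 95% CI. The variance formula (4/N)[Σπ_j³ - (Σπ_j²)²] with π_j = n_j/N, and an interval of D ± 2σ, is an assumption (the approach of Grundmann et al.), since the text only cites [31]; clipping the interval to [0, 1] is a choice made for this sketch. The function name simpson_diversity is hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def simpson_diversity(types: Sequence[str]) -> Tuple[float, float, float]:
    """Return (D, lower, upper) for the type assigned to each strain.

    D is the probability that two strains drawn without replacement fall into
    different types. The CI uses an assumed variance approximation and is
    clipped to [0, 1]."""
    counts = list(Counter(types).values())
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two strains are required")
    # Probability that two strains drawn without replacement share a type
    same = sum(c * (c - 1) for c in counts) / (n * (n - 1))
    d = 1.0 - same
    # Variance from type frequencies pi_j = n_j / N (assumed formula)
    pi = [c / n for c in counts]
    var = (4.0 / n) * (sum(p ** 3 for p in pi) - sum(p ** 2 for p in pi) ** 2)
    half_width = 2.0 * math.sqrt(max(var, 0.0))  # guard tiny negative rounding
    return d, max(0.0, d - half_width), min(1.0, d + half_width)
```

For six strains typed as A, A, B, C, C, C, simpson_diversity returns D = 1 - 8/30 ≈ 0.733. When every strain has its own type, D = 1 and the interval collapses to 1.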
To infer a temporal framework from dated sequences, a Bayesian approach implemented in the flexible phylogenetic analysis package BEAST V1.6.2 [34] was used. This allowed simultaneous estimation of the tree topology, the time of the most recent common ancestor (MRCA), the divergence times of nodes and the mutation rate. The TN93 substitution model and a codon-position partition (1 + 2, 3) were selected. Two molecular clocks (a strict clock and an uncorrelated lognormal relaxed clock) and various demographic models (constant size, expansion, exponential growth and extended Bayesian skyline plot) were tested. Convergence was evaluated using Tracer V1.6. To choose the best-fitting combination of clock and demographic model, Bayes factors [35,36] were calculated from the marginal likelihoods estimated with Akaike's information criterion through MCMC (AICM) in Tracer V1.6. The maximum clade credibility tree was constructed using TreeAnnotator V1.6 and visualised using FigTree V1.3.

Results

Genetic diversity and molecular genotyping of Mccp strains
A discriminatory genotyping system based on a large-scale genomic approach was developed to characterise the diversity of Mccp by analysing 25 strains representing the known global distribution of this species (Table 1). A set of 57 genes, comprising 47 coding sequences and 10 pseudogenes (Additional file 1), was analysed, covering base pairs (7.7% of the genome). Fifty-two of these genes were polymorphic, with 239 polymorphic positions consisting of 212 SNPs and 17 indels (Additional file 6). Nine indels involved one or two bases or one or two codons, and eight corresponded to variations in homopolymer length. The average frequency of polymorphic sites in the gene set was one event per 577 bp in coding sequences and one event per 210 bp in pseudogenes. The 239 polymorphic positions made it possible to define 24 haplotypes among the 25 strains analysed, resulting in a Simpson's diversity index of ( ).
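The quantities reported above (polymorphic positions, informative versus singleton sites, haplotypes) can be derived from an alignment along the following lines. This minimal sketch assumes the usual definitions (a site is parsimony-informative when at least two character states each occur in at least two sequences; other variable sites are singletons) and treats gaps as an additional character state, mirroring the "including gaps" option described in the methods. It is not DnaSP's implementation, and the function names classify_sites and haplotypes are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def classify_sites(alignment: Dict[str, str]) -> Tuple[List[int], List[int]]:
    """Return (informative, singleton) column indices of aligned sequences.

    Gaps ('-') count as a character state, so indel columns are polymorphic."""
    if not alignment:
        return [], []
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    informative, singleton = [], []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        if len(counts) < 2:
            continue  # invariant column: ignored, as in the haplotype analysis
        # Informative: at least two states, each present in >= 2 sequences
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            informative.append(i)
        else:
            singleton.append(i)
    return informative, singleton


def haplotypes(alignment: Dict[str, str]) -> List[List[str]]:
    """Group strains whose sequences are identical at every polymorphic column."""
    informative, singleton = classify_sites(alignment)
    variable = sorted(informative + singleton)
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, seq in alignment.items():
        key = "".join(seq[i] for i in variable)
        groups[key].append(name)
    return list(groups.values())
```

With the toy alignment {"s1": "ACGTA", "s2": "ACGTA", "s3": "ATGTC", "s4": "ATG-A"}, column 1 is informative, columns 3 (a gap) and 4 are singletons, and the four strains collapse into three haplotypes, with s1 and s2 sharing one.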
The polymorphic positions consisted of 115 informative sites and 124 sites specific to single haplotypes. A single network connecting all strains was drawn using NETWORK (Figure 1). The 24 haplotypes were structured in six genotyping groups, named A to F. Group A was quite homogeneous and included four extremely similar strains from East Africa and a more distant strain from the Arabian Peninsula. Within this group, M79/93 and P1 were the only strains that could not be distinguished. Four strains from Central Africa, also showing little diversity, constituted group B. Group C included three rather variable strains from the Mediterranean Basin, the Arabian Peninsula and Central Asia. The Chinese strain M1601, positioned at the end of a long isolated branch, was designated group D and was the only representative of this group. Group E showed little variability and comprised three strains from the Mediterranean Basin and one strain from the Arabian Peninsula. Finally, group F, the most populated and variable group, included seven strains from East Africa and one strain from the Arabian Peninsula. Three groups (A, B and F) comprised at least four isolates originating from the same geographic region, whereas too few strains from groups D and E were available to confirm a correlation between genotyping group and geographic origin. In group C, no clear correlation with geographic origin could be found, arguably also because of insufficient sampling. Many different groups (all except B and D) were found in the Arabian Peninsula, and two groups (C and E) were present in Turkey. Also, two distinct groups (A and F) were found in East Africa.

Figure 1 Haplotype network of Mycoplasma capricolum subsp. capripneumoniae. The median-joining network was reconstructed using NETWORK based on the alignment of 57 concatenated genes from 25 Mccp strains. The network shows 24 haplotypes (nodes) resulting from 239 polymorphic positions including gaps. Different colours indicate the geographic origin of haplotypes (East Africa, Central Africa, Central/East Asia, Arabian Peninsula, Mediterranean Basin); node size corresponds to 1 or 2 strains.

Evolutionary history of Mccp
A robust phylogeny of Mccp was reconstructed based on high-throughput genomic data from the 25 selected Mccp strains (Table 1) and an Mcc outgroup. Among the 57 genes previously analysed for genotyping, a set of 47 coding sequences was retained, while pseudogenes were excluded (Additional file 1) to minimise molecular clock variation and homoplasy. After alignment of the