In silico Identification of Serovar-Specific Genes for Salmonella Serotyping
Xiaomei Zhang, Michael Payne and Ruiting Lan
Front. Microbiol., 24 April 2019 | https://doi.org/10.3389/fmicb.2019.00835
COMMENT: With the development of whole-genome sequencing technology, traditional serotyping is being replaced by molecular serotyping. In the case of Salmonella, sequences of the rfb gene cluster for the O antigen, the fliC and fljB genes for the H antigens, and genes targeted by MLST can be used for serovar identification. In addition, previous studies have identified serovar-specific genes or DNA fragments for serotyping through genomic comparison based on whole-genome sequencing. However, the serovar-specific genes or DNA fragments identified distinguish only a small number of serovars, and prior knowledge of the relationship of serovar to sequence type is required. In this study, 2258 Salmonella accessory genomes were compared to identify 414 candidate serovar-specific or lineage-specific gene markers for 106 serovars, which include 24 polyphyletic serovars and the paraphyletic serovar Enteritidis. A combination of several lineage-specific gene markers can be used for the clear identification of the polyphyletic serovars and the paraphyletic serovar. A subset of these gene markers was validated with independent genomes and assigned serovars correctly in 95.3% of cases.
In this study, we aimed to use the extensive publicly available collection of Salmonella genomes to identify serovar-specific gene markers for the most frequent Salmonella serovars. We show the potential of these serovar-specific gene markers for molecular serotyping, either for in silico typing of genomic data or for the development of laboratory diagnostic methods.
The accessory genes from 2258 genomes representing 107 serovars were screened to identify potential serovar-specific gene markers. This initial screening identified 354 potential serovar-specific gene markers within 101 serovars. (...) Six serovars, namely Bareilly, Bovismorbificans, Thompson, Reading, Typhi, and Saintpaul, had no candidate serovar-specific gene markers that were present in all lineages of a given serovar.
Forty serovars contained 194 serovar-specific gene markers with 100% specificity and sensitivity.
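To make the two measures concrete, the sketch below computes sensitivity and specificity for one candidate marker from gene presence/absence data. It is an illustration only, not the authors' pipeline: the function name `marker_stats`, the input layout, and the toy genomes and gene names are all assumptions made for this example.

```python
from typing import Dict, Set, Tuple

# Hypothetical input layout: genome ID -> (serovar label, set of accessory genes present).
Genomes = Dict[str, Tuple[str, Set[str]]]


def marker_stats(gene: str, target: str, genomes: Genomes) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of `gene` as a marker for serovar `target`."""
    tp = fn = fp = tn = 0
    for serovar, genes in genomes.values():
        present = gene in genes
        if serovar == target:
            if present:
                tp += 1  # target genome carrying the marker
            else:
                fn += 1  # target genome missing the marker
        else:
            if present:
                fp += 1  # non-target genome carrying the marker
            else:
                tn += 1  # non-target genome without the marker
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data with placeholder serovar and gene names (not taken from the study).
toy_genomes: Genomes = {
    "g1": ("SerovarA", {"geneA", "geneB"}),
    "g2": ("SerovarA", {"geneA"}),
    "g3": ("SerovarB", {"geneB"}),
    "g4": ("SerovarB", {"geneC"}),
}

print(marker_stats("geneA", "SerovarA", toy_genomes))  # (1.0, 1.0)
print(marker_stats("geneB", "SerovarA", toy_genomes))  # (0.5, 0.5)
```

In this toy set, `geneA` is found in every SerovarA genome and in no other genome, so it reaches 100% on both measures, which is the property the 194 markers above are reported to have.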
Interestingly, four polyphyletic serovars, Bredeney, Kottbus, Livingstone, and Virchow, each had one candidate serovar-specific gene marker which was present in all isolates of that serovar. The Bredeney serovar-specific gene was predicted to encode a translocase involved in O antigen conversion and could have been gained in parallel.
A minimum of 131 gene markers allows identification of the serovars with error rates from 0 to 8.33%.
We tested an additional 1089 genomes belonging to 106 non-typhoidal Salmonella serovars to evaluate the ability of the 131 specific gene markers to correctly assign serovars to isolates. Using the serovar-specific gene markers, 1038 of the 1089 isolates (95.3%) were successfully assigned.
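The reported rate follows directly from the two counts in the passage. The short check below reproduces it and derives the complementary figure; the variable names are chosen here for illustration.

```python
# Counts quoted in the validation passage above.
total_isolates = 1089
successfully_assigned = 1038

# Isolates that were not successfully assigned (derived by subtraction).
not_assigned = total_isolates - successfully_assigned

success_rate = successfully_assigned / total_isolates
failure_rate = not_assigned / total_isolates

print(f"assigned: {successfully_assigned} ({success_rate:.1%})")  # assigned: 1038 (95.3%)
print(f"not assigned: {not_assigned} ({failure_rate:.1%})")       # not assigned: 51 (4.7%)
```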
The top 20 common serovars causing human infection found in each continent were collapsed into a combined list of 46 serovars. When only these serovars were considered, 18 out of 46 could be uniquely identified by one of the serovar-specific gene markers.
...different combinations of genes may be used to specifically limit false positive results from serovars present in that region. (...) For example, a panel of 15 genes could be used for typing the 10 most frequent serovars in Australia.
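One way to picture a region-specific panel is sketched below: a marker is kept for a regional serovar if none of its off-target hits fall in serovars that occur in that region. This is an assumed selection rule written for illustration, not the procedure used in the study; `regional_panel`, the input dictionaries, and all serovar and marker names are hypothetical.

```python
from typing import Dict, List, Set


def regional_panel(
    marker_target: Dict[str, str],     # marker -> serovar it is meant to identify
    marker_hits: Dict[str, Set[str]],  # marker -> every serovar in which it was observed
    regional_serovars: Set[str],       # serovars expected in the region of interest
) -> Dict[str, List[str]]:
    """Return, per regional serovar, the markers with no off-target hits within the region."""
    panel: Dict[str, List[str]] = {}
    for marker, target in marker_target.items():
        if target not in regional_serovars:
            continue  # serovar is not typed in this region
        off_target = marker_hits.get(marker, set()) - {target}
        # Only off-target hits in serovars present in the region can cause false positives there.
        if not (off_target & regional_serovars):
            panel.setdefault(target, []).append(marker)
    return panel


# Placeholder names only; none of these come from the study.
toy_targets = {"m1": "SerovarA", "m2": "SerovarB", "m3": "SerovarC"}
toy_hits = {
    "m1": {"SerovarA", "SerovarX"},  # off-target hit outside the region: still usable
    "m2": {"SerovarB", "SerovarA"},  # off-target hit inside the region: rejected
    "m3": {"SerovarC"},              # target serovar not in the region: skipped
}
toy_region = {"SerovarA", "SerovarB"}

print(regional_panel(toy_targets, toy_hits, toy_region))  # {'SerovarA': ['m1']}
```

The example shows why a marker that is not globally specific (`m1`) can still be acceptable in a region where its off-target serovar does not occur.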
Our serovar-specific gene marker based method does not require the accurate examination of O antigen gene clusters or sequence variation of the H antigen genes, which can be problematic. Our method also removes the need for the entire gene or genome sequence to be assembled, which is necessary in MLST or cgMLST based methods. Therefore, this approach may be useful in cases where very little sequence is available, such as metagenomics or culture-free typing, and it provides a third alternative for confirming other analyses.
In this study, we identified candidate serovar-specific and lineage-specific gene markers for 106 serovars, as potential markers for in silico serotyping, by characterizing the accessory genomes of a representative selection of 2258 strains. We account for polyphyletic and paraphyletic serovars to provide a new method that uses the presence or absence of these gene markers to predict the serovar of an isolate from genomic data. The gene markers identified here may also be used to develop serotyping assays that work in the absence of an isolated strain, which will be useful as diagnosis moves to culture-independent and metagenomic methods.
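A minimal sketch of presence/absence-based prediction is given below. It assumes a panel in which each serovar has one or more lineages, each defined by a set of marker genes, so that a polyphyletic serovar is called when any one of its lineage marker sets is fully present. The data layout, the function `predict_serovar`, and every serovar and marker name are hypothetical and are not taken from the published marker list.

```python
from typing import Dict, List, Set

# Hypothetical panel layout: serovar -> list of lineages, each a set of required marker genes.
Panel = Dict[str, List[Set[str]]]


def predict_serovar(genes_present: Set[str], panel: Panel) -> List[str]:
    """Return every serovar for which at least one lineage marker set is fully present."""
    calls = []
    for serovar, lineages in panel.items():
        # A lineage matches when all of its marker genes are found in the isolate.
        if any(required <= genes_present for required in lineages):
            calls.append(serovar)
    return sorted(calls)


# Placeholder panel: SerovarA is monophyletic, SerovarB has two lineages.
toy_panel: Panel = {
    "SerovarA": [{"markerA"}],
    "SerovarB": [{"markerB_lineage1"}, {"markerB_lineage2"}],
}

print(predict_serovar({"markerB_lineage2", "unrelated_gene"}, toy_panel))  # ['SerovarB']
print(predict_serovar({"unrelated_gene"}, toy_panel))                      # []
```

Returning a list rather than a single name keeps ambiguous results visible: an empty list means no marker was detected, and more than one entry signals a conflict that would need another typing method to resolve.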
NOTE: MLST, multilocus sequence typing; cgMLST, core genome multilocus sequence typing.