Extensive Unexplored Human Microbiome Diversity Revealed by Over 150,000 Genomes from Metagenomes Spanning Age, Geography, and Lifestyle.
Pasolli E, Asnicar F, Manara S, Zolfo M, Karcher N, Armanini F, Beghini F, Manghi P, Tett A, Ghensi P, Collado MC, Rice BL, DuLong C, Morgan XC, Golden CD, Quince C, Huttenhower C, Segata N
Cell. Jan 2019
COMMENT: This article describes a large effort to analyze 9,428 human metagenomic samples and to reconstruct the microbial genomes present in each sample using de novo assembly and binning. The results show that a high percentage of the genomes in the human microbiome are unknown. A lot of lab and bioinformatics work will be needed to refine these newly detected genomes to the level of precision and completeness required of reference genomes but, in any case, knowing that they are there is very useful.
The genomes were assembled, per sample, using metaSPAdes and binned using MetaBAT2:
We leveraged 9,428 metagenomes to reconstruct 154,723 microbial genomes (45% of high quality) spanning body sites, ages, countries, and lifestyles. We recapitulated 4,930 species-level genome bins (SGBs), 77% without genomes in public repositories (unknown SGBs [uSGBs]). uSGBs are prevalent (in 93% of well-assembled samples), expand underrepresented phyla, and are enriched in non-Westernized populations (40% of the total SGBs)
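As a rough illustration of the per-sample assembly and binning step, the sketch below only builds the command lines; it does not run them. It assumes paired-end reads, and the read-mapping and depth steps (Bowtie2, samtools, `jgi_summarize_bam_contig_depths`) are my assumption of a typical MetaBAT2 workflow, not something stated in the quoted text. File names, the directory layout and the thread count are hypothetical, and the parameters actually used by the authors are not given here.

```python
from pathlib import Path


def assembly_binning_commands(sample, reads_1, reads_2, outdir, threads=8):
    """Return the commands (as argument lists) to assemble and bin one sample.

    Nothing is executed here; each list can be passed to subprocess.run.
    The 'mapping' and 'bins' directories must exist before running them.
    """
    out = Path(outdir) / sample
    assembly_dir = out / "assembly"
    contigs = assembly_dir / "contigs.fasta"  # file written by metaSPAdes
    index = out / "mapping" / "contigs"
    sam = out / "mapping" / "reads.sam"
    bam = out / "mapping" / "sorted.bam"
    depth = out / "mapping" / "depth.txt"
    return [
        # 1. De novo assembly of the reads of a single sample.
        ["metaspades.py", "-1", reads_1, "-2", reads_2,
         "-t", str(threads), "-o", str(assembly_dir)],
        # 2. Map the reads back to the contigs to get per-contig coverage.
        ["bowtie2-build", str(contigs), str(index)],
        ["bowtie2", "-x", str(index), "-1", reads_1, "-2", reads_2,
         "-p", str(threads), "-S", str(sam)],
        ["samtools", "sort", "-@", str(threads), "-o", str(bam), str(sam)],
        ["jgi_summarize_bam_contig_depths", "--outputDepth", str(depth), str(bam)],
        # 3. Bin the contigs using sequence composition and coverage depth.
        ["metabat2", "-i", str(contigs), "-a", str(depth),
         "-o", str(out / "bins" / "bin")],
    ]


# Hypothetical sample and file names, for illustration only.
for cmd in assembly_binning_commands("S1", "S1_R1.fastq.gz", "S1_R2.fastq.gz", "out"):
    print(" ".join(cmd))
```

Building the commands as lists keeps the sketch testable without the tools installed; the resulting bins would then be quality-checked, for example with CheckM, which appears in the software list of the paper.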
To organize the 154,723 genomes into species-level genome bins (SGBs), we employed an all-versus-all genetic distance quantification followed by clustering and identification of genome bins spanning a 5% genetic diversity …
We identified 3,796 SGBs (i.e., 77.0% of the total) covering unexplored microbial diversity as they represent species without any publicly available genomes from isolate sequencing or previous metagenomic assemblies
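To make the SGB idea more concrete, here is a minimal sketch of grouping genomes from an all-versus-all distance matrix and cutting at 5% distance, then flagging clusters that contain no reference genome as uSGBs. The use of average linkage is my assumption (the quoted passage only says "clustering"), and the distance matrix and reference flags are toy values, not data from the paper.

```python
def average_linkage_clusters(dist, threshold=0.05):
    """Agglomerative clustering with average linkage, stopped at `threshold`.

    dist: symmetric matrix (list of lists) of pairwise genome distances.
    Returns a sorted list of clusters, each a sorted list of genome indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average of all pairwise distances between members of a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        d, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > threshold:
            break  # every remaining pair is further apart than the cut-off
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return sorted(clusters)


def label_sgbs(clusters, is_reference):
    """Label a cluster 'known' if it holds any reference genome, else 'uSGB'."""
    return ["known" if any(is_reference[i] for i in c) else "uSGB" for c in clusters]


# Toy example: genomes 0-2 are within 3% of each other, genomes 3-4 within 2%.
toy_dist = [
    [0.00, 0.01, 0.03, 0.20, 0.22],
    [0.01, 0.00, 0.02, 0.21, 0.20],
    [0.03, 0.02, 0.00, 0.19, 0.21],
    [0.20, 0.21, 0.19, 0.00, 0.02],
    [0.22, 0.20, 0.21, 0.02, 0.00],
]
toy_is_reference = [True, False, False, False, False]  # only genome 0 is a reference

sgbs = average_linkage_clusters(toy_dist)
print(sgbs)                                # [[0, 1, 2], [3, 4]]
print(label_sgbs(sgbs, toy_is_reference))  # ['known', 'uSGB']
```

This naive implementation recomputes all pairwise linkages at every step, which is fine for a toy matrix but would not scale to 154,723 genomes.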
Functional annotation was based mainly on similarity to UniRef90 and UniRef50 protein entries:
Functional annotation of all the reconstructed genomes assigned a UniRef90 (The UniProt Consortium, 2017) label to 230 M genes and a UniRef50 to 268 M genes (72.7% and 84.8% of the total of 316 M genes, respectively).
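As a quick sanity check, the percentages quoted in these notes can be recomputed from the counts. The gene counts are the rounded values in millions given in the quote, so a small difference from the reported figures is to be expected; this is my own check, not part of the paper.

```python
def pct(part, whole):
    """Percentage of `part` over `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# uSGBs over all SGBs (reported as 77.0%).
print(pct(3796, 4930))  # 77.0

# Genes with a UniRef90 label, counts in millions (reported as 72.7%).
print(pct(230, 316))    # 72.8

# Genes with a UniRef50 label, counts in millions (reported as 84.8%).
print(pct(268, 316))    # 84.8
```

The UniRef90 figure comes out at 72.8% rather than the reported 72.7%, presumably because the gene counts in the quote are rounded to the nearest million.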
The percentage of functionally annotated genes varied depending on the availability of reference proteomes of closely related species:
… the rate of annotation varied greatly in SGBs (e.g., >90% genes annotated for well studied species such as Escherichia coli or Bacteroides fragilis versus 22% for ID 15286, which is the largest SGB without reference genomes)
Some distinctive functional annotations were detected in each body site:
Each of the body sites considered had a clear distinctive set of annotations with the adult fecal microbiome enriched for 101,056 gene families representative of anaerobe-specific functions such as formate oxidation and methanogenesis and a strong representation of biofilm formation functions in the oral cavity and on the skin.
A set of unknown genomes reconstructed from the oral samples belonged to Saccharibacteria (previously named TM7):
For example, the candidate phylum Saccharibacteria (previously named TM7) contains members of the oral microbiome that are particularly difficult to cultivate. For this clade, we reconstructed 387 genomes from 108 SGBs, some representing members observed only using 16S rRNA gene sequencing.
The 107 Saccharibacteria uSGBs thus suggest a substantially undersampled diversity of human associated members of this phylum. Its importance is also confirmed by the occurrence of at least one genome from these 108 SGBs in 33% of oral cavity samples, where they can reach average abundances above 3% (Table S4) and maximum abundances exceeding 10%.
A set of newly reconstructed genomes belonged to archaea:
Among uSGBs, we also reconstructed genomes assigned to Thermoplasmatales (ID 376, 378, 380, 381), Candidatus Methanomethylophilus (ID 372, 382, 384), Methanomassiliicoccus (ID 362, 364), and Methanosphaera (ID 697), all very distant from their nearest reference genomes (average 22.4%, SD 4.0% nucleotide distance). This expanded human-associated archaeal diversity suggests the presence of several as-yet-uncharacterized archaea of potentially unique functional relevance in this ecosystem
The authors concluded that this study would allow better exploitation of metagenomic technologies:
We thus identify thousands of microbial genomes from yet-to-be-named species, expand the pangenomes of human-associated microbes, and allow better exploitation of metagenomic technologies.
NOTES ABOUT METHODS:
Description of datasets and samples analyzed:
Software and Algorithms used in this work:
- metaSPAdes (version 3.10.1) Nurk et al., 2017 https://github.com/ablab/spades/releases
- MEGAHIT (version 1.1.1) Li et al., 2015 https://github.com/voutcn/megahit
- MetaBAT2 (version 2.12.1) Kang et al., 2015 https://bitbucket.org/berkeleylab/metabat
- CheckM (version 1.0.7) Parks et al., 2015 https://github.com/Ecogenomics/CheckM
- CMSeq (version 1.0.0) This study https://bitbucket.org/CibioCM/cmseq
- Mash (version 2.0) Ondov et al., 2016 https://github.com/marbl/Mash
- MetaPhlAn2 (version 2.0) Segata et al., 2012b; Truong et al., 2015 https://bitbucket.org/biobakery/metaphlan2
- HUMAnN2 (version 0.7.1) Franzosa et al., 2018 https://bitbucket.org/biobakery/humann2/
- Bowtie2 (version 2.2.9) Langmead and Salzberg, 2012 https://github.com/BenLangmead/bowtie2
- Prodigal (version 2.6.3) https://github.com/hyattpd/Prodigal
- Pyani (version 0.2.6) Pritchard et al., 2016 https://github.com/widdowquinn/pyani
- StrainPhlAn (version 2.0.0) Truong et al., 2017 https://bitbucket.org/biobakery/metaphlan2
- Anvi’o (version 4) Eren et al., 2015 https://github.com/merenlab/anvio
- BWA (version 0.7.17) Li and Durbin, 2009 https://github.com/lh3/bwa
- CONCOCT (version 0.5.0) Alneberg et al., 2014 https://github.com/BinPro/CONCOCT
- RPSBlast Marchler-Bauer et al., 2003 ftp://ftp.ncbi.nih.gov/blast/executables/
- PhyloPhlAn (version dev, 0.25) Segata et al., 2013 https://bitbucket.org/nsegata/phylophlan
- Diamond (version 0.9.9.110) Buchfink et al., 2015 https://github.com/bbuchfink/diamond
- mafft (version 7.310) Katoh and Standley, 2013 https://github.com/The-Bioinformatics-Group/Albiorix/wiki/mafft
- trimal (version 1.2rev59) Capella-Gutiérrez et al., 2009 https://github.com/scapella/trimal
- RAxML (version 8.1.15) Stamatakis, 2014 https://github.com/stamatak/standard-RAxML
- IQ-TREE (version 1.6.6) Nguyen et al., 2015 https://github.com/Cibiv/IQ-TREE
- Roary (version 3.8) Page et al., 2015 https://github.com/sanger-pathogens/Roary
- blastn (version 2.6.0+) Altschul et al., 1990 ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast
- FastTree (version 2.1.9) Price et al., 2010 https://github.com/PavelTorgashov/FastTree
- ecodist R package Goslee and Urban, 2007 https://github.com/cran/ecodist
- GraPhlAn (version 1.1.3) Asnicar et al., 2015 https://bitbucket.org/nsegata/graphlan/
- FigTree (version 1.4.3) N/A http://tree.bio.ed.ac.uk/software/figtree/
- Prokka (version 1.12) Seemann, 2014 https://github.com/tseemann/prokka
- EggNOG mapper (version 1.0.3) Huerta-Cepas et al., 2017 https://github.com/jhcepas/eggnog-mapper
- HMM Eddy, 2011 https://github.com/guyz/HMM
- Barrnap (version 0.9) N/A https://github.com/tseemann/barrnap
- RDP (version 2.11) Cole et al., 2014; Wang et al., 2007 https://github.com/rdpstaff/classifier