Capturing sequence diversity in metagenomes with comprehensive and scalable probe design.


PubMed ID: 30718881


Metsky HC, Siddle KJ, Gladden-Young A, Qu J, Yang DK, Brehio P, Goldfarb A, Piantadosi A, Wohl S, Carter A, Lin AE, Barnes KG, Tully DC, Corleis B, Hennigan S, Barbosa-Lima G, Vieira YR, Paul LM, Tan AL, Garcia KF, Parham LA, Odia I, Eromon P, Folarin OA, Goba A, Simon-Lorière E, Hensley L, Balmaseda A, Harris E, Kwon DS, Allen TM, Runstadler JA, Smole S, Bozza FA, Souza TML, Isern S, Michael SF, Lorenzana I, Gehrke L, Bosch I, Ebel G, Grant DS, Happi CT, Park DJ, Gnirke A, Sabeti PC, Matranga CB

Nat Biotechnol. 02 2019. doi: 10.1038/s41587-018-0006-x

COMMENT: Sequencing of patient samples for detection and characterization of human viral pathogens is hampered by low viral titers and high levels of host material. New tools are needed that improve the sensitivity of metagenomic sequencing. Previous studies have used targeted amplification, or enrichment via capture of viral nucleic acid with oligonucleotide probes, to improve sequencing sensitivity for specific viruses. However, comprehensive sequencing of viruses is challenging owing to the large diversity of viral genomes. This study developed a computational method to design optimal probe sets that enhance nucleic acid capture for enrichment of diverse microbial taxa.


Here we develop and implement CATCH (compact aggregation of targets for comprehensive hybridization), a method that yields scalable and comprehensive probe designs from any collection of target sequences. We use CATCH to design several multivirus probe sets and then use these to enrich viral nucleic acid in sequencing libraries from patient and environmental samples across diverse source material. We evaluate their performance and investigate any biases introduced by capture with these probe sets. Finally, to demonstrate use in clinical and biosurveillance settings, we apply these probe sets to recover Lassa virus genomes in low-titer clinical samples from the 2018 Lassa fever outbreak in Nigeria and to identify viruses in human and mosquito samples with unknown content.

Main results:

CATCH accepts any collection of unaligned sequences to design probe sets.

CATCH condenses highly diverse target sequence data into a small number of oligonucleotides, enabling more efficient and sensitive sequencing that is only biased by the extent of known diversity.

CATCH is implemented in a Python package that is publicly available at

The authors focused on applying CATCH to capture viral genomes in complex metagenomic samples.
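The idea of condensing many target sequences into a small probe set can be illustrated with a toy greedy set-cover sketch. This is only an illustration under stated assumptions, not the CATCH implementation: the function names, the short 8-nt probes, the stride, and the Hamming-distance "hybridization" model are all hypothetical choices made for the example.

```python
# Toy illustration only: NOT the CATCH package or its API.
# Assumption: a probe "captures" a window if it differs by at most
# max_mismatches bases (a crude stand-in for a hybridization model).

def candidate_probes(seqs, probe_len, stride):
    """Tile every target sequence to collect candidate probes."""
    cands = set()
    for s in seqs:
        for i in range(0, len(s) - probe_len + 1, stride):
            cands.add(s[i:i + probe_len])
    return sorted(cands)  # sorted for deterministic tie-breaking


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def covered_positions(probe, seqs, max_mismatches):
    """Return (sequence index, base position) pairs the probe would capture."""
    n = len(probe)
    hits = set()
    for si, s in enumerate(seqs):
        for i in range(len(s) - n + 1):
            if hamming(probe, s[i:i + n]) <= max_mismatches:
                hits.update((si, p) for p in range(i, i + n))
    return hits


def greedy_probe_set(seqs, probe_len=8, stride=4, max_mismatches=1):
    """Greedily pick probes until no candidate covers any new base."""
    remaining = {(si, p) for si, s in enumerate(seqs) for p in range(len(s))}
    cover = {c: covered_positions(c, seqs, max_mismatches)
             for c in candidate_probes(seqs, probe_len, stride)}
    chosen = []
    while remaining:
        best = max(cover, key=lambda c: len(cover[c] & remaining))
        gain = cover[best] & remaining
        if not gain:  # nothing left that any candidate can cover
            break
        chosen.append(best)
        remaining -= gain
    return chosen, remaining


if __name__ == "__main__":
    # Two hypothetical strains that differ at a single base.
    strains = ["ACGTACGTTGCAAGCT", "ACGTACGTTGCAAGGT"]
    probes, uncovered = greedy_probe_set(strains)
    print(len(probes), "probes;", len(uncovered), "bases uncovered")
```

Tolerating mismatches is what lets one probe stand in for several related strains; a stricter model (for example `max_mismatches=0`) generally requires more probes to cover the same targets.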

We used CATCH to design a probe set that targets all viral species reported to infect humans (V ALL), which could be used to achieve more sensitive metagenomic sequencing of viruses from human samples. V ALL encompasses 356 species (86 genera, 31 families). (...) We constrained the number of probes to 350,000.

To compare the performance of V ALL against probe sets with lower complexity, we separately designed three focused probe sets for commonly co-circulating viral infections: measles and mumps viruses (V MM; 6,219 probes), Zika and chikungunya viruses (V ZC; 6,171 probes), and a panel of 23 species (16 genera, 12 families) circulating in West Africa (V WAFR; 44,995 probes).

We synthesized V ALL as 75-nucleotide (nt) biotinylated single-stranded DNA (ssDNA) and the focused probe sets (V WAFR, V MM, V ZC) as 100-nt biotinylated ssRNA.

To evaluate the enrichment efficiency of V ALL, we prepared sequencing libraries from 30 patient and environmental samples. (...) The samples encompass a range of source materials: plasma, serum, buccal swabs, urine, avian swabs, and mosquito pools. We performed capture on these libraries and sequenced them both before and after capture.

Overall, we observed a median increase in unique viral reads across all samples of 18×. (...) Overall, our results suggest that neither the complexity of the V ALL probe set nor its use of shorter ssDNA probes prevents it from efficiently enriching viral content.
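A summary statistic such as the median increase in unique viral reads can be computed from per-sample counts before and after capture. The sketch below is a minimal illustration, assuming the two counts for each sample are compared at the same sequencing depth; the function names and the read counts are hypothetical and are not data from the study.

```python
from statistics import median

# Hypothetical illustration: the sample names and counts are made up,
# and this is not the authors' analysis pipeline.


def fold_enrichment(pre_capture, post_capture):
    """Ratio of unique viral reads after capture to before capture."""
    if pre_capture <= 0:
        raise ValueError("pre-capture count must be positive to form a ratio")
    return post_capture / pre_capture


def median_fold_enrichment(samples):
    """Median fold enrichment over {sample: (pre, post)} read counts."""
    return median(fold_enrichment(pre, post) for pre, post in samples.values())


if __name__ == "__main__":
    example = {
        "sample_a": (10, 50),
        "sample_b": (4, 80),
        "sample_c": (100, 300),
    }
    print(median_fold_enrichment(example))
```

The median is a natural choice here because per-sample enrichment can vary over orders of magnitude, and a few extreme samples would dominate a mean.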

We show that capture with probe sets designed by CATCH improves viral genome detection and recovery while accurately preserving sample complexity.

In West Africa we are using the V ALL probe set to characterize LASV and other viruses in patients with undiagnosed fevers by sequencing on a MiSeq (Illumina). This could also be applied on other small machines such as the iSeq (Illumina) or MinION (Oxford Nanopore). Further, the increase in viral content enables more samples to be pooled and sequenced on a single run, increasing sample throughput and decreasing per-sample cost relative to unbiased sequencing.

Targeted amplicon approaches may be faster and more sensitive for sequencing ultra-low-titer samples, but the suitability of these approaches is limited by genome size, sequence heterogeneity, and the need for prior knowledge of the target species.

Similarly, for molecular diagnostics of particular pathogens, many commonly used assays such as qRT–PCR and rapid antigen tests are likely to be faster and less expensive than metagenomic sequencing.


CATCH is a versatile approach that could also be used to design oligonucleotide sequences for capturing non-viral microbial genomes or for uses other than whole-genome enrichment.

CATCH could benefit studies in other areas that use capture-based approaches, such as the detection of previously characterized fetal and tumor DNA from cell-free material in which known targets of interest may represent a small fraction of all material.

CATCH can identify conserved regions or regions suitable for differential identification, which can help in the design of PCR primers and CRISPR–Cas13 crRNAs for nucleic acid diagnostics.
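To make the notions of "conserved" and "differential" regions concrete, the sketch below uses a simple k-mer comparison on unaligned sequences. It is a deliberately simplified illustration, not how CATCH performs this analysis; the function names, the exact-match criterion, and the example sequences are assumptions made for the example.

```python
# Simplified illustration with exact k-mer matching; NOT the CATCH method.
# All names and sequences here are hypothetical.


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_kmers(seqs, k):
    """k-mers present in every sequence (candidate conserved regions)."""
    if not seqs:
        return set()
    shared = kmers(seqs[0], k)
    for s in seqs[1:]:
        shared &= kmers(s, k)
    return shared


def discriminating_kmers(group_a, group_b, k):
    """k-mers conserved across group_a and absent from all of group_b."""
    seen_in_b = set()
    for s in group_b:
        seen_in_b |= kmers(s, k)
    return conserved_kmers(group_a, k) - seen_in_b


if __name__ == "__main__":
    species_a = ["AAACGTGG", "TTACGTCC"]
    species_b = ["GGGTTTAA"]
    print(conserved_kmers(species_a, 4))
    print(discriminating_kmers(species_a, species_b, 4))
```

A practical primer or crRNA design would also need to tolerate mismatches and check properties such as melting temperature, which this sketch does not attempt.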

CATCH is, to our knowledge, the first approach to systematically design probe sets for whole-genome capture of highly diverse target sequences that span many species, making it a valuable extension to the existing toolkit for effective viral detection and surveillance with enrichment and other targeted approaches. We anticipate that CATCH, together with these approaches, will help provide a more complete understanding of microbial genetic diversity.


Diana López-Farfán