RareVar: A Framework for Detecting Low-Frequency Single-Nucleotide Variants

Research output: Contribution to journal › Article


Abstract

Accurate identification of low-frequency somatic point mutations in tumor samples has important clinical utility. High-throughput sequencing can capture such variants when primary tumor samples are sequenced, but accurate detection becomes difficult when the variant allele frequency approaches the sequencer's error rate. Most current experimental and bioinformatic strategies target mutations with ≥5% allele frequency, which limits our ability to understand cancer etiology and tumor evolution. We present RareVar, an experimental and computational modeling framework for reliably identifying low-frequency single-nucleotide variants (SNVs) from high-throughput sequencing data generated under standard experimental protocols. The RareVar protocol includes a benchmark design in which DNA from previously sequenced individuals is pooled at various concentrations so that their variants appear at the desired frequencies (0.5%-3% in our case). By applying a position-specific error model based on a generalized linear model, followed by machine-learning-based variant calibration, our approach outperforms existing methods. The method can be applied to most capture and sequencing platforms without modifying the experimental protocol.
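Two ideas in the abstract can be made concrete with a little arithmetic: how a pooling fraction translates into a target variant frequency, and why a per-position error rate is the natural yardstick for a low-frequency call. The Python sketch below is illustrative only and is not RareVar's implementation. It assumes each pooled individual contributes DNA in proportion to its pool fraction, treats the per-position error rate as a given input (RareVar estimates it with a generalized linear model), and substitutes a plain one-sided binomial test for the paper's model; the machine-learning calibration step is not shown. The function names `expected_vaf`, `binom_sf`, and `flag_site` and the `alpha` threshold are hypothetical.

```python
from math import exp, lgamma, log, log1p


def expected_vaf(pool_fraction: float, alt_copies: int = 1, ploidy: int = 2) -> float:
    """Expected variant allele frequency (VAF) of one carrier's variant in a DNA pool.

    Assumption: DNA contribution is proportional to `pool_fraction`, and the
    carrier has `alt_copies` of `ploidy` chromosome copies with the variant
    (1 of 2 for a heterozygous diploid carrier).
    """
    return pool_fraction * alt_copies / ploidy


def binom_log_pmf(i: int, n: int, p: float) -> float:
    """log P(X = i) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p))


def binom_sf(k: int, n: int, p: float) -> float:
    """Upper-tail probability P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return sum(exp(binom_log_pmf(i, n, p)) for i in range(max(k, 0), n + 1))


def flag_site(alt_reads: int, depth: int, error_rate: float, alpha: float = 1e-6):
    """Flag a site whose alt-read count is unlikely to come from sequencing error alone.

    Hypothetical stand-in: `error_rate` plays the role of a per-position error
    estimate (fitted by a GLM in RareVar, simply supplied here), and `alpha`
    is an arbitrary placeholder threshold, not a value from the paper.
    """
    p_value = binom_sf(alt_reads, depth, error_rate)
    return p_value < alpha, p_value


# Heterozygous carriers pooled at 1% and 6% of total DNA give 0.5% and 3% VAF,
# the ends of the frequency range targeted in the benchmark design.
print(expected_vaf(0.01), expected_vaf(0.06))

# 25 alt reads out of 5,000 (0.5% VAF) at a position with a 0.1% error rate,
# versus 8 alt reads at the same position (close to the ~5 expected from error).
print(flag_site(25, 5000, 0.001))
print(flag_site(8, 5000, 0.001))
```

The tail probability is summed in log space because binomial coefficients at sequencing depths in the thousands overflow a float. The contrast between the two calls shows the core difficulty the abstract describes: at a 0.1% error rate, a 0.5% variant is separable at this depth, while a count only slightly above the error expectation is not.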

Original language: English (US)
Pages (from-to): 637-646
Number of pages: 10
Journal: Journal of Computational Biology
Volume: 24
Issue number: 7
State: Published - Jul 1 2017


Keywords

  • low frequency SNVs
  • machine learning
  • next-generation sequencing
  • sequencing error modeling
  • somatic mutation

ASJC Scopus subject areas

  • Modeling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Mathematics
  • Computational Theory and Mathematics
