RareVar: A Framework for Detecting Low-Frequency Single-Nucleotide Variants

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Accurate identification of low-frequency somatic point mutations in tumor samples has important clinical utility. Although high-throughput sequencing technology can capture such variants when primary tumor samples are sequenced, accurate detection is compromised when the variant frequency is close to the sequencer error rate. Most current experimental and bioinformatic strategies target mutations with ≥5% allele frequency, which limits our ability to understand cancer etiology and tumor evolution. We present an experimental and computational modeling framework, RareVar, to reliably identify low-frequency single-nucleotide variants from high-throughput sequencing data under standard experimental protocols. The RareVar protocol includes a benchmark design that pools DNA from already sequenced individuals at various concentrations to target variants at desired frequencies, 0.5%-3% in our case. By applying a position-specific error model based on a generalized linear model, followed by machine-learning-based variant calibration, our approach outperforms existing methods. Our method can be applied to most capture and sequencing platforms without modifying the experimental protocol.
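
The arithmetic behind a pooled benchmark, and the reason calls near the sequencer error rate are hard, can be illustrated with a small sketch. This is not the RareVar model. It assumes a heterozygous variant private to one diploid donor, independent reads, and a single constant per-base error rate, whereas the abstract describes a position-specific error model followed by machine-learning calibration. The function names and the example numbers are hypothetical.

```python
from math import exp, lgamma, log, log1p


def expected_vaf(pool_fraction: float, alt_copies: int = 1, ploidy: int = 2) -> float:
    """Expected variant allele frequency in a DNA pool.

    Assumes the variant is private to one donor who contributes
    `pool_fraction` of the pooled DNA and carries `alt_copies`
    alternate alleles out of `ploidy` copies.
    """
    return pool_fraction * alt_copies / ploidy


def binomial_tail(alt_reads: int, depth: int, error_rate: float) -> float:
    """P(X >= alt_reads) for X ~ Binomial(depth, error_rate).

    Models the chance that sequencing errors alone produce at least
    `alt_reads` alternate-allele reads at a site covered by `depth` reads.
    Requires 0 < error_rate < 1. Terms are computed in log space so that
    large depths do not overflow.
    """
    total = 0.0
    for k in range(alt_reads, depth + 1):
        log_pmf = (
            lgamma(depth + 1) - lgamma(k + 1) - lgamma(depth - k + 1)
            + k * log(error_rate)
            + (depth - k) * log1p(-error_rate)
        )
        total += exp(log_pmf)
    return min(1.0, total)


if __name__ == "__main__":
    # Under the heterozygous-donor assumption, pool fractions of 1% and 6%
    # correspond to the 0.5% and 3% ends of the targeted frequency range.
    for fraction in (0.01, 0.06):
        print(f"pool fraction {fraction:.2%} -> expected VAF {expected_vaf(fraction):.2%}")

    # Hypothetical site: 2000x depth, assumed 0.1% error rate, 10 alt reads.
    print(f"error-only tail probability: {binomial_tail(10, 2000, 0.001):.3g}")
```

With a flat error rate, the tail probability shrinks quickly as the alternate read count rises above the expected number of errors. A position-specific model replaces the single `error_rate` with a per-site estimate.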

Original language: English (US)
Pages (from-to): 637-646
Number of pages: 10
Journal: Journal of Computational Biology
Volume: 24
Issue number: 7
DOIs: 10.1089/cmb.2017.0057
State: Published - Jul 1 2017

Fingerprint

Nucleotides
Sequencing
Low Frequency
Tumors
Tumor
Throughput
High Throughput
Neoplasms
Mutation
Bioinformatics
Benchmarking
High-Throughput Nucleotide Sequencing
Target
Learning systems
Error Model
Pooling
Computational Modeling
Generalized Linear Model
DNA
Computational Biology

Keywords

  • low frequency SNVs
  • machine learning
  • next-generation sequencing
  • sequencing error modeling
  • somatic mutation

ASJC Scopus subject areas

  • Modeling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

RareVar: A Framework for Detecting Low-Frequency Single-Nucleotide Variants. / Hao, Yangyang; Xuei, Xiaoling; Li, Lang; Nakshatri, Harikrishna; Edenberg, Howard; Liu, Yunlong.

In: Journal of Computational Biology, Vol. 24, No. 7, 01.07.2017, p. 637-646.

@article{5892c7bc8d4f4fa884092b33ded9c24e,
title = "RareVar: A Framework for Detecting Low-Frequency Single-Nucleotide Variants",
abstract = "Accurate identification of low-frequency somatic point mutations in tumor samples has important clinical utilities. Although high-throughput sequencing technology enables capturing such variants while sequencing primary tumor samples, our ability for accurate detection is compromised when the variant frequency is close to the sequencer error rate. Most current experimental and bioinformatic strategies target mutations with ≥5{\%} allele frequency, which limits our ability to understand the cancer etiology and tumor evolution. We present an experimental and computational modeling framework, RareVar, to reliably identify low-frequency single-nucleotide variants from high-throughput sequencing data under standard experimental protocols. RareVar protocol includes a benchmark design by pooling DNAs from already sequenced individuals at various concentrations to target variants at desired frequencies, 0.5{\%}-3{\%} in our case. By applying a generalized, linear model-based, position-specific error model, followed by machine-learning-based variant calibration, our approach outperforms existing methods. Our method can be applied on most capture and sequencing platforms without modifying the experimental protocol.",
keywords = "low frequency SNVs, machine learning, next-generation sequencing, sequencing error modeling, somatic mutation",
author = "Yangyang Hao and Xiaoling Xuei and Lang Li and Harikrishna Nakshatri and Howard Edenberg and Yunlong Liu",
year = "2017",
month = "7",
day = "1",
doi = "10.1089/cmb.2017.0057",
language = "English (US)",
volume = "24",
pages = "637--646",
journal = "Journal of Computational Biology",
issn = "1066-5277",
publisher = "Mary Ann Liebert Inc.",
number = "7",

}

TY - JOUR

T1 - RareVar: A Framework for Detecting Low-Frequency Single-Nucleotide Variants

AU - Hao, Yangyang

AU - Xuei, Xiaoling

AU - Li, Lang

AU - Nakshatri, Harikrishna

AU - Edenberg, Howard

AU - Liu, Yunlong

PY - 2017/7/1

Y1 - 2017/7/1

AB - Accurate identification of low-frequency somatic point mutations in tumor samples has important clinical utilities. Although high-throughput sequencing technology enables capturing such variants while sequencing primary tumor samples, our ability for accurate detection is compromised when the variant frequency is close to the sequencer error rate. Most current experimental and bioinformatic strategies target mutations with ≥5% allele frequency, which limits our ability to understand the cancer etiology and tumor evolution. We present an experimental and computational modeling framework, RareVar, to reliably identify low-frequency single-nucleotide variants from high-throughput sequencing data under standard experimental protocols. RareVar protocol includes a benchmark design by pooling DNAs from already sequenced individuals at various concentrations to target variants at desired frequencies, 0.5%-3% in our case. By applying a generalized, linear model-based, position-specific error model, followed by machine-learning-based variant calibration, our approach outperforms existing methods. Our method can be applied on most capture and sequencing platforms without modifying the experimental protocol.

KW - low frequency SNVs

KW - machine learning

KW - next-generation sequencing

KW - sequencing error modeling

KW - somatic mutation

UR - http://www.scopus.com/inward/record.url?scp=85021758780&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85021758780&partnerID=8YFLogxK

U2 - 10.1089/cmb.2017.0057

DO - 10.1089/cmb.2017.0057

M3 - Article

C2 - 28541743

AN - SCOPUS:85021758780

VL - 24

SP - 637

EP - 646

JO - Journal of Computational Biology

JF - Journal of Computational Biology

SN - 1066-5277

IS - 7

ER -