ResSeq

Enhancing Short-Read Sequencing Alignment by Rescuing Error-Containing Reads

Weixing Feng, Peichao Sang, Deyuan Lian, Yansheng Dong, Fengfei Song, Meng Li, Bo He, Fenglin Cao, Yunlong Liu

Research output: Contribution to journal, Article

Abstract

Next-generation short-read sequencing is widely used in genomic studies. Biological applications require an alignment step that maps sequencing reads to the reference genome before the expected genomic information can be extracted. This requirement makes alignment accuracy a key factor for effective biological interpretation. To account for measurement errors and single nucleotide polymorphisms, short-read mappings with a few mismatches are generally considered acceptable. To further improve the efficiency of short-read sequencing alignment, we propose a Bayesian-based method that retrieves additional reliably aligned reads from among those with more than a pre-defined number of mismatches. In this method, we first retrieve the sequence context around the mismatched nucleotides within the already aligned reads; these loci contain the genomic features where sequencing errors occur. Then, using the derived pattern, we evaluate the remaining (typically discarded) reads with more than the allowed number of mismatches and calculate a score that represents the probability that a specific alignment is correct. This strategy allows more reliably aligned reads to be extracted, thereby improving alignment sensitivity. Implementation: The source code of our tool, ResSeq, can be downloaded from https://github.com/hrbeubiocenter/Resseq.
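
The two steps described in the abstract can be illustrated with a minimal sketch. It is not the ResSeq implementation, and the abstract does not specify the model; everything below is an assumption made for illustration: the function names (`context`, `learn_error_rates`, `alignment_posterior`), the 3-base reference context, the Laplace pseudocount, the prior of 0.5, and the uniform-base model for an incorrect alignment (each position matches with probability 0.25). Reads are assumed to be ungapped and the same length as the reference window they align to.

```python
from collections import defaultdict
from math import exp, log


def context(ref, i, flank=1):
    """Reference bases around position i, padded with 'N' at the ends."""
    padded = "N" * flank + ref + "N" * flank
    return padded[i:i + 2 * flank + 1]


def learn_error_rates(alignments, flank=1, pseudo=1.0):
    """Step 1: estimate P(mismatch | context) from accepted (read, ref) pairs.

    Returns the per-context rates and an overall rate used for unseen contexts.
    """
    mism = defaultdict(float)
    total = defaultdict(float)
    for read, ref in alignments:
        for i, (r, g) in enumerate(zip(read, ref)):
            c = context(ref, i, flank)
            total[c] += 1
            if r != g:
                mism[c] += 1
    # Laplace smoothing keeps every rate strictly between 0 and 1.
    overall = (sum(mism.values()) + pseudo) / (sum(total.values()) + 2 * pseudo)
    rates = {c: (mism[c] + pseudo) / (total[c] + 2 * pseudo) for c in total}
    return rates, overall


def alignment_posterior(read, ref, rates, default_rate, prior=0.5, flank=1):
    """Step 2: posterior probability that a candidate alignment is correct.

    Hypothesis "correct": mismatches follow the learned context error rates.
    Hypothesis "incorrect" (assumed model): bases match by chance with p = 0.25.
    """
    log_correct = log(prior)
    log_random = log(1.0 - prior)
    for i, (r, g) in enumerate(zip(read, ref)):
        p = rates.get(context(ref, i, flank), default_rate)
        if r != g:
            log_correct += log(p)
            log_random += log(0.75)
        else:
            log_correct += log(1.0 - p)
            log_random += log(0.25)
    # Bayes' rule in log space; the clamp avoids overflow in exp().
    diff = min(log_random - log_correct, 700.0)
    return 1.0 / (1.0 + exp(diff))
```

Under these assumptions, a discarded read whose mismatches fall in contexts that are frequently mismatched in the accepted reads scores higher than a read with the same number of mismatches in contexts that are rarely mismatched, which is the behavior the abstract describes. A realistic prior and a threshold on the score would have to be chosen for the data at hand.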

Original language: English (US)
Article number: 6942207
Pages (from-to): 795-798
Number of pages: 4
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 12
Issue number: 4
DOI: 10.1109/TCBB.2014.2366103
State: Published - Jul 1 2015
Externally published: Yes

Fingerprint

Sequencing
Alignment
Bayes Theorem
Biological Factors
Genomics
Single Nucleotide Polymorphism
Nucleotides
Genome
Polymorphism
Measurement errors
Locus
Genes
Calculate
Evaluate
Requirements

Keywords

  • Alignment
  • Error Analysis
  • Sequencing
  • Short-Read

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Applied Mathematics

Cite this

ResSeq: Enhancing Short-Read Sequencing Alignment by Rescuing Error-Containing Reads. / Feng, Weixing; Sang, Peichao; Lian, Deyuan; Dong, Yansheng; Song, Fengfei; Li, Meng; He, Bo; Cao, Fenglin; Liu, Yunlong.

In: IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 12, No. 4, 6942207, 01.07.2015, p. 795-798.

Feng, Weixing; Sang, Peichao; Lian, Deyuan; Dong, Yansheng; Song, Fengfei; Li, Meng; He, Bo; Cao, Fenglin; Liu, Yunlong. / ResSeq: Enhancing Short-Read Sequencing Alignment by Rescuing Error-Containing Reads. In: IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2015; Vol. 12, No. 4. pp. 795-798.
@article{fb6d92d7f23e4617b6d5c134fca46a5d,
title = "ResSeq: Enhancing Short-Read Sequencing Alignment by Rescuing Error-Containing Reads",
keywords = "Alignment, Error Analysis, Sequencing, Short-Read",
author = "Weixing Feng and Peichao Sang and Deyuan Lian and Yansheng Dong and Fengfei Song and Meng Li and Bo He and Fenglin Cao and Yunlong Liu",
year = "2015",
month = "7",
day = "1",
doi = "10.1109/TCBB.2014.2366103",
language = "English (US)",
volume = "12",
pages = "795--798",
journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",
issn = "1545-5963",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "4",

}

TY - JOUR

T1 - ResSeq

T2 - Enhancing Short-Read Sequencing Alignment by Rescuing Error-Containing Reads

AU - Feng, Weixing

AU - Sang, Peichao

AU - Lian, Deyuan

AU - Dong, Yansheng

AU - Song, Fengfei

AU - Li, Meng

AU - He, Bo

AU - Cao, Fenglin

AU - Liu, Yunlong

PY - 2015/7/1

Y1 - 2015/7/1

KW - Alignment

KW - Error Analysis

KW - Sequencing

KW - Short-Read

UR - http://www.scopus.com/inward/record.url?scp=84939168179&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84939168179&partnerID=8YFLogxK

U2 - 10.1109/TCBB.2014.2366103

DO - 10.1109/TCBB.2014.2366103

M3 - Article

VL - 12

SP - 795

EP - 798

JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics

JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics

SN - 1545-5963

IS - 4

M1 - 6942207

ER -