Transcriptome-guided amyloid imaging genetic analysis via a novel structured sparse learning algorithm

Jingwen Yan, Lei Du, Sungeun Kim, Shannon L. Risacher, Heng Huang, Jason H. Moore, Andrew Saykin, Li Shen

Research output: Contribution to journal › Article

24 Citations (Scopus)

Abstract

Motivation: Imaging genetics is an emerging field that studies the influence of genetic variation on brain structure and function. The major task is to examine the association between genetic markers such as single-nucleotide polymorphisms (SNPs) and quantitative traits (QTs) extracted from neuroimaging data. The complexity of these datasets has presented critical bioinformatics challenges that require new enabling tools. Sparse canonical correlation analysis (SCCA) is a bi-multivariate technique used in imaging genetics to identify complex multi-SNP-multi-QT associations. However, most existing SCCA algorithms are designed using the soft thresholding method, which assumes that the input features are independent of one another. This assumption clearly does not hold for imaging genetic data. In this article, we propose a new knowledge-guided SCCA algorithm (KG-SCCA) to overcome this limitation and to improve learning results by incorporating valuable prior knowledge.

Results: The proposed KG-SCCA method is able to model two types of prior knowledge: one as a group structure (e.g. linkage disequilibrium blocks among SNPs) and the other as a network structure (e.g. gene co-expression network among brain regions). The new model incorporates these prior structures by introducing new regularization terms to encourage weight similarity between grouped or connected features. A new algorithm is designed to solve the KG-SCCA model without imposing the independence constraint on the input features. We demonstrate the effectiveness of our algorithm with both synthetic and real data. For real data, using an Alzheimer's disease (AD) cohort, we examine the imaging genetic associations between all SNPs in the APOE gene (i.e. top AD gene) and amyloid deposition measures among cortical regions (i.e. a major AD hallmark). In comparison with a widely used SCCA implementation, our KG-SCCA algorithm produces not only improved cross-validation performance but also biologically meaningful results.
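
As a rough illustration of the two kinds of regularization the abstract describes, the sketch below evaluates a penalized SCCA-style objective in Python with NumPy. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the specific penalty forms (a group-lasso-style sum of per-group L2 norms, and a weighted squared difference across network edges), and the default lambda values are all illustrative choices that do not come from the abstract. The usual SCCA normalization constraints and the KG-SCCA solver itself are omitted.

```python
import numpy as np


def group_penalty(u, groups):
    """Group-structure penalty (assumed form): sum of L2 norms per group.

    `groups` is a list of index lists, e.g. one list per LD block of SNPs.
    """
    return sum(np.linalg.norm(u[idx]) for idx in groups)


def network_penalty(v, edges):
    """Network-structure penalty (assumed form): weighted squared differences.

    `edges` is an iterable of (i, j, w) where w is a signed edge weight,
    e.g. a co-expression correlation between brain regions i and j.
    Connected features are pushed toward similar (or opposite, if w < 0) weights.
    """
    return sum(abs(w) * (v[i] - np.sign(w) * v[j]) ** 2 for i, j, w in edges)


def penalized_objective(X, Y, u, v, groups, edges,
                        lam_group=0.1, lam_net=0.1, lam_l1=0.1):
    """Negative canonical covariance plus structured and L1 penalties.

    Lower is better. Normalization constraints on Xu and Yv are left out
    to keep the sketch short.
    """
    fit = -float((X @ u) @ (Y @ v))
    sparsity = np.abs(u).sum() + np.abs(v).sum()
    return (fit
            + lam_group * group_penalty(u, groups)
            + lam_net * network_penalty(v, edges)
            + lam_l1 * sparsity)
```

In this sketch, `X` would hold the SNP data (samples by SNPs) and `Y` the imaging QTs (samples by regions); any optimizer that minimizes `penalized_objective` over `u` and `v` would play the role of the solver, which is deliberately left unspecified here.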

Original language: English
Journal: Bioinformatics
Volume: 30
Issue number: 17
DOI: 10.1093/bioinformatics/btu465
State: Published - Sep 1 2014

Fingerprint

Transcriptome
Amyloid
Learning algorithms
Learning Algorithm
Imaging
Learning
Imaging techniques
Canonical Correlation Analysis
Single nucleotide Polymorphism
Nucleotides
Polymorphism
Alzheimer's Disease
Single Nucleotide Polymorphism
Alzheimer Disease
Genes
Gene
Prior Knowledge
Brain
Neuroimaging
Genetic Association

ASJC Scopus subject areas

  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability
  • Biochemistry
  • Medicine(all)

Cite this

Transcriptome-guided amyloid imaging genetic analysis via a novel structured sparse learning algorithm. / Yan, Jingwen; Du, Lei; Kim, Sungeun; Risacher, Shannon L.; Huang, Heng; Moore, Jason H.; Saykin, Andrew; Shen, Li.

In: Bioinformatics, Vol. 30, No. 17, 01.09.2014.

Yan, Jingwen ; Du, Lei ; Kim, Sungeun ; Risacher, Shannon L. ; Huang, Heng ; Moore, Jason H. ; Saykin, Andrew ; Shen, Li. / Transcriptome-guided amyloid imaging genetic analysis via a novel structured sparse learning algorithm. In: Bioinformatics. 2014 ; Vol. 30, No. 17.
@article{dacd2bcc1afe4be3913bef315fefd716,
title = "Transcriptome-guided amyloid imaging genetic analysis via a novel structured sparse learning algorithm",
author = "Jingwen Yan and Lei Du and Sungeun Kim and Risacher, {Shannon L.} and Heng Huang and Moore, {Jason H.} and Andrew Saykin and Li Shen",
year = "2014",
month = "9",
day = "1",
doi = "10.1093/bioinformatics/btu465",
language = "English",
volume = "30",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "17",

}

TY - JOUR

T1 - Transcriptome-guided amyloid imaging genetic analysis via a novel structured sparse learning algorithm

AU - Yan, Jingwen

AU - Du, Lei

AU - Kim, Sungeun

AU - Risacher, Shannon L.

AU - Huang, Heng

AU - Moore, Jason H.

AU - Saykin, Andrew

AU - Shen, Li

PY - 2014/9/1

Y1 - 2014/9/1

UR - http://www.scopus.com/inward/record.url?scp=84907030081&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84907030081&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btu465

DO - 10.1093/bioinformatics/btu465

M3 - Article

VL - 30

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 17

ER -