Identifying quantitative trait loci via group-sparse multitask regression and feature selection

An imaging genetics study of the ADNI cohort

Hua Wang, Feiping Nie, Heng Huang, Sungeun Kim, Kwangsik Nho, Shannon L. Risacher, Andrew Saykin, Li Shen

Research output: Contribution to journal (Article)

80 Citations (Scopus)

Abstract

Motivation: Recent advances in high-throughput genotyping and brain imaging techniques enable new approaches to study the influence of genetic variation on brain structures and functions. Traditional association studies typically employ independent and pairwise univariate analysis, which treats single nucleotide polymorphisms (SNPs) and quantitative traits (QTs) as isolated units and ignores important underlying interacting relationships between the units. New methods are proposed here to overcome this limitation. Results: Taking into account the interlinked structure within and between SNPs and imaging QTs, we propose a novel Group-Sparse Multi-task Regression and Feature Selection (G-SMuRFS) method to identify quantitative trait loci for multiple disease-relevant QTs and apply it to a study in mild cognitive impairment and Alzheimer's disease. Built upon regression analysis, our model uses a new form of regularization, group ℓ2,1-norm (G2,1-norm), to incorporate the biological group structures among SNPs induced from their genetic arrangement. The new G2,1-norm considers the regression coefficients of all the SNPs in each group with respect to all the QTs together and enforces sparsity at the group level. In addition, an ℓ2,1-norm regularization is utilized to couple feature selection across multiple tasks to make use of the shared underlying mechanism among different brain regions. The effectiveness of the proposed method is demonstrated by both clearly improved prediction performance in empirical evaluations and a compact set of selected SNP predictors relevant to the imaging QTs.
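The two regularizers named in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so the definitions below are assumptions based on its wording: the coefficient matrix W has one row per SNP and one column per QT, the ℓ2,1-norm sums the Euclidean norms of the rows, and the G2,1-norm sums the Frobenius norms of the row blocks belonging to each SNP group. The function names, the toy matrix, and the grouping are hypothetical and are not taken from the paper.

```python
import numpy as np

def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (one row per SNP).

    Assumed form of the l2,1-norm: it shrinks whole rows, so a SNP is
    kept or dropped jointly across all QTs (tasks).
    """
    return float(np.sum(np.linalg.norm(W, axis=1)))

def group_l21_norm(W, groups):
    """Sum over SNP groups of the Frobenius norm of each group's rows.

    Assumed form of the G2,1-norm: all coefficients of a group, across
    all QTs, are penalized together, which favors group-level sparsity.
    """
    return float(sum(np.linalg.norm(W[idx, :]) for idx in groups))

# Toy coefficient matrix: 4 SNPs (rows) x 2 QTs (columns).
W = np.array([[3.0, 0.0],
              [0.0, 4.0],
              [0.0, 0.0],
              [0.0, 0.0]])

# Hypothetical grouping of the SNPs (e.g. by gene): rows 0-1 and rows 2-3.
groups = [[0, 1], [2, 3]]

row_penalty = l21_norm(W)                 # 3 + 4 + 0 + 0
group_penalty = group_l21_norm(W, groups) # sqrt(3^2 + 4^2) + 0

print(row_penalty, group_penalty)
```

In this toy case the second group contributes nothing to either penalty, which is the kind of group-level sparsity the abstract describes. How the two penalties are weighted and combined with the regression loss is not stated in the abstract and is left out here.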

Original language: English
Article number: btr649
Pages (from-to): 229-237
Number of pages: 9
Journal: Bioinformatics
Volume: 28
Issue number: 2
DOI: 10.1093/bioinformatics/btr649
State: Published - Jan 2012

Fingerprint

Quantitative Trait Loci
Single nucleotide Polymorphism
Nucleotides
Polymorphism
Feature Selection
Single Nucleotide Polymorphism
Feature extraction
Cohort Studies
Regression
Imaging
Imaging techniques
Norm
Brain
Regularization
Group Structure
Genetic Variation
Unit
Alzheimer's Disease
Performance Prediction
Regression Coefficient

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability
  • Medicine(all)

Cite this

Identifying quantitative trait loci via group-sparse multitask regression and feature selection : An imaging genetics study of the ADNI cohort. / Wang, Hua; Nie, Feiping; Huang, Heng; Kim, Sungeun; Nho, Kwangsik; Risacher, Shannon L.; Saykin, Andrew; Shen, Li.

In: Bioinformatics, Vol. 28, No. 2, btr649, 01.2012, p. 229-237.

@article{5472f715ae3e40228aa941a764d0e887,
title = "Identifying quantitative trait loci via group-sparse multitask regression and feature selection: An imaging genetics study of the ADNI cohort",
abstract = "Motivation: Recent advances in high-throughput genotyping and brain imaging techniques enable new approaches to study the influence of genetic variation on brain structures and functions. Traditional association studies typically employ independent and pairwise univariate analysis, which treats single nucleotide polymorphisms (SNPs) and quantitative traits (QTs) as isolated units and ignores important underlying interacting relationships between the units. New methods are proposed here to overcome this limitation. Results: Taking into account the interlinked structure within and between SNPs and imaging QTs, we propose a novel Group-Sparse Multi-task Regression and Feature Selection (G-SMuRFS) method to identify quantitative trait loci for multiple disease-relevant QTs and apply it to a study in mild cognitive impairment and Alzheimer's disease. Built upon regression analysis, our model uses a new form of regularization, group ℓ2,1-norm (G2,1-norm), to incorporate the biological group structures among SNPs induced from their genetic arrangement. The new G2,1-norm considers the regression coefficients of all the SNPs in each group with respect to all the QTs together and enforces sparsity at the group level. In addition, an ℓ2,1-norm regularization is utilized to couple feature selection across multiple tasks to make use of the shared underlying mechanism among different brain regions. The effectiveness of the proposed method is demonstrated by both clearly improved prediction performance in empirical evaluations and a compact set of selected SNP predictors relevant to the imaging QTs.",
author = "Hua Wang and Feiping Nie and Heng Huang and Sungeun Kim and Kwangsik Nho and Risacher, {Shannon L.} and Andrew Saykin and Li Shen",
year = "2012",
month = "1",
doi = "10.1093/bioinformatics/btr649",
language = "English",
volume = "28",
pages = "229--237",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "2",

}

TY - JOUR

T1 - Identifying quantitative trait loci via group-sparse multitask regression and feature selection

T2 - An imaging genetics study of the ADNI cohort

AU - Wang, Hua

AU - Nie, Feiping

AU - Huang, Heng

AU - Kim, Sungeun

AU - Nho, Kwangsik

AU - Risacher, Shannon L.

AU - Saykin, Andrew

AU - Shen, Li

PY - 2012/1

Y1 - 2012/1

N2 - Motivation: Recent advances in high-throughput genotyping and brain imaging techniques enable new approaches to study the influence of genetic variation on brain structures and functions. Traditional association studies typically employ independent and pairwise univariate analysis, which treats single nucleotide polymorphisms (SNPs) and quantitative traits (QTs) as isolated units and ignores important underlying interacting relationships between the units. New methods are proposed here to overcome this limitation. Results: Taking into account the interlinked structure within and between SNPs and imaging QTs, we propose a novel Group-Sparse Multi-task Regression and Feature Selection (G-SMuRFS) method to identify quantitative trait loci for multiple disease-relevant QTs and apply it to a study in mild cognitive impairment and Alzheimer's disease. Built upon regression analysis, our model uses a new form of regularization, group ℓ2,1-norm (G2,1-norm), to incorporate the biological group structures among SNPs induced from their genetic arrangement. The new G2,1-norm considers the regression coefficients of all the SNPs in each group with respect to all the QTs together and enforces sparsity at the group level. In addition, an ℓ2,1-norm regularization is utilized to couple feature selection across multiple tasks to make use of the shared underlying mechanism among different brain regions. The effectiveness of the proposed method is demonstrated by both clearly improved prediction performance in empirical evaluations and a compact set of selected SNP predictors relevant to the imaging QTs.

AB - Motivation: Recent advances in high-throughput genotyping and brain imaging techniques enable new approaches to study the influence of genetic variation on brain structures and functions. Traditional association studies typically employ independent and pairwise univariate analysis, which treats single nucleotide polymorphisms (SNPs) and quantitative traits (QTs) as isolated units and ignores important underlying interacting relationships between the units. New methods are proposed here to overcome this limitation. Results: Taking into account the interlinked structure within and between SNPs and imaging QTs, we propose a novel Group-Sparse Multi-task Regression and Feature Selection (G-SMuRFS) method to identify quantitative trait loci for multiple disease-relevant QTs and apply it to a study in mild cognitive impairment and Alzheimer's disease. Built upon regression analysis, our model uses a new form of regularization, group ℓ2,1-norm (G2,1-norm), to incorporate the biological group structures among SNPs induced from their genetic arrangement. The new G2,1-norm considers the regression coefficients of all the SNPs in each group with respect to all the QTs together and enforces sparsity at the group level. In addition, an ℓ2,1-norm regularization is utilized to couple feature selection across multiple tasks to make use of the shared underlying mechanism among different brain regions. The effectiveness of the proposed method is demonstrated by both clearly improved prediction performance in empirical evaluations and a compact set of selected SNP predictors relevant to the imaging QTs.

UR - http://www.scopus.com/inward/record.url?scp=84862970066&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84862970066&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btr649

DO - 10.1093/bioinformatics/btr649

M3 - Article

VL - 28

SP - 229

EP - 237

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 2

M1 - btr649

ER -