Mining massive SNP data for identifying associated SNPs and uncovering gene relationships

Amy Webb, Aaron Albin, Zhan Ye, Majid Rastegar-Mojarad, Kun Huang, Jeffrey Parvin, Wolfgang Sadee, Lang Li, Simon Lin, Yang Xiang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Studies of SNP correlations have focused on SNPs located on the same chromosome, since SNPs on different chromosomes are expected to segregate randomly. However, previous studies suggest that SNPs can be associated with each other over long distances and even across different chromosomes. To facilitate the study of SNP associations, our goal is to find SNPs that co-occur in a significant number of samples regardless of their genomic distance, and subsequently to study the relationships among these associated SNPs and their corresponding genes. Mining co-occurring SNP associations is computationally challenging, which motivated us to design FCIRC, an efficient data mining algorithm for mining SNP associations from massive SNP data. By applying our method to both the original SNP data and random chromosome permutation data, we demonstrate that it is able to find non-random SNP associations across multiple chromosomes. Many of the large number of associated SNPs identified by our method involve multiple chromosomes. Some SNP associations also suggest novel relationships among the corresponding genes, and some may imply biological and disease mechanisms related to those genes.
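
The idea of counting SNPs that co-occur across samples, and comparing against chromosome-permuted data, can be illustrated with a minimal Python sketch. This is not the FCIRC algorithm, whose details the abstract does not give. It is a naive pairwise counter, and the function names, the input layout (a dict mapping each sample to the set of SNP ids it carries), and the reading of "chromosome permutation" as shuffling one chromosome's SNP blocks among samples are all assumptions made for illustration.

```python
from itertools import combinations
import random


def cooccurring_pairs(samples, min_support):
    """Return SNP pairs carried together by at least `min_support` samples.

    `samples` maps a sample id to the set of SNP ids that sample carries
    (hypothetical layout, not taken from the paper).
    """
    counts = {}
    for snps in samples.values():
        # Sort so each unordered pair has one canonical key.
        for pair in combinations(sorted(snps), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


def permute_chromosome(samples, snp_chrom, chrom, rng):
    """Shuffle one chromosome's SNP blocks among samples (assumed null model).

    Within-chromosome structure is kept, while links between `chrom`
    and every other chromosome are randomized.
    """
    ids = list(samples)
    blocks = [{s for s in samples[i] if snp_chrom[s] == chrom} for i in ids]
    rng.shuffle(blocks)
    return {
        i: {s for s in samples[i] if snp_chrom[s] != chrom} | block
        for i, block in zip(ids, blocks)
    }


# Toy data: rsA sits on chromosome 1, rsB and rsC on chromosome 2.
samples = {
    "s1": {"rsA", "rsB"},
    "s2": {"rsA", "rsB"},
    "s3": {"rsA", "rsC"},
    "s4": {"rsC"},
}
snp_chrom = {"rsA": "1", "rsB": "2", "rsC": "2"}

observed = cooccurring_pairs(samples, min_support=2)
shuffled = permute_chromosome(samples, snp_chrom, "2", random.Random(0))
baseline = cooccurring_pairs(shuffled, min_support=2)
```

Pairs that reach the support threshold in `observed` but rarely do so across many shuffled baselines would be candidates for non-random cross-chromosome association. The pairwise enumeration here grows quadratically with the number of SNPs per sample, which is the kind of cost that motivates a dedicated mining algorithm for massive data.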

Original language: English
Title of host publication: ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
Publisher: Association for Computing Machinery, Inc
Pages: 304-313
Number of pages: 10
ISBN (Print): 9781450328944
DOI: https://doi.org/10.1145/2649387.264939
State: Published - Sep 20 2014
Event: 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM BCB 2014 - Newport Beach, United States
Duration: Sep 20 2014 to Sep 23 2014

Other

Other: 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM BCB 2014
Country: United States
City: Newport Beach
Period: 9/20/14 to 9/23/14

Fingerprint

Chromosomes
Single Nucleotide Polymorphism
Genes
Data mining

ASJC Scopus subject areas

  • Health Informatics
  • Computer Science Applications
  • Software
  • Biomedical Engineering

Cite this

Webb, A., Albin, A., Ye, Z., Rastegar-Mojarad, M., Huang, K., Parvin, J., ... Xiang, Y. (2014). Mining massive SNP data for identifying associated SNPs and uncovering gene relationships. In ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (pp. 304-313). Association for Computing Machinery, Inc. https://doi.org/10.1145/2649387.264939

@inproceedings{0b866c4e31b7416eafc15d25b899f558,
title = "Mining massive SNP data for identifying associated SNPs and uncovering gene relationships",
abstract = "Studies on SNP correlations have been focused on SNPs located on the same chromosome since SNPs on different chromosomes are expected to segregate randomly. Previous studies suggest that SNPs can be associated with each other over long distances and even across different chromosomes. To facilitate the study of SNP associations, our goal is to find SNPs that coexist in a significant number of samples regardless of their genomic distance, and subsequently to study the relationships among these associated SNPs and corresponding genes. This problem of mining co-occurrent SNP associations is computationally challenging and motivates us to design an efficient data mining algorithm FCIRC to mine SNP associations from massive SNP data. By applying our method on the original SNP data and random chromosome permutation data, we demonstrate that our method is able to find non-random SNP associations across multiple chromosomes. Among the large amount of associated SNPs identified by our method, many of them involve multiple chromosomes. Some SNP associations also suggest novel relationships among the corresponding genes, and some may imply biological and disease mechanisms related to corresponding genes.",
author = "Amy Webb and Aaron Albin and Zhan Ye and Majid Rastegar-Mojarad and Kun Huang and Jeffrey Parvin and Wolfgang Sadee and Lang Li and Simon Lin and Yang Xiang",
year = "2014",
month = "9",
day = "20",
doi = "10.1145/2649387.264939",
language = "English",
isbn = "9781450328944",
pages = "304--313",
booktitle = "ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics",
publisher = "Association for Computing Machinery, Inc",

}

TY - GEN

T1 - Mining massive SNP data for identifying associated SNPs and uncovering gene relationships

AU - Webb, Amy

AU - Albin, Aaron

AU - Ye, Zhan

AU - Rastegar-Mojarad, Majid

AU - Huang, Kun

AU - Parvin, Jeffrey

AU - Sadee, Wolfgang

AU - Li, Lang

AU - Lin, Simon

AU - Xiang, Yang

PY - 2014/9/20

Y1 - 2014/9/20

AB - Studies on SNP correlations have been focused on SNPs located on the same chromosome since SNPs on different chromosomes are expected to segregate randomly. Previous studies suggest that SNPs can be associated with each other over long distances and even across different chromosomes. To facilitate the study of SNP associations, our goal is to find SNPs that coexist in a significant number of samples regardless of their genomic distance, and subsequently to study the relationships among these associated SNPs and corresponding genes. This problem of mining co-occurrent SNP associations is computationally challenging and motivates us to design an efficient data mining algorithm FCIRC to mine SNP associations from massive SNP data. By applying our method on the original SNP data and random chromosome permutation data, we demonstrate that our method is able to find non-random SNP associations across multiple chromosomes. Among the large amount of associated SNPs identified by our method, many of them involve multiple chromosomes. Some SNP associations also suggest novel relationships among the corresponding genes, and some may imply biological and disease mechanisms related to corresponding genes.

UR - http://www.scopus.com/inward/record.url?scp=84920733525&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84920733525&partnerID=8YFLogxK

U2 - 10.1145/2649387.264939

DO - 10.1145/2649387.264939

M3 - Conference contribution

AN - SCOPUS:84920733525

SN - 9781450328944

SP - 304

EP - 313

BT - ACM BCB 2014 - 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics

PB - Association for Computing Machinery, Inc

ER -