High-dimensional omics data analysis using a variable screening protocol with prior knowledge integration (SKI)

Cong Liu, Jianping Jiang, Jianlei Gu, Zhangsheng Yu, Tao Wang, Hui Lu

Research output: Contribution to journal › Article

6 Scopus citations


Background: High-throughput technologies can generate thousands to millions of biomarker measurements in a single experiment. However, results from high-throughput analyses are often poorly reproducible because of small sample sizes. Various statistical methods have been proposed to tackle this "small n, large p" scenario; for example, different datasets can be pooled or integrated to improve reproducibility. However, raw data are often either unavailable or hard to integrate because of differing experimental conditions, so there is an emerging need for a method of "knowledge integration" in high-throughput data analysis.

Results: In this study, we propose an integrative prescreening approach, SKI, for high-throughput data analysis. A new rank is generated from two initial ranks: (1) a knowledge-based rank and (2) a marginal-correlation-based rank. Our simulations show that SKI outperforms methods without knowledge integration, achieving a higher true positive rate for the same number of selected variables. We also applied our method in a drug response study and found that it performed better than regular screening methods.

Conclusion: The proposed method provides an effective way to integrate knowledge into high-throughput analysis. It can be easily implemented with our R package, SKI.
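The rank combination described in the Results can be illustrated with a minimal Python sketch. This is not the SKI R package and not necessarily the authors' formula: the abstract does not state how the two ranks are merged, so the weighted geometric mean used here, the weight `alpha`, and all function names are assumptions made purely for illustration. The simulated data and the "prior knowledge" scores are likewise made up.

```python
import numpy as np


def rank_descending(scores):
    """Return ranks 1..p where rank 1 is the largest score (ties broken by position)."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranks = np.empty(len(order), dtype=int)
    ranks[order] = np.arange(1, len(order) + 1)
    return ranks


def marginal_correlation_rank(X, y):
    """Rank variables by absolute Pearson correlation with the response.

    Assumes no column of X is constant (otherwise the denominator is zero).
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = (Xc * yc[:, None]).sum(axis=0) / denom
    return rank_descending(np.abs(corr))


def combined_rank(knowledge_rank, correlation_rank, alpha=0.5):
    """Hypothetical combination rule: weighted geometric mean of the two ranks.

    alpha = 1 uses only the knowledge-based rank, alpha = 0 only the
    marginal-correlation rank. This rule is an assumption, not taken from the paper.
    """
    score = alpha * np.log(knowledge_rank) + (1 - alpha) * np.log(correlation_rank)
    return rank_descending(-score)  # a smaller combined score means a better rank


# Toy "small n, large p" data: 50 samples, 200 variables, the first 5 are truly active.
rng = np.random.default_rng(0)
n, p, n_true, k = 50, 200, 5, 20
X = rng.normal(size=(n, p))
y = X[:, :n_true].sum(axis=1) + rng.normal(size=n)

# Made-up prior knowledge: noisy scores that tend to be higher for the active variables.
knowledge_scores = rng.normal(size=p)
knowledge_scores[:n_true] += 2.0

k_rank = rank_descending(knowledge_scores)
c_rank = marginal_correlation_rank(X, y)
final_rank = combined_rank(k_rank, c_rank, alpha=0.5)

# Keep the k best-ranked variables for downstream modelling.
selected = np.flatnonzero(final_rank <= k)
print("selected variable indices:", selected)
print("active variables recovered:", int((selected < n_true).sum()), "of", n_true)
```

Varying `alpha` between 0 and 1 shows how much the selection leans on prior knowledge versus the data at hand; with uninformative prior scores, a small `alpha` would be the safer choice.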

Original language: English (US)
Article number: 118
Journal: BMC Systems Biology
State: Published - Dec 23 2016

Keywords

  • Dimension reduction
  • Knowledge integration
  • SKI
  • Sure independence screening
  • Variable selection

ASJC Scopus subject areas

  • Structural Biology
  • Modeling and Simulation
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics
