Biomedical case studies in data intensive computing

Geoffrey Fox, Xiaohong Qiu, Scott Beason, Jong Choi, Jaliya Ekanayake, Thilina Gunarathne, Mina Rho, Haixu Tang, Neil Devadasan, Gilbert Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

18 Scopus citations

Abstract

Many areas of science are seeing a data deluge from new instruments, myriad sensors, and exponential growth in electronic records. We take two examples: the analysis of gene sequence data (35,339 Alu sequences), and a study of medical information (over 100,000 patient records) in Indianapolis and its relationship to Geographic Information System and Census data available for 635 Census Blocks in Indianapolis. We look at initial processing (such as Smith-Waterman dissimilarities), clustering (using robust deterministic annealing), and Multidimensional Scaling to map high-dimensional data to 3D for convenient visualization. We show how scaling pipelines can be produced and implemented using either cloud technologies or MPI, and we compare the two. This study illustrates the challenges of integrating data exploration tools that have a variety of architectural requirements and natural programming models. We present preliminary results for an end-to-end study of two complete applications.
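The initial processing step named in the abstract, pairwise Smith-Waterman dissimilarities, can be illustrated with a small sketch. This is not the authors' implementation: the scoring parameters (match, mismatch, linear gap penalty), the function names, and the normalization of a score into a dissimilarity in [0, 1] are all assumptions made for illustration only.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b.

    Scoring values are illustrative assumptions; gaps use a simple
    linear penalty rather than an affine one.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def dissimilarity(a, b):
    """Hypothetical normalization: 0 when one sequence fully matches
    inside the other, 1 when there is no local similarity."""
    self_score = min(smith_waterman_score(a, a), smith_waterman_score(b, b))
    if self_score == 0:
        return 1.0  # empty sequence: treat as maximally dissimilar
    return 1.0 - smith_waterman_score(a, b) / self_score


def dissimilarity_matrix(seqs):
    """Symmetric all-pairs matrix; every pair is computed independently."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = dissimilarity(seqs[i], seqs[j])
    return d
```

Because each pair of sequences is scored independently of every other pair, the all-pairs computation can be split into blocks and distributed, which is the kind of structure that suits both MapReduce-style and MPI implementations.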

Original language: English (US)
Title of host publication: Cloud Computing - First International Conference, CloudCom 2009, Proceedings
Pages: 2-18
Number of pages: 17
DOI: https://doi.org/10.1007/978-3-642-10665-1_2
State: Published - Dec 16 2009
Event: 1st International Conference on Cloud Computing, CloudCom 2009 - Beijing, China
Duration: Dec 1 2009 to Dec 4 2009

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5931 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 1st International Conference on Cloud Computing, CloudCom 2009
Country: China
City: Beijing
Period: 12/1/09 to 12/4/09

Keywords

  • Clouds
  • Clustering
  • Dryad
  • Hadoop
  • MapReduce
  • MPI
  • Sequencing

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Fox, G., Qiu, X., Beason, S., Choi, J., Ekanayake, J., Gunarathne, T., Rho, M., Tang, H., Devadasan, N., & Liu, G. (2009). Biomedical case studies in data intensive computing. In Cloud Computing - First International Conference, CloudCom 2009, Proceedings (pp. 2-18). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5931 LNCS). https://doi.org/10.1007/978-3-642-10665-1_2