Word Adjacency Graph Modeling: Separating Signal From Noise in Big Data

Research output: Contribution to journal (Article)

6 Citations (Scopus)

Abstract

There is a need to develop methods to analyze Big Data to inform patient-centered interventions for better health outcomes. The purpose of this study was to develop and test a method to explore Big Data to describe salient health concerns of people with epilepsy. Specifically, we used Word Adjacency Graph modeling to explore a data set containing 1.9 billion anonymous text queries submitted to the ChaCha question and answer service to (a) detect clusters of epilepsy-related topics, and (b) visualize the range of epilepsy-related topics and their mutual proximity to uncover the breadth and depth of particular topics and groups of users. Applied to a large, complex data set, this method successfully identified clusters of epilepsy-related topics while allowing for separation of potentially non-relevant topics. The method can be used to identify patient-driven research questions from large social media data sets and results can inform the development of patient-centered interventions.
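The abstract does not spell out how the Word Adjacency Graph is built, so the following is only a minimal sketch under stated assumptions: words are nodes, an undirected edge is weighted by how often two words appear next to each other in a query, and clusters are approximated as connected components after weak edges are dropped. The function names, the `min_weight` threshold, and the toy queries are all hypothetical and are not taken from the article or the ChaCha data set.

```python
from collections import Counter, defaultdict


def build_adjacency_graph(queries):
    """Count how often two distinct words appear side by side in any query."""
    edges = Counter()
    for query in queries:
        words = query.lower().split()
        for left, right in zip(words, words[1:]):
            if left != right:
                edges[frozenset((left, right))] += 1
    return edges


def clusters(edges, min_weight=1):
    """Group words into connected components, ignoring edges below min_weight."""
    neighbors = defaultdict(set)
    for pair, weight in edges.items():
        if weight >= min_weight:
            a, b = tuple(pair)
            neighbors[a].add(b)
            neighbors[b].add(a)

    seen, groups = set(), []
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first walk over the words reachable from `start`.
        stack, group = [start], set()
        while stack:
            word = stack.pop()
            if word in group:
                continue
            group.add(word)
            stack.extend(neighbors[word] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical toy queries (assumption: illustrative only).
queries = [
    "seizure medication side effects",
    "seizure medication dosage",
    "pizza delivery hours",
    "pizza delivery near me",
]
edges = build_adjacency_graph(queries)
topic_clusters = clusters(edges, min_weight=2)
```

Raising `min_weight` is one simple way to separate recurring word pairs from incidental ones in this sketch; the published method may use a different criterion.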

Original language: English (US)
Pages (from-to): 166-185
Number of pages: 20
Journal: Western Journal of Nursing Research
Volume: 39
Issue number: 1
DOI: 10.1177/0193945916670363
State: Published - Jan 1 2017

Fingerprint

Epilepsy
Social Media
Health
Research
Datasets

Keywords

  • Big Data
  • epilepsy
  • informatics
  • machine learning
  • methods

ASJC Scopus subject areas

  • Nursing(all)

Cite this

Word Adjacency Graph Modeling: Separating Signal From Noise in Big Data. / Miller, Wendy; Groves, Doyle; Knopf, Amy; Otte, Julie; Silverman, Ross.

In: Western Journal of Nursing Research, Vol. 39, No. 1, 01.01.2017, p. 166-185.

@article{75914706632a47ad922f04177e3b3320,
title = "Word Adjacency Graph Modeling: Separating Signal From Noise in Big Data",
abstract = "There is a need to develop methods to analyze Big Data to inform patient-centered interventions for better health outcomes. The purpose of this study was to develop and test a method to explore Big Data to describe salient health concerns of people with epilepsy. Specifically, we used Word Adjacency Graph modeling to explore a data set containing 1.9 billion anonymous text queries submitted to the ChaCha question and answer service to (a) detect clusters of epilepsy-related topics, and (b) visualize the range of epilepsy-related topics and their mutual proximity to uncover the breadth and depth of particular topics and groups of users. Applied to a large, complex data set, this method successfully identified clusters of epilepsy-related topics while allowing for separation of potentially non-relevant topics. The method can be used to identify patient-driven research questions from large social media data sets and results can inform the development of patient-centered interventions.",
keywords = "Big Data, epilepsy, informatics, machine learning, methods",
author = "Wendy Miller and Doyle Groves and Amy Knopf and Julie Otte and Ross Silverman",
year = "2017",
month = jan,
day = "1",
doi = "10.1177/0193945916670363",
language = "English (US)",
volume = "39",
pages = "166--185",
journal = "Western Journal of Nursing Research",
issn = "0193-9459",
publisher = "SAGE Publications Inc.",
number = "1",
}

TY  - JOUR
T1  - Word Adjacency Graph Modeling
T2  - Separating Signal From Noise in Big Data
AU  - Miller, Wendy
AU  - Groves, Doyle
AU  - Knopf, Amy
AU  - Otte, Julie
AU  - Silverman, Ross
PY  - 2017/1/1
Y1  - 2017/1/1
AB  - There is a need to develop methods to analyze Big Data to inform patient-centered interventions for better health outcomes. The purpose of this study was to develop and test a method to explore Big Data to describe salient health concerns of people with epilepsy. Specifically, we used Word Adjacency Graph modeling to explore a data set containing 1.9 billion anonymous text queries submitted to the ChaCha question and answer service to (a) detect clusters of epilepsy-related topics, and (b) visualize the range of epilepsy-related topics and their mutual proximity to uncover the breadth and depth of particular topics and groups of users. Applied to a large, complex data set, this method successfully identified clusters of epilepsy-related topics while allowing for separation of potentially non-relevant topics. The method can be used to identify patient-driven research questions from large social media data sets and results can inform the development of patient-centered interventions.
KW  - Big Data
KW  - epilepsy
KW  - informatics
KW  - machine learning
KW  - methods
UR  - http://www.scopus.com/inward/record.url?scp=85006506890&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85006506890&partnerID=8YFLogxK
U2  - 10.1177/0193945916670363
DO  - 10.1177/0193945916670363
M3  - Article
AN  - SCOPUS:85006506890
VL  - 39
SP  - 166
EP  - 185
JO  - Western Journal of Nursing Research
JF  - Western Journal of Nursing Research
SN  - 0193-9459
IS  - 1
ER  - 