Menopause and big data: Word Adjacency Graph modeling of menopause-related ChaCha data

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

OBJECTIVE: To detect and visualize salient queries about menopause using Big Data from ChaCha. METHODS: We used Word Adjacency Graph (WAG) modeling to detect clusters and visualize the range of menopause-related topics and their mutual proximity. The subset of relevant queries was fully modeled. We split each query into token words (ie, meaningful words and phrases) and removed stopwords (ie, not meaningful functional words). The remaining words were considered in sequence to build summary tables of words and two and three-word phrases. Phrases occurring at least 10 times were used to build a network graph model that was iteratively refined by observing and removing clusters of unrelated content. RESULTS: We identified two menopause-related subsets of queries by searching for questions containing menopause and menopause-related terms (eg, climacteric, hot flashes, night sweats, hormone replacement). The first contained 263,363 queries from individuals aged 13 and older and the second contained 5,892 queries from women aged 40 to 62 years. In the first set, we identified 12 topic clusters: 6 relevant to menopause and 6 less relevant. In the second set, we identified 15 topic clusters: 11 relevant to menopause and 4 less relevant. Queries about hormones were pervasive within both WAG models. Many of the queries reflected low literacy levels and/or feelings of embarrassment. CONCLUSIONS: We modeled menopause-related queries posed by ChaCha users between 2009 and 2012. ChaCha data may be used on its own or in combination with other Big Data sources to identify patient-driven educational needs and create patient-centered interventions.
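The pipeline described under METHODS (tokenize, drop stopwords, count two- and three-word phrases, keep phrases seen at least 10 times, link them into a graph) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stopword list, the function names, and the use of connected components as a stand-in for cluster detection are all assumptions made for illustration, and the abstract does not say how clusters were actually identified or refined.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; the study's actual list is not given in the abstract.
STOPWORDS = {"a", "an", "the", "is", "are", "do", "does", "what", "how",
             "i", "to", "of", "for", "in", "can", "you", "my", "and"}


def tokenize(query):
    """Lowercase a query, split it into words, and drop stopwords."""
    words = re.findall(r"[a-z']+", query.lower())
    return [w for w in words if w not in STOPWORDS]


def count_phrases(queries, sizes=(2, 3)):
    """Count runs of consecutive remaining words for each phrase length."""
    counts = Counter()
    for query in queries:
        words = tokenize(query)
        for n in sizes:
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts


def build_graph(phrase_counts, min_count=10):
    """Link adjacent words of every phrase seen at least min_count times."""
    graph = defaultdict(set)
    for phrase, count in phrase_counts.items():
        if count < min_count:
            continue
        for left, right in zip(phrase, phrase[1:]):
            graph[left].add(right)
            graph[right].add(left)
    return graph


def clusters(graph):
    """Return connected components (an assumed stand-in for topic clusters)."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        result.append(component)
    return result


if __name__ == "__main__":
    # Made-up toy queries for illustration only; these are not ChaCha data.
    toy_queries = (["What are hot flashes?"] * 10
                   + ["Does hormone replacement help?"] * 10
                   + ["Is the climacteric painful"] * 3)
    graph = build_graph(count_phrases(toy_queries))
    for component in clusters(graph):
        print(sorted(component))
```

In this toy run the phrase built from the query repeated only 3 times falls below the threshold of 10 and never enters the graph, which mirrors the frequency cutoff described in the abstract. The iterative step of inspecting and removing unrelated clusters is a manual judgment in the study and is not modeled here.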

Original language: English (US)
Journal: Menopause
DOI: 10.1097/GME.0000000000000833
State: Accepted/In press - Feb 20 2017

Fingerprint

Menopause
Hormones
Hot Flashes
Sweat
Information Storage and Retrieval
Emotions

ASJC Scopus subject areas

  • Obstetrics and Gynecology

Cite this

@article{b6236d0f7a864454b8341b08a2927e99,
title = "Menopause and big data: Word Adjacency Graph modeling of menopause-related ChaCha data",
abstract = "OBJECTIVE: To detect and visualize salient queries about menopause using Big Data from ChaCha. METHODS: We used Word Adjacency Graph (WAG) modeling to detect clusters and visualize the range of menopause-related topics and their mutual proximity. The subset of relevant queries was fully modeled. We split each query into token words (ie, meaningful words and phrases) and removed stopwords (ie, not meaningful functional words). The remaining words were considered in sequence to build summary tables of words and two and three-word phrases. Phrases occurring at least 10 times were used to build a network graph model that was iteratively refined by observing and removing clusters of unrelated content. RESULTS: We identified two menopause-related subsets of queries by searching for questions containing menopause and menopause-related terms (eg, climacteric, hot flashes, night sweats, hormone replacement). The first contained 263,363 queries from individuals aged 13 and older and the second contained 5,892 queries from women aged 40 to 62 years. In the first set, we identified 12 topic clusters: 6 relevant to menopause and 6 less relevant. In the second set, we identified 15 topic clusters: 11 relevant to menopause and 4 less relevant. Queries about hormones were pervasive within both WAG models. Many of the queries reflected low literacy levels and/or feelings of embarrassment. CONCLUSIONS: We modeled menopause-related queries posed by ChaCha users between 2009 and 2012. ChaCha data may be used on its own or in combination with other Big Data sources to identify patient-driven educational needs and create patient-centered interventions.",
author = "Janet Carpenter and Doyle Groves and Chen Chen and Julie Otte and Wendy Miller",
year = "2017",
month = "2",
day = "20",
doi = "10.1097/GME.0000000000000833",
language = "English (US)",
journal = "Menopause",
issn = "1072-3714",
publisher = "Lippincott Williams and Wilkins"
}

TY  - JOUR
T1  - Menopause and big data
T2  - Word Adjacency Graph modeling of menopause-related ChaCha data
AU  - Carpenter, Janet
AU  - Groves, Doyle
AU  - Chen, Chen
AU  - Otte, Julie
AU  - Miller, Wendy
PY  - 2017/2/20
Y1  - 2017/2/20
N2  - OBJECTIVE: To detect and visualize salient queries about menopause using Big Data from ChaCha. METHODS: We used Word Adjacency Graph (WAG) modeling to detect clusters and visualize the range of menopause-related topics and their mutual proximity. The subset of relevant queries was fully modeled. We split each query into token words (ie, meaningful words and phrases) and removed stopwords (ie, not meaningful functional words). The remaining words were considered in sequence to build summary tables of words and two and three-word phrases. Phrases occurring at least 10 times were used to build a network graph model that was iteratively refined by observing and removing clusters of unrelated content. RESULTS: We identified two menopause-related subsets of queries by searching for questions containing menopause and menopause-related terms (eg, climacteric, hot flashes, night sweats, hormone replacement). The first contained 263,363 queries from individuals aged 13 and older and the second contained 5,892 queries from women aged 40 to 62 years. In the first set, we identified 12 topic clusters: 6 relevant to menopause and 6 less relevant. In the second set, we identified 15 topic clusters: 11 relevant to menopause and 4 less relevant. Queries about hormones were pervasive within both WAG models. Many of the queries reflected low literacy levels and/or feelings of embarrassment. CONCLUSIONS: We modeled menopause-related queries posed by ChaCha users between 2009 and 2012. ChaCha data may be used on its own or in combination with other Big Data sources to identify patient-driven educational needs and create patient-centered interventions.
AB  - OBJECTIVE: To detect and visualize salient queries about menopause using Big Data from ChaCha. METHODS: We used Word Adjacency Graph (WAG) modeling to detect clusters and visualize the range of menopause-related topics and their mutual proximity. The subset of relevant queries was fully modeled. We split each query into token words (ie, meaningful words and phrases) and removed stopwords (ie, not meaningful functional words). The remaining words were considered in sequence to build summary tables of words and two and three-word phrases. Phrases occurring at least 10 times were used to build a network graph model that was iteratively refined by observing and removing clusters of unrelated content. RESULTS: We identified two menopause-related subsets of queries by searching for questions containing menopause and menopause-related terms (eg, climacteric, hot flashes, night sweats, hormone replacement). The first contained 263,363 queries from individuals aged 13 and older and the second contained 5,892 queries from women aged 40 to 62 years. In the first set, we identified 12 topic clusters: 6 relevant to menopause and 6 less relevant. In the second set, we identified 15 topic clusters: 11 relevant to menopause and 4 less relevant. Queries about hormones were pervasive within both WAG models. Many of the queries reflected low literacy levels and/or feelings of embarrassment. CONCLUSIONS: We modeled menopause-related queries posed by ChaCha users between 2009 and 2012. ChaCha data may be used on its own or in combination with other Big Data sources to identify patient-driven educational needs and create patient-centered interventions.
UR  - http://www.scopus.com/inward/record.url?scp=85013408933&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85013408933&partnerID=8YFLogxK
U2  - 10.1097/GME.0000000000000833
DO  - 10.1097/GME.0000000000000833
M3  - Article
C2  - 28225431
AN  - SCOPUS:85013408933
JO  - Menopause
JF  - Menopause
SN  - 1072-3714
ER  - 