Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems

Beth G. Greene, John S. Logan, David Pisoni

Research output: Contribution to journal (Article)

56 Citations (Scopus)

Abstract

We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered.

Original language: English
Pages (from-to): 100-107
Number of pages: 8
Journal: Behavior Research Methods, Instruments, & Computers
Volume: 18
Issue number: 2
DOI: 10.3758/BF03201008
State: Published - Mar 1986

Fingerprint

Speech Perception
Intelligibility
Natural Speech

ASJC Scopus subject areas

  • Experimental and Cognitive Psychology
  • Psychology (miscellaneous)

Cite this

Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems. / Greene, Beth G.; Logan, John S.; Pisoni, David.

In: Behavior Research Methods, Instruments, & Computers, Vol. 18, No. 2, 03.1986, p. 100-107.

@article{11a0790bf363420c8d7035e94354e417,
title = "Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems",
abstract = "We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered.",
author = "Greene, {Beth G.} and Logan, {John S.} and Pisoni, David",
year = "1986",
month = "3",
doi = "10.3758/BF03201008",
language = "English",
volume = "18",
pages = "100--107",
journal = "Behavior Research Methods, Instruments, {\&} Computers",
issn = "1554-351X",
publisher = "Springer New York",
number = "2",
}

TY  - JOUR
T1  - Perception of synthetic speech produced automatically by rule
T2  - Intelligibility of eight text-to-speech systems
AU  - Greene, Beth G.
AU  - Logan, John S.
AU  - Pisoni, David
PY  - 1986/3
Y1  - 1986/3
AB  - We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered.
UR  - http://www.scopus.com/inward/record.url?scp=0002232642&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0002232642&partnerID=8YFLogxK
U2  - 10.3758/BF03201008
DO  - 10.3758/BF03201008
M3  - Article
AN  - SCOPUS:0002232642
VL  - 18
SP  - 100
EP  - 107
JO  - Behavior Research Methods
JF  - Behavior Research Methods
SN  - 1554-351X
IS  - 2
ER  -