Acoustic Analysis and Synthesis of Pathological Voice Qualities

Christopher Long, Phil Bangayan, and Abeer Alwan

Department of Electrical Engineering
66-147E Engineering IV
405 Hilgard Av.
Los Angeles, CA  90024-1594

Presented at the 126th Meeting of the Acoustical Society of America, October 1993


An analysis-by-synthesis approach was adopted to characterize the acoustic and perceptual features of three pathological voice qualities: breathy, strained, and rough. A total of 160 waveforms of the vowel /a/, spoken by female and male subjects with pathological voice qualities, were obtained from the VA Hospital in West Los Angeles. The temporal and spectral features of the waveforms were analyzed, and the results were used to synthesize the utterances with the Klatt formant synthesizer. Preliminary results on 30 breathy and strained voices indicate that the perception of 'pathological' breathiness is related mainly to (1) a large open quotient (OQ) of the glottal waveform and (2) the amplitude of aspiration noise (AH) relative to that of voicing (AV), with female voices exhibiting a larger AH − AV difference than male voices. For some voices, it was also necessary to introduce extra poles into the vocal-tract transfer function to achieve a better spectral match. Synthesis of strained voices required a lower OQ than that needed for normal voices and, in some cases, amplitude and/or frequency modulation of the source to achieve a better match in the time domain. Clinicians judged the synthetic voices perceptually to be of high quality. The results will be discussed in terms of the effects of different vibratory patterns of the vocal folds on the acoustic speech waveform.
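The synthesis controls named above (OQ, AH relative to AV, source modulation, and extra vocal-tract poles) can be illustrated with a toy source-filter model. The Python sketch below is not the Klatt synthesizer and not the authors' procedure; it only shows how these controls interact under stated assumptions. Assumed for illustration: the polynomial glottal pulse shape, the interpretation of AH − AV as an RMS level difference, the 10 kHz sampling rate, the once-per-cycle sinusoidal AM/FM scheme, and all function names and parameter values (`glottal_pulse_train`, `add_aspiration`, `resonator`, the formant and extra-pole settings). The `resonator` function uses the second-order digital resonator difference equation commonly used in Klatt-style cascade synthesizers; cascading one more resonator adds one extra complex-conjugate pole pair.

```python
import math
import random

FS = 10000  # sampling rate in Hz (an assumption for this sketch)


def glottal_pulse_train(f0, oq, dur, am_depth=0.0, fm_depth=0.0, mod_rate=0.0, fs=FS):
    """Hypothetical simplified glottal-flow pulse train with open quotient `oq`.

    In each period the flow is k^2 * (1 - k/te) for sample k < te = oq * period
    and zero during the closed phase, normalized so a full pulse peaks near 1.
    `am_depth` and `fm_depth` (both < 1) apply sinusoidal amplitude/frequency
    modulation at `mod_rate` Hz, updated once per glottal cycle.
    """
    if not 0.0 < oq < 1.0:
        raise ValueError("oq must lie strictly between 0 and 1")
    n_total = int(dur * fs)
    out = [0.0] * n_total
    start = 0
    while start < n_total:
        mod = math.sin(2 * math.pi * mod_rate * start / fs)
        period = max(1, round(fs / (f0 * (1.0 + fm_depth * mod))))
        amp = 1.0 + am_depth * mod
        te = oq * period                # open-phase length in samples
        peak = 4.0 * te * te / 27.0     # maximum of k^2 (1 - k/te), reached at k = 2te/3
        for k in range(min(period, n_total - start)):
            if k < te:
                out[start + k] = amp * k * k * (1.0 - k / te) / peak
        start += period
    return out


def rms(x):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def add_aspiration(voicing, ah_minus_av_db, seed=0):
    """Add white Gaussian noise whose RMS is `ah_minus_av_db` dB relative to the
    voicing RMS (assumed interpretation of the AH - AV difference)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in voicing]
    gain = rms(voicing) * 10 ** (ah_minus_av_db / 20) / rms(noise)
    return [v + gain * w for v, w in zip(voicing, noise)]


def resonator(x, freq, bw, fs=FS):
    """Second-order digital resonator (one complex-conjugate pole pair) with
    unity gain at DC: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / fs
    c = -math.exp(-2 * math.pi * bw * t)
    b = 2 * math.exp(-math.pi * bw * t) * math.cos(2 * math.pi * freq * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    y = []
    for v in x:
        yn = a * v + b * y1 + c * y2
        y.append(yn)
        y2, y1 = y1, yn
    return y


# Illustrative settings only (not values reported in the study):
# a breathy-like source with a large OQ and aspiration 6 dB below voicing,
source = glottal_pulse_train(f0=200, oq=0.8, dur=0.5)
breathy = add_aspiration(source, ah_minus_av_db=-6.0)
# and a strained-like source with a smaller OQ plus slow 5 Hz AM/FM.
strained = glottal_pulse_train(f0=200, oq=0.4, dur=0.5,
                               am_depth=0.2, fm_depth=0.05, mod_rate=5.0)

# Cascade three formant resonators (rough /a/ values), then one extra pole pair.
out = breathy
for freq, bw in [(730, 90), (1090, 110), (2440, 170)]:
    out = resonator(out, freq, bw)
out = resonator(out, 3500, 250)  # hypothetical extra pole pair
```

Raising `oq` lengthens the open phase of every cycle, and making `ah_minus_av_db` less negative raises the noise floor relative to the periodic component; these are the two directions the abstract associates with perceived breathiness.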

