Synthesis of Severely Pathological Voices

Phil Bangayan, Abeer Alwan

Department of Electrical Engineering
66-147E Engineering IV
405 Hilgard Ave.
Los Angeles, CA  90024-1594

Jody Kreiman

Division of Head/Neck Surgery
UCLA School of Medicine and VA Medical Center
Los Angeles, CA  90024

Christopher Long

Speech and Hearing Program
Health Sciences and Technology
Massachusetts Institute of Technology
Cambridge, MA  02139

Presented at the 127th Meeting of the Acoustical Society of America, June 1994


This paper studies the acoustic and perceptual correlates of five severely pathological voice qualities: bicyclic, rough/breathy, rough/bicyclic, strained/breathy, and breathy/bicyclic. The work continues a previous study [ASA93, Pt.2, 2aSP9]. An analysis-by-synthesis approach, using the KLSYN88 synthesizer, is applied to ten speech waveforms obtained from the VA Hospital in West LA. Preliminary results indicate that the synthesizer's diplophonia parameter (DI) is useful in synthesizing bicyclic voices. Other severe disorders can be synthesized in one of three ways: (1) simultaneous and equal use of the parameters needed to synthesize milder cases of the pathologies; for example, rough/breathy voices are synthesized with a time-varying F0, characteristic of rough voices, combined with a high amplitude of aspiration noise, needed for the perception of breathiness; (2) increased use of a single set of parameters appropriate for a milder pathology; for example, a rough/bicyclic voice is synthesized with a time-varying F0 and very little DI; and (3) sequential use of parameters appropriate for two different qualities; for example, synthesizing a strained/breathy voice requires varying the open-quotient parameter over time to match the acoustic and perceptual correlates of breathiness in one time interval and those of the strained quality in the other. These results will be discussed in terms of the independence, or conversely the correlation, of acoustic and perceptual features.
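
As a rough illustration of what a bicyclic (alternating-cycle) excitation pattern looks like, the following toy sketch generates pulse times and amplitudes in which every second glottal pulse is delayed and attenuated. It is not KLSYN88 code and does not reproduce the synthesizer's DI parameter, whose exact definition is not given in this abstract; the function name `bicyclic_pulse_train` and the `shift` and `atten` arguments are hypothetical and chosen only for illustration.

```python
def bicyclic_pulse_train(f0_hz, n_pulses, shift=0.2, atten=0.5):
    """Return (time_s, amplitude) pairs for a toy bicyclic pulse train.

    Assumption (not from the abstract): every second pulse is delayed by
    `shift` of a nominal period and scaled by `atten`.
    """
    period = 1.0 / f0_hz
    pulses = []
    for k in range(n_pulses):
        t = k * period
        amp = 1.0
        if k % 2 == 1:
            # Odd-numbered pulses are delayed and attenuated, so cycles
            # alternate between a long and a short interval.
            t += shift * period
            amp = atten
        pulses.append((t, amp))
    return pulses


pulses = bicyclic_pulse_train(100.0, 6)
# Intervals between successive pulses alternate long / short.
intervals = [b[0] - a[0] for a, b in zip(pulses, pulses[1:])]
```

With a nominal F0 of 100 Hz and `shift=0.2`, successive pulse intervals alternate between 12 ms and 8 ms, which is the kind of two-cycle pattern the term "bicyclic" refers to.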


This paper was accompanied by nine pairs of voices, listed below. The voices were originally sampled at 20 kHz with 16-bit resolution; for this page, they were downsampled to 8 kHz and compressed to 8-bit mu-law format. Consequently, they are not exactly the voices demonstrated at the conference.
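
The mu-law compression step mentioned above can be sketched as follows. This is a minimal illustration of the continuous mu-law companding curve (mu = 255) followed by uniform 8-bit quantization; it is an assumption about the general technique, not the tool actually used to prepare these files, and it is not bit-exact with the G.711 segment encoding. The 20 kHz to 8 kHz resampling step (a 2/5 rate change, which needs a low-pass filter) is not shown. The names `mulaw_encode` and `mulaw_decode` are hypothetical.

```python
import math

MU = 255.0  # standard mu value for 8-bit mu-law


def mulaw_encode(x):
    """Map a sample in [-1, 1] to an 8-bit code in 0..255."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    # Logarithmic compression: fine resolution near zero, coarse near +/-1.
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    # Uniformly quantize the compressed value to 256 levels.
    return int(round((y + 1.0) / 2.0 * 255.0))


def mulaw_decode(code):
    """Map an 8-bit code in 0..255 back to an approximate sample in [-1, 1]."""
    y = code / 255.0 * 2.0 - 1.0
    # Inverse of the compression curve.
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

Because the quantization is lossy, decoding returns only an approximation of the original sample, which is one reason the files here differ from the 16-bit originals.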

Category                   Natural       Synthetic
bicyclic male:             bim2nat       bim2syn
bicyclic male:             bim1nat       bim1syn
bicyclic female:           bif1nat       bif1syn
rough-breathy male:        rbrm2nat      rbrm2syn
rough-breathy male:        rbrm3nat      rbrm3syn
rough-breathy male:        rbrm1nat      rbrm1syn
rough-breathy female:      rbrf1nat      rbrf1syn
rough-bicyclic male:       rbim1nat      rbim1syn
strained-breathy female:   sbf1nat       sbf1syn


