Phil Bangayan, Abeer Alwan
Department of Electrical Engineering 66-147E Engineering IV UCLA 405 Hilgard Ave. Los Angeles, CA 90024-1594
Division of Head/Neck Surgery UCLA School of Medicine and VA Medical Center Los Angeles, CA 90024
Speech and Hearing Program Health Sciences and Technology Massachusetts Institute of Technology Cambridge, MA 02139
Appeared in the 127th Acoustical Society of America Conference, June 1994
In this paper, the acoustic and perceptual correlates of bicyclic, rough/breathy, rough/bicyclic, strained/breathy, and breathy/bicyclic voices are studied. The work continues a previous study [ASA93, Pt.2, 2aSP9]. An analysis-by-synthesis approach using the KLSYN88 synthesizer is applied to ten speech waveforms obtained from the VA Hospital in West LA. Preliminary results indicate that the synthesizer's diplophonia parameter (DI) is useful in synthesizing bicyclic voices. Other severe disorders can be synthesized in one of three ways: (1) simultaneous and equal use of the parameters needed to synthesize milder cases of pathologies; for example, rough/breathy voices are synthesized with a time-varying F0, characteristic of rough voices, in combination with a high amplitude of aspiration noise, needed for the perception of breathiness; (2) increased use of a single set of parameters appropriate for a milder pathology; for example, a rough/bicyclic voice is synthesized with a time-varying F0 and very little DI; and (3) sequential use of parameters appropriate for two different qualities; for example, the synthesis of a strained/breathy voice requires varying the open-quotient parameter in time to match the acoustic and perceptual correlates of breathiness in one time interval and those of the strained quality in the other. These results will be discussed in terms of the independence, or conversely the correlation, of acoustic and perceptual features.
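As a rough illustration of strategy (1), the sketch below builds a toy excitation signal that combines a cycle-to-cycle varying F0 with additive noise standing in for aspiration. It is not KLSYN88 and does not reproduce its parameters: the function name `toy_source`, the random-jitter model of F0 variation, the Gaussian noise, and all numeric values are assumptions made for illustration only.

```python
import random

def toy_source(n_samples, fs, f0_hz, f0_jitter, noise_amp, seed=0):
    """Toy excitation: unit pulses at a jittered F0 plus Gaussian noise.

    Hypothetical helper, not part of KLSYN88.
    f0_jitter is the maximum fractional deviation of F0 per cycle
    (0.0 gives a perfectly periodic pulse train).
    noise_amp scales the additive noise (0.0 gives no noise).
    """
    rng = random.Random(seed)
    out = [0.0] * n_samples
    pos = 0
    while pos < n_samples:
        out[pos] += 1.0  # one pulse per glottal cycle
        # Draw a new F0 for the next cycle (assumed jitter model).
        f0 = f0_hz * (1.0 + f0_jitter * rng.uniform(-1.0, 1.0))
        pos += max(1, round(fs / f0))
    # Add noise to every sample, independent of the pulse positions.
    for i in range(n_samples):
        out[i] += noise_amp * rng.gauss(0.0, 1.0)
    return out

# Periodic and noise-free versus jittered with strong noise (both active at once).
modal = toy_source(2000, 20000, 100.0, 0.0, 0.0)
rough_breathy = toy_source(2000, 20000, 100.0, 0.1, 0.5)
```

In a real synthesizer this excitation would still have to be shaped by a glottal pulse model and the vocal-tract filter; the sketch only shows the two source properties being applied simultaneously rather than one after the other.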
This paper was accompanied by nine pairs of voices, listed below. The voices were originally sampled at 20 kHz with 16-bit resolution, but for the purposes of this page they were downsampled to 8 kHz and compressed into 8-bit mu-law format. Thus, these are not exactly the voices demonstrated at the conference.
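For readers unfamiliar with the format, the following is a minimal sketch of 8-bit mu-law companding, assuming the textbook continuous formula with mu = 255. It is not the tool used to prepare these files. The function names are illustrative, the 20 kHz to 8 kHz resampling step is not shown, and telephony codecs such as G.711 use a piecewise-linear approximation of this curve with a different byte layout.

```python
import math

MU = 255  # companding constant commonly used for 8-bit mu-law

def mulaw_encode(x):
    """Map a sample in [-1.0, 1.0] to an integer code in 0..255."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    # Logarithmic compression: small amplitudes get finer resolution.
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    # Uniformly quantize the compressed value from [-1, 1] to 0..255.
    return int(round((y + 1.0) / 2.0 * 255))

def mulaw_decode(code):
    """Map an integer code in 0..255 back to a sample in [-1.0, 1.0]."""
    y = code / 255 * 2.0 - 1.0
    # Inverse of the compression curve.
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

# Round trip for a few sample values; the reconstruction is approximate.
for sample in (-0.5, -0.01, 0.01, 0.5):
    restored = mulaw_decode(mulaw_encode(sample))
    print(f"{sample:+.3f} -> {restored:+.4f}")
```

The round trip is lossy: the quantization error grows with amplitude, which is one reason the files on this page differ from the 16-bit originals.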
Category                   Natural     Synthetic
bicyclic male              bim2nat     bim2syn
bicyclic male              bim1nat     bim1syn
bicyclic female            bif1nat     bif1syn
rough-breathy male         rbrm2nat    rbrm2syn
rough-breathy male         rbrm3nat    rbrm3syn
rough-breathy male         rbrm1nat    rbrm1syn
rough-breathy female       rbrf1nat    rbrf1syn
rough-bicyclic male        rbim1nat    rbim1syn
strained-breathy female    sbf1nat     sbf1syn
This paper was presented in poster format. (Not yet ready)
Phil Bangayan (firstname.lastname@example.org)