Brian Strope


Modeling the perception of pitch-rate amplitude modulation in noise

B. Strope and A. Alwan

ABSTRACT: Most current automatic speech recognition (ASR) systems integrate spectral estimates over multiple pitch periods and discard explicit pitch and voicing information. However, amplitude-modulation cues in voiced speech give rise to a robust and salient perception of pitch, which may be instrumental for recognizing speech in noise. In this study, three psychoacoustic models are used to predict two data sets: the temporal modulation transfer function (TMTF) and the detection of voicing for high-pass filtered natural fricatives in noise. The models using an envelope statistic and modulation filtering predict the TMTF data, while predictions from the model using a summary autocorrelogram approximate both data sets.
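As a rough illustration of the kind of cue discussed in the abstract, the sketch below generates sinusoidally amplitude-modulated Gaussian noise and measures how strongly its envelope repeats at the modulation period. It is not one of the three models from the study. The squared-and-smoothed envelope, the 16-sample smoothing window, the 8 kHz sampling rate, the 100 Hz modulation rate, and all function names are assumptions chosen only to keep the example small.

```python
import math
import random


def sam_noise(n_samples, fs, fm, depth, rng):
    """Gaussian noise multiplied by (1 + depth * cos(2*pi*fm*t))."""
    return [
        (1.0 + depth * math.cos(2.0 * math.pi * fm * n / fs)) * rng.gauss(0.0, 1.0)
        for n in range(n_samples)
    ]


def envelope(x, win):
    """Crude envelope (assumed here): square, then moving average over win samples."""
    sq = [v * v for v in x]
    acc = sum(sq[:win])
    out = [acc / win]
    for i in range(win, len(sq)):
        acc += sq[i] - sq[i - win]  # slide the window by one sample
        out.append(acc / win)
    return out


def normalized_autocorr(env, lag):
    """Autocorrelation of the mean-removed envelope at one lag, divided by its energy."""
    mean = sum(env) / len(env)
    d = [v - mean for v in env]
    num = sum(d[i] * d[i + lag] for i in range(len(d) - lag))
    den = sum(v * v for v in d)
    return num / den


fs = 8000        # sampling rate in Hz (assumed)
fm = 100         # modulation rate in Hz, in the range of voice pitch (assumed)
lag = fs // fm   # one modulation period, in samples
rng = random.Random(0)

modulated = sam_noise(8000, fs, fm, 1.0, rng)    # fully modulated noise
unmodulated = sam_noise(8000, fs, fm, 0.0, rng)  # plain Gaussian noise

stat_mod = normalized_autocorr(envelope(modulated, 16), lag)
stat_flat = normalized_autocorr(envelope(unmodulated, 16), lag)
print("modulated:", round(stat_mod, 3), "unmodulated:", round(stat_flat, 3))
```

The statistic for the modulated noise should come out clearly larger than for the unmodulated noise, which stays near zero. Lowering `depth` until the two become hard to tell apart, and repeating that for several values of `fm`, is loosely analogous to how a TMTF is traced out, though a real measurement relies on listeners or a calibrated model rather than this toy statistic.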

bps@ucla.edu