Modeling Auditory Perception for
Robust Speech Recognition

Brian P. Strope

Doctor of Philosophy in Electrical Engineering
University of California, Los Angeles, 1998
Professor Abeer A. Alwan, Chair

While non-stationary stochastic modeling techniques and the exponential growth of computational resources have led to substantial improvements in vocabulary size and speaker independence, most automatic speech recognition (ASR) systems remain overly sensitive to the acoustic environment, precluding widespread application. The human auditory system, speech production mechanisms, and languages, on the other hand, are extremely well tuned to facilitate speech communication in noise. Better modeling of these systems and mechanisms should illuminate robust strategies for speech processing applications. In this work, models of temporal adaptation, spectral peak isolation, an explicit parameterization of the position and motion of local spectral peaks, and the perception of pitch-rate amplitude-modulation cues are shown to reduce the error rate of a word recognition system in noise by more than a factor of four over typical current processing.
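As one illustrative example of the ideas named above, spectral peak isolation amounts to keeping the local maxima of a short-time spectrum while suppressing the energy between them, so that the recognizer's features emphasize formant-like peaks rather than noise-sensitive spectral valleys. The sketch below is a minimal Python illustration under assumed parameters (the function name, FFT size, and simple neighbor-comparison peak picking are choices made here for clarity, not the dissertation's actual model):

```python
import numpy as np

def isolate_spectral_peaks(frame, n_fft=256):
    """Keep only the local maxima of a log-magnitude spectrum.

    Illustrative sketch: real peak-isolation models are more
    elaborate than this simple neighbor comparison.
    """
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed, n_fft))
    log_spec = 20.0 * np.log10(spectrum + 1e-10)

    # Start with every bin suppressed to the spectral floor,
    # then restore only the bins that exceed both neighbors.
    floor = log_spec.min()
    peaks = np.full_like(log_spec, floor)
    for k in range(1, len(log_spec) - 1):
        if log_spec[k] > log_spec[k - 1] and log_spec[k] >= log_spec[k + 1]:
            peaks[k] = log_spec[k]
    return log_spec, peaks
```

For a frame containing sinusoids at 500 Hz and 1500 Hz sampled at 8 kHz, the two strongest surviving bins fall at the corresponding FFT bins (16 and 48 for a 256-point FFT), while inter-peak energy is flattened to the floor.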

The dissertation is available as:

  • pdf (1MB)
  • postscript (8MB)
  • ftp