COPYRIGHT NOTICE: This page includes the toolkits and data that we used in our papers. Please cite the corresponding papers when using any of the following materials.
Shareware: Codes, Databases, and Useful Links

Codes

Glottaltopograph (GTG) analysis tool: a toolkit to analyze high-speed laryngeal videos.
Glottaltopography is a method to analyze high-speed laryngeal videos. The method is described in this paper: Gang Chen, Jody Kreiman, Abeer Alwan, "The glottaltopogram: a method of analyzing high-speed images of the vocal folds", Computer Speech and Language, 2014, in press. Briefly, the "glottaltopogram" is based on principal component analysis of pixels' light-intensity time sequences from consecutive video images. This method reveals the overall synchronization of the vibrational patterns of the vocal folds over the entire laryngeal area. This method is effective in visualizing pathological and normal vocal fold vibratory patterns. The GTG toolkit is available for download here.
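As a rough illustration of the idea (not the GTG toolkit's code), the sketch below runs a principal component analysis on a handful of pixel light-intensity time sequences using only the Python standard library. The function name, the power-iteration solver, and the synthetic three-pixel input are assumptions made for this example; the paper cited above describes the actual method.

```python
import math

def first_principal_component(series, iterations=100):
    """PCA sketch for pixel intensity time sequences.

    series: one list per pixel, each holding that pixel's intensity in
    consecutive frames. Returns (component, scores): the dominant temporal
    pattern and each pixel's projection onto it.
    """
    n_frames = len(series[0])
    # Centre every pixel's sequence so only its variation over time remains.
    centered = []
    for seq in series:
        mean = sum(seq) / n_frames
        centered.append([v - mean for v in seq])
    # Frame-by-frame scatter matrix accumulated over all pixels.
    scatter = [[sum(p[i] * p[j] for p in centered) for j in range(n_frames)]
               for i in range(n_frames)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    vec = [1.0 + 0.01 * i for i in range(n_frames)]
    start_norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / start_norm for v in vec]
    for _ in range(iterations):
        nxt = [sum(row[j] * vec[j] for j in range(n_frames)) for row in scatter]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:  # no variation at all in the input
            break
        vec = [v / norm for v in nxt]
    scores = [sum(p[t] * vec[t] for t in range(n_frames)) for p in centered]
    return vec, scores

# Synthetic example: two pixels vibrating in opposite phase, one static pixel.
frames = 16
wave = [math.sin(2 * math.pi * 2 * t / frames) for t in range(frames)]
pixels = [
    [100 + 50 * w for w in wave],   # brightens when the wave is positive
    [100 - 50 * w for w in wave],   # same motion, opposite phase
    [80.0] * frames,                # background pixel, no motion
]
component, scores = first_principal_component(pixels)
```

In this toy input the two moving pixels receive scores of opposite sign and the static pixel scores zero. Mapping such scores back to pixel positions is one way the synchronization of vibration across an image could be visualized.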
Harmfreq_MOLRT: a statistical model, likelihood ratio test (LRT)-based speech/non-speech detection algorithm
Harmfreq_MOLRT is a statistical model, likelihood ratio test (LRT)-based speech/non-speech detection algorithm. The likelihood ratios (LRs) for voiced and unvoiced frames are computed differently: for voiced frames, the LR is calculated using only the harmonic DFTs; for unvoiced frames, the LR is calculated using all DFTs. It is an improved version of the multiple observation (MO) LRT VAD proposed by Ramirez et al. [Matlab code of Harmfreq_MOLRT VAD]
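The split between voiced and unvoiced frames can be sketched as follows. The text above only states which DFT bins enter the likelihood ratio; the per-bin formula below is the Gaussian statistical model commonly used in LRT-based VADs and is an assumption here, as are the function names, the SNR inputs, the threshold value, and the way harmonic bins are picked from an F0 value. The multiple-observation step (combining neighbouring frames) is left out, and this is not the distributed Matlab code.

```python
import math

def harmonic_bins(f0, fs, n_fft):
    """Indices of the DFT bins closest to multiples of f0, up to Nyquist."""
    bins = []
    h = 1
    while h * f0 <= fs / 2:
        bins.append(int(h * f0 * n_fft / fs + 0.5))  # round to nearest bin
        h += 1
    return bins

def frame_log_lr(post_snr, prior_snr, bins=None):
    """Mean log-likelihood ratio of a frame over the chosen DFT bins.

    post_snr[k] and prior_snr[k] are the a posteriori and a priori SNRs of
    bin k (assumed to be estimated elsewhere). bins=None uses every bin.
    """
    if bins is None:
        bins = range(len(post_snr))
    total = 0.0
    for k in bins:
        gamma, xi = post_snr[k], prior_snr[k]
        # Assumed Gaussian model: log LR_k = gamma*xi/(1+xi) - log(1+xi)
        total += gamma * xi / (1.0 + xi) - math.log(1.0 + xi)
    return total / len(bins)

def frame_is_speech(post_snr, prior_snr, fs, n_fft, f0=None, threshold=0.2):
    """Voiced frame (f0 given): harmonic bins only. Unvoiced (f0=None): all bins."""
    bins = harmonic_bins(f0, fs, n_fft) if f0 else None
    return frame_log_lr(post_snr, prior_snr, bins) > threshold

# Toy voiced frame: energy sits on the harmonics of 100 Hz, noise elsewhere.
fs, n_fft, f0 = 8000, 256, 100.0
voiced_bins = harmonic_bins(f0, fs, n_fft)
post = [1.0] * (n_fft // 2 + 1)
prior = [0.0] * (n_fft // 2 + 1)
for k in voiced_bins:
    post[k], prior[k] = 9.0, 8.0
lr_all = frame_log_lr(post, prior)
lr_harmonic = frame_log_lr(post, prior, voiced_bins)
```

In this toy frame the harmonic-only ratio is larger than the all-bin ratio, because the noise-only bins between harmonics no longer dilute the average.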
MBSC: a Multi-Band Summary Correlogram (MBSC)-based pitch detection algorithm for noisy speech
MBSC is a Multi-Band Summary Correlogram (MBSC)-based pitch detection algorithm for noisy speech. The package contains the Matlab code used to generate the pitch detection results reported in L. N. Tan and A. Alwan, "Multi-Band Summary Correlogram-based Pitch Detection for Noisy Speech", Speech Communication, in press. A fast version of the code is also provided in the package. [Matlab code of MBSC pitch detector (to be updated soon)]
SAFE: a Statistical Algorithm for F0 Estimation
SAFE is a toolkit that uses a Statistical Algorithm for F0 Estimation for both clean and noisy speech. It was developed at the Speech Processing and Auditory Perception Laboratory at UCLA by Wei Chu and Prof. Abeer Alwan. Available for download from here.
VFR: Variable Frame Rate

details

download

VoiceSauce: A Program for Voice Analysis

details

XVocal: Vocal Tract Articulatory Synthesizer

XVocal

VTCALCS

XVocal

CTMRedit: a Matlab based MRI Image Segmentation Tool with GUI

Speechdemo: a Matlab based Speech Processing Platform with GUI

ITU G.722 Wide-band Codec implementation in ANSI C

Databases

Databases Distributed through the Linguistic Data Consortium at the University of Pennsylvania (LDC)

The Child Subglottal Resonances Database. Released through the LDC, 2022, ISBN: 1-58563-985-0
UCLA Speaker Variability Database. Released through the LDC, 2021, ISBN: 1-58563-977-X
UCLA High-Speed Laryngeal Audio and Video Database. Released through the LDC, 2017, ISBN: 1-58563-803-X
The Subglottal Resonances Database. Released through the LDC, 2015, ISBN: 1-58563-711-4

UCLA Speaker Variability Database

Consonant Vowel Tokens (CV) Database

details

VTR Formants Database

Narrated Videotape Showing 3D Tongue and Vocal Tract Reconstructions from MRI Data for Consonants and Vowels

http://www.ee.ucla.edu/~spapl/projects/mri.html

details

For a free copy of the videotape, please email Prof. Alwan at: alwan@icsl.ucla.edu

Non-Speech Time-Stamps for Aurora 2 Test Sets

Consonant-Vowel-Consonant (CVC) syllables spoken at different rates in the presence of different levels of babble noise

This database includes the raw audio files and versions of them corrupted by babble noise at 0 dB and at 5 dB.

Useful Links

UCSC Speech Links

Alexander Graham Bell's Path to the Telephone

F0 Estimation Resources (from the PhD dissertation of Arturo Camacho, SWIPE: A Sawtooth Waveform Inspired Pitch Estimator for Speech and Music, 2007):

• AC-P: This algorithm (Boersma, 1993) computes the autocorrelation of the signal and divides it by the autocorrelation of the window used to analyze the signal. It uses post-processing to reduce discontinuities in the pitch trace. It is available with the Praat System at <http://www.fon.hum.uva.nl/praat>. The name of the function is ac.

• AC-S: This algorithm uses the autocorrelation of the cubed signal. It is available with the Speech Filing System at <http://www.phon.ucl.ac.uk/resource/sfs>. The name of the function is fxac.

• ANAL: This algorithm (Secrest and Doddington, 1983) uses autocorrelation to estimate the pitch, and dynamic programming to remove discontinuities in the pitch trace. It is available with the Speech Filing System at <http://www.phon.ucl.ac.uk/resource/sfs>. The name of the function is fxanal.

• CATE: This algorithm uses a quasi-autocorrelation function of the speech excitation signal to estimate the pitch. We implemented it based on its original description (Di Martino, 1999). The dynamic programming component used to remove discontinuities in the pitch trace was not implemented.

• CC: This algorithm uses cross-correlation to estimate the pitch and post-processing to remove discontinuities in the pitch trace. It is available with the Praat System at <http://www.fon.hum.uva.nl/praat>. The name of the function is cc.

• CEP: This algorithm (Noll, 1967) uses the cepstrum of the signal and is available with the Speech Filing System at <http://www.phon.ucl.ac.uk/resource/sfs>. The name of the function is fxcep.

• ESRPD: This algorithm (Bagshaw, 1993; Medan, 1991) uses a normalized cross-correlation to estimate the pitch, and post-processing to remove discontinuities in the pitch trace. It is available with the Festival Speech Synthesis System at <http://www.cstr.ed.ac.uk/projects/festival>. The name of the function is pda.

• RAPT: This algorithm (Talkin, 1995) uses a normalized cross-correlation to estimate the pitch, and dynamic programming to remove discontinuities in the pitch trace. It is available with the Speech Filing System at <http://www.phon.ucl.ac.uk/resource/sfs>. The name of the function is fxrapt.

• SHS: This algorithm (Hermes, 1988) uses subharmonic summation. It is available with the Praat System at <http://www.fon.hum.uva.nl/praat>. The name of the function is shs.

• SHR: This algorithm (Sun, 2000) uses the subharmonic-to-harmonic ratio. It is available at Matlab Central <http://www.mathworks.com/matlabcentral>, under the title “Pitch Determination Algorithm”. The name of the function is shrp.

• TEMPO: This algorithm (Kawahara et al., 1999) uses the instantaneous frequency of the outputs of a filterbank. It is available with the STRAIGHT System at its author's web page <http://www.wakayama-u.ac.jp/~kawahara>. The name of the function is exstraightsource.

• YIN: This algorithm (de Cheveigné and Kawahara, 2002) uses a modified version of the average squared difference function. It is available from its author's web page at <http://www.ircam.fr/pcm/cheveign/sw/yin.zip>. The name of the function is yin.
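Two of the ideas in the list above can be sketched in a few lines of standard-library Python: the autocorrelation of the signal divided by the autocorrelation of the analysis window (the Boersma, 1993 entry), and the modified squared difference function (the de Cheveigné and Kawahara, 2002 entry). These are minimal single-frame sketches, not the Praat or YIN implementations: the function names, the Hann window, the 0.1 threshold, and the test tone are assumptions, and the post-processing of the pitch trace is omitted.

```python
import math

def autocorr(x, max_lag):
    """Raw autocorrelation r[k] = sum_i x[i] * x[i + k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def f0_window_normalized_ac(frame, fs, f0_min, f0_max):
    """Autocorrelation of the windowed frame divided by that of the window."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]  # Hann
    mean = sum(frame) / n
    windowed = [(s - mean) * w for s, w in zip(frame, window)]
    lag_min, lag_max = int(fs / f0_max), int(fs / f0_min)
    r_sig = autocorr(windowed, lag_max)
    r_win = autocorr(window, lag_max)
    # Normalise each to 1 at lag 0, then divide to undo the taper of the window.
    ratio = [(r_sig[k] / r_sig[0]) / (r_win[k] / r_win[0]) for k in range(lag_max + 1)]
    best = max(range(lag_min, lag_max + 1), key=lambda k: ratio[k])
    return fs / best

def f0_yin_sketch(frame, fs, f0_min, f0_max, threshold=0.1):
    """Cumulative-mean-normalized squared difference with an absolute threshold."""
    lag_min, lag_max = int(fs / f0_max), int(fs / f0_min)
    size = len(frame) - lag_max  # samples compared at every lag
    # Squared difference function d[lag]; d[0] is 0 by definition.
    diff = [0.0] + [sum((frame[i] - frame[i + lag]) ** 2 for i in range(size))
                    for lag in range(1, lag_max + 1)]
    # Normalise d[lag] by its running mean over lags 1..lag.
    cmnd, running = [1.0], 0.0
    for lag in range(1, lag_max + 1):
        running += diff[lag]
        cmnd.append(diff[lag] * lag / running if running > 0 else 1.0)
    # Take the first dip below the threshold and follow it to its local minimum.
    for lag in range(lag_min, lag_max + 1):
        if cmnd[lag] < threshold:
            while lag + 1 <= lag_max and cmnd[lag + 1] < cmnd[lag]:
                lag += 1
            return fs / lag
    # No dip below the threshold: fall back to the global minimum.
    return fs / min(range(lag_min, lag_max + 1), key=lambda k: cmnd[k])

fs = 8000
tone = [math.sin(2 * math.pi * 200 * i / fs) for i in range(400)]  # 200 Hz test tone
f0_ac = f0_window_normalized_ac(tone, fs, f0_min=150, f0_max=500)
f0_yin = f0_yin_sketch(tone, fs, f0_min=75, f0_max=500)
```

The autocorrelation sketch is called with a narrow search range because, without further safeguards, a lag of two periods scores almost as well as one period. The running-mean normalization in the second function is what lets a fixed threshold pick the first period instead.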
