Thanks to Prof. Abeer Alwan's guidance
and support, I received my PhD degree in Electrical Engineering at the
end of 2011.
My gratitude also goes to Profs. Daniel
Blumstein, Mihaela van
der Schaar, and Kung Yao for serving on my doctoral committee. My
dissertation, "Robust Signal Processing for Human Pitch Tracking and Bird Song
Classification and Detection," and defense
slides are available for download (*) (**).
I am working on speech processing, speech recognition, and language
identification as a speech scientist at Voci Technologies. If you would
like to start a discussion, please feel free to drop me a line:
firstname.lastname@example.org.
(*) The Statistical Algorithm for F0
Estimation (SAFE) toolkit is available for download from here. You are
welcome to test it!
(**) The Rocky Mountain Biological Laboratory Robin song (RMBL-Robin)
database is available for download from here. Feedback is welcome.
• W. Chu and A. Alwan, "SAFE:
a statistical approach to F0 estimation under clean and noisy conditions,"
IEEE Trans. on Audio, Speech, and Language Processing, Vol. 20, No. 3,
pp. 933-967, 2012. [slides]
• W. Chu and A. Alwan, "FBEM:
a filter bank EM algorithm for the joint optimization of features and
acoustic model parameters in bird call classification,"
Interspeech 2012, pp. 1993-1996. [poster]
• W. Chu and D.T. Blumstein, "Noise
robust bird song detection using syllable pattern-based hidden Markov
models," ICASSP 2011, pp. 345-348. [poster]
• W. Chu and A. Alwan, "SAFE:
a statistical algorithm for F0 estimation for both clean and noisy
speech," Interspeech 2010, pp. 2590-2593. [slides]
• W. Chu and A. Alwan, "A
correlation-maximization denoising filter used as an enhancement
frontend for noise robust bird call classification,"
Interspeech 2009, pp. 2831-2834. [slides]
• W. Chu and A. Alwan, "Reducing
F0 frame error of F0 tracking algorithms under noisy conditions with an
unvoiced/voiced classification frontend," ICASSP 2009.
• W. Chu and J. Liu, "Using
Confidence Measures to Evaluate the Speaker Turns in Speaker
Segmentation," Proc of Intl Conf on Information Sciences, Signal
Processing and its Application (ISSPA07).
• W. Chu and J. Liu, "Subband
Energy Distance Measure Applied in Multi-Pass Speech/Non-Speech
Discrimination," Proc of Intl Conf on Information Sciences, Signal
Processing and its Application (ISSPA07).
• W. Chu, X. Xiao, J. Liu, "Confidence
Score Based Unsupervised Incremental Adaptation for OOV Words Detection,"
Proc of Intl Workshops on Statistical Techniques in Pattern Recognition
• Voci Technologies 01/2012
– Speech processing, speech recognition, and language recognition.
• Speech Processing and Audio
Perception Lab, UCLA 09/2007 - 12/2011
Research Assistant, Advisor: Prof. Abeer Alwan
Noise robust F0 estimation and tracking
* Proposed SAFE, a Statistical Algorithm for F0
Estimation under both clean and noisy conditions. The statistical
framework shows promise in modeling the effect of noise on prominent
SNR peaks in the spectrum given F0. Working on incorporating statistical
modeling of F0 transitions into SAFE to deliver an F0 tracker.
* Proposed an error metric, F0 Frame Error,
which combines Gross Pitch Error and Voice Decision Error to
compare the performance of F0 tracking algorithms in a unified
framework. Used a statistical voiced/unvoiced classification
frontend to reduce Voice Decision Errors under noisy conditions.
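As an illustration of how such a combined metric can be computed, here is a minimal sketch in Python. It rests on conventions that are not stated on this page and are assumed here: an F0 value of 0 marks an unvoiced frame, a gross pitch error is a deviation of more than 20% from the reference on frames that both tracks call voiced, and F0 Frame Error counts every frame with either kind of error over all frames. The function name and parameters are hypothetical.

```python
def f0_frame_error(ref_f0, est_f0, tolerance=0.2):
    """Return (gpe, vde, ffe) for two frame-aligned F0 tracks.

    Assumed conventions (not taken from the page):
    - an F0 value of 0 marks an unvoiced frame;
    - a gross pitch error is a relative deviation above `tolerance`
      on a frame that both tracks label as voiced.
    """
    if len(ref_f0) != len(est_f0) or len(ref_f0) == 0:
        raise ValueError("tracks must be non-empty and frame-aligned")

    n_frames = len(ref_f0)
    n_both_voiced = 0   # frames voiced in both reference and estimate
    n_gross = 0         # gross pitch errors among those frames
    n_voicing = 0       # frames where the voicing decisions disagree

    for ref, est in zip(ref_f0, est_f0):
        ref_voiced, est_voiced = ref > 0, est > 0
        if ref_voiced != est_voiced:
            n_voicing += 1
        elif ref_voiced:
            n_both_voiced += 1
            if abs(est - ref) > tolerance * ref:
                n_gross += 1

    gpe = n_gross / n_both_voiced if n_both_voiced else 0.0
    vde = n_voicing / n_frames
    ffe = (n_gross + n_voicing) / n_frames
    return gpe, vde, ffe


# Five frames: one gross pitch error (130 vs. 100 Hz) and two voicing errors.
reference = [0, 100, 100, 100, 0]
estimate = [0, 100, 130, 0, 90]
gpe, vde, ffe = f0_frame_error(reference, estimate)
print(gpe, vde, ffe)
```

Because the combined score counts both error types over the same set of frames, a tracker cannot improve it by trading pitch errors for voicing errors.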
Bird song classification, recognition, and detection
* Extended the EM algorithm to jointly estimate
optimal center frequencies and bandwidths of the filter bank in
cepstral feature extraction, and model parameters in bird call
classification. Proposed an extended auxiliary function in which
feature extraction and model parameters are updated iteratively and
* Used hierarchical clustering analysis to infer
bird syllable patterns for finer acoustic modeling. Compared to using
one single general pattern for all syllables, both the precision and
recall rates of the syllable pattern-based HMM bird song detector
increase. The algorithm is being ported to a hand-held device.
* Proposed a correlation-maximization denoising
filter for reducing non-periodic noise in bird calls, which have
periodic structure. Compared to the Wiener filter, features extracted
from the output of the proposed filter resulted in a lower bird call
classification error rate.
• Speech Group, Disney
Research, Pittsburgh 06/2010 - 09/2010
Summer Intern, Mentor: Dr. John McDonough
and Prof. Bhiksha Raj
– Used microphone array processing and speech recognition
technologies to build an interactive storytelling demo for children.
Gained an understanding of Acoustic Echo Cancellation and Weighted Finite State
Transducer-based speech recognition. Learned how to collect, annotate,
and maintain an audio-visual children's speech database.
• Speech Lab, Rosetta Stone 06/2009 - 08/2009
Summer Intern, Mentor: Dr. Bryan Pellom
and Dr. Kadri Hacioglu
– Developed statistical methods for deciding the
pronunciation of a word. Gained an understanding of the rule-based and maximum entropy
criterion-based modeling techniques used in Machine Translation and
applied them to Letter-To-Sound conversion. Wrote an A* search
routine in C++.
• Speech Group, Mitsubishi Electric Research Lab 06/2008 -
Summer Intern, Mentor: Prof.
– Developed a discriminative training module (lattice-based MMI)
for the Sphinx speech recognizer. Also explored how initial model parameters
affect the final model parameters in an iterative learning process.
Gained an understanding of maximum likelihood estimation, the Baum-Welch algorithm,
and the extended Baum-Welch algorithm.
• Speech Group, Microsoft Research Asia, Beijing 04/2007 -
Summer Intern, Mentor: Dr. Chao Huang
– Built a demo for detecting acoustic events (speech, music, ring
tone, background noise) in an office environment. Compared the
effectiveness of noise-robust features, Gaussian mixture models and
hidden Markov models, and MAP and MLLR unsupervised adaptation. Learned how
to manage job queues on computing clusters.
• Microprocessor Tech Lab, Intel China Research Center, Beijing
07/2006 - 10/2006
Research Intern, Mentor: Dr. Wei Hu
– Built a demo for locating and tracking the voices of actors and
actresses in TV series and movies. Used the Bayesian Information Criterion
for unsupervised segmentation and clustering of speakers in the audio stream.
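For readers unfamiliar with BIC-based segmentation, the sketch below shows the usual change-point test in Python. It is only an illustration under simplifying assumptions that do not come from this page: features are one-dimensional, each segment is modeled by a single Gaussian, and the penalty weight is 1. Practical systems typically use multi-dimensional cepstral features with full covariance matrices. All function names are hypothetical.

```python
import math


def _log_variance(values):
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return math.log(max(variance, 1e-12))


def delta_bic(features, split, penalty_weight=1.0):
    """BIC gain of modeling features[:split] and features[split:] separately.

    A positive value favors a speaker change at `split`.
    """
    n = len(features)
    left, right = features[:split], features[split:]
    gain = 0.5 * (n * _log_variance(features)
                  - len(left) * _log_variance(left)
                  - len(right) * _log_variance(right))
    # The two-segment model has one extra mean and one extra variance.
    penalty = 0.5 * penalty_weight * 2 * math.log(n)
    return gain - penalty


def best_change_point(features, margin=2, penalty_weight=1.0):
    """Return (split, score) with the largest positive delta BIC, or None."""
    best = None
    for split in range(margin, len(features) - margin + 1):
        score = delta_bic(features, split, penalty_weight)
        if score > 0 and (best is None or score > best[1]):
            best = (split, score)
    return best


# Toy stream: six frames near 0 followed by six frames near 5.
stream = [0.0, 0.1, -0.1, 0.05, -0.05, 0.0,
          5.0, 5.1, 4.9, 5.05, 4.95, 5.0]
print(best_change_point(stream))
```

The same score can drive clustering: two segments are merged when modeling them jointly costs less, by this criterion, than modeling them separately.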
• Tsinghua University, Beijing 09/2004 - 07/2007
Research Assistant, Advisor: Prof. Jia Liu
Master's thesis work: implemented a real-time Speech-To-Text
system with a non-speech input rejection frontend on chip. Developed a
non-speech removal frontend for national '863' and '242' keyword
• UFIDA Software Corp., Beijing 02/2004 - 06/2004
Software Intern, Supervisor: Mr. Yu Zhu
– Bachelor's thesis work: created the index of the digital map for
an on-vehicle GPS software system.
Updated: June 21st, 2012.
Chen (now with University of Missouri,
Dian Gong (now with University of
Yiting Liao (now with Intel)
Chanwoo Kim (now with Microsoft)
Chao Qin (now with University of
Long Qin (now with Carnegie Mellon
Marek Vondrak (now with Brown
Xin Yan (now with Pennsylvania State University)
Qiao Yu (now with Chinese Academy of Sciences, Shenzhen, China)
Yue Zhao (now with University of California, Los Angeles)
Xiaodan Zhuang (now with BBN)
• Speech Processing Tutorial:
ICSI Speech FAQ
• Hidden Markov Model
* L. Rabiner, "A
tutorial on hidden Markov models and selected applications in speech
recognition," Proceedings of the IEEE, 77 (2), pp.
257–286, February 1989.
HTK BOOK 3.4 (for
my own use)
* Mark Hasegawa-Johnson's
Speech Mini Course and HTK
• Gaussian Mixture Model
- Tutorial: Reynolds, Douglas A.,
Quatieri, Thomas F., and Dunn, Robert B., "Speaker
verification using adapted Gaussian mixture models," Digital
Signal Processing, Vol. 10, No. 1-3, pp. 19-41, January 2000.
- Toolkit: My GMM classifier written in C (available for
• Support Vector Machine
- Tutorial: SVM on wiki
- Toolkit: LIBSVM
• Large-Margin Training
- Fei Sha, "Large
margin training of acoustic models for speech recognition," PhD
- Hui Jiang, "Large
margin hidden Markov models for speech recognition," IEEE Trans. on Audio, Speech, and Language
Processing, Vol. 14, No. 5, pp. 1584-1595, September 2006.
• F0 Tracking or Pitch Detection Algorithms
* L. Rabiner, M. Cheng, A. Rosenberg, and C. McGonegal, "A
comparative performance study of several pitch detection algorithms,"
IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 24, no.
5, pp. 399–418, 1976.
SAFE (a statistical algorithm for F0 estimation)
(has visualization function)
ESPS get_f0: D. Talkin, "Robust algorithm for pitch tracking,"
Speech Coding and Synthesis, pp. 497–518, 1995.
Wavesurfer (uses ESPS get_f0; has a visualization function)
TEMPO (a part of STRAIGHT toolkit):
* YIN (for
the pitch of music)
(a multi-pitch tracker)