Children's Speech Data
Improving children's automatic speech recognition technology is a difficult problem.
One reason for this is the lack of available children's speech data.
This page lists efforts by our lab and our collaborators at Georgia State University
to create useful children's speech datasets.
The databases listed here are currently being transcribed, and some portions
can be shared in a limited capacity. If your organization would like to use
the data described here, please fill out the data interest form.
This work is supported in part
Children's Speech, ASR, Speech Database
UCLA JIBO Kids' Database
This dataset contains recordings of approximately 130 children between the ages of 4 and 7
years old, the critical age range for early acquisition of literacy. The children
were recorded while they performed educational exercises in reading and pronunciation.
Each child was recorded in 3 sessions, each lasting about 15 minutes, totaling
approximately 90 hours of audio data. The children conversed with the social robot,
Jibo, following a protocol created by experts in early childhood education.
A facilitator was also present at each session and intervened verbally if the child had
difficulty interacting with the social robot. The child sat approximately two feet
away from the robot with a microphone placed equidistant between them. The children
were then administered a portion of the Goldman-Fristoe Test of Articulation-3 (GFTA-3),
a common oral assessment used by speech-language pathologists, as well as exercises
in counting and spelling. All children recruited to the study lived in Southern
California and were proficient in English. Many of these children spoke second languages
at home. The audio was recorded by a Logitech C920 Webcam microphone at a sampling rate of
48 kHz in 16-bit WAV format. The recordings took place in offices at the students'
schools during the school day and include background noise
as one would find in a real use case. The recordings contain
both scripted and spontaneous children's speech.
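The recording format described above (48 kHz, 16-bit WAV) can be checked from a file's header before any processing. The following is a minimal sketch using only Python's standard `wave` module. The file name `demo_session.wav` and the helper names are hypothetical, and the file is generated silence standing in for a real recording, since the database's actual file layout is not described here.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return basic format information read from a WAV file header."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "bit_depth": wav.getsampwidth() * 8,  # bytes per sample -> bits
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
        }


def matches_expected_format(info, sample_rate_hz=48000, bit_depth=16):
    """Check header info against the format quoted in the text (48 kHz, 16-bit)."""
    return info["sample_rate_hz"] == sample_rate_hz and info["bit_depth"] == bit_depth


def write_silence(path, seconds=0.5, sample_rate_hz=48000):
    """Write a short mono 16-bit WAV of silence (a stand-in for a real recording)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)      # mono is an assumption; the text does not state the channel count
        wav.setsampwidth(2)      # 2 bytes per sample = 16 bits
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate_hz))


if __name__ == "__main__":
    # Hypothetical file name, created in a temporary directory for the demo.
    demo_path = os.path.join(tempfile.mkdtemp(), "demo_session.wav")
    write_silence(demo_path)
    info = describe_wav(demo_path)
    print(info, matches_expected_format(info))
```

A check like this may be useful when combining recordings from several sources, since a file with a different sampling rate would be flagged before feature extraction.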
GSU Kids' Database
This database was compiled by Prof. Robin Morris at the Center for Research on Atypical
Development and Learning at Georgia State University. This dataset contains
recordings of approximately 200 children between the ages of 8 and 10 years old.
The children were recorded while performing educational exercises in reading, language,
and pronunciation with a facilitator. Each child was recorded in 5 sessions, each lasting
about 2 to 10 minutes, totaling approximately 80 hours of audio data. The children
were then administered a portion of the GFTA and other assessments used by
speech-language pathologists, as well as exercises in counting and spelling.
All children recruited to the study lived in the Atlanta, Georgia area and were native
English speakers. The audio was recorded by a computer microphone with a sampling
rate of 44.1 kHz. The recordings took place in offices at the students'
schools during the school day and include background noise as one would
find in a real use case. The recordings contain both scripted and spontaneous children's speech.
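As a rough consistency check, the total hours quoted for each database can be compared with the nominal figure obtained from the number of children, sessions, and session length. The sketch below uses the approximate values given in the descriptions above; the function names are illustrative, and the calculation assumes every child completed every session, which the text does not state.

```python
def nominal_hours(children, sessions_per_child, minutes_per_session):
    """Nominal audio hours if every child completed every session at the given length."""
    return children * sessions_per_child * minutes_per_session / 60.0


def implied_minutes_per_session(total_hours, children, sessions_per_child):
    """Average session length implied by a reported total number of hours."""
    return total_hours * 60.0 / (children * sessions_per_child)


if __name__ == "__main__":
    # Approximate figures quoted in the database descriptions.
    jibo = nominal_hours(children=130, sessions_per_child=3, minutes_per_session=15)
    gsu_low = nominal_hours(children=200, sessions_per_child=5, minutes_per_session=2)
    gsu_high = nominal_hours(children=200, sessions_per_child=5, minutes_per_session=10)
    gsu_avg = implied_minutes_per_session(total_hours=80, children=200, sessions_per_child=5)

    print(f"UCLA JIBO nominal hours: {jibo:.1f}")
    print(f"GSU nominal hours: {gsu_low:.1f} to {gsu_high:.1f}")
    print(f"GSU implied average session length: {gsu_avg:.1f} minutes")
```

Under these assumptions, the UCLA JIBO figures give a nominal 97.5 hours, in line with the roughly 90 hours reported, and the 80 hours reported for the GSU database corresponds to an average session of about 4.8 minutes, within the stated 2 to 10 minute range.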
"Jibo Robot - He can't wait to meet you," Boston, MA, 2017. [Online]. Available: https://www.jibo.com
Gary Yeung, Alison L. Bailey, Amber Afshan, Morgan Tinkler, Marlen Q. Pérez, Alejandra Martin, Anahit A. Pogossian, Samuel Spaulding, Hae Won Park, Manushaqe Muco, Abeer Alwan and Cynthia Breazeal, "A robotic interface for the administration of language, literacy, and speech pathology assessments for children", SLATE, 2019, pp. 41-42.