Children's Speech Data

Project Summary

Improving automatic speech recognition (ASR) for children's speech is a difficult problem. One reason is the lack of available children's speech data. This page lists efforts by our lab and our collaborators at Georgia State University to create useful children's speech datasets.

The databases listed here are currently being transcribed.

This work is supported in part by NSF.

Keywords

Children's Speech, ASR, Speech Database

Databases

UCLA JIBO Kids' Database

This dataset contains recordings of approximately 130 children between the ages of 4 and 7, the critical age range for early acquisition of literacy. The children were recorded while they performed educational exercises in reading and pronunciation. Each child was recorded in 3 sessions, each lasting about 15 minutes, totaling approximately 90 hours of audio data. The children conversed with the social robot Jibo [1], following a protocol created by experts in early childhood education [2]. A facilitator was also present at each session and intervened verbally if the child had difficulty interacting with the robot. The child sat approximately two feet away from the robot, with a microphone placed equidistant between them. The children were then administered a portion of the Goldman-Fristoe Test of Articulation-3 (GFTA-3), a common oral assessment used by speech-language pathologists, as well as exercises in counting and spelling. All children recruited to the study lived in Southern California and were proficient in English. Many of these children spoke a second language at home. The audio was recorded with a Logitech C920 webcam microphone at a sampling rate of 48 kHz in 16-bit WAV format. The recordings took place in offices at the students' schools during the school day and include background noise of the kind one would find in a real use case. The recordings contain both scripted and spontaneous children's speech.
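As a rough sketch of how a user of these recordings might confirm the stated audio format (48 kHz, 16-bit WAV), the following uses Python's standard `wave` module. The function names and the file name in the usage note are hypothetical; the page does not describe how the files are named or organized.

```python
import wave


def describe_wav(path):
    """Read basic format information from an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "bit_depth": wav.getsampwidth() * 8,  # sample width is given in bytes
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
        }


def matches_expected_format(info, sample_rate_hz=48000, bit_depth=16):
    """Check a file against the format stated for this database (defaults are taken from the text)."""
    return (
        info["sample_rate_hz"] == sample_rate_hz
        and info["bit_depth"] == bit_depth
    )
```

For example, `matches_expected_format(describe_wav("session.wav"))` would return `True` for a file recorded as described above, where `session.wav` is a placeholder name.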

GSU Kids' Database

This database was compiled by Prof. Robin Morris at the Center for Research on Atypical Development and Learning at Georgia State University. It contains recordings of approximately 200 children between the ages of 8 and 10. The children were recorded while performing educational exercises in reading, language, and pronunciation with a facilitator. Each child was recorded in 5 sessions, each lasting about 2 to 10 minutes, totaling approximately 80 hours of audio data. The children were then administered a portion of the GFTA and other assessments used by speech-language pathologists, along with exercises in counting and spelling. All children recruited to the study lived in the Atlanta, Georgia area and were native English speakers. The audio was recorded with a computer microphone at a sampling rate of 44.1 kHz. The recordings took place in offices at the students' schools during the school day and include background noise of the kind one would find in a real use case. The recordings contain both scripted and spontaneous children's speech.
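The approximate figures above can be sanity-checked with simple arithmetic. The sketch below treats the stated counts as exact, which they are not (all are given as approximations), so the results are only nominal bounds. The function name is illustrative.

```python
def nominal_hours(num_children, sessions_per_child, minutes_per_session):
    """Total audio in hours if every child completed every session at the given length."""
    return num_children * sessions_per_child * minutes_per_session / 60


# Stated figures for this database: about 200 children, 5 sessions each,
# 2 to 10 minutes per session, about 80 hours in total.
low_hours = nominal_hours(200, 5, 2)
high_hours = nominal_hours(200, 5, 10)

# Average session length implied by the stated 80-hour total.
implied_avg_minutes = 80 * 60 / (200 * 5)
```

Under these assumptions the nominal range is roughly 33 to 167 hours, and the stated total of about 80 hours corresponds to an average session of about 4.8 minutes, which falls inside the stated 2 to 10 minute range.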

[1] "Jibo Robot - He can't wait to meet you," Boston, MA, 2017. [Online]. Available: https://www.jibo.com
[2] Gary Yeung, Alison L. Bailey, Amber Afshan, Morgan Tinkler, Marlen Q. Pérez, Alejandra Martin, Anahit A. Pogossian, Samuel Spaulding, Hae Won Park, Manushaqe Muco, Abeer Alwan and Cynthia Breazeal, "A robotic interface for the administration of language, literacy, and speech pathology assessments for children", SLATE, 2019, pp. 41-42.


Abeer Alwan (alwan@ee.ucla.edu)