Center for Language Research
University of Aizu

Ultrasound Tongue in Context
Ultrasound images of the tongue surface moving during speech can be difficult to interpret, because the ultrasound image usually shows only the tongue and not the opposing surfaces (i.e., the hard and soft palates and the teeth). This project combined ultrasound, CT, and video images with the help of motion capture data. The resulting movies have been useful for illustrating the physical context of ultrasound data.
・Construction of Midsagittal Vocal Tract Videos from CT, Ultrasound and Motion Capture Data
This paper describes the creation of a movie file of an English speech sample from the Speech Accent Archive. The movie was created by combining ultrasound movies of the tongue, CT images of the vocal tract, and Vicon motion capture data from the skull and jaw. It shows every part of the vocal tract that is relevant to speech and was successfully recorded. Its main benefit is that it places the ultrasound tongue movie into the context of the vocal tract: for example, when the subject pronounces [t], the ultrasound tongue line clearly makes contact with the alveolar ridge shown in the CT image. The movie is expected to help learners of English as a Second Language acquire pronunciation, and future research should determine how useful it actually is.
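The registration step behind such a movie can be pictured with a small sketch. It assumes that the pose of the ultrasound probe relative to the skull (and hence to the CT image) is already known from the motion capture data and can be reduced to a 2D midsagittal rigid transform (a rotation angle plus a translation, in millimetres). Each tongue contour point is mapped into CT coordinates and compared with the palate contour to decide whether contact occurs. The function names, the pose representation, and the 1 mm contact threshold are hypothetical; the paper's actual registration procedure may differ.

```python
import math
from typing import List, Tuple

# A point in the midsagittal plane, in millimetres (assumed unit).
Point = Tuple[float, float]


def rigid_transform_2d(points: List[Point], angle_rad: float,
                       tx: float, ty: float) -> List[Point]:
    """Rotate each point about the origin by angle_rad, then translate by (tx, ty)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def min_distance(a: List[Point], b: List[Point]) -> float:
    """Smallest Euclidean distance between any point of contour a and any point of contour b."""
    return min(math.dist(p, q) for p in a for q in b)


def tongue_contacts_palate(tongue_probe: List[Point], palate_ct: List[Point],
                           probe_pose: Tuple[float, float, float],
                           threshold_mm: float = 1.0) -> bool:
    """Hypothetical contact test for one video frame.

    tongue_probe: tongue contour in ultrasound-probe coordinates.
    palate_ct:    palate contour (e.g. traced from the CT image) in CT coordinates.
    probe_pose:   (angle_rad, tx, ty) mapping probe coordinates into CT coordinates,
                  assumed here to have been derived from the motion capture markers.
    """
    angle, tx, ty = probe_pose
    tongue_ct = rigid_transform_2d(tongue_probe, angle, tx, ty)
    return min_distance(tongue_ct, palate_ct) <= threshold_mm
```

Under these assumptions, a frame during [t] would be expected to report contact near the alveolar ridge portion of the palate contour, while a frame during an open vowel would not.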
・Design of an Interactive GUI for Pronunciation Evaluation and Training
Although language learners often want to improve their pronunciation of a foreign language, the software available to help them is limited in scope. Most commercial software for pronunciation evaluation and training focuses on the acoustic signal, and few systems, if any, give visual feedback on the learner’s articulators (lips, tongue, and jaw). In this paper, we describe the ongoing development of a GUI programmed in Objective-C for Mac OS X. Our software uses the QTKit framework for video recording and playback, and several open source libraries for audio recording, audio playback, and pitch detection. The GUI incorporates and links together many kinds of phonetic data for the pronunciation learner: for example, real-time frontal video of the learner, recorded frontal and side videos of a native speaker’s face during pronunciation, an ultrasound movie of the tongue moving in the mouth, and MRI images of the native speaker’s tongue during the production of all the sounds in the training text.
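The GUI delegates pitch detection to an open source library that the abstract does not name. Purely as an illustration of what such a component computes, the sketch below estimates the fundamental frequency of one audio frame with plain autocorrelation, a common textbook method that is not necessarily what that library uses. The function name and the 75 to 400 Hz search range are assumptions.

```python
import math
from typing import Optional, Sequence


def estimate_pitch_autocorr(samples: Sequence[float], sample_rate: int,
                            fmin: float = 75.0, fmax: float = 400.0) -> Optional[float]:
    """Estimate F0 (Hz) of one frame from its autocorrelation peak; None if no estimate."""
    n = len(samples)
    if n == 0:
        return None
    # Remove the DC offset so it does not dominate the autocorrelation.
    mean = math.fsum(samples) / n
    x = [s - mean for s in samples]
    if math.fsum(v * v for v in x) == 0.0:
        return None  # silent frame: no periodicity to measure
    # Only search lags (in samples) that correspond to plausible voice pitches.
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Raw (biased) autocorrelation at this lag.
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return None if best_lag is None else sample_rate / best_lag
```

The raw autocorrelation sums fewer products at longer lags, which slightly favours shorter periods and helps avoid reporting half the true pitch; practical pitch trackers add normalization, voicing decisions, and smoothing across frames.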

Figure: Synthesis of three movies plus palate contour (yellow). Panels: video camera (front), video camera (side), ultrasound movie (side).