Prosody


Abstract
 
Most research on speech recognition focuses on accurate transcription of words or phonemes. However, prosodic features, including pitch, loudness, and duration, play a crucial role in spoken language understanding. In tone languages, the pitch of syllables determines word meaning; the pitch contour of an utterance can distinguish a question from a statement. Yet most current speech recognition systems view prosodic variation as a source of noise to be normalized away. In contrast, this talk will exploit prosody as a key source of information for language understanding.

I will describe the role of prosody in language understanding across linguistic levels, from lexical to syntactic to pragmatic and discourse. I will demonstrate the use of prosodic evidence to resolve challenges in speech processing in areas including spoken dialogue, discourse, and tone and pitch accent recognition. I will emphasize prosodic phenomena and approaches that are common across diverse language families and highlight the importance of contextual modeling to compensate for surface variation in prosodic realization. I will also discuss the utility of unsupervised and semi-supervised techniques to overcome the difficulty of obtaining labeled data for some tasks. Finally, I will present remaining challenges and avenues for future work.


