|
Most research on speech recognition focuses on accurate transcription
of words or phonemes. However, prosodic features, including pitch,
loudness, and duration, play a crucial role in spoken language
understanding. In tone languages, the pitch of syllables determines
word meaning; the pitch contour of an utterance can distinguish
a question from a statement. Yet most current speech recognition
systems view prosodic variation as a source of noise to be normalized
away. In contrast, in this talk I will exploit prosody as a key
source of information for language understanding.
I will describe the role of prosody in language understanding across linguistic levels, from lexical and syntactic to pragmatic and discourse. I will demonstrate the use of prosodic evidence to resolve challenges in speech processing in areas including spoken dialogue, discourse, and tone and pitch accent recognition. I will emphasize common prosodic phenomena and approaches across diverse language families and highlight the importance of contextual modeling in compensating for surface variation in prosodic realization. I will also discuss the utility of unsupervised and semi-supervised techniques for overcoming the difficulty of obtaining labeled data for some tasks. Finally, I will present remaining challenges and avenues for future work. |