Matalon et al. (open source) offer a fascinating study illustrating the linguistic structuring of prosody, the communication of meaning through the tone and inflection of our speech:
Significance
In conversation, prosody complements words, forming a structured
communication system distinct from, yet connected to, text. By analyzing
large datasets of spontaneous conversations and clustering similar
snippets of speech, we identify the fundamental building blocks of this
system. Our findings reveal a prosodic vocabulary of a few hundred
patterns (far fewer than the thousands of words in a core verbal
vocabulary), which fulfill interactional and attitudinal functions. Just
as syntax governs word combinations, we observe recurring prosodic
structures where certain patterns follow others more frequently than
chance. Such ubiquitous pairs were not detected in scripted speech.
These results provide data-driven support for the analogy of prosody to a
linguistic system with its own vocabulary, semantics, and a simple
syntax.
Abstract
Prosody, the musical facet of speech, is pivotal in human communication, and its
structure and meaning remain subjects of ongoing research. In this
study, we introduce a data-driven model for English prosody, based on
large-scale analysis of spontaneous conversations. As a first step, we
identify approximately 200 discernible prosodic patterns—which we view
as building blocks of the prosodic vocabulary—and outline their
properties and range of meanings. Next, we reveal a Markovian logic,
akin to a syntax, for concatenating these elementary building blocks
into coherent utterances. We identify distinct compound functions
associated with pairs of consecutive patterns and show that the
Markovian syntax is more prevalent in spontaneous prosody, as compared
to scripted speech. These findings offer invaluable insights into the
underlying mechanisms of conversational prosody: They empirically inform
and refine existing theoretical concepts. The methodology we present,
combining unsupervised analysis of large datasets of spontaneous speech
with manual sampling of the results, could guide future research aimed
at refining our model and expanding it to other languages.
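
To make the idea of patterns that "follow others more frequently than chance" concrete, here is a minimal Python sketch. It is not the authors' method: it assumes each utterance has already been reduced to a sequence of prosodic pattern labels (the labels `P1` to `P4` and the function name `pair_lift` are hypothetical), and it uses a simple observed-to-expected ratio as a stand-in for whatever statistic the paper applies.

```python
from collections import Counter

def pair_lift(utterances):
    """Observed/expected ratio for each consecutive pattern pair.

    A ratio above 1 means the pair occurs more often than it would if
    the first and second patterns were chosen independently.
    """
    pairs = Counter()   # counts of (a, b) with b immediately after a
    first = Counter()   # how often each pattern appears in first position
    second = Counter()  # how often each pattern appears in second position
    for seq in utterances:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
            first[a] += 1
            second[b] += 1
    total = sum(pairs.values())
    lift = {}
    for (a, b), observed in pairs.items():
        expected = first[a] * second[b] / total  # count under independence
        lift[(a, b)] = observed / expected
    return lift

# Hypothetical toy data: each inner list is one utterance's pattern sequence.
toy = [["P1", "P2"], ["P1", "P2"], ["P3", "P4"], ["P3", "P4"]]
lift = pair_lift(toy)
print(lift[("P1", "P2")])
```

In this toy data, `P2` always follows `P1`, so that pair scores well above 1. Real data would also need a significance test, since rare pairs can reach high ratios by chance alone.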