In social encounters with strangers, human beings are able to form high-level social representations from very thin slices of expressive behavior and quickly determine whether the other is a friend or a foe and whether they have the ability to enact their good or bad intentions. While much is already known about how facial features contribute to such evaluations, determinants of social judgments in the auditory modality remain poorly understood.
Anthropologists, linguists, and psychologists have noted regularities of pitch contours in social speech for decades. Notably, patterns of high or rising pitch are associated with social traits such as submissiveness or lack of confidence, and low or falling pitch with dominance or self-confidence, a code that has been proposed to be universal across species. Unfortunately, because these observations stem either from acoustic analysis of a limited number of actor-produced utterances or from linguistic analysis of small ecological corpora, it has remained difficult to establish their generality or their causal role in cognitive mechanisms, and we still do not know what exact pitch contour maximally elicits social percepts.
Inspired by a recent series of powerful data-driven studies in visual cognition, in which facial prototypes of social traits were derived from human judgments of thousands of computer-generated visual stimuli, we developed a voice-processing algorithm able to manipulate the temporal pitch dynamics of arbitrary recorded voices in a way that is both fully parametric and realistic. We used this technique to generate thousands of novel, natural-sounding variants of the same word utterance, each with a randomly manipulated pitch contour. We then asked human listeners to evaluate the social state of the speaker for each of these manipulated stimuli and, using the psychophysical technique of reverse correlation, reconstructed listeners' mental representations of the prosodic patterns that drive such judgments.

Here is their full abstract:
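The reverse-correlation logic described above can be sketched in a few lines: present many stimuli with random pitch perturbations, record binary judgments, and recover the listener's internal template as the mean contour of chosen stimuli minus the mean of rejected ones. The simulation below is purely illustrative (the listener's template, trial counts, and noise levels are all assumptions, not the paper's parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

n_trials = 5000   # number of random-contour stimuli (illustrative)
n_points = 6      # pitch contour sampled at a few time points

# Each stimulus: a random pitch perturbation (in cents) around the mean pitch.
perturbations = rng.normal(0.0, 70.0, size=(n_trials, n_points))

# Simulated listener: judges a voice "dominant" when its pitch falls over
# time, i.e. the contour projects onto a downward ramp (the hidden template
# that reverse correlation should recover), plus internal decision noise.
template = -np.linspace(-1.0, 1.0, n_points)
judged_dominant = perturbations @ template + rng.normal(0, 50, n_trials) > 0

# Reverse correlation: the first-order kernel is the mean contour of stimuli
# judged dominant minus the mean contour of the remaining stimuli.
kernel = (perturbations[judged_dominant].mean(axis=0)
          - perturbations[~judged_dominant].mean(axis=0))

print(np.round(kernel, 1))  # falls from positive to negative: a falling contour
```

With enough trials, the recovered kernel mirrors the hidden falling-pitch template, which is the sense in which the method "reconstructs" a mental representation from judgments alone.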
Human listeners excel at forming high-level social representations about each other, even from the briefest of utterances. In particular, pitch is widely recognized as the auditory dimension that conveys most of the information about a speaker’s traits, emotional states, and attitudes. While past research has primarily looked at the influence of mean pitch, almost nothing is known about how intonation patterns, i.e., finely tuned pitch trajectories around the mean, may determine social judgments in speech. Here, we introduce an experimental paradigm that combines state-of-the-art voice transformation algorithms with psychophysical reverse correlation and show that two of the most important dimensions of social judgments, a speaker’s perceived dominance and trustworthiness, are driven by robust and distinguishing pitch trajectories in short utterances like the word “Hello,” which remained remarkably stable whether male or female listeners judged male or female speakers. These findings reveal a unique communicative adaptation that enables listeners to infer social traits regardless of speakers’ physical characteristics, such as sex and mean pitch. By characterizing how any given individual’s mental representations may differ from this generic code, the method introduced here opens avenues to explore dysprosody and social-cognitive deficits in disorders like autism spectrum and schizophrenia. In addition, once derived experimentally, these prototypes can be applied to novel utterances, thus providing a principled way to modulate personality impressions in arbitrary speech signals.
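The abstract's last point, applying an experimentally derived prototype to a novel utterance, amounts to imposing the prototype's pitch trajectory on the new recording's f0 track. A minimal numeric sketch, assuming a frame-wise f0 track and a prototype expressed in cents at a few breakpoints (the prototype values and function names here are hypothetical, not the paper's):

```python
import numpy as np

def apply_contour(f0_hz, contour_cents):
    """Impose a pitch prototype (cents at a few breakpoints) on a
    frame-wise f0 track: interpolate the contour over the utterance's
    duration, then convert cents to a multiplicative pitch factor."""
    t = np.linspace(0.0, 1.0, len(f0_hz))            # normalized time per frame
    bp = np.linspace(0.0, 1.0, len(contour_cents))   # breakpoint positions
    cents = np.interp(t, bp, contour_cents)
    return f0_hz * 2.0 ** (cents / 1200.0)           # 100 cents = 1 semitone

# Hypothetical "dominance" prototype: pitch falls ~60 cents across the word.
prototype = np.array([30.0, 15.0, 0.0, -15.0, -30.0])
f0 = np.full(100, 200.0)        # flat 200 Hz track, for illustration only
shifted = apply_contour(f0, prototype)
print(shifted[0], shifted[-1])  # starts above 200 Hz, ends below it
```

In practice the resulting f0 track would drive a pitch-shifting vocoder rather than be printed, but the transformation of the trajectory itself is just this interpolation-and-scaling step.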