The data clearly show that both low-frequency phase (delta-theta, 1–7 Hz) and high-gamma power (70–150 Hz) yield consistent trial-to-trial responses to speech. Other frequency bands do not, nor does low-frequency power, which adds weight to the argument that speech tracking is partly due to entrainment of endogenous rhythms. However, these effects are not equally distributed across cortical areas. High-gamma tracking tends to cluster in the superior temporal lobe, whereas the low-frequency phase response is more widespread, including superior and anterior temporal regions and the inferior parietal and frontal lobes. Across electrodes, though, both low-frequency phase and high-gamma power showed more consistent responses to attended than to ignored speech. Corroborating this observation, speech envelope acoustics could be reconstructed from neural responses only for the attended talker, not the unattended one.

The Zion et al. abstract:
The ability to focus on and understand one talker in a noisy social environment is a critical social-cognitive capacity, whose underlying neuronal mechanisms are unclear. We investigated the manner in which speech streams are represented in brain activity and the way that selective attention governs the brain’s representation of speech using a “Cocktail Party” paradigm, coupled with direct recordings from the cortical surface in surgical epilepsy patients. We find that brain activity dynamically tracks speech streams using both low-frequency phase and high-frequency amplitude fluctuations and that optimal encoding likely combines the two. In and near low-level auditory cortices, attention “modulates” the representation by enhancing cortical tracking of attended speech streams, but ignored speech remains represented. In higher-order regions, the representation appears to become more “selective,” in that there is no detectable tracking of ignored speech. This selectivity itself seems to sharpen as a sentence unfolds.
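To make "consistent trial-to-trial responses" concrete, here is a minimal sketch of how the two measures discussed above, low-frequency phase and high-gamma power, could be scored for consistency across repeated trials. It is not the authors' pipeline. The function names, the brick-wall FFT filter, the 500 Hz sampling rate, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def band_analytic(trials, fs, lo, hi):
    """Band-limited analytic signal for each trial (rows of `trials`).

    Assumption: a brick-wall FFT mask stands in for a proper band-pass
    filter; it keeps only positive frequencies inside [lo, hi] Hz.
    """
    trials = np.asarray(trials, dtype=float)
    n = trials.shape[-1]
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    spectrum = np.fft.fft(trials, axis=-1)
    mask = (freqs >= lo) & (freqs <= hi)
    # Doubling the retained positive frequencies yields the analytic signal.
    return np.fft.ifft(2.0 * spectrum * mask, axis=-1)


def phase_consistency(trials, fs, lo, hi):
    """Inter-trial phase coherence in a band, averaged over time (0 to 1)."""
    phase = np.angle(band_analytic(trials, fs, lo, hi))
    itc = np.abs(np.mean(np.exp(1j * phase), axis=0))
    return float(itc.mean())


def power_consistency(trials, fs, lo, hi):
    """Mean pairwise correlation of band-power time courses across trials."""
    power = np.abs(band_analytic(trials, fs, lo, hi)) ** 2
    r = np.corrcoef(power)
    upper = np.triu_indices_from(r, k=1)  # each trial pair counted once
    return float(r[upper].mean())


def simulate_trials(n_trials=40, fs=500, dur=2.0, phase_locked=True, seed=0):
    """Hypothetical data: a 4 Hz rhythm in white noise, locked or jittered."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * dur)) / fs
    noise = rng.standard_normal((n_trials, t.size))
    if phase_locked:
        offsets = np.zeros((n_trials, 1))
    else:
        offsets = rng.uniform(0.0, 2.0 * np.pi, size=(n_trials, 1))
    return noise + 2.0 * np.sin(2.0 * np.pi * 4.0 * t + offsets)


if __name__ == "__main__":
    fs = 500
    locked = simulate_trials(phase_locked=True)
    jittered = simulate_trials(phase_locked=False)
    print("1-7 Hz phase consistency, locked:  ", phase_consistency(locked, fs, 1, 7))
    print("1-7 Hz phase consistency, jittered:", phase_consistency(jittered, fs, 1, 7))
    print("70-150 Hz power consistency:       ", power_consistency(locked, fs, 70, 150))
```

In this toy setup the phase-locked trials should score near 1 on the phase measure and the jittered trials much lower, which is the kind of contrast the attended-versus-ignored comparison relies on. Real recordings would call for a proper filter design and statistics against a shuffled baseline.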