Perception cannot rely solely on bottom-up processes, whereby patterns of receptor stimulation are passed up the hierarchy to generate a corresponding awareness. Such bottom-up processes would always generate experiences that are out of date and saturated by noise. Predictive processes are thought to play a key role in overcoming these problems, allowing us to generate best guesses concerning the likely sensorium and quickly highlighting when the world is not as we expect. Action provides a crucial source of prediction and a mechanism for resolving uncertainty and surprise, but it further complicates our understanding by introducing additional predictive cues and continuous change in sensory input. Another agent, who can also change the world and whom we seek to understand, adds yet another layer of complexity. How can we understand the predictive mechanisms supporting social interaction and understanding, with such a multitude of moving and interacting components? In this special issue, Keysers et al. (2024) outline how predictive coding can be applied to understanding the actions and emotions of others, with Mayo and Shamay-Tsoory (2024) discussing how these mutual predictions might shape social learning. They suggest that such social learning might be supported by interbrain synchronization, and Antonelli et al. (2024) discuss the critical role of emotion in shaping these multibrain dynamics.

While it is clearly crucial that we understand the nature of the mechanisms underlying social interactions, we wish to highlight the challenges that this complexity poses for scientific progress: in particular, how to find ways to properly test, refute, and improve our models when the assumed supporting mechanisms are so complex.

How predictions shape neural processing is thought to differ across space and time, even for processing of the simplest (non-social; static) elements of our environment. Keysers et al. 
(2024) highlight the assumed neural interactions across cortical layers, such that predictions are passed down the hierarchy to hypothesis units in deep (and perhaps superficial) cortical layers, input arrives in middle layers, and error signals are calculated and represented in superficial layers. This idea is supported by recent 7 T MRI work from our lab demonstrating increased decoding of predicted Gabor orientations in deep layers of primary visual cortex, with an advantage for unpredicted orientations in superficial layers (Thomas et al., 2024). Recent evidence suggests opposing influences at the temporal level as well (McDermott et al., 2024). This electroencephalography (EEG) study found that early perceptual processing is biased towards what we expect (< 200 ms; optimizing veridicality), with the advantage flipping in later time ranges (> 200 ms; optimizing informativeness), in line with the opposing process account proposed in Press et al. (2020). Building testable mechanistic accounts of these interactions across time and space, even for the simple perception of deterministic sequences of Gabor patches, represents a continued puzzle for future work.

In the social domain, the stimuli are by their nature highly complex and dynamic (Keysers et al., 2024). Therefore, the interactions across space and time described above must be continuously updated. Despite this complexity, Keysers et al. (2024) cite some evidence in line with the laminar conclusions drawn from simpler environments. Specifically, there is increased deep-layer information about observed actions in parietal cortex when the actions are presented in a predictable order, mediated via feedback connections (from premotor cortex). Social domains also yield multiple sources of prediction about the self and other (Mayo and Shamay-Tsoory, 2024), and we must determine how we weight the precision, or reliability, of these different sources, as well as how we render information about the self and other separable. 
Is this achieved by different cell populations coding information about the self and other (Mayo and Shamay-Tsoory, 2024)? Or could mechanisms similar to those proposed to distinguish products of imagination from reality (similarly internal vs external sources) also help in determining the information source in social situations?

Social predictions might be supported by interbrain synchronization (measured via hyperscanning), as discussed by Mayo and Shamay-Tsoory (2024; focus on social learning) and Antonelli et al. (2024; focus on emotion). We propose that one key challenge for this approach is determining the role played by different event-related inputs and responses in the effects: interpretation of hyperscanning data is plagued by the problem that brains will be “in synch” if two individuals are either perceiving the same events or producing the same behaviour. The brain’s responses to moving our arm or looking at a face are remarkably similar across individuals, such that if two of us perceive or produce the same event, our neural responses will be matched. Fluctuations in synchronization according to, e.g., dominance of individuals or levels of excitement on stage, could be determined by fluctuations in whether we attend to, or produce, the same events. It is crucial to understand the fascinating influence of these effects on synchronization.
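
The shared-input concern can be made concrete with a toy simulation. The sketch below is purely illustrative and is not drawn from any of the cited studies: the function names (`simulate_brain`, `simulate_pair`), the event probability, the response gain, and the noise level are all hypothetical, and Pearson correlation stands in for whichever synchrony measure a given hyperscanning study actually uses. Two simulated "brains" respond to events with independent noise and never influence one another, yet their signals correlate strongly whenever both respond to the same event stream.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def simulate_brain(events, gain, noise_sd, rng):
    """Toy neural signal: event-evoked response plus independent noise.

    Assumption: the response is a fixed gain times the event indicator.
    There is no term linking one brain's signal to the other's.
    """
    return [gain * e + rng.gauss(0.0, noise_sd) for e in events]


def random_events(n_samples, p_event, rng):
    """Binary event stream (1.0 = an event occurs at this sample)."""
    return [1.0 if rng.random() < p_event else 0.0 for _ in range(n_samples)]


def simulate_pair(shared_events, n_samples=2000, p_event=0.2, seed=0):
    """Correlation between two non-interacting brains.

    shared_events=True:  both individuals perceive/produce the same events.
    shared_events=False: each individual has an independent event stream.
    """
    rng = random.Random(seed)
    events_a = random_events(n_samples, p_event, rng)
    events_b = events_a if shared_events else random_events(n_samples, p_event, rng)
    signal_a = simulate_brain(events_a, gain=1.0, noise_sd=0.3, rng=rng)
    signal_b = simulate_brain(events_b, gain=1.0, noise_sd=0.3, rng=rng)
    return pearson(signal_a, signal_b)


sync_shared = simulate_pair(shared_events=True)
sync_separate = simulate_pair(shared_events=False)
print(f"same events:        r = {sync_shared:.2f}")
print(f"independent events: r = {sync_separate:.2f}")
```

Under these assumptions, the apparent "synchrony" in the shared-events condition arises entirely from common input, which is why any account attributing synchronization to social prediction needs to control for whether the two individuals are attending to, or producing, the same events.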