...we created an absurd factor for illustrative purposes, what we call XYZ contagion, and tested whether the number of X’s, Y’s and Z’s included in messages’ text predicted diffusion...Our analysis found XYZ contagion to be present in four of our six corpora, such that the presence of the letters X, Y and Z predicted an increase in message diffusion: COVID-19 tweets...#MeToo tweets...#MuellerReport tweets...2016 US Election tweets...While there was no positive relationship between the presence of X, Y and Z and message diffusion in the #WomensMarch and Post-Brexit tweets, the finding that XYZ contagion passes a key test of robustness, viz. out-of-sample prediction, demonstrates the potential of large-scale social media datasets to contain spurious correlations.

Abstract
The ubiquity of social media use and the digital data traces it produces has triggered a potential methodological shift in the psychological sciences away from traditional, laboratory-based experimentation. The hope is that, by using computational social science methods to analyse large-scale observational data from social media, human behaviour can be studied with greater statistical power and ecological validity. However, current standards of null hypothesis significance testing and correlational statistics seem ill-suited to markedly noisy, high-dimensional social media datasets. We explore this point by probing the moral contagion phenomenon, whereby the use of moral-emotional language increases the probability of message spread. Through out-of-sample prediction, model comparisons and specification curve analyses, we find that the moral contagion model performs no better than an implausible XYZ contagion model. This highlights the risks of using purely correlational evidence from large observational datasets and sounds a cautionary note for psychology’s merge with big data.
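The XYZ contagion test described above can be illustrated with a minimal sketch. The code below is not the authors' pipeline (which involved regression models, model comparisons and specification curve analyses on real Twitter corpora); it is a hypothetical toy version on synthetic data, assuming a simple one-predictor least-squares fit: count the letters X, Y and Z in each message, fit on a training split, and predict diffusion on a held-out split. Because the synthetic diffusion counts here are pure noise, any apparent "effect" is spurious by construction, which is the point of the exercise.

```python
import random

def xyz_count(text: str) -> int:
    """Count occurrences of the letters x, y and z (case-insensitive)."""
    return sum(text.lower().count(c) for c in "xyz")

# Synthetic corpus: messages paired with a fabricated diffusion count.
# (Illustrative stand-in for tweets and their retweet counts.)
random.seed(0)
words = ["policy", "xylophone", "crazy", "vote", "yes", "zebra", "news"]
corpus = []
for _ in range(200):
    msg = " ".join(random.choices(words, k=8))
    # Diffusion is random noise here -- any detected relationship
    # with xyz_count is a spurious correlation.
    corpus.append((msg, random.randint(0, 50)))

# Out-of-sample test: fit on a training split, predict on a held-out split.
train, test = corpus[:150], corpus[150:]

def fit_ols(data):
    """Closed-form one-predictor least squares: returns (intercept, slope)."""
    xs = [xyz_count(m) for m, _ in data]
    ys = [y for _, y in data]
    n = len(data)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

intercept, slope = fit_ols(train)
preds = [intercept + slope * xyz_count(m) for m, _ in test]
```

In the real analyses, passing this kind of out-of-sample check is what made the absurd XYZ factor look "robust", underscoring that predictive performance on large observational datasets is not, by itself, evidence of a genuine causal mechanism.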