Dodds et al. have constructed 24 corpora (collections of writing) spread across 10 languages: English, Spanish, French, German, Brazilian Portuguese, Korean, Chinese (Simplified), Russian, Indonesian, and Arabic. The corpora include books, news outlets, social media, the web, television and movie subtitles, and music lyrics. The authors identify the most commonly used words and how individuals rate them (on a happiness scale of 1 to 9), providing clear confirmation of the Pollyanna hypothesis, proposed in 1969 by Boucher and Osgood: that there is a universal positivity bias in human communication. They illustrate the use of their “hedonometer”, a language-based instrument for measuring expressed happiness, by constructing “happiness time series” for three famous works of literature, each evaluated in its original language (English, Russian, and French, respectively): Melville’s Moby Dick, Dostoyevsky’s Crime and Punishment, and Dumas’ The Count of Monte Cristo. Their abstract:
Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (i) the words of natural human language possess a universal positivity bias, (ii) the estimated emotional content of words is consistent between languages under translation, and (iii) this positivity bias is strongly independent of frequency of word use. Alongside these general regularities, we describe interlanguage variations in the emotional spectrum of languages that allow us to rank corpora. We also show how our word evaluations can be used to construct physical-like instruments for both real-time and offline measurement of the emotional content of large-scale texts.
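To make the idea of a “happiness time series” concrete, here is a minimal Python sketch of one way such a series could be computed: slide a fixed-size window over the words of a text and average the happiness ratings of the rated words in each window. This is an illustration only, not the authors' implementation. The function name `happiness_series`, the `window` and `step` parameters, and the ratings in `TOY_SCORES` are all assumptions made up for this example; real ratings would come from the human evaluations described in the paper.

```python
from typing import Dict, List, Optional

# Hypothetical ratings on the 1-9 happiness scale.
# These values are invented for illustration and are NOT the authors' data.
TOY_SCORES: Dict[str, float] = {
    "love": 8.0,
    "happy": 8.0,
    "sea": 6.0,
    "storm": 3.0,
    "death": 2.0,
}


def happiness_series(
    words: List[str],
    scores: Dict[str, float],
    window: int,
    step: int = 1,
) -> List[Optional[float]]:
    """Average the ratings of rated words inside each sliding window.

    Words without a rating are skipped. A window containing no rated
    words yields None rather than a made-up value.
    """
    series: List[Optional[float]] = []
    # Always evaluate at least one window, even for very short texts.
    last_start = max(len(words) - window + 1, 1)
    for start in range(0, last_start, step):
        chunk = words[start:start + window]
        rated = [scores[w] for w in chunk if w in scores]
        series.append(sum(rated) / len(rated) if rated else None)
    return series


if __name__ == "__main__":
    text = "love happy sea the storm death"
    print(happiness_series(text.lower().split(), TOY_SCORES, window=3))
```

With a real book, the window would span thousands of words rather than three, so that each point in the series reflects the overall tone of a passage rather than a single sentence.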