With billions of users and hundreds of billions of tweets and posts every year, social media has brought big data to social science. It has also opened an unprecedented opportunity to use artificial intelligence (AI) to glean meaning from the mass of human communications, as psychologist Martin Seligman has recognized. At the University of Pennsylvania's Positive Psychology Center, he and more than 20 psychologists, physicians, and computer scientists in the World Well-Being Project use machine learning and natural language processing to sift through gobs of data to gauge the public's emotional and physical health.
That's traditionally done with surveys. But social media data are “unobtrusive, it's very inexpensive, and the numbers you get are orders of magnitude greater,” Seligman says. The data are also messy, but AI offers a powerful way to reveal patterns in them.
In one recent study, Seligman and his colleagues looked at the Facebook updates of 29,000 users who had taken a self-assessment of depression. Using data from 28,000 of the users, a machine-learning algorithm found associations between words in the updates and depression levels. It could then successfully gauge depression in the remaining 1,000 users based only on their updates.
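The train-then-hold-out idea in that study can be illustrated with a toy sketch. This is not the team's algorithm: it swaps in a deliberately simple stand-in that scores each word by the average self-assessment score of the training users who used it, then rates an unseen update by averaging the scores of its known words. The function names, the sample updates, and the scores are all hypothetical and invented for illustration.

```python
from collections import defaultdict

PUNCTUATION = ".,!?;:\"'()"

def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def fit_word_scores(updates, scores):
    """Give each word the mean score of the training users who used it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, score in zip(updates, scores):
        for word in set(tokenize(text)):  # count each word once per user
            totals[word] += score
            counts[word] += 1
    baseline = sum(scores) / len(scores)
    return {w: totals[w] / counts[w] for w in totals}, baseline

def predict(text, word_scores, baseline):
    """Average the scores of known words; fall back to the training mean."""
    known = [word_scores[w] for w in set(tokenize(text)) if w in word_scores]
    return sum(known) / len(known) if known else baseline

# Hypothetical training users: status updates paired with made-up scores.
train_updates = [
    "feeling so alone and tired tonight",
    "great hike with friends today",
    "tired of everything, cannot sleep",
    "excited for the party with friends",
]
train_scores = [8.0, 2.0, 9.0, 1.0]

word_scores, baseline = fit_word_scores(train_updates, train_scores)

# Held-out updates: the model sees only the text, never a score.
for update in ["alone and tired again", "party with friends"]:
    print(update, "->", round(predict(update, word_scores, baseline), 2))
```

The key design point is the split: the word scores are learned from one group of users only, so the estimates for the held-out updates test whether the word associations generalize rather than merely describe the training data.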
In another study, the team predicted county-level heart disease mortality rates by analyzing 148 million tweets; words related to anger and negative relationships turned out to be risk factors. The predictions from social media matched actual mortality rates more closely than did predictions based on 10 leading risk factors, such as smoking and diabetes. The researchers have also used social media to predict personality, income, and political ideology, and to study hospital care, mystical experiences, and stereotypes. The team has even created a map coloring each U.S. county according to well-being, depression, trust, and five personality traits, as inferred from Twitter (wwbp.org).
“There's a revolution going on in the analysis of language and its links to psychology,” says James Pennebaker, a social psychologist at the University of Texas at Austin. He focuses not on content but on style, and has found, for example, that the use of function words in a college admissions essay can predict grades. Articles and prepositions indicate analytical thinking and predict higher grades; pronouns and adverbs indicate narrative thinking and predict lower grades. He also found support for suggestions that much of the 1728 play Double Falsehood was likely written by William Shakespeare: Machine-learning algorithms matched it to Shakespeare's other works based on factors such as cognitive complexity and rare words. “Now, we can analyze everything that you've ever posted, ever written, and increasingly how you and Alexa talk,” Pennebaker says. The result: “richer and richer pictures of who people are.”