The United States spends more than $250 million each year on the American Community Survey (ACS), a labor-intensive door-to-door study that measures statistics relating to race, gender, education, occupation, unemployment, and other demographic factors. Although a comprehensive source of data, the lag between demographic changes and their appearance in the ACS can exceed several years. As digital imagery becomes ubiquitous and machine vision techniques improve, automated data analysis may become an increasingly practical supplement to the ACS. Here, we present a method that estimates socioeconomic characteristics of regions spanning 200 US cities by using 50 million images of street scenes gathered with Google Street View cars. Using deep learning-based computer vision techniques, we determined the make, model, and year of all motor vehicles encountered in particular neighborhoods. Data from this census of motor vehicles, which enumerated 22 million automobiles in total (8% of all automobiles in the United States), were used to accurately estimate income, race, education, and voting patterns at the zip code and precinct level. (The average US precinct contains ∼1,000 people.) The resulting associations are surprisingly simple and powerful. For instance, if the number of sedans encountered during a drive through a city is higher than the number of pickup trucks, the city is likely to vote for a Democrat during the next presidential election (88% chance); otherwise, it is likely to vote Republican (82%). Our results suggest that automated systems for monitoring demographics may effectively complement labor-intensive approaches, with the potential to measure demographics with fine spatial resolution, in close to real time.

From the summary by Ingraham:
...The 22 million vehicles in the Google Street View database comprise roughly 8 percent of all vehicles in the United States...the researchers first paired the Zip code-level vehicle data with numbers on race, income and education from the U.S. Census Bureau's American Community Survey. They did this for a random 15 percent of the Zip codes in their data set to create a “training set.” They then created another algorithm to go through the training set to see how vehicle characteristics correlated with neighborhood characteristics: What kinds of vehicles are disproportionately likely to appear in white neighborhoods, or black ones? Low-income vs. high-income? Highly-educated areas vs. less-educated ones?
You can do similar exercises for other demographic characteristics, like educational attainment. People with graduate degrees were more likely to drive Audi hatchbacks with high city MPG. Those with less than a high school education, on the other hand, were more likely to drive cars made by U.S. manufacturers in the 1990s.
“We found a strong correlation between our results and ACS [American Community Survey] values for every demographic statistic we examined,” the researchers wrote. They plotted the algorithm's demographic estimates against the actual numbers from the ACS and measured their correlation coefficient: a number from zero (no correlation) to 1 (perfect correlation) that measures how accurately one set of numbers can predict the variation in a separate set of numbers.
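(An aside on the statistic itself, assuming the coefficient meant here is the usual Pearson r, which the summary does not state explicitly. For paired values x_i (the algorithm's estimates) and y_i (the ACS figures), with means \bar{x} and \bar{y}, it can be written as follows. The symbols are my own notation, not the article's.)

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

(In general r ranges from -1 to 1, with negative values indicating an inverse relationship. The zero-to-1 scale in the summary covers the positive associations reported.)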
At the city level, the algorithm did a particularly good job of predicting the percent of Asians (correlation coefficient of 0.87), blacks (0.82) and whites (0.77). It also predicted median household income (0.82) quite well. On measures of educational attainment, the correlation coefficients ran from about 0.54 to 0.70 — again, not perfect, but fairly impressive accuracy considering the predictions derived solely from auto information and nothing else.
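For readers who want to see the shape of the procedure Ingraham describes, here is a minimal sketch: hold out a random 15 percent of areas as a training set, fit a model from vehicle features to a demographic value, then correlate the predictions with the true values on the remaining areas. Everything concrete in it is an assumption for illustration. The data are synthetic, the three feature columns are hypothetical, and ordinary least squares stands in for whatever model the researchers actually used, which the summary does not specify.

```python
import numpy as np


def split_fit_evaluate(features, target, train_fraction=0.15, seed=0):
    """Fit on a random training subset, then score the rest with Pearson r.

    Ordinary least squares is an assumed stand-in model, not the paper's.
    Returns (r on held-out rows, number of training rows).
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(target)

    # Randomly choose which rows (zip codes) form the training set.
    order = np.random.default_rng(seed).permutation(n)
    n_train = int(round(train_fraction * n))
    train, test = order[:n_train], order[n_train:]

    # Add an intercept column and solve the least-squares problem.
    design = np.column_stack([np.ones(n), features])
    coef, *_ = np.linalg.lstsq(design[train], target[train], rcond=None)

    # Predict for the held-out rows and correlate with the true values.
    predicted = design[test] @ coef
    r = float(np.corrcoef(predicted, target[test])[0, 1])
    return r, n_train


# Synthetic stand-in data: 2,000 hypothetical zip codes, three made-up
# vehicle features (e.g. share of sedans, share of pickups, mean car age).
rng = np.random.default_rng(42)
n_zip = 2000
features = rng.random((n_zip, 3))
income = (
    30_000
    + 40_000 * features[:, 0]
    - 15_000 * features[:, 1]
    + rng.normal(0, 5_000, n_zip)  # noise the features cannot explain
)

r, n_train = split_fit_evaluate(features, income)
print(f"trained on {n_train} zip codes, held-out correlation r = {r:.2f}")
```

The correlation printed here reflects only how the fake data were generated and says nothing about the paper's results. The point is the workflow: the score is computed on areas the model never saw during fitting, which is what makes figures like those quoted above meaningful.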