A fascinating perspective from Conwell et al. (open source), "Affectless visual machines explain a majority of variance in human visually evoked affect":
Significance
Human visual experience is defined not only by the light reflecting on our eyes (sensation), but also by the feelings (affect) we experience concurrently. Psychological theories about where these feelings come from often focus mostly on the role of changes in our bodily states (physiology) or on our conscious thoughts about the things we are seeing (cognition). Far less frequently do these theories focus on the role of seeing itself (perception). In this research, we show that machine vision systems, which have neither bodily states nor conscious thoughts, can predict with remarkable accuracy how humans will feel about the things they look at. This suggests that perceptual processes (built on rich sensory experiences) may shape what we feel about the world around us far more than many psychological theories suggest.
Abstract
Looking at the world often involves not just seeing things, but feeling things. Modern feedforward machine vision systems that learn to perceive the world in the absence of active physiology, deliberative thought, or any form of feedback that resembles human affective experience offer tools to demystify the relationship between seeing and feeling, and to assess how much of visually evoked affective experience may be a straightforward function of representation learning over natural image statistics. In this work, we deploy a diverse sample of 180 state-of-the-art deep neural network models trained only on canonical computer vision tasks to predict human ratings of arousal, valence, and beauty for images from multiple categories (objects, faces, landscapes, art) across two datasets. Importantly, we use the features of these models without additional learning, linearly decoding human affective responses from network activity in much the same way neuroscientists decode information from neural recordings. Aggregate analysis across our survey demonstrates that predictions from purely perceptual models explain a majority of the explainable variance in average ratings of arousal, valence, and beauty alike. Finer-grained analyses within our survey (e.g., comparisons between shallower and deeper layers, or between randomly initialized, category-supervised, and self-supervised models) point to rich, preconceptual abstraction (learned from the diversity of visual experience) as a key driver of these predictions. Taken together, these results provide further computational evidence for an information-processing account of visually evoked affect linked directly to efficient representation learning over natural image statistics, and hint at a computational locus of affective and aesthetic valuation immediately proximate to perception.
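
To make the pipeline the abstract describes more concrete (frozen, task-trained vision features, a purely linear readout of mean human ratings, and comparison against the explainable variance in those ratings), here is a minimal sketch in Python. It is not the authors' code: the backbone (an ImageNet-trained ResNet-50), the ridge regularization grid, the split-half noise ceiling, and the toy data layout are all illustrative assumptions.

# A minimal sketch (not the authors' code) of the decoding setup described above:
# frozen, task-trained vision features plus a purely linear readout of mean human
# affect ratings, compared against a split-half noise ceiling. The backbone,
# regularization grid, and data layout are illustrative assumptions.
import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

# 1. Frozen perceptual features: the network never sees affect labels.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()   # expose penultimate-layer activations
backbone.eval()                     # inference only; weights stay fixed

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(image_paths):
    """Return an (n_images, n_features) matrix of frozen network activations."""
    feats = []
    for path in image_paths:
        img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        feats.append(backbone(img).squeeze(0).numpy())
    return np.stack(feats)

def decode_affect(image_paths, ratings, cv=5):
    """Linearly decode mean affect ratings (shape: n_images x n_raters) from frozen features."""
    X = extract_features(image_paths)
    y_mean = ratings.mean(axis=1)                     # average rating per image
    decoder = RidgeCV(alphas=np.logspace(-2, 5, 15))  # linear readout only
    r2 = cross_val_score(decoder, X, y_mean, cv=cv, scoring="r2").mean()

    # Split-half reliability of the ratings gives an explainable-variance ceiling
    # (Spearman-Brown corrected; a simplification of the paper's procedure).
    half_a = ratings[:, 0::2].mean(axis=1)
    half_b = ratings[:, 1::2].mean(axis=1)
    r_split = np.corrcoef(half_a, half_b)[0, 1]
    ceiling_r2 = ((2 * r_split) / (1 + r_split)) ** 2

    return {"decoder_r2": r2,
            "noise_ceiling_r2": ceiling_r2,
            "fraction_of_explainable": r2 / ceiling_r2}

Under this sketch, the headline claim corresponds to fraction_of_explainable exceeding one half for average ratings of arousal, valence, and beauty, and the finer-grained comparisons (layer depth; randomly initialized vs. category-supervised vs. self-supervised training) amount to repeating the same linear readout over different feature extractors.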