Recent work has shown that observers can parse streams of syllables, tones, or visual shapes and learn statistical regularities in them without conscious intent (e.g., learn that A is always followed by B). Here, we demonstrate that these statistical-learning mechanisms can operate at an abstract, conceptual level. In Experiments 1 and 2, observers incidentally learned which semantic categories of natural scenes covaried (e.g., kitchen scenes were always followed by forest scenes). Stimuli: In Experiments 3 and 4, category learning with images of scenes transferred to words that represented the categories. In each experiment, the category of the scenes was irrelevant to the task. Together, these results suggest that statistical-learning mechanisms can operate at a categorical level, enabling generalization of learned regularities using existing conceptual knowledge. Such mechanisms may guide learning in domains as disparate as the acquisition of causal knowledge and the development of cognitive maps from environmental exploration.From their description of the first experiment:
Stimuli: Twelve scene categories were used (see figure): bathroom, bedroom, bridge, building, coast, field, forest, kitchen, living room, mountain, street, and waterfall.To examine the role of category-level semantics in statistical learning, the authors then moved on to experiments in which the same string of images was never presented twice, but a pattern occurred at the categorical level.
Each category contained 120 different full-color images. For each observer, 1 picture was drawn from each of the 12 categories at random, resulting in a set of 12 different images...Each of the 12 selected images was randomly assigned a position in one of four triplets (e.g., ABC)—sequences of three images that always appeared in the same order. Then a sequence of images was generated by randomly interleaving 75 repetitions of each triplet, with the constraints that the same triplet could never appear twice in a row and the same set of two triplets could never appear twice in a row (e.g., ABCGHIABCGHI was disallowed). In addition, 100 repeat images were inserted into the stream such that sometimes either the first or third image in a triplet repeated immediately (e.g., ABCCGHI or ABCGGHI). Allowing only the first or third image in a triplet to repeat served to keep the triplet structure intact, yet prevented the repeat images from being informative for delineating triplets from one another.
Procedure: Observers watched a 20-min sequence of 1,000 images, presented one at a time for 300 ms each with a 700-ms interstimulus interval (ISI). During this sequence, the task was to detect back-to-back repeats of the same image and to indicate repeats as quickly as possible by hitting the space bar. This cover task was intended to help prevent observers from becoming explicitly aware of the structure in the stream (Turk-Browne et al., 2005), and also avoided having observers simply view the stream passively (which would make it unclear what they were processing). Note that they were never informed that there was any structure in the stream of images...Following this study period, observers were asked if they had recognized any structure in the stream and then were given a surprise forced-choice familiarity test. On each test trial, observers viewed two 3-image test sequences, presented sequentially at the center of the screen with the same ISI as during the study phase and segmented from each other by an additional 1,000-ms pause. One of these test sequences was always a triplet of images that had been seen in the stream (e.g., ABC), and another was a foil constructed from images from three different triplets (e.g., AEI). After the presentation of the two test sequences, observers were told to press either the "1" or the "2" key to indicate whether the first or second test sequence seemed more familiar from the initial study period. Each of the four triplets was tested eight times, paired twice with each of four different foil sequences (AEI, DHL, GKC, JBF), for a total of 32 test trials. Observers' ability to discriminate triplet sequences from foil sequences was used as a measure of statistical learning.
Results and Discussion: All 10 of the observers completed the repeat-detection task during the study period with few errors, detecting an average of 91% of the repetitions (SD= 5%) and committing between one and five false alarms. These results demonstrate that observers were attending to the sequence of images. However, when asked, no observers reported explicitly noticing that the study stream had any structure.1 Nonetheless, performance on the familiarity test indicated very robust statistical learning, with triplets being successfully discriminated from foils (86.6% of the test sequences chosen were triplets, and 13.4% were foils), t(9) = 8.72, p= .00001.
These results extend previous demonstrations of visual statistical learning in two ways. First, they demonstrate visual statistical learning for scene stimuli, which are more complicated and information rich than the stimuli for which statistical learning has been demonstrated previously. Second, choosing the correct triplets at test in this experiment required not just forming episodic associations between the correct pictures, but also overcoming prior knowledge about how the scenes represented are associated in the world (e.g., bridges are rarely associated with living rooms).
In this experiment, learning likely occurred at the image level, because identical stimuli were repeated throughout the learning and test phases (and statistical learning has been previously demonstrated for shape and color.