In the absence of other options, we use trial-and-error (reinforcement) learning to discover which of our actions are most likely to yield rewards. We can avoid many of those errors, however, if we receive some instruction about our choices. Li et al. (open access) identify the brain areas whose activations correlate with these two approaches, using an experiment with two sessions of a simple probabilistic reward task. In the “feedback” session, participants’ choices were based only on win/loss feedback; in the “instructed” session, participants could also use the correct cue-reward probability information provided by the experimenter to guide their choices (see Figure 1 for experimental design). The bottom line is that we use our dorsolateral prefrontal cortex to dynamically adjust outcome responses in valuation regions depending on the usefulness of action-outcome information. Here is their abstract:
Recent research in neuroeconomics has demonstrated that the reinforcement learning model of reward learning captures the patterns of both behavioral performance and neural responses during a range of economic decision-making tasks. However, this powerful theoretical model has its limits. Trial-and-error is only one of the means by which individuals can learn the value associated with different decision options. Humans have also developed efficient, symbolic means of communication for learning without the necessity for committing multiple errors across trials. In the present study, we observed that instructed knowledge of cue-reward probabilities improves behavioral performance and diminishes reinforcement learning-related blood-oxygen level-dependent (BOLD) responses to feedback in the nucleus accumbens, ventromedial prefrontal cortex, and hippocampal complex. The decrease in BOLD responses in these brain regions to reward-feedback signals was functionally correlated with activation of the dorsolateral prefrontal cortex (DLPFC). These results suggest that when learning action values, participants use the DLPFC to dynamically adjust outcome responses in valuation regions depending on the usefulness of action-outcome information.
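For readers who want a concrete feel for the difference between the two sessions, here is a toy simulation. It is a minimal sketch and not the model fitted by Li et al.: the delta-rule update, the function names (`update_value`, `run_session`), the learning rate, and the exploration rate are all illustrative assumptions. The "instructed" agent is also deliberately extreme in that it ignores feedback entirely, whereas the abstract reports only a diminished response to feedback.

```python
import random


def update_value(value, reward, learning_rate=0.1):
    """Delta-rule update: nudge the value estimate toward the observed reward."""
    prediction_error = reward - value
    return value + learning_rate * prediction_error


def run_session(reward_probs, n_trials, instructed, seed=0,
                learning_rate=0.1, explore=0.1):
    """Return the fraction of trials on which the best cue was chosen.

    All parameters are hypothetical; none come from the paper.
    """
    rng = random.Random(seed)
    # Instructed agents start with the true probabilities; feedback-only
    # agents start with an uninformed estimate of 0.5 for every cue.
    values = list(reward_probs) if instructed else [0.5] * len(reward_probs)
    best = max(range(len(reward_probs)), key=lambda i: reward_probs[i])
    n_best = 0
    for _ in range(n_trials):
        if rng.random() < explore:
            # Occasionally pick a cue at random.
            choice = rng.randrange(len(values))
        else:
            # Otherwise pick the cue with the highest current estimate.
            choice = max(range(len(values)), key=lambda i: values[i])
        reward = 1.0 if rng.random() < reward_probs[choice] else 0.0
        if not instructed:
            # Only the feedback-only agent learns from the outcome
            # (a simplifying assumption of this sketch).
            values[choice] = update_value(values[choice], reward, learning_rate)
        n_best += choice == best
    return n_best / n_trials


if __name__ == "__main__":
    probs = [0.25, 0.75]  # hypothetical cue-reward probabilities
    print("feedback:  ", run_session(probs, 200, instructed=False))
    print("instructed:", run_session(probs, 200, instructed=True))
```

The feedback-only agent has to spend trials correcting its estimates through prediction errors, while the instructed agent can choose well from the first trial, which is the behavioral contrast the experiment is built around.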