Bordeaux Neurocampus Conferences

Mehdi Khamassi

Exploiting individual differences to inform computational models of dopamine in reinforcement learning.

Sorbonne Université, CNRS, Institute of Intelligent Systems and Robotics, F-75005 Paris, France.

The model-free reinforcement learning (MFRL) framework, in which agents learn local, cached, implicit action values without trying to estimate a model of their environment, has been successfully applied to neuroscience over the last two decades. It can account for most dopamine reward prediction error signals (RPEs) in Pavlovian and instrumental tasks. However, it is still not clear why, in the Pavlovian autoshaping paradigm, RPEs can be recorded in some individuals but not in others. Moreover, the role of dopamine in functions not related to learning, such as exploration, is still not understood. Here we present a computational model of dopamine-based learning and exploration regulation, and show how it can account for the inter-individual differences, termed sign-tracking and goal-tracking, observed in a Pavlovian lever-autoshaping procedure. The model combines MFRL with a model-based (MBRL) component, which estimates an internal model of the consequences of different actions in the environment. The model explains inter-individual differences by a different relative weighting of MFRL and MBRL in choice behavior. The behavior of sign-trackers is mostly led by MFRL, which displays dopamine RPEs and pushes them towards reward-predicting stimuli. In contrast, the behavior of goal-trackers is mostly led by MBRL, in which dopamine RPEs are absent and which pushes them towards the outcome of their behavioral responses. Moreover, the model suggests that injection of the dopamine antagonist flupenthixol in goal-trackers impairs the exploration-exploitation trade-off and thus blocks the expression of a covert, dopamine-independent MBRL process. This model led to a series of novel predictions, produced through simulations of variants of the classic autoshaping procedure. Most notably, the model predicts that sign-trackers should be less sensitive to outcome devaluation than goal-trackers, which has recently been confirmed experimentally.
The model also predicts that changing the duration of the inter-trial interval should change the relative proportion of sign- and goal-trackers in the population, as well as the dopamine RPE profile in these individuals. Recent experimental results collected by the group of Matthew R. Roesch at the University of Maryland, USA, which partially confirm these predictions, will be presented. Other recent results, collected by the group of Etienne Coutureau and Alain Marchand at the CNRS in Bordeaux, also confirm some of the model predictions, namely that dopamine blockade impairs exploration regulation mechanisms in rats. Together, these results suggest that dopamine acts on a variety of mechanisms, which can be progressively better understood through tight collaboration between experimentation and computational modeling. Moreover, because sign- and goal-tracking behaviors have been related to different degrees of vulnerability to drug seeking, this computational model may contribute to understanding some individuals’ transition to addiction.
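
As a rough illustration of the idea described above, the sketch below mixes model-free and model-based action values with a single weight and passes the result through a softmax whose inverse temperature controls exploration. It is not the speaker's implementation: the function names, the weight `omega`, the inverse temperature `beta`, the learning rate `alpha`, and all numerical values are assumptions chosen only to make the mechanism concrete.

```python
import math


def softmax(values, beta):
    """Turn action values into choice probabilities.

    beta is an assumed inverse temperature: low beta means more exploration.
    """
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def integrate(q_mf, q_mb, omega):
    """Weighted mix of model-free and model-based values.

    omega (assumed name) is the relative weight given to the MFRL system.
    """
    return [omega * mf + (1.0 - omega) * mb for mf, mb in zip(q_mf, q_mb)]


def rpe_update(value, reward, alpha):
    """One model-free update; returns the new value and the prediction error."""
    delta = reward - value  # reward prediction error (RPE)
    return value + alpha * delta, delta


# Hypothetical values for two responses: [approach lever, approach food cup]
q_mf = [0.8, 0.2]  # cached values favor the reward-predicting stimulus
q_mb = [0.3, 0.9]  # model-based values favor the outcome location

# A high weight on MFRL mimics a sign-tracker, a low weight a goal-tracker.
sign_tracker = softmax(integrate(q_mf, q_mb, omega=0.9), beta=5.0)
goal_tracker = softmax(integrate(q_mf, q_mb, omega=0.1), beta=5.0)
```

With these made-up numbers, the simulated sign-tracker prefers the lever and the simulated goal-tracker prefers the food cup. Lowering `beta` flattens both choice distributions, which is one simple way to represent a change in the exploration-exploitation trade-off.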