A computational model to study the dynamics of representations of rewards in the orbital and medial frontal cortex
Inria-Labri UMR 5800 Talence
IMN - UMR 5293 Bordeaux Neurocampus
In machine learning, value prediction and decision making are implemented by classical reinforcement learning models that estimate state and action values. Given the limitations of these models, neuroscience can be seen as a valuable source of inspiration for revisiting these algorithms. However, a thorough review of the neuroscience literature reveals diverging interpretations of the roles of the main regions of the orbital and medial frontal cortex, which are collectively reported to play a central role in extracting and representing these values. One reason for these divergences is that the reported experiments did not involve the same kinds of behavior and were mostly conducted under non-ecological conditions. We report here on a large-scale model designed to associate different frontal regions (rather than focusing on a specific one) and to implement an ecological behavior combining foraging with decision making once an interesting place has been detected. This global model allows several observations to be reinterpreted and is consistent with a wide range of behavioral experiments. In particular, it reveals an interesting dynamic between the lateral and medial regions of the orbitofrontal cortex, which respectively represent the sensory and the rewarding values of outcomes. This dynamic can also be exploited from a purely machine learning perspective to design an artificial agent able to autonomously identify goals and to exploit them in various ways.
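To make concrete the kind of classical reinforcement learning model the abstract refers to, the following is a minimal sketch of tabular Q-learning, where state-action values are estimated by temporal-difference updates. The environment (a 5-state chain with a reward at one end), the hyperparameters, and all names are illustrative assumptions, not elements of the model described in this paper.

```python
import random

# Illustrative only: a minimal tabular Q-learning agent on a 5-state chain.
# The environment, reward placement, and hyperparameters are all assumptions
# chosen for the sketch; the paper's model is not a tabular Q-learner.

N_STATES = 5            # states 0..4; reward obtained on reaching state 4
ACTIONS = [-1, +1]      # move left or right along the chain
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

def step(state, action):
    """One environment transition: clip to chain bounds, reward at the end."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    done = nxt == N_STATES - 1
    return nxt, reward, done

def train(episodes=500, seed=0):
    random.seed(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action index]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if random.random() < EPS:
                a = random.randrange(2)
            else:
                a = max(range(2), key=lambda i: q[s][i])
            nxt, r, done = step(s, ACTIONS[a])
            # temporal-difference update of the state-action value
            q[s][a] += ALPHA * (r + GAMMA * max(q[nxt]) - q[s][a])
            s = nxt
    return q

q = train()
# After training, moving right is valued above moving left in every
# non-terminal state, i.e. the learned values encode the path to the reward.
```

Such models evaluate outcomes with a single scalar value per state-action pair; the dissociation the paper draws between sensory and rewarding values of outcomes is precisely what this classical scheme does not capture.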