Model-based predictions for dopamine
Department of Psychology
Princeton Neuroscience Institute
Anticipating the timing of rewards is as crucial to adaptive behavior as predicting what those rewards will be. In the brain, reward learning is understood to depend on dopamine signals that convey a prediction error whenever reward predictions do not accord with reality. Neural and behavioral correlates of reward prediction errors indicate that predictions in the brain can be remarkably temporally precise. Prominent temporal-difference reinforcement-learning models hold that this learning is ‘model-free’: they explain how reward timing is learned through a simplified temporal representation of sequential, momentary reward predictions, a representation that may not be tenable in the biological circuits that support reward learning. This leaves an important question unaddressed: how are temporally precise reward predictions dynamically represented and learned in the brain?
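To make the model-free account concrete, the following is a textbook-style sketch of temporal-difference learning with a ‘complete serial compound’ (tapped-delay-line) representation, in which each timestep after a cue is treated as its own state. All names and parameters here are illustrative choices, not taken from the work described in the abstract.

```python
import numpy as np

# Minimal TD(0) sketch with a tapped-delay-line state representation:
# one value weight per post-cue timestep. Illustrative parameters only.
T = 10                  # trial length in timesteps
cue_t, reward_t = 2, 8  # cue onset and reward delivery times
alpha, gamma = 0.1, 1.0 # learning rate and discount factor
V = np.zeros(T)         # value estimate for each post-cue timestep

for trial in range(200):
    for t in range(cue_t, T - 1):
        r = 1.0 if t + 1 == reward_t else 0.0
        delta = r + gamma * V[t + 1] - V[t]  # reward prediction error
        V[t] += alpha * delta

# After learning, value is high from cue onset through the reward time:
# the prediction error has migrated back from reward delivery to the cue.
print(np.round(V[cue_t:reward_t], 2))
```

This captures how such models predict reward timing, but only by assuming the brain maintains a distinct, precisely clocked state for every moment after a cue, which is the representational assumption the abstract calls into question.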
Motivated by recent experimental results that demonstrate a neural dissociation between predictions about the amount and timing of upcoming rewards, I will present a computational framework for reward prediction and learning in which both the value and duration of hidden task states are learned concurrently. This framework proposes a mechanism for learning a representation of hidden task states by tracking the elapsed time between task events. It further suggests that predictions about reward timing act to ‘gate’ the broadcast of reward prediction errors, providing a testable mechanism for the dynamic influence of temporal predictions on the neural computations that underlie reward prediction and learning.
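One way to illustrate the gating idea is a toy agent that learns both a hidden state's value and its expected duration, and broadcasts a reward prediction error only when reward arrives near the predicted time. This is my own minimal sketch under assumed names and update rules, not the speaker's implementation.

```python
# Toy sketch (illustrative assumptions throughout): a single hidden task
# state whose value and duration are learned concurrently. The learned
# timing prediction 'gates' whether a reward prediction error is broadcast.
alpha_v, alpha_d = 0.1, 0.1
value, duration = 0.0, 5.0   # initial estimates for the hidden state
tolerance = 1.0              # timing window within which reward is 'expected'

def run_trial(value, duration, true_delay=8.0, reward=1.0):
    elapsed = true_delay     # elapsed time in the state when reward arrives
    # Gate: broadcast an RPE (and update value) only if reward arrives near
    # the predicted time; otherwise attribute the surprise to timing.
    if abs(elapsed - duration) <= tolerance:
        delta = reward - value            # broadcast reward prediction error
        value += alpha_v * delta
    duration += alpha_d * (elapsed - duration)  # always refine timing estimate
    return value, duration

for _ in range(100):
    value, duration = run_trial(value, duration)

print(round(value, 2), round(duration, 2))
```

In this sketch, early trials update only the duration estimate; once timing predictions become accurate, prediction errors are broadcast and the value estimate converges, illustrating how temporal predictions could dynamically control value learning.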