Rethinking Rewards in Reinforcement Learning
In the computational reinforcement learning (RL) framework, rewards—more specifically, reward functions—determine the problem the learning agent is trying to solve. Properties of the reward function influence how easy or hard the problem is and how well an agent can perform, but RL theory and algorithms are completely insensitive to the source of rewards. This is a strength of the framework because of the generality it confers, but it is also a weakness because it defers key questions about the nature of reward functions. In this talk, I address this weakness from two directions. First, I consider the role of evolution in determining where rewards come from in natural agents. Specifically, I present a computational framework in which evolved rewards capture regularities across environments, leaving the agent to learn regularities within its particular environment during its lifetime. Second, I describe how current practice in designing artificial agents confounds two roles of rewards: defining preferences over behaviors, and serving as parameters of actual agent behavior (RL agents act so as to maximize reward). Disentangling this "preferences-parameters confound" can be beneficial in designing artificial agents. I will present empirical illustrations of both of these aspects of rethinking rewards in RL.
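The evolved-rewards idea above can be sketched as a two-level search: an outer "evolutionary" search over internal-reward parameters, scored by the objective fitness an agent accumulates across a distribution of environments, and an inner RL lifetime in which the agent learns from its internal reward within one environment. The sketch below is a minimal hypothetical illustration, not the framework from the talk; the function names, the two-armed bandit environments, and the exploration-bonus form of the internal reward are all assumptions made for the example.

```python
import random

def run_lifetime(beta, env_probs, steps=500, rng=None):
    """One agent lifetime in a two-armed bandit with payoff probabilities env_probs.

    The agent does epsilon-greedy learning on an *internal* reward:
    objective reward plus a decaying exploration bonus scaled by beta
    (beta is the evolved internal-reward parameter in this sketch).
    Returns objective fitness: total objective reward collected.
    """
    rng = rng or random.Random(0)
    q = [0.0, 0.0]        # internal value estimates
    counts = [0, 0]       # visit counts per arm
    fitness = 0.0
    for _ in range(steps):
        if rng.random() < 0.1:                      # explore
            a = rng.randrange(2)
        else:                                       # exploit internal values
            a = max(range(2), key=lambda i: q[i])
        r_obj = 1.0 if rng.random() < env_probs[a] else 0.0
        fitness += r_obj                            # fitness counts only objective reward
        counts[a] += 1
        r_int = r_obj + beta / counts[a]            # internal reward with novelty bonus
        q[a] += 0.1 * (r_int - q[a])                # learn from internal reward
    return fitness

def evolve_reward(betas, n_envs=30, seed=0):
    """Outer search: pick the internal-reward parameter that maximizes
    expected objective fitness across a sample of environments, mimicking
    how evolution could select rewards that exploit cross-environment regularities."""
    rng = random.Random(seed)
    envs = [[rng.random(), rng.random()] for _ in range(n_envs)]
    def mean_fitness(beta):
        # Same per-environment RNG seed for every beta, so comparisons are paired.
        return sum(run_lifetime(beta, e, rng=random.Random(seed + i))
                   for i, e in enumerate(envs)) / n_envs
    return max(betas, key=mean_fitness)
```

The key structural point the sketch preserves is the separation of timescales: the outer loop only ever sees objective fitness, while the inner agent only ever sees the internal reward, so a well-chosen internal reward can encode regularities shared across the sampled environments.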
This talk describes joint work with Richard Lewis, Andrew G. Barto, Jonathan Sorg, and Akram Helou.