Learning to Learn In Reinforcement Learning
As an alternative to highly pessimistic worst-case evaluations and overly optimistic single-example empirical demonstrations, we advocate evaluating reinforcement-learning algorithms relative to a distribution over target environments. We formalize the problem of optimizing a learner based on environments sampled from this distribution as "meta-reinforcement learning." In addition to experimental explorations, we provide a formal mechanism for assessing the risk due to selecting an algorithm from a class based on too small a sample of environments. We call our approach "sample-optimized Rademacher complexity." It is akin to VC-dimension but can be estimated empirically using samples of environments (a property inherited from Rademacher complexity) and samples of learning algorithms (a novel property) instead of requiring exhaustive search in the form of existential and universal quantifiers. This method may be of independent interest because of its applicability to a broad range of distribution-sensitive meta-optimization settings.
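To make the estimation idea concrete, here is a rough illustrative sketch (not the paper's actual estimator) of a standard empirical Rademacher-complexity estimate in which the supremum is taken over a finite *sample* of learning algorithms rather than by exhaustive search; the `scores` table and its shape are assumptions introduced purely for illustration:

```python
import random

def empirical_rademacher(scores, num_draws=1000, rng=None):
    """Monte-Carlo estimate of empirical Rademacher complexity.

    `scores[a][i]` is a hypothetical performance score of sampled
    algorithm `a` on sampled environment `i` (this data layout is an
    assumption for the sketch, not taken from the talk).
    """
    rng = rng or random.Random(0)
    m = len(scores[0])  # number of sampled environments
    total = 0.0
    for _ in range(num_draws):
        # Draw random +/-1 Rademacher signs, one per environment.
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        # Supremum over the *sampled* algorithms only -- this replaces
        # the exhaustive search implied by existential/universal
        # quantifiers in a VC-style analysis.
        total += max(
            sum(s * x for s, x in zip(sigma, row)) / m
            for row in scores
        )
    return total / num_draws
```

A richer algorithm class that correlates well with many random sign patterns yields a larger estimate, signaling greater risk of selecting an algorithm that merely fits the sampled environments.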
Michael L. Littman's research in machine learning examines algorithms for decision making under uncertainty. He has earned multiple awards for teaching, and his research has been recognized with three best-paper awards on the topics of meta-learning for computer crossword solving, complexity analysis of planning under uncertainty, and algorithms for efficient reinforcement learning. Littman has served on the editorial boards of the Journal of Machine Learning Research and the Journal of Artificial Intelligence Research. He was general chair of the 2013 International Conference on Machine Learning and program chair of the 2013 Association for the Advancement of Artificial Intelligence Conference.