Abstract: Reinforcement learning agents aim to learn a policy, i.e., a way of behaving, from interaction with an environment. In practice, however, there can be limits on the type of interaction allowed. For example, agents may not be able to gather data interactively due to safety constraints, or they may need to leverage batch data that has already been collected. Another important case is that of learning optimal control, in which an agent has to follow an exploratory policy but is interested in finding and evaluating an optimal policy. Finally, we might want to use a single stream of data in order to learn about many different things. Off-policy reinforcement learning represents a very broad class of methods that allow an agent to learn about a desired target policy based on data collected using a different way of behaving. In this tutorial, we will review the theoretical ideas underpinning off-policy learning and discuss state-of-the-art algorithms that rely on off-policy learning. We will highlight practical applications of off-policy learning algorithms, and also discuss limitations of current methods and open problems.
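To make the core idea concrete, below is a minimal sketch of off-policy evaluation via ordinary importance sampling, one of the classical tools the tutorial builds on. The tabular setup, the episode format, and the names pi_target and pi_behavior are illustrative assumptions, not part of the tutorial material itself.

```python
import numpy as np

# Minimal sketch (assumed tabular setup): estimate the target policy's return
# from episodes generated by a different behavior policy, using ordinary
# importance sampling. pi_target and pi_behavior are hypothetical tables
# mapping (state, action) -> action probability.

def importance_sampling_return(episode, pi_target, pi_behavior, gamma=0.99):
    """Return the importance-weighted discounted return of one episode.

    episode: list of (state, action, reward) tuples collected under pi_behavior
    """
    rho = 1.0        # cumulative importance-sampling ratio over the episode
    g = 0.0          # discounted return observed under the behavior policy
    discount = 1.0
    for state, action, reward in episode:
        # Reweight by how much more (or less) likely the target policy is
        # to choose this action than the behavior policy was.
        rho *= pi_target[(state, action)] / pi_behavior[(state, action)]
        g += discount * reward
        discount *= gamma
    return rho * g   # unbiased, though potentially high-variance, estimate

# Averaging this estimate over many behavior-policy episodes approximates the
# target policy's expected return without ever executing the target policy.
```

Averaging these weighted returns illustrates the basic trade-off the tutorial examines in depth: the estimator is unbiased, but its variance can grow quickly as the target and behavior policies diverge, which motivates the more refined off-policy algorithms discussed in the session.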
Doina Precup splits her time between McGill University/MILA, where she holds a Canada-CIFAR AI chair, and DeepMind Montreal, where she has led the research team since its formation in October 2017. Her research interests are in the areas of reinforcement learning, deep learning, time series analysis, and diverse applications of machine learning in health care, automated control, and other fields. Dr. Precup is also involved in activities supporting the organization of the Montreal, Quebec and Canadian AI ecosystems.