Mathematics Colloquia and Seminars


Relaxation schemes for min-max generalization in batch-mode reinforcement learning.

PDE and Applied Math Seminar

Speaker: Prof. Quentin Louveaux, Univ. Liège, Belgium, and UC Davis
Location: 3106 MSB
Start time: Tue, May 6 2014, 3:10PM

Reinforcement learning is a control paradigm in which an agent interacts with its environment in order to maximize a reward. We assume that the environment is a discrete-time Markov process and that our only knowledge of it is a batch collection of trajectories. In this talk, we are interested in providing a worst-case performance guarantee for a given policy. It was shown that such a guarantee can be modeled as a quadratically constrained quadratic program. We show that this problem is NP-hard and propose two tractable relaxation schemes to tackle it. The first relaxation scheme drops some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, a Lagrangian relaxation in which all constraints are dualized, can also be solved in polynomial time. We prove theoretically and illustrate empirically that both relaxation schemes provide better results than those previously proposed in the literature. This is joint work with Raphaël Fonteneau, Bernard Boigelot and Damien Ernst.
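
The exact min-max program from the talk is not reproduced here, but the general mechanism behind the second scheme can be conveyed on a made-up example: dualize the quadratic constraints of a small QCQP and maximize the resulting concave dual function, which yields a tractable bound on the optimal value by weak duality. The sketch below does this for a hypothetical two-variable problem with a single non-convex quadratic constraint; the data P0, q0, P1, q1, r1 and the helper dual_value are illustrative assumptions, not the formulation used in the talk.

import numpy as np

# Toy non-convex QCQP (all data made up, for illustration only):
#   minimize    x' P0 x + q0' x
#   subject to  x' P1 x + q1' x + r1 <= 0
P0 = np.array([[2.0, 0.0], [0.0, 1.0]])
q0 = np.array([-1.0, -2.0])
P1 = np.array([[-1.0, 0.0], [0.0, 1.0]])   # indefinite, so the constraint is non-convex
q1 = np.array([0.0, 1.0])
r1 = -1.0

def dual_value(lam):
    """Dual function g(lam) = min_x L(x, lam), with the single constraint dualized.
    Returns -inf when the Lagrangian is unbounded below."""
    P = P0 + lam * P1
    q = q0 + lam * q1
    if np.min(np.linalg.eigvalsh(P)) <= 1e-9:
        return -np.inf
    x_star = np.linalg.solve(2.0 * P, -q)  # unique minimizer of the Lagrangian
    return float(x_star @ P @ x_star + q @ x_star + lam * r1)

# Weak duality: every g(lam) with lam >= 0 lower-bounds the QCQP optimum, so the
# best bound comes from maximizing the concave dual function over lam >= 0.
lams = np.linspace(0.0, 1.95, 400)         # P0 + lam*P1 stays positive definite on this range
bound = max(dual_value(lam) for lam in lams)
print("Lagrangian lower bound on the optimal value:", bound)

This is only meant to convey the flavor of a Lagrangian relaxation of a QCQP; the program arising from the worst-case guarantee in the talk has a specific structure that the two proposed schemes exploit.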