Daphne Lafleur

Alumni

Publications

Combining Reinforcement Learning and Constraint Programming for Sequence-Generation Tasks with Hard Constraints

A. Chandar

Gilles Pesant

While Machine Learning (ML) techniques are good at generating data similar to a dataset, they lack the capacity to enforce constraints. On t… (see more)he other hand, any solution to a Constraint Programming (CP) model satisfies its constraints but has no obligation to imitate a dataset. Yet, we sometimes need both. In this paper we borrow RL-Tuner, a Reinforcement Learning (RL) algorithm introduced to tune neural networks, as our enabling architecture to exploit the respective strengths of ML and CP. RL-Tuner maximizes the sum of a pretrained network’s learned probabilities and of manually-tuned penalties for each violated constraint. We replace the latter with outputs of a CP model representing the marginal probabilities of each value and the number of constraint violations. As was the case for the original RL-Tuner, we apply our algorithm to music generation since it is a highly-constrained domain for which CP is especially suited. We show that combining ML and CP, as opposed to using them individually, allows the agent to reflect the pretrained network while taking into account constraints, leading to melodic lines that respect both the corpus’ style and the music theory constraints.

2021-12-31

International Conference on Principles and Practice of Constraint Programming (published)

doi.org

AI Policy Fellowship Publications

Mila Ventures Launchpad

AI Policy Compass

Daphne Lafleur

Publications

AI Policy Fellowship Publications

Mila Ventures Launchpad

AI Policy Compass

Popular keywords:

Daphne Lafleur

Publications