Laurent Charlin

Scaling adaptive traffic signal control involves dealing with combinatorial state and action spaces. Multi-agent reinforcement learning attempts to address this challenge by distributing control to specialized agents. However, specialization hinders generalization and transferability, and the computational graphs underlying neural-network architectures, dominating in the multi-agent setting, do not offer the flexibility to handle an arbitrary number of entities which changes both between road networks, and over time as vehicles traverse the network. We introduce Inductive Graph Reinforcement Learning (IG-RL) based on graph-convolutional networks which adapts to the structure of any road network, to learn detailed representations of traffic signal controllers and their surroundings. Our decentralized approach enables learning of a transferable-adaptive-traffic-signal-control policy. After being trained on an arbitrary set of road networks, our model can generalize to new road networks and traffic distributions, with no additional training and a constant number of parameters, enabling greater scalability compared to prior methods. Furthermore, our approach can exploit the granularity of available data by capturing the (dynamic) demand at both the lane level and the vehicle level. The proposed method is tested on both road networks and traffic settings never experienced during training. We compare IG-RL to multi-agent reinforcement learning and domain-specific baselines. In both synthetic road networks and in a larger experiment involving the control of the 3,971 traffic signals of Manhattan, we show that different instantiations of IG-RL outperform baselines.

2022-07-01

IEEE Transactions on Intelligent Transportation Systems (published)

Learning To Cut By Looking Ahead: Cutting Plane Selection via Imitation Learning

Max B. Paulus

Giulia Zarpellon

Andreas Krause

Chris J. Maddison

Cutting planes are essential for solving mixed-integer linear problems (MILPs), because they facilitate bound improvements on the optimal solution value. For selecting cuts, modern solvers rely on manually designed heuristics that are tuned to gauge the potential effectiveness of cuts. We show that a greedy selection rule explicitly looking ahead to select cuts that yield the best bound improvement delivers strong decisions for cut selection, but is too expensive to be deployed in practice. In response, we propose a new neural architecture (NeuralCut) for imitation learning on the lookahead expert. Our model outperforms standard baselines for cut selection on several synthetic MILP benchmarks. Experiments with a B&C solver for neural network verification further validate our approach, and exhibit the potential of learning methods in this setting.

2022-06-28

Proceedings of the 39th International Conference on Machine Learning (published)

A New Era: Intelligent Tutoring Systems Will Transform Online Learning for Millions

Francois St-Hilaire

Dung D. Vu

Antoine Frau

Nathan J. Burns

Farid Faraji

Joseph Potochny

Stephane Robert

Arnaud Roussel

Selene Zheng

Taylor Glazier

Junfel Vincent Romano

Robert Belfer

Muhammad Shayan

Ariella Smofsky

Tommy Delarosbil

Seulmin Ahn

Simon Eden-Walker

Kritika Sony

Ansona Onyi Ching

Sabina Elkins

A. Stepanyan

Adela Matajova

Victor Chen

Hossein Sahraei

Robert Larson

N. Markova

Andrew Barkett

Yoshua Bengio

Iulian V. Serban

Ekaterina Kochmar

2022-03-03

ArXiv (preprint)

COIL: A Deep Architecture for Column Generation

Behrouz Babaki

Sanjay Dominik Jena

Column generation is a popular method to solve large-scale linear programs with an exponential number of variables. Several important applications, such as the vehicle routing problem, rely on this technique in order to be solved. However, in practice, column generation methods suffer from slow convergence (i.e. they require too many iterations). Stabilization techniques, which carefully select the column to add at each iteration, are commonly used to improve convergence. In this work, we frame the problem of selecting which columns to add as one of sequential decision-making. We propose a neural column generation architecture that iteratively selects columns to be added to the problem. Our architecture is inspired by stabilization techniques and predicts the optimal duals, which are then used to select the columns to add. The proposed architecture is trained using imitation learning. Exemplified on the Vehicle Routing Problem, we show that several machine learning models yield good performance in predicting the optimal duals and that our architecture outperforms them as well as a popular state-of-the-art stabilization technique. Further, the approach can generalize to instances larger than those observed during training.

Continual Learning with Foundation Models: An Empirical Study of Latent Replay

Oleksiy Ostapenko

Timothee LESORT

Pau Rodriguez

Md Rifat Arefin

Arthur Douillard

Irina Rish

2022-01-01

CoLLAs (published)

Scaling the Number of Tasks in Continual Learning

Timothee LESORT

Oleksiy Ostapenko

Diganta Misra

Md Rifat Arefin

Pau Rodriguez

Irina Rish

2022-01-01

arXiv.org (preprint)

Task-Agnostic Continual Reinforcement Learning: In Praise of a Simple Baseline

Massimo Caccia

Jonas Mueller

Taesup Kim

Rasool Fakoor

We study task-agnostic continual reinforcement learning (TACRL) in which standard RL challenges are compounded with partial observability stemming from task agnosticism, as well as additional difficulties of continual learning (CL), i.e., learning on a non-stationary sequence of tasks. Here we compare TACRL methods with their soft upper bounds prescribed by previous literature: multi-task learning (MTL) methods which do not have to deal with non-stationary data distributions, as well as task-aware methods, which are allowed to operate under full observability. We consider a previously unexplored and straightforward baseline for TACRL, replay-based recurrent RL (3RL), in which we augment an RL algorithm with recurrent mechanisms to address partial observability and experience replay mechanisms to address catastrophic forgetting in CL. Studying empirical performance in a sequence of RL tasks, we find surprising occurrences of 3RL matching and overcoming the MTL and task-aware soft upper bounds. We lay out hypotheses that could explain this inflection point of continual and task-agnostic learning research. Our hypotheses are empirically tested in continuous control tasks via a large-scale study of the popular multi-task and continual learning benchmark Meta-World. By analyzing different training statistics including gradient conflict, we find evidence that 3RL’s outperformance stems from its ability to quickly infer how new tasks relate to the previous ones, enabling forward transfer.

2022-01-01

arXiv.org (preprint)

Neural Column Generation for Capacitated Vehicle Routing

Behrouz Babaki

Sanjay Dominik Jena

The column generation technique is essential for solving linear programs with an exponential number of variables. Many important applications such as the vehicle routing problem (VRP) now require it. However, in practice, getting column generation to converge is challenging. It often ends up adding too many columns. In this work, we frame the problem of selecting which columns to add as one of sequential decision-making. We propose a neural column generation architecture that iteratively selects columns to be added to the problem. The architecture, inspired by stabilization techniques, first predicts the optimal duals. These predictions are then used to obtain the columns to add. We show using VRP instances that in this setting several machine learning models yield good performance on the task and that our proposed architecture learned using imitation learning outperforms a modern stabilization technique.

2021-12-16

AAAI.org/2022/Workshop/ML4OR-22 (poster)

openreview.net

Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations

Pau Rodriguez

Massimo Caccia

Alexandre Lacoste

Lee Zamparo

Issam Hadj Laradji

David Vazquez

Explainability for machine learning models has gained considerable attention within the research community given the importance of deploying more reliable machine-learning systems. In computer vision applications, generative counterfactual methods indicate how to perturb a model’s input to change its prediction, providing details about the model’s decision-making. Current methods tend to generate trivial counterfactuals about a model’s decisions, as they often suggest exaggerating or removing the presence of the attribute being classified. For the machine learning practitioner, these types of counterfactuals offer little value, since they provide no new information about undesired model or data biases. In this work, we identify the problem of trivial counterfactual generation and we propose DiVE to alleviate it. DiVE learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss to uncover multiple valuable explanations about the model’s prediction. Further, we introduce a mechanism to prevent the model from producing trivial explanations. Experiments on CelebA and Synbols demonstrate that our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods. Code is available at https://github.com/ElementAI/beyond-trivial-explanations.

2021-10-10

2021 IEEE/CVF International Conference on Computer Vision (ICCV) (published)

Sequoia: A Software Framework to Unify Continual Learning Research

Fabrice Normandin

Florian Golemo

Oleksiy Ostapenko

Pau Rodriguez

Matthew D Riemer

J. Hurtado

Lucas Cecchi

Khimya Khetarpal

Dominic Zhao

Ryan Lindeborg

Timothee LESORT

David Vazquez

Irina Rish

Massimo Caccia

The field of Continual Learning (CL) seeks to develop algorithms that accumulate knowledge and skills over time through interaction with non-stationary environments. In practice, a plethora of evaluation procedures (settings) and algorithmic solutions (methods) exist, each with their own potentially disjoint set of assumptions. This variety makes measuring progress in CL difficult. We propose a taxonomy of settings, where each setting is described as a set of assumptions. A tree-shaped hierarchy emerges from this view, where more general settings become the parents of those with more restrictive assumptions. This makes it possible to use inheritance to share and reuse research, as developing a method for a given setting also makes it directly applicable to any of its children. We instantiate this idea as a publicly available software framework called Sequoia, which features a wide variety of settings from both the Continual Supervised Learning (CSL) and Continual Reinforcement Learning (CRL) domains. Sequoia also includes a growing suite of methods which are easy to extend and customize, in addition to more specialized methods from external libraries. We hope that this new paradigm and its first implementation can help unify and accelerate research in CL. You can help us grow the tree by visiting (this GitHub URL).

2021-08-02

ArXiv (preprint)

openreview.net

Comparative Study of Learning Outcomes for Online Learning Platforms

Francois St-Hilaire

Nathan J. Burns

Robert Belfer

Muhammad Shayan

Ariella Smofsky

Dung D. Vu

Antoine Frau

Joseph Potochny

Farid Faraji

Vincent Pavero

Neroli Ko

Ansona Onyi Ching

Sabina Elkins

A. Stepanyan

Adela Matajova

Yoshua Bengio

Iulian V. Serban

Ekaterina Kochmar

2021-06-12

Lecture Notes in Computer Science (published)

Continual Learning via Local Module Composition

Oleksiy Ostapenko

Pau Rodriguez

Massimo Caccia