Laurent Charlin

Irina Rish

2022-01-01

arXiv.org (preprint)

Task-Agnostic Continual Reinforcement Learning: In Praise of a Simple Baseline

Massimo Caccia

Jonas Mueller

Taesup Kim

Rasool Fakoor

We study task-agnostic continual reinforcement learning (TACRL) in which standard RL challenges are compounded with partial observability stemming from task agnosticism, as well as additional difficulties of continual learning (CL), i.e., learning on a non-stationary sequence of tasks. Here we compare TACRL methods with their soft upper bounds prescribed by previous literature: multi-task learning (MTL) methods which do not have to deal with non-stationary data distributions, as well as task-aware methods, which are allowed to operate under full observability. We consider a previously unexplored and straightforward baseline for TACRL, replay-based recurrent RL (3RL), in which we augment an RL algorithm with recurrent mechanisms to address partial observability and experience replay mechanisms to address catastrophic forgetting in CL. Studying empirical performance in a sequence of RL tasks, we find surprising occurrences of 3RL matching and overcoming the MTL and task-aware soft upper bounds. We lay out hypotheses that could explain this inflection point of continual and task-agnostic learning research. Our hypotheses are empirically tested in continuous control tasks via a large-scale study of the popular multi-task and continual learning benchmark Meta-World. By analyzing different training statistics including gradient conflict, we find evidence that 3RL’s outperformance stems from its ability to quickly infer how new tasks relate to the previous ones, enabling forward transfer.

2022-01-01

arXiv.org (preprint)

Neural Column Generation for Capacitated Vehicle Routing

Behrouz Babaki

Sanjay Dominik Jena

The column generation technique is essential for solving linear programs with an exponential number of variables. Many important applications such as the vehicle routing problem (VRP) now require it. However, in practice, getting column generation to converge is challenging. It often ends up adding too many columns. In this work, we frame the problem of selecting which columns to add as one of sequential decision-making. We propose a neural column generation architecture that iteratively selects columns to be added to the problem. The architecture, inspired by stabilization techniques, first predicts the optimal duals. These predictions are then used to obtain the columns to add. We show using VRP instances that in this setting several machine learning models yield good performance on the task and that our proposed architecture learned using imitation learning outperforms a modern stabilization technique.

2021-12-16

AAAI.org/2022/Workshop/ML4OR-22 (poster)

Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations

Pau Rodriguez

Massimo Caccia

Alexandre Lacoste

Lee Zamparo

Issam Hadj Laradji

David Vazquez

Explainability for machine learning models has gained considerable attention within the research community given the importance of deploying more reliable machine-learning systems. In computer vision applications, generative counterfactual methods indicate how to perturb a model’s input to change its prediction, providing details about the model’s decision-making. Current methods tend to generate trivial counterfactuals about a model’s decisions, as they often suggest exaggerating or removing the presence of the attribute being classified. For the machine learning practitioner, these types of counterfactuals offer little value, since they provide no new information about undesired model or data biases. In this work, we identify the problem of trivial counterfactual generation and we propose DiVE to alleviate it. DiVE learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss to uncover multiple valuable explanations about the model’s prediction. Further, we introduce a mechanism to prevent the model from producing trivial explanations. Experiments on CelebA and Synbols demonstrate that our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods. Code is available at https://github.com/ElementAI/beyond-trivial-explanations.

2021-10-10

2021 IEEE/CVF International Conference on Computer Vision (ICCV) (published)

Sequoia: A Software Framework to Unify Continual Learning Research

Fabrice Normandin

Florian Golemo

Oleksiy Ostapenko

Pau Rodriguez

Matthew D Riemer

J. Hurtado

Lucas Cecchi

Khimya Khetarpal

Dominic Zhao

Ryan Lindeborg

Timothee Lesort

David Vazquez

Irina Rish

Massimo Caccia

The field of Continual Learning (CL) seeks to develop algorithms that accumulate knowledge and skills over time through interaction with non-stationary environments. In practice, a plethora of evaluation procedures (settings) and algorithmic solutions (methods) exist, each with their own potentially disjoint set of assumptions. This variety makes measuring progress in CL difficult. We propose a taxonomy of settings, where each setting is described as a set of assumptions. A tree-shaped hierarchy emerges from this view, where more general settings become the parents of those with more restrictive assumptions. This makes it possible to use inheritance to share and reuse research, as developing a method for a given setting also makes it directly applicable to any of its children. We instantiate this idea as a publicly available software framework called Sequoia, which features a wide variety of settings from both the Continual Supervised Learning (CSL) and Continual Reinforcement Learning (CRL) domains. Sequoia also includes a growing suite of methods which are easy to extend and customize, in addition to more specialized methods from external libraries. We hope that this new paradigm and its first implementation can help unify and accelerate research in CL. You can help us grow the tree by visiting (this GitHub URL).

2021-08-02

arXiv (preprint)

Comparative Study of Learning Outcomes for Online Learning Platforms

Francois St-Hilaire

Nathan J. Burns

Robert Belfer

Muhammad Shayan

Ariella Smofsky

Dung D. Vu

Antoine Frau

Joseph Potochny

Farid Faraji

Vincent Pavero

Neroli Ko

Ansona Onyi Ching

Sabina Elkins

A. Stepanyan

Adela Matajova

Yoshua Bengio

Iulian V. Serban

Ekaterina Kochmar

2021-06-12

Lecture Notes in Computer Science (published)

Continual Learning via Local Module Composition

Oleksiy Ostapenko

Pau Rodriguez

Massimo Caccia

Modularity is a compelling solution to continual learning (CL), the problem of modeling sequences of related tasks. Learning and then composing modules to solve different tasks provides an abstraction to address the principal challenges of CL including catastrophic forgetting, backward and forward transfer across tasks, and sub-linear model growth. We introduce local module composition (LMC), an approach to modular CL where each module is provided a local structural component that estimates a module's relevance to the input. Dynamic module composition is performed layer-wise based on local relevance scores. We demonstrate that agnosticity to task identities (IDs) arises from (local) structural learning that is module-specific, as opposed to task- and/or model-specific as in previous works, making LMC applicable to more CL settings than previous works. In addition, LMC also tracks statistics about the input distribution and adds new modules when outlier samples are detected. In the first set of experiments, LMC performs favorably compared to existing methods on the recent Continual Transfer-learning Benchmark without requiring task identities. In another study, we show that the locality of structural learning allows LMC to interpolate to related but unseen tasks (OOD), as well as to compose modular networks trained independently on different task sequences into a third modular network without any fine-tuning. Finally, in search of limitations of LMC, we study it on more challenging sequences of 30 and 100 tasks, demonstrating that local module selection becomes much more challenging in the presence of a large number of candidate modules. In this setting, the best-performing LMC spawns far fewer modules than an oracle-based baseline; however, it reaches a lower overall accuracy. The codebase is available at https://github.com/oleksost/LMC.

DATA-EFFICIENT REINFORCEMENT LEARNING

Philip Bachman

Data efficiency poses a major challenge for deep reinforcement learning. We approach this issue from the perspective of self-supervised representation learning, leveraging reward-free exploratory data to pretrain encoder networks. We employ a novel combination of latent dynamics modelling and goal-reaching objectives, which exploit the inherent structure of data in reinforcement learning. We demonstrate that our method scales well with network capacity and pretraining data. When evaluated on the Atari 100k data-efficiency benchmark, our approach significantly outperforms previous methods combining unsupervised pretraining with task-specific finetuning, and approaches human-level performance.

Pretraining Representations for Data-Efficient Reinforcement Learning

Philip Bachman

Data efficiency is a key challenge for deep reinforcement learning. We address this problem by using unlabeled data to pretrain an encoder which is then finetuned on a small amount of task-specific data. To encourage learning representations which capture diverse aspects of the underlying MDP, we employ a combination of latent dynamics modelling and unsupervised goal-conditioned RL. When limited to 100k steps of interaction on Atari games (equivalent to two hours of human experience), our approach significantly surpasses prior work combining offline representation pretraining with task-specific finetuning, and compares favourably with other pretraining methods that require orders of magnitude more data. Our approach shows particular promise when combined with larger models as well as more diverse, task-aligned observational data -- approaching human-level performance and data-efficiency on Atari in our best setting.

The Machine Learning for Combinatorial Optimization Competition (ML4CO): Results and Insights

Maxime Gasse

Simon Bowly

Quentin Cappart

Jonas Charfreitag

Didier Chételat

Antonia Chmiela

Justin Dumouchelle

Ambros Gleixner

Aleksandr Kazachkov

Elias Boutros Khalil

Paweł Lichocki

Andrea Lodi

Miles Lubin

Chris J. Maddison

Christopher Morris

D. Papageorgiou

Augustin Parjadis

Sebastian Pokutta

Antoine Prouvost

Lara Scavuzzo

Giulia Zarpellon

Linxin Yangm

Sha Lai

Akang Wang

Xiaodong Luo

Xiang Zhou

Haohan Huang

Sheng Cheng Shao

Yuanming Zhu

Dong Dong Zhang

Tao Manh Quan

Zixuan Cao

Yang Xu

Zhewei Huang

Shuchang Zhou

C. Binbin

He Minggui

Haoren Ren Hao

Zhang Zhiyu

An Zhiwu

Mao Kun

Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning as a new approach for solving combinatorial problems, either directly as solvers or by enhancing exact solvers. In this context, the ML4CO competition aims to improve state-of-the-art combinatorial optimization solvers by replacing key heuristic components. The competition featured three challenging tasks: finding the best feasible solution, producing the tightest optimality certificate, and giving an appropriate solver configuration. Three realistic datasets were considered: balanced item placement, workload apportionment, and maritime inventory routing. This last dataset was kept anonymous for the contestants.

2021-01-01

NeurIPS (Competition and Demos) (published)

Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles

Yao Lu

Yue Dong

Multi-document summarization is a challenging task for which there exist few large-scale datasets. We propose Multi-XScience, a large-scale multi-document summarization dataset created from scientific articles. Multi-XScience introduces a challenging multi-document summarization task: writing the related-work section of a paper based on its abstract and the articles it references. Our work is inspired by extreme summarization, a dataset construction protocol that favours abstractive modeling approaches. Descriptive statistics and empirical results, using several state-of-the-art models trained on the Multi-XScience dataset, reveal that Multi-XScience is well suited for abstractive models.

2020-11-01

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (published)