Publications

Layerwise Proximal Replay: A Proximal Point Method for Online Continual Learning
Jinsoo Yoo
Yunpeng Liu
Frank N. Wood
Geoff Pleiss
Learnable Filters for Geometric Scattering Modules
Dhananjay Bhaskar
Kincaid MacDonald
Jackson Grady
Michael Perlmutter
Learning conditional policies for crystal design using offline reinforcement learning
Conservative Q-learning for band-gap conditioned crystal design with DFT evaluations – the model is trained on trajectories constructed from crystals in the Materials Project. Results indicate promising performance for lower band-gap targets.
Learning Lagrangian Multipliers for the Travelling Salesman Problem
Augustin Parjadis
Bistra Dilkina
Aaron M. Ferber
Louis-Martin Rousseau
Learning Precedences for Scheduling Problems with Graph Neural Networks
Hélène Verhaeghe
Gilles Pesant
Claude-Guy Quimper
Learning to repeatedly solve routing problems
Mouad Morabit
Guy Desaulniers
Andrea Lodi
In recent years, there has been great interest in machine-learning-based heuristics for solving NP-hard combinatorial optimization problems. The developed methods have shown potential on many optimization problems. In this paper, we present a learned heuristic for the reoptimization of a problem after a minor change in its data. We focus on the case of the capacitated vehicle routing problem with static clients (i.e., same client locations) and changed demands. Given the edges of an original solution, the goal is to predict and fix the ones that have a high chance of remaining in an optimal solution after a change of client demands. This partial prediction of the solution reduces the complexity of the problem and speeds up its resolution, while yielding a good-quality solution. The proposed approach resulted in solutions with an optimality gap ranging from 0% to 1.7% on different benchmark instances within a reasonable computing time.
Learning Successor Features the Simple Way
Christos Kaplanis
Blake Aaron Richards
In Deep Reinforcement Learning (RL), it is a challenge to learn representations that do not exhibit catastrophic forgetting or interference in non-stationary environments. Successor Features (SFs) offer a potential solution to this challenge. However, canonical techniques for learning SFs from pixel-level observations often lead to representation collapse, wherein representations degenerate and fail to capture meaningful variations in the data. More recent methods for learning SFs can avoid representation collapse, but they often involve complex losses and multiple learning phases, reducing their efficiency. We introduce a novel, simple method for learning SFs directly from pixels. Our approach uses a combination of a Temporal-difference (TD) loss and a reward prediction loss, which together capture the basic mathematical definition of SFs. We show that our approach matches or outperforms existing SF learning techniques in 2D (Minigrid) and 3D (Miniworld) mazes as well as Mujoco, in both single and continual learning scenarios. Moreover, our technique is efficient and can reach higher levels of performance in less time than other approaches. Our work provides a new, streamlined technique for learning SFs directly from pixel observations, with no pretraining required.
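The two losses described in the abstract follow from the defining equations of successor features: the SF Bellman identity ψ(s) = φ(s) + γ E[ψ(s')] and the linear reward model r = w·φ(s). A minimal numpy sketch of that loss combination is below; the dimensions, placeholder vectors, and the simple sum of the two terms are illustrative assumptions, not the paper's actual pixel-level architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8          # feature dimension (assumed; learned by a CNN in practice)
gamma = 0.99   # discount factor

# Learnable quantities in the actual method; random placeholders here.
phi_s  = rng.normal(size=d)   # base features phi(s)
psi_s  = rng.normal(size=d)   # successor features psi(s), the prediction
psi_s2 = rng.normal(size=d)   # successor features psi(s') from a target network
w      = rng.normal(size=d)   # reward weights
r      = 0.5                  # observed reward

# TD loss: psi(s) should match phi(s) + gamma * psi(s'), the SF Bellman equation.
td_target = phi_s + gamma * psi_s2
td_loss = np.mean((psi_s - td_target) ** 2)

# Reward-prediction loss: the reward should match w . phi(s).
reward_loss = (r - w @ phi_s) ** 2

# Combined objective (equal weighting assumed for illustration).
total_loss = td_loss + reward_loss
```

In a full implementation both terms would be minimized by gradient descent over the network producing φ and ψ; here the point is only that the objective is the sum of these two simple squared errors.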
Learning Tabu Search Algorithms: A Scheduling Application
Nazgol Niroumandrad
Nadia Lahrichi
Andrea Lodi
Metaheuristics are widely recognized as efficient approaches for many combinatorial problems. Studies to improve the performance of metaheuristics have increasingly relied on the use of various methods either combining different metaheuristics or methods originating outside of the metaheuristic field. This paper presents a learning algorithm to improve tabu search by reducing its search space and the evaluation effort. We study the performance of a learning tabu search algorithm using classification methods in an attempt to select moves through the search space more wisely. The experimental results demonstrate the benefit of using a learning mechanism under deterministic and stochastic conditions.
Learning the Game: Decoding the Differences between Novice and Expert Players in a Citizen Science Game with Millions of Players.
Eddie Cai
Roman Sarrazin-Gendron
Renata Mutalova
Parham Ghasemloo Gheidari
Alexander Butyaev
Gabriel Richard
Sébastien Caisse
Rob Knight
Attila Szantner
Jérôme Waldispühl
In recent years, video games have surged in popularity, attracting millions of players across platforms. Citizen science games (CSGs) leverage the processing power of gamers to solve computational and scientific problems. Borderlands Science (BLS) is a mini-game within the mass-market game Borderlands 3 that turns multiple sequence alignment (MSA) problems into puzzles. Parallel research demonstrated that BLS players outperformed classical approaches solving small sequence alignment tasks. This study aims to analyze the strategic differences in player solutions in BLS as they gain experience. Through the many solutions collected from players of different experience levels, we gained insights into players' strategies, differences between expert and non-expert players, and how strategies evolve. We developed a Markov chain trained on solutions from players of different experience levels to understand their actions and outcomes. Results indicate that expert players utilize more gaps and achieve more matches, gradually improving and converging toward unique strategies. Our findings reveal distinct and evolving player strategies. For future citizen science projects, it will be important to consider the identification of player strategies and their evolution over time to improve the game design and data processing.
On learning Whittle index policy for restless bandits with scalable regret
Reinforcement learning is an attractive approach to learn good resource allocation and scheduling policies based on data when the system model is unknown. However, the cumulative regret of most RL algorithms scales as Õ(S…
Less or More from Teacher: Exploiting Trilateral Geometry for Knowledge Distillation
Xi Chen
Boyu Wang
Jun Yan
Xue Liu
Knowledge distillation aims to train a compact student network using soft supervision from a larger teacher network and hard supervision from ground truths. However, determining an optimal knowledge fusion ratio that balances these supervisory signals remains challenging. Prior methods generally resort to a constant or heuristic-based fusion ratio, which often falls short of a proper balance. In this study, we introduce a novel adaptive method for learning a sample-wise knowledge fusion ratio, exploiting both the correctness of teacher and student, as well as how well the student mimics the teacher on each sample. Our method naturally leads to the intra-sample trilateral geometric relations among the student prediction (
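The fusion ratio discussed in the abstract weights the two supervisory signals of standard knowledge distillation: a soft KL term toward the teacher's distribution and a hard cross-entropy term toward the label. A minimal numpy sketch is below; the logits, label, and the fixed alpha are illustrative placeholders, whereas the paper's contribution is precisely to learn alpha per sample from teacher/student correctness and mimicry.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits for one sample over 3 classes.
student_logits = np.array([1.0, 2.0, 0.5])
teacher_logits = np.array([0.8, 2.5, 0.3])
label = 1  # ground-truth class index

p_s = softmax(student_logits)
p_t = softmax(teacher_logits)

# Hard supervision: cross-entropy against the ground-truth label.
ce = -np.log(p_s[label])

# Soft supervision: KL divergence from the teacher's distribution.
kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)))

# Fusion ratio alpha blends the two signals; fixed here, sample-wise
# and learned in the paper's method.
alpha = 0.7
loss = alpha * kl + (1 - alpha) * ce
```

Prior work typically fixes alpha globally; making it a learned function of each sample is what distinguishes the adaptive approach described above.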
Linear Weight Interpolation Leads to Transient Performance Gains