Publications

Learning Successor Features the Simple Way

Christos Kaplanis

Blake Aaron Richards

In Deep Reinforcement Learning (RL), it is a challenge to learn representations that do not exhibit catastrophic forgetting or interference in non-stationary environments. Successor Features (SFs) offer a potential solution to this challenge. However, canonical techniques for learning SFs from pixel-level observations often lead to representation collapse, wherein representations degenerate and fail to capture meaningful variations in the data. More recent methods for learning SFs can avoid representation collapse, but they often involve complex losses and multiple learning phases, reducing their efficiency. We introduce a novel, simple method for learning SFs directly from pixels. Our approach uses a combination of a Temporal-difference (TD) loss and a reward prediction loss, which together capture the basic mathematical definition of SFs. We show that our approach matches or outperforms existing SF learning techniques in both 2D (Minigrid), 3D (Miniworld) mazes and Mujoco, for both single and continual learning scenarios. As well, our technique is efficient, and can reach higher levels of performance in less time than other approaches. Our work provides a new, streamlined technique for learning SFs directly from pixel observations, with no pretraining required.

2023-12-31

Advances in Neural Information Processing Systems 37 (published)

doi.org

openreview.net

Learning Tabu Search Algorithms: A Scheduling Application

Nazgol Niroumandrad

Nadia Lahrichi

Andrea Lodi

Metaheuristics are widely recognized as efficient approaches for many combinatorial problems. Studies to improve the performance of metaheuristics have increasingly relied on the use of various methods either combining different metaheuristics or methods originating outside of the metaheuristic field. This paper presents a learning algorithm to improve tabu search by reducing its search space and the evaluation effort. We study the performance of a learning tabu search algorithm using classification methods in an attempt to select moves through the search space more wisely. The experimental results demonstrate the benefit of using a learning mechanism under deterministic and stochastic conditions.

2023-12-31

Comput. Oper. Res. (published)

doi.org

Learning the Game: Decoding the Differences between Novice and Expert Players in a Citizen Science Game with Millions of Players.

Eddie Cai

Roman Sarrazin-Gendron

Renata Mutalova

Parham Ghasemloo Gheidari

Alexander Butyaev

Gabriel Richard

Sébastien Caisse

Rob Knight

Mathieu Blanchette

Attila Szantner

Jérôme Waldispühl

In recent years, video games have surged in popularity, attracting millions of players across platforms. Citizen science games (CSGs) leverage the processing power of gamers to solve computational and scientific problems. Borderlands Science (BLS) is a mini-game within the mass market game Borderlands 3 that turns multiple sequence alignment (MSA) problems into puzzles. Parallel research demonstrated that BLS players outperformed classical approaches solving small sequence alignment tasks. This study aims to analyze the strategical differences in player solutions in BLS as they gain experience. Through the many collected player solutions from players of different experience level, we gained insights into players’ strategies, differences between expert and non-expert players, and how strategies evolve. We developed a Markov chain trained on solutions from players of different experience levels to understand their actions and outcomes. Results indicate that expert players utilize more gaps and achieve more matches, gradually improving and converging toward unique strategies. Our findings reveal distinct and evolving player strategies. For future citizen science projects, it will be important to consider the identification of player strategies and their evolution over time to improve the game design and data processing.

2023-12-31

FDG (published)

doi.org

On learning Whittle index policy for restless bandits with scalable regret

Nima Akbarzadeh

Aditya Mahajan

Reinforcement learning is an attractive approach to learn good resource allocation and scheduling policies based on data when the system model is unknown. However, the cumulative regret of most RL algorithms scales as ˜ O(S

2023-12-31

IEEE Transactions on Control of Network Systems (published)

doi.org

arxiv.org

Less or More from Teacher: Exploiting Trilateral Geometry for Knowledge Distillation

Xi Chen

Jun Yan

Boyu Wang

Xue Liu

Knowledge distillation aims to train a compact student network using soft supervision from a larger teacher network and hard supervision from ground truths. However, determining an optimal knowledge fusion ratio that balances these supervisory signals remains challenging. Prior methods generally resort to a constant or heuristic-based fusion ratio, which often falls short of a proper balance. In this study, we introduce a novel adaptive method for learning a sample-wise knowledge fusion ratio, exploiting both the correctness of teacher and student, as well as how well the student mimics the teacher on each sample. Our method naturally leads to the intra-sample trilateral geometric relations among the student prediction (

2023-12-31

ICLR (published)

doi.org

openreview.net

Linear Weight Interpolation Leads to Transient Performance Gains

Gaurav Iyer

Gintare Karolina Dziugaite

David Rolnick

2023-12-31

Trans. Mach. Learn. Res. (published)

openreview.net

List Comprehension Versus for Loops Performance in Real Python Projects: Should we Care?

Cyrine Zid

François Belias

Massimiliano Di Penta

Foutse Khomh

Giuliano Antoniol

List comprehensions are a Pythonic functional construct allowing developers to express in a concise way loops to build and manipulate lists. Previous studies point to a gain in speed when list comprehensions are adopted. This paper reports the results of a study that compares the execution time performance of Python code written using list comprehensions as opposed to equivalent imperative programming. To this aim, we have developed a set of transformation rules to map Python for loops into list comprehensions. On the one hand, on artificial code snippets, we found list comprehensions to be faster than procedural code, with differences becoming evident if amplifying the tests, i.e., executing the code fragment thousands of times. On the other hand, this does not happen when executing real-world Python projects, where the performance may or may not improve, depending on the projects' features and the nature of the manipulated objects.
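
The kind of comparison this abstract describes can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the list size, and the repetition count are illustrative assumptions, and the transformation shown is just one simple for-loop-to-comprehension rewrite, not the authors' rule set.

```python
import timeit


def squares_loop(n):
    """Build a list of squares with an explicit for loop (imperative form)."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n):
    """Build the same list with an equivalent list comprehension."""
    return [i * i for i in range(n)]


if __name__ == "__main__":
    n = 1_000  # hypothetical list size
    repeats = 2_000  # run each fragment thousands of times to amplify differences

    # Both forms must produce identical results before timing means anything.
    assert squares_loop(n) == squares_comprehension(n)

    loop_time = timeit.timeit(lambda: squares_loop(n), number=repeats)
    comp_time = timeit.timeit(lambda: squares_comprehension(n), number=repeats)
    print(f"for loop:      {loop_time:.3f} s")
    print(f"comprehension: {comp_time:.3f} s")
```

Timings from a micro-benchmark like this depend on the interpreter version and the machine, and, as the abstract notes, a gap seen on artificial snippets may not carry over to real projects.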

2023-12-31

SANER (published)

doi.org

Local Search GFlowNets

Sungsoo Ahn

Jinkyoo Park

Generative Flow Networks (GFlowNets) are amortized sampling methods that learn a distribution over discrete objects proportional to their rewards. GFlowNets exhibit a remarkable ability to generate diverse samples, yet occasionally struggle to consistently produce samples with high rewards due to over-exploration on wide sample space. This paper proposes to train GFlowNets with local search, which focuses on exploiting high-rewarded sample space to resolve this issue. Our main idea is to explore the local neighborhood via backtracking and reconstruction guided by backward and forward policies, respectively. This allows biasing the samples toward high-reward solutions, which is not possible for a typical GFlowNet solution generation scheme, which uses the forward policy to generate the solution from scratch. Extensive experiments demonstrate a remarkable performance improvement in several biochemical tasks. Source code is available: https://github.com/dbsxodud-11/ls_gfn.

2023-12-31

ICLR (published)

doi.org

openreview.net

Low-Dimensional Embeddings of High-Dimensional Data: Algorithms and Applications (Dagstuhl Seminar 24122).

Dmitry Kobak

Fred Hamprecht

Smita Krishnaswamy

Gal Mishne

Sebastian Damrich

2023-12-31

Dagstuhl Reports (published)

doi.org

Machine Learning and Information Theory Concepts Towards an AI Mathematician

Yoshua Bengio

Nikolay Malkin

The current state-of-the-art in artificial intelligence is impressive, especially in terms of mastery of language, but not so much in terms of mathematical reasoning. What could be missing? Can we learn something useful about that gap from how the brains of mathematicians go about their craft? This essay builds on the idea that current deep learning mostly succeeds at system 1 abilities -- which correspond to our intuition and habitual behaviors -- but still lacks something important regarding system 2 abilities -- which include reasoning and robust uncertainty estimation. It takes an information-theoretical posture to ask questions about what constitutes an interesting mathematical statement, which could guide future work in crafting an AI mathematician. The focus is not on proving a given theorem but on discovering new and interesting conjectures. The central hypothesis is that a desirable body of theorems better summarizes the set of all provable statements, for example by having a small description length while at the same time being close (in terms of number of derivation steps) to many provable statements.

2023-12-31

arXiv (preprint)

doi.org

arxiv.org

Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models

Kenza Benkirane

Laura Gongas

Shahar Pelles

Naomi Fuchs

Joshua Darmon

Pontus Stenetorp

David Ifeoluwa Adelani

Eduardo Sánchez
