Publications

Opening the Scope of Openness in AI
Tamara Paris
Relative Explanations for Contextual Problems with Endogenous Uncertainty: An Application to Competitive Facility Location
Jasone Ramírez-Ayerbe
A Survey of State Representation Learning for Deep Reinforcement Learning
Representation learning methods are an important tool for addressing the challenges posed by complex observation spaces in sequential decision-making problems. Recently, many methods have used a wide variety of approaches to learn meaningful state representations in reinforcement learning, allowing better sample efficiency, generalization, and performance. This survey aims to provide a broad categorization of these methods within a model-free online setting, exploring how they tackle the learning of state representations differently. We categorize the methods into six main classes, detailing their mechanisms, benefits, and limitations. Through this taxonomy, our aim is to enhance the understanding of this field and provide a guide for new researchers. We also discuss techniques for assessing the quality of representations, and detail relevant future directions.
The challenge of hidden gifts in multi-agent reinforcement learning
Cooperation between people is not always obvious. Sometimes we benefit from actions that others have taken even when we are unaware that they took those actions. For example, if your neighbor chooses not to take a parking spot in front of your house when you are not there, you can benefit, even without being aware that they took this action. These “hidden gifts” represent an interesting challenge for multi-agent reinforcement learning (MARL), since assigning credit to your own actions correctly when the beneficial actions of others are hidden is non-trivial. Here, we study the impact of hidden gifts with a very simple MARL task. In this task, agents in a grid-world environment have individual doors to unlock in order to obtain individual rewards. In addition, if all the agents unlock their doors, the group receives a larger collective reward. However, there is only one key for all of the doors, such that the collective reward can only be obtained when the agents drop the key for others after they use it. Notably, there is nothing to indicate to an agent that the other agents have dropped the key; thus the act of dropping the key for others is a “hidden gift”. We show that several different state-of-the-art RL algorithms, including MARL algorithms, fail to learn how to obtain the collective reward in this simple task. Interestingly, we find that independent model-free policy gradient agents can solve the task when we provide them with information about their action history, but MARL agents still cannot solve the task with action history. Finally, we derive a correction term for these independent agents, inspired by learning-aware approaches, which reduces the variance in learning and helps them to converge to collective success more reliably. These results show how credit assignment in multi-agent settings can be particularly challenging in the presence of “hidden gifts”, and demonstrate that learning awareness can benefit these settings.
Towards Sustainable Investment Policies Informed by Opponent Shaping
Addressing climate change requires global coordination, yet rational economic actors often prioritize immediate gains over collective welfare, resulting in social dilemmas. InvestESG is a recently proposed multi-agent simulation that captures the dynamic interplay between investors and companies under climate risk. We provide a formal characterization of the conditions under which InvestESG exhibits an intertemporal social dilemma, deriving theoretical thresholds at which individual incentives diverge from collective welfare. Building on this, we apply Advantage Alignment, a scalable opponent shaping algorithm shown to be effective in general-sum games, to influence agent learning in InvestESG. We offer theoretical insights into why Advantage Alignment systematically favors socially beneficial equilibria by biasing learning dynamics toward cooperative outcomes. Our results demonstrate that strategically shaping the learning processes of economic agents can result in better outcomes that could inform policy mechanisms to better align market incentives with long-term sustainability goals.
Rethinking Full Finetuning from Pretraining Checkpoints in Active Learning for African Languages
Bonaventure F. P. Dossou
Revisiting Laplacian Representations for Value Function Approximation in Deep RL
Proto-value functions (PVFs) introduced Laplacian embeddings as an effective feature basis for value-function approximation; however, their utility remained limited to small, fully known state spaces. Recent work has scaled Laplacian embeddings to high-dimensional inputs, using them for reward shaping and option discovery in goal-directed tasks, yet only as auxiliary signals, rather than directly using them as features for value functions. In this paper, we learn Laplacian eigenvectors online and employ them as features for Q-learning in 23 Atari games. We empirically demonstrate that these online-learned embeddings substantially improve model-free RL in large, high-dimensional domains. We demonstrate that enriching state representations with action embeddings yields additional gains under both behavior-policy and uniform-random policies. Additionally, we introduce the Fusion architecture, which augments the representation with useful inductive bias at the embedding level. To assess the usefulness of each embedding used in the Fusion architecture, we use Shapley values analysis.
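As a rough illustration of the classic tabular idea behind PVFs (not the online, high-dimensional method the abstract describes), the sketch below builds the graph Laplacian of a small, fully known chain of states and recovers its smoothest non-constant eigenvector by shifted power iteration. The chain environment, the function names, and the choice of power iteration are all assumptions made for this example.

```python
import math

def path_graph_laplacian(n):
    """Combinatorial Laplacian L = D - A of an n-state chain (hypothetical toy MDP graph)."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        L[i][i] += 1.0
        L[i + 1][i + 1] += 1.0
        L[i][i + 1] -= 1.0
        L[i + 1][i] -= 1.0
    return L

def smoothest_nonconstant_eigenvector(L, iters=2000):
    """Power iteration on (shift*I - L), with the constant eigenvector projected out."""
    n = len(L)
    # Gershgorin bound: every eigenvalue of L is at most twice the largest degree.
    shift = 2.0 * max(L[i][i] for i in range(n))
    v = [float(i) for i in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant (eigenvalue 0) component
        w = [shift * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# One scalar feature per state; a PVF-style basis would stack several such eigenvectors.
features = smoothest_nonconstant_eigenvector(path_graph_laplacian(6))
```

On a chain this eigenvector varies monotonically from one end to the other, which is the kind of smooth, geometry-aware feature that a linear value function can build on.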
Training PPO-Clip with Parallelized Data Generation: A Case of Fixed-Point Convergence
In recent years, with the increase in the compute power of GPUs, parallelized data collection has become the dominant approach for training reinforcement learning (RL) agents. Proximal Policy Optimization (PPO) is a widely used on-policy method for training RL agents. In this paper, we focus on the training behavior of PPO-Clip as the number of parallel environments increases. In particular, we show that as we increase the amount of data used to train PPO-Clip, the optimized policy converges to a fixed distribution. We use the results to study the behavior of PPO-Clip in two case studies: the effect of a change in the minibatch size, and the effect of an increase in the number of parallel environments versus an increase in the rollout lengths. The experiments show that settings with high-return PPO runs result in slower convergence to the fixed distribution and higher consecutive KL divergence changes. Our results aim to offer a better understanding for predicting the performance of PPO as the number of parallel environments scales.
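For readers unfamiliar with PPO-Clip, the following minimal sketch computes the standard clipped surrogate objective on a batch of samples. It is not taken from the paper: the function name, the list-based inputs, and the default clipping range of 0.2 are assumptions chosen for illustration.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (a quantity to maximize)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# A sample whose ratio is 1.5 with positive advantage contributes only 1.2 * advantage.
value = ppo_clip_objective([math.log(1.5)], [0.0], [1.0])
```

Because the objective is a batch average, collecting more samples per update (for example from more parallel environments) changes only how well this average is estimated, which is the regime the abstract studies.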
Alveolar epithelial cell plasticity and injury memory in human pulmonary fibrosis
Taylor S Adams
Jonas C Schupp
Agshin Balayev
Johad Khoury
Aurelien Justet
Fadi Nikola
Laurens J De Sadeleer
Juan Cala Garcia
Marta Zapata-Ortega
Panayiotis V Benos
John E McDonough
Farida Ahangari
Melanie Königshoff
Robert J Homer
Ivan O Rosas
Xiting Yan
Bart M Vanaudenaerde
Wim A Wuyts
Naftali Kaminski
Neuromorphic hierarchical modular reservoirs
Filip Milisav
Andrea I Luppi
Laura E Suárez
Bratislav Mišić
Behavioral Suite Analysis of Self-Supervised Learning in Atari
A deep generative model for deciphering cellular dynamics and in silico drug discovery in complex diseases
Yumin Zheng
Jonas C Schupp
Taylor S Adams
Geremy Clair
Aurelien Justet
Farida Ahangari
Xiting Yan
Paul Hansen
Marianne Carlon
Emanuela Cortesi
Marie Vermant
Robin Vos
Laurens J De Sadeleer
Ivan O Rosas
Ricardo Pineda
John Sembrat
Melanie Königshoff
John E McDonough
Bart M Vanaudenaerde
Wim A Wuyts
Naftali Kaminski