Publications

Dealing with Non-Stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning

Decentralized cooperative multi-agent deep reinforcement learning (MARL) can be a versatile learning framework, particularly in scenarios where centralized training is either not possible or not practical. One of the critical challenges in decentralized deep MARL is the non-stationarity of the learning environment when multiple agents are learning concurrently. A commonly used and efficient scheme for decentralized MARL is independent learning in which agents concurrently update their policies independently of each other. We first show that independent learning does not always converge, while sequential learning where agents update their policies one after another in a sequence is guaranteed to converge to an agent-by-agent optimal solution. In sequential learning, when one agent updates its policy, all other agents' policies are kept fixed, alleviating the challenge of non-stationarity due to simultaneous updates in other agents' policies. However, it can be slow because only one agent is learning at any time. Therefore it might also not always be practical. In this work, we propose a decentralized cooperative MARL algorithm based on multi-timescale learning. In multi-timescale learning, all agents learn simultaneously, but at different learning rates. In our proposed method, when one agent updates its policy, other agents are allowed to update their policies as well, but at a slower rate. This speeds up sequential learning, while also minimizing non-stationarity caused by other agents updating concurrently. Multi-timescale learning outperforms state-of-the-art decentralized learning methods on a set of challenging multi-agent cooperative tasks in the epymarl (Papoudakis et al., 2020) benchmark. This can be seen as a first step towards more general decentralized cooperative deep MARL methods based on multi-timescale learning.

2023-11-19

Proceedings of The 2nd Conference on Lifelong Learning Agents (published)

doi.org

proceedings.mlr.press
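
The multi-timescale idea in the abstract above (all agents learn at once, but the agent whose turn it is learns faster than the rest) can be illustrated with a small learning-rate schedule. This is only a sketch under stated assumptions, not the paper's implementation: the function name `multi_timescale_lrs`, the round-robin rotation, the `period` parameter, and the default rates are all hypothetical choices made for illustration.

```python
def multi_timescale_lrs(step, n_agents, fast_lr=1e-2, slow_lr=1e-3, period=100):
    """Return one learning rate per agent for a given training step.

    Assumption (not from the paper): the "fast" role rotates round-robin
    every `period` steps. The agent holding the role uses `fast_lr`;
    every other agent keeps learning, but at the slower `slow_lr`.
    """
    if n_agents <= 0 or period <= 0:
        raise ValueError("n_agents and period must be positive")
    fast_agent = (step // period) % n_agents  # index of the agent on the fast timescale
    return [fast_lr if i == fast_agent else slow_lr for i in range(n_agents)]


if __name__ == "__main__":
    # Print the schedule at the start of each rotation window for three agents.
    for step in (0, 100, 200, 300):
        print(step, multi_timescale_lrs(step, n_agents=3))
```

Under these assumptions, setting `slow_lr=0` reduces the schedule to sequential learning (only one agent updates at a time), and setting `slow_lr` equal to `fast_lr` reduces it to independent learning (all agents update at the same rate).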

Minimal Value-Equivalent Partial Models for Scalable and Robust Planning in Lifelong Reinforcement Learning

Safa Alver

Doina Precup

Learning models of the environment from pure interaction is often considered an essential component of building lifelong reinforcement learning agents. However, the common practice in model-based reinforcement learning is to learn models that model every aspect of the agent’s environment, regardless of whether they are important in coming up with optimal decisions or not. In this paper, we argue that such models are not particularly well-suited for performing scalable and robust planning in lifelong reinforcement learning scenarios and we propose new kinds of models that only model the relevant aspects of the environment, which we call minimal value-equivalent partial models. After providing a formal definition for these models, we provide theoretical results demonstrating the scalability advantages of performing planning with such models and then perform experiments to empirically illustrate our theoretical results. Then, we provide some useful heuristics on how to learn these kinds of models with deep learning architectures and empirically demonstrate that models learned in such a way can allow for performing planning that is robust to distribution shifts and compounding model errors. Overall, both our theoretical and empirical results suggest that minimal value-equivalent partial models can provide significant benefits to performing scalable and robust planning in lifelong reinforcement learning scenarios.

2023-11-19

Proceedings of The 2nd Conference on Lifelong Learning Agents (published)

doi.org

proceedings.mlr.press

Responsible AI Research Needs Impact Statements Too

A.R. Olteanu

Michael Ekstrand

Carlos Castillo

Jina Suh

All types of research, development, and policy work can have unintended, adverse consequences - work in responsible artificial intelligence (RAI), ethical AI, or ethics in AI is no exception.

2023-11-19

ArXiv (preprint)

doi.org

arxiv.org

Substituting Data Annotation with Balanced Neighbourhoods and Collective Loss in Multi-label Text Classification

Muberra Ozmen

Joseph Cotnareanu

Mark J. Coates

Multi-label text classification (MLTC) is the task of assigning multiple labels to a given text, and has a wide range of application domains. Most existing approaches require an enormous amount of annotated data to learn a classifier and/or a set of well-defined constraints on the label space structure, such as hierarchical relations, which may be complicated to provide as the number of labels increases. In this paper, we study the MLTC problem in annotation-free and scarce-annotation settings in which the magnitude of available supervision signals is linear in the number of labels. Our method follows three steps: (1) mapping input text into a set of preliminary label likelihoods by natural language inference using a pre-trained language model, (2) calculating a signed label dependency graph from label descriptions, and (3) updating the preliminary label likelihoods with message passing along the label dependency graph, driven by a collective loss function that injects the information of expected label frequency and average multi-label cardinality of predictions. The experiments show that the proposed framework achieves effective performance under low supervision settings, with almost imperceptible computational and memory overheads added to the usage of the pre-trained language model, outperforming its initial performance by 70% in terms of example-based F1 score.

2023-11-19

Proceedings of The 2nd Conference on Lifelong Learning Agents (published)

proceedings.mlr.press

Task-Agnostic Continual Reinforcement Learning: Gaining Insights and Overcoming Challenges

Massimo Caccia

Jonas Mueller

Taesup Kim

Laurent Charlin

Rasool Fakoor

2023-11-19

Proceedings of The 2nd Conference on Lifelong Learning Agents (published)

proceedings.mlr.press

Towards Few-Shot Coordination: Revisiting Ad-Hoc Teamplay Challenge in the Game of Hanabi

Hadi Nekoei

Xutong Zhao

Janarthanan Rajendran

Miao Liu

Sarath Chandar

Cooperative Multi-agent Reinforcement Learning (MARL) algorithms with Zero-Shot Coordination (ZSC) have gained significant attention in recent years. ZSC refers to the ability of agents to coordinate zero-shot (without additional interaction experience) with independently trained agents. While ZSC is crucial for cooperative MARL agents, it might not be possible for complex tasks and changing environments. Agents also need to adapt and improve their performance with minimal interaction with other agents. In this work, we show empirically that state-of-the-art ZSC algorithms have poor performance when paired with agents trained with different learning methods, and they require millions of interaction samples to adapt to these new partners. To investigate this issue, we formally defined a framework based on a popular cooperative multi-agent game called Hanabi to evaluate the adaptability of MARL methods. In particular, we created a diverse set of pre-trained agents and defined a new metric called adaptation regret that measures the agent's ability to efficiently adapt and improve its coordination performance when paired with some held-out pool of partners on top of its ZSC performance. After evaluating several SOTA algorithms using our framework, our experiments reveal that naive Independent Q-Learning (IQL) agents in most cases adapt as quickly as the SOTA ZSC algorithm Off-Belief Learning (OBL). This finding raises an interesting research question: How to design MARL algorithms with high ZSC performance and capability of fast adaptation to unseen partners. As a first step, we studied the role of different hyper-parameters and design choices on the adaptability of current MARL algorithms. Our experiments show that two categories of hyper-parameters controlling the training data diversity and optimization process have a significant impact on the adaptability of Hanabi agents.

2023-11-19

Proceedings of The 2nd Conference on Lifelong Learning Agents (published)

doi.org

proceedings.mlr.press

Assessing the Security of GitHub Copilot's Generated Code - A Targeted Replication Study

Vahid Majdinasab

Michael Joshua Bishop

Shawn Rasheed

Arghavan Moradi Dakhel

Amjed Tahir

Foutse Khomh

2023-11-17

ArXiv (preprint)

doi.org

arxiv.org

MUDiff: Unified Diffusion for Complete Molecule Generation

Chenqing Hua

Sitao Luan

Minkai Xu

Zhitao Ying

Rex Ying

Jie Fu

Stefano Ermon

Doina Precup

2023-11-17

logconference.io/LOG/2023/Conference (poster)

doi.org

proceedings.mlr.press

The evidence mismatch in pediatric surgical practice

Marina Broomfield

Zena Agabani

Elena Guadagno

Dan Poenaru

Robert Baird

2023-11-17

Pediatric surgery international (Print) (published)

doi.org

Aperiodic activity as a central neural feature of hypnotic susceptibility outside of hypnosis

Mathieu Landry

Jason da Silva Castanheira

Catherine Boisvert

Floriane Rousseaux

Jérôme Sackur

Amir Raz

Philippe Richebé

David Ogez

Pierre Rainville

Karim Jerbi

Hypnotic phenomena reflect the ability to alter one’s subjective experiences based on targeted verbal suggestions. This ability varies greatly in the population. The brain correlates that explain this variability remain elusive. Addressing this gap, our study employs machine learning to predict hypnotic susceptibility. By recording electroencephalography (EEG) before and after a hypnotic induction and analyzing diverse neurophysiological features, we were able to determine that several features differentiate between individuals with high and low hypnotic susceptibility, both at baseline and during hypnosis. Our analysis revealed that the paramount discriminative feature is non-oscillatory EEG activity before the induction, a new finding in the field. This outcome aligns with the idea that hypnotic susceptibility represents a latent trait observable through a plain five-minute resting-state EEG.

2023-11-16

bioRxiv (preprint)

doi.org

Differentiable visual computing for inverse problems and machine learning

Andrew Spielberg

Fangcheng Zhong

Konstantinos Rematas

Krishna Murthy

Cengiz Oztireli

Tzu-Mao Li

D. Nowrouzezahrai

2023-11-16

Nature Machine Intelligence (published)

doi.org

arxiv.org

Adaptive Integration of Categorical and Multi-relational Ontologies with EHR Data for Medical Concept Embedding

Chin Wang Cheong

Kejing Yin

William K. Cheung

Benjamin C. M. Fung

Jonathan Poon

2023-11-13

ACM Transactions on Intelligent Systems and Technology (published)

doi.org

