Publications

No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO

Skander Moalla

Andrea Miele

Daniil Pyatko

Reinforcement learning (RL) is inherently rife with non-stationarity since the states and rewards the agent observes during training depend on its changing policy. Therefore, networks in deep RL must be capable of adapting to new observations and fitting new targets. However, previous works have observed that networks trained under non-stationarity exhibit an inability to continue learning, termed loss of plasticity, and eventually a collapse in performance. For off-policy deep value-based RL methods, this phenomenon has been correlated with a decrease in representation rank and the ability to fit random targets, termed capacity loss. Although this correlation has generally been attributed to neural network learning under non-stationarity, the connection to representation dynamics has not been carefully studied in on-policy policy optimization methods. In this work, we empirically study representation dynamics in Proximal Policy Optimization (PPO) on the Atari and MuJoCo environments, revealing that PPO agents are also affected by feature rank deterioration and capacity loss. We show that this is aggravated by stronger non-stationarity, ultimately driving the actor's performance to collapse, regardless of the performance of the critic. We ask why the trust region, specific to methods like PPO, cannot alleviate or prevent the collapse and find a connection between representation collapse and the degradation of the trust region, one exacerbating the other. Finally, we present Proximal Feature Optimization (PFO), a novel auxiliary loss that, along with other interventions, shows that regularizing the representation dynamics mitigates the performance collapse of PPO agents.

2024-04-30

ArXiv (preprint)

doi.org

arxiv.org

Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features

Aleksandr Beznosikov

David Dobre

Gauthier Gidel

2024-04-30

International Conference on Machine Learning (poster)

doi.org

proceedings.mlr.press

A self-attention-based CNN-Bi-LSTM model for accurate state-of-charge estimation of lithium-ion batteries

Zeinab Sherkatghanad

Amin Ghazanfari

Vladimir Makarenkov

2024-04-30

Journal of Energy Storage (published)

doi.org

Successor Features for Efficient Multi-Subject Controlled Text Generation

Meng Cao

Mehdi Fatemi

Jackie Chi Kit Cheung

Samira Shabanian

While large language models (LLMs) have achieved impressive performance in generating fluent and realistic text, controlling the generated text so that it exhibits properties such as safety, factuality, and non-toxicity remains challenging. Existing decoding-based controllable text generation methods are static in terms of the dimension of control; if the target subject is changed, they require new training. Moreover, it can quickly become prohibitive to concurrently control multiple subjects. To address these challenges, we first show that existing methods can be framed as a reinforcement learning problem, where an action-value function estimates the likelihood of a desired attribute appearing in the generated text. Then, we introduce a novel approach named SF-Gen, which leverages the concept of successor features to decouple the dynamics of LLMs from task-specific rewards. By employing successor features, our method proves to be memory-efficient and computationally efficient for both training and decoding, especially when dealing with multiple target subjects. To the best of our knowledge, our research represents the first application of successor features in text generation. In addition to its computational efficiency, the resultant language produced by our method is comparable to the SOTA (and outperforms baselines) in both control measures as well as language quality, which we demonstrate through a series of experiments in various controllable text generation tasks.

2024-04-30

ICML.cc/2024/Conference (poster)

proceedings.mlr.press

A Tensor Decomposition Perspective on Second-order RNNs

Maude Lizaire

Michael Rizvi-Martel

Marawan Gamal Abdel Hameed

Guillaume Rabusseau

Second-order Recurrent Neural Networks (2RNNs) extend RNNs by leveraging second-order interactions for sequence modelling. These models are provably more expressive than their first-order counterparts and have connections to well-studied models from formal language theory. However, their large parameter tensor makes computations intractable. To circumvent this issue, one approach known as MIRNN consists in limiting the type of interactions used by the model. Another is to leverage tensor decomposition to diminish the parameter count. In this work, we study the model resulting from parameterizing 2RNNs using the CP decomposition, which we call CPRNN. Intuitively, the rank of the decomposition should reduce expressivity. We analyze how rank and hidden size affect model capacity and show the relationships between RNNs, 2RNNs, MIRNNs, and CPRNNs based on these parameters. We support these results empirically with experiments on the Penn Treebank dataset which demonstrate that, with a fixed parameter budget, CPRNNs outperform RNNs, 2RNNs, and MIRNNs with the right choice of rank and hidden size.

2024-04-30

ICML.cc/2024/Conference (spotlight)

doi.org

proceedings.mlr.press

The Impact of Educational Materials on Parental Anxiety and Productivity: A Clinical Trial in Pediatric Appendicitis

Julia Ferreira

Nadia Safa

Fabio Botelho

Robin Petroze

Hussein Wissanji

Dan Poenaru

Pramod Puligandla

Kenneth Shaw

Maeve Trudeau

Sherif Emil

Elena Guadagno

Jean-Martin Laberge

2024-04-30

Journal of Pediatric Surgery (published)

doi.org

Weblinx: Real-World Website Navigation with Multi-Turn Dialogue

Xing Han Lu

Zdeněk Kasner

Siva Reddy

We propose the problem of conversational web navigation, where a digital agent controls a web browser and follows user instructions to solve real-world tasks in a multi-turn dialogue fashion. To support this problem, we introduce WEBLINX - a large-scale benchmark of 100K interactions across 2300 expert demonstrations of conversational web navigation. Our benchmark covers a broad range of patterns on over 150 real-world websites and can be used to train and evaluate agents in diverse scenarios. Due to the magnitude of information present, Large Language Models (LLMs) cannot process entire web pages in real-time. To solve this bottleneck, we design a retrieval-inspired model that efficiently prunes HTML pages by ranking relevant elements. We use the selected elements, along with screenshots and action history, to assess a variety of models for their ability to replicate human behavior when navigating the web. Our experiments span from small text-only to proprietary multimodal LLMs. We find that smaller finetuned decoders surpass the best zero-shot LLMs (including GPT-4V), but also larger finetuned multimodal models which were explicitly pretrained on screenshots. However, all finetuned models struggle to generalize to unseen websites. Our findings highlight the need for large multimodal models that can generalize to novel settings. Our code, data and models are available for research: https://mcgill-nlp.github.io/weblinx

2024-04-30

ICML.cc/2024/Conference (spotlight)

doi.org

proceedings.mlr.press

Semantically Consistent Video Inpainting with Conditional Diffusion Models

Dylan Green

William Harvey

Saeid Naderiparizi

Matthew Niedoba

Yunpeng Liu

Xiaoxuan Liang

Jonathan Wilder Lavington

Ke Zhang

Vasileios Lioutas

Setareh Dabiri

Adam Ścibior

Berend Zwartsenberg

Frank N. Wood

Current state-of-the-art methods for video inpainting typically rely on optical flow or attention-based approaches to inpaint masked regions by propagating visual information across frames. While such approaches have led to significant progress on standard benchmarks, they struggle with tasks that require the synthesis of novel content that is not present in other frames. In this paper we reframe video inpainting as a conditional generative modeling problem and present a framework for solving such problems with conditional video diffusion models. We highlight the advantages of using a generative approach for this task, showing that our method is capable of generating diverse, high-quality inpaintings and synthesizing new content that is spatially, temporally, and semantically consistent with the provided context.

2024-04-29

ArXiv (preprint)

doi.org

openreview.net

295. Rare Variant Genetic Architecture of the Human Cortical MRI Phenotypes in General Population

Kuldeep Kumar

Sayeh Kazem

Zhijie Liao

Jakub Kopal

Guillaume Huguet

Thomas Renne

Martineau Jean‐Louis

Zhe Xie

Zohra Saci

Laura Almasy

David C. Glahn

Tomáš Paus

Guillaume Dumas

Carrie E. Bearden

Paul M. Thompson

Richard A. I. Bethlehem

Varun Warrier

Sébastien Jacquemont

2024-04-28

Biological Psychiatry (unknown)

doi.org

8-inch Wafer-scale Epitaxial Monolayer MoS2.

Hua Yu

Liangfeng Huang

Lanying Zhou

Yalin Peng

Xiuzhen Li

Peng Yin

Jiaojiao Zhao

Min Zhu

Shuopei Wang

Jieying Liu

Hongyue Du

Jian Tang

Songge Zhang

Yuchao Zhou

Nianpeng Lu

Kaihui Liu

Na Li

Guangyu Zhang

2024-04-28

Advances in Materials (published)

doi.org

MiPa: Mixed Patch Infrared-Visible Modality Agnostic Object Detection

Heitor Rapela Medeiros

David Latortue

Fidel A. Guerrero Peña

Eric Granger

Marco Pedersoli

2024-04-28

ArXiv (preprint)

doi.org

arxiv.org

Sequential predictive learning is a unifying theory for hippocampal representation and replay

Aleksei Efremov

The mammalian hippocampus contains a cognitive map that represents an animal’s position in the environment [1] and generates offline “replay” [2,3] for the purposes of recall [4], planning [5,6], and forming long term memories [7]. Recently, it’s been found that artificial neural networks trained to predict sensory inputs develop spatially tuned cells [8], aligning with predictive theories of hippocampal function [9–11]. However, whether predictive learning can also account for the ability to produce offline replay is unknown. Here, we find that spatially-tuned cells, which robustly emerge from all forms of predictive learning, do not guarantee the presence of a cognitive map with the ability to generate replay. Offline simulations only emerged in networks that used recurrent connections and head-direction information to predict multi-step observation sequences, which promoted the formation of a continuous attractor reflecting the geometry of the environment. These offline trajectories were able to show wake-like statistics, autonomously replay recently experienced locations, and could be directed by a virtual head direction signal. Further, we found that networks trained to make cyclical predictions of future observation sequences were able to rapidly learn a cognitive map and produced sweeping representations of future positions reminiscent of hippocampal theta sweeps [12]. These results demonstrate how hippocampal-like representation and replay can emerge in neural networks engaged in predictive learning, and suggest that hippocampal theta sequences reflect a circuit that implements a data-efficient algorithm for sequential predictive learning. Together, this framework provides a unifying theory for hippocampal functions and hippocampal-inspired approaches to artificial intelligence.

2024-04-28

bioRxiv (preprint)

doi.org
