Publications

Memory Efficient Neural Processes via Constant Memory Attention Block

Frederick Tung

Hossein Hajimirsadeghi

Mohamed Osama Ahmed

Neural Processes (NPs) are popular meta-learning methods for efficiently modelling predictive uncertainty. Recent state-of-the-art methods, … (voir plus)however, leverage expensive attention mechanisms, limiting their applications, particularly in low-resource settings. In this work, we propose Constant Memory Attention Block (CMAB), a novel general-purpose attention block that (1) is permutation invariant, (2) computes its output in constant memory, and (3) performs updates in constant computation. Building on CMAB, we propose Constant Memory Attentive Neural Processes (CMANPs), an NP variant which only requires \textbf{constant} memory. Empirically, we show CMANPs achieve state-of-the-art results on popular NP benchmarks (meta-regression and image completion) while being significantly more memory efficient than prior methods.

2024-05-01

ICML.cc/2024/Conference (poster)

openreview.net

MixEHR-SurG: a joint proportional hazard and guided topic model for inferring mortality-associated topics from electronic health records

Yixuan Li

Ariane Marelli

Archer Yang

Yue Li

Survival models can help medical practitioners to evaluate the prognostic importance of clinical variables to patient outcomes such as morta… (voir plus)lity or hospital readmission and subsequently design personalized treatment regimes. Electronic Health Records (EHRs) hold the promise for large-scale survival analysis based on systematically recorded clinical features for each patient. However, existing survival models either do not scale to high dimensional and multi-modal EHR data or are difficult to interpret. In this study, we present a supervised topic model called MixEHR-SurG to simultaneously integrate heterogeneous EHR data and model survival hazard. Our contributions are three-folds: (1) integrating EHR topic inference with Cox proportional hazards likelihood; (2) integrating patient-specific topic hyperparameters using the PheCode concepts such that each topic can be identified with exactly one PheCode-associated phenotype; (3) multi-modal survival topic inference. This leads to a highly interpretable survival topic model that can infer PheCode-specific phenotype topics associated with patient mortality. We evaluated MixEHR-SurG using a simulated dataset and two real-world EHR datasets: the Quebec Congenital Heart Disease (CHD) data consisting of 8211 subjects with 75,187 outpatient claim records of 1767 unique ICD codes; the MIMIC-III consisting of 1458 subjects with multi-modal EHR records. Compared to the baselines, MixEHR-SurG achieved a superior dynamic AUROC for mortality prediction, with a mean AUROC score of 0.89 in the simulation dataset and a mean AUROC of 0.645 on the CHD dataset. Qualitatively, MixEHR-SurG associates severe cardiac conditions with high mortality risk among the CHD patients after the first heart failure hospitalization and critical brain injuries with increased mortality among the MIMIC-III patients after their ICU discharge. Together, the integration of the Cox proportional hazards model and EHR topic inference in MixEHR-SurG not only leads to competitive mortality prediction but also meaningful phenotype topics for in-depth survival analysis. The software is available at GitHub: https://github.com/li-lab-mcgill/MixEHR-SurG.

2024-05-01

Journal of Biomedical Informatics (publié)

doi.org

arxiv.org

Mixtures of Experts Unlock Parameter Scaling for Deep RL

Johan Samir Obando Ceron

Ghada Sokar

Timon Willi

Clare Lyle

Jesse Farebrother

Jakob Nicolaus Foerster

Gintare Karolina Dziugaite

Doina Precup

Pablo Samuel Castro

The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance s… (voir plus)cales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.

2024-05-01

ICML.cc/2024/Conference (spotlight)

doi.org

openreview.net

Nash Learning from Human Feedback

Remi Munos

Michal Valko

Daniele Calandriello

Mohammad Gheshlaghi Azar

Mark Rowland

Zhaohan Daniel Guo

Yunhao Tang

Matthieu Geist

Thomas Mesnard

Côme Fiegel

Andrea Michi

Marco Selvi

Sertan Girgin

Nikola Momchev

Olivier Bachem

Daniel J Mankowitz

Doina Precup

Bilal Piot

Reinforcement learning from human feedback (RLHF) has emerged as the main paradigm for aligning large language models (LLMs) with human pref… (voir plus)erences. Traditionally, RLHF involves the initial step of learning a reward model from pairwise human feedback, i.e., expressed as preferences between pairs of text generations. Subsequently, the LLM's policy is fine-tuned to maximize the reward through a reinforcement learning algorithm. In this study, we introduce an alternative pipeline for the fine-tuning of LLMs using pairwise human feedback. Our approach entails the initial learning of a pairwise preference model, which is conditioned on two inputs (instead of a single input in the case of a reward model) given a prompt, followed by the pursuit of a policy that consistently generates responses preferred over those generated by any competing policy, thus defining the Nash equilibrium of this preference model. We term this approach Nash learning from human feedback (NLHF). In the context of a tabular policy representation, we present a novel algorithmic solution, Nash-MD, founded on the principles of mirror descent. This algorithm produces a sequence of policies, with the last iteration converging to the regularized Nash equilibrium. Additionally, we explore parametric representations of policies and introduce gradient descent algorithms for deep-learning architectures. We illustrate the effectiveness of our approach by presenting experimental results on a text summarization task. We believe NLHF offers a compelling avenue for fine-tuning LLMs and enhancing the alignment of LLMs with human preferences.

2024-05-01

ICML.cc/2024/Conference (spotlight)

doi.org

openreview.net

Nash Learning from Human Feedback

Remi Munos

Michal Valko

Daniele Calandriello

Mohammad Gheshlaghi Azar

Mark Rowland

Zhaohan Daniel Guo

Yunhao Tang

Matthieu Geist

Thomas Mesnard

Côme Fiegel

Andrea Michi

Marco Selvi

Sertan Girgin

Nikola Momchev

Olivier Bachem

Daniel J Mankowitz

Doina Precup

Bilal Piot

Reinforcement learning from human feedback (RLHF) has emerged as the main paradigm for aligning large language models (LLMs) with human pref… (voir plus)erences. Traditionally, RLHF involves the initial step of learning a reward model from pairwise human feedback, i.e., expressed as preferences between pairs of text generations. Subsequently, the LLM’s policy is fine-tuned to maximize the reward through a reinforcement learning algorithm. In this study, we introduce an alternative pipeline for the fine-tuning of LLMs using pairwise human feedback. Our approach entails the initial learning of a pairwise preference model, which is conditioned on two inputs (instead of a single input in the case of a reward model) given a prompt, followed by the pursuit of a policy that consistently generates responses preferred over those generated by any competing policy, thus defining the Nash equilibrium of this preference model. We term this approach Nash learning from human feedback (NLHF). In the context of a tabular policy representation, we present a novel algorithmic solution, Nash-MD, founded on the principles of mirror descent. This algorithm produces a sequence of policies, with the last iteration converging to the regularized Nash equilibrium. Additionally, we explore parametric representations of policies and introduce gradient descent algorithms for deep-learning architectures. We illustrate the effectiveness of our approach by presenting experimental results on a text summarization task. We believe NLHF offers a compelling avenue for fine-tuning LLMs and enhancing the alignment of LLMs with human preferences.

2024-05-01

ICML.cc/2024/Conference (spotlight)

doi.org

openreview.net

Nilearn and Big Data Facilitates Transdiagnostic Brain Biomarkers

Hao-Ting Wang

Natasha Clarke

Quentin Dessain

Fraçois Paugam

Lune Bellec

2024-05-01

Biological Psychiatry (publié)

doi.org

Overcoming boundaries: Interdisciplinary challenges and opportunities in cognitive neuroscience

Arnaud Brignol

Anita Paas

Luis Sotelo-Castro

David St-Onge

Giovanni Beltrame

Emily B.J. Coffey

2024-05-01

Neuropsychologia (publié)

doi.org

Patient-Centered Surgical Care for Children in Low and Lower-Middle Income Countries (LMICs) - A Systematic Scoping Review of the Literature

Riya Sawhney

Kacylia Roy Proulx

Ayla Gerk

Elena Guadagno

Dan Poenaru

2024-05-01

Journal of Pediatric Surgery (publié)

doi.org

Performance of generative pre-trained transformers (GPTs) in Certification Examination of the College of Family Physicians of Canada

Mehdi Mousavi

Shabnam Shafiee

Jason M. Harley

Jackie Cheung

Samira Abbasgholizadeh-Rahimi

Introduction The application of large language models such as generative pre-trained transformers (GPTs) has been promising in medical educa… (voir plus)tion, and its performance has been tested for different medical exams. This study aims to assess the performance of GPTs in responding to a set of sample questions of short-answer management problems (SAMPs) from the certification exam of the College of Family Physicians of Canada (CFPC). Method Between August 8th and 25th, 2023, we used GPT-3.5 and GPT-4 in five rounds to answer a sample of 77 SAMPs questions from the CFPC website. Two independent certified family physician reviewers scored AI-generated responses twice: first, according to the CFPC answer key (ie, CFPC score), and second, based on their knowledge and other references (ie, Reviews’ score). An ordinal logistic generalised estimating equations (GEE) model was applied to analyse repeated measures across the five rounds. Result According to the CFPC answer key, 607 (73.6%) lines of answers by GPT-3.5 and 691 (81%) by GPT-4 were deemed accurate. Reviewer’s scoring suggested that about 84% of the lines of answers provided by GPT-3.5 and 93% of GPT-4 were correct. The GEE analysis confirmed that over five rounds, the likelihood of achieving a higher CFPC Score Percentage for GPT-4 was 2.31 times more than GPT-3.5 (OR: 2.31; 95% CI: 1.53 to 3.47; p0.001). Similarly, the Reviewers’ Score percentage for responses provided by GPT-4 over 5 rounds were 2.23 times more likely to exceed th

2024-05-01

Family Medicine and Community Health (publié)

doi.org

Performance of generative pre-trained transformers (GPTs) in Certification Examination of the College of Family Physicians of Canada

Mehdi Mousavi

Shabnam Shafiee

Jason M Harley

Jackie Cheung

Samira Abbasgholizadeh-Rahimi

Introduction The application of large language models such as generative pre-trained transformers (GPTs) has been promising in medical educa… (voir plus)tion, and its performance has been tested for different medical exams. This study aims to assess the performance of GPTs in responding to a set of sample questions of short-answer management problems (SAMPs) from the certification exam of the College of Family Physicians of Canada (CFPC). Method Between August 8th and 25th, 2023, we used GPT-3.5 and GPT-4 in five rounds to answer a sample of 77 SAMPs questions from the CFPC website. Two independent certified family physician reviewers scored AI-generated responses twice: first, according to the CFPC answer key (ie, CFPC score), and second, based on their knowledge and other references (ie, Reviews’ score). An ordinal logistic generalised estimating equations (GEE) model was applied to analyse repeated measures across the five rounds. Result According to the CFPC answer key, 607 (73.6%) lines of answers by GPT-3.5 and 691 (81%) by GPT-4 were deemed accurate. Reviewer’s scoring suggested that about 84% of the lines of answers provided by GPT-3.5 and 93% of GPT-4 were correct. The GEE analysis confirmed that over five rounds, the likelihood of achieving a higher CFPC Score Percentage for GPT-4 was 2.31 times more than GPT-3.5 (OR: 2.31; 95% CI: 1.53 to 3.47; p0.001). Similarly, the Reviewers’ Score percentage for responses provided by GPT-4 over 5 rounds were 2.23 times more likely to exceed th

2024-05-01

Family Medicine and Community Health (publié)

doi.org

Position: Application-Driven Innovation in Machine Learning

David Rolnick

Alan Aspuru-Guzik

Sara Beery

Bistra Dilkina

Priya L. Donti

Marzyeh Ghassemi

Hannah Kerner

Claire Monteleoni

Esther Rolf

Milind Tambe

Adam White

2024-05-01

ICML.cc/2024/Conference (poster)

proceedings.mlr.press

openreview.net

No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO

Skander Moalla

Andrea Miele

Daniil Pyatko

Razvan Pascanu

Caglar Gulçehre

Reinforcement learning (RL) is inherently rife with non-stationarity since the states and rewards the agent observes during training depend … (voir plus)on its changing policy. Therefore, networks in deep RL must be capable of adapting to new observations and fitting new targets. However, previous works have observed that networks trained under non-stationarity exhibit an inability to continue learning, termed loss of plasticity, and eventually a collapse in performance. For off-policy deep value-based RL methods, this phenomenon has been correlated with a decrease in representation rank and the ability to fit random targets, termed capacity loss. Although this correlation has generally been attributed to neural network learning under non-stationarity, the connection to representation dynamics has not been carefully studied in on-policy policy optimization methods. In this work, we empirically study representation dynamics in Proximal Policy Optimization (PPO) on the Atari and MuJoCo environments, revealing that PPO agents are also affected by feature rank deterioration and capacity loss. We show that this is aggravated by stronger non-stationarity, ultimately driving the actor's performance to collapse, regardless of the performance of the critic. We ask why the trust region, specific to methods like PPO, cannot alleviate or prevent the collapse and find a connection between representation collapse and the degradation of the trust region, one exacerbating the other. Finally, we present Proximal Feature Optimization (PFO), a novel auxiliary loss that, along with other interventions, shows that regularizing the representation dynamics mitigates the performance collapse of PPO agents.

2024-05-01

ArXiv (prépublication)

doi.org

arxiv.org

Science éclair

À l’avant-garde d’une nouvelle ère

Demandes de supervision

Publications

Science éclair

À l’avant-garde d’une nouvelle ère

Demandes de supervision

Mots-clés populaires:

Publications