Publications

Enhancing Multi-Agent Multi-Modal Collaboration with Fine-Grained Reward Modeling

Weixiang Yan

Multi-Modal Large Language Models (MLLMs) have significantly advanced multi-modal reasoning but still struggle with compositional reasoning tasks. Multi-agent collaboration provides a promising solution by leveraging the distinct capabilities of different agents: a decomposer agent handles task breakdown and an answerer agent generates responses. While there have been efforts to adaptively decompose tasks based on the answerer agent's capabilities, such as using in-context learning, these methods often prove insufficient for fully effective decomposition. We address this issue by enhancing collaboration through fine-grained reward modeling, where each generated sub-question is assigned a specialized reward without requiring extra annotation or tuning of a reward model. Our proposed method dynamically optimizes the decomposition process, enabling better alignment between agents. Experimental results on four vision-language tasks demonstrate consistent improvements, with a 5.5% absolute increase in mean performance over traditional approaches. These findings highlight the efficacy of fine-grained reward modeling for enhancing multi-agent, multi-modal collaboration.

2024-10-10

NeurIPS.cc/2024/Workshop/AFM (poster)

openreview.net

Evaluating Interventional Reasoning Capabilities of Large Language Models

Tejas Kasetty

Divyat Mahajan

Gintare Karolina Dziugaite

Alexandre Drouin

Dhanya Sridhar

Numerous decision-making tasks require estimating causal effects under interventions on different parts of a system. As practitioners consider using large language models (LLMs) to automate decisions, studying their causal reasoning capabilities becomes crucial. A recent line of work evaluates LLMs' ability to retrieve commonsense causal facts, but these evaluations do not sufficiently assess how LLMs reason about interventions. Motivated by the role that interventions play in causal inference, in this paper, we conduct empirical analyses to evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention. We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types, and enable a study of intervention-based reasoning. These benchmarks allow us to isolate the ability of LLMs to accurately predict changes resulting from interventions from their ability to memorize facts or find other shortcuts. Our analysis on four LLMs highlights that while GPT-4 models show promising accuracy at predicting the intervention effects, they remain sensitive to distracting factors in the prompts.
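
As a minimal sketch of what reasoning about an intervention on a causal graph involves (this is not the paper's benchmark; the graphs, variable names, and helper functions below are hypothetical), an intervention on X can be modeled by cutting the edges into X, after which only descendants of X can change:

```python
def intervene(parents, target):
    """Return a copy of the graph with every edge into `target` removed."""
    return {node: ([] if node == target else list(ps))
            for node, ps in parents.items()}


def descendants(parents, source):
    """Return the set of nodes reachable from `source` along directed edges."""
    children = {node: [] for node in parents}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)
    seen, stack = set(), [source]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# Hypothetical graphs, written as a mapping from each node to its parents.
confounding = {"Z": [], "X": ["Z"], "Y": ["Z"]}  # Z -> X, Z -> Y
mediation = {"X": [], "M": ["X"], "Y": ["M"]}    # X -> M -> Y

affected_confounding = descendants(intervene(confounding, "X"), "X")
affected_mediation = descendants(intervene(mediation, "X"), "X")
print(affected_confounding)
print(affected_mediation)
```

In the confounding graph, X and Y are associated through Z, yet intervening on X leaves Y unchanged; in the mediation graph, the same intervention propagates to M and Y. A shortcut based on association alone would not separate the two cases.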

2024-10-10

NeurIPS.cc/2024/Workshop/CALM (poster)

doi.org

openreview.net

Fast Convergence of Softmax Policy Mirror Ascent for Bandits & Tabular MDPs

Reza Asad

Reza Babanezhad Harikandeh

Issam Hadj Laradji

Nicolas Le Roux

Sharan Vaswani

We analyze the convergence of a novel policy gradient algorithm (referred to as SPMA) for multi-armed bandits and tabular Markov decision processes (MDPs). SPMA is an instantiation of mirror ascent and uses the softmax parameterization with a log-sum-exp mirror map. Given access to the exact policy gradients, we prove that SPMA with a constant step-size requires …

2024-10-10

NeurIPS.cc/2024/Workshop/OPT (published)

openreview.net

Faster, More Efficient RLHF through Off-Policy Asynchronous Learning

Shengyi Huang

To achieve state-of-the-art chatbots, large language models are finetuned with reinforcement learning (RL), frequently to optimize human feedback (RLHF). This process is computationally expensive and can take weeks. Offline approaches, like DPO, learn on a static dataset and are efficient but not performant. The dominant paradigm, online and on-policy (synchronously generating from the model, labelling with a reward model, and learning on feedback from the model's own outputs), is performant but not efficient. Following prior work in the general deep RL setting, we propose separating the actor and learner in RLHF. This enables the asynchronous generation of new samples while learning on prior samples, thus leading to overall faster training and better scaling. But this requires a novel regime for RLHF, online but off-policy: learning on samples from a previous version of our model. We ask a fundamental question: how much off-policyness can we tolerate for asynchronous training to speed up learning but maintain performance? We find that a contrastive loss, Online DPO, is most robust to off-policy data and that robustness increases with the scale of the policy model. We show even further compute optimizations but demonstrate that they come at a performance cost, giving rise to a trade-off. Finally, we verify our design choices by training LLaMA 3.1 8B with RLHF as a helpful chatbot in half the time of a synchronous run while matching final performance.
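
The online but off-policy regime can be illustrated with a toy schedule. This is a sketch under stated assumptions, not the paper's training code: `async_schedule` and `max_staleness` are hypothetical names, integers stand in for model checkpoints and sample batches, and the actor and learner are simulated in one sequential loop rather than run concurrently.

```python
from collections import deque


def async_schedule(num_steps, max_staleness=1):
    """Simulate which policy version produced the batch used at each learner step.

    Returns (learner_step, batch_version, staleness) tuples, where staleness is
    the learner's current policy version minus the version that generated the batch.
    """
    queue = deque()
    policy_version = 0
    # The actor starts ahead: it queues batches before the learner begins.
    for _ in range(max_staleness):
        queue.append(policy_version)
    log = []
    for step in range(num_steps):
        # Actor: generate a new batch with the current weights.
        queue.append(policy_version)
        # Learner: train on the oldest queued batch.
        batch_version = queue.popleft()
        log.append((step, batch_version, policy_version - batch_version))
        # One learner update yields a new policy version.
        policy_version += 1
    return log


on_policy = async_schedule(4, max_staleness=0)
off_policy = async_schedule(4, max_staleness=1)
print([staleness for _, _, staleness in on_policy])
print([staleness for _, _, staleness in off_policy])
```

With `max_staleness=0` every batch comes from the current policy (the synchronous, on-policy case). With `max_staleness=1` the learner settles into training on samples from the previous policy version, which is the off-policy data whose tolerable amount the abstract asks about.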

2024-10-10

NeurIPS.cc/2024/Workshop/FITML (poster)

openreview.net

GraphText: Graph Reasoning in Text Space

Jianan Zhao

Le Zhuo

Yikang Shen

Meng Qu

Kai Liu

Michael M. Bronstein

Zhaocheng Zhu

Jian Tang

2024-10-10

NeurIPS.cc/2024/Workshop/AFM (poster)

doi.org

openreview.net

High Dimensional First Order Mini-Batch Algorithms on Quadratic Problems

Andrew Nicholas Cheng

Kiwon Lee

Courtney Paquette

We analyze the dynamics of general mini-batch first order algorithms on the …

2024-10-10

NeurIPS.cc/2024/Workshop/OPT (published)

openreview.net

How Learning Rates Shape Neural Network Focus: Insights from Example Ranking

Ekaterina Lobacheva

Keller Jordan

Aristide Baratin

Nicolas Le Roux

The learning rate is a key hyperparameter that affects both the speed of training and the generalization performance of neural networks. Through a new loss-based example ranking analysis, we show that networks trained with different learning rates focus their capacity on different parts of the data distribution, leading to solutions with different generalization properties. These findings, which hold across architectures and datasets, provide new insights into how learning rates affect model performance and example-level dynamics in neural networks.
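
The abstract does not spell out the ranking analysis, so the following is only one plausible reading, given as a sketch: rank the same examples by per-example loss under two trained networks and compare the rankings with Spearman's rank correlation. The loss values and function names are hypothetical, and the formula used assumes no ties.

```python
def ranks(values):
    """Return the 0-based rank of each value in ascending order (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman(losses_a, losses_b):
    """Spearman rank correlation between two per-example loss lists (no ties)."""
    n = len(losses_a)
    d_squared = sum((ra - rb) ** 2
                    for ra, rb in zip(ranks(losses_a), ranks(losses_b)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical per-example losses for the same four examples.
losses_small_lr = [0.1, 0.9, 0.4, 0.7]
losses_large_lr = [0.3, 0.8, 0.6, 0.2]

rho = spearman(losses_small_lr, losses_large_lr)
print(round(rho, 3))
```

A correlation well below 1 would indicate that the two networks find different examples easy or hard, which is the kind of difference in focus the abstract describes.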

2024-10-10

NeurIPS.cc/2024/Workshop/SciForDL (poster)

openreview.net

Input Space Mode Connectivity in Deep Neural Networks

Jakub Vrabel

Ori Shem-Ur

Yaron Oz

David Scott Krueger

We extend the concept of loss landscape mode connectivity to the input space of deep neural networks. Initially studied in parameter space, mode connectivity describes the existence of low-loss paths between solutions (loss minimizers) found via gradient descent. We present theoretical and empirical evidence of its presence in the input space of deep networks, thereby highlighting the broader nature of the phenomenon. We observe that different input images with similar predictions are generally connected, and for trained models, the path tends to be simple, with only a small deviation from being a linear path. We conjecture that input space mode connectivity in high-dimensional spaces is a geometric phenomenon, present even in untrained models, and can be explained by percolation theory. We exploit mode connectivity to obtain new insights about adversarial examples and show its potential for adversarial detection and interpretability.
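
A low-loss path between two inputs can be checked on a toy problem by evaluating the loss along the straight line that joins them. The sketch below is an illustration under assumptions, not the paper's procedure: the two scoring functions stand in for a network's logit, and `barrier` is a hypothetical name for the excess of the highest loss on the path over the higher endpoint loss.

```python
import math


def make_loss(score):
    """Build a logistic loss for the positive class from a scoring function."""
    return lambda x: math.log1p(math.exp(-score(x)))


def linear_path(x_a, x_b, num_points=11):
    """Return evenly spaced points on the segment from x_a to x_b."""
    ts = [i / (num_points - 1) for i in range(num_points)]
    return [tuple((1 - t) * a + t * b for a, b in zip(x_a, x_b)) for t in ts]


def barrier(loss, x_a, x_b):
    """Highest loss on the linear path minus the higher of the two endpoint losses."""
    path_losses = [loss(x) for x in linear_path(x_a, x_b)]
    return max(path_losses) - max(path_losses[0], path_losses[-1])


# Toy scoring functions standing in for a model's logit.
linear_loss = make_loss(lambda x: x[0] - x[1])
quadratic_loss = make_loss(lambda x: x[0] ** 2 - 1.0)

barrier_linear = barrier(linear_loss, (2.0, 0.0), (3.0, -1.0))
barrier_quadratic = barrier(quadratic_loss, (-2.0, 0.0), (2.0, 0.0))
print(barrier_linear, barrier_quadratic)
```

A barrier near zero means the two inputs are connected by the linear path at low loss. A clearly positive barrier, as in the quadratic case where the segment crosses a high-loss region, means the straight line is not enough and a curved path would have to be searched for.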

2024-10-10

NeurIPS.cc/2024/Workshop/SciForDL (oral)

openreview.net

Introducing Brain Foundation Models

Mohammad Javad Darvishi Bayazi

Hena Ghonia

Roland Riachi

Bruno Aristimunha

Arian Khorasani

Md Rifat Arefin

Sylvain Chevallier

Amin Darabi

Guillaume Dumas

Irina Rish

Brain function represents one of the most complex systems driving our world. Decoding its signals poses significant challenges, particularly due to the limited availability of data and the high cost of recordings. The existence of large hospital datasets and laboratory collections partially mitigates this issue. However, the lack of standardized recording protocols, varying numbers of channels, diverse setups, scenarios, and recording devices further complicate the task. This work addresses these challenges by introducing the Brain Foundation Model (BFM), a suite of open-source models trained on brain signals. These models serve as foundational tools for various types of time-series neuroimaging tasks. This work presents the first model of the BFM series, which is trained on electroencephalogram signal data. Our results demonstrate that BFM-EEG can generate signals more accurately than other models. Upon acceptance, we will release the model weights and pipeline.

2024-10-10

NeurIPS.cc/2024/Workshop/TSALM (published)

openreview.net

Language model scaling laws and zero-sum learning

Andrei Mircea

Ekaterina Lobacheva

Supriyo Chakraborty

Nima Chitsazan

Irina Rish

This work aims to understand how, in terms of training dynamics, scaling up language model size yields predictable loss improvements. We find that these improvements can be tied back to loss deceleration, an abrupt transition in the rate of loss improvement, characterized by piece-wise linear behavior in log-log space. Notably, improvements from increased model size appear to be a result of (1) improving the loss at which this transition occurs; and (2) improving the rate of loss improvement after this transition. As an explanation for the mechanism underlying this transition (and the effect of model size on loss it mediates), we propose the zero-sum learning (ZSL) hypothesis. In ZSL, per-token gradients become systematically opposed, leading to degenerate training dynamics where the model can't improve loss on one token without harming it on another, bottlenecking the overall rate at which loss can improve. We find compelling evidence of ZSL, as well as unexpected results which shed light on other factors contributing to ZSL.
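
Piece-wise linear behavior in log-log space can be written down directly. The sketch below is an assumed parameterization for illustration only: the function name, the parameter names, and all numeric values are hypothetical and are not taken from the paper.

```python
import math


def loss_curve(step, step_d, loss_d, rate_before, rate_after):
    """Loss that is piece-wise linear in log-log space with a break at (step_d, loss_d).

    Before step_d, log-loss falls with slope -rate_before against log-step;
    afterwards it falls with slope -rate_after.
    """
    rate = rate_before if step < step_d else rate_after
    log_loss = math.log(loss_d) - rate * (math.log(step) - math.log(step_d))
    return math.exp(log_loss)


# Hypothetical parameters: the larger model decelerates at a lower loss
# and keeps a higher rate of improvement after the transition.
small = dict(step_d=1000, loss_d=4.0, rate_before=0.5, rate_after=0.05)
large = dict(step_d=1000, loss_d=3.5, rate_before=0.5, rate_after=0.08)

final_small = loss_curve(100_000, **small)
final_large = loss_curve(100_000, **large)
print(round(final_small, 3), round(final_large, 3))
```

In this parameterization, the two effects of model size named in the abstract correspond to a lower `loss_d` and a larger `rate_after`, and both lower the loss reached at a fixed later step.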

2024-10-10

NeurIPS.cc/2024/Workshop/SciForDL (poster)

openreview.net

A Layer Selection Approach to Test Time Adaptation

Sabyasachi Sahoo

Mostafa ElAraby

Jonas Ngnawe

Yann Batiste Pequignot

Frederic Precioso

Christian Gagné

Test Time Adaptation (TTA) addresses the problem of distribution shift by adapting a pretrained model to a new domain during inference. When faced with challenging shifts, most methods collapse and perform worse than the original pretrained model. In this paper, we find that not all layers are equally receptive to the adaptation, and the layers with the most misaligned gradients often cause performance degradation. To address this, we propose GALA, a novel layer selection criterion to identify the most beneficial updates to perform during test time adaptation. This criterion can also filter out unreliable samples with noisy gradients. Its simplicity allows seamless integration with existing TTA loss functions, thereby preventing degradation and focusing adaptation on the most trainable layers. This approach also helps to regularize adaptation to preserve the pretrained features, which are crucial for handling unseen domains. Through extensive experiments, we demonstrate that the proposed layer selection framework improves the performance of existing TTA approaches across multiple datasets, domain shifts, model architectures, and TTA losses.
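
The abstract does not give the GALA criterion itself, so the following is not GALA. It is a generic sketch of gradient-alignment-based layer selection under an assumption made here: a layer is updated only if its current gradient has positive cosine similarity with a reference direction for that layer (for example, a running average of past gradients). All names and numbers are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def select_layers(current_grads, reference_grads, threshold=0.0):
    """Return the layers whose current gradient aligns with the reference direction."""
    return [name for name, grad in current_grads.items()
            if cosine(grad, reference_grads[name]) > threshold]


# Hypothetical flattened per-layer gradients.
current = {"conv1": [0.2, -0.1], "conv2": [-0.3, 0.4], "fc": [0.1, 0.1]}
reference = {"conv1": [0.1, -0.1], "conv2": [0.3, -0.2], "fc": [0.0, 0.2]}

selected = select_layers(current, reference)
print(selected)
```

Here `conv2` is skipped because its gradient points against its reference direction, while the other two layers would be updated. Because the selection only decides which layers receive an update, a rule of this shape can sit on top of any TTA loss.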

2024-10-10

NeurIPS.cc/2024/Workshop/FITML (poster)

openreview.net

Learning Robust Representations for Transfer in Reinforcement Learning

Faisal Mohamed

Roger Creus Castanyer

Hongyao Tang

Zahra Sheikhbahaee

Glen Berseth

Learning transferable representations for deep reinforcement learning (RL) is a challenging problem due to the inherent non-stationarity, distribution shift, and unstable training dynamics. To be useful, a transferable representation needs to be robust to such factors. In this work, we introduce a new architecture and training strategy for learning robust representations for transfer learning in RL. We propose leveraging multiple CNN encoders and training them not to specialize in areas of the state space but instead to match each other's representation. We find that learned representations transfer well across many Atari tasks, resulting in better transfer learning performance and data efficiency than training from scratch.
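
Training several encoders to match each other's representation implies some agreement penalty between their outputs. The abstract does not state which one is used; the sketch below assumes a squared Euclidean distance averaged over all encoder pairs, with plain lists standing in for the CNN feature vectors of one observation. `matching_loss` is a hypothetical name.

```python
from itertools import combinations


def matching_loss(representations):
    """Squared Euclidean distance averaged over all pairs of encoder outputs."""
    pairs = list(combinations(representations, 2))
    total = sum(sum((a - b) ** 2 for a, b in zip(u, v)) for u, v in pairs)
    return total / len(pairs)


# Hypothetical outputs of three encoders for the same observation.
identical = matching_loss([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])
different = matching_loss([[1.0, 2.0], [0.0, 2.0], [1.0, 0.0]])
print(identical, round(different, 3))
```

The penalty is zero when all encoders agree and grows as their outputs diverge, so adding it to the RL objective would push the encoders toward a shared representation instead of specializing.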

2024-10-10

NeurIPS.cc/2024/Workshop/FITML (poster)

openreview.net

