Publications

ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs

Pooneh Mousavi

Yingzhi Wang

Mirco Ravanelli

Cem Subakan

2025-05-01

arXiv (published)

Aligning Protein Conformation Ensemble Generation with Physical Feedback

Jiarui Lu

Xiaoyin Chen

Stephen Zhewen Lu

Aurelie Lozano

Vijil Chenthamarakshan

Payel Das

Jian Tang

Protein dynamics play a crucial role in the biological functions and properties of proteins, and their traditional study typically relies on time-consuming molecular dynamics (MD) simulations conducted in silico. Recent advances in generative modeling, particularly denoising diffusion models, have enabled efficient and accurate protein structure prediction and conformation sampling by learning distributions over crystallographic structures. However, effectively integrating physical supervision into these data-driven approaches remains challenging, as standard energy-based objectives often lead to intractable optimization. In this paper, we introduce Energy-based Alignment (EBA), a method that aligns generative models with feedback from physical models, efficiently calibrating them to appropriately balance conformational states based on their energy differences. Experimental results on the MD ensemble benchmark demonstrate that EBA achieves state-of-the-art performance in generating high-quality protein ensembles. By improving the physical plausibility of generated structures, our approach enhances model predictions and holds promise for applications in structural biology and drug discovery.

2025-05-01

ICML.cc/2025/Conference (poster)

BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Behavioural Change

Manuela González-González

Soufiane Belharbi

Muhammad Osama Zeeshan

Masoumeh Sharafi

Muhammad Haseeb Aslam

Marco Pedersoli

Alessandro Lameiras Koerich

Simon Bacon

Eric Granger

Recognizing complex emotions linked to ambivalence and hesitancy (A/H) can play a critical role in the personalization and effectiveness of digital behaviour change interventions. These subtle and conflicting emotions are manifested by a discord between multiple modalities, such as facial and vocal expressions, and body language. Although experts can be trained to identify A/H, integrating them into digital interventions is costly and less effective. Automatic learning systems provide a cost-effective alternative that can adapt to individual users and operate seamlessly within real-time and resource-limited environments. However, there are currently no datasets available for the design of ML models to recognize A/H. This paper introduces the first Behavioural Ambivalence/Hesitancy (BAH) dataset, collected for subject-based multimodal recognition of A/H in videos. It contains videos from 224 participants of different ages and ethnicities, captured across 9 provinces in Canada. Through our web platform, we recruited participants to answer 7 questions, some of which were designed to elicit A/H, while recording themselves via webcam with microphone. BAH amounts to 1,118 videos for a total duration of 8.26 hours, with 1.5 hours of A/H. Our behavioural team annotated timestamped segments to indicate where A/H occurs and provided frame- and video-level annotations with the A/H cues. Video transcripts and their timestamps are also included, along with cropped and aligned faces in each frame and a variety of participant metadata. We include baseline results for BAH at frame- and video-level recognition in multimodal setups, as well as for zero-shot prediction and for personalization using unsupervised domain adaptation. The limited performance of baseline models highlights the challenges of recognizing A/H in real-world videos. The data, code, and pretrained weights are available.

2025-05-01

arXiv (published)

Beyond Scalar Rewards: An Axiomatic Framework for Lexicographic MDPs

Mehran Shakerinava

Siamak Ravanbakhsh

Adam M. Oberman

2025-05-01

arXiv (published)

Bidirectional Information Flow (BIF) - A Sample Efficient Hierarchical Gaussian Process for Bayesian Optimization

Juan David Guerra

Thomas Garbay

Guillaume Lajoie

Marco Bonizzato

Hierarchical Gaussian Process (H-GP) models divide problems into different subtasks, allowing for different models to address each part, making them well-suited for problems with inherent hierarchical structure. However, typical H-GP models do not fully take advantage of this structure, only sending information up or down the hierarchy. This one-way coupling limits sample efficiency and slows convergence. We propose Bidirectional Information Flow (BIF), an efficient H-GP framework that establishes bidirectional information exchange between parent and child models in H-GPs for online training. BIF retains the modular structure of hierarchical models - the parent combines subtask knowledge from children GPs - while introducing top-down feedback to continually refine children models during online learning. This mutual exchange improves sample efficiency, enables robust training, and allows modular reuse of learned subtask models. BIF outperforms conventional H-GP Bayesian Optimization methods, achieving up to 85% and 5x higher

2025-05-01

arXiv (published)

Building spatial world models from sparse transitional episodic memories

Zizhan He

Maxime Daigle

Pouya Bashivan

Many animals possess a remarkable capacity to rapidly construct flexible mental models of their environments. These world models are crucial for ethologically relevant behaviors such as navigation, exploration, and planning. The ability to form episodic memories and make inferences based on these sparse experiences is believed to underpin the efficiency and adaptability of these models in the brain. Here, we ask: Can a neural network learn to construct a spatial model of its surroundings from sparse and disjoint episodic memories? We formulate the problem in a simulated world and propose a novel framework, the Episodic Spatial World Model (ESWM), as a potential answer. We show that ESWM is highly sample-efficient, requiring minimal observations to construct a robust representation of the environment. It is also inherently adaptive, allowing for rapid updates when the environment changes. In addition, we demonstrate that ESWM readily enables near-optimal strategies for exploring novel environments and navigating between arbitrary points, all without the need for additional training.

2025-05-01

arXiv (published)

Calm-Whisper: Reduce Whisper Hallucination On Non-Speech By Calming Crazy Heads Down

Yingzhi Wang

Anas Alhmoud

Saad Alsahly

Muhammad Alqurishi

Mirco Ravanelli

2025-05-01

arXiv (published)

Caption This, Reason That: VLMs Caught in the Middle

Zihan Weng

Lucas Gomez

Taylor Whittington Webb

Pouya Bashivan

2025-05-01

arXiv (published)

Compositional Risk Minimization

Divyat Mahajan

Mohammad Pezeshki

Charles Arnal

Ioannis Mitliagkas

Kartik Ahuja

Pascal Vincent

2025-05-01

ICML.cc/2025/Conference (poster)

Context is Key: A Benchmark for Forecasting with Essential Textual Information

Andrew Robert Williams

Arjun Ashok

Étienne Marcotte

Valentina Zantedeschi

Jithendaraa Subramanian

Roland Riachi

James Requeima

Alexandre Lacoste

Irina Rish

Nicolas Chapados

Alexandre Drouin

2025-05-01

ICML.cc/2025/Conference (poster)

Dimension-adapted Momentum Outscales SGD

Damien Ferbach

Katie Everett

Gauthier Gidel

Elliot Paquette

Courtney Paquette

We investigate scaling laws for stochastic momentum algorithms with small batches on the power law random features model, parameterized by data complexity, target complexity, and model size. When trained with a stochastic momentum algorithm, our analysis reveals four distinct loss curve shapes determined by varying data-target complexities. While traditional stochastic gradient descent with momentum (SGD-M) yields identical scaling law exponents to SGD, dimension-adapted Nesterov acceleration (DANA) improves these exponents by scaling momentum hyperparameters based on model size and data complexity. This outscaling phenomenon, which also improves compute-optimal scaling behavior, is achieved by DANA across a broad range of data and target complexities, while traditional methods fall short. Extensive experiments on high-dimensional synthetic quadratics validate our theoretical predictions, and large-scale text experiments with LSTMs show that DANA's improved loss exponents over SGD hold in a practical setting.

2025-05-01

arXiv (published)