Publications

Societal Alignment Frameworks Can Improve LLM Alignment

Karolina Stanczak

Nicholas Meade

Mehar Bhatia

Hattie Zhou

Konstantin Böttinger

Jeremy Barnes

Jason Stanley

Jessica Montgomery

Richard Zemel

Nicolas Papernot

Nicolas Chapados

Denis Therien

Timothy P. Lillicrap

Ana Marasovic

Sylvie Delacroix

Gillian K. Hadfield

Siva Reddy

Recent progress in large language models (LLMs) has focused on producing responses that meet human expectations and align with shared values, a process coined alignment. However, aligning LLMs remains challenging due to the inherent disconnect between the complexity of human values and the narrow nature of the technological approaches designed to address them. Current alignment methods often lead to misspecified objectives, reflecting the broader issue of incomplete contracts: the impracticality of specifying a contract between a model developer and the model that accounts for every scenario in LLM alignment. In this paper, we argue that improving LLM alignment requires incorporating insights from societal alignment frameworks, including social, economic, and contractual alignment, and discuss potential solutions drawn from these domains. Given the role of uncertainty within societal alignment frameworks, we then investigate how it manifests in LLM alignment. We end our discussion by offering an alternative view on LLM alignment, framing the underspecified nature of its objectives as an opportunity rather than seeking to perfect their specification. Beyond technical improvements in LLM alignment, we discuss the need for participatory alignment interface designs.

2025-03-05

ICLR.cc/2025/Workshop/Bi-Align (poster)

doi.org

openreview.net

On the Role of Prompt Multiplicity in LLM Hallucination Evaluation

Prakhar Ganesh

Reza Shokri

Golnoosh Farnadi

Large language models (LLMs) are known to "hallucinate" by generating false or misleading outputs. Existing hallucination benchmarks often overlook prompt sensitivity, due to stable accuracy scores despite prompt variations. However, such stability can be misleading. In this work, we introduce prompt multiplicity--the multiplicity of individual hallucinations depending on the input prompt--and study its role in LLM hallucination benchmarks. We find severe multiplicity, with even more than 50% of responses changing between correct and incorrect answers simply based on the prompt for certain benchmarks, like Med-HALT. Prompt multiplicity also gives us the lens to distinguish between randomness in generation and consistent factual inaccuracies, providing a more nuanced understanding of LLM hallucinations and their real-world harms. By situating our discussion within existing hallucination taxonomies--supporting their quantification--and exploring its relationship with uncertainty in generation, we highlight how prompt multiplicity fills a critical gap in the literature on LLM hallucinations.

2025-03-05

ICLR.cc/2025/Workshop/BuildingTrust (accepted)

openreview.net
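
The abstract above notes that accuracy can stay stable across prompt variants while individual answers still flip. The sketch below illustrates that idea on toy data. It assumes a simple working definition (the fraction of questions whose correctness differs across prompt variants), which may not match the paper's exact metric; the function names and the `results` data are hypothetical.

```python
from typing import Dict, List


def prompt_multiplicity(correct: Dict[str, List[bool]]) -> float:
    """Fraction of questions whose correctness changes across prompt variants.

    `correct[q][j]` is True if the model answered question q correctly
    under prompt variant j. This definition is an assumption made for
    illustration, not the paper's formal one.
    """
    if not correct:
        raise ValueError("need at least one question")
    flipped = sum(1 for outcomes in correct.values() if len(set(outcomes)) > 1)
    return flipped / len(correct)


def accuracy_per_prompt(correct: Dict[str, List[bool]]) -> List[float]:
    """Benchmark accuracy computed separately for each prompt variant."""
    n_prompts = len(next(iter(correct.values())))
    return [
        sum(outcomes[j] for outcomes in correct.values()) / len(correct)
        for j in range(n_prompts)
    ]


# Toy data (hypothetical): four questions, two prompt variants.
results = {
    "q1": [True, False],
    "q2": [False, True],
    "q3": [True, True],
    "q4": [False, False],
}

print(accuracy_per_prompt(results))   # accuracy is identical for both prompts
print(prompt_multiplicity(results))   # yet half of the questions flip
```

In this toy case both prompt variants score the same accuracy, so an aggregate comparison would report no prompt sensitivity, even though two of the four answers change.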

Training Plug n' Play Knowledge Modules with Deep Context Distillation

Lucas Caccia

Alan Ansell

Ivan Vulić

Edoardo Ponti

Alessandro Sordoni

Dynamically integrating new or rapidly evolving information after Language Model (LM) pre-training remains challenging, particularly in low-data scenarios or when dealing with private and specialized documents. In-context learning and retrieval-augmented generation (RAG) face limitations, including their high inference costs and their inability to capture global document information. In this paper, we propose a way of modularizing knowledge by training Knowledge Modules (KMs). KMs are lightweight components implemented as parameter-efficient LoRA modules, which are trained to store information about new documents and can be easily plugged into models on demand. We show that next-token prediction performs poorly in training KMs. We instead propose Deep Context Distillation: we learn KM parameters so as to simulate the hidden states and logits of a teacher that takes the document in context. Our method outperforms standard next-token prediction and pre-instruction training techniques across two datasets. Finally, we highlight synergies between KMs and retrieval-augmented generation.

2025-03-05

ICLR.cc/2025/Workshop/MCDC (accepted)

openreview.net

Training Plug-n-Play Knowledge Modules with Deep Context Distillation

Lucas Caccia

Alan Ansell

Edoardo Ponti

Ivan Vulić

Alessandro Sordoni

2025-03-05

ICLR.cc/2025/Workshop/MCDC (accepted)

doi.org

openreview.net

TypyBench: Evaluating LLM Type Inference for Untyped Python Repositories

Yuhe Jiang

Xun Deng

Jiacheng Yang

Honghua Dong

Gennady Pekhimenko

Fan Long

Xujie Si

Type inference for dynamic languages like Python is a persistent challenge in software engineering. While large language models (LLMs) have shown promise in code understanding, their type inference capabilities remain underexplored. We introduce `TypyBench`, a benchmark designed to evaluate LLMs' type inference across entire Python repositories. `TypyBench` features two novel metrics: `TypeSim`, which captures nuanced semantic relationships between predicted and ground truth types, and `TypeCheck`, which assesses type consistency across codebases. Our evaluation of various LLMs on a curated dataset of 50 high-quality Python repositories reveals that, although LLMs achieve decent `TypeSim` scores, they struggle with complex nested types and exhibit significant type consistency errors. These findings suggest that future research should shift focus from improving type similarity to addressing repository-level consistency. `TypyBench` provides a foundation for this new direction, offering insights into model performance across different type complexities and usage contexts.

2025-03-05

ICLR.cc/2025/Workshop/DL4C (published)

openreview.net

Understanding (Un)Reliability of Steering Vectors in Language Models

Joschka Braun

Carsten Eickhoff

David Scott Krueger

Seyed Ali Bahrainian

Dmitrii Krasheninnikov

Steering vectors are a lightweight method to control language model behavior by adding a learned bias to the activations at inference time. Although steering demonstrates promising performance, recent work shows that it can be unreliable or even counterproductive in some cases. This paper studies the influence of prompt types and the geometry of activation differences on steering reliability. First, we find that all seven prompt types used in our experiments produce a net positive steering effect, but exhibit high variance across samples, and often give an effect opposite of the desired one. No prompt type clearly outperforms the others, and yet the steering vectors resulting from the different prompt types often differ directionally (as measured by cosine similarity). Second, we show that higher cosine similarity between training set activation differences predicts more effective steering. Finally, we observe that datasets where positive and negative activations are better separated are more steerable. Our results suggest that vector steering is unreliable when the target behavior is not represented by a coherent direction.

2025-03-05

ICLR.cc/2025/Workshop/Bi-Align (poster)

doi.org

openreview.net
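
The steering-vector abstract above refers to adding a bias to activations and to the cosine similarity between training-set activation differences. The following is a minimal sketch of those two quantities on plain Python lists. It assumes the steering vector is the mean of paired positive-minus-negative activation differences, which is one common construction and not necessarily the one used in the paper; all function names and the toy activations are hypothetical.

```python
import math
from itertools import combinations
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def activation_differences(pos: List[Vector], neg: List[Vector]) -> List[Vector]:
    """Per-example difference between paired positive and negative activations."""
    return [[p - n for p, n in zip(pv, nv)] for pv, nv in zip(pos, neg)]


def steering_vector(diffs: List[Vector]) -> Vector:
    """Mean activation difference (an assumed construction, see lead-in)."""
    return [sum(column) / len(diffs) for column in zip(*diffs)]


def mean_pairwise_cosine(diffs: List[Vector]) -> float:
    """Average cosine similarity over all pairs of activation differences."""
    pairs = list(combinations(diffs, 2))
    if not pairs:
        raise ValueError("need at least two activation differences")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def steer(activation: Vector, vector: Vector, scale: float = 1.0) -> Vector:
    """Add the scaled steering vector to an activation at inference time."""
    return [a + scale * s for a, s in zip(activation, vector)]


# Toy activations (hypothetical): two paired examples in a 2-d space.
pos = [[1.0, 0.0], [2.0, 1.0]]
neg = [[0.0, 0.0], [0.0, 1.0]]

diffs = activation_differences(pos, neg)
vec = steering_vector(diffs)
print(vec, mean_pairwise_cosine(diffs), steer([0.5, 0.5], vec))
```

A mean pairwise cosine near 1 means the individual differences point in a coherent direction, which is the situation the abstract associates with more effective steering; values near 0 suggest the averaged vector represents no single direction well.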

Unlearning Geo-Cultural Stereotypes in Multilingual LLMs

Alireza Dehghanpour Farashah

Aditi Khandelwal

Negar Rostamzadeh

Golnoosh Farnadi

As multilingual generative models become more widely used, most safety and fairness evaluation techniques still focus on English-language resources, while overlooking important cross-cultural factors. This limitation raises concerns about fairness and safety, particularly regarding geoculturally situated stereotypes that hinder the models’ global inclusivity. In this work, we present preliminary findings on the impact of stereotype unlearning across languages, specifically in English, French, and Hindi. Using an adapted version of the SeeGULL dataset, we analyze how unlearning stereotypes in one language influences other languages within multilingual large language models. Our study evaluates two model families, Llama-3.1-8B and Aya-Expanse-8B, to assess whether unlearning in one linguistic context transfers across languages, potentially mitigating or exacerbating biases in multilingual settings.

2025-03-05

ICLR.cc/2025/Workshop/BuildingTrust (accepted)

openreview.net

WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation

Rabiul Awal

Mahsa Massoud

Zichao Li

Aarash Feizi

Suyuchen Wang

Chris Pal

Aishwarya Agrawal

David Vazquez

Siva Reddy

Juan A. Rodriguez

Perouz Taslakian

Spandana Gella

Sai Rajeswar

Understanding diverse web data and automating web development present an exciting challenge for agentic AI. While existing benchmarks address isolated web-based tasks, such as website-based Visual Question Answering (VQA) and UI-to-code generation, they lack a unified evaluation suite for assessing web agents that interact with and reason about web environments. We introduce WebMMU, a large-scale benchmark for evaluating AI-driven web agents across multilingual website VQA, HTML/CSS/JavaScript code editing, and sketch-to-code generation. WebMMU provides a comprehensive evaluation suite with real-world website data, multi-step reasoning tasks, and functional UI understanding. Benchmarking state-of-the-art multimodal models on WebMMU reveals significant limitations in web-based reasoning, layout understanding, and structured code generation, particularly in preserving UI hierarchy, handling multilingual content, and producing robust, functional code. While most existing models are optimized for English-only settings, WebMMU highlights the challenges of cross-lingual adaptation in real-world web development. These findings expose critical gaps in current models’ ability to understand website structures, execute user instructions, and generate high-quality web code, underscoring the need for more advanced multimodal reasoning in AI-driven web understanding and development.

2025-03-05

ICLR.cc/2025/Workshop/DL4C (published)

openreview.net

Automated diagnosis of usual interstitial pneumonia on chest CT via the mean curvature of isophotes

Peter Savadjiev

Morteza Rezanejad

Sahir Bhatnagar

David Camirand

Claude Kauffmann

Kaleem Siddiqi

Ronald J Dandurand

Patrick Bourgouin

Carl Chartrand-Lefebvre

Alexandre Semionov

2025-03-04

medRxiv (preprint)

doi.org

AI Automatons: AI Systems Intended to Imitate Humans

Alexandra Olteanu

Solon Barocas

Su Lin Blodgett

Lisa Egede

Alicia DeVrio

Myra Cheng

There is a growing proliferation of AI systems designed to mimic people's behavior, work, abilities, likenesses, or humanness -- systems we dub AI automatons. Individuals, groups, or generic humans are being simulated to produce creative work in their styles, to respond to surveys in their places, to probe how they would use a new system before deployment, to provide users with assistance and companionship, and to anticipate their possible future behavior and interactions with others, just to name a few applications. The research, design, deployment, and availability of such AI systems have, however, also prompted growing concerns about a wide range of possible legal, ethical, and other social impacts. To both 1) facilitate productive discussions about whether, when, and how to design and deploy such systems, and 2) chart the current landscape of existing and prospective AI automatons, we need to tease apart determinant design axes and considerations that can aid our understanding of whether and how various design choices along these axes could mitigate -- or instead exacerbate -- potential adverse impacts that the development and use of AI automatons could give rise to. In this paper, through a synthesis of related literature and extensive examples of existing AI systems intended to mimic humans, we develop a conceptual framework to help foreground key axes of design variations and provide analytical scaffolding to foster greater recognition of the design choices available to developers, as well as the possible ethical implications these choices might have.

2025-03-04

ArXiv (preprint)

arxiv.org

Beyond Cosine Decay: On the effectiveness of Infinite Learning Rate Schedule for Continual Pre-training

Vaibhav Singh

Paul Janson

Paria Mehrbod

Adam Ibrahim

Irina Rish

Eugene Belilovsky

Benjamin Thérien

The ever-growing availability of unlabeled data presents both opportunities and challenges for training artificial intelligence systems. While self-supervised learning (SSL) has emerged as a powerful paradigm for extracting meaningful representations from vast amounts of unlabeled data, existing methods still struggle to adapt to the non-stationary, non-IID nature of real-world data streams without forgetting previously learned knowledge. Recent works have adopted a repeated cosine annealing schedule for large-scale continual pre-training; however, these schedules (1) inherently cause forgetting during the re-warming phase and (2) have not been systematically compared to existing continual SSL methods. In this work, we systematically compare the widely used cosine schedule with the recently proposed infinite learning rate schedule and empirically find the latter to be a more effective alternative. Our extensive empirical evaluation across diverse image and language datasets demonstrates that the infinite learning rate schedule consistently enhances continual pre-training performance compared to a repeated cosine decay without being restricted to a fixed iteration budget. For instance, in a small-scale MAE pre-training setup, it outperforms several strong baselines from the literature. We then scale up our experiments to larger MAE pre-training and autoregressive language model pre-training. Our results show that the infinite learning rate schedule remains effective at scale, surpassing repeated cosine decay for both MAE pre-training and zero-shot LM benchmarks.

2025-03-04

ArXiv (preprint)

arxiv.org
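
The abstract above contrasts a repeated cosine schedule, which re-warms the learning rate on every new cycle, with an infinite learning rate schedule that is not tied to a fixed iteration budget. The sketch below is only an illustration of that contrast. The shape of `infinite_lr` (one warmup, a cosine cooldown to a constant rate, then holding that rate indefinitely) is an assumption, it omits any final annealing phase, and all function names and hyperparameter values are hypothetical rather than taken from the paper.

```python
import math


def repeated_cosine_lr(step: int, cycle_len: int, max_lr: float,
                       min_lr: float, warmup: int) -> float:
    """Cosine decay restarted every `cycle_len` steps, with linear re-warming."""
    t = step % cycle_len  # position inside the current cycle
    if t < warmup:
        return max_lr * (t + 1) / warmup
    progress = (t - warmup) / max(1, cycle_len - warmup)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))


def infinite_lr(step: int, max_lr: float, const_lr: float,
                warmup: int, cooldown: int) -> float:
    """Warm up once, cool down to `const_lr`, then hold it for any number of steps.

    The phase shapes here are assumptions chosen for illustration.
    """
    if step < warmup:
        return max_lr * (step + 1) / warmup
    if step < warmup + cooldown:
        progress = (step - warmup) / cooldown
        return const_lr + 0.5 * (max_lr - const_lr) * (1 + math.cos(math.pi * progress))
    return const_lr  # no dependence on a total iteration budget


# Hypothetical settings: compare both schedules at the start of a second cycle.
step = 1000
print(repeated_cosine_lr(step, cycle_len=1000, max_lr=1e-3, min_lr=1e-4, warmup=100))
print(infinite_lr(step, max_lr=1e-3, const_lr=1e-4, warmup=100, cooldown=900))
```

At the cycle boundary the repeated cosine schedule drops back to the start of its warmup and then climbs to the peak rate again, which is the re-warming phase the abstract links to forgetting, while the infinite-style schedule keeps a steady rate.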

Considerations and recommendations from the ISMRM diffusion study group for preclinical diffusion MRI: Part 2-Ex vivo imaging: Added value and acquisition.

Kurt G Schilling

Francesco Grussu

Andrada Ianus

Brian Hansen

Amy F. D. Howard

Rachel L. C. Barrett

Fatima Nasrallah

Manisha Aggarwal

Stijn Michielse

Warda Syeda

Nian Wang

Andrew F. Bagdasarian

Jelle Veraart

Alard Roebroeck

Cornelius Eichner

Farshid Sepehrband

Jan Zimmermann

Lucas Soustelle

Christien Bowman

Benjamin C. Tendler

Andreea Hertanu

Ben Jeurissen

Marleen Verhoye

Lucio Frydman

Yohan van de Looij

David Hike

Jeff F. Dunn

Karla Miller

Bennett Landman

Noam Shemesh

Arthur Anderson

Emilie McKinnon

Shawna Farquharson

Mathieu D. Santin

Flavio Dell’Acqua

Carlo Pierpaoli

Samuel C. Grant

Ivana Drobnjak

Andre Obenaus

Alexander Leemans

Kevin D. Harkins

Maxime Descoteaux

Duan Xu

Hao Huang

Gene S. Kim

Dan Wu

Denis Le Bihan

Stephen J. Blackband

Matthew D. Budde

Luisa Ciobanu

Els Fieremans

Ruiliang Bai

Trygve B. Leergaard

Jiangyang Zhang

Tim B. Dyrby

G. Allan Johnson

Julien Cohen-Adad

Ileana O. Jelescu

The value of preclinical diffusion MRI (dMRI) is substantial. While dMRI enables in vivo non-invasive characterization of tissue, ex vivo dMRI is increasingly being used to probe tissue microstructure and brain connectivity. Ex vivo dMRI has several experimental advantages including higher SNR and spatial resolution compared to in vivo studies, and enabling more advanced diffusion contrasts for improved microstructure and connectivity characterization. Another major advantage of ex vivo dMRI is the direct comparison with histological data, as a crucial methodological validation. However, there are a number of considerations that must be made when performing ex vivo experiments. The steps from tissue preparation, image acquisition and processing, and interpretation of results are complex, with many decisions that not only differ dramatically from in vivo imaging of small animals, but ultimately affect what questions can be answered using the data. This work represents "Part 2" of a three-part series of recommendations and considerations for preclinical dMRI. We describe best practices for dMRI of ex vivo tissue, with a focus on the value that ex vivo imaging adds to the field of dMRI and considerations in ex vivo image acquisition. We first give general considerations and foundational knowledge that must be considered when designing experiments. We briefly describe differences in specimens and models and discuss why some may be more or less appropriate for different studies. We then give guidelines for ex vivo protocols, including tissue fixation, sample preparation, and MR scanning. In each section, we attempt to provide guidelines and recommendations, but also highlight areas for which no guidelines exist (and why), and where future work should lie. An overarching goal herein is to enhance the rigor and reproducibility of ex vivo dMRI acquisitions and analyses, and thereby advance biomedical knowledge.

2025-03-04

Magnetic Resonance in Medicine (published)

doi.org

arxiv.org
