Publications

Generalization Bounds via Meta-Learned Model Representations: PAC-Bayes and Sample Compression Hypernetworks

Benjamin Leblanc

Mathieu Bazinet

Nathaniel D'Amours

Alexandre Drouin

Pascal Germain

Both PAC-Bayesian and Sample Compress learning frameworks have been shown instrumental for deriving tight (non-vacuous) generalization bound… (see more)s for neural networks. We leverage these results in a meta-learning scheme, relying on a hypernetwork that outputs the parameters of a downstream predictor from a dataset input. The originality of our approach lies in the investigated hypernetwork architectures that encode the dataset before decoding the parameters: (1) a PAC-Bayesian encoder that expresses a posterior distribution over a latent space, (2) a Sample Compress encoder that selects a small sample of the dataset input along with a message from a discrete set, and (3) a hybrid between both approaches motivated by a new Sample Compress theorem handling continuous messages. The latter theorem exploits the pivotal information transiting at the encoder-decoder junction in order to compute generalization guarantees for each downstream predictor obtained by our meta-learning scheme.

2025-05-01

ICML.cc/2025/Conference (poster)

Generative AI: Hype, Hope, and Responsible Use in Science and Everyday Life

Doina Precup

2025-05-01

Biological Psychiatry (published)

Grokking Beyond the Euclidean Norm of Model Parameters

Tikeng Notsawo Pascal Junior

Pascal Notsawo

Guillaume Dumas

Guillaume Rabusseau

Grokking refers to a delayed generalization following overfitting when optimizing artificial neural networks with gradient-based methods. I… (see more)n this work, we demonstrate that grokking can be induced by regularization, either explicit or implicit. More precisely, we show that when there exists a model with a property

2025-05-01

ICML.cc/2025/Conference (poster)

Half Search Space is All You Need

Pavel Rumiantsev

Mark Coates

2025-05-01

arXiv (published)

HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts

Neil He

Rishabh Anand

Hiren Madhu

Ali Maatouk

Smita Krishnaswamy

Leandros Tassiulas

Menglin Yang 0001

Rex Ying

2025-05-01

arXiv (published)

Impact of through‐slice gradient optimization for dynamic slice‐wise shimming in the cervico‐thoracic spinal cord

Arnaud Breheret

Alexandre D'Astous

Yixin Ma

Jason P. Stockmann

Julien Cohen-Adad

2025-05-01

Magnetic Resonance in Medicine (published)

Improving Multilingual Math Reasoning for African Languages

Odunayo Ogundepo

Akintunde Oladipo

Kelechi Ogueji

Esther Adenuga

David Ifeoluwa Adelani

Jimmy Lin

Researchers working on low-resource languages face persistent challenges due to limited data availability and restricted access to computati… (see more)onal resources. Although most large language models (LLMs) are predominantly trained in high-resource languages, adapting them to low-resource contexts, particularly African languages, requires specialized techniques. Several strategies have emerged for adapting models to low-resource languages in todays LLM landscape, defined by multi-stage pre-training and post-training paradigms. However, the most effective approaches remain uncertain. This work systematically investigates which adaptation strategies yield the best performance when extending existing LLMs to African languages. We conduct extensive experiments and ablation studies to evaluate different combinations of data types (translated versus synthetically generated), training stages (pre-training versus post-training), and other model adaptation configurations. Our experiments focuses on mathematical reasoning tasks, using the Llama 3.1 model family as our base model.

2025-05-01

arXiv (published)

Improving the Scaling Laws of Synthetic Data with Deliberate Practice

Reyhane Askari Hemmat

Mohammad Pezeshki

Elvis Dohmatob

Florian Bordes

Pietro Astolfi

Melissa Hall

Jakob Verbeek

Michal Drozdzal

Adriana Romero Soriano

2025-05-01

ICML.cc/2025/Conference (oral)

In-context learning and Occam's razor

Eric Elmoznino

Tom Marty

Tejas Kasetty

Leo Gagnon

Sarthak Mittal

Mahan Fathi

Dhanya Sridhar

Guillaume Lajoie

A central goal of machine learning is generalization. While the No Free Lunch Theorem states that we cannot obtain theoretical guarantees fo… (see more)r generalization without further assumptions, in practice we observe that simple models which explain the training data generalize best: a principle called Occam's razor. Despite the need for simple models, most current approaches in machine learning only minimize the training error, and at best indirectly promote simplicity through regularization or architecture design. Here, we draw a connection between Occam's razor and in-context learning: an emergent ability of certain sequence models like Transformers to learn at inference time from past observations in a sequence. In particular, we show that the next-token prediction loss used to train in-context learners is directly equivalent to a data compression technique called prequential coding, and that minimizing this loss amounts to jointly minimizing both the training error and the complexity of the model that was implicitly learned from context. Our theory and the empirical experiments we use to support it not only provide a normative account of in-context learning, but also elucidate the shortcomings of current in-context learning methods, suggesting ways in which they can be improved. We make our code available at https://github.com/3rdCore/PrequentialCode.

2025-05-01

ICML.cc/2025/Conference (poster)

Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists?

Anthony GX-Chen

Dongyan Lin

Mandana Samiei

Doina Precup

Blake Richards

Rob Fergus

Kenneth Marino

Language model (LM) agents are increasingly used as autonomous decision-makers who need to actively gather information to guide their decisi… (see more)ons. A crucial cognitive skill for such agents is the efficient exploration and understanding of the causal structure of the world -- key to robust, scientifically grounded reasoning. Yet, it remains unclear whether LMs possess this capability or exhibit systematic biases leading to erroneous conclusions. In this work, we examine LMs' ability to explore and infer causal relationships, using the well-established"Blicket Test"paradigm from developmental psychology. We find that LMs reliably infer the common, intuitive disjunctive causal relationships but systematically struggle with the unusual, yet equally (or sometimes even more) evidenced conjunctive ones. This"disjunctive bias"persists across model families, sizes, and prompting strategies, and performance further declines as task complexity increases. Interestingly, an analogous bias appears in human adults, suggesting that LMs may have inherited deep-seated reasoning heuristics from their training data. To this end, we quantify similarities between LMs and humans, finding that LMs exhibit adult-like inference profiles (but not children-like). Finally, we propose a test-time sampling method which explicitly samples and eliminates hypotheses about causal relationships from the LM. This scalable approach significantly reduces the disjunctive bias and moves LMs closer to the goal of scientific, causally rigorous reasoning.

2025-05-01

arXiv (published)

Language Models over Canonical Byte-Pair Encodings

Tim Vieira

Tianyu Liu

Clemente Pasti

Yahya Emara

Brian DuSell

Benjamin LeBrun

Mario Giulianelli

Juan Luis Gastaldi

Timothy O'Donnell

Ryan Cotterell

2025-05-01

ICML.cc/2025/Conference (poster)