Younesse Kaddar

Alumni Collaborator - UdeM

Principal Supervisor

Yoshua Bengio

Research Topics

Learning to Program

Deep Learning

Generative Models

Probabilistic Models

Reasoning

Website

GitHub

Publications

Can a Bayesian Oracle Prevent Harm from an Agent?

Yoshua Bengio

Michael K. Cohen

Nikolay Malkin

Is there a way to design powerful AI systems based on machine learning methods that would satisfy probabilistic safety guarantees? With the long-term goal of obtaining a probabilistic guarantee that would apply in every context, we consider estimating a context-dependent bound on the probability of violating a given safety specification. Such a risk evaluation would need to be performed at run-time to provide a guardrail against dangerous actions of an AI. Noting that different plausible hypotheses about the world could produce very different outcomes, and because we do not know which one is right, we derive bounds on the safety violation probability predicted under the true but unknown hypothesis. Such bounds could be used to reject potentially dangerous actions. Our main results involve searching for cautious but plausible hypotheses, obtained by a maximization that involves Bayesian posteriors over hypotheses. We consider two forms of this result, in the iid case and in the non-iid case, and conclude with open problems towards turning such theoretical results into practical AI guardrails.

2025-05-06

Conference on Uncertainty in Artificial Intelligence (poster)

doi.org

proceedings.mlr.press

Amortizing Intractable Inference in Large Language Models

Edward J. Hu

Nikolay Malkin

Autoregressive large language models (LLMs) compress knowledge from their training data through next-token conditional distributions. This limits tractable querying of this knowledge to start-to-end autoregressive sampling. However, many tasks of interest, including sequence continuation, infilling, and other forms of constrained generation, involve sampling from intractable posterior distributions. We address this limitation by using amortized Bayesian inference to sample from these intractable posteriors. Such amortization is algorithmically achieved by fine-tuning LLMs via diversity-seeking reinforcement learning algorithms: generative flow networks (GFlowNets). We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training and reward-maximizing policy optimization. As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem and demonstrate that our approach enables data-efficient adaptation of LLMs to tasks that require multi-step rationalization and tool use.

2024-01-15

ICLR.cc/2024/Conference (oral presentation)

doi.org

openreview.net
