Kusha Sareen

The Role of Symmetry in Optimizing Overparameterized Networks

Sékou-Oumar Kaba

Overparameterization is central to the success of deep learning, yet the mechanisms by which it improves optimization remain incompletely understood. We analyze weight-space symmetries in neural networks and show that overparameterization introduces additional symmetries that benefit optimization in two distinct ways. First, we prove that these symmetries act as a form of diagonal preconditioning on the Hessian, enabling the existence of better-conditioned minima within each equivalence class of functionally identical solutions. Second, we show that overparameterization increases the probability mass of global minima near typical initializations, making these favorable solutions more reachable. Teacher-student network experiments validate our theoretical predictions: as width increases, the Hessian trace decreases, condition numbers improve, and convergence accelerates. Our analysis provides a unified framework for understanding overparameterization and width growth as a geometric transformation of the loss landscape.

2026-04-27

arXiv (preprint)

doi.org

arxiv.org

CUBE: A Standard for Unifying Agent Benchmarks

Alexandre Lacoste

Nicolas Gontier

Oleh Shliazhko

Aman Jaiswal

Kusha Sareen

Shailesh Nanisetty

Joan Cabezas

Manuel Del Verme

Omar G. Younis

Simone Baratta

Matteo Avalle

Imene Kerboua

Xing Han Lu

Elron Bandel

Michal Shmueli-Scheuer

Asaf Yehudai

Leshem Choshen

Jonathan Lebensold

Sean Hughes

Massimo Caccia

Alexandre Drouin

Siva Reddy

Tao Yu

Yu Su

Graham Neubig

Dawn Song

The proliferation of agent benchmarks has created critical fragmentation that threatens research productivity. Each new benchmark requires substantial custom integration, creating an "integration tax" that limits comprehensive evaluation. We propose CUBE (Common Unified Benchmark Environments), a universal protocol standard built on MCP and Gym that allows benchmarks to be wrapped once and used everywhere. By separating task, benchmark, package, and registry concerns into distinct API layers, CUBE enables any compliant platform to access any compliant benchmark for evaluation, RL training, or data generation without custom integration. We call on the community to contribute to the development of this standard before platform-specific implementations deepen fragmentation as benchmark production accelerates through 2026.

2026-03-15

arXiv (preprint)

doi.org

arxiv.org

Diffusion Tree Sampling: Scalable inference-time alignment of diffusion models

Adapting a pretrained diffusion model to new objectives at inference time remains an open problem in generative modeling. Existing steering methods suffer from inaccurate value estimation, especially at high noise levels, which biases guidance. Moreover, information from past runs is not reused to improve sample quality, resulting in inefficient use of compute. Inspired by the success of Monte Carlo Tree Search, we address these limitations by casting inference-time alignment as a search problem that reuses past computations. We introduce a tree-based approach that samples from the reward-aligned target density by propagating terminal rewards back through the diffusion chain and iteratively refining value estimates with each additional generation. Our proposed method, Diffusion Tree Sampling (DTS), produces asymptotically exact samples from the target distribution in the limit of infinite rollouts, and its greedy variant, Diffusion Tree Search (DTS

2025-12-02

Neural Information Processing Systems (Accept (poster))

doi.org

openreview.net

Energy Loss Functions for Physical Systems

Sékou-Oumar Kaba

Kusha Sareen

Daniel Levy

Siamak Ravanbakhsh

Effectively leveraging prior knowledge of a system's physics is crucial for applications of machine learning to scientific domains. Previous approaches mostly focused on incorporating physical insights at the architectural level. In this paper, we propose a framework to incorporate physical information directly into the loss function for prediction and generative modeling tasks on systems like molecules and spins. We derive energy loss functions assuming that each data sample is in thermal equilibrium with respect to an approximate energy landscape. By using the reverse KL divergence with a Boltzmann distribution around the data, we obtain the loss as an energy difference between the data and the model predictions. This perspective also recasts traditional objectives like MSE as energy-based, but with a physically meaningless energy. In contrast, our formulation yields physically grounded loss functions with gradients that better align with valid configurations, while being architecture-agnostic and computationally efficient. The energy loss functions also inherently respect physical symmetries. We demonstrate our approach on molecular generation and spin ground-state prediction and report significant improvements over baselines.

2025-09-17

NeurIPS.cc/2025/Conference (poster)

doi.org

openreview.net

Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers

Morgane M Moss

2025-07-06

Conference on Language Modeling (accepted)

doi.org

openreview.net

Symmetry-Aware Generative Modeling through Learned Canonicalization

Kusha Sareen

Daniel Levy

Arnab Kumar Mondal

Sékou-Oumar Kaba

Tara Akhound-Sadegh

Siamak Ravanbakhsh

Generative modeling of symmetric densities has a range of applications in AI for science, from drug discovery to physics simulations. The existing generative modeling paradigm for invariant densities combines an invariant prior with an equivariant generative process. However, we observe that this technique is not necessary and has several drawbacks resulting from the limitations of equivariant networks. Instead, we propose to model a learned slice of the density so that only one representative element per orbit is learned. To accomplish this, we learn a group-equivariant canonicalization network that maps training samples to a canonical pose and train a non-equivariant generative model over these canonicalized samples. We implement this idea in the context of diffusion models. Our preliminary experimental results on molecular modeling are promising, demonstrating improved sample quality and faster inference time.

2024-10-22

NeurIPS.cc/2024/Workshop/NeurReps (poster)

doi.org

openreview.net
