Publications

CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling

Mats Leon Richter

Oliver Sonnentag

Terrestrial carbon fluxes provide vital information about our biosphere's health and its capacity to absorb anthropogenic CO…

2025-01-22

ICLR.cc/2025/Conference (poster)

openreview.net

Celo: Training Versatile Learned Optimizers on a Compute Diet

Learned optimization has emerged as a promising alternative to hand-crafted optimizers, with the potential to discover stronger learned upda… (voir plus)te rules that enable faster, hyperparameter-free training of neural networks. A critical element for practically useful learned optimizers, that can be used off-the-shelf after meta-training, is strong meta-generalization: the ability to apply the optimizers to new tasks. Recent state-of-the-art work in learned optimizers, VeLO (Metz et al., 2022), requires a large number of highly diverse meta-training tasks along with massive computational resources, 4000 TPU months, to achieve meta-generalization. This makes further improvements to such learned optimizers impractical. In this work, we identify several key elements in learned optimizer architectures and meta-training procedures that can lead to strong meta-generalization. We also propose evaluation metrics to reliably assess quantitative performance of an optimizer at scale on a set of evaluation tasks. Our proposed approach, Celo, makes a significant leap in improving the meta-generalization performance of learned optimizers and also outperforms tuned state-of-the-art optimizers on a diverse set of out-of-distribution tasks, despite being meta-trained for just 24 GPU hours.

2025-01-22

ArXiv (prépublication)

arxiv.org

Contractive Dynamical Imitation Policies for Efficient Out-of-Sample Recovery

Amin Abyaneh

Mahrokh Ghoddousi Boroujeni

Hsiu-Chin Lin

Giancarlo Ferrari-Trecate

Imitation learning is a data-driven approach to learning policies from expert behavior, but it is prone to unreliable outcomes in out-of-sam… (voir plus)ple (OOS) regions. While previous research relying on stable dynamical systems guarantees convergence to a desired state, it often overlooks transient behavior. We propose a framework for learning policies using modeled by contractive dynamical systems, ensuring that all policy rollouts converge regardless of perturbations, and in turn, enable efficient OOS recovery. By leveraging recurrent equilibrium networks and coupling layers, the policy structure guarantees contractivity for any parameter choice, which facilitates unconstrained optimization. Furthermore, we provide theoretical upper bounds for worst-case and expected loss terms, rigorously establishing the reliability of our method in deployment. Empirically, we demonstrate substantial OOS performance improvements in robotics manipulation and navigation tasks in simulation.

2025-01-22

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

Credit-based self organizing maps: training deep topographic networks with minimal performance degradation.

Amirozhan Dehghani

Xinyu Qian

Asa Borzabadi Farahani

Pouya Bashivan

2025-01-22

ICLR.cc/2025/Conference (poster)

openreview.net

Credit-based self organizing maps: training deep topographic networks with minimal performance degradation

Amirozhan Dehghani

Xinyu Qian

Asa Farahani

Pouya Bashivan

In the primate neocortex, neurons with similar function are often found to be spatially close. Kohonen's self-organizing map (SOM) has been … (voir plus)one of the most influential approaches for simulating brain-like topographical organization in artificial neural network models. However, integrating these maps into deep neural networks with multitude of layers has been challenging, with self-organized deep neural networks suffering from substantially diminished capacity to perform visual recognition. We identified a key factor leading to the performance degradation in self-organized topographical neural network models: the discord between predominantly bottom-up learning updates in the self-organizing maps, and those derived from top-down, credit-based learning approaches. To address this, we propose an alternative self organization algorithm, tailored to align with the top-down learning processes in deep neural networks. This model not only emulates critical aspects of cortical topography but also significantly narrows the performance gap between non-topographical and topographical models. This advancement underscores the substantial importance of top-down assigned credits in shaping topographical organization. Our findings are a step in reconciling topographical modeling with the functional efficacy of neural network models, paving the way for more brain-like neural architectures.

2025-01-22

ICLR.cc/2025/Conference (poster)

openreview.net

Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL

Ghada Sokar

Johan Samir Obando Ceron

Aaron Courville

Hugo Larochelle

Pablo Samuel Castro

The use of deep neural networks in reinforcement learning (RL) often suffers from performance degradation as model size increases. While sof… (voir plus)t mixtures of experts (SoftMoEs) have recently shown promise in mitigating this issue for online RL, the reasons behind their effectiveness remain largely unknown. In this work we provide an in-depth analysis identifying the key factors driving this performance gain. We discover the surprising result that tokenizing the encoder output, rather than the use of multiple experts, is what is behind the efficacy of SoftMoEs. Indeed, we demonstrate that even with an appropriately scaled single expert, we are able to maintain the performance gains, largely thanks to tokenization.

2025-01-22

ICLR.cc/2025/Conference (spotlight)

doi.org

openreview.net

Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces

DiJia Su

Sainbayar Sukhbaatar

Michael Rabbat

Yuandong Tian

Qinqing Zheng

2025-01-22

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets

Zhen Liu

Tim Z. Xiao

Weiyang Liu

Yoshua Bengio

Dinghuai Zhang

While one commonly trains large diffusion models by collecting datasets on target downstream tasks, it is often desired to align and finetun… (voir plus)e pretrained diffusion models on some reward functions that are either designed by experts or learned from small-scale datasets. Existing methods for finetuning diffusion models typically suffer from lack of diversity in generated samples, lack of prior preservation, and/or slow convergence in finetuning. Inspired by recent successes in generative flow networks (GFlowNets), a class of probabilistic models that sample with the unnormalized density of a reward function, we propose a novel GFlowNet method dubbed Nabla-GFlowNet (abbreviated as

2025-01-22

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference

Matthew D Riemer

Gopeshh Subbaraj

Glen Berseth

Irina Rish

Realtime environments change even as agents perform action inference and learning, thus requiring high interaction frequencies to effectivel… (voir plus)y minimize regret. However, recent advances in machine learning involve larger neural networks with longer inference times, raising questions about their applicability in realtime systems where reaction time is crucial. We present an analysis of lower bounds on regret in realtime reinforcement learning (RL) environments to show that minimizing long-term regret is generally impossible within the typical sequential interaction and learning paradigm, but often becomes possible when sufficient asynchronous compute is available. We propose novel algorithms for staggering asynchronous inference processes to ensure that actions are taken at consistent time intervals, and demonstrate that use of models with high action inference times is only constrained by the environment's effective stochasticity over the inference horizon, and not by action frequency. Our analysis shows that the number of inference processes needed scales linearly with increasing inference times while enabling use of models that are multiple orders of magnitude larger than existing approaches when learning from a realtime simulation of Game Boy games such as Pokémon and Tetris.

2025-01-22

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

Expressivity of Neural Networks with Random Weights and Learned Biases

Avery Hee-Woon Ryoo

Luca Mazzucato

2025-01-22

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

Fair Resource Allocation in Weakly Coupled Markov Decision Processes

Xiaohui Tu

Yossiri Adulyasak

Nima Akbarzadeh

Erick Delage

We consider fair resource allocation in sequential decision-making environments modeled as weakly coupled Markov decision processes, where r… (voir plus)esource constraints couple the action spaces of

2025-01-22

aistats.org/AISTATS/2025/Conference (poster)

openreview.net

Fast Convergence of Softmax Policy Mirror Ascent

Reza Asad

Reza Babanezhad Harikandeh

Issam Hadj Laradji

Nicolas Le Roux

Sharan Vaswani

Natural policy gradient (NPG) is a common policy optimization algorithm and can be viewed as mirror ascent in the space of probabilities. Re… (voir plus)cently, Vaswani et al. (2021) introduced a policy gradient method that corresponds to mirror ascent in the dual space of logits. We refine this algorithm, removing its need for a normalization across actions and analyze the resulting method (referred to as SPMA). For tabular MDPs, we prove that SPMA with a constant step-size matches the linear convergence of NPG and achieves a faster convergence than constant step-size (accelerated) softmax policy gradient. To handle large state-action spaces, we extend SPMA to use a log-linear policy parameterization. Unlike that for NPG, generalizing SPMA to the linear function approximation (FA) setting does not require compatible function approximation. Unlike MDPO, a practical generalization of NPG, SPMA with linear FA only requires solving convex softmax classification problems. We prove that SPMA achieves linear convergence to the neighbourhood of the optimal value function. We extend SPMA to handle non-linear FA and evaluate its empirical performance on the MuJoCo and Atari benchmarks. Our results demonstrate that SPMA consistently achieves similar or better performance compared to MDPO, PPO and TRPO.

2025-01-22

aistats.org/AISTATS/2025/Conference (poster)

proceedings.mlr.press

openreview.net

Avantage IA

Mettre à profit l'IA pour un avenir durable

Bourse Mila en politiques de l'IA

Avantage IA

Mettre à profit l'IA pour un avenir durable

Publications

Avantage IA

Mettre à profit l'IA pour un avenir durable

Bourse Mila en politiques de l'IA

Avantage IA

Mettre à profit l'IA pour un avenir durable

Mots-clés populaires:

Publications