Publications

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Jesse Farebrother

Jordi Orbay

Quan Ho Vuong

Adrien Ali Taiga

Yevgen Chebotar

Ted Xiao

A. Irpan

Sergey Levine

Pablo Samuel Castro

Aleksandra Faust

Aviral Kumar

Rishabh Agarwal

Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. We demonstrate that value functions trained with categorical cross-entropy significantly improve performance and scalability in a variety of domains. These include single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers, achieving state-of-the-art results on these domains. Through careful analysis, we show that the benefits of categorical cross-entropy primarily stem from its ability to mitigate issues inherent to value-based RL, such as noisy targets and non-stationarity. Overall, we argue that a simple shift to training value functions with categorical cross-entropy can yield substantial improvements in the scalability of deep RL at little-to-no cost.

2024-03-06

ArXiv (preprint)
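The Stop Regressing abstract above does not spell out how a scalar bootstrapped target is turned into a classification label. The following is a minimal NumPy sketch, assuming a fixed grid of bin centers and a two-hot target encoding (one common choice; Gaussian-smoothed histogram targets are another). The helper names `two_hot`, `categorical_value_loss`, and `expected_value`, and the bin range, are hypothetical and for illustration only, not the paper's code.

```python
import numpy as np

# Hypothetical helpers sketching "classification instead of regression" for a
# value head. Assumes strictly increasing bin centers and a 1-D batch of targets.

def two_hot(targets, bin_centers):
    """Spread each scalar target over its two nearest bins so the mean is preserved."""
    targets = np.clip(np.asarray(targets, dtype=float), bin_centers[0], bin_centers[-1])
    # Index of the first bin center >= target, kept in [1, n-1] so a lower neighbour exists.
    upper = np.clip(np.searchsorted(bin_centers, targets, side="left"), 1, len(bin_centers) - 1)
    lower = upper - 1
    # Linear-interpolation weight placed on the upper bin (lies in [0, 1]).
    w_upper = (targets - bin_centers[lower]) / (bin_centers[upper] - bin_centers[lower])
    probs = np.zeros((targets.shape[0], len(bin_centers)))
    rows = np.arange(targets.shape[0])
    probs[rows, lower] = 1.0 - w_upper
    probs[rows, upper] = w_upper
    return probs

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def categorical_value_loss(logits, targets, bin_centers):
    """Mean cross-entropy between predicted bin logits and two-hot encoded targets."""
    target_probs = two_hot(targets, bin_centers)
    return -(target_probs * log_softmax(logits)).sum(axis=-1).mean()

def expected_value(logits, bin_centers):
    """Recover a scalar value estimate as the expectation over bin centers."""
    return np.exp(log_softmax(logits)) @ bin_centers

# Usage: 51 bins on [-10, 10]; a real agent would compute targets as r + gamma * V(s').
bins = np.linspace(-10.0, 10.0, 51)
targets = np.array([0.3, -2.0, 25.0])   # 25.0 is clipped to the top bin
logits = np.zeros((3, 51))              # untrained head: uniform prediction
print(categorical_value_loss(logits, targets, bins))  # approximately log(51) for uniform logits
```

One design consequence of this formulation: the gradient of the cross-entropy with respect to the logits is `softmax(logits) - target_probs`, which is bounded in magnitude no matter how large the target is, unlike the squared-error gradient. Values outside the chosen bin range are clipped, so the range is a hyperparameter in this sketch.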

Efficient Causal Graph Discovery Using Large Language Models

Thomas Jiralerspong

Xiaoyin Chen

Yash More

Vedant Shah

Yoshua Bengio

2024-03-05

ICLR.cc/2024/Workshop/AGI (poster)

Optimisation of quantitative brain diffusion-relaxation MRI acquisition protocols with physics-informed machine learning

Álvaro Planchuelo-Gómez

Maxime Descoteaux

Hugo Larochelle

Jana Hutter

Derek K. Jones

C. Tax

2024-03-05

Medical Image Analysis (published)

Plant invasion in Mediterranean Europe: current hotspots and future scenarios

Luigi Cao Pinna

Laure Gallien

Laura J. Pollock

Irena Axmanová

Milan Chytrý

Marco Malavasi

Alicia T. R. Acosta

Juan Antonio Campos

Marta Carboni

The Mediterranean Basin has historically been subject to alien plant invasions that threaten its unique biodiversity. This seasonally dry and densely populated region is undergoing severe climatic and socioeconomic changes, and it is unclear whether these changes will worsen or mitigate plant invasions. Predictions are often biased, as species may not be in equilibrium in the invaded environment, depending on their invasion stage and ecological characteristics. To address uncertainty in future predictions, we identified invasion hotspots across multiple biased modelling scenarios, as well as the ecological characteristics of successful invaders. We selected 92 alien plant species widespread in Mediterranean Europe and compiled data on their distribution in the Mediterranean and worldwide. We combined these data with environmental and propagule pressure variables to model global and regional species niches, and to map their current and future habitat suitability. We identified invasion hotspots, examined their potential future shifts, and compared the results of different modelling strategies. Finally, we generalised our findings by using linear models to determine the traits and biogeographic features of invaders most likely to benefit from global change. Currently, invasion hotspots are found near ports and coastlines throughout Mediterranean Europe. However, many species occupy only a small portion of the environmental conditions to which they are preadapted, suggesting that their invasion is still an ongoing process. Future conditions will lead to declines in many currently widespread aliens, which will tend to move to higher elevations and latitudes. Our trait models indicate that future climates will generally favour species with conservative ecological strategies that can cope with reduced water availability, such as those with short stature and low specific leaf area. Taken together, our results suggest that in future environments, these conservative aliens will move farther from the introduction areas and upslope, threatening mountain ecosystems that have been spared from invasions so far.

2024-03-05

Ecography (published)

The Case for Globalizing Fairness: A Mixed Methods Study on Colonialism, AI, and Health in Africa

Mercy Nyamewaa Asiedu

Awa Dieng

Alexander Haykel

Negar Rostamzadeh

Stephen R. Pfohl

Chirag Nagpal

Maria Nagawa

Abigail Oppong

Sanmi Koyejo

Katherine Heller

With the growing application of machine learning (ML) technologies in healthcare, there have been calls for developing techniques to understand and mitigate biases these systems may exhibit. Fairness considerations in the development of ML-based solutions for health have particular implications for Africa, which already faces inequitable power imbalances between the Global North and South. This paper seeks to explore fairness for global health, with Africa as a case study. We conduct a scoping review to propose axes of disparities for fairness consideration in the African context and delineate where they may come into play in different ML-enabled medical modalities. We then conduct qualitative research studies with 672 general population study participants and 28 experts in ML, health, and policy focused on Africa to obtain corroborative evidence on the proposed axes of disparities. Our analysis focuses on colonialism as the attribute of interest and examines the interplay between artificial intelligence (AI), health, and colonialism. Among the pre-identified attributes, we found that colonial history, country of origin, and national income level were specific axes of disparities that participants believed would cause an AI system to be biased. However, there was also divergence of opinion between experts and general population participants. Whereas experts generally expressed a shared view about the relevance of colonial history for the development and implementation of AI technologies in Africa, the majority of the general population participants surveyed did not think there was a direct link between AI and colonialism. Based on these findings, we provide practical recommendations for developing fairness-aware ML solutions for health in Africa.

2024-03-05

ArXiv (preprint)

The World Health Organization as an engine of ideational robustness

Jean-Louis Denis

Gaelle Foucault

Pierre Larouche

Catherine Régis

Miriam Cohen

Marie-Andree Girard

2024-03-05

Policy and Society (published)

Enhancing and Evaluating Logical Reasoning Abilities of Large Language Models

Shujie Deng

Honghua Dong

Xujie Si

2024-03-04

ICLR.cc/2024/Workshop/SeT_LLM (published)

A Generative Model of Symmetry Transformations

James U. Allingham

Bruno Mlodozeniec

Shreyas Padhy

Javier Antorán

David Scott Krueger

Richard E. Turner

Eric T. Nalisnick

José Miguel Hernández-Lobato

Correctly capturing the symmetry transformations of data can lead to efficient models with strong generalization capabilities, though methods incorporating symmetries often require prior knowledge. While recent advancements have been made in learning those symmetries directly from the dataset, most of this work has focused on the discriminative setting. In this paper, we construct a generative model that explicitly aims to capture symmetries in the data, resulting in a model that learns which symmetries are present in an interpretable way. We provide a simple algorithm for efficiently learning our generative model and demonstrate its ability to capture symmetries under affine and color transformations. Combining our symmetry model with existing generative models results in higher marginal test log-likelihoods and robustness to data sparsification.

2024-03-04

ArXiv (preprint)

MagicClay: Sculpting Meshes With Generative Neural Fields

Amir Barda

Vladimir Kim

Noam Aigerman

Amit H. Bermano

Thibault Groueix

Recent developments in neural fields have brought phenomenal capabilities to the field of shape generation, but they lack crucial properties, such as incremental control, a fundamental requirement for artistic work. Triangular meshes, on the other hand, are the representation of choice for most geometry-related tasks, offering efficiency and intuitive control, but do not lend themselves to neural optimization. To support downstream tasks, previous art typically proposes a two-step approach, where first a shape is generated using neural fields, and then a mesh is extracted for further processing. Instead, in this paper we introduce a hybrid approach that consistently maintains both a mesh and a Signed Distance Field (SDF) representation. Using this representation, we introduce MagicClay, an artist-friendly tool for sculpting regions of a mesh according to textual prompts while keeping other regions untouched. Our framework carefully and efficiently balances consistency between the representations and regularizations in every step of the shape optimization. Relying on the mesh representation, we show how to render the SDF at higher resolutions and faster. In addition, we employ recent work in differentiable mesh reconstruction to adaptively allocate triangles in the mesh where required, as indicated by the SDF. Using an implemented prototype, we demonstrate superior generated geometry compared to the state of the art, and novel consistent control, allowing sequential prompt-based edits to the same mesh for the first time.

2024-03-04

ArXiv (preprint)

Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok

Tikeng Notsawo Pascal Junior

Pascal Notsawo

Hattie Zhou

Mohammad Pezeshki

Irina Rish

Guillaume Dumas

2024-03-04

ICLR.cc/2024/Workshop/ME-FoMo (poster)

Self-evaluation and self-prompting to improve the reliability of LLMs

Alexandre Piché

Aristides Milios

Dzmitry Bahdanau

Chris Pal

For Large Language Models (LLMs) to be deployed safely, they must be capable of dynamically adapting their behavior based on their level of knowledge and uncertainty associated with specific topics. This adaptive behavior, which we refer to as self-restraint, is non-trivial to teach since it depends on the internal knowledge of an LLM. By default, LLMs are trained to maximize the next-token likelihood, which does not teach the model to modulate its answer based on its level of uncertainty. In order to learn self-restraint, we devise a simple objective that can encourage the model to produce generations that the model is confident in. To optimize this objective, we introduce ReSearch, an iterative search algorithm based on self-evaluation and self-prompting. Our method results in fewer hallucinations overall, both for known and unknown topics, as the model learns to selectively restrain itself. In addition, our method elegantly incorporates the ability to decline when the model assesses that it cannot provide a response without a high proportion of hallucination.

2024-03-04

ICLR.cc/2024/Workshop/SeT_LLM (published)