We introduce a machine learning approach to determine the transition dynamics of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition probabilities. These learned transition dynamics are then leveraged to guide a single silicon atom throughout the lattice to pre-determined target destinations. We present empirical analyses that demonstrate the efficacy and generality of our approach.
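The pipeline described above — symbolic representations of the lattice fed to a network that predicts transition probabilities — can be sketched as follows. This is a minimal stand-in, not the paper's model: the feature encoding, class layout, and synthetic data are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical symbolic representation: per beam event, a small feature
# vector describing the silicon atom's neighbourhood and the beam offset.
# Labels: which neighbouring site the atom transitioned to (0-2), or
# 3 = no transition. The linear ground-truth rule is synthetic, used only
# so there is learnable structure.
def make_synthetic_data(n=2000, d=5, n_classes=4):
    X = rng.normal(size=(n, d))
    W_true = rng.normal(size=(d, n_classes))
    y = np.argmax(X @ W_true + rng.normal(scale=0.1, size=(n, n_classes)), axis=1)
    return X, y

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# A one-layer softmax classifier trained by gradient descent on
# cross-entropy -- a toy stand-in for the paper's neural network.
def train(X, y, n_classes=4, lr=0.5, steps=300):
    W = np.zeros((X.shape[1], n_classes))
    onehot = np.eye(n_classes)[y]
    for _ in range(steps):
        p = softmax(X @ W)
        W -= lr * X.T @ (p - onehot) / len(X)
    return W

X, y = make_synthetic_data()
W = train(X, y)
acc = (np.argmax(softmax(X @ W), axis=1) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

The predicted class probabilities play the role of the learned transition dynamics: given a proposed beam position, the most probable transition can be chosen to steer the atom toward a target site.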
In inverse reinforcement learning (IRL), an agent seeks to replicate expert demonstrations through interactions with the environment. Traditionally, IRL is treated as an adversarial game, where an adversary searches over reward models, and a learner optimizes the reward through repeated RL procedures. This game-solving approach is both computationally expensive and difficult to stabilize. In this work, we propose a novel approach to IRL by direct policy optimization: exploiting a linear factorization of the return as the inner product of successor features and a reward vector, we design an IRL algorithm by policy gradient descent on the gap between the learner and expert features. Our non-adversarial method does not require learning a reward function and can be solved seamlessly with existing actor-critic RL algorithms. Remarkably, our approach works in state-only settings without expert action labels, a setting which behavior cloning (BC) cannot solve. Empirical results demonstrate that our method learns from as few as a single expert demonstration and achieves improved performance on various control tasks.
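The linear factorization underlying this approach can be illustrated concretely. The sketch below (assumed feature dimensions and random trajectories, not the paper's algorithm) computes discounted successor features of trajectories, verifies the identity that the return under a linear reward r(s) = w·φ(s) equals w·ψ, and forms the feature-gap loss that the method minimizes by policy gradient.

```python
import numpy as np

GAMMA = 0.99

# Discounted feature expectation ("successor features") of a trajectory:
# psi = sum_t gamma^t * phi(s_t)
def successor_features(phis, gamma=GAMMA):
    discounts = gamma ** np.arange(len(phis))
    return (discounts[:, None] * phis).sum(axis=0)

rng = np.random.default_rng(1)
expert_phis = rng.normal(size=(50, 4))   # phi(s_t) along an expert trajectory
learner_phis = rng.normal(size=(50, 4))  # phi(s_t) along a learner trajectory

psi_e = successor_features(expert_phis)
psi_l = successor_features(learner_phis)

# Non-adversarial objective (sketch): minimize the feature gap directly,
# e.g. L = ||psi_e - psi_l||^2, by gradient descent on the policy.
loss = np.sum((psi_e - psi_l) ** 2)

# With a linear reward r(s) = w . phi(s), the trajectory return is
# exactly w . psi, so matching features bounds the return gap for any w.
w = rng.normal(size=4)
ret_e = sum((GAMMA ** t) * (w @ phi) for t, phi in enumerate(expert_phis))
assert np.isclose(ret_e, w @ psi_e)
print(f"feature-gap loss: {loss:.3f}")
```

Because the loss depends only on state features, no expert action labels are needed, which is why the state-only setting is tractable here.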
We introduce the Continuous Arcade Learning Environment (CALE), an extension of the well-known Arcade Learning Environment (ALE) [Bellemare et al., 2013]. The CALE uses the same underlying emulator of the Atari 2600 gaming system (Stella), but adds support for continuous actions. This enables the benchmarking and evaluation of continuous-control agents (such as PPO [Schulman et al., 2017] and SAC [Haarnoja et al., 2018]) and value-based agents (such as DQN [Mnih et al., 2015] and Rainbow [Hessel et al., 2018]) on the same environment suite. We provide a series of open questions and research directions that CALE enables, as well as initial baseline results using Soft Actor-Critic. CALE is available as part of the ALE at https://github.com/Farama-Foundation/Arcade-Learning-Environment.
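Since the Atari 2600 joystick is inherently discrete, supporting continuous actions requires mapping a continuous input onto joystick events. The sketch below shows one plausible polar-coordinate mapping with thresholds; the exact encoding, thresholds, and direction conventions used by the CALE live in the ALE source and may differ from this illustration.

```python
import math

# Continuous action: (radius in [0, 1], theta in [-pi, pi], fire in [0, 1]).
# Direction naming and angle convention here are illustrative assumptions.
DIRECTIONS = ["UP", "RIGHT", "DOWN", "LEFT"]

def to_joystick(radius, theta, fire, r_threshold=0.5, fire_threshold=0.5):
    """Quantize a continuous action into a discrete joystick event string."""
    parts = []
    if radius >= r_threshold:
        # Map the angle into one of four cardinal directions.
        idx = int(((theta + math.pi / 4) % (2 * math.pi)) // (math.pi / 2))
        parts.append(DIRECTIONS[idx])
    if fire >= fire_threshold:
        parts.append("FIRE")
    return "+".join(parts) if parts else "NOOP"

print(to_joystick(0.9, 0.0, 0.9))  # large radius + fire pressed
print(to_joystick(0.1, 0.0, 0.0))  # small radius, no fire
```

A thresholded mapping like this lets continuous-control agents (PPO, SAC) emit real-valued actions while the emulator still receives legal Atari 2600 inputs.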
Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. We demonstrate that training value functions with categorical cross-entropy significantly improves performance and scalability in a variety of domains. These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers, achieving state-of-the-art results on these domains. Through careful analysis, we show that the benefits of categorical cross-entropy primarily stem from its ability to mitigate issues inherent to value-based RL, such as noisy targets and non-stationarity. Overall, we argue that a simple shift to training value functions with categorical cross-entropy can yield substantial improvements in the scalability of deep RL at little-to-no cost.
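Replacing regression with classification requires turning a scalar TD target into a categorical distribution over fixed value bins. A common choice is a "two-hot" projection, sketched below; the bin range and count here are illustrative, not the paper's exact configuration.

```python
import numpy as np

# Fixed support of value bins (illustrative range and resolution).
V_MIN, V_MAX, N_BINS = -10.0, 10.0, 51
BINS = np.linspace(V_MIN, V_MAX, N_BINS)

def two_hot(target):
    """Project a scalar onto the two nearest bins, preserving its mean."""
    target = np.clip(target, V_MIN, V_MAX)
    idx = np.searchsorted(BINS, target)          # first bin >= target
    probs = np.zeros(N_BINS)
    if idx == 0:
        probs[0] = 1.0
        return probs
    lo, hi = BINS[idx - 1], BINS[idx]
    w_hi = (target - lo) / (hi - lo)
    probs[idx - 1], probs[idx] = 1.0 - w_hi, w_hi
    return probs

p = two_hot(3.3)
recovered = p @ BINS  # the expectation over bins recovers the target
print(recovered)
```

Training then minimizes cross-entropy between the network's categorical output and this projected target, instead of MSE against the raw scalar, which is the shift the abstract argues improves scalability.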
2024-07-07
Proceedings of the 41st International Conference on Machine Learning (published)
This paper contributes a new approach for distributional reinforcement learning which elucidates a clean separation of transition structure and reward in the learning process. Analogous to how the successor representation (SR) describes the expected consequences of behaving according to a given policy, our distributional successor measure (SM) describes the distributional consequences of this behaviour. We formulate the distributional SM as a distribution over distributions and provide theory connecting it with distributional and model-based reinforcement learning. Moreover, we propose an algorithm that learns the distributional SM from data by minimizing a two-level maximum mean discrepancy. Key to our method are a number of algorithmic techniques that are independently valuable for learning generative models of state. As an illustration of the usefulness of the distributional SM, we show that it enables zero-shot risk-sensitive policy evaluation in a way that was not previously possible.
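The building block of the two-level objective is the maximum mean discrepancy itself. The sketch below computes a standard (one-level) empirical MMD² with a Gaussian kernel between two sample sets; the paper's two-level construction applies an MMD between distributions of such objects, which this toy does not attempt.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gram matrix of the Gaussian (RBF) kernel between sample sets."""
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased empirical estimate of MMD^2 between samples x and y."""
    kxx = gaussian_kernel(x, x, bandwidth).mean()
    kyy = gaussian_kernel(y, y, bandwidth).mean()
    kxy = gaussian_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy

rng = np.random.default_rng(0)
same = mmd2(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
diff = mmd2(rng.normal(size=(200, 2)), rng.normal(loc=3.0, size=(200, 2)))
print(same, diff)
```

Because MMD² is small when two sample sets come from the same distribution and large otherwise, minimizing it drives the learned SM's samples toward the data distribution.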
The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
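A minimal Soft MoE forward pass (after Puigcerver et al., 2023) can be sketched in a few lines: tokens are softly dispatched into per-expert "slots", each expert processes its slots, and the outputs are softly combined back per token. Dimensions and the linear experts here are illustrative assumptions.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def soft_moe(X, phi, experts):
    """X: (n_tokens, d); phi: (d, n_experts * slots_per_expert)."""
    logits = X @ phi                       # token-slot affinities, (n, e*s)
    dispatch = softmax(logits, axis=0)     # normalize over tokens (per slot)
    combine = softmax(logits, axis=1)      # normalize over slots (per token)
    slots = dispatch.T @ X                 # soft mixtures of tokens, (e*s, d)
    s_per_e = phi.shape[1] // len(experts)
    outs = np.concatenate(
        [f(slots[i * s_per_e:(i + 1) * s_per_e]) for i, f in enumerate(experts)]
    )                                      # each expert sees its own slots
    return combine @ outs                  # per-token weighted combination

rng = np.random.default_rng(0)
d, n = 8, 16
experts = [lambda s, W=rng.normal(size=(d, d)) / d: s @ W for _ in range(4)]
phi = rng.normal(size=(d, 4 * 2))          # 4 experts, 2 slots each
Y = soft_moe(rng.normal(size=(n, d)), phi, experts)
print(Y.shape)
```

Unlike hard routing, every token contributes to every slot with a soft weight, so the layer is fully differentiable and avoids load-balancing losses, which is part of why it slots cleanly into value-based networks.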
We introduce a machine learning approach to determine the transition rates of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition rates. These rates are then applied to guide a single silicon atom throughout the lattice to pre-determined target destinations. We present empirical analyses that demonstrate the efficacy and generality of our approach.
Journal article: Discovering the Electron Beam Induced Transition Rates for Silicon Dopants in Graphene with Deep Neural Networks in the STEM.
Kevin M. Roccapriore (Center for Nanophase Materials Sciences, Oak Ridge National Laboratory), Max Schwarzer (Mila - Québec AI Institute; Université de Montréal; Google Research, Brain Team), Joshua Greaves (Google Research, Brain Team), Jesse Farebrother (Mila - Québec AI Institute; Google Research, Brain Team; McGill University), Rishabh Agarwal (Mila - Québec AI Institute; Université de Montréal; Google Research, Brain Team), Colton Bishop (Google Research, Brain Team), Maxim Ziatdinov (Oak Ridge National Laboratory), Igor Mordatch (Google Research, Brain Team), Ekin D. Cubuk (Google Research, Brain Team), Aaron Courville (Mila - Québec AI Institute; Université de Montréal), …, Pablo Samuel Castro (Google Research, Brain Team), Marc G. Bellemare (Mila - Québec AI Institute; Google Research, Brain Team; McGill University), Sergei V. Kalinin (University of Tennessee, Knoxville; corresponding author: sergei2@utk.edu).
Microscopy and Microanalysis, Volume 29, Issue Supplement_1, 1 August 2023, Pages 1932–1933, https://doi.org/10.1093/micmic/ozad067.1000. Published: 22 July 2023.
In this paper, we derive an algorithm that learns a principal subspace from sample entries, can be applied when the approximate subspace is represented by a neural network, and hence can be scaled to datasets with an effectively infinite number of rows and columns. Our method consists in defining a loss function whose minimizer is the desired principal subspace, and constructing a gradient estimate of this loss whose bias can be controlled.
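The core idea — a loss whose minimizer is the principal subspace, optimized by gradient steps — can be illustrated in a toy full-information setting. The sketch below does gradient descent on the negative explained variance with re-orthonormalization; the paper's actual contribution is an estimator of such a gradient from sampled matrix entries that scales to neural-network parameterizations, which this toy does not attempt.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
A = rng.normal(size=(n, n))
A = A @ A.T                                # symmetric PSD test matrix

# Loss: L(U) = -tr(U^T A U) over orthonormal U; its minimizer spans the
# top-k principal subspace of A.
U = np.linalg.qr(rng.normal(size=(n, k)))[0]
for _ in range(300):
    grad = -2.0 * A @ U                    # gradient of L at U
    U = np.linalg.qr(U - 0.01 * grad)[0]   # step, then re-orthonormalize

# Compare against the exact top-k eigenvectors.
eigvecs = np.linalg.eigh(A)[1][:, -k:]
# Singular values of the overlap are cosines of the principal angles;
# the subspaces match when all of them are close to 1.
overlap = np.linalg.svd(eigvecs.T @ U, compute_uv=False)
print(overlap)
```

Replacing the exact gradient `-2 A U` with an estimate built from a handful of sampled entries of `A` is what makes the approach applicable when the matrix is too large to touch in full.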
2023-04-10
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics (published)