Publications

Randomized Confidence Bounds for Stochastic Partial Monitoring

Maxime Heuillet

Ola Ahmad

Audrey Durand

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

A Reinforcement Learning Pipeline for Band Gap-directed Crystal Generation

Prashant Govindarajan

Mathieu Reymond

Santiago Miret

Antoine Clavaud

Mariano Phielipp

Sarath Chandar

Property-driven AI-automated material discovery presents unique challenges owing to the complex nature of the chemical structural space and computationally expensive simulations. For crystalline solids, the band gap is an important property for designing semiconductors and batteries. However, optimizing crystals for a target band gap is difficult and not well-explored. Reinforcement learning (RL) shows promise towards optimizing crystals, as it can freely explore the chemical space. However, it relies on regular band gap evaluations, which can only be accurately computed through expensive Density Functional Theory (DFT) simulations. In this study, we propose an active learning-inspired pipeline that combines RL and DFT simulations for optimizing crystal compositions given a target band gap. The pipeline includes an RL policy for predicting atom types and a band gap network that is fine-tuned with DFT data. Preliminary results indicate the need for furthering the state-of-the-art to address the inherent challenges of the problem.

2024-07-08

BOKU.ac.at/2024/AI4Mat (poster)

openreview.net

Robust Data-driven Prescriptiveness Optimization

Mehran Poursoltani

Erick Delage

Angelos Georghiou

The abundance of data has led to the emergence of a variety of optimization techniques that attempt to leverage available side information to provide more anticipative decisions. The wide range of methods and contexts of application have motivated the design of a universal unitless measure of performance known as the coefficient of prescriptiveness. This coefficient was designed to quantify both the quality of contextual decisions compared to a reference one and the prescriptive power of side information. To identify policies that maximize the former in a data-driven context, this paper introduces a distributionally robust contextual optimization model where the coefficient of prescriptiveness substitutes for the classical empirical risk minimization objective. We present a bisection algorithm to solve this model, which relies on solving a series of linear programs when the distributional ambiguity set has an appropriate nested form and polyhedral structure. Studying a contextual shortest path problem, we evaluate the robustness of the resulting policies against alternative methods when the out-of-sample dataset is subject to varying amounts of distribution shift.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net
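
The bisection idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes only that the question "is performance level t achievable?" has a monotone answer, so the largest achievable level can be bracketed by halving an interval. In the paper that question is answered by solving a linear program; here a toy stand-in oracle over a handful of made-up candidate policies is used instead. The names `bisect_max_level`, `is_achievable`, and `toy_scores` are hypothetical and do not come from the paper.

```python
from typing import Callable, Dict, List


def bisect_max_level(
    is_achievable: Callable[[float], bool],
    lo: float,
    hi: float,
    tol: float = 1e-6,
) -> float:
    """Return (within tol) the largest t in [lo, hi] for which is_achievable(t) holds.

    Assumes monotonicity: if level t is achievable, every smaller level is too.
    """
    if not is_achievable(lo):
        raise ValueError("lower end of the bracket must be achievable")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_achievable(mid):
            lo = mid  # mid is achievable: the optimum is at or above mid
        else:
            hi = mid  # mid is not achievable: the optimum is below mid
    return lo


# Toy stand-in for the feasibility check (an assumption, not the paper's LP):
# each candidate policy has a score under each scenario, and a level t is
# "achievable" if some policy scores at least t in every scenario.
toy_scores: Dict[str, List[float]] = {
    "a": [0.20, 0.50],
    "b": [0.40, 0.45],
    "c": [0.90, 0.10],
}


def toy_oracle(t: float) -> bool:
    return any(min(scores) >= t for scores in toy_scores.values())


best_level = bisect_max_level(toy_oracle, lo=0.0, hi=1.0)
```

Each oracle call halves the bracket, so the number of feasibility checks grows only logarithmically in the inverse of the tolerance. That is what makes an expensive per-call check, such as a linear program, affordable.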

A Scalable Architecture for Future Regenerative Satellite Payloads

Olfa Ben Yahia

Zineb Garroussi

Brunilde Sansò

Jean-François Frigon

Stéphane Martel

Antoine Lesage-Landry

Gunes Karabulut Kurt

This paper addresses the limitations of current satellite payload architectures, which are predominantly hardware-driven and lack the flexibility to adapt to increasing data demands and uneven traffic. To overcome these challenges, we present a novel architecture for future regenerative and programmable satellite payloads and utilize interconnected modem banks to promote higher scalability and flexibility. We formulate an optimization problem to efficiently manage traffic among these modem banks and balance the load. Additionally, we provide comparative numerical simulation results, considering end-to-end delay and packet loss analysis. The results illustrate that our proposed architecture maintains lower delays and packet loss even with higher traffic demands and smaller buffer sizes.

2024-07-08

ArXiv (preprint)

doi.org

arxiv.org

SelfIE: Self-Interpretation of Large Language Model Embeddings

Haozhe Chen

Carl Vondrick

Chengzhi Mao

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Adam Ibrahim

Benjamin Thérien

Kshitij Gupta

Mats Leon Richter

Quentin Gregory Anthony

Timothee Lesort

Eugene Belilovsky

Irina Rish

2024-07-08

TMLR (accepted)

doi.org

openreview.net

SiT: Symmetry-invariant Transformers for Generalisation in Reinforcement Learning

Matthias Weissenbacher

Rishabh Agarwal

Yoshinobu Kawahara

An open challenge in reinforcement learning (RL) is the effective deployment of a trained policy to new or slightly different situations as well as semantically-similar environments. We introduce Symmetry-Invariant Transformer (SiT), a scalable vision transformer (ViT) that leverages both local and global data patterns in a self-supervised manner to improve generalisation. Central to our approach is Graph Symmetric Attention, which refines the traditional self-attention mechanism to preserve graph symmetries, resulting in invariant and equivariant latent representations. We showcase SiT’s superior generalization over ViTs on MiniGrid and Procgen RL benchmarks, and its sample efficiency on Atari 100k and CIFAR10.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press

Stealing part of a production language model

Nicholas Carlini

Daniel Paleka

Krishnamurthy Dj Dvijotham

Thomas Steinke

Jonathan Hayase

A. Feder Cooper

Katherine Lee

Matthew Jagielski

Milad Nasr

Arthur Conmy

Eric Wallace

David Rolnick

Florian Tramèr

We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under \\

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net

Stochastic positional embeddings improve masked image modeling

Amir Bar

Florian Bordes

Assaf Shocher

Mahmoud Assran

Pascal Vincent

Nicolas Ballas

Trevor Darrell

Amir Globerson

Yann LeCun

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press

openreview.net

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Jesse Farebrother

Jordi Orbay

Quan Vuong

Adrien Ali Taiga

Yevgen Chebotar

Ted Xiao

Alex Irpan

Sergey Levine

Pablo Samuel Castro

Aleksandra Faust

Aviral Kumar

Rishabh Agarwal

Value functions are an essential component in deep reinforcement learning (RL), that are typically trained via mean squared error regression to match bootstrapped target values. However, scaling value-based RL methods to large networks has proven challenging. This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. We show that training value functions with categorical cross-entropy significantly enhances performance and scalability across various domains, including single-task RL on Atari 2600 games, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers, achieving state-of-the-art results on these domains. Through careful analysis, we show that categorical cross-entropy mitigates issues inherent to value-based RL, such as noisy targets and non-stationarity. We argue that shifting to categorical cross-entropy for training value functions can substantially improve the scalability of deep RL at little-to-no cost.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

doi.org

openreview.net
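
As a rough illustration of what "classification in place of regression" can look like for the value-function entry above: the abstract does not say how scalar targets are turned into class distributions, so this sketch assumes a simple two-hot encoding over evenly spaced bins, which is one possible choice and not necessarily the one the authors favor. The function names, the value range, and the bin count are all hypothetical.

```python
import math
from typing import List


def two_hot(target: float, v_min: float, v_max: float, num_bins: int) -> List[float]:
    """Spread a scalar target over its two nearest bin centers.

    The weights are chosen so that the probability-weighted bin center equals
    the target after clipping to [v_min, v_max]. Requires num_bins >= 2.
    """
    target = min(max(target, v_min), v_max)
    width = (v_max - v_min) / (num_bins - 1)
    pos = (target - v_min) / width          # fractional bin index
    lower = min(int(math.floor(pos)), num_bins - 2)
    upper_weight = pos - lower
    probs = [0.0] * num_bins
    probs[lower] = 1.0 - upper_weight
    probs[lower + 1] = upper_weight
    return probs


def cross_entropy(logits: List[float], target_probs: List[float]) -> float:
    """Categorical cross-entropy between softmax(logits) and a target distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))  # stable log-sum-exp
    return -sum(p * (x - log_z) for x, p in zip(logits, target_probs))


def expected_value(logits: List[float], v_min: float, v_max: float) -> float:
    """Decode a scalar value as the softmax-weighted average of bin centers."""
    n = len(logits)
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    total = sum(weights)
    width = (v_max - v_min) / (n - 1)
    return sum(w / total * (v_min + i * width) for i, w in enumerate(weights))


# Example: a bootstrapped target of 0.3 on the range [0, 1] with 5 bins.
target_probs = two_hot(0.3, v_min=0.0, v_max=1.0, num_bins=5)
loss = cross_entropy([0.0] * 5, target_probs)
```

The network then outputs one logit per bin instead of a single scalar. The loss is the cross-entropy above rather than a squared error, and a scalar value is recovered with `expected_value` when it is needed for action selection or bootstrapping.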
