Publications

Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem

Maciej Wołczyk

Bartłomiej Cupiał

Mateusz Ostaszewski

Michał Bortkiewicz

Michał Zając

Razvan Pascanu

Łukasz Kuciński

Piotr Miłoś

Fine-tuning is a widespread technique that allows practitioners to transfer pre-trained capabilities, as recently showcased by the successful applications of foundation models. However, fine-tuning reinforcement learning (RL) models remains a challenge. This work conceptualizes one specific cause of poor transfer, accentuated in the RL setting by the interplay between actions and observations: forgetting of pre-trained capabilities. Namely, a model deteriorates on the state subspace of the downstream task not visited in the initial phase of fine-tuning, on which the model behaved well due to pre-training. This way, we lose the anticipated transfer benefits. We identify conditions when this problem occurs, showing that it is common and, in many cases, catastrophic. Through a detailed empirical analysis of the challenging NetHack and Montezuma’s Revenge environments, we show that standard knowledge retention techniques mitigate the problem and thus allow us to take full advantage of the pre-trained capabilities. In particular, in NetHack, we achieve a new state-of-the-art for neural models, improving the previous best score from

2023-12-31

ICML (published)

doi.org

proceedings.mlr.press

Fisher Flow Matching for Generative Modeling over Discrete Data

Oscar Davis

Samuel Kessler

Mircea Petrache

İsmail İlkan Ceylan

Michael M. Bronstein

Avishek Bose

Generative modeling over discrete data has recently seen numerous success stories, with applications spanning language modeling, biological sequence design, and graph-structured molecular data. The predominant generative modeling paradigm for discrete data is still autoregressive, with more recent alternatives based on diffusion or flow-matching falling short of their impressive performance in continuous data settings, such as image or video generation. In this work, we introduce Fisher-Flow, a novel flow-matching model for discrete data. Fisher-Flow takes a manifestly geometric perspective by considering categorical distributions over discrete data as points residing on a statistical manifold equipped with its natural Riemannian metric: the

2023-12-31

NeurIPS (published)

doi.org

arxiv.org

A Foundation Model for Zero-shot Logical Query Reasoning

Mikhail Galkin

Jincheng Zhou

Bruno Ribeiro

Jian Tang

Zhaocheng Zhu

Complex logical query answering (CLQA) in knowledge graphs (KGs) goes beyond simple KG completion and aims at answering compositional queries composed of multiple projections and logical operations. Existing CLQA methods that learn parameters bound to certain entity or relation vocabularies can only be applied to the graph they are trained on, which requires substantial training time before being deployed on a new graph. Here we present UltraQuery, the first foundation model for inductive reasoning that can zero-shot answer logical queries on any KG. The core idea of UltraQuery is to derive both projections and logical operations as vocabulary-independent functions which generalize to new entities and relations in any KG. With the projection operation initialized from a pre-trained inductive KG reasoning model, UltraQuery can solve CLQA on any KG after finetuning on a single dataset. Experimenting on 23 datasets, UltraQuery in the zero-shot inference mode shows competitive or better query answering performance than the best available baselines and sets a new state of the art on 15 of them.

2023-12-31

NeurIPS (published)

doi.org

openreview.net

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Usman Anwar

Abulhair Saparov

Javier Rando

Daniel Paleka

Miles Turpin

Peter Hase

Ekdeep Singh Lubana

Erik Jenner

Stephen Casper

Oliver Sourbut

Benjamin L. Edelman

Zhaowei Zhang

Mario Günther

Anton Korinek

Jose Hernandez-Orallo

Lewis Hammond

Eric Bigelow

Alexander Pan

Lauro Langosco

Tomasz Korbak

Heidi Zhang

Ruiqi Zhong

Seán Ó hÉigeartaigh

Gabriel Recchia

Giulio Corsi

Alan Chan

Markus Anderljung

Lilian Edwards

Aleksandar Petrov

Christian Schroeder de Witt

Sumeet Ramesh Motwani

Samuel Albanie

Yoshua Bengio

Danqi Chen

Philip H.S. Torr

Tegan Maharaj

Jakob Foerster

Florian Tramèr

He He

Atoosa Kasirzadeh

Yejin Choi

David Krueger

This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose

2023-12-31

Trans. Mach. Learn. Res. (published)

doi.org

openreview.net

fPLSA: Learning Semantic Structures in Document Collections Using Foundation Models

Weijia Xu

Nebojsa Jojic

Nicolas Roux

2023-12-31

arXiv.org (preprint)

doi.org

openreview.net

A framework for fair decision-making over time with time-invariant utilities

Andrea Lodi

Sriram Sankaranarayanan

Guanyi Wang

2023-12-31

Eur. J. Oper. Res. (published)

doi.org

arxiv.org

G4SATBench: Benchmarking and Advancing SAT Solving with Graph Neural Networks

Zhaoyu Li

Jinpei Guo

Xujie Si

2023-12-31

Trans. Mach. Learn. Res. (published)

doi.org

openreview.net

Game On, Hate Off: A Study of Toxicity in Online Multiplayer Environments

Zachary Yang

Nicolas Grenon-Godbout

Reihaneh Rabbany

The advent of online spaces, particularly social media platforms and video games, has brought forth a significant challenge: the detection and mitigation of toxic and harmful speech. This issue is not only pervasive but also detrimental to the overall user experience. In this study, we leverage small language models to reliably detect toxicity, achieving an average precision of 0.95. Analyzing eight months of chat data from two Ubisoft games, we uncover patterns and trends in toxic behavior. The insights derived from our research will contribute to the development of healthier online communities and inform preventive measures against toxicity.

2023-12-31

Games Res. Pract. (published)

doi.org

Game Theoretical Formulation for Residential Community Microgrid via Mean Field Theory: Proof of Concept

Mohamad Aziz

Hanane Dagdougui

Issmail ElHallaoui

Incentive-based demand response aggregators are widely recognized as a powerful strategy to increase the flexibility of residential community microgrids (RCMs) while allowing consumers’ assets to participate in the operation of the power system at critical peak times. RCMs implementing demand response approaches are of high interest because, collectively, they have a high impact on shaping the demand curve during peak time while providing a wide range of economic and technical benefits to consumers and utilities. The penetration of distributed energy resources such as battery energy storage and photovoltaic systems introduces additional flexibility to manage the community loads and increase revenue. This letter proposes a game theoretical formulation for an incentive-based residential community microgrid, where an incentive-based pricing mechanism is developed to encourage peak demand reduction and share the incentive demand curve with the residential community through the aggregator. The aggregator’s objective is to maximize the welfare of the residential community by finding the optimal community equilibrium electricity price. Each household communicates with the other households and with the distributed system operator (DSO) through the aggregator and aims to minimize its local electricity cost.

2023-12-31

IEEE Control Systems Letters (published)

doi.org

Generative Active Learning for the Search of Small-Molecule Protein Binders

Maksym Korablyov

Cheng-Hao Liu

Moksh Jain

Almer Van Der Sloot

Éric Jolicoeur

Edward Ruediger

Andrei Nica

Emmanuel Bengio

Kostiantyn Lapchevskyi

Daniel St-Cyr

Doris Alexandra Schuetz

Victor Ion Butoi

Saikrishna Gottipati

Prateek Gupta

Ladislav Rampasek

Sasikanth Avancha

Pierre-Luc Bacon

William Hamilton

Brooks Paige

Sanchit Misra

Stanislaw Jastrzebski

Bharat Kaul

Doina Precup

José Miguel Hernández-Lobato

Marwin Segler

Michael Bronstein

Anne Marinier

Mike Tyers

Yoshua Bengio

Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules which exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecules to discover candidates with a desired property. We apply LambdaZero with molecular docking to design novel small molecules that inhibit the enzyme soluble Epoxide Hydrolase 2 (sEH), while enforcing constraints on synthesizability and drug-likeliness. LambdaZero provides an exponential speedup in terms of the number of calls to the expensive molecular docking oracle, and LambdaZero de novo designed molecules reach docking scores that would otherwise require the virtual screening of a hundred billion molecules. Importantly, LambdaZero discovers novel scaffolds of synthesizable, drug-like inhibitors for sEH. In in vitro experimental validation, a series of ligands from a generated quinazoline-based scaffold were synthesized, and the lead inhibitor N-(4,6-di(pyrrolidin-1-yl)quinazolin-2-yl)-N-methylbenzamide (UM0152893) displayed sub-micromolar enzyme inhibition of sEH.

2023-12-31

arXiv (preprint)

doi.org

arxiv.org

Generative Adversarial Neural Networks for Realistic Stock Market Simulations

Badre Labiad

Abdelaziz Berrado

Loubna Benabbou

Stock market simulations are widely used to create synthetic environments for testing trading strategies before deploying them to real-time markets. However, the weak realism often found in these simulations presents a significant challenge. Improving the quality of stock market simulations could be facilitated by the availability of rich and granular real Limit Order Books (LOB) data. Unfortunately, access to LOB data is typically very limited. To address this issue, a framework based on Generative Adversarial Networks (GAN) is proposed to generate synthetic realistic LOB data. This generated data can then be utilized for simulating downstream decision-making tasks, such as testing trading strategies, conducting stress tests, and performing prediction tasks. To effectively tackle challenges related to the temporal and local dependencies inherent in LOB structures and to generate highly realistic data, the framework relies on a specific data representation and preprocessing scheme, transformers, and conditional Wasserstein GAN with gradient penalty. The framework is trained using the FI-2010 benchmark dataset and an ablation study is conducted to demonstrate the importance of each component of the proposed framework. Moreover, qualitative and quantitative metrics are proposed to assess the quality of the generated data. Experimental results indicate that the framework outperforms existing benchmarks in simulating realistic market conditions, thus demonstrating its effectiveness in generating synthetic LOB data for diverse downstream tasks.

2023-12-31

International Journal of Advanced Computer Science and Applications (published)

doi.org

GenRL: Multimodal-foundation world models for generalization in embodied agents

Pietro Mazzaglia

Tim Verbelen

Bart Dhoedt

Aaron Courville

Sai Rajeswar

Learning generalist embodied agents, able to solve multitudes of tasks in different domains, is a long-standing problem. Reinforcement learning (RL) is hard to scale up as it requires a complex reward design for each task. In contrast, language can specify tasks in a more natural way. Current foundation vision-language models (VLMs) generally require fine-tuning or other adaptations to be adopted in embodied contexts, due to the significant domain gap. However, the lack of multimodal data in such domains represents an obstacle to developing foundation models for embodied applications. In this work, we overcome these problems by presenting multimodal-foundation world models, able to connect and align the representation of foundation VLMs with the latent space of generative world models for RL, without any language annotations. The resulting agent learning framework, GenRL, allows one to specify tasks through vision and/or language prompts, ground them in the embodied domain's dynamics, and learn the corresponding behaviors in imagination. As assessed through large-scale multi-task benchmarking in locomotion and manipulation domains, GenRL enables multi-task generalization from language and visual prompts. Furthermore, by introducing a data-free policy learning strategy, our approach lays the groundwork for foundational policy learning using generative world models. Website, code and data: https://mazpie.github.io/genrl/

2023-12-31

NeurIPS (published)

doi.org

openreview.net
