Joelle Pineau

2024-03-12

Proceedings of the Symposium on Computer Science and Law (published)

On the Societal Impact of Open Foundation Models

Sayash Kapoor

Rishi Bommasani

Kevin Klyman

Shayne Longpre

Ashwin Ramaswami

Peter Cihon

Aspen Hopkins

Kevin Bankston

Stella Biderman

Miranda Bogen

Rumman Chowdhury

Alex Engler

Peter Henderson

Yacine Jernite

Seth Lazar

Stefano Maffulli

Alondra Nelson

Aviya Skowron

Dawn Song

Victor Storchan

Daniel Zhang

Daniel E. Ho

Percy Liang

Arvind Narayanan

2024-02-27

ArXiv (preprint)

A novel and efficient machine learning Mendelian randomization estimator applied to predict the safety and efficacy of sclerostin inhibition

Marc-André Legault

Jason Hartford

Benoît J. Arsenault

Archer Y. Yang

Mendelian Randomization (MR) enables estimation of causal effects while controlling for unmeasured confounding factors. However, traditional MR's reliance on strong parametric assumptions can introduce bias if these are violated. We introduce a new machine learning MR estimator named Quantile Instrumental Variable (IV) that achieves low estimation error in a wide range of plausible MR scenarios. Quantile IV is distinctive in its ability to estimate nonlinear and heterogeneous causal effects and offers a flexible approach for subgroup analysis. Applying Quantile IV, we investigate the impact of circulating sclerostin levels on heel bone mineral density, osteoporosis, and cardiovascular outcomes in the UK Biobank. Employing various MR estimators and colocalization techniques that allow multiple causal variants, our analysis reveals that a genetically predicted reduction in sclerostin levels significantly increases heel bone mineral density and reduces the risk of osteoporosis, while showing no discernible effect on ischemic cardiovascular diseases. Quantile IV contributes to the advancement of MR methodology, and the case study on the impact of circulating sclerostin modulation contributes to our understanding of the on-target effects of sclerostin inhibition.

2024-01-31

medRxiv (preprint)

Piecewise Linear Parametrization of Policies: Towards Interpretable Deep Reinforcement Learning

Maxime Wabartha

Learning inherently interpretable policies is a central challenge in the path to developing autonomous agents that humans can trust. Linear policies can justify their decisions while interacting in a dynamic environment, but their reduced expressivity prevents them from solving hard tasks. Instead, we argue for the use of piecewise-linear policies. We carefully study to what extent they can retain the interpretable properties of linear policies while reaching competitive performance with neural baselines. In particular, we propose the HyperCombinator (HC), a piecewise-linear neural architecture expressing a policy with a controllably small number of sub-policies. Each sub-policy is linear with respect to interpretable features, shedding light on the decision process of the agent without requiring an additional explanation model. We evaluate HC policies in control and navigation experiments, visualize the improved interpretability of the agent and highlight its trade-off with performance. Moreover, we validate that the restricted model class that the HyperCombinator belongs to is compatible with the algorithmic constraints of various reinforcement learning algorithms.

2024-01-16

ICLR.cc/2024/Conference (poster)

openreview.net

Questions Are All You Need to Train a Dense Passage Retriever

Devendra Singh Sachan

Mike Lewis

Dani Yogatama

Luke Zettlemoyer

Manzil Zaheer

We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data. Dense retrieval is a central challenge for open-domain tasks, such as Open QA, where state-of-the-art methods typically require large supervised datasets with custom hard-negative mining and denoising of positive examples. ART, in contrast, only requires access to unpaired inputs and outputs (e.g., questions and potential answer passages). It uses a new passage-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence passages, and (2) the passages are then used to compute the probability of reconstructing the original question. Training for retrieval based on question reconstruction enables effective unsupervised learning of both passage and question encoders, which can be later incorporated into complete Open QA systems without any further finetuning. Extensive experiments demonstrate that ART obtains state-of-the-art results on multiple QA retrieval benchmarks with only generic initialization from a pre-trained language model, removing the need for labeled data and task-specific losses. Our code and model checkpoints are available at: https://github.com/DevSinghSachan/art.

2023-06-20

Transactions of the Association for Computational Linguistics (published)

Group Fairness in Reinforcement Learning

Harsh Satija

Alessandro Lazaric

Matteo Pirotta

We pose and study the problem of satisfying fairness in the online Reinforcement Learning (RL) setting. We focus on the group notions of fairness, according to which agents belonging to different groups should have similar performance based on some given measure. We consider the setting of maximizing return in an unknown environment (unknown transition and reward function) and show that it is possible to have RL algorithms that learn the best fair policies without violating the fairness requirements at any point in time during the learning process. In the tabular finite-horizon episodic setting, we provide an algorithm that combines the principle of optimism and pessimism under uncertainty to achieve zero fairness violation with arbitrarily high probability while also maintaining sub-linear regret guarantees. For the high-dimensional Deep-RL setting, we present algorithms based on the performance-difference style approximate policy improvement update step and we report encouraging empirical results on various traditional RL-inspired benchmarks showing that our algorithms display the desired behavior of learning the optimal policy while performing a fair learning process.

2023-04-28

TMLR (accepted)

openreview.net

Estimating causal effects with optimization-based methods: A review and empirical comparison

Martin Cousineau

Vedat Verter

Susan A. Murphy

2023-01-01

European Journal of Operational Research (published)

Publisher Correction: Advancing ethics review practices in AI research

Madhulika Srikumar

Rebecca Finlay

Grace M. Abuhamad

Carolyn Ashurst

Rosie Campbell

Emily Campbell-Ratcliffe

Hudson Hongo

Sara Rene Jordan

Joseph Lindley

Aviv Ovadya

2023-01-01

Nature Machine Intelligence (published)

Improving Passage Retrieval with Zero-Shot Question Generation

Devendra Singh Sachan

Mike Lewis

Mandar Joshi

Armen Aghajanyan

Wen-tau Yih

Luke Zettlemoyer

2022-12-01

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (published)

Low-Rank Representation of Reinforcement Learning Policies

Bogdan Mazoure

Thang Doan

Tianyu Li

We propose a general framework for policy representation for reinforcement learning tasks. This framework involves finding a low-dimensional embedding of the policy on a reproducing kernel Hilbert space (RKHS). The usage of RKHS based methods allows us to derive strong theoretical guarantees on the expected return of the reconstructed policy. Such guarantees are typically lacking in black-box models, but are very desirable in tasks requiring stability and convergence guarantees. We conduct several experiments on classic RL domains. The results confirm that the policies can be robustly represented in a low-dimensional space while the embedded policy incurs almost no decrease in returns.

2022-10-27

Journal of Artificial Intelligence Research (published)

SPeCiaL: Self-Supervised Pretraining for Continual Learning

Lucas Caccia

2022-09-28

Continual Semi-Supervised Learning (published)