Joelle Pineau

Learning inherently interpretable policies is a central challenge in the path to developing autonomous agents that humans can trust. We argue for the use of policies that are piecewise-linear. We carefully study to what extent they can retain the interpretable properties of linear policies while performing competitively with neural baselines. In particular, we propose the HyperCombinator (HC), a piecewise-linear neural architecture expressing a policy with a controllably small number of sub-policies. Each sub-policy is linear with respect to interpretable features, shedding light on the agent’s decision process without needing an additional explanation model. We evaluate HC policies in control and navigation experiments, visualize the improved interpretability of the agent and highlight its trade-off with performance.

2024-01-01

International Conference on Learning Representations (published)

openreview.net

On the Societal Impact of Open Foundation Models

Sayash Kapoor

Rishi Bommasani

Kevin Klyman

Shayne Longpre

Ashwin Ramaswami

Peter Cihon

Aspen Hopkins

Kevin Bankston

Stella Biderman

Miranda Bogen

Rumman Chowdhury

Alex Engler

Peter Henderson

Yacine Jernite

Seth Lazar

Stefano Maffulli

Alondra Nelson

Aviya Skowron

Dawn Song

Victor Storchan

Daniel Zhang

Daniel E. Ho

Percy Liang

Arvind Narayanan

2024-01-01

ICML (published)

Questions Are All You Need to Train a Dense Passage Retriever

Devendra Singh Sachan

Mike Lewis

Dani Yogatama

Luke Zettlemoyer

Manzil Zaheer

We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data. Dense retrieval is a central challenge for open-domain tasks, such as Open QA, where state-of-the-art methods typically require large supervised datasets with custom hard-negative mining and denoising of positive examples. ART, in contrast, only requires access to unpaired inputs and outputs (e.g., questions and potential answer passages). It uses a new passage-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence passages, and (2) the passages are then used to compute the probability of reconstructing the original question. Training for retrieval based on question reconstruction enables effective unsupervised learning of both passage and question encoders, which can be later incorporated into complete Open QA systems without any further finetuning. Extensive experiments demonstrate that ART obtains state-of-the-art results on multiple QA retrieval benchmarks with only generic initialization from a pre-trained language model, removing the need for labeled data and task-specific losses. Our code and model checkpoints are available at: https://github.com/DevSinghSachan/art.

2023-06-20

Transactions of the Association for Computational Linguistics (published)

Group Fairness in Reinforcement Learning

Harsh Satija

Alessandro Lazaric

Matteo Pirotta

We pose and study the problem of satisfying fairness in the online Reinforcement Learning (RL) setting. We focus on the group notions of fairness, according to which agents belonging to different groups should have similar performance based on some given measure. We consider the setting of maximizing return in an unknown environment (unknown transition and reward function) and show that it is possible to have RL algorithms that learn the best fair policies without violating the fairness requirements at any point in time during the learning process. In the tabular finite-horizon episodic setting, we provide an algorithm that combines the principle of optimism and pessimism under uncertainty to achieve zero fairness violation with arbitrarily high probability while also maintaining sub-linear regret guarantees. For the high-dimensional Deep-RL setting, we present algorithms based on the performance-difference style approximate policy improvement update step and we report encouraging empirical results on various traditional RL-inspired benchmarks showing that our algorithms display the desired behavior of learning the optimal policy while performing a fair learning process.

2023-04-28

TMLR (accepted)

openreview.net

Estimating causal effects with optimization-based methods: A review and empirical comparison

Martin Cousineau

Vedat Verter

Susan A. Murphy

2023-01-01

European Journal of Operational Research (published)

Publisher Correction: Advancing ethics review practices in AI research

Madhulika Srikumar

Rebecca Finlay

Grace M. Abuhamad

Carolyn Ashurst

Rosie Campbell

Emily Campbell-Ratcliffe

Hudson Hongo

Sara Rene Jordan

Joseph Lindley

Aviv Ovadya

2023-01-01

Nature Machine Intelligence (published)

Improving Passage Retrieval with Zero-Shot Question Generation

Devendra Singh Sachan

Mike Lewis

Mandar Joshi

Armen Aghajanyan

Wen-tau Yih

Luke Zettlemoyer

2022-12-01

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (published)

Low-Rank Representation of Reinforcement Learning Policies

Thang Doan

We propose a general framework for policy representation for reinforcement learning tasks. This framework involves finding a low-dimensional embedding of the policy on a reproducing kernel Hilbert space (RKHS). The usage of RKHS-based methods allows us to derive strong theoretical guarantees on the expected return of the reconstructed policy. Such guarantees are typically lacking in black-box models, but are very desirable in tasks requiring stability and convergence guarantees. We conduct several experiments on classic RL domains. The results confirm that the policies can be robustly represented in a low-dimensional space while the embedded policy incurs almost no decrease in returns.

2022-10-27

Journal of Artificial Intelligence Research (published)

SPeCiaL: Self-Supervised Pretraining for Continual Learning

Lucas Caccia

2022-09-28

Continual Semi-Supervised Learning (published)

A Generalized Bootstrap Target for Value-Learning, Efficiently Combining Value and Feature Predictions

Anthony GX-Chen

Veronica Chelu

Blake Richards

2022-06-28

Proceedings of the AAAI Conference on Artificial Intelligence (published)

Biomedical Research and Informatics Living Laboratory for Innovative Advances of New Technologies in Community Mobility Rehabilitation: Protocol for Evaluation and Rehabilitation of Mobility Across Continuums of Care

Sara Ahmed

Philippe Archambault

Claudine Auger

Audrey Durand

Joyce Fung

Eva Kehayia

Anouk Lamontagne

Annette Majnemer

Sylvie Nadeau

Alain Ptito

Bonnie Swaine

Background: Rapid advances in technologies over the past 10 years have enabled large-scale biomedical and psychosocial rehabilitation research to improve the function and social integration of persons with physical impairments across the lifespan. The Biomedical Research and Informatics Living Laboratory for Innovative Advances of New Technologies (BRILLIANT) in community mobility rehabilitation aims to generate evidence-based research to improve rehabilitation for individuals with acquired brain injury (ABI).
Objective: This study aims to (1) identify the factors limiting or enhancing mobility in real-world community environments (public spaces, including the mall, home, and outdoors) and understand their complex interplay in individuals of all ages with ABI and (2) customize community environment mobility training by identifying, on a continuous basis, the specific rehabilitation strategies and interventions that patient subgroups benefit from most. Here, we present the research and technology plan for the BRILLIANT initiative.
Methods: A cohort of individuals, adults and children, with ABI (N=1500) will be recruited. Patients will be recruited from the acute care and rehabilitation partner centers within 4 health regions (living labs) and followed throughout the continuum of rehabilitation. Participants will also be recruited from the community. Biomedical, clinician-reported, patient-reported, and brain imaging data will be collected. Theme 1 will implement and evaluate the feasibility of collecting data across BRILLIANT living labs and conduct predictive analyses and artificial intelligence (AI) to identify mobility subgroups. Theme 2 will implement, evaluate, and identify community mobility interventions that optimize outcomes for mobility subgroups of patients with ABI.
Results: The biomedical infrastructure and equipment have been established across the living labs, and development of the clinician- and patient-reported outcome digital solutions is underway. Recruitment is expected to begin in May 2022.
Conclusions: The program will develop and deploy a comprehensive clinical and community-based mobility-monitoring system to evaluate the factors that result in poor mobility, and develop personalized mobility interventions that are optimized for specific patient subgroups. Technology solutions will be designed to support clinicians and patients to deliver cost-effective care and the right intervention to the right person at the right time to optimize long-term functional potential and meaningful participation in the community.
International Registered Report Identifier (IRRID): PRR1-10.2196/12506

2022-06-01

JMIR Research Protocols (published)

Block Contextual MDPs for Continual Learning

Shagun Sodhani

Franziska Meier