Publications

Safe option-critic: learning safety in the option-critic architecture
Designing hierarchical reinforcement learning algorithms that exhibit safe behaviour is not only vital for practical applications but also facilitates a better understanding of an agent’s decisions. We tackle this problem in the options framework (Sutton, Precup & Singh, 1999), a particular way to specify temporally abstract actions which allow an agent to use sub-policies with start and end conditions. We consider a behaviour safe if it avoids regions of the state space with high uncertainty in the outcomes of actions. We propose an optimization objective that learns safe options by encouraging the agent to visit states with higher behavioural consistency. The proposed objective results in a trade-off between maximizing the standard expected return and minimizing the effect of model uncertainty in the return. We propose a policy gradient algorithm to optimize the constrained objective function. We examine the quantitative and qualitative behaviours of the proposed approach in a tabular grid world, continuous-state puddle world, and three games from the Arcade Learning Environment: Ms. Pacman, Amidar, and Q*Bert. Our approach achieves a reduction in the variance of return, boosts performance in environments with intrinsic variability in the reward structure, and compares favourably both with primitive actions and with risk-neutral options.
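The abstract describes a trade-off between the standard expected return and a penalty for model uncertainty. A minimal sketch of such an objective, assuming a trade-off coefficient ψ and an uncertainty measure U (both our notation, not necessarily the paper's exact formulation):

\[
\max_{\theta}\; J(\theta) \;=\; \mathbb{E}_{\pi_\theta}\!\Bigl[\sum_{t} \gamma^{t} r_{t}\Bigr] \;-\; \psi\, \mathbb{E}_{\pi_\theta}\!\bigl[\, U(s_t, o_t) \,\bigr],
\]

where \(U(s_t, o_t)\) quantifies uncertainty in the outcomes of option \(o_t\) taken in state \(s_t\), and \(\psi \ge 0\) controls how strongly the agent is pushed toward behaviourally consistent states.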
Comparing Transfer and Meta Learning Approaches on a Unified Few-Shot Classification Benchmark
Vincent Dumoulin
Neil Houlsby
Utku Evci
Xiaohua Zhai
Ross Goroshin
Sylvain Gelly
Meta and transfer learning are two successful families of approaches to few-shot learning. Despite highly related goals, state-of-the-art advances in each family are measured largely in isolation from each other. As a result of diverging evaluation norms, a direct or thorough comparison of different approaches is challenging. To bridge this gap, we perform a cross-family study of the best transfer and meta learners on both a large-scale meta-learning benchmark (Meta-Dataset, MD) and a transfer learning benchmark (Visual Task Adaptation Benchmark, VTAB). We find that, on average, large-scale transfer methods (Big Transfer, BiT) outperform competing approaches on MD, even when trained only on ImageNet. In contrast, meta-learning approaches struggle to compete on VTAB when trained and validated on MD. However, BiT is not without limitations, and pushing for scale does not improve performance on highly out-of-distribution MD tasks. In performing this study, we reveal a number of discrepancies in evaluation norms and study some of these in light of the performance gap. We hope that this work facilitates sharing of insights from each community and accelerates progress on few-shot learning.
Understanding Continual Learning Settings with Data Distribution Drift Analysis
Timothée Lesort
Massimo Caccia
Classical machine learning algorithms often assume that the data are drawn i.i.d. from a stationary probability distribution. Recently, continual learning emerged as a rapidly growing area of machine learning where this assumption is relaxed, i.e. where the data distribution is non-stationary and changes over time. This paper represents the state of the data distribution by a context variable
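The abstract frames continual learning as a stream whose data distribution changes over time and is indexed by a context variable. A minimal, hypothetical sketch of such a drifting stream (the names, distributions, and labelling rule are illustrative, not taken from the paper):

```python
import numpy as np

def drifting_stream(n_contexts=3, samples_per_context=5, seed=0):
    """Yield (x, y, context) where the input distribution shifts with the context variable."""
    rng = np.random.default_rng(seed)
    for context in range(n_contexts):
        mean = 2.0 * context               # the distribution drifts as the context changes
        for _ in range(samples_per_context):
            x = rng.normal(loc=mean, scale=1.0, size=2)
            y = int(x.sum() > 2.0 * mean)  # labelling rule tied to the current context
            yield x, y, context

for x, y, c in drifting_stream():
    print(c, y, np.round(x, 2))
```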
Discourse-Aware Unsupervised Summarization for Long Scientific Documents
Yue Dong
Andrei Mircea
All in This Together? A Preregistered Report on Deservingness of Government Aid During the COVID-19 Pandemic
Aengus Bridgman
Eric Merkley
Peter John Loewen
Taylor Owen
Derek Ruths
The COVID-19 pandemic has placed unprecedented pressure on governments to engage in widespread cash transfers directly to citizens to help mitigate economic losses. Major and near-universal redistribution efforts have been deployed, but there is remarkably little understanding of where the mass public believes financial support is warranted. Using experimental evidence, we evaluate whether considerations related to deservingness, similarity, and prejudicial attitudes structure support for these transfers. A preregistered experiment found broad, generous, and nondiscriminatory support for direct cash transfers related to COVID-19 in Canada. The second study, accepted as a preregistered report, further probes these dynamics by comparing COVID-19-related outlays with nonemergency ones. We find that COVID-19-related spending was more universal compared with a more generic cash allocation program. Given that the results were driven by the income of hypothetical recipients, we find broad support for disaster relief that is not means-tested or otherwise constrained by pre-disaster income.
Evaluating the Integration of One Health in Surveillance Systems for Antimicrobial Use and Resistance: A Conceptual Framework
Cécile Aenishaenslin
Barbara Häsler
André Ravel
E. Jane Parmley
Sarah Mediouni
Houda Bennani
Katharina D. C. Stärk
It is now widely acknowledged that surveillance of antimicrobial resistance (AMR) must adopt a “One Health” (OH) approach to successfully address the significant threats this global public health issue poses to humans, animals, and the environment. While many protocols exist for the evaluation of surveillance, the specific aspect of integrating an OH approach into surveillance systems for AMR and antimicrobial use (AMU) suffers from a lack of common and accepted guidelines and metrics for its monitoring and evaluation functions. This article presents a conceptual framework to evaluate the integration of OH in surveillance systems for AMR and AMU, named the Integrated Surveillance System Evaluation framework (ISSE framework). The ISSE framework aims to assist stakeholders and researchers who design an overall evaluation plan to select the relevant evaluation questions and tools. The framework was developed in partnership with the Canadian Integrated Program for Antimicrobial Resistance Surveillance (CIPARS). It consists of five evaluation components, which consider the capacity of the system to: (1) integrate an OH approach, (2) produce OH information and expertise, (3) generate actionable knowledge, (4) influence decision-making, and (5) positively impact outcomes. For each component, a set of evaluation questions is defined, and links to other available evaluation tools are shown. The ISSE framework helps evaluators to systematically assess the different OH aspects of a surveillance system, to gain comprehensive information on the performance and value of these integrated efforts, and to use the evaluation results to refine and improve the surveillance of AMR and AMU globally.
Beyond Correlation versus Causation: Multi-brain Neuroscience Needs Explanation
Quentin Moreau
Intention estimation and controllable behaviour models for traffic merges
Takayasu Kumano
Yuji Yasui
This work focuses on decision making for automated driving vehicles in interaction-rich scenarios like traffic merges, in a flexibly assertive yet safe manner. We propose a Q-learning based approach that takes in active intention inferences as additional inputs besides the directly observed state inputs. The outputs of the Q-function are processed to select a decision by a modulation function, which can control how assertively or defensively the agent behaves.
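The abstract describes selecting a manoeuvre by passing Q-values through a modulation function that trades assertiveness against caution, informed by an inferred intention. A minimal, hypothetical sketch of that idea (the modulation form, parameter names, and numbers are our assumptions, not the paper's):

```python
import numpy as np

def modulated_decision(q_values, merge_intention_prob, assertiveness=0.5):
    """Pick a manoeuvre from Q-values, biased by inferred intention and an assertiveness knob.

    q_values: Q-value per candidate manoeuvre, e.g. [yield, merge_now].
    merge_intention_prob: inferred probability that the other driver will let us in.
    assertiveness: 0 = fully defensive, 1 = fully assertive.
    """
    q = np.asarray(q_values, dtype=float)
    # Illustrative modulation: boost the assertive manoeuvre when the other agent is
    # likely to yield, and penalise it otherwise, scaled by the assertiveness setting.
    bias = np.zeros_like(q)
    bias[-1] = assertiveness * merge_intention_prob - (1.0 - assertiveness) * (1.0 - merge_intention_prob)
    return int(np.argmax(q + bias))

print(modulated_decision([0.2, 0.25], merge_intention_prob=0.3, assertiveness=0.2))  # tends to yield
print(modulated_decision([0.2, 0.25], merge_intention_prob=0.9, assertiveness=0.9))  # tends to merge
```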
Neural Function Modules with Sparse Arguments: A Dynamic Approach to Integrating Information across Layers
Alex Lamb
Anirudh Goyal
A. Slowik
Michael Curtis Mozer
Philippe Beaudoin
Feed-forward neural networks consist of a sequence of layers, in which each layer performs some processing on the information from the previous layer. A downside to this approach is that each layer (or module, as multiple modules can operate in parallel) is tasked with processing the entire hidden state, rather than a particular part of the state which is most relevant for that module. Methods which only operate on a small number of input variables are an essential part of most programming languages, and they allow for improved modularity and code re-usability. Our proposed method, Neural Function Modules (NFM), aims to introduce the same structural capability into deep learning. Most of the work in the context of feed-forward networks combining top-down and bottom-up feedback is limited to classification problems. The key contribution of our work is to combine attention, sparsity, top-down and bottom-up feedback, in a flexible algorithm which, as we show, improves the results in standard classification, out-of-domain generalization, generative modeling, and learning representations in the context of reinforcement learning.
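As a rough illustration of the "sparse arguments" idea described in the abstract, the toy module below attends over the states produced by earlier layers and processes only the top-k most relevant ones. This is our own simplified reading, not the paper's architecture; all names and weights are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sparse_argument_module(layer_states, query, k=2, seed=0):
    """Attend over earlier layer states and process only the k most relevant ones.

    layer_states: list of vectors (one per earlier layer/module), each of dimension d.
    query: vector of dimension d representing what this module is looking for.
    """
    rng = np.random.default_rng(seed)
    H = np.stack(layer_states)                  # (num_layers, d)
    scores = softmax(H @ query)                 # relevance of each earlier state to this module
    top = np.argsort(scores)[-k:]               # sparse arguments: keep only the k most relevant states
    pooled = (scores[top, None] * H[top]).sum(axis=0)
    W = rng.standard_normal((H.shape[1], H.shape[1])) * 0.1  # toy weights standing in for the module body
    return np.tanh(W @ pooled)

states = [np.random.default_rng(i).standard_normal(4) for i in range(5)]
print(sparse_argument_module(states, query=np.ones(4), k=2))
```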
Quantum Tensor Networks, Stochastic Processes, and Weighted Automata
Sandesh M. Adhikary
Siddarth Srinivasan
Jacob Miller
Byron Boots
Modeling joint probability distributions over sequences has been studied from many perspectives. The physics community developed matrix product states, a tensor-train decomposition for probabilistic modeling, motivated by the need to tractably model many-body systems. But similar models have also been studied in the stochastic processes and weighted automata literature, with little work on how these bodies of work relate to each other. We address this gap by showing how stationary or uniform versions of popular quantum tensor network models have equivalent representations in the stochastic processes and weighted automata literature, in the limit of infinitely long sequences. We demonstrate several equivalence results between models used in these three communities: (i) uniform variants of matrix product states, Born machines and locally purified states from the quantum tensor networks literature, (ii) predictive state representations, hidden Markov models, norm-observable operator models and hidden quantum Markov models from the stochastic process literature, and (iii) stochastic weighted automata, probabilistic automata and quadratic automata from the formal languages literature. Such connections may open the door for results and methods developed in one area to be applied in another.
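As a rough illustration of the kind of equivalence the abstract points to (our notation, not necessarily the paper's): a weighted automaton with initial and final vectors α, ω and per-symbol transition matrices A_σ computes its value as a matrix product, while a uniform Born machine (stationary MPS) assigns probabilities proportional to the squared modulus of the same kind of product, reusing one tensor at every position:

\[
f_{\mathrm{WA}}(x_1 \dots x_n) = \boldsymbol{\alpha}^{\top} A_{x_1} A_{x_2} \cdots A_{x_n}\, \boldsymbol{\omega},
\qquad
p_{\mathrm{BM}}(x_1 \dots x_n) \propto \bigl| \boldsymbol{\alpha}^{\top} A_{x_1} A_{x_2} \cdots A_{x_n}\, \boldsymbol{\omega} \bigr|^{2}.
\]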
Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence
Nicolas Loizou
Sharan Vaswani
Issam Hadj Laradji
We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly used in the subgradient method. Although computing the Polyak step-size requires knowledge of the optimal function values, this information is readily available for typical modern machine learning applications. Consequently, the proposed stochastic Polyak step-size (SPS) is an attractive choice for setting the learning rate for stochastic gradient descent (SGD). We provide theoretical convergence guarantees for SGD equipped with SPS in different settings, including strongly convex, convex and non-convex functions. Furthermore, our analysis results in novel convergence guarantees for SGD with a constant step-size. We show that SPS is particularly effective when training over-parameterized models capable of interpolating the training data. In this setting, we prove that SPS enables SGD to converge to the true solution at a fast rate without requiring the knowledge of any problem-dependent constants or additional computational overhead. We experimentally validate our theoretical results via extensive experiments on synthetic and real datasets. We demonstrate the strong performance of SGD with SPS compared to state-of-the-art optimization methods when training over-parameterized models.
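A minimal sketch of SGD with a Polyak-style step-size of the form γ = (f_i(x) − f_i*) / (c · ||∇f_i(x)||²), capped at a maximum value. The toy least-squares problem, constants, and capping are our own illustrative choices; under interpolation each per-sample optimum f_i* is taken to be zero.

```python
import numpy as np

def sgd_with_polyak_stepsize(grad_fn, loss_fn, x0, data, n_epochs=20,
                             f_star=0.0, c=0.5, gamma_max=1.0, seed=0):
    """SGD where each step-size is set Polyak-style from the current sample's loss and gradient."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_epochs):
        for i in rng.permutation(len(data)):
            g = grad_fn(x, data[i])
            denom = c * float(g @ g) + 1e-12
            gamma = min((loss_fn(x, data[i]) - f_star) / denom, gamma_max)
            x -= gamma * g
    return x

# Toy least-squares problem: loss_i(x) = 0.5 * (a_i . x - b_i)^2 with an interpolating solution.
A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
b = A @ np.array([3.0, -1.0])
data = list(zip(A, b))
loss = lambda x, d: 0.5 * (d[0] @ x - d[1]) ** 2
grad = lambda x, d: (d[0] @ x - d[1]) * d[0]
print(sgd_with_polyak_stepsize(grad, loss, [0.0, 0.0], data))  # should approach [3, -1]
```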