Joelle Pineau

Dieuwke Hupkes

Adina Williams

2022-01-01

EMNLP (Findings) (published)

Automated Data-Driven Generation of Personalized Pedagogical Interventions in Intelligent Tutoring Systems

Ekaterina Kochmar

Dung D. Vu

Robert Belfer

Varun Gupta

Iulian V. Serban

2021-07-27

International Journal of Artificial Intelligence in Education (published)

SPeCiaL: Self-Supervised Pretraining for Continual Learning

Lucas Caccia

2021-06-16

ArXiv (preprint)

Model-Invariant State Abstractions for Model-Based Reinforcement Learning

Manan Tomar

Amy Zhang

Roberto Calandra

Matthew E. Taylor

Accuracy and generalization of dynamics models is key to the success of model-based reinforcement learning (MBRL). As the complexity of tasks increases, so does the sample inefficiency of learning accurate dynamics models. However, many complex tasks also exhibit sparsity in the dynamics, i.e., actions have only a local effect on the system dynamics. In this paper, we exploit this property with a causal invariance perspective in the single-task setting, introducing a new type of state abstraction called model-invariance. Unlike previous forms of state abstractions, a model-invariance state abstraction leverages causal sparsity over state variables. This allows for compositional generalization to unseen states, something that non-factored forms of state abstractions cannot do. We prove that an optimal policy can be learned over this model-invariance state abstraction and show improved generalization in a simple toy domain. Next, we propose a practical method to approximately learn a model-invariant representation for complex domains and validate our approach by showing improved modelling performance over standard maximum likelihood approaches on challenging tasks, such as the MuJoCo-based Humanoid. Finally, within the MBRL setting we show strong performance gains with respect to sample efficiency across a host of other continuous control tasks.

2021-02-19

ArXiv (preprint)

openreview.net

Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program)

Philippe Vincent-Lamarre

Koustuv Sinha

Vincent Larivière

Alina Beygelzimer

Florence d'Alché-Buc

E. Fox

Hugo Larochelle

Learning Robust State Abstractions for Hidden-Parameter Block MDPs

Amy Zhang

Shagun Sodhani

Khimya Khetarpal

2021-01-01

ICLR (published)

openreview.net

Multi-Task Reinforcement Learning as a Hidden-Parameter Block MDP

Amy Zhang

Shagun Sodhani

Khimya Khetarpal

Multi-task reinforcement learning is a rich paradigm where information from previously seen environments can be leveraged for better performance and improved sample-efficiency in new environments. In this work, we leverage ideas of common structure underlying a family of Markov decision processes (MDPs) to improve performance in the few-shot regime. We use assumptions of structure from Hidden-Parameter MDPs and Block MDPs to propose a new framework, HiP-BMDP, and approach for learning a common representation and universal dynamics model. To this end, we provide transfer and generalization bounds based on task and state similarity, along with sample complexity bounds that depend on the aggregate number of samples across tasks, rather than the number of tasks, a significant improvement over prior work. To demonstrate the efficacy of the proposed method, we empirically compare and show improvements against other multi-task and meta-reinforcement learning baselines.

2020-07-14

ArXiv (preprint)

Deep interpretability for GWAS

Deepak Sharma

Audrey Durand

Marc-André Legault

Louis-Philippe Lemieux Perreault

Audrey Lemaçon

Marie-Pierre Dubé

Genome-Wide Association Studies are typically conducted using linear models to find genetic variants associated with common diseases. In these studies, association testing is done on a variant-by-variant basis, possibly missing out on non-linear interaction effects between variants. Deep networks can be used to model these interactions, but they are difficult to train and interpret on large genetic datasets. We propose a method that uses the gradient-based deep interpretability technique named DeepLIFT to show that known diabetes genetic risk factors can be identified using deep models along with possibly novel associations.

2020-07-03

ArXiv (preprint)

Handling Black Swan Events in Deep Learning with Diversely Extrapolated Neural Networks

Maxime Wabartha

Audrey Durand

Vincent François-Lavet

By virtue of their expressive power, neural networks (NNs) are well suited to fitting large, complex datasets, yet they are also known to produce similar predictions for points outside the training distribution. As such, they are, like humans, under the influence of the Black Swan theory: models tend to be extremely "surprised" by rare events, leading to potentially disastrous consequences, while justifying these same events in hindsight. To avoid this pitfall, we introduce DENN, an ensemble approach building a set of Diversely Extrapolated Neural Networks that fits the training data and is able to generalize more diversely when extrapolating to novel data points. This leads DENN to output highly uncertain predictions for unexpected inputs. We achieve this by adding a diversity term in the loss function used to train the model, computed at specific inputs. We first illustrate the usefulness of the method on a low-dimensional regression problem. Then, we show how the loss can be adapted to tackle anomaly detection during classification, as well as safe imitation learning problems.

2020-07-01

International Joint Conference on Artificial Intelligence (published)

On Overfitting and Asymptotic Bias in Batch Reinforcement Learning with Partial Observability (Extended Abstract)

Vincent François-Lavet

Guillaume Rabusseau

Damien Ernst

Raphael Fonteneau

When an agent has limited information on its environment, the suboptimality of an RL algorithm can be decomposed into the sum of two terms: a term related to an asymptotic bias (suboptimality with unlimited data) and a term due to overfitting (additional suboptimality due to limited data). In the context of reinforcement learning with partial observability, this paper provides an analysis of the tradeoff between these two error sources. In particular, our theoretical analysis formally characterizes how a smaller state representation increases the asymptotic bias while decreasing the risk of overfitting.

2020-07-01

International Joint Conference on Artificial Intelligence (published)

A Large-Scale, Open-Domain, Mixed-Interface Dialogue-Based ITS for STEM

Iulian V. Serban

Varun Gupta

Ekaterina Kochmar

Dung D. Vu

Robert Belfer