Doina Precup

Samin Yeasar Arnob

PhD - McGill University

Sumana Basu

Collaborating Alumni - McGill University

Co-supervisor :

Adriana Romero Soriano

Collaborating Alumni - McGill University

Raymond Chua

PhD - McGill University

Co-supervisor :

PhD - McGill University

Principal supervisor :

David Meger

Jonathan Colaço Carr

Master's Research - McGill University

Principal supervisor :

Prakash Panangaden

Élodie Coté-Gauthier

Collaborating researcher - McGill University

Franco Del Balso

Collaborating researcher - Université de Montréal

Jesse Farebrother

PhD - McGill University

Principal supervisor :

Marc Gendron-Bellemare

PhD - McGill University

Principal supervisor :

Collaborating researcher - Birla Institute of Technology

Jonathan Hu

Master's Research - McGill University

Howard Huang

PhD - McGill University

Haque Ishfaq

Collaborating Alumni - McGill University

Mohammad Sami Nur Islam Islam

Master's Research - McGill University

Hangzhan Jin

PhD - Polytechnique Montréal

Martin Klissarov

PhD - McGill University

Postdoctorate - McGill University

Jonathan Lebensold

Collaborating Alumni - McGill University

Collaborating Alumni - McGill University

Ray Luo

PhD - McGill University

Principal supervisor :

G McCracken

PhD - McGill University

Nazanin Mohammadi Sepahvand

Collaborating Alumni - McGill University

Shahrad Mohammadzadeh

Master's Research - McGill University

Principal supervisor :

Gabriela Moisescu-Pareja

Collaborating researcher - McGill University

Co-supervisor :

Irina Rish

Padideh Nouri

PhD - Université de Montréal

Co-supervisor :

PhD - McGill University

Co-supervisor :

Research Intern - McGill University

Nate Rahn

PhD - McGill University

Principal supervisor :

Marc Gendron-Bellemare

Manoosh Samiei

PhD - McGill University

Co-supervisor :

PhD - McGill University

Co-supervisor :

PhD - McGill University

Nishanth Anand Vemgal

PhD - McGill University

PhD - McGill University

Co-supervisor :

Samira Ebrahimi Kahou

Research Intern - McGill University

Zihan Wang

PhD - McGill University

Skipper: Combining Spatial and Temporal Abstraction for Better Generalization

Steve Wen

Master's Research - McGill University

Co-supervisor :

Gregory Dudek

Zijing Wu

PhD - McGill University

Principal supervisor :

PhD - McGill University

Harry Zhao

Collaborating Alumni - McGill University

Co-supervisor :

Blog Posts

February 22, 2024

Mingde Harry Zhao

Safa Alver

Harm van Seijen

Romain Laroche

Doina Precup

Yoshua Bengio

Publications

Searching for Big-Oh in the Data: Inferring Asymptotic Complexity from Experiments

Catherine McGeoch

Peter Sanders

Rudolf Fleischer

Paul R. Cohen

2018-12-31

(published)

Avoidance Learning Using Observational Reinforcement Learning

David Venuto

Léonard Boussioux

Junhao Wang

Rola Dali

Jhelum Chakravorty

Yoshua Bengio

Imitation learning seeks to learn an expert policy from sampled demonstrations. However, in the real world it is often difficult to find a perfect expert, and avoiding dangerous behaviors becomes relevant for safety reasons. We present the idea of learning to avoid, an objective that is in some sense the opposite of imitation learning, in which an agent learns to avoid a demonstrator policy in a given environment. We define avoidance learning as the process of optimizing the agent's reward while avoiding dangerous behaviors given by a demonstrator. In this work, we develop a framework for avoidance learning by defining a suitable objective function for these problems, which involves the distance between the state occupancy distributions of the expert and demonstrator policies. We use density estimates for the state occupancy measures and use this distance as a reward bonus for avoiding the demonstrator. We validate our theory with experiments on a wide range of partially observable environments. Experimental results show that we are able to improve sample efficiency during training compared to state-of-the-art policy optimization and safety methods.

2018-12-31

arXiv (preprint)

Break the Ceiling: Stronger Multi-scale Deep Graph Convolutional Networks

Sitao Luan

Mingde Zhao

Xiao-Wen Chang

Recently, neural-network-based approaches have achieved significant improvements in solving large, complex, graph-structured problems. However, their bottlenecks still need to be addressed, and the advantages of multi-scale information and deep architectures have not been sufficiently exploited. In this paper, we theoretically analyze how existing Graph Convolutional Networks (GCNs) have limited expressive power due to the constraints of their activation functions and architectures. We generalize spectral graph convolution and deep GCN in block Krylov subspace forms and devise two architectures, both with the potential to be scaled deeper but each making use of the multi-scale information in a different way. We further show that the equivalence of these two architectures can be established under certain conditions. On several node classification tasks, with or without the help of validation, the two new architectures achieve better performance than many state-of-the-art methods.

2018-12-31

NeurIPS (published)
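A block Krylov subspace generated by a graph operator Â and node features X is spanned by [X, ÂX, ..., Â^(m-1)X], where each power mixes information from a larger neighbourhood. The minimal pure-Python sketch below builds that multi-scale feature matrix, assuming the common GCN normalization D^(-1/2)(A + I)D^(-1/2) for Â (an assumption; the abstract does not fix the operator). It does not reproduce the paper's two architectures, which add learned weights and nonlinearities on top; the helper names are hypothetical.

```python
import math

def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense 0/1 adjacency matrix (list of lists)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degrees including the added self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Dense product of an n x k and a k x d matrix, both stored as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def block_krylov_features(a_norm, x, m):
    """Concatenate the block Krylov basis [X, A X, ..., A^(m-1) X] column-wise.

    The result is an n x (m * d) matrix whose j-th block of columns carries
    information aggregated from j-hop neighbourhoods.
    """
    blocks = [x]
    for _ in range(m - 1):
        blocks.append(matmul(a_norm, blocks[-1]))
    return [[v for blk in blocks for v in blk[i]] for i in range(len(x))]

# Usage on a hypothetical 3-node path graph with a single input feature.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[1.0], [0.0], [0.0]]
feats = block_krylov_features(normalized_adjacency(path), x, m=3)
```

Concatenating the powers, rather than keeping only the last one, is what keeps short-range and long-range information available side by side as depth grows.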

Community size effect in artificial learning systems

Olivier Tieleman

Angeliki Lazaridou

Shibl Mourad

Charles Blundell

Motivated by theories of language and communication that explain why communities with large numbers of speakers have, on average, simpler languages with more regularity, we cast the representation learning problem in terms of learning to communicate. Our starting point sees the traditional autoencoder setup as a single encoder with a fixed decoder partner that must learn to communicate. Generalizing from there, we introduce community-based autoencoders, in which multiple encoders and decoders collectively learn representations by being randomly paired up on successive training iterations. We find that increasing community sizes reduce idiosyncrasies in the learned codes, resulting in representations that better encode concept categories and correlate with human feature norms.

2018-12-31

ViGIL@NeurIPS (published)

Connecting Weighted Automata and Recurrent Neural Networks through Spectral Learning (Supplementary Material)

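More precisely, the WFA $A = (\boldsymbol{\alpha}, \{\mathbf{A}^\sigma\}_{\sigma\in\Sigma}, \boldsymbol{\Omega})$ with $n$ states and the linear 2-RNN $M = (\boldsymbol{\alpha}, \mathcal{A}, \boldsymbol{\Omega})$ with $n$ hidden units, where $\mathcal{A} \in \mathbb{R}^{n\times\Sigma\times n}$ is defined by $\mathcal{A}_{:,\sigma,:} = \mathbf{A}^\sigma$ for all $\sigma \in \Sigma$, are such that $f_A(\sigma_1\sigma_2\cdots\sigma_k) = f_M(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_k)$ for all sequences of input symbols $\sigma_1, \dots, \sigma_k \in \Sigma$, where for each $i \in [k]$ the input vector $\mathbf{x}_i \in \mathbb{R}^\Sigma$ is the one-hot encoding of the symbol $\sigma_i$. Proof. We first show by induction on $k$ that, for any sequence $\sigma_1\cdots\sigma_k \in \Sigma^*$, the hidden state $\mathbf{h}_k$ computed by $M$ (see Eq. (1)) on the corresponding one-hot encoded sequence $\mathbf{x}_1, \dots, \mathbf{x}_k \in \mathbb{R}^\Sigma$ satisfies $\mathbf{h}_k = (\mathbf{A}^{\sigma_1}\cdots\mathbf{A}^{\sigma_k})^\top\boldsymbol{\alpha}$. The case $k = 0$ is immediate. Suppose the result is true for sequences of length up to $k$. One can easily check that $\mathcal{A} \bullet_2 \mathbf{x}_i = \mathbf{A}^{\sigma_i}$ for any index $i$. Using the induction hypothesis, it then follows that
$$\mathbf{h}_{k+1} = \mathcal{A} \bullet_1 \mathbf{h}_k \bullet_2 \mathbf{x}_{k+1} = \mathbf{A}^{\sigma_{k+1}} \bullet_1 \mathbf{h}_k = (\mathbf{A}^{\sigma_{k+1}})^\top \mathbf{h}_k = (\mathbf{A}^{\sigma_{k+1}})^\top (\mathbf{A}^{\sigma_1}\cdots\mathbf{A}^{\sigma_k})^\top \boldsymbol{\alpha} = (\mathbf{A}^{\sigma_1}\cdots\mathbf{A}^{\sigma_{k+1}})^\top \boldsymbol{\alpha}.$$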

2018-12-31

(published)
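The induction above can be checked numerically. The pure-Python sketch below evaluates a hypothetical 2-state WFA directly, as $\boldsymbol{\alpha}^\top \mathbf{A}^{\sigma_1}\cdots\mathbf{A}^{\sigma_k}\boldsymbol{\Omega}$, and as a linear 2-RNN whose tensor slices $\mathcal{A}_{:,\sigma,:}$ are the WFA matrices, then confirms the two agree on every short word. It assumes a scalar-valued WFA (so $\boldsymbol{\Omega}$ is a vector) and an RNN output of $\mathbf{h}_k^\top \boldsymbol{\Omega}$; the weights and function names are illustrative only and are not taken from the paper.

```python
from itertools import product

# Hypothetical 2-state WFA over {"a", "b"}; integer weights keep the comparison exact.
ALPHABET = ["a", "b"]
alpha = [1, 2]                       # initial weight vector
omega = [3, -1]                      # final weight vector
A = {"a": [[1, 2], [0, 1]],          # one n x n transition matrix per symbol
     "b": [[2, 0], [1, 3]]}
n = len(alpha)

def wfa_value(word):
    """f_A(w) = alpha^T A^{w_1} ... A^{w_k} omega, multiplied left to right."""
    row = list(alpha)
    for s in word:
        row = [sum(row[i] * A[s][i][j] for i in range(n)) for j in range(n)]
    return sum(r * o for r, o in zip(row, omega))

# Third-order tensor T of shape n x |Sigma| x n with T[:, sigma, :] = A^sigma.
T = [[[A[s][i][j] for j in range(n)] for s in ALPHABET] for i in range(n)]

def one_hot(symbol):
    return [1 if s == symbol else 0 for s in ALPHABET]

def rnn_value(word):
    """Linear 2-RNN: h_0 = alpha, h_t = T (mode-1) h_{t-1} (mode-2) x_t, output h_k^T omega."""
    h = list(alpha)
    for s in word:
        x = one_hot(s)
        h = [sum(h[i] * T[i][k][j] * x[k] for i in range(n) for k in range(len(ALPHABET)))
             for j in range(n)]
    return sum(v * o for v, o in zip(h, omega))

def check_equivalence(max_len):
    """Compare both models on every word of length 0..max_len."""
    for k in range(max_len + 1):
        for word in product(ALPHABET, repeat=k):
            if wfa_value(word) != rnn_value(word):
                return False
    return True
```

For instance, both `wfa_value("ab")` and `rnn_value("ab")` return 6, matching the identity $\mathbf{h}_2 = (\mathbf{A}^{a}\mathbf{A}^{b})^\top\boldsymbol{\alpha}$.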

Data-driven Chance Constrained Programming based Electric Vehicle Penetration Analysis

Di Wu

Tracy Can Cui

Benoit Boulet

Transportation electrification has been growing rapidly in recent years. The adoption of electric vehicles (EVs) could help reduce dependence on oil and cut greenhouse gas emissions. However, increasing EV adoption will also place high demand on the power grid and may jeopardize grid infrastructure. In some areas with high EV penetration, EV charging demand may lead to transformer overloading at peak hours, which makes maximal EV penetration analysis an urgent problem. This paper proposes a data-driven framework based on chance-constrained programming for maximal EV penetration analysis. Simulation results are presented for a real-world, neighborhood-level network. The proposed framework could serve as guidance for utility companies scheduling infrastructure upgrades.

2018-12-31

(published)
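A chance constraint such as "the transformer is overloaded with probability at most ε" can be made data-driven by replacing the probability with its frequency over historical scenarios. The sketch below applies that sample-average idea to a deliberately simplified, hypothetical model: each scenario is a (base load, per-EV peak charging demand) pair at one transformer, and the search returns the largest EV count whose empirical overload rate stays within ε. The abstract does not give the paper's actual formulation, so the data format, function name, and single-transformer model are all assumptions.

```python
def max_ev_penetration(scenarios, capacity_kw, epsilon):
    """Largest number of EVs N with empirical P(base + N * per_ev > capacity) <= epsilon.

    scenarios: hypothetical list of (base_load_kw, per_ev_peak_kw) pairs, one per
    historical peak-hour observation. Returns None if even N = 0 violates the constraint.
    """
    if not 0 <= epsilon < 1:
        raise ValueError("epsilon must lie in [0, 1)")
    if any(per_ev <= 0 for _, per_ev in scenarios):
        raise ValueError("per-EV demand must be positive")
    allowed = epsilon * len(scenarios)  # maximum number of overloaded scenarios

    def violations(n_ev):
        # Count scenarios in which the transformer would be overloaded with n_ev EVs.
        return sum(1 for base, per_ev in scenarios if base + n_ev * per_ev > capacity_kw)

    if violations(0) > allowed:
        return None
    n_ev = 0
    # violations() is non-decreasing in n_ev and eventually exceeds `allowed`,
    # so scanning upward terminates at the largest feasible count.
    while violations(n_ev + 1) <= allowed:
        n_ev += 1
    return n_ev
```

With five hypothetical scenarios whose base loads are 50 to 90 kW, a 2 kW per-EV demand, and a 100 kW transformer, ε = 0 allows 5 EVs, while ε = 0.2 tolerates one overloaded scenario and allows 10.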

An Empirical Study of Batch Normalization and Group Normalization in Conditional Computation

Vincent Michalski

Vikram Voleti

Samira Ebrahimi Kahou

Anthony Ortiz

Pascal Vincent

Chris Pal

Batch normalization has been widely used to improve optimization in deep neural networks. While the uncertainty in batch statistics can act as a regularizer, using dataset statistics specific to the training set impairs generalization in certain tasks. Recently, alternative methods for normalizing feature activations in neural networks have been proposed. Among them, group normalization has been shown to yield performance similar to, and in some domains superior to, that of batch normalization. All these methods apply a learned affine transformation after the normalization operation to increase representational power. Methods used in conditional computation define the parameters of these transformations as learnable functions of conditioning information. In this work, we study whether and where the conditional formulation of group normalization can improve generalization compared to conditional batch normalization. We evaluate performance on the tasks of visual question answering, few-shot learning, and conditional image generation.

2018-12-31

arXiv (preprint)
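In conditional group normalization, activations are normalized with statistics computed over groups of channels within a single example, and the per-channel scale and shift are predicted from conditioning information instead of being free parameters. The pure-Python sketch below shows that structure for one example, assuming (hypothetically) that the scale and shift are linear functions of a conditioning vector, in the style of FiLM; the paper's exact parameterization and all names here are assumptions.

```python
import math

def conditional_group_norm(x, cond, num_groups, w_gamma, b_gamma, w_beta, b_beta, eps=1e-5):
    """Group-normalize one example, then apply a per-channel affine map predicted from cond.

    x: list of C channels, each a list of spatial activations.
    cond: conditioning vector (e.g. a hypothetical question embedding).
    w_gamma, w_beta: C x len(cond) weight matrices; b_gamma, b_beta: length-C biases.
    """
    num_channels = len(x)
    if num_channels % num_groups != 0:
        raise ValueError("channels must split evenly into groups")
    size = num_channels // num_groups
    normalized = []
    for g in range(num_groups):
        chans = x[g * size:(g + 1) * size]
        # Statistics are shared across all channels and positions in the group.
        vals = [v for ch in chans for v in ch]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        for ch in chans:
            normalized.append([(v - mean) / math.sqrt(var + eps) for v in ch])
    # Conditional affine parameters: linear functions of the conditioning vector.
    gamma = [sum(w * z for w, z in zip(w_gamma[c], cond)) + b_gamma[c] for c in range(num_channels)]
    beta = [sum(w * z for w, z in zip(w_beta[c], cond)) + b_beta[c] for c in range(num_channels)]
    return [[gamma[c] * v + beta[c] for v in normalized[c]] for c in range(num_channels)]
```

Because the statistics never touch other examples in the batch, the output for a given input does not depend on batch composition, which is the property that distinguishes this from conditional batch normalization.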

Hindsight Credit Assignment

Anna Harutyunyan

Will Dabney

Thomas Mesnard

Mohammad Gheshlaghi Azar

Bilal Piot

Nicolas Heess

Hado van Hasselt

Greg Wayne

Satinder Singh

Remi Munos

2018-12-31

Advances in Neural Information Processing Systems 32 (NeurIPS 2019) (published)

Learning Reliable Policies in the Bandit Setting with Application to Adaptive Clinical Trials

Hossein Aboutalebi

Tibor Schuster

The stochastic multi-armed bandit problem is a well-known model for studying the exploration-exploitation trade-off. It has significant potential applications in adaptive clinical trials, which allow for dynamic changes in patient allocation ratios. However, most bandit learning algorithms are designed with the goal of minimizing the expected regret. While this approach is useful in many areas, in clinical trials it can be sensitive to outlier data, especially when the sample size is small. In this article, we propose a modification of the BESA algorithm [Baransi, Maillard, and Mannor, 2014] that takes into account the variance of the action outcomes in addition to the mean. We present a regret bound for our approach and evaluate it empirically both on synthetic problems and on a dataset from the clinical trial literature. Our approach compares favorably to a suite of standard bandit algorithms.

2018-12-31

KHD@IJCAI (published)

Learning representations of Logical Formulae using Graph Neural Networks

Xavier Glorot

Ankit Anand

Eser Aygün

Shibl Mourad

Pushmeet Kohli

We explore the use of Graph Neural Networks (GNNs) for learning representations of propositional and first-order logical formulae. Traditional non-graph-based approaches such as CNNs and LSTMs do not exploit invariant properties, such as invariance to variable renaming and ordering, that are predominantly present in logical formulae. In this work, we explicitly try to encode these logical invariances using GNNs. We use the entailment task proposed in Evans et al. [2018] for propositional logic. We also explore our approach for the task of proof length prediction in first-order logic, using the Mizar-40 dataset to evaluate several representation learning approaches for this task. We observe that GNNs significantly outperform the other traditional approaches on both tasks.

2018-12-31

(published)

Meta-Learning State-based Eligibility Traces for More Sample-Efficient Policy Evaluation

Mingde Zhao

Ian Porada

Sitao Luan

Xiao-Wen Chang

Temporal-Difference (TD) learning is a standard and very successful reinforcement learning approach, at the core of both algorithms that learn the value of a given policy and algorithms that learn how to improve policies. TD-learning with eligibility traces provides a way to boost sample efficiency through temporal credit assignment, i.e., deciding which portion of a reward should be assigned to predecessor states that occurred at different previous times, controlled by a parameter

2018-12-31

arXiv (preprint)
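For reference, tabular TD(λ) with accumulating eligibility traces can be sketched as below, here with a separate trace-decay value λ(s) per state, using the common variable-λ rule e ← γ·λ(s)·e + 1[s] when state s is visited. This is plain policy evaluation on a hypothetical 5-state random walk with hand-picked λ values; it does not implement the paper's meta-learning of those values, and the environment and function name are illustrative assumptions.

```python
import random

N_STATES = 5   # non-terminal states 0..4 of a hypothetical random walk
GAMMA = 1.0    # undiscounted episodic task

def run_td_lambda(lambdas, episodes=5000, alpha=0.005, seed=0):
    """Tabular TD(lambda) with accumulating traces and a per-state lambda.

    lambdas[s] is the trace-decay parameter applied whenever state s is visited
    (a hand-picked stand-in for learned, state-based lambdas).
    Stepping left of state 0 ends the episode with reward 0; stepping right of
    state 4 ends it with reward 1, so the true values are (s + 1) / 6.
    """
    rng = random.Random(seed)
    v = [0.5] * N_STATES
    for _ in range(episodes):
        e = [0.0] * N_STATES           # eligibility traces, reset each episode
        s = N_STATES // 2              # start in the middle state
        while True:
            # Decay every trace by gamma * lambda(s), then bump the current state's trace.
            for i in range(N_STATES):
                e[i] *= GAMMA * lambdas[s]
            e[s] += 1.0
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:
                reward, v_next, done = 0.0, 0.0, True
            elif s_next >= N_STATES:
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, v[s_next], False
            delta = reward + GAMMA * v_next - v[s]   # one-step TD error
            for i in range(N_STATES):
                v[i] += alpha * delta * e[i]         # credit all recently visited states
            if done:
                break
            s = s_next
    return v
```

Setting every λ(s) to 0 recovers one-step TD(0), while values near 1 push credit further back along the trajectory; choosing these values per state is the knob the abstract refers to.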

Prediction of Disease Progression in Multiple Sclerosis Patients using Deep Learning Analysis of MRI Data

Adrian Tousignant

Paul Lemaitre