Publications

Integrated inbound train split and load planning in an intermodal railway terminal

Bruno Petrato Bruck

Jean-François Cordeau

Emma Frejinger

2021-02-28

Transportation Research Part B: Methodological (published)

doi.org

QBSUM: a Large-Scale Query-Based Document Summarization Dataset from Real-world Applications

Mingjun Zhao

Shengli Yan

Bang Liu

Xinwang Zhong

Qian Hao

Haolan Chen

Di Niu

Bo Long

Wei-dong Guo

2021-02-28

Computer Speech & Language (published)

doi.org

arxiv.org

Towards robust and replicable sex differences in the intrinsic brain function of autism

Dorothea L. Floris

José O. A. Filho

Meng-Chuan Lai

Steve Giavasis

Marianne Oldehinkel

Maarten Mennes

Tony Charman

Julian Tillmann

Guillaume Dumas

Christine Ecker

Flavio Dell’Acqua

Tobias Banaschewski

Carolin Moessnang

Simon Baron-Cohen

Sarah Durston

Eva Loth

Declan Murphy

Jan K. Buitelaar

Christian Beckmann

Michael P. Milham … (see 1 more)

Adriana Di Martino

2021-02-28

Molecular Autism (published)

doi.org

Transformers with Competitive Ensembles of Independent Mechanisms

Alex Lamb

Di He

Anirudh Goyal

Guolin Ke

Chien-Feng Liao

Mirco Ravanelli

Yoshua Bengio

An important development in deep learning from the earliest MLPs has been a move towards architectures with structural inductive biases whic… (see more)h enable the model to keep distinct sources of information and routes of processing well-separated. This structure is linked to the notion of independent mechanisms from the causality literature, in which a mechanism is able to retain the same processing as irrelevant aspects of the world are changed. For example, convnets enable separation over positions, while attention-based architectures (especially Transformers) learn which combination of positions to process dynamically. In this work we explore a way in which the Transformer architecture is deficient: it represents each position with a large monolithic hidden representation and a single set of parameters which are applied over the entire hidden representation. This potentially throws unrelated sources of information together, and limits the Transformer's ability to capture independent mechanisms. To address this, we propose Transformers with Independent Mechanisms (TIM), a new Transformer layer which divides the hidden representation and parameters into multiple mechanisms, which only exchange information through attention. Additionally, we propose a competition mechanism which encourages these mechanisms to specialize over time steps, and thus be more independent. We study TIM on a large-scale BERT model, on the Image Transformer, and on speech enhancement and find evidence for semantically meaningful specialization as well as improved performance.

2021-02-26

ArXiv (preprint)

doi.org

openreview.net

From Generative Models to Generative Passages: A Computational Approach to (Neuro) Phenomenology

Maxwell J. D. Ramstead

Anil K. Seth

Casper Hesp

Lars Sandved-Smith

Jonas Mago

Michael Lifshitz

Giuseppe Pagnoni

Ryan Smith

Guillaume Dumas

Antoine Lutz

Karl Friston

Axel Constant

This paper presents a version of neurophenomenology based on generative modelling techniques developed in computational neuroscience and bio… (see more)logy. Our approach can be described as computational phenomenology because it applies methods originally developed in computational modelling to provide a formal model of the descriptions of lived experience in the phenomenological tradition of philosophy (e.g., the work of Edmund Husserl, Maurice Merleau-Ponty, etc.). The first section presents a brief review of the overall project to naturalize phenomenology. The second section presents and evaluates philosophical objections to that project and situates our version of computational phenomenology with respect to these projects. The third section reviews the generative modelling framework. The final section presents our approach in detail. We conclude by discussing how our approach differs from previous attempts to use generative modelling to help understand consciousness. In summary, we describe a version of computational phenomenology which uses generative modelling to construct a computational model of the inferential or interpretive processes that best explain this or that kind of lived experience.

2021-02-22

Review of Philosophy and Psychology (published)

doi.org

Towards Causal Representation Learning

Bernhard Schölkopf

Francesco Locatello

Stefan Bauer

Nan Rosemary Ke

Nal Kalchbrenner

Anirudh Goyal

Yoshua Bengio

The two fields of machine learning and graphical causality arose and developed separately. However, there is now cross-pollination and incre… (see more)asing interest in both fields to benefit from the advances of the other. In the present paper, we review fundamental concepts of causal inference and relate them to crucial open problems of machine learning, including transfer and generalization, thereby assaying how causality can contribute to modern machine learning research. This also applies in the opposite direction: we note that most work in causality starts from the premise that the causal variables are given. A central problem for AI and causality is, thus, causal representation learning, the discovery of high-level causal variables from low-level observations. Finally, we delineate some implications of causality for machine learning and propose key research areas at the intersection of both communities.

2021-02-21

ArXiv (preprint)

arxiv.org

Model-Invariant State Abstractions for Model-Based Reinforcement Learning

Manan Tomar

Amy Zhang

Roberto Calandra

Matthew E. Taylor

Joelle Pineau

Accuracy and generalization of dynamics models is key to the success of model-based reinforcement learning (MBRL). As the complexity of task… (see more)s increases, so does the sample inefficiency of learning accurate dynamics models. However, many complex tasks also exhibit sparsity in the dynamics, i.e., actions have only a local effect on the system dynamics. In this paper, we exploit this property with a causal invariance perspective in the single-task setting, introducing a new type of state abstraction called \textit{model-invariance}. Unlike previous forms of state abstractions, a model-invariance state abstraction leverages causal sparsity over state variables. This allows for compositional generalization to unseen states, something that non-factored forms of state abstractions cannot do. We prove that an optimal policy can be learned over this model-invariance state abstraction and show improved generalization in a simple toy domain. Next, we propose a practical method to approximately learn a model-invariant representation for complex domains and validate our approach by showing improved modelling performance over standard maximum likelihood approaches on challenging tasks, such as the MuJoCo-based Humanoid. Finally, within the MBRL setting we show strong performance gains with respect to sample efficiency across a host of other continuous control tasks.

2021-02-18

ArXiv (preprint)

openreview.net

Concurrent prescriptions for opioids and benzodiazepines and risk of opioid overdose: protocol for a retrospective cohort study using linked administrative data

Erin Y Liu

Robyn Tamblyn

Kristian B Filion

David L Buckeridge

2021-02-17

BMJ Open (published)

doi.org

Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing Its Gradient Estimator Bias

Axel Laborieux

Maxence Ernoult

Benjamin Scellier

Yoshua Bengio

Julie Grollier

Damien Querlioz

2021-02-17

Frontiers in Neuroscience (published)

doi.org

arxiv.org

Smart Futures Based Resource Trading and Coalition Formation for Real-Time Mobile Data Processing

Ruitao Chen

Xianbin Wang

Xue Liu

Collaboration among mobile devices (MDs) is becoming more important, as it could augment computing capacity at the network edge through peer… (see more)-to-peer service provisioning, and directly enhance real-time computational performance in smart Internet-of-Things applications. As an important aspect of collaboration mechanism, conventional resource trading (RT) among MDs relies on an onsite interaction process, i.e., price negotiation between service providers and requesters, which, however, inevitably incurs excessive latency and degrades RT efficiency. To overcome this challenge, this article adopts the concept of futures contract (FC) used in financial market, and proposes a smart futures for low latency RT. This new technique enables MDs to form trading coalitions and negotiate multilateral forward contracts applied to a collaboration term in the future. To maximize the benefits of self-interested MDs, the negotiation process of FC is modelled as a coalition formation game comprised of three components executed in an iterative manner, i.e., futures resource allocation, revenue sharing and payment allocation, and distributed decision-making of individual MD. Additionally, a FC enforcement scheme is implemented to efficiently manage the onsite resource sharing via recording resource balances of different task-types and MDs. Simulation results prove the superiority of smart futures in RT latency reduction and trading fairness provisioning.

2021-02-17

IEEE Transactions on Services Computing (published)

doi.org

Bridging the Gap Between Adversarial Robustness and Optimization Bias

Fartash Faghri

Cristina Vasconcelos

David J Fleet

Fabian Pedregosa

Nicolas Roux

2021-02-16

ArXiv (preprint)

arxiv.org

Structured Sparsity Inducing Adaptive Optimizers for Deep Learning

Tristan Deleu

Yoshua Bengio

The parameters of a neural network are naturally organized in groups, some of which might not contribute to its overall performance. To prun… (see more)e out unimportant groups of parameters, we can include some non-differentiable penalty to the objective function, and minimize it using proximal gradient methods. In this paper, we derive the weighted proximal operator, which is a necessary component of these proximal methods, of two structured sparsity inducing penalties. Moreover, they can be approximated efficiently with a numerical solver, and despite this approximation, we prove that existing convergence guarantees are preserved when these operators are integrated as part of a generic adaptive proximal method. Finally, we show that this adaptive method, together with the weighted proximal operators derived here, is indeed capable of finding solutions with structure in their sparsity patterns, on representative examples from computer vision and natural language processing.

2021-02-06

ArXiv (preprint)

arxiv.org

TRAIL: Responsible AI for Professionals and Leaders

Mila Ventures Founder in Residence

AI Advantage: Productivity in Public Service

Publications

TRAIL: Responsible AI for Professionals and Leaders

Mila Ventures Founder in Residence

AI Advantage: Productivity in Public Service

Popular keywords:

Publications