Publications

BabyAI 1.1

David Y. T. Hui

Maxime Chevalier-Boisvert

Dzmitry Bahdanau

Yoshua Bengio

CAMAP: Artificial neural networks unveil the role of 1 codon arrangement in modulating MHC-I peptides 2 presentation

Tariq Daouda

Maude Dumont-Lagacé

Albert Feghaly

Yahya Benslimane

6. Rébecca

Panes

Mathieu Courcelles

Mohamed Benhammadi

Lea Harrington

Pierre Thibault

François Major

Yoshua Bengio

Étienne Gagnon

Sébastien Lemieux

Claude Perreault

30 MHC-I associated peptides (MAPs) play a central role in the elimination of virus-infected and 31 neoplastic cells by CD8 T cells. However… (see more), accurately predicting the MAP repertoire remains 32 difficult, because only a fraction of the transcriptome generates MAPs. In this study, we 33 investigated whether codon arrangement (usage and placement) regulates MAP biogenesis. We 34 developed an artificial neural network called Codon Arrangement MAP Predictor (CAMAP), 35 predicting MAP presentation solely from mRNA sequences flanking the MAP-coding codons 36 (MCCs), while excluding the MCC per se . CAMAP predictions were significantly more accurate 37 when using original codon sequences than shuffled codon sequences which reflect amino acid 38 usage. Furthermore, predictions were independent of mRNA expression and MAP binding affinity 39 to MHC-I molecules and applied to several cell types and species. Combining MAP ligand scores, 40 transcript expression level and CAMAP scores was particularly useful to increaser MAP prediction 41 accuracy. Using an in vitro assay, we showed that varying the synonymous codons in the regions 42 flanking the MCCs (without changing the amino acid sequence) resulted in significant modulation 43 of MAP presentation at the cell surface. Taken together, our results demonstrate the role of codon 44 arrangement in the regulation of MAP presentation and support integration of both translational 45 and post-translational events in predictive algorithms to ameliorate modeling of the 46 immunopeptidome. 47 48 49 they modulated the levels of SIINFEKL presentation in both constructs, but enhanced translation efficiency could only be detected for OVA-RP. These data show that codon arrangement can modulate MAP presentation strength without any changes in the amino

CAMAP: Artificial neural networks unveil the role of 1 codon arrangement in modulating MHC-I peptides 2 presentation discovery of minor histocompatibility with

Tariq Daouda

Maude Dumont-Lagacé

Albert Feghaly

Yahya Benslimane

6. Rébecca

Panes

Mathieu Courcelles

Mohamed Benhammadi

Lea Harrington

Pierre Thibault

François Major

Yoshua Bengio

Étienne Gagnon

Sébastien Lemieux

Claude Perreault

30 MHC-I associated peptides (MAPs) play a central role in the elimination of virus-infected and 31 neoplastic cells by CD8 T cells. However… (see more), accurately predicting the MAP repertoire remains 32 difficult, because only a fraction of the transcriptome generates MAPs. In this study, we 33 investigated whether codon arrangement (usage and placement) regulates MAP biogenesis. We 34 developed an artificial neural network called Codon Arrangement MAP Predictor (CAMAP), 35 predicting MAP presentation solely from mRNA sequences flanking the MAP-coding codons 36 (MCCs), while excluding the MCC per se . CAMAP predictions were significantly more accurate 37 when using original codon sequences than shuffled codon sequences which reflect amino acid 38 usage. Furthermore, predictions were independent of mRNA expression and MAP binding affinity 39 to MHC-I molecules and applied to several cell types and species. Combining MAP ligand scores, 40 transcript expression level and CAMAP scores was particularly useful to increaser MAP prediction 41 accuracy. Using an in vitro assay, we showed that varying the synonymous codons in the regions 42 flanking the MCCs (without changing the amino acid sequence) resulted in significant modulation 43 of MAP presentation at the cell surface. Taken together, our results demonstrate the role of codon 44 arrangement in the regulation of MAP presentation and support integration of both translational 45 and post-translational events in predictive algorithms to ameliorate modeling of the 46 immunopeptidome. 47 48 49 they modulated the levels of SIINFEKL presentation in both constructs, but enhanced translation efficiency could only be detected for OVA-RP. These data show that codon arrangement can modulate MAP presentation strength without any changes in the amino

A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

Harry Zhao

Mingde Zhao

Zhen Liu

Sitao Luan

Shuyuan Zhang

Doina Precup

Yoshua Bengio

We present an end-to-end, model-based deep reinforcement learning agent which dynamically attends to relevant parts of its state during plan… (see more)ning. The agent uses a bottleneck mechanism over a set-based representation to force the number of entities to which the agent attends at each planning step to be small. In experiments, we investigate the bottleneck mechanism with several sets of customized environments featuring different challenges. We consistently observe that the design allows the planning agents to generalize their learned task-solving abilities in compatible unseen environments by attending to the relevant objects, leading to better out-of-distribution generalization performance.

openreview.net

Cooperative Semi-Supervised Transfer Learning of Machine Reading Comprehension

Oliver Bender

Franz Josef Och

Yoshua Bengio

R´ejean Ducharme

Pascal Vincent

Kevin Clark

Quoc Minh-Thang Luong

V. Le

Jacob Devlin

Ming-Wei Chang

Kenton Lee

Adam Fisch

Alon Talmor

Robin Jia

Minjoon Seo

Michael R. Glass

A. Gliozzo

Rishav Chakravarti

Ian J Goodfellow

Jean Pouget-Abadie … (see 39 more)

Mehdi Mirza

Serhii Havrylov

Ivan Titov. 2017

Emergence

Jun-Tao He

Jiatao Gu

Jiajun Shen

Marc’Aurelio

Matthew Henderson

I. Casanueva

Nikola Mrkˇsi´c

Pei-hao Su

Tsung-Hsien Wen

Ivan Vuli´c

Yikang Shen

Yi Tay

Che Zheng

Dara Bahri

Donald

Metzler Aaron

Courville

Structformer

Ashish Vaswani

Noam M. Shazeer

Niki Parmar

Thomas Wolf

Lysandre Debut

Julien Victor Sanh

Clement Chaumond

Anthony Delangue

Pier-339 Moi

Tim ric Cistac

R´emi Rault

Morgan Louf

Qizhe Xie

Eduard H. Hovy

Silei Xu

Sina Jandaghi Semnani

Giovanni Campagna

Pretrained language models have signiﬁcantly 001 improved the performance of down-stream 002 language understanding tasks, including ex-00… (see more)3 tractive question answering, by providing 004 high-quality contextualized word embeddings. 005 However, training question answering models 006 still requires large amounts of annotated data 007 for speciﬁc domains. In this work, we pro-008 pose a cooperative, self-play learning frame-009 work, REGEX, for automatically generating 010 more non-trivial question-answer pairs to im-011 prove model performance. REGEX is built 012 upon a masked answer extraction task with an 013 interactive learning environment containing an 014 answer entity REcognizer, a question Gener-015 ator, and an answer EXtractor. Given a pas-016 sage with a masked entity, the generator gen-017 erates a question around the entity, and the 018 extractor is trained to extract the masked en-019 tity with the generated question and raw texts. 020 The framework allows the training of question 021 generation and answering models on any text 022 corpora without annotation. We further lever-023 age a reinforcement learning technique to re-024 ward generating high-quality questions and to 025 improve the answer extraction model’s perfor-026 mance. Experiment results show that REGEX 027 outperforms the state-of-the-art (SOTA) pre-028 trained language models and transfer learning 029 approaches on standard question-answering 030 benchmarks, and yields the new SOTA per-031 formance under given model size and transfer 032 learning settings. 033

Dynamic Inference with Neural Interpreters

Nasim Rahaman

Muhammad Waleed Gondal

Shruti Joshi

Peter Vincent Gehler

Yoshua Bengio

Francesco Locatello

Bernhard Schölkopf

Modern neural network architectures can leverage large amounts of data to generalize well within the training distribution. However, they ar… (see more)e less capable of systematic generalization to data drawn from unseen but related distributions, a feat that is hypothesized to require compositional reasoning and reuse of knowledge. In this work, we present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules, which we call _functions_. Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned. The proposed architecture can flexibly compose computation along width and depth, and lends itself well to capacity extension after training. To demonstrate the versatility of Neural Interpreters, we evaluate it in two distinct settings: image classification and visual abstract reasoning on Raven Progressive Matrices. In the former, we show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferrable to a new task in a sample efficient manner. In the latter, we find that Neural Interpreters are competitive with respect to the state-of-the-art in terms of systematic generalization.

openreview.net

An End-to-End Framework for Molecular Conformation Generation via Bilevel Programming

Minkai Xu

Wujie Wang

Shitong Luo

Chence Shi

Yoshua Bengio

Rafael G'omez-bombarelli

Jian Tang

Predicting molecular conformations (or 3D structures) from molecular graphs is a fundamental problem in many applications. Most existing app… (see more)roaches are usually divided into two steps by first predicting the distances between atoms and then generating a 3D structure through optimizing a distance geometry problem. However, the distances predicted with such two-stage approaches may not be able to consistently preserve the geometry of local atomic neighborhoods, making the generated structures unsatisfying. In this paper, we propose an end-to-end solution for molecular conformation prediction called ConfVAE based on the conditional variational autoencoder framework. Specifically, the molecular graph is first encoded in a latent space, and then the 3D structures are generated by solving a principled bilevel optimization program. Extensive experiments on several benchmark data sets prove the effectiveness of our proposed approach over existing state-of-the-art approaches. Code is available at https://github.com/MinkaiXu/ConfVAE-ICML21.

2021-01-01

ICML (published)

proceedings.mlr.press

arxiv.org

Episodes Meta Sequence S 2 Fast Update Slow Update Fast Update Slow Update

Kanika Madan

Nan Rosemary Ke

Anirudh Goyal

Bernhard Schölkopf

Yoshua Bengio

Decomposing knowledge into interchangeable pieces promises a generalization advantage when there are changes in distribution. A learning age… (see more)nt interacting with its environment is likely to be faced with situations requiring novel combinations of existing pieces of knowledge. We hypothesize that such a decomposition of knowledge is particularly relevant for being able to generalize in a systematic manner to out-of-distribution changes. To study these ideas, we propose a particular training framework in which we assume that the pieces of knowledge an agent needs and its reward function are stationary and can be re-used across tasks. An attention mechanism dynamically selects which modules can be adapted to the current task, and the parameters of the selected modules are allowed to change quickly as the learner is confronted with variations in what it experiences, while the parameters of the attention mechanisms act as stable, slowly changing, metaparameters.We focus on pieces of knowledge captured by an ensemble of modules sparsely communicating with each other via a bottleneck of attention. We find that meta-learning the modular aspects of the proposed system greatly helps in achieving faster adaptation in a reinforcement learning setup involving navigation in a partially observed grid world with image-level input. We also find that reversing the role of parameters and meta-parameters does not work nearly as well, suggesting a particular role for fast adaptation of the dynamically selected modules.

Exploring the Wasserstein metric for survival analysis

Tristan Sylvain

Margaux Luck

Joseph Paul Cohen

Andrea Lodi

Yoshua Bengio

Survival analysis is a type of semi-supervised task where the target output (the survival time) is often right-censored. Utilizing this info… (see more)rmation is a challenge because it is not obvious how to correctly incorporate these censored examples into a model. We study how three categories of loss functions can take advantage of this information: partial likelihood methods, rank methods, and our own classiﬁcation method based on a Wasserstein metric (WM) and the non-parametric Kaplan Meier (KM) estimate of the probability density to impute the labels of censored examples. The proposed method predicts the probability distribution of an event, letting us compute survival curves and expected times of survival that are easier to interpret than the rank. We also demonstrate that this approach directly optimizes the expected C-index which is the most common evaluation metric for survival models.

Exploring the Wasserstein metric for time-to-event analysis.

Tristan Sylvain

Margaux Luck

Joseph Paul Cohen

Heloise Cardinal

Andrea Lodi

Yoshua Bengio

2021-01-01

SPACA (published)

proceedings.mlr.press

Factorizing Declarative and Procedural Knowledge in Structured, Dynamical Environments

Anirudh Goyal

Alex Lamb

Phanideep Gampa

Philippe Beaudoin

Charles Blundell

Sergey Levine

Yoshua Bengio

Michael Curtis Mozer

2021-01-01

ICLR (published)

openreview.net

Fast and Slow Learning of Recurrent Independent Mechanisms

Kanika Madan

Nan Rosemary Ke

Anirudh Goyal

Bernhard Schölkopf

Yoshua Bengio

Decomposing knowledge into interchangeable pieces promises a generalization advantage when there are changes in distribution. A learning age… (see more)nt interacting with its environment is likely to be faced with situations requiring novel combinations of existing pieces of knowledge. We hypothesize that such a decomposition of knowledge is particularly relevant for being able to generalize in a systematic manner to out-of-distribution changes. To study these ideas, we propose a particular training framework in which we assume that the pieces of knowledge an agent needs and its reward function are stationary and can be re-used across tasks. An attention mechanism dynamically selects which modules can be adapted to the current task, and the parameters of the selected modules are allowed to change quickly as the learner is confronted with variations in what it experiences, while the parameters of the attention mechanisms act as stable, slowly changing, meta-parameters. We focus on pieces of knowledge captured by an ensemble of modules sparsely communicating with each other via a bottleneck of attention. We find that meta-learning the modular aspects of the proposed system greatly helps in achieving faster adaptation in a reinforcement learning setup involving navigation in a partially observed grid world with image-level input. We also find that reversing the role of parameters and meta-parameters does not work nearly as well, suggesting a particular role for fast adaptation of the dynamically selected modules.

2021-01-01

ICLR (published)

openreview.net

AI Advantage

Leveraging AI for a Sustainable Future

Mila AI Policy Fellowship

AI Advantage

Leveraging AI for a Sustainable Future

Publications

AI Advantage

Leveraging AI for a Sustainable Future

Mila AI Policy Fellowship

AI Advantage

Leveraging AI for a Sustainable Future

Popular keywords:

Publications