Publications

On the Effectiveness of Two-Step Learning for Latent-Variable Models

Maxime Gasse

Latent-variable generative models offer a principled solution for modeling and sampling from complex probability distributions. Implementing a joint training objective with a complex prior, however, can be a tedious task, as one is typically required to derive and code a specific cost function for each new type of prior distribution. In this work, we propose a general framework for learning latent-variable generative models in a two-step fashion. In the first step of the framework, we train an autoencoder, and in the second step we fit a prior model on the resulting latent distribution. This two-step approach offers a convenient alternative to joint training, as it allows for a straightforward combination of existing models without the hassle of deriving new cost functions and coding the joint training objectives. Through a set of experiments, we demonstrate that two-step learning achieves performance similar to joint training, and in some cases yields more accurate modeling.

2019-12-31

2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP) (published)

doi.org

On the interplay between noise and curvature and its effect on optimization and generalization

Valentin Thomas

Fabian Pedregosa

Bart Van Merriënboer

Pierre-Antoine Mangazol

Yoshua Bengio

Nicolas Le Roux

The speed at which one can minimize an expected loss using stochastic methods depends on two properties: the curvature of the loss and the variance of the gradients. While most previous works focus on one or the other of these properties, we explore how their interaction affects optimization speed. Further, as the ultimate goal is good generalization performance, we clarify how both curvature and noise are relevant to properly estimate the generalization gap. Realizing that the limitations of some existing works stem from a confusion between these matrices, we also clarify the distinction between the Fisher matrix, the Hessian, and the covariance matrix of the gradients.

2019-12-31

AISTATS (published)

doi.org

proceedings.mlr.press

On the Morality of Artificial Intelligence

Alexandra Luccioni

Yoshua Bengio

Much of the existing research on the social and ethical impact of Artificial Intelligence has been focused on defining ethical principles and guidelines surrounding Machine Learning (ML) and other Artificial Intelligence (AI) algorithms [IEEE, 2017, Jobin et al., 2019]. While this is extremely useful for helping define the appropriate social norms of AI, we believe that it is equally important to discuss both the potential and risks of ML and to inspire the community to use ML for beneficial objectives. In the present article, which is specifically aimed at ML practitioners, we thus focus more on the latter, carrying out an overview of existing high-level ethical frameworks and guidelines, but above all proposing both conceptual and practical principles and guidelines for ML research and deployment, insisting on concrete actions that can be taken by practitioners to pursue a more ethical and moral practice of ML aimed at using AI for social good.

2019-12-31

IEEE Technology and Society Magazine (published)

doi.org

arxiv.org

On the Systematicity of Probing Contextualized Word Representations: The Case of Hypernymy in BERT.

Abhilasha Ravichander

Eduard Hovy

Kaheer Suleman

Adam Trischler

Jackie CK Cheung

2019-12-31

*SEM@COLING (published)

dblp.uni-trier.de

The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget

Anirudh Goyal

Yoshua Bengio

Matthew Botvinick

Sergey Levine

In many applications, it is desirable to extract only the relevant information from complex input data, which involves making a decision about which input features are relevant. The information bottleneck method formalizes this as an information-theoretic optimization problem by maintaining an optimal tradeoff between compression (throwing away irrelevant input information) and predicting the target. In many problem settings, including the reinforcement learning problems we consider in this work, we might prefer to compress only part of the input. This is typically the case when we have a standard conditioning input, such as a state observation, and a "privileged" input, which might correspond to the goal of a task, the output of a costly planning algorithm, or communication with another agent. In such cases, we might prefer to compress the privileged input, either to achieve better generalization (e.g., with respect to goals) or to minimize access to costly information (e.g., in the case of communication). Practical implementations of the information bottleneck based on variational inference require access to the privileged input in order to compute the bottleneck variable, so although they perform compression, this compression operation itself needs unrestricted, lossless access. In this work, we propose the variational bandwidth bottleneck, which decides for each example on the estimated value of the privileged information before seeing it, i.e., only based on the standard input, and then accordingly chooses stochastically whether to access the privileged input or not. We formulate a tractable approximation to this framework and demonstrate in a series of reinforcement learning experiments that it can improve generalization and reduce access to computationally costly information.

2019-12-31

ICLR (published)

doi.org

openreview.net

Differential functional neural circuitry behind autism subtypes with marked imbalance between social-communicative and restricted repetitive behavior symptom domains

Natasha Bertelsen

Isotta Landi

Richard A.I. Bethlehem

Jakob Seidlitz

Elena Maria Busuoli

Veronica Mandelli

Eleonora Satta

Stavros Trakoshis

Bonnie Auyeung

Prantik Kundu

Eva Loth

Guillaume Dumas

Sarah Baumeister

Christian Beckmann

Sven Bölte

Thomas Bourgeron

Tony Charman

Sarah Durston

Christine Ecker

Rosemary Holt

Mark Johnson

Emily J. H. Jones

Luke Mason

Andreas Meyer-Lindenberg

Carolin Moessnang

Marianne Oldehinkel

Antonio Persico

Julián Tillmann

Steven C. R. Williams

Will Spooren

Declan Murphy

Katherine Jan

Buitelaar

Simon Baron-Cohen

Meng-Chuan Lai

Michael V. Lombardo

Social-communication (SC) and restricted repetitive behaviors (RRB) are autism diagnostic symptom domains. SC and RRB severity can markedly differ within and between individuals and is underpinned by different neural circuitry and genetic mechanisms. Modeling SC-RRB balance could help identify how neural circuitry and genetic mechanisms map onto such phenotypic heterogeneity. Here we developed a phenotypic stratification model that makes highly accurate (96-98%) out-of-sample SC=RRB, SC>RRB, and RRB>SC subtype predictions. Applying this model to resting-state fMRI data from the EU-AIMS LEAP dataset (n=509), we find replicable somatomotor-perisylvian hypoconnectivity in the SC>RRB subtype versus a typically-developing (TD) comparison group. In contrast, replicable motor-anterior salience hyperconnectivity is apparent in the SC=RRB subtype versus TD. Autism-associated genes affecting astrocytes, excitatory, and inhibitory neurons are highly expressed specifically within SC>RRB hypoconnected networks, but not SC=RRB hyperconnected networks. SC-RRB balance subtypes may indicate different paths individuals take from genome, neural circuitry, to the clinical phenotype.

… (CIMH). Procedures were undertaken to optimize the MRI sequences for the best scanner-specific options, and phantoms and travelling heads were employed to assure standardization and quality assurance of the multisite image acquisition [20]. Structural images were obtained using a 5.5-minute MPRAGE sequence (TR=2300ms, TE=2.93ms, T1=900ms, voxel size=1.1x1.1x1.2mm, flip angle=9°, matrix size=256x256, FOV=270mm, 176 slices). An eight-to-ten-minute resting-state fMRI (rsfMRI) scan was acquired using a multi-echo planar imaging (ME-EPI) sequence [65,66]: TR=2300ms, TE~12ms, 31ms, and 48ms (slight variations are present across centers), flip angle=80°, matrix size=64x64, … (UMCU), 215 (KCL, CIMH), 265 (RUMC, UCAM). … were to relax, with eyes open, and fixate on a cross presented on the screen for the duration of the rsfMRI scan.

2019-12-31

(published)

www.semanticscholar.org

Toward Training Recurrent Neural Networks for Lifelong Learning.

Shagun Sodhani

Sarath Chandar

Yoshua Bengio

Catastrophic forgetting and capacity saturation are the central challenges of any parametric lifelong learning system. In this work, we study these challenges in the context of sequential supervised learning with an emphasis on recurrent neural networks. To evaluate the models in the lifelong learning setting, we propose a curriculum-based, simple, and intuitive benchmark where the models are trained on tasks with increasing levels of difficulty. To measure the impact of catastrophic forgetting, the model is tested on all the previous tasks as it completes any task. As a step toward developing true lifelong learning systems, we unify gradient episodic memory (a catastrophic forgetting alleviation approach) and Net2Net (a capacity expansion approach). Both models are proposed in the context of feedforward networks, and we evaluate the feasibility of using them for recurrent networks. Evaluation on the proposed benchmark shows that the unified model is more suitable than the constituent models for the lifelong learning setting.

2019-12-31

Neural Computation (published)

doi.org

Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims

Miles Brundage

Shahar Avin

Jasmine Wang

Haydn Belfield

Gretchen Krueger

Gillian Hadfield

Heidy Khlaaf

Jingying Yang

Helen Toner

Ruth Fong

Tegan Maharaj

Pang Wei Koh

Sara Hooker

Jade Leung

Andrew Trask

Emma Bluemke

Jonathan Lebensold

Cullen O'Keefe

Mark Koren

Théo Ryffel

JB Rubinovitz

Tamay Besiroglu

Federica Carugati

Jack Clark

Peter Eckersley

Sarah de Haas

Maritza Johnson

Ben Laurie

Alex Ingerman

Igor Krawczuk

Amanda Askell

Rosario Cammarota

Andrew Lohn

David Krueger

Charlotte Stix

Peter Henderson

Logan Graham

Carina Prunkl

Bianca Martin

Elizabeth Seger

Noa Zilberman

Seán Ó hÉigeartaigh

Frens Kroeger

Girish Sastry

Rebecca Kagan

Adrian Weller

Brian Tse

Elizabeth Barnes

Allan Dafoe

Paul Scharre

Ariel Herbert-Voss

Martijn Rasser

Shagun Sodhani

Carrick Flynn

Thomas Krendl Gilbert

Lisa Dyer

Saif Khan

Yoshua Bengio

Markus Anderljung

With the recent wave of progress in artificial intelligence (AI) has come a growing awareness of the large-scale impacts of AI systems, and recognition that existing regulations and norms in industry and academia are insufficient to ensure responsible AI development. In order for AI developers to earn trust from system users, customers, civil society, governments, and other stakeholders that they are building AI responsibly, they will need to make verifiable claims to which they can be held accountable. Those outside of a given organization also need effective means of scrutinizing such claims. This report suggests various steps that different stakeholders can take to improve the verifiability of claims made about AI systems and their associated development processes, with a focus on providing evidence about the safety, security, fairness, and privacy protection of AI systems. We analyze ten mechanisms for this purpose--spanning institutions, software, and hardware--and make recommendations aimed at implementing, exploring, or improving those mechanisms.

2019-12-31

arXiv (preprint)

doi.org

arxiv.org

Towards Queryable and Traceable Domain Models

Rijul Saini

Gunter Mussbacher

Jin L.C. Guo

Jörg Kienzle

Model-Driven Software Engineering encompasses various modelling formalisms for supporting software development. One such formalism is domain modelling, which bridges the gap between requirements expressed in natural language and analyzable and more concise domain models expressed in class diagrams. Due to the lack of modelling skills among novice modellers and time constraints in industrial projects, it is often not possible to build an accurate domain model manually. To address this challenge, we aim to develop an approach to extract domain models from problem descriptions written in natural language by combining rules based on natural language processing with machine learning. As a first step, we report on an automated and tool-supported approach that extracts domain models with higher accuracy than existing approaches. In addition, the approach generates trace links for each model element of a domain model. The trace links enable novice modellers to execute queries on the extracted domain models to gain insights into the modelling decisions taken, helping them improve their modelling skills. Furthermore, to evaluate our approach, we propose a novel comparison metric and discuss our experimental design. Finally, we present a research agenda detailing research directions and discuss corresponding challenges.

2019-12-31

2020 IEEE 28th International Requirements Engineering Conference (RE) (published)

doi.org

Towards robust and replicable sex differences in the intrinsic brain function of autism

Dorothea L. Floris

José O. A. Filho

Meng-Chuan Lai

Steve Giavasis

Marianne Oldehinkel

Maarten Mennes

Tony Charman

Julián Tillmann

Guillaume Dumas

Christine Ecker

Flavio Dell’Acqua

Tobias Banaschewski

Carolin Moessnang

Simon Baron-Cohen

Sarah Durston

Eva Loth

Declan Murphy

Jan K. Buitelaar

Christian Beckmann

Michael P. Milham

A. Martino

Background: Marked sex differences in autism prevalence accentuate the need to understand the role of biological sex-related factors in autism. Efforts to unravel sex differences in the brain organization of autism have, however, been challenged by the limited availability of female data. Methods: We addressed this gap by using a large sample of males and females with autism and neurotypical (NT) control individuals (ABIDE; Autism: 362 males, 82 females; NT: 409 males, 166 females; 7-18 years). Discovery analyses examined main effects of diagnosis, sex and their interaction across five resting-state fMRI (R-fMRI) metrics (voxel-level Z > 3.1, cluster-level P < 0.01, Gaussian random field corrected). Secondary analyses assessed the robustness of the results to different pre-processing approaches and their replicability in two independent samples: the EU-AIMS Longitudinal European Autism Project (LEAP) and the Gender Explorations of Neurogenetics and Development to Advance Autism Research (GENDAAR). Results: Discovery analyses in ABIDE revealed significant main effects across the intrinsic functional connectivity (iFC) of the posterior cingulate cortex, regional homogeneity and voxel-mirrored homotopic connectivity (VMHC) in several cortical regions, largely converging in the default network midline. Sex-by-diagnosis interactions were confined to the dorsolateral occipital cortex, with reduced VMHC in females with autism. All findings were robust to different pre-processing steps. Replicability in independent samples varied by R-fMRI measures and effects, with the targeted sex-by-diagnosis interaction being replicated in the larger of the two replication samples, EU-AIMS LEAP. Limitations: Given the lack of a priori harmonization among the discovery and replication datasets available to date, sample-related variation remained and may have affected replicability. Conclusions: Atypical cross-hemispheric interactions are neurobiologically relevant to autism. They likely result from the combination of sex…

2019-12-31

(published)

www.semanticscholar.org

Université de Montréal Balancing Signals for Semi-Supervised Sequence Learning

Ya Xu

Yoshua Bengio

Christopher Pal

Aaron Courville

Training recurrent neural networks (RNNs) on long sequences using backpropagation through time (BPTT) remains a fundamental challenge. It has been shown that adding a local unsupervised loss term into the optimization objective makes the training of RNNs on long sequences more effective. While the importance of an unsupervised task can in principle be controlled by a coefficient in the objective function, the gradients with respect to the unsupervised loss term still influence all the hidden state dimensions, which might cause important information about the supervised task to be degraded or erased. Compared to existing semi-supervised sequence learning methods, this thesis focuses upon a traditionally overlooked mechanism: an architecture with explicitly designed private and shared hidden units that mitigates the detrimental influence of the auxiliary unsupervised loss over the main supervised task. We achieve this by dividing the RNN hidden space into a private space for the supervised task and a shared space for both the supervised and unsupervised tasks. We present extensive experiments with the proposed framework on several long sequence modeling benchmark datasets. Results indicate that the proposed framework can yield performance gains in RNN models where long-term dependencies are notoriously challenging to deal with.

2019-12-31

(published)

www.semanticscholar.org

Unsupervised Learning of Dense Visual Representations

Pedro O. Pinheiro

Amjad Almahairi

Ryan Y. Benmalek

Florian Golemo

Aaron Courville

Contrastive self-supervised learning has emerged as a promising approach to unsupervised visual representation learning. In general, these methods learn global (image-level) representations that are invariant to different views (i.e., compositions of data augmentation) of the same image. However, many visual understanding tasks require dense (pixel-level) representations. In this paper, we propose View-Agnostic Dense Representation (VADeR) for unsupervised learning of dense representations. VADeR learns pixelwise representations by forcing local features to remain constant over different viewing conditions. Specifically, this is achieved through pixel-level contrastive learning: matching features (that is, features that describe the same location of the scene on different views) should be close in an embedding space, while non-matching features should be apart. VADeR provides a natural representation for dense prediction tasks and transfers well to downstream tasks. Our method outperforms ImageNet supervised pretraining (and strong unsupervised baselines) in multiple dense prediction tasks.

2019-12-31

Advances in Neural Information Processing Systems 33 (NeurIPS 2020) (published)

doi.org

arxiv.org
