Anirudh Goyal

Inductive biases for deep learning of higher-level cognition

Anirudh Goyal

Yoshua Bengio

2022-10-12

Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences (published)

doi.org

arxiv.org

On the Generalization and Adaption Performance of Causal Models

Nan Rosemary Ke

2022-07-20

ICML.cc/2022/Workshop/SCIS (poster)

doi.org

openreview.net

Uniform Priors for Data-Efficient Learning

Samarth Sinha

Karsten Roth

Anirudh Goyal

Marzyeh Ghassemi

Zeynep Akata

Hugo Larochelle

Animesh Garg

Few or zero-shot adaptation to novel tasks is important for the scalability and deployment of machine learning models. It is therefore crucial to find properties that encourage more transferable features in deep networks for generalization. In this paper, we show that models that learn uniformly distributed features from the training data are able to perform better transfer learning at test time. Motivated by this, we evaluate our method, uniformity regularization (UR), on its ability to facilitate adaptation to unseen tasks and data on six distinct domains: Few-shot Learning with Images, Few-shot Learning with Language, Deep Metric Learning, Zero-Shot Domain Adaptation, Out-of-Distribution Classification, and Neural Radiance Fields. Across all experiments, we show that using UR, we are able to learn robust vision systems which consistently offer benefits over baselines trained without uniformity regularization and are able to achieve state-of-the-art performance in Deep Metric Learning and Few-shot Learning with images and language.

2022-06-19

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (published)

doi.org

Coordinating Policies Among Multiple Agents via an Intelligent Communication Channel

Dianbo Liu

Tianmin Shu

Michael Curtis Mozer

Nicolas Heess

Yoshua Bengio

In Multi-Agent Reinforcement Learning (MARL), specialized channels are often introduced that allow agents to communicate directly with one another. In this paper, we propose an alternative approach whereby agents communicate through an intelligent facilitator that learns to sift through and interpret signals provided by all agents to improve the agents’ collective performance. To ensure that this facilitator does not become a centralized controller, agents are incentivized to reduce their dependence on the messages it conveys, and the messages can only influence the selection of a policy from a fixed set, not instantaneous actions given the policy. We demonstrate the strength of this architecture over existing baselines on several cooperative MARL environments.

2022-05-21

arXiv (preprint)

doi.org

arxiv.org

Coordination Among Neural Modules Through a Shared Global Workspace

Anirudh Goyal

Aniket Rajiv Didolkar

Alex Lamb

Kartikeya Badola

Nan Rosemary Ke

Nasim Rahaman

Jonathan Binas

Charles Blundell

Michael Curtis Mozer

Yoshua Bengio

Deep learning has seen a movement away from representing examples with a monolithic hidden state towards a richly structured state. For example, Transformers segment by position, and object-centric architectures decompose images into entities. In all these architectures, interactions between different elements are modeled via pairwise interactions: Transformers make use of self-attention to incorporate information from other positions and object-centric architectures make use of graph neural networks to model interactions among entities. We consider how to improve on pairwise interactions in terms of global coordination and a coherent, integrated representation that can be used for downstream tasks. In cognitive science, a global workspace architecture has been proposed in which functionally specialized components share information through a common, bandwidth-limited communication channel. We explore the use of such a communication channel in the context of deep learning for modeling the structure of complex environments. The proposed method includes a shared workspace through which communication among different specialist modules takes place, but due to limits on the communication bandwidth, specialist modules must compete for access. We show that capacity limitations have a rational basis in that (1) they encourage specialization and compositionality and (2) they facilitate the synchronization of otherwise independent specialists.

2022-01-28

ICLR.cc/2022/Conference (oral presentation)

openreview.net

Biasly: a machine learning based platform for automatic racial discrimination detection in online texts

David Bamman

Chris Dyer

Noah A. Smith

Steven Bird

Ewan Klein

Edward Loper

Jacob Devlin

Ming-Wei Chang

Kenton Lee

Kristina Toutanova

Samuel Gehman

Suchin Gururangan

Maarten Sap

Dan Hendrycks

Kevin Gimpel

Alex Lamb

Di He

Anirudh Goyal

Guolin Ke

Feng Liao

Mirco Ravanelli

Yoshua Bengio

Zhenzhong Lan

Mingda Chen

Sebastian Goodman

Yann Lecun

Bernhard E. Boser

J. Denker

Donnie Henderson

Robin Howard

Wayne Hubbard

Yinhan Liu

Myle Ott

Naman Goyal

Jingfei Du

Mandar Joshi

Danqi Chen

Omer Levy

Mike Lewis

Warning: this paper contains content that may be offensive or upsetting. Detecting hateful, toxic, and otherwise racist or sexist language in user-generated online content has become an increasingly important task in recent years. Indeed, the anonymity, the transience, the size of messages, and the difficulty of management facilitate the diffusion of racist or hateful messages across the Internet. The critical influence of this cyber-racism is no longer limited to social media, but also has a significant effect on our society: corporate business operation, users’ health, crimes, etc. Traditional racist speech reporting channels have proven inadequate due to the enormous explosion of information, so there is an urgent need for a method to automatically and promptly detect texts with racial discrimination. In this work, we propose a machine learning-based approach to enable automatic detection of racist text content over the internet. State-of-the-art machine learning models that are able to grasp language structures are adapted in this study. Our main contributions include 1) a large-scale racial discrimination data set collected from three distinct sources and annotated according to a guideline developed by specialists, 2) a set of machine learning models with various architectures for racial discrimination detection, and 3) a web-browser-based software that assists users to debias their texts when using the internet. All these resources are made publicly available.

Discrete Compositional Representations as an Abstraction for Goal Conditioned Reinforcement Learning

Hongyu Zang

Xin Li

Romain Laroche

Yoshua Bengio

Remi Tachet des Combes

Goal-conditioned reinforcement learning (RL) is a promising direction for training agents that are capable of solving multiple tasks and reaching a diverse set of objectives. How to specify and ground these goals in such a way that we can both reliably reach goals during training and generalize to new goals during evaluation remains an open area of research. Defining goals in the space of noisy, high-dimensional sensory inputs is one possibility, yet this poses a challenge for training goal-conditioned agents, or even for generalization to novel goals. We propose to address this by learning compositional representations of goals and processing the resulting representation via a discretization bottleneck, for coarser specification of goals, through an approach we call DGRL. We show that discretizing outputs from goal encoders through a bottleneck can work well in goal-conditioned RL setups, by experimentally evaluating this method on tasks ranging from maze environments to complex robotic navigation and manipulation tasks. Additionally, we show a theoretical result which bounds the expected return for goals not observed during training, while still allowing for specifying goals with expressive combinatorial structure.

openreview.net

Discrete-Valued Neural Communication in Structured Architectures Enhances Generalization

Dianbo Liu

Alex Lamb

Kenji Kawaguchi

Anirudh Goyal

Chen Sun

Michael Curtis Mozer

Yoshua Bengio

In this appendix, as a complement to Theorems 1–2, we provide additional theorems, Theorems 3–4, which further illustrate the two advantages of the discretization process by considering an abstract model with the discretization bottleneck. For the advantage on the sensitivity, the error due to potential noise and perturbation without discretization (the third term $\xi(w, r', \mathcal{M}', d) > 0$ in Theorem 4) is shown to be minimized to zero with discretization in Theorem 3. For the second advantage, the underlying dimensionality of $\mathcal{N}_{(\mathcal{M}',d')}(r,\mathcal{H}) + \ln(\mathcal{N}_{(\mathcal{M},d)}(r,\Theta)/\delta)$ without discretization (in the bound of Theorem 4) is proven to be reduced to the typically much smaller underlying dimensionality of $L + \ln(\mathcal{N}_{(\mathcal{M},d)}(r, E \times \Theta))$ with discretization in Theorem 3. Here, for any metric space $(\mathcal{M}, d)$ and subset $M \subseteq \mathcal{M}$, the $r$-covering number of $M$ is defined by $\mathcal{N}_{(\mathcal{M},d)}(r,M) = \min\{ |C| : C \subseteq \mathcal{M},\ M \subseteq \cup_{c \in C} B_{(\mathcal{M},d)}[c, r] \}$, where the (closed) ball of radius $r$ centered at $c$ is denoted by $B_{(\mathcal{M},d)}[c, r] = \{x \in \mathcal{M} : d(x, c) \le r\}$. See Appendix C.1 for a simple comparison between the bound of Theorem 3 and that of Theorem 4 when the metric spaces $(\mathcal{M}, d)$ and $(\mathcal{M}', d')$ are chosen to be Euclidean spaces.
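To make the ball-cover quantity defined above concrete, here is a minimal Python sketch for a finite point set in Euclidean space. It is an illustration under stated assumptions, not part of the paper: the function name `greedy_cover` and the sample points are hypothetical, and the greedy procedure restricts centers to the points themselves, so the count it returns is only an upper bound on the minimum in the definition.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def greedy_cover(points: List[Point], r: float) -> List[Point]:
    """Pick centers from `points` until every point lies in some closed
    ball of radius r. The number of centers upper-bounds the minimum
    cover size; it is not guaranteed to be the minimum itself."""
    centers: List[Point] = []
    uncovered = list(points)
    while uncovered:
        center = uncovered[0]  # hypothetical choice: first uncovered point
        centers.append(center)
        # Keep only the points that the new closed ball does not reach.
        uncovered = [p for p in uncovered if math.dist(p, center) > r]
    return centers


# Hypothetical sample: three well-separated clusters in the plane.
points = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.2, 0.1), (10.0, 10.0)]
centers = greedy_cover(points, r=1.0)
print(len(centers))  # 3
```

In this sample the three clusters are more than 2r apart, so no single ball of radius 1 can reach two of them and the greedy count of 3 happens to equal the true minimum. In general only the upper-bound reading is safe.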

Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning

Aniket Rajiv Didolkar

Kshitij Gupta

Anirudh Goyal

Nitesh Bharadwaj Gundavarapu

Alex Lamb

Nan Rosemary Ke

Yoshua Bengio

openreview.net

Discrete-Valued Neural Communication

Dianbo Liu

Alex Lamb

Kenji Kawaguchi

Anirudh Goyal

Chen Sun

Michael Curtis Mozer

Yoshua Bengio

Deep learning has advanced from fully connected architectures to structured models organized into components, e.g., the transformer composed of positional elements, modular architectures divided into slots, and graph neural nets made up of nodes. In structured models, an interesting question is how to conduct dynamic and possibly sparse communication among the separate components. Here, we explore the hypothesis that restricting the transmitted information among components to discrete representations is a beneficial bottleneck. The motivating intuition is human language in which communication occurs through discrete symbols. Even though individuals have different understandings of what a "cat" is based on their specific experiences, the shared discrete token makes it possible for communication among individuals to be unimpeded by individual differences in internal representation. To discretize the values of concepts dynamically communicated among specialist components, we extend the quantization mechanism from the Vector-Quantized Variational Autoencoder to multi-headed discretization with shared codebooks and use it for discrete-valued neural communication (DVNC). Our experiments show that DVNC substantially improves systematic generalization in a variety of architectures: transformers, modular architectures, and graph neural networks. We also show that the DVNC is robust to the choice of hyperparameters, making the method very useful in practice. Moreover, we establish a theoretical justification of our discretization process, proving that it has the ability to increase noise robustness and reduce the underlying dimensionality of the model.
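The multi-headed discretization with a shared codebook mentioned above can be sketched roughly as follows. This is a forward-pass illustration only, assuming equal-length segments and squared Euclidean distance; the function `quantize_multi_head`, the toy codebook, and the message vector are hypothetical, and the training details of vector quantization (codebook learning, gradient handling) are deliberately left out.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize_multi_head(
    vector: Sequence[float],
    codebook: Sequence[Sequence[float]],
    num_heads: int,
) -> Tuple[List[float], List[int]]:
    """Split `vector` into `num_heads` equal segments and snap each
    segment to its nearest entry in one codebook shared by all heads."""
    if len(vector) % num_heads != 0:
        raise ValueError("vector length must be divisible by num_heads")
    seg_len = len(vector) // num_heads
    if any(len(code) != seg_len for code in codebook):
        raise ValueError("codebook entries must match the segment length")

    quantized: List[float] = []
    indices: List[int] = []
    for h in range(num_heads):
        segment = vector[h * seg_len:(h + 1) * seg_len]
        # Nearest-neighbour lookup in the shared codebook.
        best = min(
            range(len(codebook)),
            key=lambda i: squared_distance(segment, codebook[i]),
        )
        indices.append(best)
        quantized.extend(codebook[best])
    return quantized, indices


# Hypothetical toy values: three 2-d codes, a 4-d message, two heads.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
message = [0.9, 1.2, -0.8, 0.7]
quantized, indices = quantize_multi_head(message, codebook, num_heads=2)
print(indices)    # [1, 2]
print(quantized)  # [1.0, 1.0, -1.0, 1.0]
```

With two heads and three codes, the receiver sees one of at most 3 × 3 = 9 possible messages, which gives a rough sense of how the bottleneck limits what can be communicated.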

openreview.net

Neural Production Systems

Anirudh Goyal

Aniket Rajiv Didolkar

Nan Rosemary Ke

Charles Blundell

Philippe Beaudoin

Nicolas Heess

Michael Curtis Mozer

Yoshua Bengio

Visual environments are structured, consisting of distinct objects or entities. These entities have properties (visible or latent) that determine the manner in which they interact with one another. To partition images into entities, deep-learning researchers have proposed structural inductive biases such as slot-based architectures. To model interactions among entities, equivariant graph neural nets (GNNs) are used, but these are not particularly well suited to the task for two reasons. First, GNNs do not predispose interactions to be sparse, as relationships among independent entities are likely to be. Second, GNNs do not factorize knowledge about interactions in an entity-conditional manner. As an alternative, we take inspiration from cognitive science and resurrect a classic approach, production systems, which consist of a set of rule templates that are applied by binding placeholder variables in the rules to specific entities. Rules are scored on their match to entities, and the best fitting rules are applied to update entity properties. In a series of experiments, we demonstrate that this architecture achieves a flexible, dynamic flow of control and serves to factorize entity-specific and rule-based information. This disentangling of knowledge achieves robust future-state prediction in rich visual environments, outperforming state-of-the-art methods using GNNs, and allows for the extrapolation from simple (few object) environments to more complex environments.
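The score-then-apply loop described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `Rule` class, the dot-product match score, the hand-written update functions, and the choice to apply a single best rule per step are hypothetical simplifications, not the learned components used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = List[float]


@dataclass
class Rule:
    name: str
    key: Vector                         # hypothetical: pattern the rule matches
    update: Callable[[Vector], Vector]  # how the rule changes a bound entity


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def apply_best_rule(
    entities: List[Vector], rules: List[Rule]
) -> Tuple[List[Vector], str, int]:
    """Score every (rule, entity) pair and apply only the best match,
    leaving all other entities untouched (a sparse update)."""
    best_score, best_rule, best_entity = float("-inf"), 0, 0
    for ri, rule in enumerate(rules):
        for ei, entity in enumerate(entities):
            score = dot(rule.key, entity)
            if score > best_score:
                best_score, best_rule, best_entity = score, ri, ei
    updated = [list(e) for e in entities]
    updated[best_entity] = rules[best_rule].update(entities[best_entity])
    return updated, rules[best_rule].name, best_entity


# Hypothetical toy setup: entities are [x-feature, height] pairs.
rules = [
    Rule("move_right", key=[1.0, 0.0], update=lambda e: [e[0] + 1.0, e[1]]),
    Rule("fall", key=[0.0, 1.0], update=lambda e: [e[0], e[1] - 1.0]),
]
entities = [[1.0, 0.0], [0.0, 2.0]]
updated, rule_name, entity_index = apply_best_rule(entities, rules)
print(rule_name, entity_index, updated)  # fall 1 [[1.0, 0.0], [0.0, 1.0]]
```

Because only the selected entity changes, the sketch shows the sparsity argument in miniature: the rule carries the knowledge of how to update, while the entity carries the state being updated.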

openreview.net

Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning

Nan Rosemary Ke

Aniket Rajiv Didolkar

Danilo Jimenez Rezende

Yoshua Bengio

Michael Curtis Mozer

Chris Pal

Inducing causal relationships from observations is a classic problem in machine learning. Most work in causality starts from the premise that the causal variables themselves are observed. However, for AI agents such as robots trying to make sense of their environment, the only observables are low-level variables like pixels in images. To generalize well, an agent must induce high-level variables, particularly those which are causal or are affected by causal variables. A central goal for AI and causality is thus the joint discovery of abstract representations and causal structure. However, we note that existing environments for studying causal induction are poorly suited for this objective because they have complicated task-specific causal graphs which are impossible to manipulate parametrically (e.g., number of nodes, sparsity, causal chain length, etc.). In this work, our goal is to facilitate research in learning representations of high-level variables as well as causal structures among them. In order to systematically probe the ability of methods to identify these variables and structures, we design a suite of benchmarking RL environments. We evaluate various representation learning algorithms from the literature and find that explicitly incorporating structure and modularity in models can help causal induction in model-based reinforcement learning.

2021-10-11

NeurIPS.cc/2021/Track/Datasets_and_Benchmarks/Round2 (published)

openreview.net
