
Anirudh Goyal

Alumni

Publications

Inductive biases for deep learning of higher-level cognition
On the Generalization and Adaption Performance of Causal Models
Uniform Priors for Data-Efficient Learning
Samarth Sinha
Marzyeh Ghassemi
Zeynep Akata
Animesh Garg
Few or zero-shot adaptation to novel tasks is important for the scalability and deployment of machine learning models. It is therefore crucial to find properties that encourage more transferable features in deep networks for generalization. In this paper, we show that models that learn uniformly distributed features from the training data are able to perform better transfer learning at test time. Motivated by this, we evaluate our method, uniformity regularization (UR), on its ability to facilitate adaptation to unseen tasks and data on six distinct domains: Few-shot Learning with Images, Few-shot Learning with Language, Deep Metric Learning, Zero-shot Domain Adaptation, Out-of-Distribution Classification, and Neural Radiance Fields. Across all experiments, we show that using UR we are able to learn robust vision systems that consistently offer benefits over baselines trained without uniformity regularization, and that achieve state-of-the-art performance in Deep Metric Learning and Few-shot Learning with images and language.
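A minimal sketch of such a regularizer (assuming L2-normalized features and the Gaussian-potential form of uniformity; the function name and weighting are illustrative, not taken from the paper):

```python
import torch
import torch.nn.functional as F

def uniformity_loss(features: torch.Tensor, t: float = 2.0) -> torch.Tensor:
    """Encourage L2-normalized features to spread uniformly over the unit
    hypersphere by penalizing pairwise proximity."""
    z = F.normalize(features, dim=-1)           # project features onto the unit sphere
    sq_dists = torch.pdist(z, p=2).pow(2)       # all pairwise squared distances
    return sq_dists.mul(-t).exp().mean().log()  # log of the mean Gaussian potential

# Hypothetical usage: total = task_loss + lambda_ur * uniformity_loss(encoder(x))
```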
Coordinating Policies Among Multiple Agents via an Intelligent Communication Channel
Dianbo Liu
Tianmin Shu
Michael Curtis Mozer
Nicolas Heess
In Multi-Agent Reinforcement Learning (MARL), specialized channels are often introduced that allow agents to communicate directly with one another. In this paper, we propose an alternative approach whereby agents communicate through an intelligent facilitator that learns to sift through and interpret signals provided by all agents to improve the agents’ collective performance. To ensure that this facilitator does not become a centralized controller, agents are incentivized to reduce their dependence on the messages it conveys, and the messages can only influence the selection of a policy from a fixed set, not instantaneous actions given the policy. We demonstrate the strength of this architecture over existing baselines on several cooperative MARL environments.
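A hypothetical sketch of such a facilitator (class and tensor names are assumptions, not the paper's code): it pools all agents' messages with attention and emits, per agent, logits over a fixed set of policies, so the channel can steer policy selection but never low-level actions.

```python
import torch
import torch.nn as nn

class Facilitator(nn.Module):
    """Pool messages from all agents with self-attention, then emit a
    per-agent signal that selects a policy from a fixed set."""
    def __init__(self, msg_dim: int, n_policies: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(msg_dim, n_heads, batch_first=True)
        self.to_logits = nn.Linear(msg_dim, n_policies)

    def forward(self, messages: torch.Tensor) -> torch.Tensor:
        # messages: (batch, n_agents, msg_dim)
        pooled, _ = self.attn(messages, messages, messages)
        return self.to_logits(pooled)  # (batch, n_agents, n_policies)
```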
Coordination Among Neural Modules Through a Shared Global Workspace
Aniket Rajiv Didolkar
Nan Rosemary Ke
Nasim Rahaman
Charles Blundell
Michael Curtis Mozer
Deep learning has seen a movement away from representing examples with a monolithic hidden state towards a richly structured state. For example, Transformers segment by position, and object-centric architectures decompose images into entities. In all these architectures, interactions between different elements are modeled via pairwise interactions: Transformers make use of self-attention to incorporate information from other positions and object-centric architectures make use of graph neural networks to model interactions among entities. We consider how to improve on pairwise interactions in terms of global coordination and a coherent, integrated representation that can be used for downstream tasks. In cognitive science, a global workspace architecture has been proposed in which functionally specialized components share information through a common, bandwidth-limited communication channel. We explore the use of such a communication channel in the context of deep learning for modeling the structure of complex environments. The proposed method includes a shared workspace through which communication among different specialist modules takes place but due to limits on the communication bandwidth, specialist modules must compete for access. We show that capacity limitations have a rational basis in that (1) they encourage specialization and compositionality and (2) they facilitate the synchronization of otherwise independent specialists.
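A rough sketch of the bandwidth-limited mechanism described above (dimensions, the top-k competition rule, and all names are assumptions for illustration):

```python
import torch
import torch.nn as nn

class SharedWorkspace(nn.Module):
    """Specialists compete to write into a small shared memory (only the
    top-k scorers get write access); the memory is then broadcast back."""
    def __init__(self, dim: int, n_slots: int = 8, k: int = 2):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(n_slots, dim))
        self.write = nn.MultiheadAttention(dim, 1, batch_first=True)
        self.read = nn.MultiheadAttention(dim, 1, batch_first=True)
        self.k = k

    def forward(self, specialists: torch.Tensor) -> torch.Tensor:
        # specialists: (batch, n_specialists, dim)
        b = specialists.size(0)
        mem = self.memory.expand(b, -1, -1)
        _, scores = self.write(mem, specialists, specialists)   # write-attention scores
        topk = scores.mean(dim=1).topk(self.k, dim=-1).indices  # winners of competition
        winners = torch.gather(
            specialists, 1,
            topk.unsqueeze(-1).expand(-1, -1, specialists.size(-1)))
        mem, _ = self.write(mem, winners, winners)              # bandwidth-limited write
        out, _ = self.read(specialists, mem, mem)               # broadcast back
        return out
```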
Biasly: a machine learning based platform for automatic racial discrimination detection in online texts
Warning: this paper contains content that may be offensive or upsetting. Detecting hateful, toxic, and otherwise racist or sexist language in user-generated online content has become an increasingly important task in recent years. Indeed, the anonymity, the transience, the size of messages, and the difficulty of management facilitate the diffusion of racist or hateful messages across the Internet. The critical influence of this cyber-racism is no longer limited to social media, but also has a significant effect on our society: corporate business operations, users’ health, crimes, etc. Traditional racist speech reporting channels have proven inadequate due to the enormous explosion of information, so there is an urgent need for a method to automatically and promptly detect texts with racial discrimination. We propose in this work a machine learning-based approach to enable automatic detection of racist text content over the internet. State-of-the-art machine learning models that are able to grasp language structures are adapted in this study. Our main contributions include 1) a large-scale racial discrimination data set collected from three distinct sources and annotated according to a guideline developed by specialists, 2) a set of machine learning models with various architectures for racial discrimination detection, and 3) a web-browser-based software tool that assists users in debiasing their texts when using the internet. All these resources are made publicly available.
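A sketch of the kind of detector described (not the authors' released model; dataset variables are hypothetical), fine-tuning a generic pretrained transformer for binary classification:

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # 0 = neutral, 1 = discriminatory

def encode(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

# `train_ds` / `val_ds` are hypothetical datasets with "text" and "label" columns:
# trainer = Trainer(
#     model=model,
#     args=TrainingArguments(output_dir="out", num_train_epochs=3),
#     train_dataset=train_ds.map(encode, batched=True),
#     eval_dataset=val_ds.map(encode, batched=True))
# trainer.train()
```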
Discrete Compositional Representations as an Abstraction for Goal Conditioned Reinforcement Learning
Hongyu Zang
Xin Li
Romain Laroche
Remi Tachet des Combes
Goal-conditioned reinforcement learning (RL) is a promising direction for training agents that are capable of solving multiple tasks and reach a diverse set of objectives. How to specify and ground these goals in such a way that we can both reliably reach goals during training as well as generalize to new goals during evaluation remains an open area of research. Defining goals in the space of noisy, high-dimensional sensory inputs is one possibility, yet this poses a challenge for training goal-conditioned agents, or even for generalization to novel goals. We propose to address this by learning compositional representations of goals and processing the resulting representation via a discretization bottleneck, for coarser specification of goals, through an approach we call DGRL. We show that discretizing outputs from goal encoders through a bottleneck can work well in goal-conditioned RL setups, by experimentally evaluating this method on tasks ranging from maze environments to complex robotic navigation and manipulation tasks. Additionally, we show a theoretical result which bounds the expected return for goals not observed during training, while still allowing for specifying goals with expressive combinatorial structure.
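A minimal sketch of a discretization bottleneck over goal encodings (all names assumed; this mirrors a generic straight-through vector-quantization step rather than the authors' exact DGRL implementation):

```python
import torch
import torch.nn as nn

class GoalBottleneck(nn.Module):
    """Encode a raw goal, split the encoding into factors, and snap each
    factor to its nearest codebook entry for a coarser, compositional goal."""
    def __init__(self, goal_dim: int, dim: int = 64, n_factors: int = 4,
                 codebook_size: int = 32):
        super().__init__()
        assert dim % n_factors == 0
        self.encoder = nn.Linear(goal_dim, dim)
        self.codebook = nn.Embedding(codebook_size, dim // n_factors)
        self.n_factors = n_factors

    def forward(self, goal: torch.Tensor) -> torch.Tensor:
        z = self.encoder(goal).view(goal.size(0), self.n_factors, -1)
        d = torch.cdist(z, self.codebook.weight.expand(z.size(0), -1, -1))
        q = self.codebook(d.argmin(dim=-1))       # discrete factor codes
        return (z + (q - z).detach()).flatten(1)  # straight-through gradient

# The policy would then condition on this discretized goal embedding
# instead of the raw, noisy encoding.
```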
Discrete-Valued Neural Communication in Structured Architectures Enhances Generalization
Dianbo Liu
Chen Sun
Michael Curtis Mozer
In this appendix, as a complement to Theorems 1–2, we provide additional theorems, Theorems 3–4, which further illustrate the two advantages of the discretization process by considering an abstract model with the discretization bottleneck. For the first advantage, on sensitivity, the error due to potential noise and perturbation without discretization, the third term ξ(w, r′, M′, d′) > 0 in Theorem 4, is shown to be minimized to zero with discretization in Theorem 3. For the second advantage, the underlying dimensionality of N_{(M′,d′)}(r, H) + ln(N_{(M,d)}(r, Θ)/δ) without discretization (in the bound of Theorem 4) is proven to be reduced to the typically much smaller underlying dimensionality of L + ln(N_{(M,d)}(r, E × Θ)) with discretization in Theorem 3. Here, for any metric space (M, d) and subset S ⊆ M, the r-converging number of S is defined by N_{(M,d)}(r, S) = min{ |C| : C ⊆ M, S ⊆ ∪_{c∈C} B_{(M,d)}[c, r] }, where the (closed) ball of radius r centered at c is denoted by B_{(M,d)}[c, r] = { x ∈ M : d(x, c) ≤ r }. See Appendix C.1 for a simple comparison between the bound of Theorem 3 and that of Theorem 4 when the metric spaces (M, d) and (M′, d′) are chosen to be Euclidean spaces.
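Restating the two dimensionality terms in display math (typeset directly from the definitions above, with no new symbols) may make the comparison easier to read:

```latex
% Dimensionality term without discretization (Theorem 4) vs. with it (Theorem 3):
\[
  \mathcal{N}_{(\mathcal{M}',d')}(r,\mathcal{H})
    + \ln\!\frac{\mathcal{N}_{(\mathcal{M},d)}(r,\Theta)}{\delta}
  \quad\longrightarrow\quad
  L + \ln \mathcal{N}_{(\mathcal{M},d)}(r,\mathcal{E}\times\Theta).
\]
```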
Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Aniket Rajiv Didolkar
Nitesh Bharadwaj Gundavarapu
Nan Rosemary Ke
Discrete-Valued Neural Communication
Dianbo Liu
Chen Sun
Michael Curtis Mozer
Deep learning has advanced from fully connected architectures to structured models organized into components, e.g., the transformer composed of positional elements, modular architectures divided into slots, and graph neural nets made up of nodes. In structured models, an interesting question is how to conduct dynamic and possibly sparse communication among the separate components. Here, we explore the hypothesis that restricting the transmitted information among components to discrete representations is a beneficial bottleneck. The motivating intuition is human language in which communication occurs through discrete symbols. Even though individuals have different understandings of what a "cat" is based on their specific experiences, the shared discrete token makes it possible for communication among individuals to be unimpeded by individual differences in internal representation. To discretize the values of concepts dynamically communicated among specialist components, we extend the quantization mechanism from the Vector-Quantized Variational Autoencoder to multi-headed discretization with shared codebooks and use it for discrete-valued neural communication (DVNC). Our experiments show that DVNC substantially improves systematic generalization in a variety of architectures -- transformers, modular architectures, and graph neural networks. We also show that the DVNC is robust to the choice of hyperparameters, making the method very useful in practice. Moreover, we establish a theoretical justification of our discretization process, proving that it has the ability to increase noise robustness and reduce the underlying dimensionality of the model.
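A sketch of such multi-headed discretization applied to inter-module messages (assumed names; the single codebook shared across heads is the point being illustrated, in the same straight-through style as the goal bottleneck sketched above):

```python
import torch
import torch.nn as nn

class DiscreteChannel(nn.Module):
    """Quantize each message per head against one codebook shared by all
    heads, passing gradients straight through the discretization."""
    def __init__(self, dim: int, n_heads: int = 4, codebook_size: int = 64):
        super().__init__()
        assert dim % n_heads == 0
        self.n_heads = n_heads
        self.codebook = nn.Embedding(codebook_size, dim // n_heads)

    def forward(self, msg: torch.Tensor) -> torch.Tensor:
        # msg: (batch, dim), a continuous message sent between components
        z = msg.view(msg.size(0), self.n_heads, -1)
        d = torch.cdist(z, self.codebook.weight.expand(z.size(0), -1, -1))
        q = self.codebook(d.argmin(dim=-1))       # nearest shared-codebook entries
        return (z + (q - z).detach()).flatten(1)  # straight-through estimator
```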
Neural Production Systems
Aniket Rajiv Didolkar
Nan Rosemary Ke
Charles Blundell
Philippe Beaudoin
Nicolas Heess
Michael Curtis Mozer
Visual environments are structured, consisting of distinct objects or entities. These entities have properties, visible or latent, that determine the manner in which they interact with one another. To partition images into entities, deep-learning researchers have proposed structural inductive biases such as slot-based architectures. To model interactions among entities, equivariant graph neural nets (GNNs) are used, but these are not particularly well suited to the task for two reasons. First, GNNs do not predispose interactions to be sparse, as relationships among independent entities are likely to be. Second, GNNs do not factorize knowledge about interactions in an entity-conditional manner. As an alternative, we take inspiration from cognitive science and resurrect a classic approach, production systems, which consist of a set of rule templates that are applied by binding placeholder variables in the rules to specific entities. Rules are scored on their match to entities, and the best fitting rules are applied to update entity properties. In a series of experiments, we demonstrate that this architecture achieves a flexible, dynamic flow of control and serves to factorize entity-specific and rule-based information. This disentangling of knowledge achieves robust future-state prediction in rich visual environments, outperforming state-of-the-art methods using GNNs, and allows for the extrapolation from simple (few object) environments to more complex environments.
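A highly simplified sketch of the rule-matching step (unary rules only, with no binding of a second argument entity; all names here are assumptions for illustration):

```python
import torch
import torch.nn as nn

class ProductionRules(nn.Module):
    """Score each (rule, entity) pair, pick the best-matching rule per
    entity, and apply that rule's MLP to update the entity."""
    def __init__(self, dim: int, n_rules: int = 4):
        super().__init__()
        self.rule_emb = nn.Parameter(torch.randn(n_rules, dim))
        self.rule_mlps = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(n_rules))

    def forward(self, entities: torch.Tensor) -> torch.Tensor:
        # entities: (batch, n_entities, dim)
        scores = entities @ self.rule_emb.t()           # rule-entity match scores
        best = scores.argmax(dim=-1)                    # hard rule selection
        updates = torch.stack([mlp(entities) for mlp in self.rule_mlps], dim=2)
        idx = best.unsqueeze(-1).unsqueeze(-1).expand(-1, -1, 1, entities.size(-1))
        return entities + torch.gather(updates, 2, idx).squeeze(2)
```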
Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning
Nan Rosemary Ke
Aniket Rajiv Didolkar
Danilo Jimenez Rezende
Michael Curtis Mozer
Inducing causal relationships from observations is a classic problem in machine learning. Most work in causality starts from the premise that the causal variables themselves are observed. However, for AI agents such as robots trying to make sense of their environment, the only observables are low-level variables like pixels in images. To generalize well, an agent must induce high-level variables, particularly those which are causal or are affected by causal variables. A central goal for AI and causality is thus the joint discovery of abstract representations and causal structure. However, we note that existing environments for studying causal induction are poorly suited for this objective because they have complicated task-specific causal graphs which are impossible to manipulate parametrically (e.g., number of nodes, sparsity, causal chain length, etc.). In this work, our goal is to facilitate research in learning representations of high-level variables as well as causal structures among them. In order to systematically probe the ability of methods to identify these variables and structures, we design a suite of benchmarking RL environments. We evaluate various representation learning algorithms from the literature and find that explicitly incorporating structure and modularity in models can help causal induction in model-based reinforcement learning.
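A small sketch of the parametric control the benchmark argues for (function name and representation are illustrative, not from the paper): generating a random causal DAG whose size and sparsity are explicit knobs.

```python
import numpy as np

def random_causal_graph(n_nodes: int, sparsity: float, seed: int = 0) -> np.ndarray:
    """Sample a random DAG adjacency matrix with a given edge density;
    keeping only the strict upper triangle guarantees acyclicity."""
    rng = np.random.default_rng(seed)
    adj = np.triu(rng.random((n_nodes, n_nodes)) < sparsity, k=1)
    return adj.astype(np.int8)

# Vary n_nodes, sparsity, or causal chain length across benchmark instances:
print(random_causal_graph(n_nodes=5, sparsity=0.3))
```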