Publications

OC-NMN: Object-centric Compositional Neural Module Network for Generative Visual Analogical Reasoning

Pau Rodríguez

A key aspect of human intelligence is the ability to imagine (composing learned concepts in novel ways) to make sense of new scenarios. Such capacity is not yet attained by machine learning systems. In this work, in the context of visual reasoning, we show how modularity can be leveraged to derive a compositional data augmentation framework inspired by imagination. Our method, denoted Object-centric Compositional Neural Module Network (OC-NMN), decomposes visual generative reasoning tasks into a series of primitives applied to objects without using a domain-specific language. We show that our modular architectural choices can be used to generate new training tasks that lead to better out-of-distribution generalization. We compare our model to existing and new baselines on a proposed visual reasoning benchmark that consists of applying arithmetic operations to MNIST digits.

2023-10-27

ArXiv (preprint)

arxiv.org

Rethinking Teacher-Student Curriculum Learning under the Cooperative Mechanics of Experience

Manfred Diaz

Liam Paull

Andrea Tacchetti

Teacher-Student Curriculum Learning (TSCL) is a curriculum learning framework that draws inspiration from human cultural transmission and learning. It involves a teacher algorithm shaping the learning process of a learner algorithm by exposing it to controlled experiences. Despite its success, understanding the conditions under which TSCL is effective remains challenging. In this paper, we propose a data-centric perspective to analyze the underlying mechanics of the teacher-student interactions in TSCL. We leverage cooperative game theory to describe how the composition of the set of experiences presented by the teacher to the learner, as well as their order, influences the performance of the curricula found by TSCL approaches. To do so, we demonstrate that for every TSCL problem there exists an equivalent cooperative game, and that several key components of the TSCL framework can be reinterpreted using game-theoretic principles. Through experiments covering supervised learning, reinforcement learning, and classical games, we estimate the cooperative values of experiences and use value-proportional curriculum mechanisms to construct curricula, even in cases where TSCL struggles. The framework and experimental setup presented in this work provide a foundation for a deeper exploration of TSCL, shedding light on its underlying mechanisms and providing insights into its broader applicability in machine learning.

2023-10-27

NeurIPS.cc/2023/Workshop/ALOE (poster)

Adversarial Attacks and Defenses in Large Language Models: Old and New Threats

Leo Schwinn

David Dobre

Stephan Günnemann

Gauthier Gidel

Over the past decade, there has been extensive research aimed at enhancing the robustness of neural networks, yet this problem remains largely unsolved. Here, one major impediment has been the overestimation of the robustness of new defense approaches due to faulty defense evaluations. Flawed robustness evaluations necessitate rectifications in subsequent works, dangerously slowing down research and providing a false sense of security. In this context, we will face substantial challenges associated with an impending adversarial arms race in natural language processing, specifically with closed-source Large Language Models (LLMs), such as ChatGPT, Google Bard, or Anthropic's Claude. We provide a first set of prerequisites to improve the robustness assessment of new approaches and reduce the number of faulty evaluations. Additionally, we identify embedding space attacks on LLMs as another viable threat model for generating malicious content in open-source models. Finally, we demonstrate on a recently proposed defense that, without LLM-specific best practices in place, it is easy to overestimate the robustness of a new approach.

2023-10-26

NeurIPS.cc/2023/Workshop/ICBINB (published)

proceedings.mlr.press

Assessing the Generalization Capabilities of Neural Machine Translation Models for SPARQL Query Generation

Samuel Reyd

Amal Zouaq

2023-10-26

The Semantic Web – ISWC 2023 (published)

Attention Schema in Neural Agents

Dianbo Liu

Samuele Bolotta

He Zhu

Guillaume Dumas

Attention has become a common ingredient in deep learning architectures. It adds a dynamical selection of information on top of the static selection of information supported by weights. In the same way, we can imagine a higher-order informational filter built on top of attention: an Attention Schema (AS), namely, a descriptive and predictive model of attention. In cognitive neuroscience, Attention Schema Theory (AST) supports this idea of distinguishing attention from AS. A strong prediction of this theory is that an agent can use its own AS to also infer the states of other agents' attention and consequently enhance coordination with other agents. As such, multi-agent reinforcement learning would be an ideal setting to experimentally test the validity of AST. We explore different ways in which attention and AS interact with each other. Our preliminary results indicate that agents that implement the AS as a recurrent internal control achieve the best performance. In general, these exploratory experiments suggest that equipping artificial agents with a model of attention can enhance their social intelligence.

2023-10-26

InfoCog @ Neural Information Processing Systems (poster)

Baking Symmetry into GFlowNets

George Ma

Emmanuel Bengio

Dinghuai Zhang

GFlowNets have exhibited promising performance in generating diverse candidates with high rewards. These networks generate objects incrementally and aim to learn a policy that samples objects with probability proportional to their rewards. However, current GFlowNet training pipelines do not account for isomorphic actions, which are actions that result in symmetric or isomorphic states. Ignoring these symmetries increases the number of samples required to train GFlowNets and can result in inefficient and potentially incorrect flow functions. As a consequence, the reward and diversity of the generated objects decrease. In this study, our objective is to integrate symmetries into GFlowNets by identifying equivalent actions during the generation process. Experimental results on synthetic data demonstrate the promising performance of our proposed approaches.

2023-10-26

NeurIPS.cc/2023/Workshop/AI4Science (oral)

Causal Discovery in Gene Regulatory Networks with GFlowNet: Towards Scalability in Large Systems

Trang Nguyen

Alexander Tong

Kanika Madan

Dianbo Liu

Understanding causal relationships within Gene Regulatory Networks (GRNs) is essential for unraveling gene interactions in cellular processes. However, causal discovery in GRNs is challenging for multiple reasons, including the existence of cyclic feedback loops and uncertainty that yields diverse possible causal structures. Previous works in this area either ignore cyclic dynamics (assuming an acyclic structure) or struggle with scalability. We introduce Swift-DynGFN, a novel framework that enhances causal structure learning in GRNs while addressing scalability concerns. Specifically, Swift-DynGFN exploits gene-wise independence to boost parallelization and lower computational cost. Experiments on real single-cell RNA velocity and synthetic GRN datasets demonstrate improvements in learning causal structure in GRNs and in scalability to larger systems.

2023-10-26

NeurIPS.cc/2023/Workshop/GenBio (poster)

Channel Selection for Test-Time Adaptation Under Distribution Shift

Pedro Vianna

Muawiz Sajjad Chaudhary

An Tang

Guy Cloutier

Guy Wolf

Michael Eickenberg

Eugene Belilovsky

To ensure robustness and generalization to real-world scenarios, test-time adaptation has recently been studied as an approach to adjust models to a new data distribution during inference. Test-time batch normalization is a simple and popular method that has achieved compelling performance on domain shift benchmarks by recalculating batch normalization statistics on test batches. However, in many practical applications this technique is vulnerable to label distribution shifts. We propose to tackle this challenge by selectively adapting only some channels in a deep network, minimizing drastic adaptation that is sensitive to label shifts. We find that models adapted in this way significantly outperform the baseline models and counteract unknown label shifts.

2023-10-26

NeurIPS.cc/2023/Workshop/DistShift (poster)

Crystal-GFN: sampling materials with desirable properties and constraints

Mistal

Alex Hernández-García

Alexandre AGM Duval

2023-10-26

NeurIPS.cc/2023/Workshop/AI4Mat (spotlight)

DGFN: Double Generative Flow Networks

2023-10-26

NeurIPS.cc/2023/Workshop/GenBio (poster)

Discrete, compositional, and symbolic representations through attractor dynamics

Andrew Nam

Eric Elmoznino

Nikolay Malkin

Chen Sun

Guillaume Lajoie

Compositionality is an important feature of discrete symbolic systems, such as language and programs, as it enables them to have infinite capacity despite a finite symbol set. It serves as a useful abstraction for reasoning in both cognitive science and AI, yet the interface between continuous and symbolic processing is often imposed by fiat at the algorithmic level, such as by means of quantization or a softmax sampling step. In this work, we explore how discretization could be implemented in a more neurally plausible manner through the modeling of attractor dynamics that partition the continuous representation space into basins that correspond to sequences of symbols. Building on established work in attractor networks and introducing novel training methods, we show that imposing structure in the symbolic space can produce compositionality in the attractor-supported representation space of rich sensory inputs. Lastly, we argue that our model exhibits an information bottleneck process that is thought to play a role in conscious experience, decomposing the rich information of a sensory input into stable components encoding symbolic information.

2023-10-26

NeurIPS.cc/2023/Workshop/InfoCog (oral)