Publications

Beyond Mahalanobis Distance for Textual OOD Detection
Pierre Colombo
Eduardo Dadalto Câmara Gomes
Guillaume Staerman
Nathan Noiry
Bisimulation metrics and norms for real-weighted automata
Borja Balle
Pascale Gourdeau
Building Together - Towards a Roadmap for African Language Technologies
Kathleen Siminyu
Jade Abbott
Kọ́lá Túbọ̀sún
Aremu Anuoluwapo
Blessing Kudzaishe Sibanda
Kofi Yeboah
Masabata Mokgesi-Selinga
Frederick R. Apina
Angela Thandizwe Mthembu
Arshath Ramkilowan
Babatunde Oladimeji
Cognitive Models as Simulators: The Case of Moral Decision-Making
Ardavan S. Nobandegani
T. Shultz
COIL: A Deep Architecture for Column Generation
Behrouz Babaki
Sanjay Dominik Jena
Column generation is a popular method to solve large-scale linear programs with an exponential number of variables. Several important applications, such as the vehicle routing problem, rely on this technique in order to be solved. However, in practice, column generation methods suffer from slow convergence (i.e. they require too many iterations). Stabilization techniques, which carefully select the column to add at each iteration, are commonly used to improve convergence. In this work, we frame the problem of selecting which columns to add as one of sequential decision-making. We propose a neural column generation architecture that iteratively selects columns to be added to the problem. Our architecture is inspired by stabilization techniques and predicts the optimal duals, which are then used to select the columns to add. The proposed architecture is trained using imitation learning. Exemplified on the Vehicle Routing Problem, we show that several machine learning models yield good performance in predicting the optimal duals and that our architecture outperforms them as well as a popular state-of-the-art stabilization technique. Further, the approach can generalize to instances larger than those observed during training.
Compositional Attention: Disentangling Search and Retrieval
Sarthak Mittal
Sharath Chandra Raparthy
Multi-head, key-value attention is the backbone of transformer-like model architectures which have proven to be widely successful in recent years. This attention mechanism uses multiple parallel key-value attention blocks (called heads), each performing two fundamental computations: (1) search - selection of a relevant entity from a set via query-key interaction, and (2) retrieval - extraction of relevant features from the selected entity via a value matrix. Standard attention heads learn a rigid mapping between search and retrieval. In this work, we first highlight how this static nature of the pairing can potentially: (a) lead to learning of redundant parameters in certain tasks, and (b) hinder generalization. To alleviate this problem, we propose a novel attention mechanism, called Compositional Attention, that replaces the standard head structure. The proposed mechanism disentangles search and retrieval and composes them in a dynamic, flexible and context-dependent manner. Through a series of numerical experiments, we show that it outperforms standard multi-head attention on a variety of tasks, including some out-of-distribution settings. Through our qualitative analysis, we demonstrate that Compositional Attention leads to dynamic specialization based on the type of retrieval needed. Our proposed mechanism generalizes multi-head attention, allows independent scaling of search and retrieval and is easy to implement in a variety of established network architectures.
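The two computations named in this abstract can be illustrated for a single standard attention head. The sketch below is not the authors' Compositional Attention; it only shows the fixed pairing of search and retrieval that the paper sets out to relax. The function names, tensor sizes, and random weights are illustrative assumptions.

```python
import numpy as np

def softmax(scores, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=axis, keepdims=True)

def attention_head(x, w_query, w_key, w_value):
    """One standard attention head, split into its two computations."""
    queries = x @ w_query
    keys = x @ w_key
    # Search: query-key interaction selects which entity each position attends to.
    search = softmax(queries @ keys.T / np.sqrt(keys.shape[-1]))
    # Retrieval: the value matrix determines which features are extracted.
    retrieved = search @ (x @ w_value)
    return search, retrieved

rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 4, 8, 2  # illustrative sizes, not from the paper
x = rng.normal(size=(n_tokens, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

search, retrieved = attention_head(x, w_q, w_k, w_v)
print(search.shape, retrieved.shape)
```

In this standard head, `w_value` is always used with the same `w_query` and `w_key`, which is the rigid search-retrieval mapping the abstract refers to.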
Computing Nash equilibria for integer programming games
Andrea Lodi
João Pedro Pedroso
Continual Learning In Environments With Polynomial Mixing Times
Matthew D Riemer
Sharath Chandra Raparthy
Ignacio Cases
Gopeshh Raaj Subbaraj
Maximilian Puelma Touzel
The mixing time of the Markov chain induced by a policy limits performance in real-world continual learning scenarios. Yet, the effect of mixing times on learning in continual reinforcement learning (RL) remains underexplored. In this paper, we characterize problems that are of long-term interest to the development of continual RL, which we call scalable MDPs, through the lens of mixing times. In particular, we theoretically establish that scalable MDPs have mixing times that scale polynomially with the size of the problem. We go on to demonstrate that polynomial mixing times present significant difficulties for existing approaches that suffer from myopic bias and stale bootstrapped estimates. To validate the proposed theory, we study the empirical scaling behavior of mixing times with respect to the number of tasks and task switching frequency for pretrained high performing policies on seven Atari games. Our analysis demonstrates both that polynomial mixing times do emerge in practice and how their existence may lead to unstable learning behavior like catastrophic forgetting in continual learning settings.
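As a rough illustration of what a mixing time measures, the sketch below computes it for a toy two-state Markov chain using the standard total-variation definition (the smallest t at which every row of P^t is within eps of the stationary distribution). The chain, the threshold eps = 0.25, and all function names are assumptions made for illustration; this is not the paper's experimental setup.

```python
import numpy as np

def total_variation(p, q):
    # Total variation distance between two discrete distributions.
    return 0.5 * np.abs(p - q).sum()

def stationary_distribution(P):
    # Left eigenvector of P for eigenvalue 1, normalised to sum to 1.
    eigvals, eigvecs = np.linalg.eig(P.T)
    idx = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, idx])
    return pi / pi.sum()

def mixing_time(P, eps=0.25, max_steps=10_000):
    """Smallest t such that every row of P^t is within eps of stationarity."""
    pi = stationary_distribution(P)
    Pt = np.eye(P.shape[0])
    for t in range(1, max_steps + 1):
        Pt = Pt @ P
        worst = max(total_variation(row, pi) for row in Pt)
        if worst <= eps:
            return t
    return None  # did not mix within max_steps

def two_state_chain(switch_prob):
    # Hypothetical chain: stay with prob. 1 - switch_prob, switch otherwise.
    s = switch_prob
    return np.array([[1 - s, s], [s, 1 - s]])

fast = mixing_time(two_state_chain(0.1))
slow = mixing_time(two_state_chain(0.01))
print(fast, slow)
```

In this toy chain, a lower switching probability yields a longer mixing time, which gives some intuition for why infrequent task switches can make a continual learning problem slow to mix.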
Continual Learning with Foundation Models: An Empirical Study of Latent Replay
Oleksiy Ostapenko
Timothee LESORT
Pau Rodriguez
Md Rifat Arefin
Arthur Douillard
Deposited in DRO: 17 January 2022. Version of attached file: Accepted Version. Peer-review status of attached
Nelly Bencomo
Rachel Harrison
Hans-Martin Heyn
Tim Menzies
Much has been written about the algorithmic role that AI plays for automation in SE. But what about the role of AI, augmented by human knowledge? Can we make a profound advance by combining human and artificial intelligence? Researchers in requirements engineering think so, arguing that requirements engineering is the secret weapon for better AI and better software. (This paper is based on the panel “Artificial Intelligence and Requirement Engineering: Challenges and Opportunities”, which took place at the Eighth International Workshop on Artificial Intelligence and Requirements Engineering (AIRE).) To begin, we first need a definition. What is requirements engineering, or RE? RE used to be viewed as an early lifecycle activity that preceded analysis, design, coding and testing. For safety-critical applications there is certainly a pressing need to create those requirements before the coding starts (we will return to this point later in the paper). However, in this age of DevOps and autonomous and self-adaptive systems, requirements can happen at many other times in a software project [15], [14]. We say that requirements engineering is any discussion about what to build and how to trade off competing cost/benefits. It can happen before, during, or after runtime. As shown in Table 1 and Table 2, there are many ways AI can help RE, across a broad range of SE activities. But what about the other way around? If we add more requirements into AI, and use RE methods to get truly desired requirements, can we make better software by combining human and artificial intelligence?
In our view, integrating AI into software engineering is a co-design problem between humans, the AI model, the data required to train and validate the desired behaviour, and the hardware running the AI model, in addition to the classical software components. This means that when integrating AI, you need to know and understand the context of the system in which you want to apply your AI model to derive the necessary model requirements [17]. For example, in the arena of safety-critical systems, model construction must be guided by safety requirements. One challenge for AI in RE is posed by safety standards based on the EN-IEC 61508 standard (Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems), for example ISO 26262 for the automotive sector or IEC 61511 for the process industry. These safety standards assume that only systematic faults exist for software. Therefore, they emphasise correct processes and the creation of lifecycle artifacts to minimise systematic mistakes during both the
IEEE Software (submitted). Published by the IEEE Computer Society. © 2021 IEEE.
Disentanglement via Mechanism Sparsity Regularization: A New Principle for Nonlinear ICA
Sébastien Lachapelle
Pau Rodriguez
Yash Sharma
Katie E Everett
Rémi LE PRIOL
Alexandre Lacoste
This work introduces a novel principle we call disentanglement via mechanism sparsity regularization, which can be applied when the latent factors of interest depend sparsely on past latent factors and/or observed auxiliary variables. We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors and the sparse causal graphical model that relates them. We develop a rigorous identifiability theory, building on recent nonlinear independent component analysis (ICA) results, that formalizes this principle and shows how the latent variables can be recovered up to permutation if one regularizes the latent mechanisms to be sparse and if some graph connectivity criterion is satisfied by the data generating process. As a special case of our framework, we show how one can leverage unknown-target interventions on the latent factors to disentangle them, thereby drawing further connections between ICA and causality. We propose a VAE-based method in which the latent mechanisms are learned and regularized via binary masks, and validate our theory by showing it learns disentangled representations in simulations.
DsMLP: A Learning-Based Multi-Layer Perception for MIMO Detection Implemented by Dynamic Stochastic Computing
Qidie Wu
Jinsheng Kuang
Jiyun Tao
Jienan Chen
As the number of antennas increases in multi-input and multi-output (MIMO) systems, even linear detection methods suffer from sharply increasing complexity. This paper proposes a learning-based multi-layer perception (MLP), named dynamic stochastic multi-layer perception (DsMLP), which is implemented by dynamic stochastic computing (DSC). We first establish a similar form between the MLP structure and minimum mean square error (MMSE) matrix operations. Consequently, DsMLP transforms the complex computation problem into an optimization problem of MLP training. Due to the specific design of the MLP structure, e.g., same input/output dimension and single layer without activation function, the mathematical representation of DsMLP is identical to the MMSE matrix operations. Therefore, DsMLP guarantees sound model explainability in mathematics, fast convergence in training, and low complexity in computation. Furthermore, we transform the MLP training process to the DSC domain and propose a hardware-efficient scheme for DsMLP. Compared with other state-of-the-art MIMO detectors, DsMLP achieves 1.2× energy efficiency and 1.74× area efficiency.