Publications

COIL: A Deep Architecture for Column Generation

Behrouz Babaki

Sanjay Dominik Jena

. Column generation is a popular method to solve large-scale linear programs with an exponential number of variables. Several important appl… (voir plus)ications, such as the vehicle routing problem, rely on this technique in order to be solved. However, in practice, column generation methods suffer from slow convergence (i.e. they require too many iterations). Stabilization techniques, which carefully select the column to add at each iteration, are commonly used to improve convergence. In this work, we frame the problem of selecting which columns to add as one of sequential decision-making. We propose a neural column generation architecture that iteratively selects columns to be added to the problem. Our architecture is inspired by stabilization techniques and predicts the optimal duals, which are then used to select the columns to add. We proposed architecture, trained using imitation learning. Exemplified on the Vehicle Routing Problem, we show that several machine learning models yield good performance in predicting the optimal duals and that our architecture outperforms them as well as a popular state-of-the-art stabilization technique. Further, the architecture approach can generalize to instances larger than those observed during training.

2022-01-01

(publié)

www.semanticscholar.org

Compositional Attention: Disentangling Search and Retrieval

Sarthak Mittal

Sharath Chandra Raparthy

Irina Rish

Yoshua Bengio

Guillaume Lajoie

Multi-head, key-value attention is the backbone of transformer-like model architectures which have proven to be widely successful in recent … (voir plus)years. This attention mechanism uses multiple parallel key-value attention blocks (called heads), each performing two fundamental computations: (1) search - selection of a relevant entity from a set via query-key interaction, and (2) retrieval - extraction of relevant features from the selected entity via a value matrix. Standard attention heads learn a rigid mapping between search and retrieval. In this work, we first highlight how this static nature of the pairing can potentially: (a) lead to learning of redundant parameters in certain tasks, and (b) hinder generalization. To alleviate this problem, we propose a novel attention mechanism, called Compositional Attention, that replaces the standard head structure. The proposed mechanism disentangles search and retrieval and composes them in a dynamic, flexible and context-dependent manner. Through a series of numerical experiments, we show that it outperforms standard multi-head attention on a variety of tasks, including some out-of-distribution settings. Through our qualitative analysis, we demonstrate that Compositional Attention leads to dynamic specialization based on the type of retrieval needed. Our proposed mechanism generalizes multi-head attention, allows independent scaling of search and retrieval and is easy to implement in a variety of established network architectures.

2022-01-01

International Conference on Learning Representations (publié)

openreview.net

Computing Nash equilibria for integer programming games

Margarida Carvalho

Andrea Lodi

João Pedro Pedroso

2022-01-01

European Journal of Operational Research (publié)

doi.org

arxiv.org

Continual Learning In Environments With Polynomial Mixing Times

Matthew D Riemer

Sharath Chandra Raparthy

Ignacio Cases

Gopeshh Raaj Subbaraj

Maximilian Puelma Touzel

Irina Rish

The mixing time of the Markov chain induced by a policy limits performance in real-world continual learning scenarios. Yet, the effect of mi… (voir plus)xing times on learning in continual reinforcement learning (RL) remains underexplored. In this paper, we characterize problems that are of long-term interest to the development of continual RL, which we call scalable MDPs, through the lens of mixing times. In particular, we theoretically establish that scalable MDPs have mixing times that scale polynomially with the size of the problem. We go on to demonstrate that polynomial mixing times present significant difficulties for existing approaches that suffer from myopic bias and stale bootstrapped estimates. To validate the proposed theory, we study the empirical scaling behavior of mixing times with respect to the number of tasks and task switching frequency for pretrained high performing policies on seven Atari games. Our analysis demonstrates both that polynomial mixing times do emerge in practice and how their existence may lead to unstable learning behavior like catastrophic forgetting in continual learning settings.

openreview.net

Continual Learning with Foundation Models: An Empirical Study of Latent Replay

Oleksiy Ostapenko

Timothee LESORT

Pau Rodriguez

Md Rifat Arefin

Arthur Douillard

Irina Rish

Laurent Charlin

2022-01-01

CoLLAs (publié)

arxiv.org

Deposited in DRO : 17 January 2022 Version of attached le : Accepted Version Peer-review status of attached

Nelly Bencomo

Jin Guo

Rachel Harrison

Hans-Martin Heyn

Tim Menzies

Much has been written about the algorithmic role that AI plays for automation in SE. But what about the role of AI, augmented by human knowl… (voir plus)edge? Can we make a profound advance by combining human and artificial intelligence? Researchers in requirements engineering think so, arguing that requirement engineering is the secret weapon for better AI and better software. Much has been written about the algorithmic role that AI plays for automation in SE. But what about the role of AI, augmented by human knowledge? Can we make a profound advance by combining human and artificial intelligence? Researchers in requirements engineering think so, arguing that requirement engineering is the secret weapon for better AI and better software1. To begin, we first need a definition. What is requirements engineering or RE? RE used to be viewed as an early lifecycle activity that proceeded analysis, design, coding and testing. For safety critical applications there is certainly a pressing need to create those requirements before the coding starts (we will return to this point, later in the paper). However, in this age of DevOps and Autonomous and Self-adaptive systems, requirements can happen at many other times in a software project[15], [14]. We say that: Requirements engineering is any discussion about what to build and how to trade-off competing cost/benefits. It can happen before, during, or after runtime. 1This paper is based on the Panel “Artificial Intelligence and Requirement Engineering: Challenges and Opportunities”, which took place at the Eighth International Workshop on Artificial Intelligence and Requirements Engineering (AIRE). As shown in Table 1 and Table 2, there are many ways AI can help RE, across a broad range of SE activities. But, what about the other way around? If we add more requirements into AI, and use RE methods to get truly desired requirements, can we make better software by combining human and artificial intelligence? In our view, when integrating AI into software engineering is a co-design problem between humans, the AI model, the data required to train and validate the desired behaviour, and the hardware running the AI model, in addition to the classical software components. This means that when integrating AI, you need to know and understand the context of the system in which you want to apply your AI model to derive the necessary model requirements [17]. For example, in the arena of safety critical systems, model construction must be guided by safety requirements. one challenge for AI in RE are safety standards that base on the EN-IEC 61508 standard2. These safety standards assume that for software only systematic faults exists. Therefore, they emphasise correct processes and the creation of lifecycle artifacts to minimise systematic mistakes during both the 2Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems; for example ISO 26262 for the automotive sector or IEC 61511 for the process industry. IEEE Software (submitted) Published by the IEEE Computer Society © 2021 IEEE 1

2022-01-01

(publié)

www.semanticscholar.org

Disentanglement via Mechanism Sparsity Regularization: A New Principle for Nonlinear ICA

Sébastien Lachapelle

Pau Rodriguez

Yash Sharma

Katie E Everett

Rémi LE PRIOL

Alexandre Lacoste

Simon Lacoste-Julien

This work introduces a novel principle we call disentanglement via mechanism sparsity regularization, which can be applied when the latent f… (voir plus)actors of interest depend sparsely on past latent factors and/or observed auxiliary variables. We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors and the sparse causal graphical model that relates them. We develop a rigorous identifiability theory, building on recent nonlinear independent component analysis (ICA) results, that formalizes this principle and shows how the latent variables can be recovered up to permutation if one regularizes the latent mechanisms to be sparse and if some graph connectivity criterion is satisfied by the data generating process. As a special case of our framework, we show how one can leverage unknown-target interventions on the latent factors to disentangle them, thereby drawing further connections between ICA and causality. We propose a VAE-based method in which the latent mechanisms are learned and regularized via binary masks, and validate our theory by showing it learns disentangled representations in simulations.

2022-01-01

CLeaR (publié)

proceedings.mlr.press

openreview.net

DsMLP: A Learning-Based Multi-Layer Perception for MIMO Detection Implemented by Dynamic Stochastic Computing

Qidie Wu

Jinsheng Kuang

Jiyun Tao

Jienan Chen

Warren Gross

As the number of antennas increases in multi-input and multi-output (MIMO) systems, even linear detection methods suffer from sharply increa… (voir plus)sing complexity. This paper proposes a learning-based multi-layer perception (MLP), named dynamic stochastic multi-layer perception (DsMLP), which is implemented by dynamic stochastic computing (DSC). We first establish a similar form between the MLP structure and minimum mean square error (MMSE) matrix operations. Consequently, DsMLP transforms the complex computation problem into an optimization problem of MLP training. Due to the specific design of MLP structure, e.g., same input/output dimension and single layer without activation function, the mathematical representation of DsMLP is identical to the MMSE matrix operations. Therefore, DsMLP guarantees sound model explainability in mathematics, fast convergence in training, and low complexity in computation. Furthermore, we transform the MLP training process to the DSC domain and propose a hardware-efficient scheme for DsMLP. Compared with other state-of-the-art MIMO detectors, DsMLP achieves 1.2× energy efficiency and 1.74× area efficiency.

2022-01-01

IEEE Transactions on Signal Processing (publié)

doi.org

DsMLP: A Learning-Based Multi-Layer Perception for MIMO Detection Implemented by Dynamic Stochastic Computing

Qidie Wu

Jinsheng Kuang

Jiyun Tao

Jienan Chen

Warren Gross

As the number of antennas increases in multi-input and multi-output (MIMO) systems, even linear detection methods suffer from sharply increa… (voir plus)sing complexity. This paper proposes a learning-based multi-layer perception (MLP), named dynamic stochastic multi-layer perception (DsMLP), which is implemented by dynamic stochastic computing (DSC). We first establish a similar form between the MLP structure and minimum mean square error (MMSE) matrix operations. Consequently, DsMLP transforms the complex computation problem into an optimization problem of MLP training. Due to the specific design of MLP structure, e.g., same input/output dimension and single layer without activation function, the mathematical representation of DsMLP is identical to the MMSE matrix operations. Therefore, DsMLP guarantees sound model explainability in mathematics, fast convergence in training, and low complexity in computation. Furthermore, we transform the MLP training process to the DSC domain and propose a hardware-efficient scheme for DsMLP. Compared with other state-of-the-art MIMO detectors, DsMLP achieves 1.2× energy efficiency and 1.74× area efficiency.

2022-01-01

IEEE Transactions on Signal Processing (published)

doi.org

Empirical Analysis of Model Selection for Heterogenous Causal Effect Estimation

Divyat Mahajan

Ioannis Mitliagkas

Brady Neal

Vasilis Syrgkanis

We study the problem of model selection in causal inference, specifically for the case of conditional average treatment effect (CATE) estima… (voir plus)tion under binary treatments. Unlike model selection in machine learning, there is no perfect analogue of cross-validation as we do not observe the counterfactual potential outcome for any data point. Towards this, there have been a variety of proxy metrics proposed in the literature, that depend on auxiliary nuisance models estimated from the observed data (propensity score model, outcome regression model). However, the effectiveness of these metrics has only been studied on synthetic datasets as we can access the counterfactual data for them. We conduct an extensive empirical analysis to judge the performance of these metrics introduced in the literature, and novel ones introduced in this work, where we utilize the latest advances in generative modeling to incorporate multiple realistic datasets. Our analysis suggests novel model selection strategies based on careful hyperparameter tuning of CATE estimators and causal ensembling.

2022-01-01

arXiv.org (prépublication)

doi.org

arxiv.org

Extracting Person Names from User Generated Text: Named-Entity Recognition for Combating Human Trafficking

Yifei Li

Pratheeksha Nair

Kellin Pelrine

Reihaneh Rabbany

Online escort advertisement websites are widely used for advertising victims of human trafficking. Domain experts agree that advertising mul… (voir plus)tiple people in the same ad is a strong indicator of trafficking. Thus, extracting person names from the text of these ads can provide valuable clues for further analysis. However, Named-Entity Recognition (NER) on escort ads is challenging because the text can be noisy, colloquial and often lacking proper grammar and punctuation. Most existing state-of-the-art NER models fail to demonstrate satisfactory performance in this task. In this paper, we propose NEAT (Name Extraction Against Trafficking) for extracting person names. It effectively combines classic rule-based and dictionary extractors with a contextualized language model to capture ambiguous names (e.g penny, hazel) and adapts to adversarial changes in the text by expanding its dictionary. NEAT shows 19% improvement on average in the F1 classification score for name extraction compared to previous state-of-the-art in two domain-specific datasets.

2022-01-01

Findings (published)

doi.org

Extracting Person Names from User Generated Text: Named-Entity Recognition for Combating Human Trafficking

Yifei Li

Pratheeksha Nair

Kellin Pelrine

Reihaneh Rabbany

2022-01-01

Findings (publié)

doi.org

La recherche en IA au service du monde réel

Boussole des politiques en IA

Vie étudiante et ressources

Publications

La recherche en IA au service du monde réel

Boussole des politiques en IA

Vie étudiante et ressources

Mots-clés populaires:

Publications