Publications

Building Together - Towards a Roadmap for African Language Technologies

Kathleen Siminyu

Jade Abbott

Kọ́lá Túbọ̀sún

Aremu Anuoluwapo

Blessing Kudzaishe Sibanda

Kofi Yeboah

David Ifeoluwa Adelani

Masabata Mokgesi-Selinga

Frederick R. Apina

Angela Thandizwe Mthembu

Arshath Ramkilowan

Babatunde Oladimeji

2021-12-31

Social Science Research Network (published)

doi.org

Catalyzing next-generation Artificial Intelligence through NeuroAI

Anthony Zador

Blake Aaron Richards

Bence Ölveczky

Sean Escola

Yoshua Bengio

Kwabena Boahen

Matthew Botvinick

Dmitri Chklovskii

Anne Churchland

Claudia Clopath

James DiCarlo

Surya Ganguli

Jeff Hawkins

Konrad Paul Kording

Alexei Koulakov

Yann Lecun

Timothy P Lillicrap

Adam Marblestone

Bruno Olshausen

Alexandre Pouget

Cristina Savin

Terrence Sejnowski

Eero Simoncelli

Sara Solla

David Sussillo

Andreas S. Tolias

Doris Tsao

2021-12-31

arXiv.org (preprint)

doi.org

arxiv.org

Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise

Eduard Gorbunov

Marina Danilova

David Dobre

Pavel Dvurechensky

Alexander Gasnikov

Gauthier Gidel

2021-12-31

Advances in Neural Information Processing Systems 35 (NeurIPS 2022) (published)

doi.org

openreview.net

Cognitive Models as Simulators: The Case of Moral Decision-Making

Ardavan S. Nobandegani

T. Shultz

Irina Rish

2021-12-31

CogSci (published)

doi.org

arxiv.org

COIL: A Deep Architecture for Column Generation

Behrouz Babaki

Laurent Charlin

Sanjay Dominik Jena

Column generation is a popular method to solve large-scale linear programs with an exponential number of variables. Several important applications, such as the vehicle routing problem, rely on this technique in order to be solved. However, in practice, column generation methods suffer from slow convergence (i.e. they require too many iterations). Stabilization techniques, which carefully select the column to add at each iteration, are commonly used to improve convergence. In this work, we frame the problem of selecting which columns to add as one of sequential decision-making. We propose a neural column generation architecture that iteratively selects columns to be added to the problem. Our architecture is inspired by stabilization techniques and predicts the optimal duals, which are then used to select the columns to add. The proposed architecture is trained using imitation learning. Exemplified on the Vehicle Routing Problem, we show that several machine learning models yield good performance in predicting the optimal duals and that our architecture outperforms them as well as a popular state-of-the-art stabilization technique. Further, the approach can generalize to instances larger than those observed during training.

2021-12-31

(published)

www.semanticscholar.org

Combining Reinforcement Learning and Constraint Programming for Sequence-Generation Tasks with Hard Constraints

Daphne Lafleur

A. Chandar

Gilles Pesant

While Machine Learning (ML) techniques are good at generating data similar to a dataset, they lack the capacity to enforce constraints. On the other hand, any solution to a Constraint Programming (CP) model satisfies its constraints but has no obligation to imitate a dataset. Yet, we sometimes need both. In this paper we borrow RL-Tuner, a Reinforcement Learning (RL) algorithm introduced to tune neural networks, as our enabling architecture to exploit the respective strengths of ML and CP. RL-Tuner maximizes the sum of a pretrained network’s learned probabilities and of manually-tuned penalties for each violated constraint. We replace the latter with outputs of a CP model representing the marginal probabilities of each value and the number of constraint violations. As was the case for the original RL-Tuner, we apply our algorithm to music generation since it is a highly-constrained domain for which CP is especially suited. We show that combining ML and CP, as opposed to using them individually, allows the agent to reflect the pretrained network while taking into account constraints, leading to melodic lines that respect both the corpus’ style and the music theory constraints.

2021-12-31

International Conference on Principles and Practice of Constraint Programming (published)

doi.org

Computing Nash Equilibria for Integer Programming Games

Margarida Carvalho

Andrea Lodi

João Pedro Pedroso

The recently defined class of integer programming games (IPG) models situations where multiple self-interested decision makers interact, with their strategy sets represented by a finite set of linear constraints together with integer requirements. Many real-world problems fit naturally in this class, and hence anticipating IPG outcomes is of crucial value for policy makers and regulators. Nash equilibria have been widely accepted as the solution concept of a game. Consequently, their computation provides a reasonable prediction of the game's outcome. In this paper, we start by showing the computational complexity of deciding the existence of a Nash equilibrium for an IPG. Then, using sufficient conditions for their existence, we develop two general algorithmic approaches that are guaranteed to approximate an equilibrium under mild conditions. We also showcase how our methodology can be adapted to determine other equilibria definitions. The performance of our methods is analyzed through computational experiments in a knapsack game, a competitive lot-sizing game, and a kidney exchange game. To the best of our knowledge, this is the first time that equilibria computation methods for general integer programming games have been designed and computationally tested.

2021-12-31

European Journal of Operational Research (published)

doi.org

arxiv.org

A Conceptual Framework for Representing Events Under Public Health Surveillance

Anya Okhmatovskaia

Yannan Shen

Iris Ganser

Nigel Collier

Nicholas B. King

Zaiqiao Meng

David L. Buckeridge

Information integration across multiple event-based surveillance (EBS) systems has been shown to improve global disease surveillance in experimental settings. In practice, however, integration does not occur due to the lack of a common conceptual framework for encoding data within EBS systems. We aim to address this gap by proposing a candidate conceptual framework for representing events and related concepts in the domain of public health surveillance.

2021-12-31

Medical Informatics Europe (published)

doi.org

Consistency-CAM: Towards Improved Weakly Supervised Semantic Segmentation

Sai Rajeswar

Issam Hadj Laradji

Pau Rodríguez

David Vázquez

Aaron Courville

2021-12-31

BMVC (published)

dblp.uni-trier.de

Continual Learning In Environments With Polynomial Mixing Times

Matthew Riemer

Sharath Chandra Raparthy

Ignacio Cases

Gopeshh Subbaraj

Maximilian Puelma Touzel

Irina Rish

The mixing time of the Markov chain induced by a policy limits performance in real-world continual learning scenarios. Yet, the effect of mixing times on learning in continual reinforcement learning (RL) remains underexplored. In this paper, we characterize problems that are of long-term interest to the development of continual RL, which we call scalable MDPs, through the lens of mixing times. In particular, we theoretically establish that scalable MDPs have mixing times that scale polynomially with the size of the problem. We go on to demonstrate that polynomial mixing times present significant difficulties for existing approaches, which suffer from myopic bias and stale bootstrapped estimates. To validate our theory, we study the empirical scaling behavior of mixing times with respect to the number of tasks and task duration for high-performing policies deployed across multiple Atari games. Our analysis demonstrates both that polynomial mixing times do emerge in practice and how their existence may lead to unstable learning behavior like catastrophic forgetting in continual learning settings.

2021-12-31

Advances in Neural Information Processing Systems 35 (NeurIPS 2022) (published)

doi.org

openreview.net

Continuous MDP Homomorphisms and Homomorphic Policy Gradient

Sahand Rezaei-Shoshtari

Abstraction has been widely studied as a way to improve the efficiency and generalization of reinforcement learning algorithms. In this paper, we study abstraction in the continuous-control setting. We extend the definition of MDP homomorphisms to encompass continuous actions in continuous state spaces. We derive a policy gradient theorem on the abstract MDP, which allows us to leverage approximate symmetries of the environment for policy optimization. Based on this theorem, we propose an actor-critic algorithm that is able to learn the policy and the MDP homomorphism map simultaneously, using the lax bisimulation metric. We demonstrate the effectiveness of our method on benchmark tasks in the DeepMind Control Suite. Our method's ability to utilize MDP homomorphisms for representation learning leads to improved performance when learning from pixel observations.

2021-12-31

Advances in Neural Information Processing Systems 35 (NeurIPS 2022) (published)

doi.org

openreview.net

Contrastive introspection (ConSpec) to rapidly identify invariant prototypes for success in RL

Chen Sun

Wannan Yang

Benjamin Alsbury-Nealy

Thomas Jiralerspong

Yoshua Bengio

Blake Richards

Reinforcement learning (RL) algorithms have achieved notable success in recent years, but still struggle with fundamental issues in long-term credit assignment. It remains difficult to learn in situations where success is contingent upon multiple critical steps that are distant in time from each other and from a sparse reward, as is often the case in real life. Moreover, how RL algorithms assign credit in these difficult situations is typically not coded in a way that can rapidly generalize to new situations. Here, we present an approach using offline contrastive learning, which we call contrastive introspection (ConSpec), that can be added to any existing RL algorithm and addresses both issues. In ConSpec, a contrastive loss is used during offline replay to identify invariances among successful episodes. This takes advantage of the fact that it is easier to retrospectively identify the small set of steps that success is contingent upon than it is to prospectively predict reward at every step taken in the environment. ConSpec stores this knowledge in a collection of prototypes summarizing the intermediate states required for success. During training, arrival at any state that matches these prototypes generates an intrinsic reward that is added to any external rewards. In addition, the reward shaping provided by ConSpec can be made to preserve the optimal policy of the underlying RL agent. The prototypes in ConSpec provide two key benefits for credit assignment: (1) They enable rapid identification of all the critical states. (2) They do so in a readily interpretable manner, enabling out-of-distribution generalization when sensory features are altered. In summary, ConSpec is a modular system that can be added to any existing RL algorithm to improve its long-term credit assignment.

2021-12-31

(published)

www.semanticscholar.org
