Publications

Robustness of Markov perfect equilibrium to model approximations in general-sum dynamic games

Jayakumar Subramanian

Dynamic games (also called stochastic games or Markov games) are an important class of games for modeling multi-agent interactions. In many situations, the dynamics and reward functions of the game are learnt from past data and are therefore approximate. In this paper, we study the robustness of Markov perfect equilibrium to approximations in reward and transition functions. Using approximation results from Markov decision processes, we show that the Markov perfect equilibrium of an approximate (or perturbed) game is always an approximate Markov perfect equilibrium of the original game. We provide explicit bounds on the approximation error in terms of three quantities: (i) the error in approximating the reward functions, (ii) the error in approximating the transition function, and (iii) a property of the value function of the MPE of the approximate game. The second and third quantities depend on the choice of metric on probability spaces. We also present coarser upper bounds which do not depend on the value function but only depend on the properties of the reward and transition functions of the approximate game. We illustrate the results via a numerical example.

2021-12-19

2021 Seventh Indian Control Conference (ICC) (published)

doi.org

Neural Column Generation for Capacitated Vehicle Routing

Behrouz Babaki

Sanjay Dominik Jena

Laurent Charlin

The column generation technique is essential for solving linear programs with an exponential number of variables. Many important applications such as the vehicle routing problem (VRP) now require it. However, in practice, getting column generation to converge is challenging. It often ends up adding too many columns. In this work, we frame the problem of selecting which columns to add as one of sequential decision-making. We propose a neural column generation architecture that iteratively selects columns to be added to the problem. The architecture, inspired by stabilization techniques, first predicts the optimal duals. These predictions are then used to obtain the columns to add. We show using VRP instances that in this setting several machine learning models yield good performance on the task and that our proposed architecture learned using imitation learning outperforms a modern stabilization technique.

2021-12-15

AAAI.org/2022/Workshop/ML4OR-22 (poster)

openreview.net

Preference for biological motion is reduced in ASD: implications for clinical trials and the search for biomarkers

Luke Mason

F. Shic

T. Falck-Ytter

Bhismadev Chakrabarti

Tony Charman

Eva Loth

Julian Tillmann

Tobias Banaschewski

Simon Baron-Cohen

Sven Bölte

J. Buitelaar

Sarah Durston

Bob Oranje

Antonio Persico

C. Beckmann

Thomas Bourgeron

Flavio Dell’Acqua

Christine Ecker

Carolin Moessnang

D. Murphy

M. H. Johnson

Emily J. H. Jones

Jumana Ahmad

Sara Ambrosino

Sarah Baumeister

Carsten Bours

Michael Brammer

Daniel Brandeis

Claudia Brogna

Yvette de Bruijn

Christopher H. Chatham

Ineke Cornelissen

Daisy Crawley

Guillaume Dumas

Jessica Faulkner

Vincent Frouin

Pilar Garcés

David Goyard

Lindsay Ham

Joerg F. Hipp

Rosemary Holt

Meng-Chuan Lai

Xavier Liogier D’ardhuy

Michael V. Lombardo

David J. Lythgoe

René Mandl

Andre Marquand

Maarten Mennes

Andreas Meyer-Lindenberg

Nico Bast

Beth Oakley

Larry O’Dwyer

Marianne Oldehinkel

Gahan Pandina

Barbara Ruggeri

Amber N. V. Ruigrok

Jessica Sabet

Roberto Sacco

Antonia San José Cáceres

Emily Simonoff

Will Spooren

Roberto Toro

Heike Tost

Jack Waldman

Steve C. R. Williams

Caroline Wooldridge

Marcel P. Zwiers

2021-12-14

Molecular Autism (published)

doi.org

Decision Referrals in Human-Automation Teams

Kesav Kaza

Jerome Le Ny

Aditya Mahajan

We consider a model for optimal decision referrals in human-automation teams performing binary classification tasks. The automation observes a batch of independent tasks, analyzes them, and has the option to refer a subset of them to a human operator. The human operator performs fresh analysis of the tasks referred to him. Our key modeling assumption is that the human performance degrades with workload (i.e., the number of tasks referred to human). We model the problem as a stochastic optimization problem. We first consider the special case when the workload of the human is pre-specified. We show that in this setting it is optimal to myopically refer tasks which lead to the largest reduction in the conditional expected cost until the desired workload target is met. We next consider the general setting where there is no constraint on the workload. We leverage the solution of the previous step and provide a search algorithm to efficiently find the optimal set of tasks to refer. Finally, we present a numerical study to compare the performance of our algorithm with some baseline allocation policies.

2021-12-13

IEEE Conference on Decision and Control (published)

doi.org

Mean-field approximation for large-population beauty-contest games

Raihan Seraj

Jerome Le Ny

Aditya Mahajan

We study a class of Keynesian beauty contest games where a large number of heterogeneous players attempt to estimate a common parameter based on their own observations. The players are rewarded for producing an estimate close to a certain multiplicative factor of the average decision, this factor being specific to each player. This model is motivated by scenarios arising in commodity or financial markets, where investment decisions are sometimes partly based on following a trend. We provide a method to compute Nash equilibria within the class of affine strategies. We then develop a mean-field approximation, in the limit of an infinite number of players, which has the advantage that computing the best-response strategies only requires the knowledge of the parameter distribution of the players, rather than their actual parameters. We show that the mean-field strategies lead to an ε-Nash equilibrium for a system with a finite number of players. We conclude by analyzing the impact on individual behavior of changes in aggregate population behavior.

2021-12-13

IEEE Conference on Decision and Control (published)

doi.org

Thompson sampling for linear quadratic mean-field teams

Mukul Gagrani

Sagar Sudhakara

Aditya Mahajan

Ashutosh Nayyar

Yi Ouyang

We consider optimal control of an unknown multi-agent linear quadratic (LQ) system where the dynamics and the cost are coupled across the agents through the mean-field (i.e., empirical mean) of the states and controls. Directly using single-agent LQ learning algorithms in such models results in regret which increases polynomially with the number of agents. We propose a new Thompson sampling based learning algorithm which exploits the structure of the system model and show that the expected Bayesian regret of our proposed algorithm for a system with agents of |M| different types at time horizon T is

2021-12-13

2021 60th IEEE Conference on Decision and Control (CDC) (published)

doi.org

arxiv.org

Behavior Predictive Representations for Generalization in Reinforcement Learning

Siddhant Agarwal

Aaron Courville

Rishabh Agarwal

Deep reinforcement learning (RL) agents trained on a few environments often struggle to generalize on unseen environments, even when such environments are semantically equivalent to training environments. Such agents learn representations that overfit the characteristics of the training environments. We posit that generalization can be improved by assigning similar representations to scenarios with similar sequences of long-term optimal behavior. To do so, we propose behavior predictive representations (BPR) that capture long-term optimal behavior. BPR trains an agent to predict latent state representations multiple steps into the future such that these representations can predict the optimal behavior at the future steps. We demonstrate that BPR provides large gains on a jumping task from pixels, a problem designed to test generalization.

2021-12-12

NeurIPS.cc/2021/Workshop/DeepRL (unknown)

openreview.net

Early Transcriptional Changes in Rabies Virus-Infected Neurons and Their Impact on Neuronal Functions

Seonhee Kim

Florence Larrous

Hugo Varet

Rachel Legendre

Lena Feige

Guillaume Dumas

Rebecca Matsas

Georgia Kouroupi

Regis Grailhe

Hervé Bourhy

Rabies is a zoonotic disease caused by rabies virus (RABV). As rabies advances, patients develop a variety of severe neurological symptoms that inevitably lead to coma and death. Unlike other neurotropic viruses that can induce symptoms of a similar range, RABV-infected post-mortem brains do not show significant signs of inflammation nor the structural damages on neurons. This suggests that the observed neurological symptoms possibly originate from dysfunctions of neurons. However, many aspects of neuronal dysfunctions in the context of RABV infection are only partially understood, and therefore require further investigation. In this study, we used differentiated neurons to characterize the RABV-induced transcriptomic changes at the early time-points of infection. We found that the genes modulated in response to the infection are particularly involved in cell cycle, gene expression, immune response, and neuronal function-associated processes. Comparing a wild-type RABV to a mutant virus harboring altered matrix proteins, we found that the RABV matrix protein plays an important role in the early down-regulation of host genes, of which a significant number is involved in neuronal functions. The kinetics of differentially expressed genes (DEGs) are also different between the wild type and mutant virus datasets. The number of modulated genes remained constant upon wild-type RABV infection up to 24 h post-infection, but dramatically increased in the mutant condition. This result suggests that the intact viral matrix protein is important to control the size of host gene modulation. We then examined the signaling pathways previously studied in relation to the innate immune responses against RABV, and found that these pathways contribute to the changes in neuronal function-associated processes.
We further examined a set of regulated genes that could impact neuronal functions collectively, and demonstrated in calcium imaging that indeed the spontaneous activity of neurons is influenced by RABV infection. Overall, our findings suggest that neuronal function-associated genes are modulated by RABV early on, potentially through the viral matrix protein-interacting signaling molecules and their downstream pathways.

2021-12-12

Frontiers in Microbiology (published)

doi.org

Long-Term Credit Assignment via Model-based Temporal Shortcuts

2021-12-12

NeurIPS.cc/2021/Workshop/DeepRL (unknown)

openreview.net