Publications

A Cost-Efficient Metadata Scheme for High-Performance Deduplication Systems
Yuxuan Mo
Yu Hua
Pengfei Li
Qin Cao
Data deduplication has been widely used in backup systems to eliminate redundant data, which speeds up the backup process and reduces the storage overhead. Deduplication packs multiple chunks into a large, fixed-size container as a storage unit to maintain locality and achieve efficient compression. We observe that traditional containers have low filling ratios due to the large amount of metadata generated by small files. Unfilled containers require more space to store a backup, which decreases storage efficiency and reduces restore performance. To address this problem, we propose a Metadata region Adaptive Container Structure, called MACS. MACS maintains a tag that records the length of the metadata region in the container. The boundary between the metadata region and the data region is decided dynamically to maximize the space efficiency of the containers. Moreover, we propose a container-metadata-length-based indexing and cache replacement strategy to make MACS practical in data backup systems. We demonstrate the advantages of MACS with three real-world backup datasets. MACS achieves an average container filling ratio of over 95%, which is significantly higher than existing designs. MACS further achieves better restore performance than the traditional container structure. When combined with existing rewriting methods, MACS achieves an efficient trade-off between deduplication ratio and restore performance.
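To make the filling-ratio argument concrete, the sketch below compares a container that reserves a fixed-size metadata region with one whose metadata/data boundary moves, with a header tag recording the metadata length. The container size, per-chunk metadata entry size, tag size, and the definition of filling ratio (occupied bytes over container size) are all hypothetical choices for illustration, not MACS's actual layout.

```python
from dataclasses import dataclass

# Hypothetical parameters chosen for illustration; not MACS's actual on-disk layout.
CONTAINER_SIZE = 4 * 1024 * 1024  # 4 MiB container
META_ENTRY_SIZE = 64              # metadata bytes per chunk (fingerprint, offset, length, ...)
TAG_SIZE = 4                      # header tag recording the metadata-region length


@dataclass
class FixedContainer:
    """Traditional layout: a metadata region of fixed size is reserved up front."""
    meta_capacity: int
    data_used: int = 0
    meta_used: int = 0

    def try_add(self, chunk_size: int) -> bool:
        data_capacity = CONTAINER_SIZE - self.meta_capacity
        if (self.data_used + chunk_size > data_capacity
                or self.meta_used + META_ENTRY_SIZE > self.meta_capacity):
            return False  # one region is full, even if the other still has room
        self.data_used += chunk_size
        self.meta_used += META_ENTRY_SIZE
        return True

    def occupied(self) -> int:
        return self.data_used + self.meta_used


@dataclass
class AdaptiveContainer:
    """Adaptive layout: the tag stores the metadata length, so the boundary
    between the metadata and data regions moves with the chunk mix."""
    data_used: int = 0
    meta_used: int = 0

    def try_add(self, chunk_size: int) -> bool:
        needed = TAG_SIZE + self.meta_used + META_ENTRY_SIZE + self.data_used + chunk_size
        if needed > CONTAINER_SIZE:
            return False
        self.data_used += chunk_size
        self.meta_used += META_ENTRY_SIZE
        return True

    def occupied(self) -> int:
        return TAG_SIZE + self.meta_used + self.data_used


def fill(container, chunk_size: int):
    """Append equal-sized chunks until the container rejects one."""
    while container.try_add(chunk_size):
        pass
    return container


def filling_ratio(container) -> float:
    """Occupied bytes divided by the container size (one plausible definition)."""
    return container.occupied() / CONTAINER_SIZE
```

With 1 KiB chunks and a hypothetical 64 KiB metadata region, the fixed layout runs out of metadata space after 1,024 chunks and is only about 27% full, whereas the adaptive layout packs 3,855 chunks and is over 99% full. With large chunks the fixed layout fills well, which is why the problem shows up mainly with small files.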
Faults in deep reinforcement learning programs: a taxonomy and a detection approach
Amin Nikanjam
Mohammad Mehdi Morovati
Houssem Ben Braiek
Robustness of Markov perfect equilibrium to model approximations in general-sum dynamic games
Jayakumar Subramanian
Amit Sinha
Dynamic games (also called stochastic games or Markov games) are an important class of games for modeling multi-agent interactions. In many situations, the dynamics and reward functions of the game are learnt from past data and are therefore approximate. In this paper, we study the robustness of Markov perfect equilibrium to approximations in reward and transition functions. Using approximation results from Markov decision processes, we show that the Markov perfect equilibrium of an approximate (or perturbed) game is always an approximate Markov perfect equilibrium of the original game. We provide explicit bounds on the approximation error in terms of three quantities: (i) the error in approximating the reward functions, (ii) the error in approximating the transition function, and (iii) a property of the value function of the MPE of the approximate game. The second and third quantities depend on the choice of metric on probability spaces. We also present coarser upper bounds which do not depend on the value function but only depend on the properties of the reward and transition functions of the approximate game. We illustrate the results via a numerical example.
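The three ingredients listed above mirror a classical single-agent MDP perturbation bound. As a sketch of that single-agent ingredient only (not the paper's game-theoretic bound), the code below evaluates a fixed policy in a small random model and in a perturbed copy, and checks that the value gap stays below (reward error + gamma x transition error x span of the approximate value function) / (1 - gamma). Total variation as the metric and the span as the value-function property are one example choice, not necessarily the paper's; the 5-state model, discount factor, and perturbation sizes are hypothetical.

```python
import random


def evaluate(r, P, gamma, iters=2000):
    """Iterative policy evaluation for a fixed policy: solves V = r + gamma * P V.
    r[s] is the expected one-step reward, P[s][t] the transition probability."""
    n = len(r)
    V = [0.0] * n
    for _ in range(iters):
        V = [r[s] + gamma * sum(P[s][t] * V[t] for t in range(n)) for s in range(n)]
    return V


def tv(p, q):
    """Total-variation distance between two distributions on a finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def value_gap_bound(r, P, r_hat, P_hat, gamma):
    """Upper bound on max_s |V(s) - V_hat(s)|, derived from
    V - V_hat = (r - r_hat) + gamma P (V - V_hat) + gamma (P - P_hat) V_hat
    together with |(p - q) . v| <= TV(p, q) * span(v)."""
    V_hat = evaluate(r_hat, P_hat, gamma)
    eps_r = max(abs(a - b) for a, b in zip(r, r_hat))  # reward error
    eps_p = max(tv(p, q) for p, q in zip(P, P_hat))    # transition error (TV)
    span = max(V_hat) - min(V_hat)                     # property of V_hat
    return (eps_r + gamma * eps_p * span) / (1 - gamma)


def random_dist(n, rng):
    w = [rng.random() for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]


# Hypothetical example: a "true" model and a perturbed, learnt approximation.
rng = random.Random(0)
n, gamma = 5, 0.9
r = [rng.random() for _ in range(n)]
P = [random_dist(n, rng) for _ in range(n)]
r_hat = [x + rng.uniform(-0.05, 0.05) for x in r]
P_hat = [[0.9 * a + 0.1 * b for a, b in zip(p, random_dist(n, rng))] for p in P]

gap = max(abs(a - b) for a, b in zip(evaluate(r, P, gamma), evaluate(r_hat, P_hat, gamma)))
bound = value_gap_bound(r, P, r_hat, P_hat, gamma)
```

Note that the bound is computed only from the approximate model's value function plus the two error terms, which matches the abstract's point that the third quantity is a property of the approximate game.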
Does Pre-training Induce Systematic Inference? How Masked Language Models Acquire Commonsense Knowledge
Transformer models pre-trained with a masked-language-modeling objective (e.g., BERT) encode commonsense knowledge, as evidenced by behavioral probes; however, the extent to which this knowledge is acquired by systematic inference over the semantics of the pre-training corpora is an open question. To answer this question, we selectively inject verbalized knowledge into the pre-training minibatches of BERT and evaluate how well the model generalizes to supported inferences after pre-training on the injected knowledge. We find that generalization does not improve over the course of pre-training BERT from scratch, suggesting that commonsense knowledge is acquired from surface-level, co-occurrence patterns rather than induced, systematic reasoning.
Neural Column Generation for Capacitated Vehicle Routing
Behrouz Babaki
Sanjay Dominik Jena
The column generation technique is essential for solving linear programs with an exponential number of variables. Many important applications, such as the vehicle routing problem (VRP), now require it. However, in practice, getting column generation to converge is challenging; it often ends up adding too many columns. In this work, we frame the problem of selecting which columns to add as a sequential decision-making problem. We propose a neural column generation architecture that iteratively selects columns to be added to the problem. The architecture, inspired by stabilization techniques, first predicts the optimal duals. These predictions are then used to obtain the columns to add. Using VRP instances, we show that, in this setting, several machine learning models perform well on the task and that our proposed architecture, trained with imitation learning, outperforms a modern stabilization technique.
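In a set-partitioning master problem for the VRP, a route (column) can improve the current solution only if its reduced cost, the route cost minus the dual values of the customers it visits, is negative. The sketch below shows just this pricing step given a vector of duals (which could come from a predictor, as in the architecture described above). The routes, costs, and dual values are hypothetical, and taking the k most negative columns is a simple heuristic, not the paper's learned selection policy.

```python
from typing import Dict, List, Tuple

Route = Tuple[int, ...]  # customers visited by one vehicle, each at most once


def reduced_cost(route: Route, cost: float, duals: Dict[int, float]) -> float:
    """Reduced cost of a route column in a set-partitioning master problem:
    the route cost minus the duals of the customer constraints it covers."""
    return cost - sum(duals[c] for c in route)


def select_columns(candidates: Dict[Route, float], duals: Dict[int, float],
                   k: int, tol: float = 1e-9) -> List[Route]:
    """Return up to k candidate routes with negative reduced cost, most negative first."""
    scored = sorted((reduced_cost(r, c, duals), r) for r, c in candidates.items())
    return [r for rc, r in scored if rc < -tol][:k]


# Hypothetical duals (e.g. predicted by a model) and candidate routes with their costs.
duals = {1: 10.0, 2: 8.0, 3: 6.0}
candidates = {(1,): 12.0, (1, 2): 15.0, (2, 3): 16.0, (1, 2, 3): 20.0, (3,): 5.0}
chosen = select_columns(candidates, duals, k=2)
```

Here the routes (1, 2, 3) and (1, 2) have reduced costs -4 and -3 and are selected; when no candidate has a negative reduced cost, the function returns an empty list, which is the usual stopping signal for column generation.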
Global epidemiology of SARS-CoV-2 infection: a systematic review and meta-analysis of standardized population-based seroprevalence studies, Jan 2020-Oct 2021
Isabel Bergeri
Mairead Whelan
Harriet Ware
Lorenzo Subissi
Anthony Nardone
H. Lewis
Zihan Li
Xiaomeng Ma
Marta Valenciano
Brianna Cheng
Lubna Al Ariqi
Arash Rashidian
Joseph Okeibunor
Tasnim Azim
Pushpa Wijesinghe
Linh-Vi Le
Aisling Vaughan
Richard Pebody
Andrea Vicari
Tingting Yan … (8 more)
Mercedes Yanes-Lane
Christian Cao
Matthew P. Cheng
Jesse Papenburg
Niklas Bobrovitz
Rahul K. Arora
Maria D Van Kerkhove
Background: COVID-19 case data underestimate infection and immunity, especially in low- and middle-income countries (LMICs). We meta-analyzed standardized SARS-CoV-2 seroprevalence studies to estimate global seroprevalence.
Objectives/Methods: We conducted a systematic review and meta-analysis, searching MEDLINE, Embase, Web of Science, preprints, and grey literature for SARS-CoV-2 seroprevalence studies aligned with the WHO UNITY protocol and published between 2020-01-01 and 2021-10-29. Eligible studies were extracted and critically appraised in duplicate. We meta-analyzed seroprevalence by country and month, pooling to estimate regional and global seroprevalence over time; compared seroprevalence from infection with confirmed cases to estimate under-ascertainment; meta-analyzed differences in seroprevalence between demographic subgroups; and identified national factors associated with seroprevalence using meta-regression. PROSPERO: CRD42020183634.
Results: We identified 396 full texts reporting 736 distinct seroprevalence studies (41% LMIC) and included 355 studies with low/moderate risk of bias and national/sub-national scope in further analysis. By April 2021, global SARS-CoV-2 seroprevalence was 26.1%, 95% CI [24.6-27.6%]. Seroprevalence rose steeply in the first half of 2021 due to infection in some regions (e.g., 18.2% to 45.9% in Africa) and to vaccination and infection in others (e.g., 11.3% to 57.4% in high-income countries (HICs) in the Americas), but remained low in others (e.g., 0.3% to 1.6% in the Western Pacific). In the first quarter of 2021, median seroprevalence-to-case ratios were 1.9:1 in HICs and 61.9:1 in LMICs. Children aged 0-9 years and adults aged 60+ were at lower risk of seropositivity than adults aged 20-29. In a multivariate model using pre-vaccination data, more stringent public health and social measures were associated with lower seroprevalence.
Conclusions: Global seroprevalence has risen considerably over time, with regional variation; however, much of the global population remains susceptible to SARS-CoV-2 infection. True infections far exceed reported COVID-19 cases. Standardized seroprevalence studies are essential to inform COVID-19 control measures, particularly in resource-limited regions.
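The under-ascertainment comparison comes down to simple arithmetic: the number of infections implied by a seroprevalence estimate (seroprevalence from infection times population) divided by the cumulative number of confirmed cases. The sketch below computes such seroprevalence-to-case ratios and their median; the country labels, populations, seroprevalence values, and case counts are invented for illustration and are not data from the review.

```python
from statistics import median


def sero_to_case_ratio(seroprevalence: float, population: int, confirmed_cases: int) -> float:
    """Estimated infections (seroprevalence x population) per confirmed case."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return seroprevalence * population / confirmed_cases


# Hypothetical inputs: (seroprevalence from infection, population, cumulative confirmed cases)
countries = {
    "Country A": (0.10, 10_000_000, 500_000),
    "Country B": (0.30, 20_000_000, 200_000),
    "Country C": (0.05, 5_000_000, 125_000),
}
ratios = {name: sero_to_case_ratio(*values) for name, values in countries.items()}
median_ratio = median(ratios.values())
```

A ratio of 30:1 means roughly 30 infections for every reported case. The median is a natural summary here because a few countries with very low testing can produce extreme ratios that would dominate a mean.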
Preference for biological motion is reduced in ASD: implications for clinical trials and the search for biomarkers
Luke Mason
F. Shic
T. Falck-Ytter
Bhismadev Chakrabarti
Tony Charman
Eva Loth
Julian Tillmann
Tobias Banaschewski
Simon Baron-Cohen
Sven Bölte
J. Buitelaar
Sarah Durston
Bob Oranje
Antonio Persico
C. Beckmann
Thomas Bougeron
Flavio Dell’Acqua
Christine Ecker
Carolin Moessnang
D. Murphy … (49 more)
M. H. Johnson
Emily J. H. Jones
Jumana Ahmad
Sara Ambrosino
Sarah Baumeister
Carsten Bours
Michael Brammer
Daniel Brandeis
Claudia Brogna
Yvette de Bruijn
Christopher H. Chatham
Ineke Cornelissen
Daisy Crawley
Jessica Faulkner
Vincent Frouin
Pilar Garcés
David Goyard
Lindsay Ham
Joerg F. Hipp
Rosemary Holt
Meng-Chuan Lai
Xavier Liogier D’ardhuy
Michael V. Lombardo
David J. Lythgoe
René Mandl
Andre Marquand
Maarten Mennes
Andreas Meyer-Lindenberg
Nico Bast
Beth Oakley
Laurence O’Dwyer
Marianne Oldehinkel
Gahan Pandina
Barbara Ruggeri
Amber N. V. Ruigrok
Jessica Sabet
Roberto Sacco
Antonia San José Cáceres
Emily Simonoff
Will Spooren
Roberto Toro
Heike Tost
Jack Waldman
Steve C. R. Williams
Caroline Wooldridge
Marcel P. Zwiers
Decision Referrals in Human-Automation Teams
Kesav Kaza
Jerome Le Ny
We consider a model for optimal decision referrals in human-automation teams performing binary classification tasks. The automation observes a batch of independent tasks, analyzes them, and has the option to refer a subset of them to a human operator. The human operator performs a fresh analysis of the referred tasks. Our key modeling assumption is that human performance degrades with workload (i.e., the number of tasks referred to the human). We model the problem as a stochastic optimization problem. We first consider the special case in which the workload of the human is pre-specified. We show that in this setting it is optimal to myopically refer the tasks that lead to the largest reduction in the conditional expected cost until the desired workload target is met. We next consider the general setting where there is no constraint on the workload. We leverage the solution of the special case and provide a search algorithm to efficiently find the optimal set of tasks to refer. Finally, we present a numerical study comparing the performance of our algorithm with some baseline allocation policies.
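Under a simplified cost model (an assumption for illustration, not the paper's exact formulation), the two steps of this abstract can be sketched as follows. Each task has a posterior probability p that its label is 1, so the automation's expected 0-1 error on it is min(p, 1 - p); a human handling k referred tasks errs with probability e(k), a hypothetical non-decreasing function. For a fixed workload k, referring the k tasks with the largest cost reduction is optimal in this model; with no workload constraint, the sketch simply scans every k, a naive stand-in for the paper's more efficient search algorithm.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def automation_error(p: float) -> float:
    """Expected 0-1 error if the automation decides alone with the most likely label."""
    return min(p, 1.0 - p)


def refer_fixed_workload(posteriors: Sequence[float], k: int,
                         human_error: Callable[[int], float]) -> Tuple[List[int], float]:
    """Refer exactly k tasks, greedily choosing the largest expected-cost reductions."""
    e = human_error(k)  # human error rate at workload k (hypothetical model)
    order = sorted(range(len(posteriors)),
                   key=lambda i: automation_error(posteriors[i]) - e, reverse=True)
    referred, kept = order[:k], order[k:]
    cost = sum(automation_error(posteriors[i]) for i in kept) + k * e
    return sorted(referred), cost


def refer_unconstrained(posteriors: Sequence[float],
                        human_error: Callable[[int], float]) -> Tuple[List[int], float]:
    """Naive search over every workload k = 0..n (not the paper's search algorithm)."""
    best: Optional[Tuple[List[int], float]] = None
    for k in range(len(posteriors) + 1):
        referred, cost = refer_fixed_workload(posteriors, k, human_error)
        if best is None or cost < best[1]:
            best = (referred, cost)
    return best
```

For example, with posteriors [0.5, 0.9, 0.6, 0.99, 0.45] and a hypothetical human error of 0.05 + 0.03k, referring the three most uncertain tasks (indices 0, 2 and 4) gives the lowest expected cost; referring a fourth task costs more because it raises the human's error on every referred task.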
Mean-field approximation for large-population beauty-contest games
Raihan Seraj
Jerome Le Ny
We study a class of Keynesian beauty contest games where a large number of heterogeneous players attempt to estimate a common parameter based on their own observations. The players are rewarded for producing an estimate close to a certain multiplicative factor of the average decision, this factor being specific to each player. This model is motivated by scenarios arising in commodity or financial markets, where investment decisions are sometimes partly based on following a trend. We provide a method to compute Nash equilibria within the class of affine strategies. We then develop a mean-field approximation, in the limit of an infinite number of players, which has the advantage that computing the best-response strategies only requires the knowledge of the parameter distribution of the players, rather than their actual parameters. We show that the mean-field strategies lead to an ε-Nash equilibrium for a system with a finite number of players. We conclude by analyzing the impact on individual behavior of changes in aggregate population behavior.
Thompson sampling for linear quadratic mean-field teams
Mukul Gagrani
Sagar Sudhakara
Ashutosh Nayyar
Yi Ouyang
We consider optimal control of an unknown multi-agent linear quadratic (LQ) system where the dynamics and the cost are coupled across the agents through the mean-field (i.e., empirical mean) of the states and controls. Directly using single-agent LQ learning algorithms in such models results in regret which increases polynomially with the number of agents. We propose a new Thompson sampling based learning algorithm which exploits the structure of the system model and show that the expected Bayesian regret of our proposed algorithm for a system with agents of |M| different types at time horizon T is
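As a heavily simplified sketch of the Thompson-sampling idea only (single agent, scalar state, known input gain b, unknown drift a, no mean-field coupling, fixed-length episodes; all of these are assumptions, not the paper's setting), the loop below keeps a Gaussian posterior over a, samples a model from it at the start of each episode, solves the scalar Riccati equation for the sampled model, and applies the resulting linear feedback. The prior, noise level, episode length, and cost weights are hypothetical.

```python
import math
import random


def riccati_gain(a: float, b: float, q: float = 1.0, r: float = 1.0, iters: int = 500) -> float:
    """Scalar discrete-time LQR for x' = a x + b u with stage cost q x^2 + r u^2.
    Iterates the Riccati recursion and returns K for the feedback u = -K x."""
    P = q
    for _ in range(iters):
        P = q + a * a * P - (a * b * P) ** 2 / (r + b * b * P)
    return a * b * P / (r + b * b * P)


def thompson_sampling_lq(a_true: float, b: float, steps: int = 500, episode: int = 10,
                         noise_std: float = 1.0, prior_mean: float = 0.0,
                         prior_var: float = 1.0, seed: int = 0):
    """Thompson sampling with a Gaussian posterior over the unknown drift a.
    Returns the final posterior mean and variance of a."""
    rng = random.Random(seed)
    mean, prec = prior_mean, 1.0 / prior_var
    x, K = 0.0, 0.0
    for t in range(steps):
        if t % episode == 0:
            # Sample a plausible model and act optimally for it until the next resample.
            a_sample = rng.gauss(mean, 1.0 / math.sqrt(prec))
            K = riccati_gain(a_sample, b)
        u = -K * x
        x_next = a_true * x + b * u + rng.gauss(0.0, noise_std)
        # Conjugate Bayesian linear-regression update from y = a x + w.
        y = x_next - b * u
        new_prec = prec + x * x / noise_std ** 2
        mean = (mean * prec + x * y / noise_std ** 2) / new_prec
        prec = new_prec
        x = x_next
    return mean, 1.0 / prec
```

For a = b = q = r = 1 the Riccati gain is (sqrt(5) - 1) / 2, about 0.618, which is a quick sanity check on the recursion. Running the loop with a hypothetical a_true = 1.2 drives the posterior toward the true drift while the sampled controllers keep the system stable; a fixed episode length is used only for simplicity.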
Early Transcriptional Changes in Rabies Virus-Infected Neurons and Their Impact on Neuronal Functions
Seonhee Kim
Florence Larrous
Hugo Varet
Rachel Legendre
Lena Feige
Rebecca Matsas
Georgia Kouroupi
Regis Grailhe
Hervé Bourhy
Implications of Topological Imbalance for Representation Learning on Biomedical Knowledge Graphs
Stephen Bonner
Ufuk Kirik
Ola Engkvist
I. Barrett
Adoption of recently developed methods from machine learning has given rise to the creation of drug-discovery knowledge graphs (KGs) that utilize the interconnected nature of the domain. Graph-based modelling of the data, combined with KG embedding (KGE) methods, is promising because it provides a more intuitive representation and is suitable for inference tasks such as predicting missing links. One common application is to produce ranked lists of genes for a given disease, where the rank is based on the perceived likelihood of association between the gene and the disease. It is thus critical that these predictions are not only pertinent but also biologically meaningful. However, KGs can be biased either directly, through the underlying data sources that are integrated, or through modelling choices in the construction of the graph; one consequence is that certain entities can become topologically overrepresented. We demonstrate the effect of these inherent structural imbalances, which result in densely connected entities being ranked highly regardless of context. We provide support for this observation across different datasets, models, and predictive tasks. Further, we present various graph perturbation experiments that provide additional support for the observation that KGE models can be more influenced by the frequency of entities than by any biological information encoded within the relations. Our results highlight the importance of data modelling choices and emphasize the need for practitioners to be mindful of these issues when interpreting model outputs and during KG composition.
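One simple way to check for the effect described here is to compare a model's gene ranking against a context-free baseline that orders candidate genes purely by their degree in the graph. The sketch below builds that baseline from an edge list and measures top-k overlap; if a KGE model's top genes for very different diseases largely coincide with this list, topology rather than disease-specific relations is likely driving the ranking. The triples, entity names, and the "model ranking" are hypothetical, and this diagnostic is an illustration rather than the paper's experimental protocol.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def degree(triples: Iterable[Triple]) -> Counter:
    """Number of triples each entity participates in (in-degree plus out-degree)."""
    deg = Counter()
    for head, _, tail in triples:
        deg[head] += 1
        deg[tail] += 1
    return deg


def degree_ranking(triples: List[Triple], genes: Set[str]) -> List[str]:
    """Context-free baseline: rank genes by degree, ignoring the query disease.
    Ties are broken alphabetically so the ranking is deterministic."""
    deg = degree(triples)
    return sorted(genes, key=lambda g: (-deg[g], g))


def top_k_overlap(a: List[str], b: List[str], k: int) -> float:
    """Fraction of entities shared between the top-k of two rankings."""
    return len(set(a[:k]) & set(b[:k])) / k


# Hypothetical toy graph.
triples = [
    ("GENE_A", "interacts", "GENE_B"),
    ("GENE_A", "interacts", "GENE_C"),
    ("GENE_A", "associated_with", "DISEASE_1"),
    ("GENE_A", "associated_with", "DISEASE_2"),
    ("GENE_B", "associated_with", "DISEASE_1"),
    ("GENE_C", "participates_in", "PATHWAY_X"),
    ("GENE_D", "associated_with", "DISEASE_2"),
]
genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
baseline = degree_ranking(triples, genes)
```

In practice one would compute the overlap between this baseline and the model's ranking for each disease query; consistently high overlap across unrelated diseases is a warning sign of the topological bias the abstract describes.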