Publications

A Cost-Efficient Metadata Scheme for High-Performance Deduplication Systems

Yuxuan Mo

Yu Hua

Pengfei Li

Qin Cao

Data deduplication has been widely used in backup systems to eliminate redundant data, which speeds up the backup process and reduces the storage overhead. Deduplication packs multiple chunks into a large, fixed-size container as a storage unit to maintain the locality and achieve efficient compression. We observe that the traditional containers have low filling ratios due to a large amount of metadata generated by small files. Unfilled containers require more space to store a backup, which decreases the storage efficiency and reduces restore performance. In order to address this problem, we propose a Metadata region Adaptive Container Structure, called MACS. MACS maintains a tag to record the length of the metadata region in the container. The boundary between the metadata region and the data region is dynamically decided to ensure the maximum space efficiency of the containers. Moreover, we propose a container metadata length-based indexing and cache replacement strategy to allow MACS to be practical in data backup systems. We demonstrate the advantages of MACS with three real-world backup datasets. MACS achieves over 95% average container filling ratio, which is significantly higher than existing designs. MACS further achieves better restore performance than the traditional container structure. When combined with existing rewriting methods, MACS achieves an efficient trade-off between deduplication ratio and restore performance.

2021-12-20

2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys) (published)

doi.org
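The MACS abstract attributes low container filling ratios to the large amount of metadata produced by small files. The Python sketch below compares a fixed metadata region with a movable boundary. It assumes every chunk needs one fixed-size metadata entry. The container size (64 KiB), metadata region (4 KiB), 64-byte entry, 4-byte tag, and all function names are hypothetical illustration values, not parameters from the paper. The packing rule is a simplification, not the authors' implementation.

```python
from typing import List, Tuple

# (metadata bytes, data bytes) stored in one sealed container.
Container = Tuple[int, int]


def pack_fixed(chunks: List[int], capacity: int, meta_region: int, entry: int) -> List[Container]:
    """Traditional layout (assumed): metadata and data regions have fixed sizes."""
    data_region = capacity - meta_region
    out: List[Container] = []
    meta = data = 0
    for size in chunks:
        # Seal the container as soon as EITHER fixed region would overflow,
        # even if the other region still has free space.
        if meta and (meta + entry > meta_region or data + size > data_region):
            out.append((meta, data))
            meta = data = 0
        meta += entry
        data += size
    if meta:
        out.append((meta, data))
    return out


def pack_adaptive(chunks: List[int], capacity: int, entry: int, tag: int = 4) -> List[Container]:
    """MACS-style layout (sketch): a small tag records the metadata-region length,
    so the metadata/data boundary can move and only the total size is constrained."""
    usable = capacity - tag  # hypothetical tag size reserved in every container
    out: List[Container] = []
    meta = data = 0
    for size in chunks:
        # Seal only when metadata plus data together would overflow.
        if meta and meta + data + entry + size > usable:
            out.append((meta, data))
            meta = data = 0
        meta += entry
        data += size
    if meta:
        out.append((meta, data))
    return out


def filling_ratio(container: Container, capacity: int) -> float:
    """Fraction of the container occupied by metadata entries plus chunk data."""
    meta, data = container
    return (meta + data) / capacity


small = [256] * 1000  # many small chunks, as produced by small files
print(filling_ratio(pack_fixed(small, 65536, 4096, 64)[0], 65536))   # 0.3125
print(filling_ratio(pack_adaptive(small, 65536, 64)[0], 65536))      # 0.99609375
```

With 256-byte chunks, the fixed 4 KiB metadata region fills after 64 entries while most of the data region is still empty. Letting the boundary move keeps the container almost full. The numbers illustrate the mechanism only and are not results from the paper.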

Faults in deep reinforcement learning programs: a taxonomy and a detection approach

Amin Nikanjam

Mohammad Mehdi Morovati

Foutse Khomh

Houssem Ben Braiek

2021-12-20

Automated Software Engineering (published)

doi.org

arxiv.org

Robustness of Markov perfect equilibrium to model approximations in general-sum dynamic games

Jayakumar Subramanian

Amit Sinha

Aditya Mahajan

Dynamic games (also called stochastic games or Markov games) are an important class of games for modeling multi-agent interactions. In many situations, the dynamics and reward functions of the game are learnt from past data and are therefore approximate. In this paper, we study the robustness of Markov perfect equilibrium to approximations in reward and transition functions. Using approximation results from Markov decision processes, we show that the Markov perfect equilibrium of an approximate (or perturbed) game is always an approximate Markov perfect equilibrium of the original game. We provide explicit bounds on the approximation error in terms of three quantities: (i) the error in approximating the reward functions, (ii) the error in approximating the transition function, and (iii) a property of the value function of the MPE of the approximate game. The second and third quantities depend on the choice of metric on probability spaces. We also present coarser upper bounds which do not depend on the value function but only depend on the properties of the reward and transition functions of the approximate game. We illustrate the results via a numerical example.

2021-12-20

2021 Seventh Indian Control Conference (ICC) (published)

doi.org

Does Pre-training Induce Systematic Inference? How Masked Language Models Acquire Commonsense Knowledge

Ian Porada

Alessandro Sordoni

Jackie Cheung

Transformer models pre-trained with a masked-language-modeling objective (e.g., BERT) encode commonsense knowledge as evidenced by behavioral probes; however, the extent to which this knowledge is acquired by systematic inference over the semantics of the pre-training corpora is an open question. To answer this question, we selectively inject verbalized knowledge into the pre-training minibatches of BERT and evaluate how well the model generalizes to supported inferences after pre-training on the injected knowledge. We find generalization does not improve over the course of pre-training BERT from scratch, suggesting that commonsense knowledge is acquired from surface-level, co-occurrence patterns rather than induced, systematic reasoning.

2021-12-16

ArXiv (preprint)

doi.org

arxiv.org

Neural Column Generation for Capacitated Vehicle Routing

Behrouz Babaki

Sanjay Dominik Jena

Laurent Charlin

The column generation technique is essential for solving linear programs with an exponential number of variables. Many important applications such as the vehicle routing problem (VRP) now require it. However, in practice, getting column generation to converge is challenging. It often ends up adding too many columns. In this work, we frame the problem of selecting which columns to add as one of sequential decision-making. We propose a neural column generation architecture that iteratively selects columns to be added to the problem. The architecture, inspired by stabilization techniques, first predicts the optimal duals. These predictions are then used to obtain the columns to add. We show using VRP instances that in this setting several machine learning models yield good performance on the task and that our proposed architecture learned using imitation learning outperforms a modern stabilization technique.

2021-12-16

AAAI.org/2022/Workshop/ML4OR-22 (poster)

openreview.net

Global epidemiology of SARS-CoV-2 infection: a systematic review and meta-analysis of standardized population-based seroprevalence studies, Jan 2020-Oct 2021

Isabel Bergeri

Mairead Whelan

Harriet Ware

Lorenzo Subissi

Anthony Nardone

H. Lewis

Zihan Li

Xiaomeng Ma

Marta Valenciano

Brianna Cheng

Lubna Al Ariqi

Arash Rashidian

Joseph Okeibunor

Tasnim Azim

Pushpa Wijesinghe

Linh-Vi Le

Aisling Vaughan

Richard Pebody

Andrea Vicari

Tingting Yan

Mercedes Yanes-Lane

Christian Cao

Matthew P. Cheng

Jesse Papenburg

David Buckeridge

Niklas Bobrovitz

Rahul K. Arora

Maria D Van Kerkhove

Background COVID-19 case data underestimates infection and immunity, especially in low- and middle-income countries (LMICs). We meta-analyzed standardized SARS-CoV-2 seroprevalence studies to estimate global seroprevalence. Objectives/Methods We conducted a systematic review and meta-analysis, searching MEDLINE, Embase, Web of Science, preprints, and grey literature for SARS-CoV-2 seroprevalence studies aligned with the WHO UNITY protocol published between 2020-01-01 and 2021-10-29. Eligible studies were extracted and critically appraised in duplicate. We meta-analyzed seroprevalence by country and month, pooling to estimate regional and global seroprevalence over time; compared seroprevalence from infection to confirmed cases to estimate under-ascertainment; meta-analyzed differences in seroprevalence between demographic subgroups; and identified national factors associated with seroprevalence using meta-regression. PROSPERO: CRD42020183634. Results We identified 396 full texts reporting 736 distinct seroprevalence studies (41% LMIC), including 355 low/moderate risk of bias studies with national/sub-national scope in further analysis. By April 2021, global SARS-CoV-2 seroprevalence was 26.1%, 95% CI [24.6-27.6%]. Seroprevalence rose steeply in the first half of 2021 due to infection in some regions (e.g., 18.2% to 45.9% in Africa) and vaccination and infection in others (e.g., 11.3% to 57.4% in the Americas high-income countries), but remained low in others (e.g., 0.3% to 1.6% in the Western Pacific). In 2021 Q1, median seroprevalence-to-case ratios were 1.9:1 in HICs and 61.9:1 in LMICs. Children 0-9 years and adults 60+ were at lower risk of seropositivity than adults 20-29. In a multivariate model using pre-vaccination data, more stringent public health and social measures were associated with lower seroprevalence.
Conclusions Global seroprevalence has risen considerably over time and with regional variation; however, much of the global population remains susceptible to SARS-CoV-2 infection. True infections far exceed reported COVID-19 cases. Standardized seroprevalence studies are essential to inform COVID-19 control measures, particularly in resource-limited regions.

2021-12-15

medRxiv (preprint)

doi.org

Preference for biological motion is reduced in ASD: implications for clinical trials and the search for biomarkers

Luke Mason

F. Shic

T. Falck-Ytter

Bhismadev Chakrabarti

Tony Charman

Eva Loth

Julian Tillmann

Tobias Banaschewski

Simon Baron-Cohen

Sven Bölte

J. Buitelaar

Sarah Durston

Bob Oranje

Antonio Persico

C. Beckmann

Thomas Bougeron

Flavio Dell’Acqua

Christine Ecker

Carolin Moessnang

D. Murphy

M. H. Johnson

Emily J. H. Jones

Jumana Ahmad

Sara Ambrosino

Sarah Baumeister

Carsten Bours

Michael Brammer

Daniel Brandeis

Claudia Brogna

Yvette de Bruijn

Christopher H. Chatham

Ineke Cornelissen

Daisy Crawley

Guillaume Dumas

Jessica Faulkner

Vincent Frouin

Pilar Garcés

David Goyard

Lindsay Ham

Joerg F. Hipp

Rosemary Holt

Meng-Chuan Lai

Xavier Liogier D’ardhuy

Michael V. Lombardo

David J. Lythgoe

René Mandl

Andre Marquand

Maarten Mennes

Andreas Meyer-Lindenberg

Nico Bast

Beth Oakley

Laurence O’Dwyer

Marianne Oldehinkel

Gahan Pandina

Barbara Ruggeri

Amber N. V. Ruigrok

Jessica Sabet

Roberto Sacco

Antonia San José Cáceres

Emily Simonoff

Will Spooren

Roberto Toro

Heike Tost

Jack Waldman

Steve C. R. Williams

Caroline Wooldridge

Marcel P. Zwiers

2021-12-15

Molecular Autism (published)

doi.org

Decision Referrals in Human-Automation Teams

Kesav Kaza

Jerome Le Ny

Aditya Mahajan

We consider a model for optimal decision referrals in human-automation teams performing binary classification tasks. The automation observes a batch of independent tasks, analyzes them, and has the option to refer a subset of them to a human operator. The human operator performs a fresh analysis of the tasks referred to them. Our key modeling assumption is that human performance degrades with workload (i.e., the number of tasks referred to the human). We model the problem as a stochastic optimization problem. We first consider the special case when the workload of the human is pre-specified. We show that in this setting it is optimal to myopically refer tasks which lead to the largest reduction in the conditional expected cost until the desired workload target is met. We next consider the general setting where there is no constraint on the workload. We leverage the solution of the previous step and provide a search algorithm to efficiently find the optimal set of tasks to refer. Finally, we present a numerical study to compare the performance of our algorithm with some baseline allocation policies.

2021-12-14

IEEE Conference on Decision and Control (published)

doi.org
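The decision-referral abstract describes two steps. For a fixed workload, tasks are referred myopically by largest expected-cost reduction. The unconstrained case then searches over workloads. The Python sketch below assumes 0-1 loss, an automation that reports a posterior P(label = 1) for each task, and a hypothetical `human_error(k)` that gives the human's error probability at workload k and does not depend on the task. The outer loop is a plain linear scan over k, not the paper's efficient search algorithm. All function names are hypothetical.

```python
from typing import Callable, List, Tuple


def automation_cost(p: float) -> float:
    """Expected 0-1 loss if the automation decides with the MAP label, given P(label=1) = p."""
    return min(p, 1.0 - p)


def refer_fixed_workload(posteriors: List[float], k: int,
                         human_error: Callable[[int], float]) -> Tuple[List[int], float]:
    """Myopic rule for a pre-specified workload k: refer the k tasks whose referral
    most reduces expected cost. Because the assumed human error e is the same for
    every referred task, this ranks tasks by automation cost."""
    e = human_error(k)
    order = sorted(range(len(posteriors)),
                   key=lambda i: automation_cost(posteriors[i]) - e, reverse=True)
    referred = sorted(order[:k])
    chosen = set(referred)
    cost = sum(e if i in chosen else automation_cost(p) for i, p in enumerate(posteriors))
    return referred, cost


def best_referral(posteriors: List[float],
                  human_error: Callable[[int], float]) -> Tuple[List[int], float]:
    """Unconstrained case (sketch): try every workload k and keep the cheapest."""
    best = refer_fixed_workload(posteriors, 0, human_error)
    for k in range(1, len(posteriors) + 1):
        candidate = refer_fixed_workload(posteriors, k, human_error)
        if candidate[1] < best[1]:
            best = candidate
    return best


# Hypothetical example: the human's error rate grows by 5% per referred task.
print(best_referral([0.5, 0.9, 0.6, 0.99], lambda k: 0.05 * k))
```

In this toy instance, referring the two most uncertain tasks (indices 0 and 2) gives the lowest expected cost, about 0.31. Referring more tasks raises the human's error rate enough to outweigh the gain.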

Mean-field approximation for large-population beauty-contest games

Raihan Seraj

Jerome Le Ny

Aditya Mahajan

We study a class of Keynesian beauty contest games where a large number of heterogeneous players attempt to estimate a common parameter based on their own observations. The players are rewarded for producing an estimate close to a certain multiplicative factor of the average decision, this factor being specific to each player. This model is motivated by scenarios arising in commodity or financial markets, where investment decisions are sometimes partly based on following a trend. We provide a method to compute Nash equilibria within the class of affine strategies. We then develop a mean-field approximation, in the limit of an infinite number of players, which has the advantage that computing the best-response strategies only requires the knowledge of the parameter distribution of the players, rather than their actual parameters. We show that the mean-field strategies lead to an ε-Nash equilibrium for a system with a finite number of players. We conclude by analyzing the impact on individual behavior of changes in aggregate population behavior.

2021-12-14

IEEE Conference on Decision and Control (published)

doi.org

Thompson sampling for linear quadratic mean-field teams

Mukul Gagrani

Sagar Sudhakara

Aditya Mahajan

Ashutosh Nayyar

Yi Ouyang

We consider optimal control of an unknown multi-agent linear quadratic (LQ) system where the dynamics and the cost are coupled across the agents through the mean-field (i.e., empirical mean) of the states and controls. Directly using single-agent LQ learning algorithms in such models results in regret which increases polynomially with the number of agents. We propose a new Thompson sampling based learning algorithm which exploits the structure of the system model and show that the expected Bayesian regret of our proposed algorithm for a system with agents of |M| different types at time horizon T is

2021-12-14

2021 60th IEEE Conference on Decision and Control (CDC) (published)

doi.org

arxiv.org

Early Transcriptional Changes in Rabies Virus-Infected Neurons and Their Impact on Neuronal Functions

Seonhee Kim

Florence Larrous

Hugo Varet

Rachel Legendre

Lena Feige

Guillaume Dumas

Rebecca Matsas

Georgia Kouroupi

Regis Grailhe

Hervé Bourhy

2021-12-13

Frontiers in Microbiology (published)

doi.org

Implications of Topological Imbalance for Representation Learning on Biomedical Knowledge Graphs

Stephen Bonner

Ufuk Kirik

Ola Engkvist

Jian Tang

I. Barrett

Adoption of recently developed methods from machine learning has given rise to the creation of drug-discovery knowledge graphs (KGs) that utilize the interconnected nature of the domain. Graph-based modelling of the data, combined with KG embedding (KGE) methods, is promising as it provides a more intuitive representation and is suitable for inference tasks such as predicting missing links. One common application is to produce ranked lists of genes for a given disease, where the rank is based on the perceived likelihood of association between the gene and the disease. It is thus critical that these predictions are not only pertinent but also biologically meaningful. However, KGs can be biased either directly due to the underlying data sources that are integrated or due to modelling choices in the construction of the graph, one consequence of which is that certain entities can get topologically overrepresented. We demonstrate the effect of these inherent structural imbalances, resulting in densely connected entities being highly ranked no matter the context. We provide support for this observation across different datasets, models, and predictive tasks. Further, we present various graph perturbation experiments which yield more support to the observation that KGE models can be more influenced by the frequency of entities rather than any biological information encoded within the relations. Our results highlight the importance of data modelling choices, and emphasize the need for practitioners to be mindful of these issues when interpreting model outputs and during KG composition.

2021-12-13

ArXiv (preprint)

doi.org

arxiv.org
