Publications

Compositional Generalization in Dependency Parsing
Compositionality, the ability to combine familiar units like words into novel phrases and sentences, has been the focus of intense interest in artificial intelligence in recent years. To test compositional generalization in semantic parsing, Keysers et al. (2020) introduced Compositional Freebase Queries (CFQ). This dataset maximizes the similarity between the test and train distributions over primitive units, like words, while maximizing the compound divergence: the dissimilarity between test and train distributions over larger structures, like phrases. Dependency parsing, however, lacks a compositional generalization benchmark. In this work, we introduce a gold-standard set of dependency parses for CFQ, and use this to analyze the behaviour of a state-of-the-art dependency parser (Qi et al., 2020) on the CFQ dataset. We find that increasing compound divergence degrades dependency parsing performance, although not as dramatically as semantic parsing performance. Additionally, we find the performance of the dependency parser does not uniformly degrade relative to compound divergence, and the parser performs differently on different splits with the same compound divergence. We explore a number of hypotheses for what causes the non-uniform degradation in dependency parsing performance, and identify a number of syntactic structures that drive the dependency parser’s lower performance on the most challenging splits.
Scaling Laws for the Few-Shot Adaptation of Pre-trained Image Classifiers
Gabriele Prato
Simon Guiroy
Ethan Caballero
The empirical science of neural scaling laws is a rapidly growing area of significant importance to the future of machine learning, particularly in light of recent breakthroughs achieved by large-scale pre-trained models such as GPT-3, CLIP and DALL-E. Accurately predicting neural network performance with increasing resources such as data, compute and model size provides a more comprehensive evaluation of different approaches across multiple scales, as opposed to traditional point-wise comparisons of fixed-size models on fixed-size benchmarks, and, most importantly, allows for a focus on the best-scaling, and thus most promising, approaches. In this work, we consider the challenging problem of few-shot learning in image classification, especially when the target data distribution in the few-shot phase differs from the source (training) data distribution, in the sense that it includes new image classes not encountered during training. Our current main goal is to investigate how the amount of pre-training data affects the few-shot generalization performance of standard image classifiers. Our key observations are that (1) such performance improvements are well-approximated by power laws (linear log-log plots) as the training set size increases, (2) this applies to both cases of target data coming from either the same or from a different domain (i.e., new classes) as the training data, and (3) few-shot performance on new classes converges at a faster rate than the standard classification performance on previously seen classes. Our findings shed new light on the relationship between scale and generalization.
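As a rough illustration of what "well-approximated by power laws (linear log-log plots)" means in practice, the sketch below fits error ≈ a · n^(−b) by ordinary least squares on the logarithms of both quantities. The function name and the synthetic numbers are hypothetical and are not taken from the paper; this is not the authors' fitting procedure.

```python
import math

def fit_power_law(train_sizes, errors):
    """Fit errors ~ a * n**(-b) via a straight-line fit in log-log space."""
    xs = [math.log(n) for n in train_sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Closed-form simple linear regression: slope and intercept of log(e) vs log(n).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)

# Synthetic, made-up data that follows an exact power law (assumption for the demo).
sizes = [1e3, 1e4, 1e5, 1e6]
errors = [2.0 * n ** -0.3 for n in sizes]
a, b = fit_power_law(sizes, errors)
```

On real measurements the points would scatter around the line, and the fitted exponent b would summarize how quickly performance improves as pre-training data grows.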
Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations
Pau Rodriguez
Massimo Caccia
Alexandre Lacoste
Lee Zamparo
Issam Hadj Laradji
David Vazquez
Explainability for machine learning models has gained considerable attention within the research community given the importance of deploying more reliable machine-learning systems. In computer vision applications, generative counterfactual methods indicate how to perturb a model’s input to change its prediction, providing details about the model’s decision-making. Current methods tend to generate trivial counterfactuals about a model’s decisions, as they often suggest exaggerating or removing the presence of the attribute being classified. For the machine learning practitioner, these types of counterfactuals offer little value, since they provide no new information about undesired model or data biases. In this work, we identify the problem of trivial counterfactual generation and we propose DiVE to alleviate it. DiVE learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss to uncover multiple valuable explanations about the model’s prediction. Further, we introduce a mechanism to prevent the model from producing trivial explanations. Experiments on CelebA and Synbols demonstrate that our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods. Code is available at https://github.com/ElementAI/beyond-trivial-explanations.
DoMoBOT: An AI-Empowered Bot for Automated and Interactive Domain Modelling
Rijul Saini
Gunter Mussbacher
Jörg Kienzle
Domain modelling transforms informal requirements written in natural language in the form of problem descriptions into concise and analyzable domain models. As the manual construction of these domain models is often time-consuming, error-prone, and labor-intensive, several approaches already exist to automate domain modelling. However, the current approaches suffer from lower accuracy of extracted domain models and the lack of support for system-modeller interactions. To better assist modellers, we introduce DoMoBOT, a web-based Domain Modelling BOT. Our proposed bot combines artificial intelligence techniques such as natural language processing and machine learning to extract domain models with higher accuracy. More importantly, our bot incorporates a set of features to bring synergy between automated model extraction and bot-modeller interactions. During these interactions, the bot presents multiple possible solutions to a modeller for modelling scenarios present in a given problem description. The bot further enables modellers to switch to a particular solution and updates the other parts of the domain model proactively. In this tool demo paper, we demonstrate how the implementation and architecture of DoMoBOT support the paradigm of automated and interactive domain modelling for assisting modellers.
Impact of Aliasing on Generalization in Deep Convolutional Networks
Cristina Vasconcelos
Vincent Dumoulin
Rob Romijnders
Ross Goroshin
We investigate the impact of aliasing on generalization in Deep Convolutional Networks and show that data augmentation schemes alone are unable to prevent it due to structural limitations in widely used architectures. Drawing insights from frequency analysis theory, we take a closer look at ResNet and EfficientNet architectures and review the trade-off between aliasing and information loss in each of their major components. We show how to mitigate aliasing by inserting non-trainable low-pass filters at key locations, particularly where networks lack the capacity to learn them. These simple architectural changes lead to substantial improvements in generalization on i.i.d. and even more on out-of-distribution conditions, such as image classification under natural corruptions on ImageNet-C [11] and few-shot learning on Meta-Dataset [26]. State-of-the-art results are achieved on both datasets without introducing additional trainable parameters and using the default hyper-parameters of open source codebases.
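The idea of placing a fixed low-pass filter before a downsampling step can be seen in a one-dimensional toy case. The sketch below is only an illustration under assumptions: the binomial kernel [1, 2, 1]/4, the edge-replication padding, and the function names are choices made here, not details taken from the paper.

```python
def subsample(signal, stride=2):
    """Naive strided subsampling with no filtering."""
    return list(signal[::stride])

def blur_then_subsample(signal, stride=2):
    """Apply a fixed (non-trainable) [1, 2, 1]/4 low-pass filter, then subsample."""
    kernel = (0.25, 0.5, 0.25)
    n = len(signal)
    blurred = []
    for i in range(n):
        left = signal[max(i - 1, 0)]        # replicate the edge sample
        right = signal[min(i + 1, n - 1)]
        blurred.append(kernel[0] * left + kernel[1] * signal[i] + kernel[2] * right)
    return blurred[::stride]

# A signal oscillating at the highest representable frequency.
signal = [1.0, -1.0] * 4
naive = subsample(signal)               # aliases to a constant signal of ones
filtered = blur_then_subsample(signal)  # oscillation is suppressed away from the edge
```

Without the filter, the fast oscillation is indistinguishable from a constant after subsampling; with it, the interior samples come out as zero, and only the first sample differs because of the padding choice.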
GPU acceleration of finite state machine input execution: Improving scale and performance
Vanya Yaneva
Ajitha Rajan
Model‐based development is a popular development approach in which software is implemented and verified based on a model of the required system. Finite state machines (FSMs) are widely used as models for systems in several domains. Validating that a model accurately represents the required behaviour involves the generation and execution of a large number of input sequences, which is often an expensive and time‐consuming process. In this paper, we speed up the execution of input sequences for FSM validation by leveraging the high degree of parallelism of modern graphics processing units (GPUs) for the automatic execution of FSM input sequences in parallel on the GPU threads. We expand our existing work by providing techniques that improve the performance and scalability of this approach. We conduct extensive empirical evaluation using 15 large FSMs from the networking domain and measure GPU speed‐up over a 16‐core CPU, taking into account total GPU time, which includes both data transfer and kernel execution time. We found that GPUs execute FSM input sequences up to 9.28× faster than a 16‐core CPU, with an average speed‐up of 4.53× across all subjects. Our optimizations achieve an average improvement over existing work of 58.95% for speed‐up and scalability to large FSMs with over 2K states and 500K transitions. We also found that techniques aimed at reducing the number of required input sequences for large FSMs with high density were ineffective when applied to all‐transition pair coverage, thus emphasizing the need for approaches like ours that speed up input execution.
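To make the workload concrete, the sketch below executes a batch of input sequences against an FSM stored as a transition table. It runs sequentially on the CPU and is not the paper's GPU implementation; the table layout, function names, and the toy machine are all hypothetical. The point it illustrates is that each sequence is executed independently of the others, which is what makes the batch amenable to parallel execution.

```python
def run_sequence(transitions, start_state, inputs):
    """Feed one input sequence to the FSM and return the final state."""
    state = start_state
    for symbol in inputs:
        state = transitions[(state, symbol)]
    return state

def run_all(transitions, start_state, sequences):
    """Execute every sequence; no sequence depends on another's result."""
    return [run_sequence(transitions, start_state, seq) for seq in sequences]

# Hypothetical two-state machine used only for this demo.
toggle = {
    ("off", "press"): "on",
    ("on", "press"): "off",
    ("off", "wait"): "off",
    ("on", "wait"): "on",
}
final_states = run_all(
    toggle, "off", [["press"], ["press", "press"], ["wait", "press", "wait"]]
)
```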
Capacity Planning in Stable Matching
Federico Bobbio
Ignacio Rios
Alfredo Torrico
We introduce the problem of jointly increasing school capacities and finding a student-optimal assignment in the expanded market. Due to the impossibility of efficiently solving the problem with classical methods, we generalize existing mathematical programming formulations of stability constraints to our setting, most of which result in integer quadratically-constrained programs. In addition, we propose a novel mixed-integer linear programming formulation that is exponentially large in the problem size. We show that its stability constraints can be separated by exploiting the objective function, leading to an effective cutting-plane algorithm. We conclude the theoretical analysis of the problem by discussing some mechanism properties. On the computational side, we evaluate the performance of our approaches in a detailed study, and we find that our cutting-plane method outperforms our generalization of existing mixed-integer approaches. We also propose two heuristics that are effective for large instances of the problem. Finally, we use the Chilean school choice system data to demonstrate the impact of capacity planning under stability conditions. Our results show that each additional seat can benefit multiple students and that we can effectively target the assignment of previously unassigned students or improve the assignment of several students through improvement chains. These insights empower the decision-maker in tuning the matching algorithm to provide a fair application-oriented solution.
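The observation that one additional seat can benefit several students can be reproduced on a toy market using textbook student-proposing deferred acceptance, which yields the student-optimal stable assignment for fixed capacities. The sketch below is not the paper's cutting-plane method or its heuristics; the function name, the three-student market, and the assumption that every school ranks every student who lists it are all made up for illustration.

```python
def deferred_acceptance(student_prefs, school_prefs, capacities):
    """Student-proposing deferred acceptance; returns {student: school or None}."""
    rank = {s: {stu: i for i, stu in enumerate(order)}
            for s, order in school_prefs.items()}
    next_choice = {stu: 0 for stu in student_prefs}
    held = {s: [] for s in school_prefs}
    free = list(student_prefs)
    while free:
        stu = free.pop()
        prefs = student_prefs[stu]
        if next_choice[stu] >= len(prefs):
            continue  # preference list exhausted: student stays unassigned
        school = prefs[next_choice[stu]]
        next_choice[stu] += 1
        held[school].append(stu)
        held[school].sort(key=lambda x: rank[school][x])
        if len(held[school]) > capacities[school]:
            free.append(held[school].pop())  # reject the lowest-ranked student
    match = {stu: None for stu in student_prefs}
    for school, students in held.items():
        for stu in students:
            match[stu] = school
    return match

# Hypothetical market: everyone prefers X to Y; both schools rank a > b > c.
students = {"a": ["X", "Y"], "b": ["X", "Y"], "c": ["X", "Y"]}
schools = {"X": ["a", "b", "c"], "Y": ["a", "b", "c"]}
before = deferred_acceptance(students, schools, {"X": 1, "Y": 1})
after = deferred_acceptance(students, schools, {"X": 2, "Y": 1})
```

In this toy market, adding one seat at X moves b up to a preferred school and frees a seat at Y for the previously unassigned c, a small instance of an improvement chain.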
Inter-Brain Synchronization: From Neurobehavioral Correlation to Causal Explanation
Small, correlated changes in synaptic connectivity may facilitate rapid motor learning
Barbara Feulner
Raeed H. Chowdhury
Lee Miller
Juan A. Gallego
Claudia Clopath
The Effect Size of Genes on Cognitive Abilities Is Linked to Their Expression Along the Major Hierarchical Gradient in the Human Brain
Sébastien Jacquemont
Guillaume Huguet
Elise Douard
Zohra Saci
Laura Almasy
David C. Glahn
Trade-off Between Accuracy and Fairness of Data-driven Building and Indoor Environment Models: A Comparative Study of Pre-processing Methods
Ying Sun
Fariborz Haghighat