Publications
Systematicity in a Recurrent Neural Network by Factorizing Syntax and Semantics
Standard methods in deep learning fail to capture compositional or systematic structure in their training data, as shown by their inability to generalize outside of the training distribution. However, human learners readily generalize in this way, e.g. by applying known grammatical rules to novel words. The inductive biases that might underlie this powerful cognitive capacity remain unclear. Inspired by work in cognitive science suggesting a functional distinction between systems for syntactic and semantic processing, we implement a modification to an existing deep learning architecture, imposing an analogous separation. The resulting architecture substantially outperforms standard recurrent networks on the SCAN dataset, a compositional generalization task, without any additional supervision. Our work suggests that separating syntactic from semantic learning may be a useful heuristic for capturing compositional structure, and highlights the potential of using cognitive principles to inform inductive biases in deep learning.
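The factorization this abstract describes lends itself to a compact sketch: one stream of the network decides where attention goes (syntax), while a separate stream supplies what is retrieved (semantics). The PyTorch module below is only an illustration of such a separation under assumptions made here; the class name, layer choices, and dimensions are invented for the example and are not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): two separate encodings of the input,
# a "syntactic" stream that decides *where* to attend and a "semantic" stream that
# decides *what* content is retrieved, so structure and meaning are learned by
# different weights. All names and sizes are assumptions.
import torch
import torch.nn as nn

class FactorizedAttention(nn.Module):
    def __init__(self, vocab_size, d_syn=64, d_sem=64):
        super().__init__()
        self.syn_embed = nn.Embedding(vocab_size, d_syn)   # structural stream
        self.sem_embed = nn.Embedding(vocab_size, d_sem)   # lexical/content stream
        # Only the syntactic stream sees context; semantic embeddings stay context-free.
        self.syn_rnn = nn.GRU(d_syn, d_syn, batch_first=True)

    def forward(self, tokens, decoder_state):
        syn, _ = self.syn_rnn(self.syn_embed(tokens))            # (B, T, d_syn)
        sem = self.sem_embed(tokens)                              # (B, T, d_sem)
        scores = torch.einsum("btd,bd->bt", syn, decoder_state)   # attention from syntax only
        weights = torch.softmax(scores, dim=-1)
        context = torch.einsum("bt,btd->bd", weights, sem)        # content from semantics only
        return context, weights
```

The design point of the sketch is that the attention weights depend only on the syntactic stream, so word meaning cannot leak into alignment decisions.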
Latent-variable generative models offer a principled solution for modeling and sampling from complex probability distributions. Implementing a joint training objective with a complex prior, however, can be a tedious task, as one is typically required to derive and code a specific cost function for each new type of prior distribution. In this work, we propose a general framework for learning latent-variable generative models in a two-step fashion. In the first step of the framework, we train an autoencoder, and in the second step we fit a prior model on the resulting latent distribution. This two-step approach offers a convenient alternative to joint training, as it allows a straightforward combination of existing models without the hassle of deriving new cost functions or coding joint training objectives. Through a set of experiments, we demonstrate that two-step learning results in performance similar to joint training, and in some cases even results in more accurate modeling.
2020-01-01
2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP) (published)
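As a rough illustration of the two-step recipe described in the abstract above (train an autoencoder first, then fit a prior on its latent codes), the following sketch uses a plain reconstruction loss and a Gaussian mixture as the prior model. Both choices, and all function names, are assumptions for the example; the framework admits other autoencoders and prior models.

```python
# Minimal sketch of two-step learning: (1) train an autoencoder on reconstruction
# alone, (2) fit a density model on the resulting latent codes, then sample by
# drawing latents from the fitted prior and decoding them.
import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture

def two_step_fit(data_loader, encoder, decoder, n_components=10, epochs=20):
    # Step 1: reconstruction-only training, no prior involved.
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
    for _ in range(epochs):
        for x, in data_loader:
            loss = nn.functional.mse_loss(decoder(encoder(x)), x)
            opt.zero_grad(); loss.backward(); opt.step()

    # Step 2: fit a prior model (here a GMM, an arbitrary choice) on the latents.
    with torch.no_grad():
        z = torch.cat([encoder(x) for x, in data_loader]).cpu().numpy()
    prior = GaussianMixture(n_components=n_components).fit(z)
    return prior

def sample(prior, decoder, n):
    # Sampling: draw latent codes from the fitted prior, then decode.
    z, _ = prior.sample(n)
    with torch.no_grad():
        return decoder(torch.as_tensor(z, dtype=torch.float32))
```

The convenience the abstract points to is visible here: swapping in a different prior model only changes step 2, with no new joint objective to derive.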
The speed at which one can minimize an expected loss using stochastic methods depends on two properties: the curvature of the loss and the variance of the gradients. While most previous works focus on one or the other of these properties, we explore how their interaction affects optimization speed. Further, as the ultimate goal is good generalization performance, we clarify how both curvature and noise are relevant to properly estimate the generalization gap. Realizing that the limitations of some existing works stem from a confusion between the matrices that quantify curvature and noise, we also clarify the distinction between the Fisher matrix, the Hessian, and the covariance matrix of the gradients.
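For readers unfamiliar with the three matrices named above, the definitions below make the distinction explicit. The notation is chosen here for illustration, not taken from the paper: a per-example loss \(\ell(\theta; x)\) with expected loss \(L(\theta) = \mathbb{E}_x[\ell(\theta; x)]\) and model predictive distribution \(p_\theta(y \mid x)\).

```latex
\begin{align}
  H(\theta) &= \nabla^2_\theta L(\theta)
    && \text{Hessian: curvature of the expected loss} \\
  \Sigma(\theta) &= \mathbb{E}_x\!\left[\big(\nabla_\theta \ell(\theta;x) - \nabla_\theta L(\theta)\big)
                    \big(\nabla_\theta \ell(\theta;x) - \nabla_\theta L(\theta)\big)^{\top}\right]
    && \text{covariance of the stochastic gradients (noise)} \\
  F(\theta) &= \mathbb{E}_{x}\,\mathbb{E}_{y \sim p_\theta(\cdot \mid x)}
                 \!\left[\nabla_\theta \log p_\theta(y \mid x)\,
                         \nabla_\theta \log p_\theta(y \mid x)^{\top}\right]
    && \text{Fisher information matrix}
\end{align}
```

Note that the Fisher is an expectation over labels sampled from the model, not the data; replacing model samples with observed labels gives the "empirical Fisher", which coincides with the second moment of the gradients only in special cases, and conflating these objects is the kind of confusion the abstract refers to.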
In many applications, it is desirable to extract only the relevant information from complex input data, which involves making a decision about which input features are relevant. The information bottleneck method formalizes this as an information-theoretic optimization problem by maintaining an optimal tradeoff between compression (throwing away irrelevant input information), and predicting the target. In many problem settings, including the reinforcement learning problems we consider in this work, we might prefer to compress only part of the input. This is typically the case when we have a standard conditioning input, such as a state observation, and a ``privileged'' input, which might correspond to the goal of a task, the output of a costly planning algorithm, or communication with another agent. In such cases, we might prefer to compress the privileged input, either to achieve better generalization (e.g., with respect to goals) or to minimize access to costly information (e.g., in the case of communication). Practical implementations of the information bottleneck based on variational inference require access to the privileged input in order to compute the bottleneck variable, so although they perform compression, this compression operation itself needs unrestricted, lossless access. In this work, we propose the variational bandwidth bottleneck, which, for each example, estimates the value of the privileged information before seeing it, i.e., based only on the standard input, and then stochastically decides whether or not to access the privileged input. We formulate a tractable approximation to this framework and demonstrate in a series of reinforcement learning experiments that it can improve generalization and reduce access to computationally costly information.
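A rough sketch of the decision structure described above may help: the value of the privileged input is estimated from the standard input alone, and only then is the privileged input stochastically accessed or skipped. The module, the Bernoulli gate, and the Gaussian bottleneck are all assumptions made for this illustration, not the paper's exact formulation.

```python
# Sketch of a bandwidth-limited bottleneck: access to the privileged input is
# gated by a probability computed from the standard input only.
import torch
import torch.nn as nn

class BandwidthBottleneck(nn.Module):
    def __init__(self, d_state, d_priv, d_z):
        super().__init__()
        # Access probability is a function of the state alone (no peek at privileged input).
        self.value_net = nn.Sequential(nn.Linear(d_state, 64), nn.ReLU(),
                                       nn.Linear(64, 1), nn.Sigmoid())
        self.encoder = nn.Linear(d_priv, 2 * d_z)  # mean and log-variance of the bottleneck variable

    def forward(self, state, privileged):
        p_access = self.value_net(state)                        # estimated value of privileged info
        access = torch.bernoulli(p_access)                       # stochastic access decision
        mu, logvar = self.encoder(privileged).chunk(2, dim=-1)   # used only when access == 1
        z_priv = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        z_prior = torch.randn_like(mu)                           # fall back to the prior when skipped
        z = access * z_priv + (1 - access) * z_prior
        return z, p_access
```

In an actual training loop the hard Bernoulli decision is not differentiable, which is why a tractable approximation, as the abstract mentions, is needed; this sketch only shows the forward-pass logic.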
We consider differentiable games where the goal is to find a Nash equilibrium. The machine learning community has recently started using variants of the gradient method (GD). Prime examples are extragradient (EG), the optimistic gradient method (OG) and consensus optimization (CO), which enjoy linear convergence in cases like bilinear games, where standard GD fails. The full benefits of these relatively new methods are not known, as there is no unified analysis for both strongly monotone and bilinear games. We provide new analyses of EG's local and global convergence properties and use them to get a tighter global convergence rate for OG and CO. Our analysis covers the whole range of settings between bilinear and strongly monotone games. It reveals that these methods converge via different mechanisms at these extremes; in between, they exploit the most favorable mechanism for the given problem. We then prove that EG achieves the optimal rate for a wide class of algorithms with any number of extrapolations. Our tight analysis of EG's convergence rate in games shows that, unlike in convex minimization, EG may be much faster than GD.
2020-01-01
International Conference on Artificial Intelligence and Statistics (published)
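The contrast between GD and extragradient that the abstract above emphasizes can be seen on the simplest bilinear game, min over x, max over y of x*y, whose unique equilibrium is (0, 0). The toy script below (written for this listing, not taken from the paper) shows GD spiraling away from the equilibrium while EG contracts toward it.

```python
# Toy comparison of gradient descent-ascent (GD) and extragradient (EG) on the
# bilinear game f(x, y) = x * y: GD's iterates grow in norm, EG's shrink.
import numpy as np

def field(x, y):
    # Simultaneous-gradient vector field: descent direction on x, ascent on y.
    return np.array([y, -x])

def gd_step(z, lr=0.1):
    return z - lr * field(*z)

def eg_step(z, lr=0.1):
    z_half = z - lr * field(*z)      # extrapolation (look-ahead) step
    return z - lr * field(*z_half)   # update using the look-ahead gradient

z_gd = z_eg = np.array([1.0, 1.0])
for _ in range(100):
    z_gd, z_eg = gd_step(z_gd), eg_step(z_eg)
print(np.linalg.norm(z_gd), np.linalg.norm(z_eg))  # GD norm grows, EG norm shrinks
```

The extrapolation step is the only difference between the two updates, yet it flips the behavior from divergence to linear convergence on this game.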
Differential functional neural circuitry behind autism subtypes with marked imbalance between social-communicative and restricted repetitive behavior symptom domains
Social-communication (SC) and restricted repetitive behaviors (RRB) are autism diagnostic symptom domains. SC and RRB severity can markedly differ within and between individuals and is underpinned by different neural circuitry and genetic mechanisms. Modeling SC-RRB balance could help identify how neural circuitry and genetic mechanisms map onto such phenotypic heterogeneity. Here we developed a phenotypic stratification model that makes highly accurate (96-98%) out-of-sample SC=RRB, SC>RRB, and RRB>SC subtype predictions. Applying this model to resting-state fMRI data from the EU-AIMS LEAP dataset (n=509), we find replicable somatomotor-perisylvian hypoconnectivity in the SC>RRB subtype versus a typically-developing (TD) comparison group. In contrast, replicable motor-anterior salience hyperconnectivity is apparent in the SC=RRB subtype versus TD. Autism-associated genes affecting astrocytes, excitatory, and inhibitory neurons are highly expressed specifically within SC>RRB hypoconnected networks, but not SC=RRB hyperconnected networks. SC-RRB balance subtypes may indicate different paths individuals take from genome, through neural circuitry, to the clinical phenotype.
Model-Driven Software Engineering encompasses various modelling formalisms for supporting software development. One such formalism is domain modelling, which bridges the gap between requirements expressed in natural language and analyzable, more concise domain models expressed in class diagrams. Due to the lack of modelling skills among novice modellers and time constraints in industrial projects, it is often not possible to build an accurate domain model manually. To address this challenge, we aim to develop an approach to extract domain models from problem descriptions written in natural language by combining rules based on natural language processing with machine learning. As a first step, we report on an automated and tool-supported approach whose extracted domain models are more accurate than those of existing approaches. In addition, the approach generates trace links for each model element of a domain model. The trace links enable novice modellers to execute queries on the extracted domain models to gain insights into the modelling decisions taken, improving their modelling skills. Furthermore, to evaluate our approach, we propose a novel comparison metric and discuss our experimental design. Finally, we present a research agenda detailing research directions and discuss corresponding challenges.
2020-01-01
2020 IEEE 28th International Requirements Engineering Conference (RE) (published)
Background: Marked sex differences in autism prevalence accentuate the need to understand the role of biological sex-related factors in autism. Efforts to unravel sex differences in the brain organization of autism have, however, been challenged by the limited availability of female data. Methods: We addressed this gap by using a large sample of males and females with autism and neurotypical (NT) control individuals (ABIDE; Autism: 362 males, 82 females; NT: 409 males, 166 females; 7-18 years). Discovery analyses examined main effects of diagnosis, sex and their interaction across five resting-state fMRI (R-fMRI) metrics (voxel-level Z > 3.1, cluster-level P < 0.01, Gaussian random field corrected). Secondary analyses assessed the robustness of the results to different pre-processing approaches and their replicability in two independent samples: the EU-AIMS Longitudinal European Autism Project (LEAP) and the Gender Explorations of Neurogenetics and Development to Advance Autism Research (GENDAAR). Results: Discovery analyses in ABIDE revealed significant main effects across the intrinsic functional connectivity (iFC) of the posterior cingulate cortex, regional homogeneity and voxel-mirrored homotopic connectivity (VMHC) in several cortical regions, largely converging in the default network midline. Sex-by-diagnosis interactions were confined to the dorsolateral occipital cortex, with reduced VMHC in females with autism. All findings were robust to different pre-processing steps. Replicability in independent samples varied by R-fMRI measures and effects, with the targeted sex-by-diagnosis interaction being replicated in the larger of the two replication samples, EU-AIMS LEAP. Limitations: Given the lack of a priori harmonization among the discovery and replication datasets available to date, sample-related variation remained and may have affected replicability. Conclusions: Atypical cross-hemispheric interactions are neurobiologically relevant to autism. They likely result from the combination of sex…