Stefan Bauer

From Points to Functions: Infinite-dimensional Representations in Diffusion Models

Sarthak Mittal

Guillaume Lajoie

Arash Mehrjou

Diffusion-based generative models learn to iteratively transfer unstructured noise to a complex target distribution as opposed to Generative… (see more) Adversarial Networks (GANs) or the decoder of Variational Autoencoders (VAEs) which produce samples from the target distribution in a single step. Thus, in diffusion models every sample is naturally connected to a random trajectory which is a solution to a learned stochastic differential equation (SDE). Generative models are only concerned with the final state of this trajectory that delivers samples from the desired distribution. Abstreiter et. al showed that these stochastic trajectories can be seen as continuous filters that wash out information along the way. Consequently, it is reasonable to ask if there is an intermediate time step at which the preserved information is optimal for a given downstream task. In this work, we show that a combination of information content from different time steps gives a strictly better representation for the downstream task. We introduce an attention and recurrence based modules that ``learn to mix'' information content of various time-steps such that the resultant representation leads to superior performance in downstream tasks.

2022-03-28

ICLR.cc/2022/Workshop/DGM4HSD (poster)

doi.org

Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning

Nan Rosemary Ke

Aniket Rajiv Didolkar

Danilo Jimenez Rezende

Michael Curtis Mozer

Chris Pal

Inducing causal relationships from observations is a classic problem in machine learning. Most work in causality starts from the premise tha… (see more)t the causal variables themselves are observed. However, for AI agents such as robots trying to make sense of their environment, the only observables are low-level variables like pixels in images. To generalize well, an agent must induce high-level variables, particularly those which are causal or are affected by causal variables. A central goal for AI and causality is thus the joint discovery of abstract representations and causal structure. However, we note that existing environments for studying causal induction are poorly suited for this objective because they have complicated task-specific causal graphs which are impossible to manipulate parametrically (e.g., number of nodes, sparsity, causal chain length, etc.). In this work, our goal is to facilitate research in learning representations of high-level variables as well as causal structures among them. In order to systematically probe the ability of methods to identify these variables and structures, we design a suite of benchmarking RL environments. We evaluate various representation learning algorithms from the literature and find that explicitly incorporating structure and modularity in models can help causal induction in model-based reinforcement learning.

2021-10-11

NeurIPS.cc/2021/Track/Datasets_and_Benchmarks/Round2 (published)

Learning Neural Causal Models with Active Interventions

Nino Scherrer

Olexa Bilaniuk

Yashas Annadani

Patrick Schwab

Bernhard Schölkopf

Michael Curtis Mozer

Nan Rosemary Ke

Discovering causal structures from data is a challenging inference problem of fundamental importance in all areas of science. The appealing … (see more)scaling properties of neural networks have recently led to a surge of interest in differentiable neural network-based methods for learning causal structures from data. So far, differentiable causal discovery has focused on static datasets of observational or interventional origin. In this work, we introduce an active intervention-targeting mechanism which enables quick identification of the underlying causal structure of the data-generating process. Our method significantly reduces the required number of interactions compared with random intervention targeting and is applicable for both discrete and continuous optimization formulations of learning the underlying directed acyclic graph (DAG) from data. We examine the proposed method across multiple frameworks in a wide range of settings and demonstrate superior performance on multiple benchmarks from simulated to real-world data.

2021-09-06

ArXiv (preprint)

Variational Causal Networks: Approximate Bayesian Inference over Causal Structures

Yashas Annadani

Jonas Rothfuss

Alexandre Lacoste

Learning the causal structure that underlies data is a crucial step towards robust real-world decision making. The majority of existing work… (see more) in causal inference focuses on determining a single directed acyclic graph (DAG) or a Markov equivalence class thereof. However, a crucial aspect to acting intelligently upon the knowledge about causal structure which has been inferred from finite data demands reasoning about its uncertainty. For instance, planning interventions to find out more about the causal mechanisms that govern our data requires quantifying epistemic uncertainty over DAGs. While Bayesian causal inference allows to do so, the posterior over DAGs becomes intractable even for a small number of variables. Aiming to overcome this issue, we propose a form of variational inference over the graphs of Structural Causal Models (SCMs). To this end, we introduce a parametric variational family modelled by an autoregressive distribution over the space of discrete DAGs. Its number of parameters does not grow exponentially with the number of variables and can be tractably learned by maximising an Evidence Lower Bound (ELBO). In our experiments, we demonstrate that the proposed variational posterior is able to provide a good approximation of the true posterior.

2021-06-14

ArXiv (preprint)

Toward Causal Representation Learning

Bernhard Schölkopf

Francesco Locatello

Nan Rosemary Ke

Nal Kalchbrenner

The two fields of machine learning and graphical causality arose and are developed separately. However, there is, now, cross-pollination and… (see more) increasing interest in both fields to benefit from the advances of the other. In this article, we review fundamental concepts of causal inference and relate them to crucial open problems of machine learning, including transfer and generalization, thereby assaying how causality can contribute to modern machine learning research. This also applies in the opposite direction: we note that most work in causality starts from the premise that the causal variables are given. A central problem for AI and causality is, thus, causal representation learning, that is, the discovery of high-level causal variables from low-level observations. Finally, we delineate some implications of causality for machine learning and propose key research areas at the intersection of both communities.

2021-05-01

Proceedings of the IEEE (published)

doi.org

Towards Causal Representation Learning

Bernhard Schölkopf

Francesco Locatello

Nan Rosemary Ke

Nal Kalchbrenner

The two fields of machine learning and graphical causality arose and developed separately. However, there is now cross-pollination and incre… (see more)asing interest in both fields to benefit from the advances of the other. In the present paper, we review fundamental concepts of causal inference and relate them to crucial open problems of machine learning, including transfer and generalization, thereby assaying how causality can contribute to modern machine learning research. This also applies in the opposite direction: we note that most work in causality starts from the premise that the causal variables are given. A central problem for AI and causality is, thus, causal representation learning, the discovery of high-level causal variables from low-level observations. Finally, we delineate some implications of causality for machine learning and propose key research areas at the intersection of both communities.

2021-02-22

ArXiv (preprint)

CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning

Manuel Wüthrich

Bernhard Schölkopf

Despite recent successes of reinforcement learning (RL), it remains a challenge for agents to transfer learned skills to related environment… (see more)s. To facilitate research addressing this problem, we proposeCausalWorld, a benchmark for causal structure and transfer learning in a robotic manipulation environment. The environment is a simulation of an open-source robotic platform, hence offering the possibility of sim-to-real transfer. Tasks consist of constructing 3D shapes from a set of blocks - inspired by how children learn to build complex structures. The key strength of CausalWorld is that it provides a combinatorial family of such tasks with common causal structure and underlying factors (including, e.g., robot and object masses, colors, sizes). The user (or the agent) may intervene on all causal variables, which allows for fine-grained control over how similar different tasks (or task distributions) are. One can thus easily define training and evaluation distributions of a desired difficulty level, targeting a specific form of generalization (e.g., only changes in appearance or object mass). Further, this common parametrization facilitates defining curricula by interpolating between an initial and a target task. While users may define their own task distributions, we present eight meaningful distributions as concrete benchmarks, ranging from simple to very challenging, all of which require long-horizon planning as well as precise low-level motor control. Finally, we provide baseline results for a subset of these tasks on distinct training curricula and corresponding evaluation protocols, verifying the feasibility of the tasks in this benchmark.

2021-01-12

ICLR.cc/2021/Conference (poster)

Spatially Structured Recurrent Modules

Nasim Rahaman

Muhammad Waleed Gondal

Manuel Wüthrich

Yash Sharma

Bernhard Schölkopf

Capturing the structure of a data-generating process by means of appropriate inductive biases can help in learning models that generalise we… (see more)ll and are robust to changes in the input distribution. While methods that harness spatial and temporal structures find broad application, recent work has demonstrated the potential of models that leverage sparse and modular structure using an ensemble of sparingly interacting modules. In this work, we take a step towards dynamic models that are capable of simultaneously exploiting both modular and spatiotemporal structures. To this end, we model the dynamical system as a collection of autonomous but sparsely interacting sub-systems that interact according to a learned topology which is informed by the spatial structure of the underlying system. This gives rise to a class of models that are well suited for capturing the dynamics of systems that only offer local views into their state, along with corresponding spatial locations of those views. On the tasks of video prediction from cropped frames and multi-agent world modelling from partial observations in the challenging Starcraft2 domain, we find our models to be more robust to the number of available views and better capable of generalisation to novel tasks without additional training than strong baselines that perform equally well or better on the training distribution.

2021-01-12

ICLR.cc/2021/Conference (poster)

S2RMs: Spatially Structured Recurrent Modules

Nasim Rahaman

Muhammad Waleed Gondal

Manuel Wüthrich

Y. Sharma

Bernhard Schölkopf

2020-07-13

ArXiv (preprint)

Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge

Spyridon Bakas

Mauricio Reyes

Andras Jakab

Stefan. Bauer

Markus Rempfler

Alessandro Crimi

Russell T. Shinohara

Christoph Berger

Sung-min Ha

Martin Rozycki

Marcel W. Prastawa

Esther Alberts

Jana Lipková

John Freymann

Justin Kirby

Michel Bilello

Hassan M. Fathallah-Shaykh

Roland Wiest

J. Kirschke

Benedikt Wiestler … (see 31 more)

Rivka R. Colen

Aikaterini Kotrotsou

Pamela LaMontagne

D. Marcus

Mikhail Milchenko

Arash Nazeri

Marc-André Weber

Abhishek Mahajan

Ujjwal Baid

Dongjin Kwon

Manu Agarwal

Mahbubul Alam

Alberto Albiol

A. Albiol

Alex A. Varghese

T. Tuan

Tal Arbel

Aaron J. Avery

Bobade Pranjal

Subhashis Banerjee

Thomas H. Batchelder

Nematollah Batmanghelich

Enzo Battistella

Martin Bendszus

E. Benson

Jose Bernal

George Biros

Mariano Cabezas

Siddhartha Chandra

Yi-Ju Chang

et al.

Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneo… (see more)us histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumoris a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses thestate-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross tota lresection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.

2018-11-05

ArXiv (preprint)

doi.org