Olexa Bilaniuk

Neural Causal Structure Discovery from Interventions

Nan Rosemary Ke

Bernhard Schölkopf

Michael Curtis Mozer

Chris Pal

Yoshua Bengio

Recent promising results have generated a surge of interest in continuous optimization methods for causal discovery from observational data. However, there are theoretical limitations on the identifiability of underlying structures obtained solely from observational data. Interventional data, on the other hand, provides richer information about the underlying data-generating process. Nevertheless, extending and applying methods designed for observational data to include interventions is a challenging problem. To address this issue, we propose a general framework based on neural networks to develop models that incorporate both observational and interventional data. Notably, our method can handle the challenging and realistic scenario where the identity of the intervened upon variable is unknown. We evaluate our proposed approach in the context of graph recovery, both de novo and from a partially-known edge set. Our method achieves strong benchmark results on various structure learning tasks, including structure recovery of synthetic graphs as well as standard graphs from the Bayesian Network Repository.

2023-09-10

TMLR (accepted)

openreview.net

BARVINN: Arbitrary Precision DNN Accelerator Controlled by a RISC-V CPU

Mohammadhossein Askarihemmat

Sean Wagner

Olexa Bilaniuk

Yassine Hariri

Yvon Savaria

Jean-Pierre David

2023-01-31

Proceedings of the 28th Asia and South Pacific Design Automation Conference (published)

doi.org

arxiv.org

Neural Causal Structure Discovery from Interventions

Nan Rosemary Ke

Bernhard Schölkopf

Michael Curtis Mozer

Chris Pal

Yoshua Bengio

Recent promising results have generated a surge of interest in continuous optimization methods for causal discovery from observational data. However, there are theoretical limitations on the identifiability of underlying structures obtained solely from observational data. Interventional data, on the other hand, provides richer information about the underlying data-generating process. Nevertheless, extending and applying methods designed for observational data to include interventions is a challenging problem. To address this issue, we propose a general framework based on neural networks to develop models that incorporate both observational and interventional data. Notably, our method can handle the challenging and realistic scenario where the identity of the intervened upon variable is unknown. We evaluate our proposed approach in the context of graph recovery, both de novo and from a partially-known edge set. Our method achieves strong benchmark results on various structure learning tasks, including structure recovery of synthetic graphs as well as standard graphs from the Bayesian Network Repository.

2023-01-01

Trans. Mach. Learn. Res. (published)

dblp.uni-trier.de

Learning Neural Causal Models with Active Interventions

Nino Scherrer

Olexa Bilaniuk

Yashas Annadani

Anirudh Goyal

Patrick Schwab

Bernhard Schölkopf

Michael Curtis Mozer

Yoshua Bengio

Stefan Bauer

Nan Rosemary Ke

Discovering causal structures from data is a challenging inference problem of fundamental importance in all areas of science. The appealing scaling properties of neural networks have recently led to a surge of interest in differentiable neural network-based methods for learning causal structures from data. So far, differentiable causal discovery has focused on static datasets of observational or interventional origin. In this work, we introduce an active intervention-targeting mechanism which enables quick identification of the underlying causal structure of the data-generating process. Our method significantly reduces the required number of interactions compared with random intervention targeting and is applicable for both discrete and continuous optimization formulations of learning the underlying directed acyclic graph (DAG) from data. We examine the proposed method across multiple frameworks in a wide range of settings and demonstrate superior performance on multiple benchmarks from simulated to real-world data.

2021-09-06

ArXiv (preprint)

openreview.net

Predicting Infectiousness for Proactive Contact Tracing

Prateek Gupta

Nasim Rahaman

Pierre-Luc St-Charles

Hannah Alsdurf

Gaetan Caron

Satya Ortiz Gagne

Bernhard Schölkopf

Abhinav Sharma

Jian Tang

Andrew Robert Williams

The COVID-19 pandemic has spread rapidly worldwide, overwhelming manual contact tracing in many countries and resulting in widespread lockdowns for emergency containment. Large-scale digital contact tracing (DCT) has emerged as a potential solution to resume economic and social activity while minimizing spread of the virus. Various DCT methods have been proposed, each making trade-offs between privacy, mobility restrictions, and public health. The most common approach, binary contact tracing (BCT), models infection as a binary event, informed only by an individual's test results, with corresponding binary recommendations that either all or none of the individual's contacts quarantine. BCT ignores the inherent uncertainty in contacts and the infection process, which could be used to tailor messaging to high-risk individuals, and prompt proactive testing or earlier warnings. It also does not make use of observations such as symptoms or pre-existing medical conditions, which could be used to make more accurate infectiousness predictions. In this paper, we use a recently-proposed COVID-19 epidemiological simulator to develop and test methods that can be deployed to a smartphone to locally and proactively predict an individual's infectiousness (risk of infecting others) based on their contact history and other information, while respecting strong privacy constraints. Predictions are used to provide personalized recommendations to the individual via an app, as well as to send anonymized messages to the individual's contacts, who use this information to better predict their own infectiousness, an approach we call proactive contact tracing (PCT). We find a deep-learning based PCT method which improves over BCT for equivalent average mobility, suggesting PCT could help in safe re-opening and second-wave prevention.

2021-01-12

ICLR.cc/2021/Conference (spotlight)

openreview.net

COVI-AgentSim: an Agent-based Model for Evaluating Methods of Digital Contact Tracing

Prateek Gupta

Tegan Maharaj

Martin Weiss

Nasim Rahaman

Hannah Alsdurf

Abhinav Sharma

Nanor Minoyan

Soren Harnois-Leblanc

Victor Schmidt

Pierre-Luc St-Charles

Tristan Deleu

Andrew Robert Williams

Akshay Patel

Gaetan Caron

Satya Ortiz Gagne

David Buckeridge … (see 9 more)

Joumana Ghosn

Yang Zhang

Bernhard Schölkopf

Joanna Merckx

2020-10-02

OpenReview.net/Anonymous_Preprint (unknown)

openreview.net

A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms

Yoshua Bengio

Tristan Deleu

Nasim Rahaman

Nan Rosemary Ke

We propose to meta-learn causal structures based on how fast a learner adapts to new distributions arising from sparse distributional changes, e.g. due to interventions, actions of agents and other sources of non-stationarities. We show that under this assumption, the correct causal structural choices lead to faster adaptation to modified distributions because the changes are concentrated in one or just a few mechanisms when the learned knowledge is modularized appropriately. This leads to sparse expected gradients and a lower effective number of degrees of freedom needing to be relearned while adapting to the change. It motivates using the speed of adaptation to a modified distribution as a meta-learning objective. We demonstrate how this can be used to determine the cause-effect relationship between two observed variables. The distributional changes do not need to correspond to standard interventions (clamping a variable), and the learner has no direct knowledge of these interventions. We show that causal structures can be parameterized via continuous variables and learned end-to-end. We then explore how these ideas could be used to also learn an encoder that would map low-level observed variables to unobserved causal variables leading to faster adaptation out-of-distribution, learning a representation space where one can satisfy the assumptions of independent mechanisms and of small and sparse changes in these mechanisms due to actions and non-stationarities.

2020-01-01

ICLR (published)

openreview.net

Retrieving Signals with Deep Complex Extractors

Ousmane Dia

Recent advances have made it possible to create deep complex-valued neural networks. Despite this progress, many challenging learning tasks have yet to leverage the power of complex representations. Building on recent advances, we propose a new deep complex-valued method for signal retrieval and extraction in the frequency domain. As a case study, we perform audio source separation in the Fourier domain. Our new method takes advantage of the convolution theorem which states that the Fourier transform of two convolved signals is the elementwise product of their Fourier transforms. Our novel method is based on a complex-valued version of Feature-Wise Linear Modulation (FiLM) and serves as the keystone of our proposed signal extraction method. We also introduce a new and explicit amplitude and phase-aware loss, which is scale and time invariant, taking into account the complex-valued components of the spectrogram. Using the Wall Street Journal Dataset, we compared our phase-aware loss to several others that operate both in the time and frequency domains and demonstrate the effectiveness of our proposed signal extraction method and proposed loss.

2019-10-21

NeurIPS.cc/2019/Workshop/Deep_Inverse (poster)

openreview.net
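
The convolution theorem cited in the abstract of "Retrieving Signals with Deep Complex Extractors" can be checked numerically. The sketch below is not taken from the paper: it assumes NumPy, uses random test signals, and the helper name `circular_convolve` is hypothetical. For the discrete Fourier transform the theorem holds for circular convolution, which is what the helper computes.

```python
import numpy as np

# Random real test signals (an assumption for illustration, not data from the paper).
rng = np.random.default_rng(0)
n = 64
x = rng.standard_normal(n)
h = rng.standard_normal(n)


def circular_convolve(a, b):
    """Direct O(n^2) circular convolution: out[k] = sum_m a[m] * b[(k - m) mod n]."""
    size = len(a)
    out = np.zeros(size)
    for k in range(size):
        for m in range(size):
            out[k] += a[m] * b[(k - m) % size]
    return out


# Left side: DFT of the circularly convolved signals.
lhs = np.fft.fft(circular_convolve(x, h))
# Right side: elementwise product of the individual DFTs.
rhs = np.fft.fft(x) * np.fft.fft(h)

max_abs_error = np.max(np.abs(lhs - rhs))
print("max |lhs - rhs| =", max_abs_error)
```

The two sides agree up to floating-point rounding, which is why a convolution in the time domain can be replaced by an elementwise complex multiplication in the frequency domain.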

Deep Complex Networks

At present, the vast majority of building blocks, techniques, and architectures for deep learning are based on real-valued operations and representations. However, recent work on recurrent neural networks and older fundamental theoretical analysis suggests that complex numbers could have a richer representational capacity and could also facilitate noise-robust memory retrieval mechanisms. Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models. In this work, we provide the key atomic components for complex-valued deep neural networks and apply them to convolutional feed-forward networks. More precisely, we rely on complex convolutions and present algorithms for complex batch-normalization, complex weight initialization strategies for complex-valued neural nets and we use them in experiments with end-to-end training schemes. We demonstrate that such complex-valued models are competitive with their real-valued counterparts. We test deep complex models on several computer vision tasks, on music transcription using the MusicNet dataset and on Speech spectrum prediction using TIMIT. We achieve state-of-the-art performance on these audio-related tasks.

2018-01-01

ICLR.cc/2018/Conference (poster)

openreview.net

Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding

Nan Rosemary Ke

Michael Curtis Mozer

Learning long-term dependencies in extended temporal sequences requires credit assignment to events far back in the past. The most common method for training recurrent neural networks, back-propagation through time (BPTT), requires credit information to be propagated backwards through every single step of the forward computation, potentially over thousands or millions of time steps. This becomes computationally expensive or even infeasible when used with long sequences. Importantly, biological brains are unlikely to perform such detailed reverse replay over very long sequences of internal states (consider days, months, or years). However, humans are often reminded of past memories or mental states which are associated with the current mental state. We consider the hypothesis that such memory associations between past and present could be used for credit assignment through arbitrarily long sequences, propagating the credit assigned to the current state to the associated past state. Based on this principle, we study a novel algorithm which only back-propagates through a few of these temporal skip connections, realized by a learned attention mechanism that associates current states with relevant past states. We demonstrate in experiments that our method matches or outperforms regular BPTT and truncated BPTT in tasks involving particularly long-term dependencies, but without requiring the biologically implausible backward replay through the whole history of states. Additionally, we demonstrate that the proposed method transfers to longer sequences significantly better than LSTMs trained with BPTT and LSTMs trained with full self-attention.

arxiv.org

Sparse Attentive Backtracking: Long-Range Credit Assignment in Recurrent Networks

Nan Rosemary Ke

A major drawback of backpropagation through time (BPTT) is the difficulty of learning long-term dependencies, coming from having to propagate credit information backwards through every single step of the forward computation. This makes BPTT both computationally impractical and biologically implausible. For this reason, full backpropagation through time is rarely used on long sequences, and truncated backpropagation through time is used as a heuristic. However, this usually leads to biased estimates of the gradient in which longer term dependencies are ignored. Addressing this issue, we propose an alternative algorithm, Sparse Attentive Backtracking, which might also be related to principles used by brains to learn long-term dependencies. Sparse Attentive Backtracking learns an attention mechanism over the hidden states of the past and selectively backpropagates through paths with high attention weights. This allows the model to learn long term dependencies while only backtracking for a small number of time steps, not just from the recent past but also from attended relevant past states.

2017-11-07

ArXiv (preprint)

arxiv.org