Publications

DynGFN: Bayesian Dynamic Causal Discovery using Generative Flow Networks
Lazar Atanackovic
Alexander Tong
Jason Hartford
Leo Jingyu Lee
Bo Wang
Learning the causal structure of observable variables is a central focus for scientific discovery. Bayesian causal discovery methods tackle this problem by learning a posterior over the set of admissible graphs given our priors and observations. Existing methods primarily consider observations from static systems and assume the underlying causal structure takes the form of a directed acyclic graph (DAG). In settings with dynamic feedback mechanisms that regulate the trajectories of individual variables, this acyclicity assumption fails unless we account for time. We focus on learning Bayesian posteriors over cyclic graphs and treat causal discovery as a problem of sparse identification of a dynamical system. This imposes a natural temporal causal order between variables and captures cyclic feedback loops through time. Under this lens, we propose a new framework for Bayesian causal discovery for dynamical systems and present a novel generative flow network architecture (DynGFN) tailored for this task. Our results indicate that DynGFN learns posteriors that better encapsulate the distributions over admissible cyclic causal structures than state-of-the-art counterparts.
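The abstract's framing of causal discovery as sparse identification of a dynamical system can be made concrete with a small sketch. The snippet below is illustrative only, not the paper's DynGFN (which learns a full Bayesian posterior rather than a point estimate): it recovers a cyclic dependency graph from trajectories by sparse regression on finite-difference derivative estimates, and all function names and thresholds are hypothetical.

```python
# Illustrative sketch: causal discovery as sparse identification of a
# dynamical system, recovering which variables drive dx_i/dt via sparse
# linear regression on finite-difference derivatives. Not the paper's method.
import numpy as np

def sparse_dynamics_graph(X, dt, threshold=0.1, n_iters=10):
    """X: (T, d) array of trajectories sampled every `dt` seconds.
    Returns a (d, d) matrix A where A[j, i] != 0 means variable j
    appears in the dynamics of variable i (cycles are allowed)."""
    dXdt = (X[1:] - X[:-1]) / dt              # finite-difference derivatives
    Theta = X[:-1]                            # candidate regressors (linear library)
    A = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
    for _ in range(n_iters):                  # sequential thresholding for sparsity
        small = np.abs(A) < threshold
        A[small] = 0.0
        for i in range(A.shape[1]):           # refit each column on surviving terms
            keep = ~small[:, i]
            if keep.any():
                A[keep, i] = np.linalg.lstsq(Theta[:, keep], dXdt[:, i], rcond=None)[0]
    return A

# Example: the feedback loop x' = -y, y' = x, a cycle no DAG can encode.
t = np.linspace(0, 10, 2000)
X = np.stack([np.cos(t), np.sin(t)], axis=1)
print(sparse_dynamics_graph(X, dt=t[1] - t[0]).round(2))
```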
An Empirical Investigation of the Role of Pre-training in Lifelong Learning
Sanket Vaibhav Mehta
Darshan Patil
Emma Strubell
The lifelong learning paradigm in machine learning is an attractive alternative to the more prominent isolated learning scheme not only due to its resemblance to biological learning, but also its potential to reduce energy waste by obviating excessive model re-training. A key challenge to this paradigm is the phenomenon of catastrophic forgetting. With the increasing popularity and success of pre-trained models in machine learning, we pose the question: What role does pre-training play in lifelong learning, specifically with respect to catastrophic forgetting? We investigate existing methods in the context of large, pre-trained models and evaluate their performance on a variety of text and image classification tasks, including a large-scale study using a novel dataset of 15 diverse NLP tasks. Across all settings, we observe that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially compared to randomly initialized models. We then further investigate why pre-training alleviates forgetting in this setting. We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima. Based on this insight, we propose jointly optimizing for current task loss and loss basin sharpness in order to explicitly encourage wider basins during sequential fine-tuning. We show that this optimization approach leads to performance comparable to the state-of-the-art in task-sequential continual learning across multiple settings, without retaining a memory that scales in size with the number of tasks.
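The proposed joint optimization of task loss and basin sharpness is in the spirit of sharpness-aware minimization (SAM). Below is a minimal, generic SAM-style two-step update as a sketch of the idea, not the authors' exact procedure; it assumes a standard PyTorch training loop and that every trainable parameter receives a gradient.

```python
# Generic sharpness-aware update sketch: perturb weights toward higher loss,
# then descend using the gradient at the perturbed point, which favors wide
# basins. Illustrative, not the paper's exact optimizer.
import torch

def sharpness_aware_step(model, loss_fn, batch, base_opt, rho=0.05):
    x, y = batch
    params = [p for p in model.parameters() if p.requires_grad]
    base_opt.zero_grad()
    loss_fn(model(x), y).backward()                  # gradients at current weights
    with torch.no_grad():
        norm = torch.sqrt(sum((p.grad ** 2).sum() for p in params)) + 1e-12
        eps = [rho * p.grad / norm for p in params]  # ascent step of radius rho
        for p, e in zip(params, eps):
            p.add_(e)                                # climb to a nearby sharp point
    base_opt.zero_grad()
    loss_fn(model(x), y).backward()                  # gradients at the perturbed point
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)                                # restore the original weights
    base_opt.step()                                  # descend with the SAM gradient
```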
Estimating causal effects with optimization-based methods: A review and empirical comparison
Martin Cousineau
Vedat Verter
Susan A. Murphy
Evaluating Dependencies in Fact Editing for Language Models: Specificity and Implication Awareness
Zichao Li
Ines Arous
The potential of using a large language model (LLM) as a knowledge base (KB) has sparked significant interest. To maintain the knowledge acquired by LLMs, we need to ensure that the editing of learned facts respects internal logical constraints, known as the dependency of knowledge. Existing work on editing LLMs has partially addressed the issue of dependency, when the editing of a fact should apply to its lexical variations without disrupting irrelevant ones. However, it neglects the dependency between a fact and its logical implications. We propose an evaluation protocol with an accompanying question-answering dataset, StandUp, that provides a comprehensive assessment of the editing process considering the above notions of dependency. Our protocol involves setting up a controlled environment in which we edit facts and monitor their impact on LLMs, along with their implications based on If-Then rules. Extensive experiments on StandUp show that existing knowledge editing methods are sensitive to the surface form of knowledge, and that they have limited performance in inferring the implications of edited facts.
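As a rough illustration of implication-aware evaluation, the toy check below tests whether an edit propagates through an If-Then rule while unrelated facts stay put; the fact schema and rule format here are hypothetical and not the StandUp dataset's actual structure.

```python
# Toy sketch of an implication-aware consistency check for fact editing.
# Schema, rule format, and example facts are hypothetical.
def check_edit(model_answer, edits, rules):
    """edits: {fact: new_value}; rules: list of (premise_fact, implied_fact_fn)."""
    failures = []
    for fact, new_value in edits.items():
        if model_answer(fact) != new_value:
            failures.append(("edit not applied", fact))
        for premise, implied in rules:
            if premise == fact:
                query, expected = implied(new_value)   # If fact, Then implied fact
                if model_answer(query) != expected:
                    failures.append(("implication violated", query))
    return failures

# Example rule: If X's capital is C, Then X's seat of government is C.
rules = [("capital_of_France", lambda v: ("seat_of_government_France", v))]
answers = {"capital_of_France": "Lyon", "seat_of_government_France": "Paris"}
print(check_edit(answers.get, {"capital_of_France": "Lyon"}, rules))
# -> [('implication violated', 'seat_of_government_France')]
```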
Explaining Graph Neural Networks Using Interpretable Local Surrogates
Farzaneh Heidari
Perouz Taslakian
We propose an interpretable local surrogate (ILS) method for understanding the predictions of black-box graph models. Explainability methods are commonly employed to gain insights into black-box models and, given the widespread adoption of GNNs in diverse applications, understanding the underlying reasoning behind their decision-making processes becomes crucial. Our ILS method approximates the behavior of a black-box graph model by fitting a simple surrogate model in the local neighborhood of a given input example. Leveraging the interpretability of the surrogate, ILS is able to identify the most relevant nodes contributing to a specific prediction. To efficiently identify these nodes, we utilize group sparse linear models as local surrogates. Through empirical evaluations on explainability benchmarks, our method consistently outperforms state-of-the-art graph explainability methods. This demonstrates the effectiveness of our approach in providing enhanced interpretability for GNN predictions.
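The local-surrogate idea can be sketched in a few lines: randomly ablate neighbors of the target node, query the black-box model, and fit a sparse linear model to the (mask, prediction) pairs. Note that the sketch below substitutes an ordinary Lasso for the paper's group-sparse formulation, and all names are illustrative.

```python
# LIME-style sketch of a local surrogate for a black-box graph model: the
# largest surrogate coefficients flag the most relevant neighboring nodes.
# Plain Lasso stands in for the paper's group-sparse surrogate.
import numpy as np
from sklearn.linear_model import Lasso

def local_node_importance(black_box, graph, neighbors, n_samples=500, alpha=0.01):
    """black_box(graph, keep_mask) -> scalar prediction for the target node
    when only the neighbors with keep_mask[i] == 1 are retained."""
    rng = np.random.default_rng(0)
    masks = rng.integers(0, 2, size=(n_samples, len(neighbors)))  # random ablations
    preds = np.array([black_box(graph, m) for m in masks])
    surrogate = Lasso(alpha=alpha).fit(masks, preds)              # interpretable local fit
    return dict(zip(neighbors, surrogate.coef_))                  # node -> contribution
```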
Exploring Self-Attention Mechanisms for Speech Separation
Samuele Cornell
François Grondin
Mirko Bronzi
Transformers have enabled impressive improvements in deep learning. They often outperform recurrent and convolutional models in many tasks while taking advantage of parallel processing. Recently, we proposed the SepFormer, which obtains state-of-the-art performance in speech separation with the WSJ0-2/3 Mix datasets. This paper provides an in-depth study of Transformers for speech separation. In particular, we extend our previous findings on the SepFormer by providing results on more challenging noisy and noisy-reverberant datasets, such as LibriMix, WHAM!, and WHAMR!. Moreover, we extend our model to perform speech enhancement and provide experimental evidence on denoising and dereverberation tasks. Finally, we investigate, for the first time in speech separation, the use of efficient self-attention mechanisms such as Linformers, Longformers, and Reformers. We found that they reduce memory requirements significantly. For example, we show that the Reformer-based attention outperforms the popular Conv-TasNet model on the WSJ0-2Mix dataset while being faster at inference and comparable in terms of memory consumption.
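To illustrate why such mechanisms cut memory, here is a minimal Linformer-style attention sketch that projects keys and values along the sequence axis, shrinking the attention map from (T, T) to (T, k). It is a generic illustration, not the SepFormer implementation, and all dimensions are arbitrary.

```python
# Minimal Linformer-style low-rank self-attention sketch: the sequence axis
# of keys and values is compressed from T to k << T, so the score matrix is
# (T, k) rather than (T, T). Dimensions here are illustrative.
import math
import torch

def linformer_attention(x, Wq, Wk, Wv, E, F):
    """x: (T, d) inputs; E, F: (k, T) learned sequence projections."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    K, V = E @ K, F @ V                        # (k, d): compress the sequence axis
    scores = Q @ K.T / math.sqrt(Q.shape[-1])  # (T, k) instead of (T, T)
    return torch.softmax(scores, dim=-1) @ V

T, d, k = 16000, 256, 256
x = torch.randn(T, d)
Wq = Wk = Wv = torch.randn(d, d) / d ** 0.5
E = F = torch.randn(k, T) / T ** 0.5
out = linformer_attention(x, Wq, Wk, Wv, E, F)  # avoids a 16000 x 16000 score matrix
```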
Exploring trust development in families of children towards surgical and emergency care providers: A scoping review of the literature.
Olivia Serhan
Alexander Moise
Elena Guadagno
A. Issa
Expressiveness and Learnability: A Unifying View for Evaluating Self-Supervised Learning
Yuchen Lu
Zhen Liu
Aristide Baratin
Romain Laroche
Family risk communication preferences in pediatric surgery: A scoping review.
Arthega Selvarajan
Brandon Arulanandam
Elena Guadagno
Feature Likelihood Divergence: Evaluating the Generalization of Generative Models Using Samples
Marco Jiralerspong
Joey Bose
Ian Gemp
Chongli Qin
Yoram Bachrach
Feature Likelihood Score: Evaluating Generalization of Generative Models Using Samples
Marco Jiralerspong
Avishek Joey Bose
Deep generative models have demonstrated the ability to generate complex, high-dimensional, and photo-realistic data. However, a unified framework for evaluating different generative modeling families remains a challenge. Indeed, likelihood-based metrics do not apply in many cases, while pure sample-based metrics such as FID fail to capture known failure modes such as overfitting on training data. In this work, we introduce the Feature Likelihood Score (FLS), a parametric sample-based score that uses density estimation to quantitatively measure the quality/diversity of generated samples while taking overfitting into account. We empirically demonstrate the ability of FLS to identify specific overfitting problem cases, even when previously proposed metrics fail. We further perform an extensive experimental evaluation on various image datasets and model classes. Our results indicate that FLS matches the intuitions of previous metrics, such as FID, while providing a more holistic evaluation of generative models that highlights models whose generalization abilities are under- or over-appreciated. Code for computing FLS is provided at https://github.com/marcojira/fls.
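The core idea can be sketched as follows: fit a density model on features of generated samples and score held-out real features under it. The snippet below is a simplified illustration of that idea using a Gaussian mixture, not the official FLS implementation linked above; feature extraction (e.g., by a pretrained encoder) is assumed to have happened upstream.

```python
# Simplified feature-likelihood sketch: low likelihood of real test features
# under a density fit on generated features flags poor quality/diversity,
# while a large train-test likelihood gap flags overfitting. Illustrative only.
from sklearn.mixture import GaussianMixture

def feature_likelihood(gen_feats, test_feats, train_feats, n_components=10):
    """All arguments are (n_samples, feat_dim) arrays of encoder features."""
    density = GaussianMixture(n_components, covariance_type="diag",
                              random_state=0).fit(gen_feats)
    test_ll = density.score(test_feats)   # mean log-likelihood of held-out real data
    train_ll = density.score(train_feats)
    return test_ll, train_ll - test_ll    # quality/diversity score, overfitting gap
```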
Filtering Pixel Latent Variables for Unmixing Volumetric Images
Catherine Bouchard
Vincent Boulanger
Flavie Lavoie-Cardinal
Measurements of different overlapping components require robust unmixing algorithms to convert the raw multi-dimensional measurements to useful unmixed images. Such algorithms perform reliable separation of the components when the raw signal is fully resolved and contains enough information to fit curves on the raw distributions. In experimental physics, measurements are often noisy, undersampled, or unresolved spatially or spectrally. We propose a novel method where bandpass filters are applied to the latent space of a multi-dimensional convolutional neural network to separate the overlapping signal components and extract each of their relative contributions. Simultaneously processing all dimensions with multi-dimensional convolution kernels empowers the network to combine the information from adjacent pixels and time- or spectral-bins, facilitating component separation in instances where individual pixels lack well-resolved information. We demonstrate the applicability of the method to real experimental physics problems using fluorescence lifetime microscopy and mode decomposition in optical fibers as test cases. The successful application of our approach to these two distinct experimental cases, characterized by different measured distributions, highlights the versatility of our approach in addressing a wide array of imaging tasks.
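The bandpass-filtering idea from the abstract can be sketched as frequency-band splitting of a latent tensor along its time or spectral axis; the snippet below is a minimal NumPy illustration with arbitrary band edges, not the paper's learned multi-dimensional network.

```python
# Minimal sketch of the bandpass idea: split a latent tensor into
# per-component contributions by masking frequency bands along the time or
# spectral axis. Band edges and shapes here are illustrative.
import numpy as np

def bandpass_split(latent, bands, axis=-1):
    """latent: real array with a time- or spectral-axis to filter.
    bands: list of (lo, hi) normalized frequency edges, one per component."""
    freqs = np.fft.rfftfreq(latent.shape[axis])
    spectrum = np.fft.rfft(latent, axis=axis)
    out = []
    for lo, hi in bands:
        mask_shape = [1] * latent.ndim
        mask_shape[axis] = freqs.size
        mask = ((freqs >= lo) & (freqs < hi)).reshape(mask_shape)
        out.append(np.fft.irfft(spectrum * mask, n=latent.shape[axis], axis=axis))
    return out  # one filtered latent volume per component

# Example: separate slow and fast temporal components of an (H, W, T) latent.
latent = np.random.randn(8, 8, 128)
slow, fast = bandpass_split(latent, bands=[(0.0, 0.1), (0.1, 0.5)])
```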