Pablo Lemos

Causal Discovery in Astrophysics: Unraveling Supermassive Black Hole and Galaxy Coevolution

Zehao Jin

Mario Pasquato

Benjamin L. Davis

Tristan Deleu

Yu Luo

Changhyun Cho

Laurence Perreault-Levasseur

Yoshua Bengio

Xi Kang

Andrea Valerio Maccio

Yashar Hezaveh

Correlation does not imply causation, but patterns of statistical association between variables can be exploited to infer a causal structure (even with purely observational data) with the burgeoning field of causal discovery. As a purely observational science, astrophysics has much to gain by exploiting these new methods. The supermassive black hole (SMBH)--galaxy interaction has long been constrained by observed scaling relations, that is, low-scatter correlations between variables such as SMBH mass and the central velocity dispersion of stars in a host galaxy's bulge. This study, using advanced causal discovery techniques and an up-to-date dataset, reveals a causal link between galaxy properties and dynamically-measured SMBH masses. We apply a score-based Bayesian framework to compute the exact conditional probabilities of every causal structure that could possibly describe our galaxy sample. With the exact posterior distribution, we determine the most likely causal structures and notice a probable causal reversal when separating galaxies by morphology. In elliptical galaxies, bulge properties (built from major mergers) tend to influence SMBH growth, while in spiral galaxies, SMBHs are seen to affect host galaxy properties, potentially through feedback in gas-rich environments. For spiral galaxies, SMBHs progressively quench star formation, whereas in elliptical galaxies, quenching is complete, and the causal connection has reversed. Our findings support theoretical models of hierarchical assembly of galaxies and active galactic nuclei feedback regulating galaxy evolution. Our study suggests the potential for further exploration of causal links in astrophysical and cosmological scaling relations, as well as any other observational science.

2025-01-27

The Astrophysical Journal (published)

arxiv.org

PQMass: Probabilistic Assessment of the Quality of Generative Models Using Probability Mass Estimation

Laurence Perreault-Levasseur

Sammy Sharief

Nikolay Malkin

Salma Salhi

Yashar Hezaveh

Connor Stone

We propose a likelihood-free method for comparing two distributions given samples from each, with the goal of assessing the quality of generative models. The proposed approach, PQMass, provides a statistically rigorous method for assessing the performance of a single generative model or the comparison of multiple competing models. PQMass divides the sample space into non-overlapping regions and applies chi-squared tests to the number of data samples that fall within each region, giving a

2025-01-21

International Conference on Learning Representations (poster)

Amortizing intractable inference in diffusion models for vision, language, and control

Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data,

2024-09-24

Neural Information Processing Systems (poster)

Improved off-policy training of diffusion samplers

We study the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function. We benchmark several diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods (continuous generative flow networks). Our results shed light on the relative advantages of existing algorithms while bringing into question some claims from past work. We also propose a novel exploration strategy for off-policy methods, based on local search in the target space with the use of a replay buffer, and show that it improves the quality of samples on a variety of target distributions. Our code for the sampling methods and benchmarks studied is made public at https://github.com/GFNOrg/gfn-diffusion as a base for future work on diffusion models for amortized inference.

2024-09-24

Neural Information Processing Systems (poster)

Laurence Perreault-Levasseur

Improving Gradient-Guided Nested Sampling for Posterior Inference

Will Handley

We present a performant, general-purpose gradient-guided nested sampling algorithm, …

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press

Iterated Denoising Energy Matching for Sampling from Boltzmann Densities

Avishek Joey Bose

Cheng-Hao Liu

Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-body systems, is a foundational problem in science. In this paper, we propose Iterated Denoising Energy Matching (iDEM), an iterative algorithm that uses a novel stochastic score matching objective leveraging solely the energy function and its gradient -- and no data samples -- to train a diffusion-based sampler. Specifically, iDEM alternates between (I) sampling regions of high model density from a diffusion-based sampler and (II) using these samples in our stochastic matching objective to further improve the sampler. iDEM is scalable to high dimensions, as the inner matching objective is simulation-free and requires no MCMC samples. Moreover, by leveraging the fast mode mixing behavior of diffusion, iDEM smooths out the energy landscape, enabling efficient exploration and learning of an amortized sampler. We evaluate iDEM on a suite of tasks ranging from standard synthetic energy functions to invariant

2024-04-30

ICML.cc/2024/Conference (poster)

proceedings.mlr.press

Interpretable machine learning for finding intermediate-mass black holes

Mario Pasquato

Piero Trevisan

Abbas Askar

Gaia Carenini

Michela Mapelli

Yashar Hezaveh

Definitive evidence that globular clusters (GCs) host intermediate-mass black holes (IMBHs) is elusive. Machine learning (ML) models trained on GC simulations can in principle predict IMBH host candidates based on observable features. This approach has two limitations: first, an accurate ML model is expected to be a black box due to complexity; second, despite our efforts to realistically simulate GCs, the simulation physics or initial conditions may fail to fully reflect reality. Therefore, our training data may be biased, leading to a failure in generalization on observational data. Both the first issue -- explainability/interpretability -- and the second -- out-of-distribution generalization and fairness -- are active areas of research in ML. Here we employ techniques from these fields to address them: we use the anchors method to explain an XGBoost classifier; we also independently train a natively interpretable model using Certifiably Optimal RulE ListS (CORELS). The resulting model has a clear physical meaning, but loses some performance with respect to XGBoost. We evaluate potential candidates in real data based not only on classifier predictions but also on their similarity to the training data, measured by the likelihood of a kernel density estimation model. This measures the realism of our simulated data and mitigates the risk that our models may produce biased predictions by working in extrapolation. We apply our classifiers to real GCs, obtaining a predicted classification, a measure of the confidence of the prediction, an out-of-distribution flag, a local rule explaining the prediction of XGBoost and a global rule from CORELS.

2024-04-08

The Astrophysical Journal (published)

arxiv.org

Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation

Guillaume Huguet

James Vuckovic

Kilian Fatras

Éric Thibodeau-Laufer