Publications

Opportunities in AI/ML for the Rubin LSST Dark Energy Science Collaboration
LSST Dark Energy Science Collaboration
Eric Aubourg
Camille Avestruz
Matthew R. Becker
Biswajit Biswas
Rahul Biswas
Boris Bolliet
Adam S. Bolton
Clecio R. Bom
Raphaël Bonnet-Guerrini
Alexandre Boucaud
Jean-Eric Campagne
Chihway Chang
Aleksandra Ćiprijanović
Johann Cohen-Tanugi
Michael W. Coughlin
John Franklin Crenshaw
Juan C. Cuevas-Tello
Juan de Vicente
Seth W. Digel
Steven Dillmann
Mariano Javier de León Dominguez Romero
Alex Drlica-Wagner
Sydney Erickson
Alexander T. Gagliano
Christos Georgiou
Aritra Ghosh
Matthew Grayling
Kirill A. Grishin
Alan Heavens
Lindsay R. House
Mustapha Ishak
Wassim Kabalan
Arun Kannawadi
François Lanusse
C. Danielle Leonard
Pierre-François Léget
Michelle Lochner
Yao-Yuan Mao
Peter Melchior
Grant Merz
Martin Millon
Anais Möller
Gautham Narayan
Yuuki Omori
Hiranya Peiris
Andrés A. Plazas Malagón
Nesar Ramachandra
Benjamin Remy
Cécile Roucelle
Jaime Ruiz-Zapatero
Stefan Schuldt
Ignacio Sevilla-Noarbe
Ved G. Shah
Tjitske Starkenburg
Stephen Thorp
Laura Toribio San Cipriano
Tilman Tröster
Roberto Trotta
Padma Venkatraman
Amanda Wasserman
Tim White
Tianqing Zhang
Yuanyuan Zhang
The Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST) will produce unprecedented volumes of heterogeneous astronomical data (images, catalogs, and alerts) that challenge traditional analysis pipelines. The LSST Dark Energy Science Collaboration (DESC) aims to derive robust constraints on dark energy and dark matter from these data, requiring methods that are statistically powerful, scalable, and operationally reliable. Artificial intelligence and machine learning (AI/ML) are already embedded across DESC science workflows, from photometric redshifts and transient classification to weak lensing inference and cosmological simulations. Yet their utility for precision cosmology hinges on trustworthy uncertainty quantification, robustness to covariate shift and model misspecification, and reproducible integration within scientific pipelines. This white paper surveys the current landscape of AI/ML across DESC's primary cosmological probes and cross-cutting analyses, revealing that the same core methodologies and fundamental challenges recur across disparate science cases. Since progress on these cross-cutting challenges would benefit multiple probes simultaneously, we identify key methodological research priorities, including Bayesian inference at scale, physics-informed methods, validation frameworks, and active learning for discovery. With an eye on emerging techniques, we also explore the potential of the latest foundation model methodologies and LLM-driven agentic AI systems to reshape DESC workflows, provided their deployment is coupled with rigorous evaluation and governance. Finally, we discuss critical software, computing, data infrastructure, and human capital requirements for the successful deployment of these new methodologies, and consider associated risks and opportunities for broader coordination with external actors.
Automatic Recursion Elimination using Recurrence Relations for Synthesis of Stack-free Hardware
Adam Musa
High-level Synthesis (HLS) eases hardware design by offering a higher level of abstraction. However, high-level programming concepts, such as recursion, are costly to synthesize, if at all possible. Recursion typically relies on a dynamic call stack, whose hardware implementation is resource-intensive and inefficient. Existing approaches solve this issue by replacing recursion with iteration using explicit stack arrays or by detecting specific patterns (e.g., tail recursion) to avoid using the stack. This paper introduces a novel technique for transforming recursive functions into equivalent stack-free iterative implementations. Using static analysis, a recurrence relation is extracted from the function, representing it as a sequence bounded by the order of the relation. This relation is then used to optimize the process of incrementalization, constructing a synthesizable, stack-free version of the function that uses a bounded static array. This approach is evaluated on a set of recursive benchmarks used in prior work. It eliminates recursion from 9 out of 19 benchmarks and achieves a
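As a rough illustration of the idea described in this abstract, the sketch below hand-applies the transformation to a textbook order-2 recurrence (Fibonacci). It is not the paper's tool or algorithm: the function names and the `ORDER` constant are hypothetical, and the recurrence is written by hand here, not extracted by static analysis. It only shows how a recurrence of known order can be evaluated with a fixed-size array instead of a call stack.

```python
def fib_recursive(n: int) -> int:
    """Reference recursive definition: f(n) = f(n-1) + f(n-2)."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


# Order of the recurrence: f(n) depends on the previous 2 terms only.
# (Hypothetical constant; here it is read off the definition by hand.)
ORDER = 2


def fib_stack_free(n: int) -> int:
    """Iterative evaluation using a window whose size is fixed at ORDER."""
    window = [0, 1]  # base cases f(0) and f(1)
    if n < ORDER:
        return window[n]
    for _ in range(ORDER, n + 1):
        nxt = window[0] + window[1]          # apply the recurrence once
        window[0], window[1] = window[1], nxt  # slide the window forward
    return window[1]


if __name__ == "__main__":
    print([fib_stack_free(n) for n in range(10)])
```

Because the window never grows beyond `ORDER` entries, its storage is known statically, which is the property that makes this style of code a plausible candidate for synthesis.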
CISO: Species distribution modelling Conditioned on Incomplete Species Observations
Hager Radi Abdelwahed
Mélisande Teng
Robin Zbinden
Laura Pollock
Hugo Larochelle
D. Tuia
David Rolnick
Species distribution models (SDMs) are widely used to predict species' geographic distributions, serving as critical tools for ecological research and conservation planning. Typically, SDMs relate species occurrences to environmental variables representing abiotic factors, such as temperature, precipitation, and soil properties. However, species distributions are also strongly influenced by biotic interactions with other species, which are often overlooked in traditional models. While some methods, such as joint species distribution models (JSDMs), partially address this limitation by incorporating biotic interactions, they often assume symmetrical pairwise relationships between species and require consistent co‐occurrence data. In practice, species observations are often sparse, and the availability of information about the presence or absence of other species varies significantly across locations. To address these challenges, we propose CISO, a deep learning‐based method for species distribution modelling Conditioned on Incomplete Species Observations. CISO enables predictions to be conditioned on a flexible number of species observations alongside environmental variables, accommodating the variability and incompleteness of available biotic data. We demonstrate our approach using three datasets representing different species groups: sPlotOpen for plants, SatBird for birds, and a new dataset, SatButterfly, for butterflies. Our results show that including partial biotic information improves predictive performance on spatially separate test sets. When conditioned on a subset of species within the same dataset, CISO outperforms alternative methods in predicting the distribution of the remaining species for plants and birds. Furthermore, we show that combining and conditioning on observations from multiple datasets can improve the prediction of species occurrences in scenarios with sufficient co‐occurrences between datasets to train CISO effectively.
Our results show that CISO is a promising ecological tool, capable of incorporating incomplete biotic information and identifying potential interactions between species from disparate taxa.
Mapping the Perseus galaxy cluster with XRISM
Congyao Zhang
Irina Zhuravleva
Hannah McCall
Elena Bellomi
Nhut Truong
John ZuHone
Eugene Churazov
Megan E. Eckart
Yutaka Fujita
Yuto Ichinohe
Maxim Markevitch
Kyoko Matsushita
François Mernier
Eric D. Miller
Koji Mori
Hiroshi Nakajima
Anna Ogorzalek
Frederick S. Porter
Ayşegül Tümer
Shutaro Ueda
Norbert Werner
Annie Heinrich
We present extended gas kinematic maps of the Perseus cluster based on a combination of five new XRISM/Resolve pointings observed in 2025 with four performance verification datasets from 2024, totaling a net exposure of 745 ks. To date, Perseus remains the only cluster that has been extensively mapped out to ≃0.7 r₂₅₀₀ by XRISM/Resolve, while simultaneously offering sufficient spatial resolution to resolve gaseous substructures driven by mergers and active galactic nucleus (AGN) feedback. Our observations cover multiple radial directions and a broad range of dynamical scales, enabling us to characterize the kinematic properties of the intracluster medium up to a scale of ∼500 kpc. In the measurements, we detected high velocity dispersions (≃300 km s⁻¹) in the eastern region of the cluster that are spatially coincident with the extended X-ray surface brightness excess and correspond to a nonthermal pressure fraction of ≃7–13%. The velocity field outside the AGN-dominant region can be effectively described by a single, large-scale kinematic driver based on the velocity structure function, which statistically favors an energy injection scale of at least a few hundred kpc. The estimated turbulent dissipation energy is comparable to the gravitational potential energy released by a recent merger, implying a significant role of turbulent cascade in the merger energy conversion. In the bulk velocity field, we observed a dipole-like pattern along the east-west direction with an amplitude of ≃ ±200–300 km s⁻¹, indicating rotational motions induced by the recent merger event. This feature constrains the viewing direction to ≃30°–50° relative to the normal of the merger plane. Our hydrodynamic simulations suggest that Perseus has experienced at least two energetic mergers since redshift z ∼ 1, the most recent of which is associated with the radio galaxy IC310, in agreement with recent SRG/eROSITA findings.
This study showcases exciting scientific opportunities for future missions with high-resolution spectroscopic capabilities (e.g., HUBS, LEM, and NewAthena).
Multi-Agent AI Framework for Threat Mitigation and Resilience in Machine Learning Systems
Armstrong Foundjem
Lionel Nganyewou Tidjon
Leuson Da Silva
Machine learning (ML) increasingly underpins foundation models and autonomous pipelines in high-stakes domains such as finance, healthcare, and national infrastructure, rendering these systems prime targets for sophisticated adversarial threats. Attackers now leverage advanced Tactics, Techniques, and Procedures (TTPs) spanning data poisoning, model extraction, prompt injection, automated jailbreaking, training data exfiltration, and, more recently, preference-guided black-box optimization that exploits models’ own comparative judgments to craft successful attacks iteratively. These emerging text-only, query-based methods demonstrate that larger and better-calibrated models can be paradoxically more vulnerable to introspection-driven jailbreaks and cross-modal manipulations. While traditional cybersecurity frameworks offer partial mitigation, they lack ML-specific threat modeling and fail to capture evolving attack vectors across foundation, multimodal, and federated settings. Objective: This research empirically characterizes modern ML security risks by identifying dominant attacker TTPs, exposed vulnerabilities, and lifecycle stages most frequently targeted in foundation-model, multimodal, and retrieval-augmented (RAG) pipelines. The study also assesses the scalability of current defenses against generative and introspection-based attacks, highlighting the need for adaptive, ML-aware security mechanisms. Methods: We conduct a large-scale empirical analysis of ML security, extracting 93 distinct threats from multiple sources: real-world incidents in MITRE ATLAS (26), the AI Incident Database (12), and peer-reviewed literature (55), supplemented by 854 ML repositories from GitHub and the Python Advisory database.
A multi-agent reasoning system with enhanced Retrieval-Augmented Generation (RAG)—powered by ChatGPT-4o (temperature 0.4)—automatically extracts TTPs, vulnerabilities, and lifecycle stages from over 300 scientific articles using evidence-grounded reasoning. The resulting ontology-driven threat graph supports cross-source validation and lifecycle mapping. Results: Our analysis uncovers multiple unreported threats beyond current ATLAS coverage, including model-stealing attacks against commercial LLM APIs, data leakage through parameter memorization, and preference-guided query optimization enabling text-only jailbreaks and multimodal adversarial examples. Gradient-based obstinate attacks, MASTERKEY automated jailbreaking, federated learning poisoning, diffusion backdoor embedding, and preference-oriented optimization leakage emerge as dominant TTPs, disproportionately impacting pretraining and inference. Graph-based dependency analysis shows that specific ML libraries and model hubs exhibit dense vulnerability clusters lacking effective issue-tracking and patch-propagation mechanisms. Conclusion: This study underscores the urgent need for adaptive, ML-specific security frameworks that address introspection-based and preference-guided attacks alongside classical adversarial vectors. Robust dependency management, automated threat intelligence, and continuous monitoring are essential to mitigate supply-chain and inference-time risks throughout the ML lifecycle. By unifying empirical evidence from incidents, literature, and repositories, this research delivers a comprehensive threat landscape for next-generation AI systems and establishes a foundation for proactive, multi-agent security governance in the era of large-scale and generative AI.
Toward Faithful Explanations in Acoustic Anomaly Detection
Maab Elrashid
Yusuf Cem Sübakan
Mirco Ravanelli
Rémi Georges
Interpretability is essential for user trust in real-world anomaly detection applications. However, deep learning models, despite their strong performance, often lack transparency. In this work, we study the interpretability of autoencoder-based models for audio anomaly detection by comparing a standard autoencoder (AE) with a masked autoencoder (MAE) in terms of detection performance and interpretability. We applied several attribution methods, including error maps, saliency maps, SmoothGrad, Integrated Gradients, GradSHAP, and Grad-CAM. Although the MAE shows slightly lower detection performance, it consistently provides more faithful and temporally precise explanations, suggesting a better alignment with true anomalies. To assess the relevance of the regions highlighted by the explanation method, we propose a perturbation-based faithfulness metric that replaces them with their reconstructions to simulate normal input. Our findings, based on experiments in a real industrial scenario, highlight the importance of incorporating interpretability into anomaly detection pipelines and show that masked training improves explanation quality without compromising performance.
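The perturbation-based faithfulness idea in this abstract can be sketched as follows. This is a minimal toy under stated assumptions, not the authors' metric: the function name `faithfulness_drop`, the `top_fraction` parameter, the use of a plain score difference, and the toy `score_fn` are all hypothetical. The sketch replaces the most highly attributed positions with their reconstructed values and measures how much the anomaly score falls.

```python
from typing import Callable, Sequence


def faithfulness_drop(
    x: Sequence[float],
    reconstruction: Sequence[float],
    attribution: Sequence[float],
    score_fn: Callable[[Sequence[float]], float],
    top_fraction: float = 0.2,
) -> float:
    """Anomaly-score drop after replacing the top-attributed positions
    of x with their reconstructed values (hypothetical formulation)."""
    k = max(1, int(len(x) * top_fraction))
    # Indices of the k positions the explanation marks as most relevant.
    top = sorted(range(len(x)), key=lambda i: attribution[i], reverse=True)[:k]
    perturbed = list(x)
    for i in top:
        perturbed[i] = reconstruction[i]
    return score_fn(x) - score_fn(perturbed)


if __name__ == "__main__":
    # Toy setup: "normal" signal is all zeros, the anomaly sits at index 3.
    normal = [0.0] * 10
    x = list(normal)
    x[3] = 5.0

    def score_fn(signal: Sequence[float]) -> float:
        # Toy anomaly score: mean squared deviation from the normal signal.
        return sum(v * v for v in signal) / len(signal)

    good_attr = [1.0 if i == 3 else 0.0 for i in range(10)]  # points at the anomaly
    bad_attr = [1.0 if i == 7 else 0.0 for i in range(10)]   # points elsewhere
    print(faithfulness_drop(x, normal, good_attr, score_fn, top_fraction=0.1))
    print(faithfulness_drop(x, normal, bad_attr, score_fn, top_fraction=0.1))
```

Under this formulation, an explanation that highlights the truly anomalous region yields a large drop, while one that highlights an irrelevant region yields a drop near zero.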
Press Start to Charge: Videogaming the Online Centralized Charging Scheduling Problem
Alireza Ghahtarani
Martin Cousineau
Jorge E. Mendoza
Same/Other/All K‐Fold Cross‐Validation for Estimating Similarity of Patterns in Data Subsets
Gabrielle Thibault
C. S. Bodine
Paul Nelson Arellano
Alexander F. Shenkin
Olivia Jasmine Lindly
Discrete Feynman-Kac Correctors
Viktor Ohanesian
Artem Gazizov
Alán Aspuru-Guzik
Roberto Bondesan
Kirill Neklyudov
Discrete diffusion models have recently emerged as a promising alternative to the autoregressive approach for generating discrete sequences. Sample generation via gradual denoising or demasking processes allows them to capture hierarchical non-sequential interdependencies in the data. These custom processes, however, do not assume a flexible control over the distribution of generated samples. We propose Discrete Feynman-Kac Correctors, a framework that allows for controlling the generated distribution of discrete masked diffusion models at inference time. We derive Sequential Monte Carlo (SMC) algorithms that, given a trained discrete diffusion model, control the temperature of the sampled distribution (i.e. perform annealing), sample from the product of marginals of several diffusion processes (e.g. differently conditioned processes), and sample from the product of the marginal with an external reward function, producing likely samples from the target distribution that also have high reward. Notably, our framework does not require any training of additional models or fine-tuning of the original model. We illustrate the utility of our framework in several applications including: efficient sampling from the annealed Boltzmann distribution of the Ising model, improving the performance of language models for code generation and amortized learning, as well as reward-tilted protein sequence generation.
Inference-time Physics Alignment of Video Generative Models with Latent World Models
Jianhao Yuan
Felix Friedrich
Nicolas Beltran-Velez
Melissa Hall
Xiaochuang Han
Adriana Romero
State-of-the-art video generative models produce promising visual content yet often violate basic physics principles, limiting their utility. While some attribute this deficiency to insufficient physics understanding from pre-training, we find that the shortfall in physics plausibility also stems from suboptimal inference strategies. We therefore introduce WMReward and treat improving physics plausibility of video generation as an inference-time alignment problem. In particular, we leverage the strong physics prior of a latent world model (here, VJEPA-2) as a reward to search and steer multiple candidate denoising trajectories, enabling test-time compute to be scaled for better generation performance. Empirically, our approach substantially improves physics plausibility across image-conditioned, multiframe-conditioned, and text-conditioned generation settings, as validated by a human preference study. Notably, in the ICCV 2025 Perception Test PhysicsIQ Challenge, we achieve a final score of 62.64%, winning first place and outperforming the previous state of the art by 7.42%. Our work demonstrates the viability of using latent world models to improve physics plausibility of video generation, beyond this specific instantiation or parameterization.
Multilinguality as Sense Adaptation
Jan Christian Blaise Cruz
Alham Fikri Aji
Evaluating Implicit Regulatory Compliance in LLM Tool Invocation via Logic-Guided Synthesis
Da Song
Yuheng Huang
Boqi Chen
Tianshuo Cong
Randy Goebel
Lei Ma
The integration of large language models (LLMs) into autonomous agents has enabled complex tool use, yet in high-stakes domains, these systems must strictly adhere to regulatory standards beyond simple functional correctness. However, existing benchmarks often overlook implicit regulatory compliance, thus failing to evaluate whether LLMs can autonomously enforce mandatory safety constraints. To fill this gap, we introduce LogiSafetyGen, a framework that converts unstructured regulations into Linear Temporal Logic oracles and employs logic-guided fuzzing to synthesize valid, safety-critical traces. Building on this framework, we construct LogiSafetyBench, a benchmark comprising 240 human-verified tasks that require LLMs to generate Python programs that satisfy both functional objectives and latent compliance rules. Evaluations of 13 state-of-the-art (SOTA) LLMs reveal that larger models, despite achieving better functional correctness, frequently prioritize task completion over safety, which results in non-compliant behavior.
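To make the notion of a temporal-logic oracle over tool-call traces more concrete, here is a minimal sketch of two common property shapes checked on a finite trace. It is an illustration only, not LogiSafetyGen's implementation: the function names, the tool names (`verify_identity`, `transfer_funds`, `log_audit`), and the two rules are hypothetical, and the abstract does not specify how its oracles are encoded or which finite-trace semantics it uses.

```python
from typing import Sequence


def precedes(trace: Sequence[str], required: str, guarded: str) -> bool:
    """Precedence rule: no `guarded` call may occur before the first
    `required` call (roughly: not guarded, weak-until required)."""
    seen_required = False
    for event in trace:
        if event == required:
            seen_required = True
        elif event == guarded and not seen_required:
            return False
    return True


def always_eventually_followed(
    trace: Sequence[str], trigger: str, response: str
) -> bool:
    """Response rule on a finite trace: every `trigger` call is followed
    later in the trace by a `response` call (roughly: G(trigger -> F response))."""
    pending = False
    for event in trace:
        if event == trigger:
            pending = True
        elif event == response:
            pending = False
    return not pending


if __name__ == "__main__":
    compliant = ["verify_identity", "transfer_funds", "log_audit"]
    violating = ["transfer_funds", "verify_identity", "log_audit"]
    for trace in (compliant, violating):
        ok = precedes(trace, "verify_identity", "transfer_funds") and \
            always_eventually_followed(trace, "transfer_funds", "log_audit")
        print(trace, "compliant" if ok else "non-compliant")
```

Both example traces reach the same functional outcome (the transfer happens and is logged), but only the first satisfies the precedence rule, which is the kind of gap between task completion and compliance the abstract describes.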