Publications

A Hardware‐in‐Loop Digital Twin Approach for Intelligent Optimization of Municipal Solid Waste Incineration
Wen Yu
JunFei Qiao
Neural FIM: Bridging Statistical Manifolds and Generative Modeling through Fisher Geometry
Yanlei Zhang
Edward De Brouwer
Danqi Liao
Oluwadamilola Fasina
Ricky T. Q. Chen
Maximilian Nickel
Ian Adelstein
While data diffusion-based embeddings are widely used in unsupervised learning to reveal the intrinsic geometry of data, they are fundamentally constrained by their discrete nature and inability to generalize beyond training points. This limitation ob
Rapid De Novo Antibody Design with GeoFlow-V3
BioGeometry Team
Recent years have witnessed striking advances in miniprotein design, yet de novo antibody discovery remains challenging, marked by low binding rates and the need for extensive, labor-intensive experimental screening of millions of candidates. This technical report introduces GeoFlow-V3, a unified atomic generative model for structure prediction and protein design. GeoFlow-V3 delivers improved accuracy on antibody-antigen complex structure prediction relative to our previous version, and its performance is further enhanced when experimental constraints or prior knowledge are provided, enabling precise control over both folding and design. The model also demonstrates reliable ability to discriminate binders from non-binders based on its confidence scores. Leveraging this capability, we build a GeoFlow-V3 in silico pipeline to design no more than 50 nanobodies per therapeutically relevant target de novo, completing a single round of wet-lab characterization in under three weeks. GeoFlow-V3 identifies at least one binder for 8 tested epitopes and achieves an average hit rate of 15.5%, representing a two-orders-of-magnitude improvement over prior computational pipelines. These results position GeoFlow-V3 as an appealing platform for rapid, AI-driven therapeutic antibody discovery, significantly reducing experimental screening demands and offering a powerful avenue to tackle previously undruggable targets. A demo of GeoFlow-V3 can be accessed via prot.design for non-commercial use.
Improved Localized Machine Unlearning Through the Lens of Memorization
Reihaneh Torkzadehmahani
Reza Nasirigerdeh
Georgios Kaissis
Daniel Rueckert
Eleni Triantafillou
Machine unlearning refers to removing the influence of a specified subset of training data from a machine learning model, efficiently, after it has already been trained. This is important for key applications, including making the model more accurate by removing outdated, mislabeled, or poisoned data. In this work, we study localized unlearning, where the unlearning algorithm operates on a (small) identified subset of parameters. Drawing inspiration from the memorization literature, we propose an improved localization strategy that yields strong results when paired with existing unlearning algorithms. We also propose a new unlearning algorithm, Deletion by Example Localization (DEL), that resets the parameters deemed to be most critical according to our localization strategy, and then finetunes them. Our extensive experiments on different datasets, forget sets and metrics reveal that DEL sets a new state-of-the-art for unlearning metrics, against both localized and full-parameter methods, while modifying a small subset of parameters, and also outperforms state-of-the-art localized unlearning in terms of test accuracy.
The spatially-resolved effect of mergers on the stellar mass assembly of MaNGA galaxies
Eirini Angeloudi
Marc Huertas-Company
Jesús Falcón-Barroso
Alina Boecker
Understanding the origin of stars within a galaxy - whether formed in-situ or accreted from other galaxies (ex-situ) - is key to constraining its evolution. Spatially resolving these components provides crucial insights into a galaxy's mass assembly history. We aim to predict the spatial distribution of ex-situ stellar mass fraction in MaNGA galaxies, and to identify distinct assembly histories based on the radial gradients of these predictions in the central regions. We employ a diffusion model trained on mock MaNGA analogs (MaNGIA), derived from the TNG50 cosmological simulation. The model learns to predict the posterior distribution of resolved ex-situ stellar mass fraction maps, conditioned on stellar mass density, velocity, and velocity dispersion gradient maps. After validating the model on an unseen test set from MaNGIA, we apply it to MaNGA galaxies to infer the spatially-resolved distribution of their ex-situ stellar mass fractions - i.e. the fraction of stellar mass in each spaxel originating from mergers. We identify four broad categories of ex-situ mass distributions: flat gradient, in-situ dominated; flat gradient, ex-situ dominated; positive gradient; and negative gradient. The vast majority of MaNGA galaxies fall in the first category - flat gradients with low ex-situ fractions - confirming that in-situ star formation is the main assembly driver for low- to intermediate-mass galaxies. At high stellar masses, the ex-situ maps are more diverse, highlighting the key role of mergers in building the most massive systems. Ex-situ mass distributions correlate with morphology, star-formation activity, stellar kinematics, and environment, indicating that accretion history is a primary factor shaping massive galaxies. Finally, by tracing their assembly histories in TNG50, we link each class to distinct merger scenarios, ranging from secular evolution to merger-dominated growth.
Toward the Decarbonization of Maritime Supply Chains: A Ship Emissions Prediction Framework
Abdelhak El Aissi
Ismail Bourzak
Abdelaziz Berrado
Maritime transport is a vital component of international trade, yet the industry contributes substantially to greenhouse gas (GHG) emissions, with carbon dioxide
High-Dimensional Privacy-Utility Dynamics of Noisy Stochastic Gradient Descent on Least Squares
Shurong Lin
Eric D. Kolaczyk
Adam Smith
Perpetua: Multi-Hypothesis Persistence Modeling for Semi-Static Environments
Miguel Saavedra-Ruiz
Samer B. Nashed
Many robotic systems require extended deployments in complex, dynamic environments. In such deployments, parts of the environment may change between subsequent robot observations. Most robotic mapping or environment modeling algorithms are incapable of representing dynamic features in a way that enables predicting their future state. Instead, they opt to filter certain state observations, either by removing them or by some form of weighted averaging. This paper introduces Perpetua, a method for modeling the dynamics of semi-static features. Perpetua is able to: incorporate prior knowledge about the dynamics of the feature if it exists, track multiple hypotheses, and adapt over time to enable prediction of future feature states. Specifically, we chain together mixtures of "persistence" and "emergence" filters to model the probability that features will disappear or reappear in a formal Bayesian framework. The approach is an efficient, scalable, general, and robust method for estimating the states of features in an environment, both in the present and at arbitrary future times. Through experiments on simulated and real-world data, we find that Perpetua yields better accuracy than similar approaches while also being online adaptable and robust to missing observations.
Prompt4Trust: A Reinforcement Learning Prompt Augmentation Framework for Clinically-Aligned Confidence Calibration in Multimodal Large Language Models
Continuously Learning Bug Locations
Paulina Stevia Nouwou Mindom
Leuson Da Silva
Amin Nikanjam
Automatically locating buggy changesets associated with bug reports is crucial in the software development process. Deep Learning (DL)-based techniques show promising results by leveraging structural information from the code and learning links between changesets and bug reports. However, since source code associated with changesets evolves, the performance of such models tends to degrade over time due to concept drift. Aiming to address this challenge, in this paper, we evaluate the potential of using Continual Learning (CL) techniques in a multiple sub-task setting for bug localization (each of which operates on either stationary or non-stationary data), comparing it against a bug localization technique that leverages the BERT model, a deep reinforcement learning-based technique that leverages the A2C algorithm, and a DL-based function-level interaction model for semantic bug localization. Additionally, we enhanced the CL techniques by using logistic regression to identify and integrate the most significant bug-inducing factors. Our empirical evaluation across seven widely used software projects shows that CL techniques perform better than DL-based techniques by up to 61% in terms of Mean Reciprocal Rank (MRR), 44% in terms of Mean Average Precision (MAP), 83% in terms of top@1, 56% in terms of top@5, and 66% in terms of top@10 metrics in the non-stationary setting. Further, we show that the CL techniques we studied are effective at localizing changesets relevant to a bug report while being able to mitigate catastrophic forgetting across the studied tasks and requiring up to 5x less computational effort during training. Our findings demonstrate the potential of adopting CL for bug localization in non-stationary settings, and we hope they help to improve bug localization activities in Software Engineering using CL techniques.
Hierarchical Differentiable Fluid Simulation
Xiangyu Kong
Arnaud Schoentgen
Damien Rioux‐Lavoie
Paul G. Kry
Differentiable simulation is an emerging field that offers a powerful and flexible route to fluid control. In grid‐based settings, high memory consumption is a long‐standing bottleneck that constrains optimization resolution. We introduce a two‐step algorithm that significantly reduces memory usage: our method first optimizes for bulk forces at reduced resolution, then refines local details over sub‐domains while maintaining differentiability. In trading runtime for memory, it enables optimization at previously unattainable resolutions. We validate its effectiveness and memory savings on a series of fluid control problems.
Improving autoformalization via cycle consistency and incremental type-checking using language-model probabilistic programs
Mauricio Barba da Costa
Fabian Zaiser
Katherine M. Collins
Romir Patel
Timothy J. O'Donnell
Alexander K. Lew
Joshua B. Tenenbaum
Vikash Mansinghka
Cameron Freer