Publications

Dynamic Abstractions: Building the Next Generation of Cognitive Tools and Interfaces
Sangho Suh
Hai Dang
Ryan Yen
Josh M. Pollock
Rubaiat Habib Kazi
Hariharan Subramonyam
Jingyi Li
Nazmus Saquib
Arvind Satyanarayan
Effective Protein-Protein Interaction Exploration with PPIretrieval
Connor W. Coley
Shuangjia Zheng
EnzymeFlow: Generating Reaction-specific Enzyme Catalytic Pockets through Flow Matching and Co-Evolutionary Dynamics
Yang Liu
Odin Zhang
Kevin K Yang
Shuangjia Zheng
Molphenix: A Multimodal Foundation Model for PhenoMolecular Retrieval
Philip Fradkin
Puria Azadi Moghadam
Karush Suri
Maciej Sypetkowski
Predicting molecular impact on cellular function is a core challenge in therapeutic design. Phenomic experiments, designed to capture cellular morphology, utilize microscopy-based techniques and offer a high-throughput solution for uncovering molecular impact on the cell. In this work, we learn a joint latent space between molecular structures and microscopy phenomic experiments, aligning paired samples with contrastive learning. Specifically, we study the problem of Contrastive PhenoMolecular Retrieval, which consists of zero-shot molecular structure identification conditioned on phenomic experiments. We assess challenges in multi-modal learning of phenomic and molecular modalities, such as experimental batch effects, inactive molecule perturbations, and encoding perturbation concentration. We demonstrate improved multi-modal retrieval through (1) a uni-modal pre-trained phenomics model, (2) a novel inter-sample similarity-aware loss, and (3) models conditioned on a representation of molecular concentration. Following this recipe, we propose MolPhenix, a molecular phenomics model. MolPhenix leverages a pre-trained phenomics model to demonstrate significant performance gains across perturbation concentrations, molecular scaffolds, and activity thresholds. In particular, we demonstrate an 8.1
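The contrastive alignment of paired samples described above can be sketched with a symmetric InfoNCE-style objective. This is an illustrative NumPy toy, not the MolPhenix implementation; the embedding dimensions and temperature are assumptions, and matching rows of the two matrices stand in for paired molecule/phenomics samples.

```python
import numpy as np

def info_nce(mol_emb, pheno_emb, temperature=0.07):
    """Symmetric InfoNCE loss over paired molecule/phenomics embeddings.

    Rows of `mol_emb` and `pheno_emb` are L2-normalized embeddings of
    paired samples; matching row indices are treated as positives.
    """
    # Cosine similarity matrix between the two modalities.
    sim = mol_emb @ pheno_emb.T / temperature
    n = sim.shape[0]
    idx = np.arange(n)

    def xent(logits):
        # Cross-entropy with the diagonal (paired sample) as the target.
        logits = logits - logits.max(axis=1, keepdims=True)
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    # Average over both retrieval directions (mol->pheno, pheno->mol).
    return 0.5 * (xent(sim) + xent(sim.T))

# Toy check: perfectly aligned pairs score a lower loss than shuffled pairs.
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
loss_aligned = info_nce(emb, emb)
loss_shuffled = info_nce(emb, np.roll(emb, 1, axis=0))
```

In a real pipeline the two embedding matrices would come from the molecular encoder and the pre-trained phenomics encoder; here they are random vectors used only to show the loss behaves as expected.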
Neurospectrum: A Geometric and Topological Deep Learning Framework for Uncovering Spatiotemporal Signatures in Neural Activity
Dhananjay Bhaskar
Yanlei Zhang
Jessica Moore
Feng Gao
Bastian Rieck
Firas Khasawneh
Elizabeth Munch
Valentina Greco
J. Adam Noah
Helen Pushkarskaya
Christopher Pittenger
Neural signals are high-dimensional, noisy, and dynamic, making it challenging to extract interpretable features linked to behavior or disease. We introduce Neurospectrum, a framework that encodes neural activity as latent trajectories shaped by spatial and temporal structure. At each timepoint, signals are represented on a graph capturing spatial relationships, with a learnable attention mechanism highlighting important regions. These are embedded using graph wavelets and passed through a manifold-regularized autoencoder that preserves temporal geometry. The resulting latent trajectory is summarized using a principled set of descriptors, including curvature, path signatures, persistent homology, and recurrent networks, that capture multiscale geometric, topological, and dynamical features. These features drive downstream prediction in a modular, interpretable, and end-to-end trainable framework. We evaluate Neurospectrum on simulated and experimental datasets. It tracks phase synchronization in Kuramoto simulations, reconstructs visual stimuli from calcium imaging, and identifies biomarkers of obsessive-compulsive disorder in fMRI. Across tasks, Neurospectrum uncovers meaningful neural dynamics and outperforms traditional analysis methods.
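One of the trajectory descriptors named above, curvature, can be approximated from a sampled latent trajectory with finite differences. This is a minimal NumPy sketch under assumed conventions (trajectory of shape timepoints × latent dimensions, unit time steps), not the paper's implementation:

```python
import numpy as np

def discrete_curvature(traj):
    """Pointwise curvature of a latent trajectory of shape (T, d).

    Estimated as |a_perp| / |v|^2: the component of acceleration
    orthogonal to velocity, over squared speed. This ratio is invariant
    to uniform rescaling of the time step.
    """
    v = np.gradient(traj, axis=0)                        # velocity
    a = np.gradient(v, axis=0)                           # acceleration
    speed = np.maximum(np.linalg.norm(v, axis=1), 1e-12)
    v_hat = v / speed[:, None]
    a_perp = a - (a * v_hat).sum(axis=1, keepdims=True) * v_hat
    return np.linalg.norm(a_perp, axis=1) / speed**2

# Sanity checks: a circle of radius r has curvature 1/r everywhere,
# and a straight line has curvature 0.
t = np.linspace(0.0, 2 * np.pi, 400)
circle = 2.0 * np.column_stack([np.cos(t), np.sin(t)])
line = np.outer(np.arange(50, dtype=float), [1.0, 2.0])
circle_k = discrete_curvature(circle)[5:-5]              # trim endpoints
line_k = discrete_curvature(line)
```

The other descriptors (path signatures, persistent homology) need dedicated libraries and are omitted here.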
Can Safety Fine-Tuning Be More Principled? Lessons Learned from Cybersecurity
David Williams-King
Adam Oberman
As LLMs develop increasingly advanced capabilities, there is an increased need to minimize the harm that certain model outputs could cause to society; hence, most LLMs have safety guardrails added, for example via fine-tuning. In this paper, we argue the position that current safety fine-tuning closely resembles the traditional cat-and-mouse game (or arms race) between attackers and defenders in cybersecurity. Model jailbreaks and attacks are patched with band-aid fixes that target the specific attack mechanism, while many similar attack vectors remain. When defenders do not proactively develop principled mechanisms, it becomes very easy for attackers to sidestep any new defenses. We show how current defenses are insufficient to prevent new adversarial jailbreak attacks, reward hacking, and loss-of-control problems. To learn from past mistakes in cybersecurity, we draw analogies with historical examples and distill lessons that can be applied to LLM safety. These arguments support the need for new, more principled approaches to designing safe models that are architected for security from the beginning. We describe several such approaches from the AI literature.
Epistemic Integrity in Large Language Models
Large language models are increasingly relied upon as sources of information, but their propensity for generating false or misleading statements with high confidence poses risks for users and society. In this paper, we confront the critical problem of epistemic miscalibration, where a model's linguistic assertiveness fails to reflect its true internal certainty. We introduce a new human-labeled dataset and a novel method for measuring the linguistic assertiveness of large language models that cuts error rates by over 50% relative to previous benchmarks. Validated across multiple datasets, our method reveals a stark misalignment between how confidently models linguistically present information and their actual accuracy. Further human evaluations confirm the severity of this miscalibration. This evidence underscores the urgent risk posed by the overstated certainty of large language models, which may mislead users on a massive scale. Our framework provides a crucial step forward in diagnosing and correcting this miscalibration, offering a path to safer and more trustworthy AI across domains.
Quantifying Likeness: A Simple Machine Learning Approach to Identifying Copyright Infringement in (AI-Generated) Artwork
Michaela Drouillard
Ryan Spencer
Nikée Nantambu-Allen
This study proposes an approach aligned with the legal process to quantify copyright infringement, via stylistic similarity, in AI-generated artwork. In contrast to typical work in this field, and more in line with a realistic legal setting, our approach quantifies the similarity of a set of potentially infringing "defendant" artworks to a set of copyrighted "plaintiff" artworks. We frame this as an image classification task, using a fine-tuned ResNet trained on small, customized datasets relevant to each use case. Softmax-normalized probabilities from the model serve as similarity scores for the potentially infringing "defendant" artworks, while saliency maps and feature visualizations complement the score by highlighting key features and allowing for interpretability. This straightforward image classification approach can be carried out in a simple, low-resource setting, making it accessible for real-world applications. We present a case study using Mickey Mouse as the plaintiff, performing thorough hyperparameter tuning and robustness analysis. Our experiments include optimizing batch size, weight decay, and learning rate, as well as exploring the impact of additional distractor classes. We employ data augmentation, cross-validation, and a linear-decay learning rate scheduler to improve model performance, and conduct scaling experiments with different types of distractor classes. The aims of this work are to illustrate the potential of the approach and to identify settings that generalize well, so that it is as "plug and play" as possible for users to apply with their own plaintiff sets of artworks.
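The scoring step described above reduces to a softmax over classifier logits. A minimal NumPy sketch; the class layout and logit values here are hypothetical, and in the real pipeline the logits would come from the fine-tuned ResNet:

```python
import numpy as np

def similarity_scores(logits, plaintiff_class=0):
    """Softmax-normalized probability assigned to the plaintiff class,
    used as a per-artwork stylistic-similarity score.

    `logits` has shape (n_artworks, n_classes): raw classifier outputs,
    with one class for the plaintiff's style and the rest distractors.
    """
    z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return probs[:, plaintiff_class]

# Hypothetical logits for two defendant artworks over three classes
# (plaintiff, distractor A, distractor B).
scores = similarity_scores(np.array([[4.0, 1.0, 0.5],
                                     [0.2, 3.0, 2.5]]))
```

The first artwork's logits favor the plaintiff class, so its score is high; the second's favor the distractors, so its score is low.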
Rejecting Hallucinated State Targets during Planning
Mingde “Harry” Zhao
Romain Laroche
In the planning processes of computational decision-making agents, generative or predictive models are often used as "generators" to propose "targets" representing sets of expected or desirable states. Unfortunately, learned models inevitably hallucinate infeasible targets that can cause delusional behaviors and safety concerns. We first investigate the kinds of infeasible targets that generators can hallucinate. Then, we devise a strategy to identify and reject infeasible targets by learning a target feasibility evaluator. To ensure that the evaluator is robust and non-delusional, we adopt a design combining an off-policy-compatible learning rule, a distributional architecture, and data augmentation based on hindsight relabeling. Attached to a planning agent, the evaluator learns by observing the agent's interactions with the environment and the targets produced by its generator, without the need to change the agent or its generator. Our controlled experiments show significant reductions in delusional behaviors and performance improvements for various kinds of existing agents.
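The rejection step can be sketched as thresholding a learned feasibility score before planning. Everything in this NumPy toy (the evaluator function, the threshold, and the target encoding) is a hypothetical stand-in for the paper's learned distributional evaluator:

```python
import numpy as np

def reject_infeasible(targets, feasibility, threshold=0.5):
    """Keep only generator-proposed targets whose estimated feasibility
    (probability the agent can actually reach them) clears `threshold`."""
    scores = np.array([feasibility(t) for t in targets])
    kept = [t for t, s in zip(targets, scores) if s >= threshold]
    return kept, scores

# Stand-in evaluator: pretend targets near the agent's current state are
# reachable with high confidence, distant ones are not.
agent_state = np.zeros(2)
feasibility = lambda t: float(np.exp(-np.linalg.norm(t - agent_state) / 3))
proposed = [np.array([1.0, 0.0]), np.array([10.0, 10.0])]
kept, scores = reject_infeasible(proposed, feasibility)
```

In the paper's setting the evaluator is learned from the agent's own experience rather than hand-coded; this sketch only shows where the filter sits between the generator and the planner.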
Simulation System Towards Solving Societal-Scale Manipulation
Austin Welch
Gayatri K
Dan Zhao
Hao Yu
Tom Gibbs
Ethan Kosak-Hine
Busra Tugce Gurbuz
The rise of AI-driven manipulation poses significant risks to societal trust and democratic processes. Yet studying these effects in real-world settings at scale is ethically and logistically impractical, highlighting a need for simulation tools that can model these dynamics in controlled settings and enable experimentation with possible defenses. We present a simulation environment designed to address this. We build upon the Concordia framework, which simulates offline, 'real life' activity, by adding online social media interactions through the integration of a Mastodon server. We then improve simulation efficiency and information flow, and add a set of measurement tools, particularly longitudinal surveys of the agents' political positions. We demonstrate the simulator with a tailored example of how partisan manipulation of agents can affect election results.
The Structural Safety Generalization Problem
Tom Gibbs
Julius Broomfield
George Ingebretsen
Ethan Kosak-Hine
Tia Nasir
Jason Zhang
Reihaneh Iranmanesh
Sara Pieri
It is widely known that AI is vulnerable to adversarial examples, from pixel perturbations to jailbreaks. We propose that there is a key, easier class of problems that also remains unsolved: failures of safety to generalize over structure, despite semantic equivalence. We demonstrate this vulnerability by showing how recent AI systems are differently vulnerable to multi-turn and multi-image attacks than to their single-turn and single-image counterparts with equivalent meaning. We suggest this is the same class of vulnerability as that found in as-yet unconnected threads of the literature: vulnerabilities to low-resource languages and the indefensibility of strongly superhuman Go AIs to cyclic attacks. Viewed together, these reveal a common picture: models that are not only vulnerable to attacks, but vulnerable to attacks whose benign and harmful components both carry near-identical meaning and differ only in structure. In contrast to attacks with identical benign input (e.g., pictures that look like cats) but unknown semanticity of the harmful component (e.g., diverse noise that is unintelligible to humans), these represent a class of attacks where semantic understanding of, and defense against, one version should guarantee defense against the others; yet current AI safety measures do not provide this. Defending against this vulnerability is a necessary but not sufficient condition for defending against attacks whose harmful component has arbitrary semanticity. Consequently, by building on the data and approaches we highlight, we frame an intermediate problem for AI safety to solve, one that represents a critical checkpoint towards safe AI while being far more tractable than trying to solve the general problem directly and universally.
Unlearning in- vs. out-of-distribution data in LLMs under gradient-based methods
Teodora Baluta
Pascal Lamblin
Daniel Tarlow
Fabian Pedregosa
Machine unlearning aims to remove the influence of selected training examples from a learned model. Despite increasing attention to this problem, it remains an open research question how to evaluate unlearning in large language models (LLMs), and which properties of the data to be unlearned critically affect the quality and efficiency of unlearning. This work formalizes a metric to evaluate unlearning quality in generative models, and uses it to assess the trade-offs between unlearning quality and performance. We demonstrate that unlearning out-of-distribution examples requires more unlearning steps but overall presents a better trade-off. For in-distribution examples, however, we observe a rapid decay in performance as unlearning progresses. We further evaluate how an example's memorization and difficulty affect unlearning under a classical gradient-ascent-based approach.
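The classical gradient-ascent baseline mentioned above can be illustrated on a toy logistic-regression model rather than an LLM. In this NumPy sketch (data, learning rates, and step counts are all illustrative) unlearning means taking steps *up* the loss gradient on a forget set, which drives the forget-set loss back up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unlearn_step(w, X_forget, y_forget, lr=0.1):
    """One gradient-ascent unlearning step: move the weights up the
    log-loss gradient on the forget set, undoing its influence."""
    p = sigmoid(X_forget @ w)
    grad = X_forget.T @ (p - y_forget) / len(y_forget)  # dL/dw for log-loss
    return w + lr * grad                                # ascent, not descent

def log_loss(w, X, y):
    p = np.clip(sigmoid(X @ w), 1e-9, 1 - 1e-9)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

# Toy demo: train on synthetic data, then ascend on a forget subset.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = (sigmoid(X @ w_true) > 0.5).astype(float)

w = np.zeros(5)
for _ in range(200):                # standard gradient-descent training
    p = sigmoid(X @ w)
    w -= 0.5 * X.T @ (p - y) / len(y)

X_f, y_f = X[:10], y[:10]           # forget set (subset of training data)
before = log_loss(w, X_f, y_f)
for _ in range(20):                 # unlearning via gradient ascent
    w = unlearn_step(w, X_f, y_f)
after = log_loss(w, X_f, y_f)
```

The paper's question of how in- versus out-of-distribution forget sets trade off quality against retained performance sits on top of this basic mechanism; here only the mechanism itself is shown.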