Publications

Principal Curvatures Estimation with Applications to Single Cell Data

Yanlei Zhang

Lydia Mezrag

Xingzhi Sun

Charles Xu

Kincaid MacDonald

Dhananjay Bhaskar

Smita Krishnaswamy

Guy Wolf

Bastian Rieck

The rapidly growing field of single-cell transcriptomic sequencing (scRNAseq) presents challenges for data analysis due to its massive datas… (see more)ets. A common method in manifold learning consists in hypothesizing that datasets lie on a lower dimensional manifold. This allows to study the geometry of point clouds by extracting meaningful descriptors like curvature. In this work, we will present Adaptive Local PCA (AdaL-PCA), a data-driven method for accurately estimating various notions of intrinsic curvature on data manifolds, in particular principal curvatures for surfaces. The model relies on local PCA to estimate the tangent spaces. The evaluation of AdaL-PCA on sampled surfaces shows state-of-the-art results. Combined with a PHATE embedding, the model applied to single-cell RNA sequencing data allows us to identify key variations in the cellular differentiation.

2025-02-06

ArXiv (preprint)

arxiv.org

Principal Curvatures Estimation with Applications to Single Cell Data

Yanlei Zhang

Lydia Mezrag

Xingzhi Sun

Charles Xu

Kincaid MacDonald

Dhananjay Bhaskar

Smita Krishnaswamy

Guy Wolf

Bastian Rieck

2025-02-06

ArXiv (preprint)

doi.org

arxiv.org

Tackling the Problem of Distributional Shifts: Correcting Misspecified, High-Dimensional Data-Driven Priors for Inverse Problems

Gabriel Missael Barco

Alexandre Adam

Connor Stone

Yashar Hezaveh

Laurence Perreault-Levasseur

Bayesian inference for inverse problems hinges critically on the choice of priors. In the absence of specific prior information, population-… (see more)level distributions can serve as effective priors for parameters of interest. With the advent of machine learning, the use of data-driven population-level distributions (encoded, e.g., in a trained deep neural network) as priors is emerging as an appealing alternative to simple parametric priors in a variety of inverse problems. However, in many astrophysical applications, it is often difficult or even impossible to acquire independent and identically distributed samples from the underlying data-generating process of interest to train these models. In these cases, corrupted data or a surrogate, e.g. a simulator, is often used to produce training samples, meaning that there is a risk of obtaining misspecified priors. This, in turn, can bias the inferred posteriors in ways that are difficult to quantify, which limits the potential applicability of these models in real-world scenarios. In this work, we propose addressing this issue by iteratively updating the population-level distributions by retraining the model with posterior samples from different sets of observations and showcase the potential of this method on the problem of background image reconstruction in strong gravitational lensing when score-based models are used as data-driven priors. We show that starting from a misspecified prior distribution, the updated distribution becomes progressively closer to the underlying population-level distribution, and the resulting posterior samples exhibit reduced bias after several updates.

2025-02-06

The Astrophysical Journal (published)

doi.org

arxiv.org

Alignment of auditory artificial networks with massive individual fMRI brain data leads to generalizable improvements in brain encoding and downstream tasks

Maelle Freteault

Loic Tetrel

Maximilien Le Clei

Lune Bellec

Nicolas Farrugia

Artificial neural networks trained in the field of artificial intelligence (AI) have emerged as key tools to model brain processes, sparking… (see more) the idea of aligning network representations with brain dynamics to enhance performance on AI tasks. While this concept has gained support in the visual domain, we investigate here the feasibility of creating auditory artificial neural models directly aligned with individual brain activity. This objective raises major computational challenges, as models have to be trained directly with brain data, which is typically collected at a much smaller scale than data used to train AI models. We aimed to answer two key questions: (1) Can brain alignment of auditory models lead to improved brain encoding for novel, previously unseen stimuli? (2) Can brain alignment lead to generalisable representations of auditory signals that are useful for solving a variety of complex auditory tasks? To answer these questions, we relied on two massive datasets: a deep phenotyping dataset from the Courtois neuronal modelling project, where six subjects watched four seasons (36 hours) of the Friends TV series in functional magnetic resonance imaging and the HEAR benchmark, a large battery of downstream auditory tasks. We fine-tuned SoundNet, a small pretrained convolutional neural network with ∼2.5M parameters. Aligning SoundNet with brain data from three seasons of Friends led to substantial improvement in brain encoding in the fourth season, extending beyond auditory and visual cortices. We also observed consistent performance gains on the HEAR benchmark, particularly for tasks with limited training data, where brain-aligned models performed comparably to the best-performing models regardless of size. We finally compared individual and group models, finding that individual models often matched or outperformed group models in both brain encoding and downstream task performance, highlighting the data efficiency of fine-tuning with individual brain data. Our results demonstrate the feasibility of aligning artificial neural network representations with individual brain activity during auditory processing, and suggest that this alignment is particularly beneficial for tasks with limited training data. Future research is needed to establish whether larger models can achieve even better performance and whether the observed gains extend to other tasks, particularly in the context of few shot learning.

2025-02-04

bioRxiv (preprint)

doi.org

PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling

Avery Ma

Yangchen Pan

Amir-massoud Farahmand

Many-shot jailbreaking circumvents the safety alignment of large language models by exploiting their ability to process long input sequences… (see more). To achieve this, the malicious target prompt is prefixed with hundreds of fabricated conversational turns between the user and the model. These fabricated exchanges are randomly sampled from a pool of malicious questions and responses, making it appear as though the model has already complied with harmful instructions. In this paper, we present PANDAS: a hybrid technique that improves many-shot jailbreaking by modifying these fabricated dialogues with positive affirmations, negative demonstrations, and an optimized adaptive sampling method tailored to the target prompt's topic. Extensive experiments on AdvBench and HarmBench, using state-of-the-art LLMs, demonstrate that PANDAS significantly outperforms baseline methods in long-context scenarios. Through an attention analysis, we provide insights on how long-context vulnerabilities are exploited and show how PANDAS further improves upon many-shot jailbreaking.

2025-02-04

ArXiv (preprint)

arxiv.org

Bridging Causality, Individual Fairness, and Adversarial Robustness in the Absence of Structural Causal Model

Ahmad Reza Ehyaei

Golnoosh Farnadi

Samira Samadi

Despite the essential need for comprehensive considerations in responsible AI, factors such as robustness, fairness, and causality are often… (see more) studied in isolation. Adversarial perturbation, used to identify vulnerabilities in models, and individual fairness, aiming for equitable treatment of similar individuals, despite initial differences, both depend on metrics to generate comparable input data instances. Previous attempts to define such joint metrics often lack general assumptions about data and were unable to reflect counterfactual proximity. To address this, our paper introduces a \emph{causal fair metric} formulated based on causal structures encompassing sensitive attributes and protected causal perturbation. To enhance the practicality of our metric, we propose metric learning as a method for metric estimation and deployment in real-world problems in the absence of structural causal models. We also demonstrate the applications of the causal fair metric in classifiers. Empirical evaluation of real-world and synthetic datasets illustrates the effectiveness of our proposed metric in achieving an accurate classifier with fairness, resilience to adversarial perturbations, and a nuanced understanding of causal relationships.

2025-02-03

TMLR (accepted)

openreview.net

Adapting Perioperative Care for Neurodivergent Children - A Scoping Review

Spandana Veeravalli

Maia Michaud

Judy Colton

Brenda Bourdeau

Samantha Sacks

Lindsay Hales

Elena Guadagno

Dan Poenaru

2025-02-01

Journal of Pediatric Surgery (published)

doi.org

AIoT Smart Home via Autonomous LLM Agents

Dmitriy Rivkin

Francois Hogan

Amal Feriani

Abhisek Konar

Adam Sigal

Xue (Steve) Liu

Gregory Dudek

The common-sense reasoning abilities and vast general knowledge of large language models (LLMs) make them a natural fit for interpreting use… (see more)r requests in a smart home assistant context. LLMs, however, lack specific knowledge about the user and their home, which limits their potential impact. Smart home agent with grounded execution (SAGE), overcomes these and other limitations by using a scheme in which a user request triggers an LLM-controlled sequence of discrete actions. These actions can be used to retrieve information, interact with the user, or manipulate device states. SAGE controls this process through a dynamically constructed tree of LLM prompts, which help it decide which action to take next, whether an action was successful, and when to terminate the process. The SAGE action set augments an LLM’s capabilities to support some of the most critical requirements for a smart home assistant. These include: flexible and scalable user preference management (“Is my team playing tonight?”), access to any smart device’s full functionality without device-specific code via API reading (“Turn down the screen brightness on my dryer”), persistent device state monitoring (“Remind me to throw out the milk when I open the fridge”), natural device references using only a photo of the room (“Turn on the lamp on the dresser”), and more. We introduce a benchmark of 50 new and challenging smart home tasks where SAGE achieves a 76% success rate, significantly outperforming existing LLM-enabled baselines (30% success rate).

2025-02-01

IEEE Internet of Things Journal (published)

doi.org

AIoT Smart Home via Autonomous LLM Agents

Dmitriy Rivkin

Francois Hogan

Amal Feriani

Abhisek Konar

Adam Sigal

Xue (Steve) Liu

Gregory Dudek

2025-02-01