Publications

MATES: a deep learning-based model for locus-specific quantification of transposable elements in single cell
Ruohan Wang
Yumin Zheng
Zijian Zhang
Kailu Song
Erxi Wu
Xiaopeng Zhu
Tao P. Wu
Transposable elements (TEs) are crucial for genetic diversity and gene regulation. Current single-cell quantification methods often align mu… (see more)lti-mapping reads to either ‘best-mapped’ or ‘random-mapped’ locations and categorize them at the subfamily levels, overlooking the biological necessity for accurate, locus-specific TE quantification. Moreover, these existing methods are primarily designed for and focused on transcriptomics data, which restricts their adaptability to single-cell data of other modalities. To address these challenges, here we introduce MATES, a deep-learning approach that accurately allocates multi-mapping reads to specific loci of TEs, utilizing context from adjacent read alignments flanking the TE locus. When applied to diverse single-cell omics datasets, MATES shows improved performance over existing methods, enhancing the accuracy of TE quantification and aiding in the identification of marker TEs for identified cell populations. This development facilitates the exploration of single-cell heterogeneity and gene regulation through the lens of TEs, offering an effective transposon quantification tool for the single-cell genomics community.
Physical Simulation for Multi-agent Multi-machine Tending
Abdalwhab Abdalwhab
David St-Onge
PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning
Tingchen Fu
Mrinank Sharma
Philip Torr
Yonadav G Shavit
Shay B. Cohen
David M. Krueger
Fazl Barez
Preference learning is a central component for aligning current LLMs, but this process can be vulnerable to data poisoning attacks. To addre… (see more)ss this concern, we introduce PoisonBench, a benchmark for evaluating large language models' susceptibility to data poisoning during preference learning. Data poisoning attacks can manipulate large language model responses to include hidden malicious content or biases, potentially causing the model to generate harmful or unintended outputs while appearing to function normally. We deploy two distinct attack types across eight realistic scenarios, assessing 21 widely-used models. Our findings reveal concerning trends: (1) Scaling up parameter size does not inherently enhance resilience against poisoning attacks; (2) There exists a log-linear relationship between the effects of the attack and the data poison ratio; (3) The effect of data poisoning can generalize to extrapolated triggers that are not included in the poisoned data. These results expose weaknesses in current preference learning techniques, highlighting the urgent need for more robust defenses against malicious models and data manipulation.
"I Am the One and Only, Your Cyber BFF": Understanding the Impact of GenAI Requires Understanding the Impact of Anthropomorphic AI
Myra Cheng
Alicia DeVrio
Lisa Egede
A.R. Olteanu
Many state-of-the-art generative AI (GenAI) systems are increasingly prone to anthropomorphic behaviors, i.e., to generating outputs that ar… (see more)e perceived to be human-like. While this has led to scholars increasingly raising concerns about possible negative impacts such anthropomorphic AI systems can give rise to, anthropomorphism in AI development, deployment, and use remains vastly overlooked, understudied, and underspecified. In this perspective, we argue that we cannot thoroughly map the social impacts of generative AI without mapping the social impacts of anthropomorphic AI, and outline a call to action.
AgentMerge: Enhancing Generalization in Fine-Tuned LLM Agents
Maxime Gasse
Alexandre Lacoste
Massimo Caccia
Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs
Yash More
Matthew D Riemer
Pin-Yu Chen
Payel Das
A. Chandar
Chandar Research Lab
Mila - Québec
U. Montŕeal
AI Institute
Ibm Research
Polytechnique Montréal
There is a growing interest in training domain-expert LLMs that excel in specific technical fields compared to their general-purpose instruc… (see more)tion-tuned counterparts. However, these expert models often experience a loss in their safety abilities in the process, making them capable of generating harmful content. As a solution, we introduce an efficient and effective merging-based alignment method called \textsc{MergeAlign} that interpolates the domain and alignment vectors, creating safer domain-specific models while preserving their utility. We apply \textsc{MergeAlign} on Llama3 variants that are experts in medicine and finance, obtaining substantial alignment improvements with minimal to no degradation on domain-specific benchmarks. We study the impact of model merging through model similarity metrics and contributions of individual models being merged. We hope our findings open new research avenues and inspire more efficient development of safe expert LLMs.
Consistency-diversity-realism Pareto fronts of conditional image generative models
Pietro Astolfi
Marlene Careil
Melissa Hall
Matthew J. Muckley
Jakob Verbeek
Adriana Romero
Building world models that accurately and comprehensively represent the real world is the utmost aspiration for conditional image generative… (see more) models as it would enable their use as world simulators. For these models to be successful world models, they should not only excel at image quality and prompt-image consistency but also ensure high representation diversity. However, current research in generative models mostly focuses on creative applications that are predominantly concerned with human preferences of image quality and aesthetics. We note that generative models have inference time mechanisms - or knobs - that allow the control of generation consistency, quality, and diversity. In this paper, we use state-of-the-art text-to-image and image-and-text-to-image models and their knobs to draw consistency-diversity-realism Pareto fronts that provide a holistic view on consistency-diversity-realism multi-objective. Our experiments suggest that realism and consistency can both be improved simultaneously; however there exists a clear tradeoff between realism/consistency and diversity. By looking at Pareto optimal points, we note that earlier models are better at representation diversity and worse in consistency/realism, and more recent models excel in consistency/realism while decreasing significantly the representation diversity. By computing Pareto fronts on a geodiverse dataset, we find that the first version of latent diffusion models tends to perform better than more recent models in all axes of evaluation, and there exist pronounced consistency-diversity-realism disparities between geographical regions. Overall, our analysis clearly shows that there is no best model and the choice of model should be determined by the downstream application. With this analysis, we invite the research community to consider Pareto fronts as an analytical tool to measure progress towards world models.
Controlling Forgetting with Test-Time Data in Continual Learning
Foundational vision-language models excel in various tasks but require updates as new tasks or domains emerge. Current Continual Learning (C… (see more)L) methods, which focus on supervised training, often suffer from significant forgetting, performing worse than the original models in zero-shot scenarios. This work proposes leveraging test-time, unsupervised data in a self-supervised manner to refresh the model’s memory of previously learned tasks, minimizing forgetting without additional labeling. By introducing a student-teacher framework with gradient-based sparse parameter updates, the approach enhances performance on prior tasks and reduces reliance on offline memory buffers, effectively improving continual learning outcomes.
Controlling Multimodal LLMs via Reward-guided Decoding
Deliberate Practice with Synthetic Data
Pietro Astolfi
Melissa Hall
Jakob Verbeek
Adriana Romero
Enhancing Multi-Agent Multi-Modal Collaboration with Fine-Grained Reward Modeling
Multi-Modal Large Language Models (MLLMs) have significantly advanced multi-modal reasoning but still struggle with compositional reasoning … (see more)tasks. Multi-agent collaboration provides a promising solution by leveraging the distinct capabilities of different agents. Specifically, a decomposer agent to handle task breakdown and an answerer agent to generate responses. While there have been efforts to adaptively decompose tasks based on the answerer agent's capabilities, such as using in-context learning, these methods often prove insufficient for fully effective decomposition. We address this issue by enhancing collaboration through fine-grained reward modeling, where each generated sub-question is assigned a specialized reward without requiring extra annotation or tuning of a reward model. Our proposed method dynamically optimizes the decomposition process, enabling better alignment between agents. Experimental results on four vision-language tasks demonstrate consistent improvements, with a 5.5\% absolute increase in mean performance over traditional approaches. These findings highlight the efficacy of fine-grained reward modeling for enhancing multi-agent, multi-modal collaboration.
Evaluating Interventional Reasoning Capabilities of Large Language Models
Numerous decision-making tasks require estimating causal effects under interventions on different parts of a system. As practitioners consid… (see more)er using large language models (LLMs) to automate decisions, studying their causal reasoning capabilities becomes crucial. A recent line of work evaluates LLMs ability to retrieve commonsense causal facts, but these evaluations do not sufficiently assess how LLMs reason about interventions. Motivated by the role that interventions play in causal inference, in this paper, we conduct empirical analyses to evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention. We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types, and enable a study of intervention-based reasoning. These benchmarks allow us to isolate the ability of LLMs to accurately predict changes resulting from their ability to memorize facts or find other shortcuts. We evaluate six LLMs on the benchmarks, finding that GPT models show promising accuracy at predicting the intervention effects.