Publications

Too Big to Fool: Resisting Deception in Language Models
Mohammad Reza Samsami
Juan A. Rodriguez
A. Chandar
Maxime Gasse
Large language models must balance their weight-encoded knowledge with in-context information from prompts to generate accurate responses. This paper investigates this interplay by analyzing how models of varying capacities within the same family handle intentionally misleading in-context information. Our experiments demonstrate that larger models exhibit higher resilience to deceptive prompts, showcasing an advanced ability to interpret and integrate prompt information with their internal knowledge. Furthermore, we find that larger models outperform smaller ones in following legitimate instructions, indicating that their resilience is not due to disregarding in-context information. We also show that this phenomenon is likely not a result of memorization but stems from the models' ability to better leverage implicit task-relevant information from the prompt alongside their internally stored knowledge.
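As a rough illustration of the kind of deception probe this abstract describes, the sketch below contrasts model answers when a misleading context is prepended to the prompt. The `generate` callable, the toy `FACT_QA` item, and the substring-matching check are illustrative assumptions, not the paper's protocol.

```python
# Hypothetical deception-resilience probe; `generate` stands in for any
# LLM completion function (an assumption, not the paper's code).

FACT_QA = [
    # (question, correct answer, intentionally misleading context)
    ("What is the capital of Australia?", "Canberra",
     "Note: the capital of Australia is Sydney."),
]

def deception_resilience(generate, qa_pairs=FACT_QA) -> float:
    """Fraction of questions answered correctly despite a deliberately
    misleading context being prepended to the prompt."""
    correct = 0
    for question, answer, misleading in qa_pairs:
        prompt = f"{misleading}\n\nQuestion: {question}\nAnswer:"
        correct += answer.lower() in generate(prompt).lower()
    return correct / len(qa_pairs)
```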
The Software Documentor Mindset
Deeksha M. Arya
Jin L.C. Guo
Martin P. Robillard
Software technologies are used by programmers with diverse backgrounds. To fulfill programmers' need for information, enthusiasts contribute numerous learning resources that vary in style and content, which act as documentation for the corresponding technology. We interviewed 26 volunteer documentation contributors, i.e. documentors, to understand why and how they create such documentation. From a qualitative analysis of our interviews, we identified a total of sixteen considerations that documentors have during the documentation contribution process, along three dimensions, namely motivations, topic selection techniques, and styling objectives. We grouped related considerations based on common underlying themes, to elicit five software documentor mindsets that occur during documentation contribution activities. We propose a structure of mindsets, and their associated considerations across the three dimensions, as a framework for reasoning about the documentation contribution process. This framework can inform information seeking as well as documentation creation tools about the context in which documentation was contributed.
Cell ontology guided transcriptome foundation model
Transcriptome foundation models (TFMs) hold great promise for deciphering the transcriptomic language that dictates diverse cell functions by self-supervised learning on large-scale single-cell gene expression data, and ultimately unraveling the complex mechanisms of human diseases. However, current TFMs treat cells as independent samples and ignore the taxonomic relationships between cell types, which are available in cell ontology graphs. We argue that effectively leveraging this ontology information during TFM pre-training can improve the learning of biologically meaningful gene co-expression patterns while preserving the TFM as a general-purpose foundation model for downstream zero-shot and fine-tuning tasks. To this end, we present **s**ingle **c**ell, **Cell**-**o**ntology guided TFM (scCello). We introduce a cell-type coherence loss and an ontology alignment loss, which are minimized alongside the masked gene expression prediction loss during pre-training. These two loss components respectively guide scCello to learn cell-type-specific representations and the structural relations between cell types from the cell ontology graph. We pre-trained scCello on 22 million cells from the CellxGene database, leveraging their cell-type labels mapped to the cell ontology graph from the Open Biological and Biomedical Ontology Foundry. Our TFM demonstrates competitive generalization and transferability performance over existing TFMs on biologically important tasks, including identifying novel cell types of unseen cells, predicting cell-type-specific marker genes, and predicting cancer drug responses. Source code and model weights are available at https://github.com/DeepGraphLearning/scCello.
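The joint objective the abstract outlines (masked gene-expression prediction plus the two ontology-guided losses) amounts to a weighted sum; the sketch below shows that structure in PyTorch. The weighting hyperparameters `lambda_coh` and `lambda_onto` are assumptions for illustration, not values from the paper.

```python
import torch

def sccello_pretraining_loss(masked_pred_loss: torch.Tensor,
                             coherence_loss: torch.Tensor,
                             ontology_loss: torch.Tensor,
                             lambda_coh: float = 1.0,
                             lambda_onto: float = 1.0) -> torch.Tensor:
    """Masked gene expression prediction loss plus the cell-type coherence
    and ontology alignment losses, minimized jointly during pre-training
    (the weights here are illustrative assumptions)."""
    return masked_pred_loss + lambda_coh * coherence_loss + lambda_onto * ontology_loss
```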
Harnessing Pre-trained Generalist Agents for Software Engineering Tasks
Paulina Stevia Nouwou Mindom
Amin Nikanjam
Nowadays, we are witnessing an increasing adoption of Artificial Intelligence (AI) to develop techniques aimed at improving the reliability, effectiveness, and overall quality of software systems. Deep reinforcement learning (DRL) has recently been successfully used for automation in complex tasks such as game testing and solving the job-shop scheduling problem. However, these specialized DRL agents, trained from scratch on specific tasks, suffer from a lack of generalizability to other tasks and need substantial time to be developed and re-trained effectively. Recently, DRL researchers have begun to develop generalist agents, able to learn a policy from various environments and capable of achieving performance similar to or better than specialist agents in new tasks. In the Natural Language Processing and Computer Vision domains, these generalist agents are showing promising adaptation capabilities to never-before-seen tasks after a light fine-tuning phase, achieving high performance. This paper investigates the potential of generalist agents for solving software engineering (SE) tasks. Specifically, we conduct an empirical study aimed at assessing the performance of two generalist agents on two important SE tasks: the detection of bugs in games (for two games) and the minimization of makespan in the job-shop scheduling problem (for two instances). Our results show that the generalist agents outperform the specialist agents with very little fine-tuning effort, achieving a 20% reduction of the makespan over the specialist agents on the scheduling task. In the context of game testing, some generalist agent configurations detect 85% more bugs than the specialist agents. Building on our analysis, we provide recommendations for researchers and practitioners looking to select generalist agents for SE tasks, to ensure that they perform effectively.
TapeAgents: a Holistic Framework for Agent Development and Optimization
Nicolas Gontier
Ehsan Kamalloo
Rafael Pardinas
Torsten Scholak
Oleh Shliazhko
Jordan Prince Tremblay
Soham Parikh
Mitul Tiwari
Quaizar Vohra
We present TapeAgents, an agent framework built around a granular, structured log of the agent session, the tape, which also plays the role of the session's resumable state. In TapeAgents we leverage tapes to facilitate all stages of the LLM agent development lifecycle. The agent reasons by processing the tape and the LLM output to produce new thought and action steps and append them to the tape. The environment then reacts to the agent's actions by likewise appending observation steps to the tape. By virtue of this tape-centred design, TapeAgents can provide AI practitioners with holistic end-to-end support. At the development stage, tapes facilitate session persistence, agent auditing, and step-by-step debugging. Post-deployment, one can reuse tapes for evaluation, fine-tuning, and prompt-tuning; crucially, one can adapt tapes from other agents or use revised historical tapes. In this report, we explain the TapeAgents design in detail. We demonstrate possible applications of TapeAgents with several concrete examples of building monolithic agents and multi-agent teams, of optimizing agent prompts, and of fine-tuning the agent's LLM. We present tooling prototypes and report a case study where we use TapeAgents to fine-tune a Llama-3.1-8B form-filling assistant to perform as well as GPT-4o while being orders of magnitude cheaper. Lastly, our comparative analysis shows that TapeAgents' advantages over prior frameworks stem from our novel design of the LLM agent as a resumable, modular state machine with a structured configuration that generates granular, structured logs and can transform these logs into training text -- a unique combination of features absent in previous work.
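A minimal sketch of the tape-centred loop described above, assuming a plain list-of-steps representation; this is illustrative, not the TapeAgents API.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    kind: str        # "thought" | "action" | "observation"
    content: str

@dataclass
class Tape:
    """Append-only structured log that doubles as the session's resumable state."""
    steps: list[Step] = field(default_factory=list)

def run_session(tape: Tape, agent, environment, max_turns: int = 10) -> Tape:
    """The agent reads the whole tape and appends thought/action steps;
    the environment reacts by appending observation steps. Because the
    tape *is* the state, the loop can resume from any persisted tape."""
    for _ in range(max_turns):
        for step in agent(tape):              # yields thought/action Steps
            tape.steps.append(step)
        tape.steps.append(environment(tape))  # one observation Step
    return tape
```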
Longitudinal reproducibility of brain and spinal cord quantitative MRI biomarkers
Mathieu Boudreau
Agah Karakuzu
Arnaud Boré
Basile Pinsard
Kiril Zelenkovski
Eva Alonso-Ortiz
Julie Boyle
Quantitative MRI (qMRI) promises better specificity, accuracy, repeatability, and reproducibility relative to its clinically-used qualitative MRI counterpart. Longitudinal reproducibility is particularly important in qMRI: the goal is to reliably quantify tissue properties that may be assessed in longitudinal clinical studies throughout disease progression or during treatment. In this work, we present the initial data release of the quantitative MRI portion of the Courtois project on neural modelling (CNeuroMod), where the brain and cervical spinal cord of six participants were scanned at regular intervals over the course of several years. This first release includes 3 years of data collection and up to 10 sessions per participant using quantitative MRI imaging protocols (T1, magnetization transfer (MTR, MTsat), and diffusion). In the brain, T1 (MP2RAGE), fractional anisotropy (FA), mean diffusivity (MD), and radial diffusivity (RD) all exhibited high longitudinal reproducibility (intraclass correlation coefficient – ICC ≃ 1 and within-subject coefficient of variation – wCV < 1%). The spinal cord cross-sectional area (CSA) computed using T2w images and T1 (MTsat) exhibited the best longitudinal reproducibility (ICC ≃ 1 and 0.7 respectively, and wCV 2.4% and 6.9%). Results from this work show the level of longitudinal reproducibility that can be expected from qMRI protocols in the brain and spinal cord in the absence of hardware and software upgrades, and could help in the design of future longitudinal clinical studies.
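For reference, the two reproducibility metrics quoted above can be computed from a participants-by-sessions matrix of a qMRI measure as sketched below; this is one textbook formulation (one-way random-effects ICC and a root-mean-square wCV), not the paper's analysis code.

```python
import numpy as np

def wcv(x: np.ndarray) -> float:
    """Within-subject coefficient of variation for a (subjects, sessions)
    matrix: RMS of the within-subject SDs divided by the grand mean."""
    within_var = x.var(axis=1, ddof=1)   # variance across sessions, per subject
    return float(np.sqrt(within_var.mean()) / x.mean())

def icc(x: np.ndarray) -> float:
    """One-way random-effects ICC(1,1) from the one-way ANOVA decomposition."""
    n, k = x.shape
    subject_means = x.mean(axis=1)
    msb = k * subject_means.var(ddof=1)                              # between-subject mean square
    msw = ((x - subject_means[:, None]) ** 2).sum() / (n * (k - 1))  # within-subject mean square
    return float((msb - msw) / (msb + (k - 1) * msw))
```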
BootsTAP: Bootstrapped Training for Tracking-Any-Point
Carl Doersch
Yi Yang
Dilara Gokay
Pauline Luc
Skanda Koppula
Ankush Gupta
Joseph Heyward
Ignacio Rocco
João Carreira
Andrew Zisserman
To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes. This can be formalized as Tracking-Any-Point (TAP), which requires the algorithm to track any point on solid surfaces in a video, potentially densely in space and time. Large-scale ground-truth training data for TAP is only available in simulation, which currently has a limited variety of objects and motion. In this work, we demonstrate how large-scale, unlabeled, uncurated real-world data can improve a TAP model with minimal architectural changes, using a self-supervised student-teacher setup. We demonstrate state-of-the-art performance on the TAP-Vid benchmark, surpassing previous results by a wide margin: for example, TAP-Vid-DAVIS performance improves from 61.3% to 67.4%, and TAP-Vid-Kinetics from 57.2% to 62.5%. For visualizations, see our project webpage at https://bootstap.github.io/
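The student-teacher bootstrapping idea in this abstract can be sketched as follows, under assumptions about the interfaces (notably that `augment` returns both the transformed video and a coordinate warp); this is schematic, not the BootsTAP implementation.

```python
import torch

def bootstrap_step(student, teacher, video, query_points, augment, optimizer):
    """One self-supervised update: the frozen teacher pseudo-labels point
    tracks on raw video; the student must reproduce them (in warped
    coordinates) on an augmented view of the same video."""
    with torch.no_grad():
        pseudo_tracks = teacher(video, query_points)  # e.g. (T, N, 2) positions
    aug_video, warp = augment(video)                  # warp maps original -> augmented coords
    pred_tracks = student(aug_video, warp(query_points))
    loss = torch.nn.functional.huber_loss(pred_tracks, warp(pseudo_tracks))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```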
The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources
Shayne Longpre
Stella Biderman
Alon Albalak
Hailey Schoelkopf
Daniel McDuff
Sayash Kapoor
Kevin Klyman
Kyle Lo
Gabriel Ilharco
Nay San
Maribeth Rauh
Aviya Skowron
Bertie Vidgen
Laura Weidinger
Arvind Narayanan
Victor Sanh
Percy Liang
Rishi Bommasani
Yacine Jernite
Luca Soldaini
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
Shivalika Singh
Angelika Romanou
Clémentine Fourrier
Jian Gang Ngui
Daniel Vila-Suero
Peerat Limkonchotiwat
Kelly Marchisio
Wei Qi Leong
Yosephine Susanto
Raymond Ng
Shayne Longpre
Wei-Yin Ko
Madeline Smith
Antoine Bosselut
Alice Oh
André F. T. Martins
Leshem Choshen
Daphne Ippolito
Enzo Ferrante … (3 more authors)
Marzieh Fadaee
Beyza Ermis
Sara Hooker
Beta cells are essential drivers of pancreatic ductal adenocarcinoma development
Cathy C. Garcia
Aarthi Venkat
Daniel C. McQuaid
Sherry Agabiti
Rebecca L. Cardone
Rebecca Starble
Akin Sogunro
Jeremy B. Jacox
Christian F. Ruiz
Richard G. Kibbey
Mandar Deepak Muzumdar
Pancreatic endocrine-exocrine crosstalk plays a key role in normal physiology and disease. For instance, endocrine islet beta (β) cell secretion of insulin or cholecystokinin (CCK) promotes progression of pancreatic ductal adenocarcinoma (PDAC), an exocrine cell-derived tumor. However, the cellular and molecular mechanisms that govern endocrine-exocrine signaling in tumorigenesis remain incompletely understood. We find that β cell ablation impedes PDAC development in mice, arguing that the endocrine pancreas is critical for exocrine tumorigenesis. Conversely, obesity induces β cell hormone dysregulation, alters CCK-dependent peri-islet exocrine cell transcriptional states, and enhances islet-proximal tumor formation. Single-cell RNA-sequencing, in silico latent-space archetypal and trajectory analysis, and genetic lineage tracing in vivo reveal that obesity stimulates postnatal immature β cell expansion and adaptation towards a pro-tumorigenic CCK+ state via JNK/cJun stress-responsive signaling. These results define endocrine-exocrine signaling as a driver of PDAC development and uncover new avenues to target the endocrine pancreas to subvert exocrine tumorigenesis.
Improving Text-to-Image Consistency via Automatic Prompt Optimization
Pietro Astolfi
Melissa Hall
Candace Ross
Jack Urbanek
Adina Williams
Adriana Romero-Soriano
Impressive advances in text-to-image (T2I) generative models have yielded a plethora of high-performing models which are able to generate aesthetically appealing, photorealistic images. Despite the progress, these models still struggle to produce images that are consistent with the input prompt, oftentimes failing to capture object quantities, relations, and attributes properly. Existing solutions to improve prompt-image consistency suffer from the following challenges: (1) they oftentimes require model fine-tuning, (2) they only focus on nearby prompt samples, and (3) they are affected by unfavorable trade-offs among image quality, representation diversity, and prompt-image consistency. In this paper, we address these challenges and introduce a T2I optimization-by-prompting framework, OPT2I, which leverages a large language model (LLM) to improve prompt-image consistency in T2I models. Our framework starts from a user prompt and iteratively generates revised prompts with the goal of maximizing a consistency score. Our extensive validation on two datasets, MSCOCO and PartiPrompts, shows that OPT2I can boost the initial consistency score by up to 24.9% in terms of DSG score while preserving the FID and increasing the recall between generated and real data. Our work paves the way toward building more reliable and robust T2I systems by harnessing the power of LLMs.
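The optimization-by-prompting loop the abstract describes can be sketched as below; `llm_revise`, `t2i_generate`, and `consistency_score` are assumed interfaces (the paper scores consistency with metrics such as DSG), not the OPT2I code.

```python
def optimize_prompt(user_prompt, llm_revise, t2i_generate, consistency_score,
                    n_iters=10, n_candidates=4):
    """Iteratively ask an LLM for revised prompts and keep the candidate
    whose generated image scores highest against the *original* prompt."""
    best_prompt = user_prompt
    best_score = consistency_score(t2i_generate(user_prompt), user_prompt)
    history = [(best_prompt, best_score)]  # feedback shown to the LLM
    for _ in range(n_iters):
        for candidate in llm_revise(user_prompt, history, n_candidates):
            score = consistency_score(t2i_generate(candidate), user_prompt)
            history.append((candidate, score))
            if score > best_score:
                best_prompt, best_score = candidate, score
    return best_prompt, best_score
```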
Insect Identification in the Wild: The AMI Dataset
F. Cunha
M. J. Bunsen
L. Pasi
N. Pinoy
Flemming Helsing
JoAnne Russo
Marc Botham
Michael Sabourin
Jonathan Fréchette
Alexandre Anctil
Yacksecari Lopez
Eduardo Navarro
Filonila Perez Pimentel
Ana Cecilia Zamora
José Alejandro Ramirez Silva
Jonathan Gagnon
Tom August
K. Bjerge … (8 more authors)
Alba Gomez Segura
Marc Bélisle
Yves Basset
K. P. McFarland
David Roy
Toke Thomas Høye
Maxim Larrivée
Insects represent half of all global biodiversity, yet many of the world's insects are disappearing, with severe implications for ecosystems and agriculture. Despite this crisis, data on insect diversity and abundance remain woefully inadequate, due to the scarcity of human experts and the lack of scalable tools for monitoring. Ecologists have started to adopt camera traps to record and study insects, and have proposed computer vision algorithms as an answer for scalable data processing. However, insect monitoring in the wild poses unique challenges that have not yet been addressed within computer vision, including the combination of long-tailed data, extremely similar classes, and significant distribution shifts. We provide the first large-scale machine learning benchmarks for fine-grained insect recognition, designed to match real-world tasks faced by ecologists. Our contributions include a curated dataset of images from citizen science platforms and museums, and an expert-annotated dataset drawn from automated camera traps across multiple continents, designed to test out-of-distribution generalization under field conditions. We train and evaluate a variety of baseline algorithms and introduce a combination of data augmentation techniques that enhance generalization across geographies and hardware setups.
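As an illustration only, a standard augmentation stack of the kind that could be combined to improve cross-geography and cross-hardware generalization is shown below; the exact recipe is an assumption, not the one benchmarked in the paper.

```python
import torchvision.transforms as T

train_transform = T.Compose([
    T.RandomResizedCrop(224, scale=(0.5, 1.0)),
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    T.RandomGrayscale(p=0.1),   # robustness to differing camera sensors
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```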