Publications

Sub-goal Distillation: A Method to Improve Small Language Agents

Elias Stengel-Eskin

While Large Language Models (LLMs) have demonstrated significant promise as agents in interactive tasks, their substantial computational requirements and restricted number of calls constrain their practical utility, especially in long-horizon interactive tasks such as decision-making or in scenarios involving continuous ongoing tasks. To address these constraints, we propose a method for transferring the performance of an LLM with billions of parameters to a much smaller language model (770M parameters). Our approach involves constructing a hierarchical agent comprising a planning module, which learns through Knowledge Distillation from an LLM to generate sub-goals, and an execution module, which learns to accomplish these sub-goals using elementary actions. In detail, we leverage an LLM to annotate an oracle path with a sequence of sub-goals towards completing a goal. Subsequently, we utilize this annotated data to fine-tune both the planning and execution modules. Importantly, neither module relies on real-time access to an LLM during inference, significantly reducing the overall cost associated with LLM interactions to a fixed cost. In ScienceWorld, a challenging and multi-task interactive text environment, our method surpasses standard imitation learning based solely on elementary actions by 16.7% (absolute). Our analysis highlights the efficiency of our approach compared to other LLM-based methods. Our code and annotated data for distillation can be found on GitHub.

2025-02-17

Proceedings of The 3rd Conference on Lifelong Learning Agents (published)

doi.org

arxiv.org

A Strong Baseline for Molecular Few-Shot Learning

Philippe Formont

Hugo Jeannin

Pablo Piantanida

Ismail Ben Ayed

Few-shot learning has recently attracted significant interest in drug discovery, with a recent, fast-growing literature mostly involving convoluted meta-learning strategies. We revisit the more straightforward fine-tuning approach for molecular data, and propose a regularized quadratic-probe loss based on the Mahalanobis distance. We design a dedicated block-coordinate descent optimizer, which avoids the degenerate solutions of our loss. Interestingly, our simple fine-tuning approach achieves highly competitive performance in comparison to state-of-the-art methods, while being applicable to black-box settings and removing the need for specific episodic pre-training strategies. Furthermore, we introduce a new benchmark to assess the robustness of the competing methods to domain shifts. In this setting, our fine-tuning baseline obtains consistently better results than meta-learning methods.

2025-02-15

TMLR (accepted)

openreview.net

From Markov to Laplace: How Mamba In-Context Learns Markov Chains

Marco Bondaschi

Nived Rajaraman

Xiuying Wei

Kannan Ramchandran

Razvan Pascanu

Caglar Gulçehre

Michael C. Gastpar

Ashok Vardhan Makkuva

While transformer-based language models have driven the AI revolution thus far, their computational complexity has spurred growing interest in viable alternatives, such as structured state space sequence models (SSMs) and Selective SSMs. Among these, Mamba (S6) and its variant Mamba-2 have shown remarkable inference speed ups over transformers while achieving comparable or superior performance on complex language modeling tasks. However, despite these architectural innovations and empirical successes, the fundamental learning capabilities of Mamba remain poorly understood. In this paper, we address this gap by studying in-context learning (ICL) on Markov chains and uncovering a surprising phenomenon: unlike transformers, even a single-layer Mamba efficiently learns the in-context Laplacian smoothing estimator, which is both Bayes and minimax optimal, for all Markovian orders. To explain this, we theoretically characterize the representation capacity of Mamba and reveal the fundamental role of convolution in enabling it to represent the optimal Laplacian smoothing. These theoretical insights align strongly with empirical results and, to the best of our knowledge, represent the first formal connection between Mamba and optimal statistical estimators. Finally, we outline promising research directions inspired by these findings.

2025-02-14

ArXiv (preprint)

arxiv.org

Shaping Inductive Bias in Diffusion Models through Frequency-Based Noise Control

Thomas Jiralerspong

Berton Earnshaw

Jason Hartford

Yoshua Bengio

Luca Scimeca

Diffusion Probabilistic Models (DPMs) are powerful generative models that have achieved unparalleled success in a number of generative tasks. In this work, we aim to build inductive biases into the training and sampling of diffusion models to better accommodate the target distribution of the data to model. For topologically structured data, we devise a frequency-based noising operator to purposefully manipulate, and set, these inductive biases. We first show that appropriate manipulations of the noising forward process can lead DPMs to focus on particular aspects of the distribution to learn. We show that different datasets necessitate different inductive biases, and that appropriate frequency-based noise control induces increased generative performance compared to standard diffusion. Finally, we demonstrate the possibility of ignoring information at particular frequencies while learning. We show this in an image corruption and recovery task, where we train a DPM to recover the original target distribution after severe noise corruption.

2025-02-14

ArXiv (preprint)

arxiv.org

A Taxonomy of Linguistic Expressions That Contribute To Anthropomorphism of Language Technologies

Alicia DeVrio

Myra Cheng

Lisa Egede

Alexandra Olteanu

Su Lin Blodgett

Recent attention to anthropomorphism -- the attribution of human-like qualities to non-human objects or entities -- of language technologies like LLMs has sparked renewed discussions about potential negative impacts of anthropomorphism. To productively discuss the impacts of this anthropomorphism and in what contexts it is appropriate, we need a shared vocabulary for the vast variety of ways that language can be anthropomorphic. In this work, we draw on existing literature and analyze empirical cases of user interactions with language technologies to develop a taxonomy of textual expressions that can contribute to anthropomorphism. We highlight challenges and tensions involved in understanding linguistic anthropomorphism, such as how all language is fundamentally human and how efforts to characterize and shift perceptions of humanness in machines can also dehumanize certain humans. We discuss ways that our taxonomy supports more precise and effective discussions of and decisions about anthropomorphism of language technologies.

2025-02-14

ArXiv (preprint)

doi.org

arxiv.org

Bugs in Large Language Models Generated Code: An Empirical Study

Florian Tambon

Arghavan Moradi Dakhel

Amin Nikanjam

Foutse Khomh

Michel C. Desmarais

Giuliano Antoniol

2025-02-13

Empirical Software Engineering (published)

doi.org

arxiv.org

Galileo: Learning Global and Local Features in Pretrained Remote Sensing Models

Gabriel Tseng

Anthony Fuller

Marlena Reil

Henry Herzog

Patrick Beukema

Favyen Bastani

James R Green

Evan Shelhamer

Hannah Kerner

David Rolnick

From crop mapping to flood detection, machine learning in remote sensing has a wide range of societally beneficial applications. The commonalities between remote sensing data in these applications present an opportunity for pretrained machine learning models tailored to remote sensing to reduce the labeled data and effort required to solve individual tasks. However, such models must be: (i) flexible enough to ingest input data of varying sensor modalities and shapes (i.e., of varying spatial and temporal dimensions), and (ii) able to model Earth surface phenomena of varying scales and types. To solve this gap, we present Galileo, a family of pretrained remote sensing models designed to flexibly process multimodal remote sensing data. We also introduce a novel and highly effective self-supervised learning approach to learn both large- and small-scale features, a challenge not addressed by previous models. Our Galileo models obtain state-of-the-art results across diverse remote sensing tasks.

2025-02-13

ArXiv (preprint)

doi.org

arxiv.org

Galileo: Learning Global & Local Features of Many Remote Sensing Modalities

Gabriel Tseng

A. Fuller

Marlena Reil

Henry Herzog

Patrick Beukema

Favyen Bastani

James R. Green

Evan Shelhamer

Hannah Kerner

David Rolnick

We introduce a highly multimodal transformer to represent many remote sensing modalities - multispectral optical, synthetic aperture radar, elevation, weather, pseudo-labels, and more - across space and time. These inputs are useful for diverse remote sensing tasks, such as crop mapping and flood detection. However, learning shared representations of remote sensing data is challenging, given the diversity of relevant data modalities, and because objects of interest vary massively in scale, from small boats (1-2 pixels and transient) to glaciers (thousands of pixels and persistent). We present a novel self-supervised learning algorithm that extracts multi-scale features across a flexible set of input modalities through masked modeling. Our dual global and local contrastive losses differ in their targets (deep representations vs. shallow input projections) and masking strategies (structured vs. not). Our Galileo is a single generalist model that outperforms SoTA specialist models for satellite images and pixel time series across eleven benchmarks and multiple tasks.

2025-02-13

ArXiv (preprint)

arxiv.org

INJONGO: A Multicultural Intent Detection and Slot-filling Dataset for 16 African Languages

Hao Yu

Jesujoba Oluwadara Alabi

Andiswa Bukula

Zhuang Yun Jian

En-Shiun Annie Lee

Tadesse Kebede Guge

Israel Abebe Azime

Happy Buzaaba

Blessing Kudzaishe Sibanda

Godson Kalipe

Jonathan Mukiibi

S. Kabenamualu

M. Setaka

Lolwethu Ndolela

Nkiruka Bridget Odu

Rooweither Mabuya

Shamsuddeen Hassan Muhammad

Salomey Osei

Sokhar Samb

Juliet W. Murage … (see 2 more)

Dietrich Klakow

David Ifeoluwa Adelani

Slot-filling and intent detection are well-established tasks in Conversational AI. However, current large-scale benchmarks for these tasks often exclude evaluations of low-resource languages and rely on translations from English benchmarks, thereby predominantly reflecting Western-centric concepts. In this paper, we introduce Injongo -- a multicultural, open-source benchmark dataset for 16 African languages with utterances generated by native speakers across diverse domains, including banking, travel, home, and dining. Through extensive experiments, we benchmark fine-tuned multilingual transformer models and prompted large language models (LLMs), and show the advantage of leveraging African-cultural utterances over Western-centric utterances for improving cross-lingual transfer from the English language. Experimental results reveal that current LLMs struggle with the slot-filling task, with GPT-4o achieving an average F1-score of 26. In contrast, intent detection performance is notably better, with an average accuracy of 70.6%, though it still falls behind the fine-tuning baselines. Compared to the English language, GPT-4o and fine-tuning baselines perform similarly on intent detection, achieving an accuracy of approximately 81%. Our findings suggest that the performance of LLMs is still behind for many low-resource African languages, and more work is needed to further improve their downstream performance.

2025-02-13

ArXiv (preprint)

doi.org

arxiv.org
