We use cookies to analyze the browsing and usage of our website and to personalize your experience. You can disable these technologies at any time, but this may limit certain functionalities of the site. Read our Privacy Policy for more information.
Setting cookies
You can enable and disable the types of cookies you wish to accept. However certain choices you make could affect the services offered on our sites (e.g. suggestions, personalised ads, etc.).
Essential cookies
These cookies are necessary for the operation of the site and cannot be deactivated. (Still active)
Analytics cookies
Do you accept the use of cookies to measure the audience of our sites?
Multimedia Player
Do you accept the use of cookies to display and allow you to watch the video content hosted by our partners (YouTube, etc.)?
Publications
Sub-goal Distillation: A Method to Improve Small Language Agents
While Large Language Models (LLMs) have demonstrated significant promise as agents in interactive tasks, their substantial computational req… (see more)uirements and restricted number of calls constrain their practical utility, especially in long-horizon interactive tasks such as decision-making or in scenarios involving continuous ongoing tasks. To address these constraints, we propose a method for transferring the performance of an LLM with billions of parameters to a much smaller language model (770M parameters). Our approach involves constructing a hierarchical agent comprising a planning module, which learns through Knowledge Distillation from an LLM to generate sub-goals, and an execution module, which learns to accomplish these sub-goals using elementary actions. In detail, we leverage an LLM to annotate an oracle path with a sequence of sub-goals towards completing a goal. Subsequently, we utilize this annotated data to fine-tune both the planning and execution modules. Importantly, neither module relies on real-time access to an LLM during inference, significantly reducing the overall cost associated with LLM interactions to a fixed cost. In ScienceWorld, a challenging and multi-task interactive text environment, our method surpasses standard imitation learning based solely on elementary actions by 16.7% (absolute). Our analysis highlights the efficiency of our approach compared to other LLM-based methods. Our code and annotated data for distillation can be found on GitHub.
2025-02-17
Proceedings of The 3rd Conference on Lifelong Learning Agents (published)
Few-shot learning has recently attracted significant interest in drug discovery, with a recent, fast-growing literature mostly involving con… (see more)voluted meta-learning strategies. We revisit the more straightforward fine-tuning approach for molecular data, and propose a regularized quadratic-probe loss based on the the Mahalanobis distance. We design a dedicated block-coordinate descent optimizer, which avoid the degenerate solutions of our loss. Interestingly, our simple fine-tuning approach achieves highly competitive performances in comparison to state-of-the-art methods, while being applicable to black-box settings and removing the need for specific episodic pre-training strategies. Furthermore, we introduce a new benchmark to assess the robustness of the competing methods to domain shifts. In this setting, our fine-tuning baseline obtains consistently better results than meta-learning methods.
While transformer-based language models have driven the AI revolution thus far, their computational complexity has spurred growing interest … (see more)in viable alternatives, such as structured state space sequence models (SSMs) and Selective SSMs. Among these, Mamba (S6) and its variant Mamba-2 have shown remarkable inference speed ups over transformers while achieving comparable or superior performance on complex language modeling tasks. However, despite these architectural innovations and empirical successes, the fundamental learning capabilities of Mamba remain poorly understood. In this paper, we address this gap by studying in-context learning (ICL) on Markov chains and uncovering a surprising phenomenon: unlike transformers, even a single-layer Mamba efficiently learns the in-context Laplacian smoothing estimator, which is both Bayes and minimax optimal, for all Markovian orders. To explain this, we theoretically characterize the representation capacity of Mamba and reveal the fundamental role of convolution in enabling it to represent the optimal Laplacian smoothing. These theoretical insights align strongly with empirical results and, to the best of our knowledge, represent the first formal connection between Mamba and optimal statistical estimators. Finally, we outline promising research directions inspired by these findings.
Diffusion Probabilistic Models (DPMs) are powerful generative models that have achieved unparalleled success in a number of generative tasks… (see more). In this work, we aim to build inductive biases into the training and sampling of diffusion models to better accommodate the target distribution of the data to model. For topologically structured data, we devise a frequency-based noising operator to purposefully manipulate, and set, these inductive biases. We first show that appropriate manipulations of the noising forward process can lead DPMs to focus on particular aspects of the distribution to learn. We show that different datasets necessitate different inductive biases, and that appropriate frequency-based noise control induces increased generative performance compared to standard diffusion. Finally, we demonstrate the possibility of ignoring information at particular frequencies while learning. We show this in an image corruption and recovery task, where we train a DPM to recover the original target distribution after severe noise corruption.
Recent attention to anthropomorphism -- the attribution of human-like qualities to non-human objects or entities -- of language technologies… (see more) like LLMs has sparked renewed discussions about potential negative impacts of anthropomorphism. To productively discuss the impacts of this anthropomorphism and in what contexts it is appropriate, we need a shared vocabulary for the vast variety of ways that language can be anthropomorphic. In this work, we draw on existing literature and analyze empirical cases of user interactions with language technologies to develop a taxonomy of textual expressions that can contribute to anthropomorphism. We highlight challenges and tensions involved in understanding linguistic anthropomorphism, such as how all language is fundamentally human and how efforts to characterize and shift perceptions of humanness in machines can also dehumanize certain humans. We discuss ways that our taxonomy supports more precise and effective discussions of and decisions about anthropomorphism of language technologies.
Recent attention to anthropomorphism -- the attribution of human-like qualities to non-human objects or entities -- of language technologies… (see more) like LLMs has sparked renewed discussions about potential negative impacts of anthropomorphism. To productively discuss the impacts of this anthropomorphism and in what contexts it is appropriate, we need a shared vocabulary for the vast variety of ways that language can be anthropomorphic. In this work, we draw on existing literature and analyze empirical cases of user interactions with language technologies to develop a taxonomy of textual expressions that can contribute to anthropomorphism. We highlight challenges and tensions involved in understanding linguistic anthropomorphism, such as how all language is fundamentally human and how efforts to characterize and shift perceptions of humanness in machines can also dehumanize certain humans. We discuss ways that our taxonomy supports more precise and effective discussions of and decisions about anthropomorphism of language technologies.
From crop mapping to flood detection, machine learning in remote sensing has a wide range of societally beneficial applications. The commona… (see more)lities between remote sensing data in these applications present an opportunity for pretrained machine learning models tailored to remote sensing to reduce the labeled data and effort required to solve individual tasks. However, such models must be: (i) flexible enough to ingest input data of varying sensor modalities and shapes (i.e., of varying spatial and temporal dimensions), and (ii) able to model Earth surface phenomena of varying scales and types. To solve this gap, we present Galileo, a family of pretrained remote sensing models designed to flexibly process multimodal remote sensing data. We also introduce a novel and highly effective self-supervised learning approach to learn both large- and small-scale features, a challenge not addressed by previous models. Our Galileo models obtain state-of-the-art results across diverse remote sensing tasks.
From crop mapping to flood detection, machine learning in remote sensing has a wide range of societally beneficial applications. The commona… (see more)lities between remote sensing data in these applications present an opportunity for pretrained machine learning models tailored to remote sensing to reduce the labeled data and effort required to solve individual tasks. However, such models must be: (i) flexible enough to ingest input data of varying sensor modalities and shapes (i.e., of varying spatial and temporal dimensions), and (ii) able to model Earth surface phenomena of varying scales and types. To solve this gap, we present Galileo, a family of pretrained remote sensing models designed to flexibly process multimodal remote sensing data. We also introduce a novel and highly effective self-supervised learning approach to learn both large- and small-scale features, a challenge not addressed by previous models. Our Galileo models obtain state-of-the-art results across diverse remote sensing tasks.
We introduce a highly multimodal transformer to represent many remote sensing modalities - multispectral optical, synthetic aperture radar, … (see more)elevation, weather, pseudo-labels, and more - across space and time. These inputs are useful for diverse remote sensing tasks, such as crop mapping and flood detection. However, learning shared representations of remote sensing data is challenging, given the diversity of relevant data modalities, and because objects of interest vary massively in scale, from small boats (1-2 pixels and transient) to glaciers (thousands of pixels and persistent). We present a novel self-supervised learning algorithm that extracts multi-scale features across a flexible set of input modalities through masked modeling. Our dual global and local contrastive losses differ in their targets (deep representations vs. shallow input projections) and masking strategies (structured vs. not). Our Galileo is a single generalist model that outperforms SoTA specialist models for satellite images and pixel time series across eleven benchmarks and multiple tasks.