Darshan Patil

PhD - UdeM
Principal supervisor
Research Topics
Continual Learning
Reinforcement Learning
Deep Learning

Publications

Modular Memory is the Key to Continual Learning Agents
Vaggelis Dorovatas
Malte Schwerin
Andrew D. Bagdanov
Lucas Caccia
Antonio Carta
Barbara Hammer
Tyler L. Hayes
Timm Hess
Christopher Kanan
Dhireesha Kudithipudi
Xialei Liu
Vincenzo Lomonaco
Jorge Mendez-Mendez
Ameya Prabhu
Elisa Ricci
Tinne Tuytelaars
Gido M. van de Ven
Liyuan Wang
Joost van de Weijer
Jonghyun Choi
Martin Mundt
Foundation models have transformed machine learning through large-scale pretraining and increased test-time compute. Despite surpassing human performance in several domains, these models remain fundamentally limited in continuous operation, experience accumulation, and personalization, capabilities that are central to adaptive intelligence. While continual learning research has long targeted these goals, its historical focus on in-weight learning (IWL), i.e., updating a single model's parameters to absorb new knowledge, has rendered catastrophic forgetting a persistent challenge. Our position is that combining the strengths of In-Weight Learning (IWL) and the newly emerged capabilities of In-Context Learning (ICL) through the design of modular memory is the missing piece for continual adaptation at scale. We outline a conceptual framework for modular memory-centric architectures that leverage ICL for rapid adaptation and knowledge accumulation, and IWL for stable updates to model capabilities, charting a practical roadmap toward continually learning agents.
Loss Smoothing for Continual Adaptation
Neural networks are often adapted in settings with nonstationary data distributions, where the objective is to optimize performance on the current task and preserving accuracy on previous tasks is not required. As a result, existing methods primarily focus on improving plasticity, while stability is largely studied in the context of continual learning. In this work, we examine whether preserving stability can also be beneficial in model adaptation settings where past-task performance is irrelevant. We propose a simple loss smoothing approach that encourages selective adaptation by preserving task-shared features while modifying task-inconsistent ones. We evaluate our method on continual supervised model adaptation benchmarks and reinforcement learning benchmarks, and show that promoting representational stability during adaptation can improve performance across settings.
CoPeP: Benchmarking Continual Pretraining for Protein Language Models
Protein language models (pLMs) have recently gained significant attention for their ability to uncover relationships between sequence, structure, and function from evolutionary statistics, thereby accelerating therapeutic drug discovery. These models learn from large protein databases that are continuously updated by the biology community and whose dynamic nature motivates the application of continual learning, not only to keep up with the ever-growing data, but also as an opportunity to take advantage of the temporal meta-information that is created during this process. As a result, we introduce the Continual Pretraining of Protein Language Models (CoPeP) benchmark, a novel benchmark for evaluating continual learning approaches on pLMs. Specifically, we curate a sequence of protein datasets derived from the UniProt Knowledgebase spanning a decade and define metrics to assess pLM performance across 31 protein understanding tasks. We evaluate several methods from the continual learning literature, including replay, unlearning, and plasticity-based methods, some of which have never been applied to models and data of this scale. Our findings reveal that incorporating temporal meta-information improves perplexity by up to 7% even when compared to training on data from all tasks jointly. Moreover, even at scale, several continual learning methods outperform naive continual pretraining. The CoPeP benchmark offers an exciting opportunity to study these methods at scale in an impactful real-world application.
Operationalizing the Superficial Alignment Hypothesis via Task Complexity
The superficial alignment hypothesis (SAH) posits that large language models learn most of their knowledge during pre-training, and that post-training merely surfaces this knowledge. The SAH, however, lacks a precise definition, which has led to (i) different and seemingly orthogonal arguments supporting it, and (ii) important critiques of it. We propose a new metric called **Task Complexity**: the length of the shortest program that achieves a target performance on a task. In this framework, the SAH claims that pre-trained models drastically reduce the task complexity of achieving high performance on many tasks. Our definition unifies prior arguments supporting the SAH, interpreting them as different strategies to find such short programs. Experimentally, we estimate task complexities of mathematical reasoning, machine translation, and instruction following tasks and show that their respective task complexities can be remarkably low when conditioned on a pre-trained model. Further, we find that pre-training enables access to strong performances on our tasks, but it can require programs of gigabytes of length to access them. Post-training, on the other hand, collapses the complexity of reaching this same performance by several orders of magnitude. Overall, our results highlight that task adaptation can require remarkably little information, often just a few kilobytes.
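The Task Complexity definition in this abstract can be written compactly in symbols. The notation below is an assumption made for illustration and is not taken from the paper: $T$ is a task, $\delta$ a target performance level, $\theta$ a pre-trained model that a program may call ($\varnothing$ when no model is available), $p$ a program, $|p|$ its length, and $\mathrm{perf}_T(p, \theta)$ the performance of $p$ on $T$.

```latex
% Task complexity: length of the shortest program reaching performance \delta on task T,
% optionally conditioned on a pre-trained model \theta (notation assumed for illustration).
C_T(\delta \mid \theta) \;=\; \min_{p} \bigl\{\, |p| \;:\; \mathrm{perf}_T(p, \theta) \ge \delta \,\bigr\}

% In this notation, the SAH as framed in the abstract reads, for many tasks T:
C_T(\delta \mid \theta) \;\ll\; C_T(\delta \mid \varnothing)
```

Read this way, the abstract's finding is that conditioning on a pre-trained model already lowers $C_T$, and that post-training lowers the length needed to reach the same $\delta$ by several further orders of magnitude.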
Position: Modular Memory is the Key to Continual Learning Agents
Vaggelis Dorovatas
Malte Schwerin
Andrew Bagdanov
Lucas Caccia
Antonio Carta
Barbara Hammer
Tyler Hayes
Timm Hess
Christopher Kanan
Dhireesha Kudithipudi
Xialei Liu
Vincenzo Lomonaco
Jorge Mendez-Mendez
Ameya Pandurang Prabhu
Elisa Ricci
Tinne Tuytelaars
Gido van de Ven
Liyuan Wang
Joost van de Weijer
Jonghyun Choi
Martin Mundt
Foundation models have transformed machine learning through large-scale pretraining, massive parameterization, and increased test-time compute. Despite surpassing human performance in several domains, these models remain fundamentally limited in continuous operation, experience accumulation, and personalization, capabilities that are central to adaptive intelligence. While continual learning research has long targeted these goals, its historical focus on in-weight learning, i.e., updating a single model’s parameters to absorb new knowledge, has rendered catastrophic forgetting a persistent challenge. **Our position is that combining the strengths of In-Weight Learning (IWL) and the newly emerged capabilities of In-Context Learning (ICL) through the design of modular memory is the missing piece for continual adaptation at scale.** We outline a conceptual framework for modular memory-centric architectures that leverage ICL for rapid adaptation and knowledge accumulation, and IWL for stable updates to model capabilities, thereby mitigating catastrophic forgetting and charting a practical roadmap toward continually learning agents.
Intelligent Switching in Reset-Free RL
In the real world, the strong episode resetting mechanisms that are needed to train agents in simulation are unavailable. The resetting assumption limits the potential of reinforcement learning in the real world, as providing resets to an agent usually requires the creation of additional handcrafted mechanisms or human interventions. Recent work aims to train agents (forward) with learned resets by constructing a second (backward) agent that returns the forward agent to the initial state. We find that the termination and timing of the transitions between these two agents are crucial for algorithm success. With this in mind, we create a new algorithm, Reset Free RL with Intelligently Switching Controller (RISC), which intelligently switches between the two agents based on the agent's confidence in achieving its current goal. Our new method achieves state-of-the-art performance on several challenging environments for reset-free RL.
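The idea of switching between a forward and a backward agent based on confidence can be illustrated with a toy controller. This is a minimal sketch under stated assumptions, not the RISC algorithm itself: the names `SwitchingController` and `run_reset_free`, the fixed threshold, and the step budget are all hypothetical, and the confidence values are supplied as plain numbers rather than estimated by a learned model.

```python
from dataclasses import dataclass


@dataclass
class SwitchingController:
    """Hypothetical controller that decides when to hand control to the other agent."""

    confidence_threshold: float = 0.9  # assumed hyperparameter, not from the paper
    max_steps_per_goal: int = 5        # assumed fallback budget, not from the paper

    def should_switch(self, confidence: float, steps_on_goal: int) -> bool:
        # Switch once the active agent is confident it can reach its goal,
        # or when it has used up its step budget for this goal.
        return (
            confidence >= self.confidence_threshold
            or steps_on_goal >= self.max_steps_per_goal
        )


def run_reset_free(confidences, controller):
    """Return which agent ('forward' or 'backward') acts at each time step.

    `confidences[t]` stands in for the active agent's estimated confidence
    of achieving its current goal after acting at step t (an assumption).
    """
    active = "forward"
    steps_on_goal = 0
    schedule = []
    for confidence in confidences:
        schedule.append(active)
        steps_on_goal += 1
        if controller.should_switch(confidence, steps_on_goal):
            # Hand control to the other agent and restart its step counter.
            active = "backward" if active == "forward" else "forward"
            steps_on_goal = 0
    return schedule


schedule = run_reset_free([0.1, 0.95, 0.2, 0.3, 0.99], SwitchingController())
print(schedule)
```

No environment reset appears anywhere in the loop: control only alternates between the two agents, which is the setting the abstract describes.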
Toward Debugging Deep Reinforcement Learning Programs with RLExplorer
Deep reinforcement learning (DRL) has shown success in diverse domains such as robotics, computer games, and recommendation systems. However, like any other software system, DRL-based software systems are susceptible to faults that pose unique challenges for debugging and diagnosing. These faults often result in unexpected behavior without explicit failures and error messages, making debugging difficult and time-consuming. Therefore, automating the monitoring and diagnosis of DRL systems is crucial to alleviate the burden on developers. In this paper, we propose RLExplorer, the first fault diagnosis approach for DRL-based software systems. RLExplorer automatically monitors training traces and runs diagnosis routines based on properties of the DRL learning dynamics to detect the occurrence of DRL-specific faults. It then logs the results of these diagnoses as warnings that cover theoretical concepts, recommended practices, and potential solutions to the identified faults. We conducted two sets of evaluations to assess RLExplorer. Our first evaluation of faulty DRL samples from Stack Overflow revealed that our approach can effectively diagnose real faults in 83% of the cases. Our second evaluation of RLExplorer with 15 DRL experts/developers showed that (1) RLExplorer could identify 3.6 times more defects than manual debugging and (2) RLExplorer is easily integrated into DRL applications.
An Empirical Investigation of the Role of Pre-training in Lifelong Learning
Sanket Vaibhav Mehta
Emma Strubell
The lifelong learning paradigm in machine learning is an attractive alternative to the more prominent isolated learning scheme not only due to its resemblance to biological learning but also its potential to reduce energy waste by obviating excessive model re-training. A key challenge to this paradigm is the phenomenon of catastrophic forgetting. With the increasing popularity and success of pre-trained models in machine learning, we pose the question: What role does pre-training play in lifelong learning, specifically with respect to catastrophic forgetting? We investigate existing methods in the context of large, pre-trained models and evaluate their performance on a variety of text and image classification tasks, including a large-scale study using a novel data set of 15 diverse NLP tasks. Across all settings, we observe that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially compared to randomly initialized models. We then further investigate why pre-training alleviates forgetting in this setting. We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima. Based on this insight, we propose jointly optimizing for current task loss and loss basin sharpness to explicitly encourage wider basins during sequential fine-tuning. We show that this optimization approach outperforms several state-of-the-art task-sequential continual learning algorithms across multiple settings, occasionally even without retaining a memory that scales in size with the number of tasks.
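One well-known way to optimize a task loss together with basin sharpness is a sharpness-aware minimization (SAM) style update, which takes the gradient at a nearby worst-case point instead of at the current weights. The abstract does not spell out the exact procedure, so the following is only a sketch under assumptions: the toy quadratic loss, the matrix `A`, and the values of `lr` and `rho` are all chosen for illustration.

```python
import numpy as np

# Toy curvature matrix: one steep and one flat direction (illustrative assumption).
A = np.diag([10.0, 1.0])


def loss(w):
    """Toy quadratic loss standing in for the current-task loss."""
    return 0.5 * float(w @ A @ w)


def grad(w):
    """Analytic gradient of the toy loss."""
    return A @ w


def sam_step(w, lr=0.05, rho=0.05):
    """One SAM-style update: descend using the gradient at a perturbed point."""
    g = grad(w)
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return w  # already at a stationary point; nothing to perturb
    eps = rho * g / norm     # step of radius rho in the direction of steepest ascent
    g_sharp = grad(w + eps)  # gradient evaluated at the approximate worst-case neighbor
    return w - lr * g_sharp


w0 = np.array([1.0, 1.0])
w = w0.copy()
for _ in range(100):
    w = sam_step(w)
print(loss(w0), loss(w))
```

Setting `rho` to zero recovers plain gradient descent, so `rho` is the knob that trades pure loss minimization against a preference for flatter neighborhoods.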