Dzmitry Bahdanau

Core Industry Member
Canada CIFAR AI Chair
Assistant Professor, McGill University, School of Computer Science
AI Research Scientist, ServiceNow
Research Topics
Deep Learning
Natural Language Processing

Biography

Dzmitry Bahdanau is an Assistant Professor at McGill University and a researcher at ServiceNow Element AI. Previously, he obtained his PhD at Université de Montréal / Mila - Quebec Artificial Intelligence Institute, working with Yoshua Bengio. He is interested in fundamental and applied questions of natural language understanding. His main research areas include semantic parsing, language user interfaces, systematic generalization, and hybrid neural-symbolic systems.

Current Students

Master's Research - McGill
Principal supervisor:
Master's Research - McGill
Principal supervisor:
PhD - McGill
Co-supervisor:

Publications

PipelineRL: Faster On-policy Reinforcement Learning for Long Sequence Generation
Ehsan Kamalloo
Rafael Pardinas
Reinforcement Learning (RL) is increasingly utilized to enhance the reasoning capabilities of Large Language Models (LLMs). However, effectively scaling these RL methods presents significant challenges, primarily due to the difficulty in maintaining high AI accelerator utilization without generating stale, off-policy data that harms common RL algorithms. This paper introduces PipelineRL, an approach designed to achieve a superior trade-off between hardware efficiency and data on-policyness for LLM training. PipelineRL employs concurrent asynchronous data generation and model training, distinguished by novel in-flight weight updates. This mechanism allows the LLM generation engine to receive updated model weights with minimal interruption during the generation of token sequences, thereby maximizing both accelerator utilization and the freshness of training data. Experiments conducted on long-form reasoning tasks using 128 H100 GPUs demonstrate that PipelineRL achieves approximately…
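The scheduling idea is easy to see in miniature. Below is a minimal Python sketch, not the authors' implementation: a generation thread keeps sampling token sequences while a training thread pushes fresh weights mid-sequence, so a single sequence may mix tokens from several policy versions. `Policy`, `next_token`, and `load_weights` are illustrative stand-ins, not PipelineRL APIs.

```python
# Minimal sketch of the PipelineRL scheduling pattern (not the authors' code).
import queue
import threading

sample_queue: "queue.Queue[list[int]]" = queue.Queue(maxsize=64)

class Policy:
    """Stand-in for an LLM engine; `version` mimics in-flight weight updates."""
    def __init__(self) -> None:
        self.version = 0
        self.lock = threading.Lock()

    def load_weights(self, version: int) -> None:
        with self.lock:
            self.version = version  # swap weights without stopping generation

    def next_token(self, prefix: list[int]) -> int:
        with self.lock:
            return self.version  # tag each token with the weights that produced it

def generator(policy: Policy, seq_len: int = 8) -> None:
    while True:  # generation never pauses for training
        seq: list[int] = []
        for _ in range(seq_len):
            seq.append(policy.next_token(seq))  # may span several weight versions
        sample_queue.put(seq)

def trainer(policy: Policy, steps: int = 5) -> None:
    for step in range(1, steps + 1):
        batch = sample_queue.get()  # fresh, near-on-policy data
        policy.load_weights(step)   # broadcast updated weights in flight
        print(f"step {step}: batch mixes weight versions {sorted(set(batch))}")

policy = Policy()
threading.Thread(target=generator, args=(policy,), daemon=True).start()
trainer(policy)
```

In the real system the generator is a distributed LLM inference engine and the weight broadcast crosses accelerator interconnects; the sketch preserves only the concurrency pattern that keeps training data nearly on-policy.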
LLMs can learn self-restraint through iterative self-reflection
BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning
Masoud Hashemi
Juan A. Rodriguez
Khyati Mahajan
Vikas Yadav
Sathwik Tejaswi Madhusudhan
David Vazquez
Enamul Hoque
Sai Rajeswar
NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild
Shikhar Murty
Hao Zhu
Christopher D Manning
We introduce NNetNav, a method for unsupervised interaction with websites that generates synthetic demonstrations for training browser agents. Given any website, NNetNav produces these demonstrations by retroactively labeling action sequences from an exploration policy. Most work on training browser agents has relied on expensive human supervision, and the limited prior work on such interaction-based techniques has failed to provide effective search through the exponentially large space of exploration. In contrast, NNetNav exploits the hierarchical structure of language instructions to make this search more tractable: complex instructions are typically decomposable into simpler sub-tasks, allowing NNetNav to automatically prune interaction episodes when an intermediate trajectory cannot be annotated with a meaningful sub-task. LLama-3.1-8b finetuned on 10k NNetNav self-generated demonstrations obtains over 16% success rate on WebArena and 35% on WebVoyager, an improvement of 15 and 31 points respectively over zero-shot LLama-3.1-8b, outperforming zero-shot GPT-4 and reaching the state of the art among unsupervised methods on both benchmarks.
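The pruning mechanism can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: `propose_action` plays the role of the exploration policy and `label_subtask` the role of the relabeling language model.

```python
# Toy sketch of NNetNav-style pruning (illustrative only).
import random

def propose_action(trajectory: list[str]) -> str:
    # Hypothetical exploration policy; the paper uses an LLM browsing policy.
    return random.choice(["click_link", "type_query", "submit_form", "scroll"])

def label_subtask(trajectory: list[str]) -> str | None:
    # Hypothetical relabeling model: a sub-task description, or None if the
    # trajectory so far accomplishes nothing nameable.
    return "search for an item" if "type_query" in trajectory else None

def explore_episode(max_steps: int = 12, check_every: int = 3) -> list[str] | None:
    trajectory: list[str] = []
    for step in range(1, max_steps + 1):
        trajectory.append(propose_action(trajectory))
        # Hierarchical pruning: abandon the episode as soon as no meaningful
        # sub-task explains the prefix, instead of searching the full space.
        if step % check_every == 0 and label_subtask(trajectory) is None:
            return None
    return trajectory

demos = [t for t in (explore_episode() for _ in range(100)) if t is not None]
print(f"kept {len(demos)} of 100 episodes as labeled demonstrations")
```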
How to Get Your LLM to Generate Challenging Problems for Evaluation
The pace of evolution of Large Language Models (LLMs) necessitates new approaches for rigorous and comprehensive evaluation. Traditional human annotation is increasingly impracticable due to the complexities and costs involved in generating high-quality, challenging problems, particularly for tasks such as long-context reasoning. Moreover, the rapid saturation of existing human-curated benchmarks by LLMs further underscores the need to develop scalable and automatically renewable evaluation methodologies. In this work, we introduce **CHASE**, a unified framework to synthetically generate challenging problems using LLMs without human involvement. For a given task, our approach builds a hard problem in a bottom-up manner from simpler components. Moreover, since we want to generate synthetic data for evaluation, our framework decomposes the generation process into independently verifiable sub-tasks, thereby ensuring a high level of quality and correctness. We implement CHASE to create evaluation benchmarks across three diverse domains: document-based question answering, repository-level code completion, and math reasoning. The performance of state-of-the-art LLMs on these synthetic benchmarks lies in the range of 40-60% accuracy, thereby demonstrating the effectiveness of our framework at generating hard problems. Our experiments further reveal that the Gemini models significantly outperform other LLMs at long-context reasoning, and that the performance of all LLMs drastically drops by as much as 70% when we scale up the context size to 50k tokens.
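As a hedged illustration of the bottom-up, verifiable construction in the math-reasoning domain (this is not the released CHASE code; `make_component` and `compose_problem` are invented names):

```python
# Illustrative bottom-up construction of a verifiable math problem
# (not the released CHASE framework).
import random

def make_component() -> tuple[str, int]:
    # A simple sub-problem whose answer is known by construction.
    a, b = random.randint(2, 9), random.randint(2, 9)
    return f"{a} boxes of {b} items each", a * b

def compose_problem(n_components: int = 3) -> tuple[str, int]:
    parts: list[str] = []
    answer = 0
    for _ in range(n_components):
        text, value = make_component()  # each component verified independently
        parts.append(text)
        answer += value                 # ground truth tracked bottom-up
    question = ("A warehouse receives " + ", then ".join(parts)
                + ". How many items arrive in total?")
    return question, answer

def verify(gold: int, predicted: int) -> bool:
    return predicted == gold            # cheap check: answer known by construction

question, gold = compose_problem()
print(question)
assert verify(gold, gold)               # a model's prediction would go here
```

Because each component's answer is fixed at generation time, correctness checking reduces to an equality test, which is what makes such a benchmark automatically renewable.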
TapeAgents: a Holistic Framework for Agent Development and Optimization
Nicolas Gontier
Ehsan Kamalloo
Rafael Pardinas
Alex Piché
Torsten Scholak
Oleh Shliazhko
Jordan Prince Tremblay
Soham Parikh
Mitul Tiwari
Quaizar Vohra
We present TapeAgents, an agent framework built around a granular, structured log tape of the agent session that also plays the role of the session's resumable state. In TapeAgents we leverage tapes to facilitate all stages of the LLM Agent development lifecycle. The agent reasons by processing the tape and the LLM output to produce new thought and action steps and append them to the tape. The environment then reacts to the agent's actions by likewise appending observation steps to the tape. By virtue of this tape-centred design, TapeAgents can provide AI practitioners with holistic end-to-end support. At the development stage, tapes facilitate session persistence, agent auditing, and step-by-step debugging. Post-deployment, one can reuse tapes for evaluation, fine-tuning, and prompt-tuning; crucially, one can adapt tapes from other agents or use revised historical tapes. In this report, we explain the TapeAgents design in detail. We demonstrate possible applications of TapeAgents with several concrete examples of building monolithic agents and multi-agent teams, of optimizing agent prompts and finetuning the agent's LLM. We present tooling prototypes and report a case study where we use TapeAgents to finetune a Llama-3.1-8B form-filling assistant to perform as well as GPT-4o while being orders of magnitude cheaper. Lastly, our comparative analysis shows that TapeAgents's advantages over prior frameworks stem from our novel design of the LLM agent as a resumable, modular state machine with a structured configuration, that generates granular, structured logs and that can transform these logs into training text -- a unique combination of features absent in previous work.
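The tape-centric design can be sketched compactly. The following assumes a minimal structure and is not the TapeAgents API: a tape is an append-only list of typed steps, and both the agent and the environment advance a session by appending to it.

```python
# Assumed minimal tape structure (not the TapeAgents API).
from dataclasses import dataclass, field

@dataclass
class Step:
    kind: str     # "thought" | "action" | "observation"
    content: str

@dataclass
class Tape:
    steps: list[Step] = field(default_factory=list)

    def append(self, kind: str, content: str) -> None:
        self.steps.append(Step(kind, content))

def agent_turn(tape: Tape) -> None:
    # The agent reasons over the whole tape; a stub stands in for the LLM.
    last = tape.steps[-1].content if tape.steps else "start"
    tape.append("thought", f"plan a response to: {last}")
    tape.append("action", "call_search_tool")

def environment_turn(tape: Tape) -> None:
    # The environment reacts to the latest action with an observation step.
    tape.append("observation", "search returned 3 results")

tape = Tape()
tape.append("observation", "user asks: find flights to Montreal")
agent_turn(tape)        # appends thought + action steps
environment_turn(tape)  # appends an observation step
for step in tape.steps:
    print(f"{step.kind:>11}: {step.content}")
# Because the tape is the full session state, resuming, auditing, and reuse
# for fine-tuning all reduce to reading or extending this log.
```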
NNetscape Navigator: Complex Demonstrations for Web Agents Without a Demonstrator
Shikhar Murty
Christopher D. Manning
We introduce NNetscape Navigator (NNetnav), a method for training web agents entirely through synthetic demonstrations. These demonstrations are collected by first interacting with a browser to generate trajectory rollouts, which are then retroactively labeled into instructions using a language model. Most work on training browser agents has relied on expensive human supervision, and the limited previous work on such interaction-first synthetic data techniques has failed to provide effective search through the exponential space of exploration. In contrast, NNetnav exploits the hierarchical structure of language instructions to make this search more tractable: complex instructions are typically decomposable into simpler subtasks, allowing NNetnav to automatically prune interaction episodes when an intermediate trajectory cannot be annotated with a meaningful sub-task. We use NNetnav demonstrations from a language model for supervised fine-tuning of a smaller language model policy, and find improvements of 6 points on WebArena and over 20 points on MiniWoB++, two popular environments for web-agents. Notably, on WebArena, we observe that language model policies can be further enhanced when fine-tuned with NNetnav demonstrations derived from the same language model. Finally, we collect and release a dataset of over 6k NNetnav demonstrations on WebArena, spanning a diverse and complex set of instructions.
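The interaction-first recipe, rollout first and label afterwards, can be sketched as follows; the helper names are hypothetical, and in the real system both the policy and the labeler are language models acting on live websites.

```python
# Hypothetical sketch of the rollout-then-relabel loop (helper names invented).
def rollout(policy, website: str, n_steps: int = 5) -> list[str]:
    actions: list[str] = []
    for _ in range(n_steps):
        actions.append(policy(website, actions))  # interaction-first exploration
    return actions

def relabel(trajectory: list[str]) -> str | None:
    # Stand-in for the instruction-labeling language model; None means the
    # trajectory accomplishes nothing worth naming.
    return "create a new forum post" if "submit_form" in trajectory else None

def toy_policy(website: str, history: list[str]) -> str:
    return ["click_link", "type_text", "submit_form", "scroll", "go_back"][len(history)]

trajectory = rollout(toy_policy, "https://example.com")
instruction = relabel(trajectory)
if instruction is not None:
    demo = {"instruction": instruction, "actions": trajectory}  # SFT pair
    print(demo)
```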
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders