
Dzmitry Bahdanau

Core Industry Member
Canada CIFAR AI Chair
Assistant Professor, McGill University, School of Computer Science
AI Research Scientist, ServiceNow
Research Topics
Deep Learning
Natural Language Processing

Biography

Dzmitry Bahdanau is an assistant professor at McGill University and a research scientist at ServiceNow Element AI. He previously earned his PhD at Université de Montréal / Mila - Quebec Artificial Intelligence Institute, working with Yoshua Bengio. He is interested in fundamental and applied questions concerning natural language understanding. His main research areas include semantic parsing, language user interfaces, systematic generalization, and hybrid neural-symbolic systems.

Current Students

PhD - McGill
Co-supervisor:

Publications

RepoFusion: Training Code Models to Understand Your Repository
Disha Shrivastava
Denis Kocetkov
Harm de Vries
Torsten Scholak
Despite the huge success of Large Language Models (LLMs) in coding assistants like GitHub Copilot, these models struggle to understand the context present in the repository (e.g., imports, parent classes, files with similar names, etc.), thereby producing inaccurate code completions. This effect is more pronounced when using these assistants for repositories that the model has not seen during training, such as proprietary software or work-in-progress code projects. Recent work has shown the promise of using context from the repository during inference. In this work, we extend this idea and propose RepoFusion, a framework to train models to incorporate relevant repository context. Experiments on single-line code completion show that our models trained with repository context significantly outperform much larger code models such as CodeGen-16B-multi (
The Stack: 3 TB of permissively licensed source code
Denis Kocetkov
Raymond Li
Loubna Ben allal
Jia LI
Chenghao Mou
Carlos Muñoz Ferrandis
Yacine Jernite
Margaret Mitchell
Sean Hughes
Thomas Wolf
Leandro Von Werra
Harm de Vries
Large Language Models (LLMs) play an ever-increasing role in the field of Artificial Intelligence (AI), not only for natural language processing but also for code understanding and generation. To stimulate open and responsible research on LLMs for code, we introduce The Stack, a 3.1 TB dataset consisting of permissively licensed source code in 30 programming languages. We describe how we collect the full dataset, construct a permissively licensed subset, present a data governance plan, discuss limitations, and show promising results on text2code benchmarks by training 350M-parameter decoders on different Python subsets. We find that (1) near-deduplicating the data significantly boosts performance across all experiments, and (2) it is possible to match previously reported HumanEval and MBPP performance using only permissively licensed data. We make the dataset available at https://hf.co/BigCode, provide a tool called "Am I in The Stack" (https://hf.co/spaces/bigcode/in-the-stack) for developers to search The Stack for copies of their code, and provide a process for code to be removed from the dataset by following the instructions at https://www.bigcode-project.org/docs/about/the-stack/.
SantaCoder: don't reach for the stars!
Loubna Ben allal
Raymond Li
Denis Kocetkov
Chenghao Mou
Christopher Akiki
Carlos Muñoz Ferrandis
Niklas Muennighoff
Mayank Mishra
Alex Gu
Manan Dey
Logesh Kumar Umapathi
Carolyn Jane Anderson
Yangtian Zi
Joel Lamy Poirier
Hailey Schoelkopf
S. Troshin
Dmitry Abulkhanov
Manuel L. Romero
M. Lappert
Francesco De Toni
Bernardo García del Río
Qian Liu
Shamik Bose
Urvashi Bhattacharyya
Terry Yue Zhuo
Ian Yu
Paulo Villegas
Marco Zocca
Sourab Mangrulkar
D. Lansky
Huu Nguyen
Danish Contractor
Luisa Villa
Jia LI
Yacine Jernite
Sean Christopher Hughes
Daniel Fried
Arjun Guha
Harm de Vries
Leandro Von Werra
The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code. This tech report describes the progress of the collaboration until December 2022, outlining the current state of the Personally Identifiable Information (PII) redaction pipeline, the experiments conducted to de-risk the model architecture, and the experiments investigating better preprocessing methods for the training data. We train 1.1B parameter models on the Java, JavaScript, and Python subsets of The Stack and evaluate them on the MultiPL-E text-to-code benchmark. We find that more aggressive filtering of near-duplicates can further boost performance and, surprisingly, that selecting files from repositories with 5+ GitHub stars deteriorates performance significantly. Our best model outperforms previous open-source multilingual code generation models (InCoder-6.7B and CodeGen-Multi-2.7B) in both left-to-right generation and infilling on the Java, JavaScript, and Python portions of MultiPL-E, despite being a substantially smaller model. All models are released under an OpenRAIL license at https://hf.co/bigcode.
MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language Models to Generalize to Novel Interpretations
Arkil Patel
Satwik Bhattamishra
PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation
Gaurav Sahu
Olga Vechtomova
Issam Hadj Laradji
Data augmentation is a widely used technique to address the problem of text classification when there is a limited amount of training data. Recent work often tackles this problem using large language models (LLMs) like GPT3 that can generate new examples given already available ones. In this work, we propose a method to generate more helpful augmented data by utilizing the LLM's abilities to follow instructions and perform few-shot classifications. Our specific PromptMix method consists of two steps: 1) generate challenging text augmentations near class boundaries; however, generating borderline examples increases the risk of false positives in the dataset, so we 2) relabel the text augmentations using a prompting-based LLM classifier to enhance the correctness of labels in the generated data. We evaluate the proposed method in challenging 2-shot and zero-shot settings on four text classification datasets: Banking77, TREC6, Subjectivity (SUBJ), and Twitter Complaints. Our experiments show that generating and, crucially, relabeling borderline examples facilitates the transfer of knowledge of a massive LLM like GPT3.5-turbo into smaller and cheaper classifiers like DistilBERT.
On the Compositional Generalization Gap of In-Context Learning
Pretrained large generative language models have shown great performance on many tasks, but exhibit low compositional generalization abilities. Scaling such models has been shown to improve their performance on various NLP tasks even just by conditioning them on a few examples to solve the task without any fine-tuning (also known as in-context learning). In this work, we look at the gap between the in-distribution (ID) and out-of-distribution (OOD) performance of such models in semantic parsing tasks with in-context learning. In the ID settings, the demonstrations are from the same split (test or train) that the model is being evaluated on, and in the OOD settings, they are from the other split. We look at how the relative generalization gap of in-context learning evolves as models are scaled up. We evaluate four model families (OPT, BLOOM, CodeGen, and Codex) on three semantic parsing datasets (CFQ, SCAN, and GeoQuery) with different numbers of exemplars, and observe a trend of decreasing relative generalization gap as models are scaled up.
Compositional Generalization in Dependency Parsing
Compositionality, the ability to combine familiar units like words into novel phrases and sentences, has been the focus of intense interest in artificial intelligence in recent years. To test compositional generalization in semantic parsing, Keysers et al. (2020) introduced Compositional Freebase Queries (CFQ). This dataset maximizes the similarity between the test and train distributions over primitive units, like words, while maximizing the compound divergence: the dissimilarity between test and train distributions over larger structures, like phrases. Dependency parsing, however, lacks a compositional generalization benchmark. In this work, we introduce a gold-standard set of dependency parses for CFQ, and use this to analyze the behaviour of a state-of-the-art dependency parser (Qi et al., 2020) on the CFQ dataset. We find that increasing compound divergence degrades dependency parsing performance, although not as dramatically as semantic parsing performance. Additionally, we find the performance of the dependency parser does not uniformly degrade relative to compound divergence, and the parser performs differently on different splits with the same compound divergence. We explore a number of hypotheses for what causes the non-uniform degradation in dependency parsing performance, and identify a number of syntactic structures that drive the dependency parser's lower performance on the most challenging splits.
Combating False Negatives in Adversarial Imitation Learning
Konrad Żołna
Chitwan Saharia
Léonard Boussioux
David Y. T. Hui
Maxime Chevalier-Boisvert
In adversarial imitation learning, a discriminator is trained to differentiate agent episodes from expert demonstrations representing the desired behavior. However, as the trained policy learns to be more successful, the negative examples (the ones produced by the agent) become increasingly similar to expert ones. Despite the fact that the task is successfully accomplished in some of the agent's trajectories, the discriminator is trained to output low values for them. We hypothesize that this inconsistent training signal for the discriminator can impede its learning and consequently lead to worse overall performance of the agent. We show experimental evidence for this hypothesis and that the 'False Negatives' (i.e., successful agent episodes) significantly hinder adversarial imitation learning, which is the first contribution of this paper. Then, we propose a method to alleviate the impact of false negatives and test it on the BabyAI environment. This method consistently improves sample efficiency over the baselines by at least an order of magnitude.
Understanding by Understanding Not: Modeling Negation in Language Models
Negation is a core construction in natural language. Despite being very successful on many tasks, state-of-the-art pre-trained language models often handle negation incorrectly. To improve language models in this regard, we propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences from a raw text corpus. By training BERT with the resulting combined objective we reduce the mean top-1 error rate to 4% on the negated LAMA dataset. We also see some improvements on the negated NLI benchmarks.
BabyAI 1.1
David Y. T. Hui
Maxime Chevalier-Boisvert
The BabyAI platform is designed to measure the sample efficiency of training an agent to follow grounded-language instructions. BabyAI 1.0 presents baseline results of an agent trained by deep imitation or reinforcement learning. BabyAI 1.1 improves the agent's architecture in three minor ways. This increases reinforcement learning sample efficiency by up to 3× and improves imitation learning performance on the hardest level from 77% to 90.4%. We hope that these improvements increase the computational efficiency of BabyAI experiments and help users design better agents.