Publications

Online Continual Learning of Video Diffusion Models From a Single Video Stream
Jinsoo Yoo
Dylan Green
Geoff Pleiss
Recurrent Policies Are Not Enough for Continual Reinforcement Learning
Nathan Samuel de Lara
Veronica Chelu
Continual Reinforcement Learning (CRL) aims to develop algorithms that adapt to non-stationary sequences of tasks. A promising recent approach utilizes Recurrent Neural Networks (RNNs) to learn contextual Markov Decision Process (MDP) embeddings. This enables a reinforcement learning (RL) agent to discern the optimality of actions across diverse tasks. In this study, we examine two critical failure modes in the learning of these contextual MDP embeddings. Specifically, we find that RNNs are prone to catastrophic forgetting, manifesting in two distinct ways: (i) embedding collapse, where agents initially learn a contextual task structure that later collapses to a single task, and (ii) embedding drift, where learning embeddings for new MDPs interferes with embeddings the RNN outputs for previous MDPs in the sequence, leading to suboptimal performance of downstream policy networks conditioned on stale embeddings. We explore the effects of various objective functions and network architectures concerning these failure modes, revealing that one of these modes consistently emerges across different setups.
BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning
Artem Zholus
Maksim Kuznetsov
Roman Schutski
Shayakhmetov Rim
Daniil Polykovskiy
Alex Zhavoronkov
Generating novel active molecules for a given protein is an extremely challenging task for generative models that requires an understanding of the complex physical interactions between the molecule and its environment. In this paper, we present a novel generative model, BindGPT, which uses a conceptually simple but powerful approach to create 3D molecules within the protein's binding site. Our model produces molecular graphs and conformations jointly, eliminating the need for an extra graph reconstruction step. We pretrain BindGPT on a large-scale dataset and fine-tune it with reinforcement learning using scores from external simulation software. We demonstrate how a single pretrained language model can serve at the same time as a 3D molecular generative model, a conformer generator conditioned on the molecular graph, and a pocket-conditioned 3D molecule generator. Notably, the model does not make any representational equivariance assumptions about the domain of generation. We show how such a simple conceptual approach combined with pretraining and scaling can perform on par with or better than the current best specialized diffusion models, language models, and graph neural networks while being two orders of magnitude cheaper to sample.
DeCoDEx: Confounder Detector Guidance for Improved Diffusion-based Counterfactual Explanations
Nima Fathi
Amar Kumar
Brennan Nichyporuk
Mohammad Havaei
Deep learning classifiers are prone to latching onto dominant confounders present in a dataset rather than on the causal markers associated with the target class, leading to poor generalization and biased predictions. Although explainability via counterfactual image generation has been successful at exposing the problem, bias mitigation strategies that permit accurate explainability in the presence of dominant and diverse artifacts remain unsolved. In this work, we propose the DeCoDEx framework and show how an external, pre-trained binary artifact detector can be leveraged during inference to guide a diffusion-based counterfactual image generator towards accurate explainability. Experiments on the CheXpert dataset, using both synthetic artifacts and real visual artifacts (support devices), show that the proposed method successfully synthesizes counterfactual images that change the causal pathology markers associated with Pleural Effusion while preserving or ignoring the visual artifacts. Augmentation of ERM and Group-DRO classifiers with the DeCoDEx-generated images substantially improves the results across underrepresented groups that are out of distribution for each class. The code is made publicly available at https://github.com/NimaFathi/DeCoDEx.
Early Detection of an Invasive Alien Plant (Phragmites australis) Using Unoccupied Aerial Vehicles and Artificial Intelligence
Antoine Caron-Guay
Mickaël Germain
The combination of unoccupied aerial vehicles (UAVs) and artificial intelligence to map vegetation represents a promising new approach to improve the detection of invasive alien plant species (IAPS). The high spatial resolution achievable with UAVs and recent innovations in computer vision, especially with convolutional neural networks, suggest that early detection of IAPS could be possible, thus facilitating their management. In this study, we evaluated the suitability of this approach for mapping the location of common reed (Phragmites australis subsp. australis) within a national park located in southern Quebec, Canada. We collected data on six distinct dates during the growing season, covering environments with different levels of reed invasion. Overall, model performance was high for the different dates and zones, especially for recall (mean of 0.89). The results showed an increase in performance, reaching a peak following the appearance of the inflorescence in September (highest F1-score at 0.98). Furthermore, a decrease in spatial resolution negatively affected recall (18% decrease between a spatial resolution of 0.15 cm pixel⁻¹ and 1.50 cm pixel⁻¹) but did not have a strong impact on precision (2% decrease). Despite challenges associated with common reed mapping in a post-treatment monitoring context, the use of UAVs and deep learning shows great potential for IAPS detection when supported by a suitable dataset. Our results show that, from an operational point of view, this approach could be an effective tool for speeding up the work of biologists in the field and ensuring better management of IAPS.
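For readers less familiar with the metrics quoted in this abstract, the following is a minimal sketch of how precision, recall, and F1-score relate to detection counts. The function name and the example counts are hypothetical and are not taken from the study.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall and F1 from detection counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Returns 0.0 for any metric whose denominator would be zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only (not data from the paper).
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Recall drops when reed patches are missed (false negatives), while precision drops when other vegetation is flagged as reed (false positives), which is why the two can respond differently to a change in spatial resolution.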
Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance
Reyhane Askari Hemmat
Melissa Hall
Alicia Sun
Candace Ross
Michal Drozdzal
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
Benjamin Bucknall
Andreas Haupt
Kevin Wei
Jérémy Scheurer
Marius Hobbhahn
Lee Sharkey
Satyapriya Krishna
Marvin Von Hagen
Silas Alberti
Alan Chan
Qinyi Sun
Michael Gerovitch
David Bau
Max Tegmark
Dylan Hadfield-Menell
External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of system access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on black-box access, in which auditors can only query the system and observe its outputs. However, white-box access to the system's inner workings (e.g., weights, activations, gradients) allows an auditor to perform stronger attacks, more thoroughly interpret models, and conduct fine-tuning. Meanwhile, outside-the-box access to its training and deployment information (e.g., methodology, code, documentation, hyperparameters, data, deployment details, findings from internal evaluations) allows auditors to scrutinize the development process and design more targeted evaluations. In this paper, we examine the limitations of black-box audits and the advantages of white- and outside-the-box audits. We also discuss technical, physical, and legal safeguards for performing these audits with minimal security risks. Given that different forms of access can lead to very different levels of evaluation, we conclude that (1) transparency regarding the access and methods used by auditors is necessary to properly interpret audit results, and (2) white- and outside-the-box access allow for substantially more scrutiny than black-box access alone.
Characterizing and Classifying Developer Forum Posts with their Intentions
Xingfang Wu
Eric Laufer
Heng Li
Santhosh Srinivasan
Jayden Luo
With the rapid growth of the developer community, the number of posts on online technical forums has been growing rapidly, which makes it difficult for users to filter useful posts and find important information. Tags provide a concise feature dimension for users to locate posts of interest and for search engines to index the most relevant posts according to the queries. However, most tags focus only on the technical perspective (e.g., programming language, platform, tool). In most cases, forum posts in online developer communities reveal the author's intentions to solve a problem, ask for advice, share information, etc. Modeling the intentions of posts can provide an extra dimension to the current tag taxonomy. By referencing previous studies and learning from industrial perspectives, we create a refined taxonomy for the intentions of technical forum posts. Through manual labeling and analysis of a sampled post dataset extracted from online forums, we examine the relationship between the constitution of posts (code, error messages) and their intentions. Furthermore, inspired by our manual study, we design a pre-trained transformer-based model to automatically predict post intentions. The best variant of our intention prediction framework, which achieves a Micro F1-score of 0.589, Top 1-3 accuracy of 62.6% to 87.8%, and an average AUC of 0.787, outperforms the state-of-the-art baseline approach. Our characterization and automated classification of forum posts regarding their intentions may help forum maintainers or third-party tool developers improve the organization and retrieval of posts on technical forums. We have released our annotated dataset and code in our supplementary material package.
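The two headline metrics in this abstract can be illustrated with a short sketch. It assumes a multi-label setting in which each post has a set of true intentions and the model returns a ranked list of predicted intentions; that setup, the function names, and the label strings are assumptions for illustration, not the authors' evaluation code.

```python
def micro_f1(true_sets: list[set[str]], pred_sets: list[set[str]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all posts and labels."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def top_k_accuracy(true_sets: list[set[str]], ranked: list[list[str]], k: int) -> float:
    """Fraction of posts with at least one true label among the top-k predictions."""
    if not true_sets:
        return 0.0
    hits = sum(1 for t, r in zip(true_sets, ranked) if t & set(r[:k]))
    return hits / len(true_sets)


# Hypothetical labels loosely based on the intentions named in the abstract.
truth = [{"ask_advice"}, {"solve_problem", "share_info"}]
predicted = [{"ask_advice"}, {"solve_problem"}]
ranking = [["share_info", "ask_advice"], ["ask_advice", "share_info"]]

print(micro_f1(truth, predicted))
print(top_k_accuracy(truth, ranking, k=1), top_k_accuracy(truth, ranking, k=2))
```

Top-k accuracy can only stay equal or rise as k grows, which is consistent with the range reported for Top 1 through Top 3.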
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models
Jessica Ojo
Israel Abebe Azime
Zhuang Yun Jian
Jesujoba Oluwadara Alabi
Xuanli He
Millicent Ochieng
Sara Hooker
Andiswa Bukula
En-Shiun Annie Lee
Chiamaka Ijeoma Chukwuneke
Happy Buzaaba
Blessing Kudzaishe Sibanda
Godson Kalipe
Jonathan Mukiibi
Salomon Kabongo
Foutse Yuehgoh
M. Setaka
Lolwethu Ndolela
Nkiruka Bridget Odu
Rooweither Mabuya
Shamsuddeen Hassan Muhammad
Salomey Osei
Sokhar Samb
Tadesse Kebede Guge
Pontus Stenetorp
Despite the widespread adoption of large language models (LLMs), their remarkable capabilities remain limited to a few high-resource languages. Additionally, many low-resource languages (e.g., African languages) are often evaluated only on basic text classification tasks due to the lack of appropriate or comprehensive benchmarks outside of high-resource languages. In this paper, we introduce IrokoBench, a human-translated benchmark dataset for 16 typologically diverse low-resource African languages covering three tasks: natural language inference (AfriXNLI), mathematical reasoning (AfriMGSM), and multi-choice knowledge-based QA (AfriMMLU). We use IrokoBench to evaluate zero-shot, few-shot, and translate-test settings (where test sets are translated into English) across 10 open and four proprietary LLMs. Our evaluation reveals a significant performance gap between high-resource languages (such as English and French) and low-resource African languages. We observe a significant performance gap between open and proprietary models, with the highest-performing open model, Aya-101, reaching only 58% of the performance of the best-performing proprietary model, GPT-4o. Machine translating the test set to English before evaluation helped to close the gap for larger models that are English-centric, like LLaMa 3 70B. These findings suggest that more efforts are needed to develop and adapt LLMs for African languages.
Machine Learning Data Practices through a Data Curation Lens: An Evaluation Framework
Eshta Bhardwaj
Harshit Gujral
Siyi Wu
Ciara Zogheib
Christoph Becker
Studies of dataset development in machine learning call for greater attention to the data practices that make model development possible and shape its outcomes. Many argue that the adoption of theory and practices from archives and data curation fields can support greater fairness, accountability, transparency, and more ethical machine learning. In response, this paper examines data practices in machine learning dataset development through the lens of data curation. We evaluate data practices in machine learning as data curation practices. To do so, we develop a framework for evaluating machine learning datasets using data curation concepts and principles through a rubric. Through a mixed-methods analysis of evaluation results for 25 ML datasets, we study the feasibility of data curation principles to be adopted for machine learning data work in practice and explore how data curation is currently performed. We find that researchers in machine learning, which often emphasizes model development, struggle to apply standard data curation principles. Our findings illustrate difficulties at the intersection of these fields, such as evaluating dimensions that have shared terms in both fields but non-shared meanings, a high degree of interpretative flexibility in adapting concepts without prescriptive restrictions, obstacles in limiting the depth of data curation expertise needed to apply the rubric, and challenges in scoping the extent of documentation dataset creators are responsible for. We propose ways to address these challenges and develop an overall framework for evaluation that outlines how data curation concepts and methods can inform machine learning data practices.
Meta's AI translation model embraces overlooked languages.
Noisy Data Visualization using Functional Data Analysis
Haozhe Chen
Andres Felipe Duque Correa
Kevin R. Moon
Data visualization via dimensionality reduction is an important tool in exploratory data analysis. However, when the data are noisy, many existing methods fail to capture the underlying structure of the data. The method called Empirical Intrinsic Geometry (EIG) was previously proposed for performing dimensionality reduction on high dimensional dynamical processes while theoretically eliminating all noise. However, implementing EIG in practice requires the construction of high-dimensional histograms, which suffer from the curse of dimensionality. Here we propose a new data visualization method called Functional Information Geometry (FIG) for dynamical processes that adapts the EIG framework while using approaches from functional data analysis to mitigate the curse of dimensionality. We experimentally demonstrate that the resulting method outperforms a variant of EIG designed for visualization in terms of capturing the true structure, hyperparameter robustness, and computational speed. We then use our method to visualize EEG brain measurements of sleep activity.
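The curse of dimensionality mentioned for high-dimensional histograms can be made concrete with a small calculation. This is only a sketch assuming a regular grid with the same number of bins along every axis; the function names and the sample size are hypothetical and do not come from the paper.

```python
def histogram_cells(bins_per_axis: int, dims: int) -> int:
    """Number of cells in a regular histogram grid: bins_per_axis ** dims."""
    return bins_per_axis ** dims


def mean_samples_per_cell(n_samples: int, bins_per_axis: int, dims: int) -> float:
    """Average number of samples per cell if samples were spread evenly."""
    return n_samples / histogram_cells(bins_per_axis, dims)


# Hypothetical setting: 10 bins per axis and 100,000 samples.
for d in (1, 2, 5, 10):
    cells = histogram_cells(10, d)
    density = mean_samples_per_cell(100_000, 10, d)
    print(f"dims={d:2d} cells={cells:,} mean samples per cell={density:g}")
```

Because the cell count grows exponentially with dimension, a fixed sample budget quickly leaves most cells empty, which is the practical obstacle the abstract attributes to histogram-based EIG.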