Publications

Tensor-based Space Debris Detection for Satellite Mega-constellations

Olivier Daoust

Hasan Nayir

Irfan Azam

Antoine Lesage-Landry

Gunes Karabulut Kurt

2024-06-08

2024 IEEE International Conference on Communications Workshops (ICC Workshops) (published)

doi.org

arxiv.org

Why Don't Prompt-Based Fairness Metrics Correlate?

Abdelrahman Zayed

Goncalo Mordido

Ioana Baldini

A. Chandar

The widespread use of large language models has brought up essential questions about the potential biases these models might learn. This led to the development of several metrics aimed at evaluating and mitigating these biases. In this paper, we first demonstrate that prompt-based fairness metrics exhibit poor agreement, as measured by correlation, raising important questions about the reliability of fairness assessment using prompts. Then, we outline six relevant reasons why such a low correlation is observed across existing metrics. Based on these insights, we propose a method called Correlated Fairness Output (CAIRO) to enhance the correlation between fairness metrics. CAIRO augments the original prompts of a given fairness metric by using several pre-trained language models and then selects the combination of the augmented prompts that achieves the highest correlation across metrics. We show a significant improvement in Pearson correlation from 0.3 and 0.18 to 0.90 and 0.98 across metrics for gender and religion biases, respectively. Our code is available at https://github.com/chandar-lab/CAIRO.

2024-06-08

ArXiv (preprint)

doi.org

arxiv.org
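
The selection step described in the CAIRO abstract above (pick the combination of augmented prompts that maximizes the Pearson correlation between two fairness metrics) can be illustrated with a toy sketch. This is not the authors' implementation (their code is at the repository linked in the abstract). The function names, the brute-force search, the averaging over prompt variants, and all scores below are assumptions made up for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def metric_scores(per_prompt_scores, subset):
    """Average each model's score over the chosen prompt variants (assumed aggregation)."""
    return [
        sum(model_scores[i] for i in subset) / len(subset)
        for model_scores in per_prompt_scores
    ]


def best_subset(metric_a, metric_b, k):
    """Brute-force search over k-sized prompt subsets for each metric.

    metric_a[m][p] is a hypothetical bias score of model m under prompt
    variant p of metric A (likewise for metric_b). Returns (r, subset_a, subset_b).
    """
    best = None
    for sub_a in combinations(range(len(metric_a[0])), k):
        for sub_b in combinations(range(len(metric_b[0])), k):
            r = pearson(metric_scores(metric_a, sub_a), metric_scores(metric_b, sub_b))
            if best is None or r > best[0]:
                best = (r, sub_a, sub_b)
    return best


# Made-up scores: 4 models (rows) x 3 augmented prompt variants (columns).
metric_a = [[0.1, 0.5, 0.2], [0.4, 0.1, 0.3], [0.6, 0.9, 0.5], [0.8, 0.2, 0.9]]
metric_b = [[0.2, 0.7, 0.1], [0.3, 0.2, 0.4], [0.7, 0.3, 0.6], [0.9, 0.8, 0.8]]

r, subset_a, subset_b = best_subset(metric_a, metric_b, k=2)
print(f"best correlation {r:.3f} with prompts {subset_a} (A) and {subset_b} (B)")
```

An exhaustive search like this only scales to a handful of prompt variants; it is used here purely to make the selection criterion concrete.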

A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques

Megh Thakkar

Quentin Fournier

Matthew D Riemer

Pin-Yu Chen

Amal Zouaq

Payel Das

A. Chandar

2024-06-06

ArXiv (preprint)

doi.org

arxiv.org

Lifelong Learning of Video Diffusion Models From a Single Video Stream

Jinsoo Yoo

Yingchen He

Saeid Naderiparizi

Dylan Green

Gido M van de Ven

Geoff Pleiss

Frank N. Wood

2024-06-06

ArXiv (preprint)

openreview.net

Online Continual Learning of Video Diffusion Models From a Single Video Stream

Jinsoo Yoo

Yingchen He

Saeid Naderiparizi

Dylan Green

Gido M van de Ven

Geoff Pleiss

Frank N. Wood

2024-06-06

ArXiv (preprint)

doi.org

arxiv.org

Recurrent Policies Are Not Enough for Continual Reinforcement Learning

Nathan Samuel de Lara

Veronica Chelu

Doina Precup

Continual Reinforcement Learning (CRL) aims to develop algorithms that adapt to non-stationary sequences of tasks. A promising recent approach utilizes Recurrent Neural Networks (RNNs) to learn contextual Markov Decision Process (MDP) embeddings. This enables a reinforcement learning (RL) agent to discern the optimality of actions across diverse tasks. In this study, we examine two critical failure modes in the learning of these contextual MDP embeddings. Specifically, we find that RNNs are prone to catastrophic forgetting, manifesting in two distinct ways: (i) embedding collapse, where agents initially learn a contextual task structure that later collapses to a single task, and (ii) embedding drift, where learning embeddings for new MDPs interferes with embeddings the RNN outputs for previous MDPs in the sequence, leading to suboptimal performance of downstream policy networks conditioned on stale embeddings. We explore the effects of various objective functions and network architectures concerning these failure modes, revealing that one of these modes consistently emerges across different setups.

2024-06-06

rl-conference.cc/RLC/2024/Workshop/ICBINB (poster)

openreview.net

Early Detection of an Invasive Alien Plant (Phragmites australis) Using Unoccupied Aerial Vehicles and Artificial Intelligence

Antoine Caron-Guay

Mickaël Germain

Étienne Laliberté

The combination of unoccupied aerial vehicles (UAVs) and artificial intelligence to map vegetation represents a promising new approach to improve the detection of invasive alien plant species (IAPS). The high spatial resolution achievable with UAVs and recent innovations in computer vision, especially with convolutional neural networks, suggest that early detection of IAPS could be possible, thus facilitating their management. In this study, we evaluated the suitability of this approach for mapping the location of common reed (Phragmites australis subsp. australis) within a national park located in southern Quebec, Canada. We collected data on six distinct dates during the growing season, covering environments with different levels of reed invasion. Overall, model performance was high for the different dates and zones, especially for recall (mean of 0.89). The results showed an increase in performance, reaching a peak following the appearance of the inflorescence in September (highest F1-score at 0.98). Furthermore, a decrease in spatial resolution negatively affected recall (18% decrease between a spatial resolution of 0.15 cm pixel−1 and 1.50 cm pixel−1) but did not have a strong impact on precision (2% decrease). Despite challenges associated with common reed mapping in a post-treatment monitoring context, the use of UAVs and deep learning shows great potential for IAPS detection when supported by a suitable dataset. Our results show that, from an operational point of view, this approach could be an effective tool for speeding up the work of biologists in the field and ensuring better management of IAPS.

2024-06-05

bioRxiv (preprint)

doi.org

Black-Box Access is Insufficient for Rigorous AI Audits

Stephen Casper

Carson Ezell

Charlotte Siegmann

Noam Kolt

Taylor Lynn Curtis

Benjamin Bucknall

Andreas Haupt

Kevin Wei

Jérémy Scheurer

Marius Hobbhahn

Lee Sharkey

Satyapriya Krishna

Marvin Von Hagen

Silas Alberti

Alan Chan

Qinyi Sun

Michael Gerovitch

David Bau

Max Tegmark

David Krueger … (1 more author)

Dylan Hadfield-Menell

External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on black-box access, in which auditors can only query the system and observe its outputs. However, white-box access to the system's inner workings (e.g., weights, activations, gradients) allows an auditor to perform stronger attacks, more thoroughly interpret models, and conduct fine-tuning. Meanwhile, outside-the-box access to training and deployment information (e.g., methodology, code, documentation, data, deployment details, findings from internal evaluations) allows auditors to scrutinize the development process and design more targeted evaluations. In this paper, we examine the limitations of black-box audits and the advantages of white- and outside-the-box audits. We also discuss technical, physical, and legal safeguards for performing these audits with minimal security risks. Given that different forms of access can lead to very different levels of evaluation, we conclude that (1) transparency regarding the access and methods used by auditors is necessary to properly interpret audit results, and (2) white- and outside-the-box access allow for substantially more scrutiny than black-box access alone.

2024-06-04

The 2024 ACM Conference on Fairness, Accountability, and Transparency (published)

doi.org

arxiv.org

Characterizing and Classifying Developer Forum Posts with their Intentions

Xingfang Wu

Eric Laufer

Heng Li

Foutse Khomh

Santhosh Srinivasan

Jayden Luo

With the rapid growth of the developer community, the number of posts on online technical forums has been growing rapidly, which makes it difficult for users to filter useful posts and find important information. Tags provide a concise feature dimension for users to locate posts of interest and for search engines to index the most relevant posts according to the queries. However, most tags are only focused on the technical perspective (e.g., programming language, platform, tool). In most cases, forum posts in online developer communities reveal the author's intentions to solve a problem, ask for advice, share information, etc. The modeling of the intentions of posts can provide an extra dimension to the current tag taxonomy. By referencing previous studies and learning from industrial perspectives, we create a refined taxonomy for the intentions of technical forum posts. Through manual labeling and analysis on a sampled post dataset extracted from online forums, we understand the relevance between the constitution of posts (code, error messages) and their intentions. Furthermore, inspired by our manual study, we design a pre-trained transformer-based model to automatically predict post intentions. The best variant of our intention prediction framework, which achieves a Micro F1-score of 0.589, Top 1-3 accuracy of 62.6% to 87.8%, and an average AUC of 0.787, outperforms the state-of-the-art baseline approach. Our characterization and automated classification of forum posts regarding their intentions may help forum maintainers or third-party tool developers improve the organization and retrieval of posts on technical forums. We have released our annotated dataset and code in our supplementary material package.

2024-06-04

Empirical Software Engineering (published)

doi.org

arxiv.org

Individual Brain Charting dataset extension, third release for movie watching and retinotopy data

Ana Luísa Pinho

Hugo Richard

Ana Fernanda Ponce

Michael Eickenberg

Alexis Amadon

Elvis Dopgima Dohmatob

Isabelle Denghien

Juan Jesús Torre

Swetha Shankar

Himanshu Aggarwal

Alexis Thual

Thomas Chapalain

Chantal Ginisty

Séverine Becuwe-Desmidt

Séverine Roger

Yann Lecomte

Valérie Berland

Laurence Laurier

Véronique Joly-Testault

Gaëlle Médiouni-Cloarec … (6 more authors)

Christine Doublé

Bernadette Martins

Gael Varoquaux

Stanislas Dehaene

Lucie Hertz-Pannier

Bertrand Thirion

2024-06-04

Scientific Data (published)

doi.org

IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models

David Ifeoluwa Adelani

Jessica Ojo

Israel Abebe Azime

Zhuang Yun Jian

Jesujoba Oluwadara Alabi

Xuanli He

Millicent Ochieng

Sara Hooker

Andiswa Bukula

En-Shiun Annie Lee

Chiamaka Ijeoma Chukwuneke

Happy Buzaaba

Blessing Kudzaishe Sibanda

Godson Kalipe

Jonathan Mukiibi

Salomon Kabongo

Foutse Yuehgoh

M. Setaka

Lolwethu Ndolela

Nkiruka Bridget Odu … (6 more authors)

Rooweither Mabuya

Shamsuddeen Hassan Muhammad

Salomey Osei

Sokhar Samb

Tadesse Kebede Guge

Pontus Stenetorp

Despite the widespread adoption of large language models (LLMs), their remarkable capabilities remain limited to a few high-resource languages. Additionally, many low-resource languages (e.g., African languages) are often evaluated only on basic text classification tasks due to the lack of appropriate or comprehensive benchmarks outside of high-resource languages. In this paper, we introduce IrokoBench, a human-translated benchmark dataset for 16 typologically diverse low-resource African languages covering three tasks: natural language inference (AfriXNLI), mathematical reasoning (AfriMGSM), and multi-choice knowledge-based QA (AfriMMLU). We use IrokoBench to evaluate zero-shot, few-shot, and translate-test settings (where test sets are translated into English) across 10 open and four proprietary LLMs. Our evaluation reveals a significant performance gap between high-resource languages (such as English and French) and low-resource African languages. We observe a significant performance gap between open and proprietary models, with the highest-performing open model, Aya-101, reaching only 58% of the performance of the best-performing proprietary model, GPT-4o. Machine translating the test set to English before evaluation helped to close the gap for larger models that are English-centric, like LLaMa 3 70B. These findings suggest that more efforts are needed to develop and adapt LLMs for African languages.

2024-06-04

ArXiv (preprint)

doi.org

arxiv.org

Machine Learning Data Practices through a Data Curation Lens: An Evaluation Framework

Eshta Bhardwaj

Harshit Gujral

Siyi Wu

Ciara Zogheib

Tegan Maharaj

Christoph Becker

Studies of dataset development in machine learning call for greater attention to the data practices that make model development possible and shape its outcomes. Many argue that the adoption of theory and practices from archives and data curation fields can support greater fairness, accountability, transparency, and more ethical machine learning. In response, this paper examines data practices in machine learning dataset development through the lens of data curation. We evaluate data practices in machine learning as data curation practices. To do so, we develop a framework for evaluating machine learning datasets using data curation concepts and principles through a rubric. Through a mixed-methods analysis of evaluation results for 25 ML datasets, we study the feasibility of data curation principles to be adopted for machine learning data work in practice and explore how data curation is currently performed. We find that researchers in machine learning, which often emphasizes model development, struggle to apply standard data curation principles. Our findings illustrate difficulties at the intersection of these fields, such as evaluating dimensions that have shared terms in both fields but non-shared meanings, a high degree of interpretative flexibility in adapting concepts without prescriptive restrictions, obstacles in limiting the depth of data curation expertise needed to apply the rubric, and challenges in scoping the extent of documentation dataset creators are responsible for. We propose ways to address these challenges and develop an overall framework for evaluation that outlines how data curation concepts and methods can inform machine learning data practices.

2024-06-04

The 2024 ACM Conference on Fairness, Accountability, and Transparency (published)

doi.org

arxiv.org
