Publications

The Cost of Arbitrariness for Individuals: Examining the Legal and Technical Challenges of Model Multiplicity
Prakhar Ganesh
Ihsan Ibrahim Daldaban
Ignacio Cofone
Model multiplicity, the phenomenon where multiple models achieve similar performance despite different underlying learned functions, introdu… (voir plus)ces arbitrariness in model selection. While this arbitrariness may seem inconsequential in expectation, its impact on individuals can be severe. This paper explores various individual concerns stemming from multiplicity, including the effects of arbitrariness beyond final predictions, disparate arbitrariness for individuals belonging to protected groups, and the challenges associated with the arbitrariness of a single algorithmic system creating a monopoly across various contexts. It provides both an empirical examination of these concerns and a comprehensive analysis from the legal standpoint, addressing how these issues are perceived in the anti-discrimination law in Canada. We conclude the discussion with technical challenges in the current landscape of model multiplicity to meet legal requirements and the legal gap between current law and the implications of arbitrariness in model selection, highlighting relevant future research directions for both disciplines.
Towards a Reliable French Speech Recognition Tool for an Automated Diagnosis of Learning Disabilities
Jihene Rezgui
Félix Jobin
Younes Kechout
Chritine Turgeon
Dyslexia, characterized by severe challenges in reading and spelling acquisition, presents a substantial barrier to proficient literacy, res… (voir plus)ulting in significantly reduced reading speed (2 to 3 times slower) and diminished text comprehension. With a prevalence ranging from 5G to 10% in the population, early intervention by speech and language pathologists (SLPs) can mitigate dyslexia's effects, but the diagnosis bottleneck impedes timely support. To address this, we propose leveraging machine learning tools to expedite the diagnosis process, focusing on automating phonetic transcription, a critical step in dyslexia assessment. We investigated the practicality of two model configurations utilizing Google's speech-to-text API with children speech in evaluation scenarios and compared their results against transcriptions crafted by experts. The first configuration focuses on Google API's speech-to-text while the second integrates Phonemizer, a text-to-phonemes tool based on a dictionary. Results analysis indicate that our Google-Phonemizer model yields reading accuracies comparable to those computed from human-made transcriptions, offering promise for clinical application. These findings underscore the potential of AI-driven solutions to enhance dyslexia diagnosis efficiency, paving the way for improved accessibility to vital SLP services.
Understanding Intrinsic Socioeconomic Biases in Large Language Models
Mina Arzaghi
Florian Carichon
Large Language Models (LLMs) are increasingly integrated into critical decision-making processes, such as loan approvals and visa applicatio… (voir plus)ns, where inherent biases can lead to discriminatory outcomes. In this paper, we examine the nuanced relationship between demographic attributes and socioeconomic biases in LLMs, a crucial yet understudied area of fairness in LLMs. We introduce a novel dataset of one million English sentences to systematically quantify socioeconomic biases across various demographic groups. Our findings reveal pervasive socioeconomic biases in both established models such as GPT-2 and state-of-the-art models like Llama 2 and Falcon. We demonstrate that these biases are significantly amplified when considering intersectionality, with LLMs exhibiting a remarkable capacity to extract multiple demographic attributes from names and then correlate them with specific socioeconomic biases. This research highlights the urgent necessity for proactive and robust bias mitigation techniques to safeguard against discriminatory outcomes when deploying these powerful models in critical real-world applications.
Advancing Cultural Inclusivity: Optimizing Embedding Spaces for Balanced Music Recommendations
Armin Moradi
Nicola Neophytou
Estimating Expectations without Sampling: Neural Stein Estimation
Mohsin Hasan
Dinghuai Zhang
Cheikh Ahmed
Awa Khouna
We propose a method for estimating the expected value of a given function …
Implicitly Bayesian Prediction Rules in Deep Learning
Bruno Mlodozeniec
Richard E. Turner
The Bayesian approach leads to coherent updates of predictions under new data, which makes adhering to Bayesian principles appealing in deci… (voir plus)sion-making contexts. Traditionally, integrating Bayesian principles into models like deep neural networks involves setting priors on parameters and approximating posteriors. This is done despite the fact that, typically, priors on parameters reflect any prior beliefs only insofar as they dictate function space behaviour. In this paper, we rethink this approach and consider what properties characterise a prediction rule as being Bayesian. Algorithms meeting such criteria can be deemed implicitly Bayesian — they make the same predictions as some Bayesian model, without explicitly manifesting priors and posteriors. We argue this might be a more fruitful approach towards integrating Bayesian principles into deep learning. In this paper, we propose how to measure how close a general prediction rule is to being implicitly Bayesian, and empirically evaluate multiple prediction strategies using our approach. We also show theoretically that agents relying on non-implicitly Bayesian prediction rules can be easily exploited in adversarial betting settings.
An Introduction to Vision-Language Modeling
Florian Bordes
Richard Yuanzhe Pang
Anurag Ajay
Alexander C. Li
Adrien Bardes
Suzanne Petryk
Oscar Mañas
Zhiqiu Lin
Anas Mahmoud
Bargav Jayaraman
Mark Ibrahim
Melissa Hall
Yunyang Xiong
Jonathan Lebensold
Candace Ross
Srihari Jayakumar
Chuan Guo
Diane Bouchacourt
Haider Al-Tahan
Karthik Padthe … (voir 21 de plus)
Vasu Sharma
Huijuan Xu 0001
Xiaoqing Ellen Tan
Megan Richards
Samuel Lavoie
Pietro Astolfi
Reyhane Askari Hemmat
Jun Chen
Kushal Tirumala
Rim Assouel
Mazda Moayeri
Arjang Talattof
Kamalika Chaudhuri
Zechun Liu
Xilun Chen
Quentin Garrido
Karen Ullrich
Kate Saenko
Asli Celikyilmaz
Vikas Chandra
Neuro-GPT: Towards A Foundation Model for EEG
Wenhui Cui
Woojae Jeong
Philipp Thölke
Takfarinas Medani
Anand A. Joshi
Richard M. Leahy
To handle the scarcity and heterogeneity of electroencephalography (EEG) data for Brain-Computer Interface (BCI) tasks, and to harness the p… (voir plus)ower of large publicly available data sets, we propose Neuro-GPT, a foundation model consisting of an EEG encoder and a GPT model. The foundation model is pre-trained on a large-scale data set using a self-supervised task that learns how to reconstruct masked EEG segments. We then fine-tune the model on a Motor Imagery Classification task to validate its performance in a low-data regime (9 subjects). Our experiments demonstrate that applying a foundation model can significantly improve classification performance compared to a model trained from scratch, which provides evidence for the generalizability of the foundation model and its ability to address challenges of data scarcity and heterogeneity in EEG. The code is publicly available at github.com/wenhui0206/NeuroGPT.
Partial Models for Building Adaptive Model-Based Reinforcement Learning Agents
Safa Alver
Ali Rahimi-Kalahroudi
In neuroscience, one of the key behavioral tests for determining whether a subject of study exhibits model-based behavior is to study its ad… (voir plus)aptiveness to local changes in the environment. In reinforcement learning, however, recent studies have shown that modern model-based agents display poor adaptivity to such changes. The main reason for this is that modern agents are typically designed to improve sample efficiency in single task settings and thus do not take into account the challenges that can arise in other settings. In local adaptation settings, one particularly important challenge is in quickly building and maintaining a sufficiently accurate model after a local change. This is challenging for deep model-based agents as their models and replay buffers are monolithic structures lacking distribution shift handling capabilities. In this study, we show that the conceptually simple idea of partial models can allow deep model-based agents to overcome this challenge and thus allow for building locally adaptive model-based agents. By modeling the different parts of the state space through different models, the agent can not only maintain a model that is accurate across the state space, but it can also quickly adapt it in the presence of a local change in the environment. We demonstrate this by showing that the use of partial models in agents such as deep Dyna-Q, PlaNet and Dreamer can allow for them to effectively adapt to the local changes in their environments.
WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average
Louis Fournier
Adel Nabli
Masih Aminbeidokhti
Marco Pedersoli
Edouard Oyallon
The performance of deep neural networks is enhanced by ensemble methods, which average the output of several models. However, this comes at … (voir plus)an increased cost at inference. Weight averaging methods aim at balancing the generalization of ensembling and the inference speed of a single model by averaging the parameters of an ensemble of models. Yet, naive averaging results in poor performance as models converge to different loss basins, and aligning the models to improve the performance of the average is challenging. Alternatively, inspired by distributed training, methods like DART and PAPA have been proposed to train several models in parallel such that they will end up in the same basin, resulting in good averaging accuracy. However, these methods either compromise ensembling accuracy or demand significant communication between models during training. In this paper, we introduce WASH, a novel distributed method for training model ensembles for weight averaging that achieves state-of-the-art image classification accuracy. WASH maintains models within the same basin by randomly shuffling a small percentage of weights during training, resulting in diverse models and lower communication costs compared to standard parameter averaging methods.
AdaFisher: Adaptive Second Order Optimization via Fisher Information
Damien Martins Gomes
Yanlei Zhang
Mahdi S. Hosseini
First-order optimization methods are currently the mainstream in training deep neural networks (DNNs). Optimizers like Adam incorporate limi… (voir plus)ted curvature information by employing the diagonal matrix preconditioning of the stochastic gradient during the training. Despite their widespread, second-order optimization algorithms exhibit superior convergence properties compared to their first-order counterparts e.g. Adam and SGD. However, their practicality in training DNNs are still limited due to increased per-iteration computations and suboptimal accuracy compared to the first order methods. We present AdaFisher--an adaptive second-order optimizer that leverages a block-diagonal approximation to the Fisher information matrix for adaptive gradient preconditioning. AdaFisher aims to bridge the gap between enhanced convergence capabilities and computational efficiency in second-order optimization framework for training DNNs. Despite the slow pace of second-order optimizers, we showcase that AdaFisher can be reliably adopted for image classification, language modelling and stand out for its stability and robustness in hyperparameter tuning. We demonstrate that AdaFisher outperforms the SOTA optimizers in terms of both accuracy and convergence speed. Code available from \href{https://github.com/AtlasAnalyticsLab/AdaFisher}{https://github.com/AtlasAnalyticsLab/AdaFisher}
Reproducibility Study on Adversarial Attacks Against Robust Transformer Trackers
Fatemeh Nourilenjan Nokabadi
Jean-Francois Lalonde