Publications

Survey on Explainable AI: Techniques, challenges and open issues
Adel Abusitta
Miles Q. Li
TARIC-SLU: A Tunisian Benchmark Dataset for Spoken Language Understanding
Salima Mdhaffar
Fethi Bougares
Renato de Mori
Salah Zaiem
Yannick Estève
In recent years, there has been a significant increase in interest in developing Spoken Language Understanding (SLU) systems. SLU involves e… (see more)xtracting a list of semantic information from the speech signal. A major issue for SLU systems is the lack of sufficient amount of bi-modal (audio and textual semantic annotation) training data. Existing SLU resources are mainly available in high-resource languages such as English, Mandarin and French. However, one of the current challenges concerning low-resourced languages is data collection and annotation. In this work, we present a new freely available corpus, named TARIC-SLU, composed of railway transport conversations in Tunisian dialect that is continuously annotated in dialogue acts and slots. We describe the semantic model of the dataset, the data and experiments conducted to build ASR-based and SLU-based baseline models. To facilitate its use, a complete recipe, including data preparation, training and evaluation scripts, has been built and will be integrated to SpeechBrain, a popular open-source conversational AI toolkit based on PyTorch.
Temporal Graph Analysis with TGX
Razieh Shirzadkhani
Shenyang Huang
Elahe Kooshafar
Farimah Poursafaei
Real-world networks, with their evolving relations, are best captured as temporal graphs. However, existing software libraries are largely d… (see more)esigned for static graphs where the dynamic nature of temporal graphs is ignored. Bridging this gap, we introduce TGX, a Python package specially designed for analysis of temporal networks that encompasses an automated pipeline for data loading, data processing, and analysis of evolving graphs. TGX provides access to eleven built-in datasets and eight external Temporal Graph Benchmark (TGB) datasets as well as any novel datasets in the .csv format. Beyond data loading, TGX facilitates data processing functionalities such as discretization of temporal graphs and node subsampling to accelerate working with larger datasets. For comprehensive investigation, TGX offers network analysis by providing a diverse set of measures, including average node degree and the evolving number of nodes and edges per timestamp. Additionally, the package consolidates meaningful visualization plots indicating the evolution of temporal patterns, such as Temporal Edge Appearance (TEA) and Temporal Edge Trafficc (TET) plots. The TGX package is a robust tool for examining the features of temporal graphs and can be used in various areas like studying social networks, citation networks, and tracking user interactions. We plan to continuously support and update TGX based on community feedback. TGX is publicly available on: https://github.com/ComplexData-MILA/TGX.
The Cost of Scaling Down Large Language Models: Reducing Model Size Affects Memory before In-context Learning.
Tian Jin
Nolan Clement
Xin Dong
Vaishnavh Nagarajan
Michael Carbin
Jonathan Ragan-Kelley
On the Societal Impact of Open Foundation Models
Sayash Kapoor
Rishi Bommasani
Kevin Klyman
Shayne Longpre
Ashwin Ramaswami
Peter Cihon
Aspen Hopkins
Kevin Bankston
Stella Biderman
Miranda Bogen
Rumman Chowdhury
Alex Engler
Peter Henderson
Yacine Jernite
Seth Lazar
Stefano Maffulli
Alondra Nelson
Aviya Skowron
Dawn Song … (see 5 more)
Victor Storchan
Daniel Zhang
Daniel E. Ho
Percy Liang
Arvind Narayanan
Triage Software Update Impact via Release Notes Classification
Solomon Berhe
Vanessa Kan
Omhier Khan
Nathan Pader
Ali Zain Farooqui
Marc Maynard
Two Families of Indexable Partially Observable Restless Bandits and Whittle Index Computation
Nima Akbarzadeh
Uncertainty-aware hybrid paradigm of nonlinear MPC and model-based RL for offroad navigation: Exploration of transformers in the predictive model
Faraz Lotfi
Khalil Virji
Farnoosh Faraji
Lucas Berry
Andrew Holliday
In this paper, we investigate a hybrid scheme that combines nonlinear model predictive control (MPC) and model-based reinforcement learning … (see more)(RL) for navigation planning of an autonomous model car across offroad, unstructured terrains without relying on predefined maps. Our innovative approach takes inspiration from BADGR, an LSTM-based network that primarily concentrates on environment modeling, but distinguishes itself by substituting LSTM modules with transformers to greatly elevate the performance our model. Addressing uncertainty within the system, we train an ensemble of predictive models and estimate the mutual information between model weights and outputs, facilitating dynamic horizon planning through the introduction of variable speeds. Further enhancing our methodology, we incorporate a nonlinear MPC controller that accounts for the intricacies of the vehicle's model and states. The model-based RL facet produces steering angles and quantifies inherent uncertainty. At the same time, the nonlinear MPC suggests optimal throttle settings, striking a balance between goal attainment speed and managing model uncertainty influenced by velocity. In the conducted studies, our approach excels over the existing baseline by consistently achieving higher metric values in predicting future events and seamlessly integrating the vehicle's kinematic model for enhanced decision-making. The code and the evaluation data are available at https://github.com/FARAZLOTFI/offroad_autonomous_navigation/).
Validation of Vigilance Decline Capability in A Simulated Test Environment: A Preliminary Step Towards Neuroadaptive Control
Andra Mahu
Amandeep Singh
Florian Tambon
Benoit Ouellette
Jean-françois Delisle
Tanya Paul
Alexandre Marois
Philippe Doyon-poulin
Vigilance is the ability to sustain attention. It is crucial in tasks like piloting and driving that involve the ability to sustain attentio… (see more)n. However, cognitive performance often falters with prolonged tasks, leading to reduced efficiency, slower reactions, and increased error likelihood. Identifying and addressing diminished vigilance is essential for enhancing driving safety. Neuro-physiological indicators have shown promising results to monitor vigilance, paving the way for neuroadaptive control of vigilance. In fact, the collection of vigilance-related physiological markers could allow, using neuroadaptive intelligent systems, a real-time adaption of tasks or the presentation of countermeasures to prevent errors that would ensue from such hypovigilant situations. Before reaching this goal, one must however collect valid data truly representative of hypovigilance which, in turn, can be used to develop prediction models of the vigilant state. This study serves as a proof of concept to assess validity of a testbed to induce and measure vigilance decline through a simulated test environment, validating controlled induction, and evaluating its impact on participants’ performance and subjective experiences. In total, 28 participants (10 females, 18 males) aged 18 to 35 (M = 23.75 years), were recruited. All participants held valid driving licenses and had corrected-to-normal vision. Data collection involved Psychomotor Vigilance Task (PVT), Karolinska Sleepiness Scale (KSS) and the Stanford Sleepiness Scale (SSS) along with neuro-physiological specialized equipment: Enobio 8 EEG, Empatica E4, Polar H10 and Tobii Nano Pro eye tracker. Notably, this study is limited to demonstrating the results of PVT, KSS, and SSS, with the aim of assessing the effectiveness of the test setup. Participants self-reported their loss of vigilance by pressing a marker on the steering wheel. To induce hypovigilance, participants drove an automatic car in a low-traffic, monotonous environment for 60 minutes, featuring empty fields of grass and desert, employing specific in-game procedures. The driving task included instructions for lane-keeping, indicator usage, and maintaining speeds of up to 80 km/h, with no traffic lights or stop signs present. Experiments were conducted before lunch, between 9 am and 12 pm, ensuring maximum participant alertness, with instructions to abstain from caffeine, alcohol, nicotine, and cannabis on the experiment day. Results showed that the mean reaction time (RT) increased from 257.7 ms before driving to 276.8 ms after driving, t = 4.82, p .0001, d = -0.61 whereas the median RT changed from 246.07 ms to 260.89 ms, t = 3.58, p = 0.0013, d= -0.53 indicating a statistically significant alteration in participant's psychomotor performance. The mean number of minor lapses in attention (RT >500ms) to the PVT increased from 1.11 before driving to 1.67 after driving, but was not statistically significant t = 1.66, p = 0.11, d = -0.28. KSS showed a considerable rise of sleepiness, with a mean of 4.11 (rather alert) before driving increasing to 5.96 (some signs of sleepiness) after driving, t = 5.65, p .0001, d = -1.04. Similarly, the SSS demonstrated an increase in mean values from 2.57 (able to concentrate) before driving to 3.96 (somewhat foggy) after driving, t = 8.42, p .0001, d = -1.20, signifying an increased perception of sleepiness following the driving activity. Lastly, the mean time of the first marker press was 17:38 minutes (SD = 9:47 minutes) indicating that the self-reported loss of vigilance occurred during the first 30 minutes of the driving task. The observed increase in PVT reaction time aligns with the declined alertness reported on both the KSS and SSS responses, suggesting a consistent decline in vigilance and alertness post-driving. In conclusion, the study underscores the effectiveness and validity of the simulated test environment in inducing vigilance decline, providing valuable insights into the impact on both objective and subjective measures. At the same time, the research sets the stage for exploring neuroadaptive control strategies, aiming to enhance task performance and safety. Ultimately, this will contribute to the development of a non-invasive artificial intelligence system capable of detecting vigilance states in extreme/challenging environments, e.g. for pilots and drivers.
Voices Unheard: NLP Resources and Models for Yor\`ub\'a Regional Dialects
Orevaoghene Ahia
Aremu Anuoluwapo
Diana Abagyan
Hila Gonen
Daud Abolade
Noah A. Smith
Yulia Tsvetkov
VulEXplaineR: XAI for Vulnerability Detection on Assembly Code
Samaneh Mahdavifar
Mohd Saqib
Philippe Charland
Andrew Walenstein
What is Your Favorite Gender, MLM? Gender Bias Evaluation in Multilingual Masked Language Models
Emily M. Bender
Jeongrok Yu
Timnit Gebru
Seong Ug Kim
Angelina McMillan-642
Jacob Choi
Jinho D. Choi
Su Lin Blodgett
Solon Barocas
Hal Daumé III
Gilsinia Lopez
Robert Sim
Hanna Wallach. 2021
Stereotyp-657
Bias is a disproportionate prejudice in favor of one side against another. Due to the success of transformer-based Masked Language Models (M… (see more)LMs) and their impact on many NLP tasks, a systematic evaluation of bias in these models is needed more than ever. While many studies have evaluated gender bias in English MLMs, only a few works have been conducted for the task in other languages. This paper proposes a multilingual approach to estimate gender bias in MLMs from 5 languages: Chinese, English, German, Portuguese, and Spanish. Unlike previous work, our approach does not depend on parallel corpora coupled with English to detect gender bias in other languages using multilingual lexicons. Moreover, a novel model-based method is presented to generate sentence pairs for a more robust analysis of gender bias, compared to the traditional lexicon-based method. For each language, both the lexicon-based and model-based methods are applied to create two datasets respectively, which are used to evaluate gender bias in an MLM specifically trained for that language using one existing and 3 new scoring metrics. Our results show that the previous approach is data-sensitive and not stable as it does not remove contextual dependencies irrelevant to gender. In fact, the results often flip when different scoring metrics are used on the same dataset, suggesting that gender bias should be studied on a large dataset using multiple evaluation metrics for best practice.