Publications

JaxPruner: A concise library for sparsity research

Joo Hyung Lee

Wonpyo Park

Nicole Elyse Mitchell

Jonathan Pilault

Johan Samir Obando Ceron

Han-Byul Kim

Namhoon Lee

Elias Frantar

Yun Long

Amir Yazdanbakhsh

Shivani Agrawal

Suvinay Subramanian

Xin Wang

Sheng-Chun Kao

Xingyao Zhang

Trevor Gale

Aart J.C. Bik

Woohyun Han

Milen Ferev

Zhonglin Han

Hong-Seok Kim

Yann Dauphin

Gintare Karolina Dziugaite

Pablo Samuel Castro

Utku Evci

This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the popular optimization library Optax, which, in turn, enables easy integration with existing JAX-based libraries. We demonstrate this ease of integration by providing examples in four different codebases (Scenic, t5x, Dopamine, and FedJAX) and provide baseline experiments on popular benchmarks.

2024-01-08

Conference on Parsimony and Learning (published)

doi.org

openreview.net

GABAergic inhibition shapes behavior and neural dynamics in human visual working memory

Jan Kujala

Carolina Ciumas

Julien Jung

Sandrine Bouvard

Françoise Lecaignard

Amélie Lothe

Romain Bouet

Philippe Ryvlin

Karim Jerbi

Neuronal inhibition, primarily mediated by GABAergic neurotransmission, is crucial for brain development and healthy cognition. Gamma-aminobutyric acid concentration levels in sensory areas have been shown to correlate with hemodynamic and oscillatory neuronal responses. How these measures relate to one another during working memory, a higher-order cognitive process, is still poorly understood. We address this gap by collecting magnetoencephalography, functional magnetic resonance imaging, and Flumazenil positron emission tomography data within the same subject cohort using an n-back working-memory paradigm. By probing the relationship between GABA-A receptor distribution, neural oscillations, and Blood Oxygen Level Dependent (BOLD) modulations, we found that GABA-A receptor density in higher-order cortical areas predicted the reaction times on the working-memory task and correlated positively with the peak frequency of gamma power modulations and negatively with BOLD amplitude. These findings support and extend theories linking gamma oscillations and hemodynamic responses to gamma-aminobutyric acid neurotransmission and to the excitation-inhibition balance and cognitive performance in humans. Considering the small sample size of the study, future studies should test whether these findings also hold for other, larger cohorts and examine in detail how the GABAergic system and neural fluctuations jointly support working-memory task performance.

2024-01-06

Cerebral Cortex (published)

doi.org

Functional Labeled Optimal Partitioning

Jacob M. Kaufman

Alyssa J. Stenberg

Toby Dylan Hocking

2024-01-05

Journal of Computational And Graphical Statistics (published)

doi.org

On the Stability of a non-hyperbolic nonlinear map with non-bounded set of non-isolated fixed points with applications to Machine Learning

Roberta Hansen

Matias Vera

Lautaro Estienne

Luciana Ferrer

Pablo Piantanida

2024-01-05

ArXiv (preprint)

arxiv.org

Towards Enhancing the Reproducibility of Deep Learning Bugs: An Empirical Study

Mehil B. Shah

Mohammad Masudur Rahman

Foutse Khomh

Context: Deep learning has achieved remarkable progress in various domains. However, like any software system, deep learning systems contain bugs, some of which can have severe impacts, as evidenced by crashes involving autonomous vehicles. Despite substantial advancements in deep learning techniques, little research has focused on reproducing deep learning bugs, which is an essential step for their resolution. Existing literature suggests that only 3% of deep learning bugs are reproducible, underscoring the need for further research. Objective: This paper examines the reproducibility of deep learning bugs. We identify edit actions and useful information that could improve the reproducibility of deep learning bugs. Method: First, we construct a dataset of 668 deep-learning bugs from Stack Overflow and GitHub across three frameworks and 22 architectures. Second, out of the 668 bugs, we select 165 bugs using stratified sampling and attempt to determine their reproducibility. While reproducing these bugs, we identify edit actions and useful information for their reproduction. Third, we used the Apriori algorithm to identify useful information and edit actions required to reproduce specific types of bugs. Finally, we conducted a user study involving 22 developers to assess the effectiveness of our findings in real-life settings. Results: We successfully reproduced 148 out of 165 bugs attempted. We identified ten edit actions and five useful types of component information that can help us reproduce the deep learning bugs. With the help of our findings, the developers were able to reproduce 22.92% more bugs and reduce their reproduction time by 24.35%. Conclusions: Our research addresses the critical issue of deep learning bug reproducibility. Practitioners and researchers can leverage our findings to improve deep learning bug reproducibility.

2024-01-05

ArXiv (preprint)

doi.org

arxiv.org

Are LLMs Robust for Spoken Dialogues?

Seyed Mahed Mousavi

Gabriel Roccabruna

Simone Alghisi

Massimo Rizzoli

Mirco Ravanelli

Giuseppe Riccardi

Large Pre-Trained Language Models have demonstrated state-of-the-art performance in different downstream tasks, including dialogue state tracking and end-to-end response generation. Nevertheless, most of the publicly available datasets and benchmarks on task-oriented dialogues focus on written conversations. Consequently, the robustness of the developed models to spoken interactions is unknown. In this work, we have evaluated the performance of LLMs for spoken task-oriented dialogues on the DSTC11 test sets. Due to the lack of proper spoken dialogue datasets, we have automatically transcribed a development set of spoken dialogues with a state-of-the-art ASR engine. We have characterized the ASR-error types and their distributions and simulated these errors in a large dataset of dialogues. We report the intrinsic (perplexity) and extrinsic (human evaluation) performance of fine-tuned GPT-2 and T5 models in two subtasks of response generation and dialogue state tracking, respectively. The results show that LLMs are not robust to spoken noise by default; however, fine-tuning or training such models on a proper dataset of spoken TODs can result in more robust performance.

2024-01-04

ArXiv (preprint)

doi.org

arxiv.org

A primer on the use of machine learning to distil knowledge from data in biological psychiatry.

Thomas P. Quinn

Jonathan L. Hess

Victoria S. Marshe

Michelle M. Barnett

Anne-Christin Hauschild

Malgorzata Maciukiewicz

Samar S. M. Elsheikh

Xiaoyu Men

Emanuel Schwarz

Yannis Trakadis

Michael S. Breen

Eric J. Barnett

Yanli Zhang-James

Mehmet Eren Ahsen

Han Cao

Junfang Chen

Jiahui Hou

Asif Salekin

Ping-I Lin

Kristin K. Nicodemus

Andreas Meyer-Lindenberg

Isabelle Bichindaritz

Stephen V. Faraone

Murray J. Cairns

Gaurav Pandey

Daniel J. Müller

Stephen J. Glatt

2024-01-04

Molecular Psychiatry (published)

doi.org

AITA: AI trustworthiness assessment

Bertrand Braunschweig

Stefan Buijsman

Faicel Chamroukhi

Fredrik Heintz

Foutse Khomh

Juliette Mattioli

Maximilian Poretschkin

2024-01-03

AI and Ethics (published)

doi.org

Bag of Tricks for Fully Test-Time Adaptation

Saypraseuth Mounsaveng

Florent Chiaroni

Malik Boudiaf

Marco Pedersoli

Ismail Ben Ayed

Fully Test-Time Adaptation (TTA), which aims at adapting models to data drifts, has recently attracted wide interest. Numerous tricks and techniques have been proposed to ensure robust learning on arbitrary streams of unlabeled data. However, assessing the true impact of each individual technique and obtaining a fair comparison still constitutes a significant challenge. To help consolidate the community’s knowledge, we present a categorization of selected orthogonal TTA techniques, including small batch normalization, stream rebalancing, reliable sample selection, and network confidence calibration. We meticulously dissect the effect of each approach on different scenarios of interest. Through our analysis, we shed light on trade-offs induced by those techniques between accuracy, the computational power required, and model complexity. We also uncover the synergy that arises when combining techniques and are able to establish new state-of-the-art results.

2024-01-03

2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (published)

doi.org

arxiv.org

A Column Generation Scheme for Distributionally Robust Multi-Item Newsvendor Problems

Shanshan Wang

Erick Delage

This paper studies a distributionally robust multi-item newsvendor problem, where the demand distribution is unknown but specified with a general event-wise ambiguity set. Using the event-wise affine decision rules, we can obtain a conservative approximation formulation of the problem, which can typically be further reformulated as a linear program. In order to efficiently solve the resulting large-scale linear program, we develop a column generation-based decomposition scheme and speed up the computational efficiency by exploiting a special column selection strategy and stopping early based on a Karush-Kuhn-Tucker condition test. Focusing on the Wasserstein ambiguity set and the event-wise mean absolute deviation set, a computational study demonstrates both the computational efficiency of the proposed algorithm, which significantly outperforms a commercial solver and a Benders decomposition method, and the out-of-sample superiority of distributionally robust solutions relative to their sample average approximation counterparts. History: Accepted by Nicola Secomandi, Area Editor for Stochastic Models & Reinforcement Learning. Funding: This work was supported by the Natural Sciences and Engineering Research Council of Canada [492997-2016, RGPIN-2016-05208], the National Natural Science Foundation of China [71972012], Alliance de recherche numérique du Canada, and Canada Research Chairs [CRC-2018-00105]. It was also supported by Groupe d’études et de recherche en analyse des décisions (GERAD). Finally, this research was enabled in part by support provided by Digital Research Alliance of Canada (https://alliancecan.ca/en). Supplemental Material: The software that supports the findings of this study is available within the paper and its supplemental information (https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2022.0010) as well as from the IJOC GitHub software repository (https://github.com/INFORMSJoC/2022.0010). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/.

2024-01-03

INFORMS Journal on Computing (published)

doi.org

Dataset Difficulty and the Role of Inductive Bias

Devin Kwok

Nikhil Anand

Jonathan Frankle

Gintare Karolina Dziugaite

David Rolnick

Motivated by the goals of dataset pruning and defect identification, a growing body of methods has been developed to score individual examples within a dataset. These methods, which we call "example difficulty scores", are typically used to rank or categorize examples, but the consistency of rankings between different training runs, scoring methods, and model architectures is generally unknown. To determine how example rankings vary due to these random and controlled effects, we systematically compare different formulations of scores over a range of runs and model architectures. We find that scores largely share the following traits: they are noisy over individual runs of a model, strongly correlated with a single notion of difficulty, and reveal examples that range from being highly sensitive to insensitive to the inductive biases of certain model architectures. Drawing from statistical genetics, we develop a simple method for fingerprinting model architectures using a few sensitive examples. These findings guide practitioners in maximizing the consistency of their scores (e.g. by choosing appropriate scoring methods, number of runs, and subsets of examples), and establish comprehensive baselines for evaluating scores in the future.

2024-01-03

ArXiv (preprint)

doi.org

arxiv.org
