Publications

Proving Linear Mode Connectivity of Neural Networks via Optimal Transport

Aymeric Dieuleveut

The energy landscape of high-dimensional non-convex optimization problems is crucial to understanding the effectiveness of modern deep neural network architectures. Recent works have experimentally shown that two different solutions found by two runs of stochastic training are often connected by very simple continuous paths (e.g., linear ones) modulo a permutation of the weights. In this paper, we provide a framework that theoretically explains this empirical observation. Based on convergence rates in Wasserstein distance of empirical measures, we show that, with high probability, two sufficiently wide two-layer neural networks trained with stochastic gradient descent are linearly connected. Additionally, we derive upper and lower bounds on the width each layer needs for two deep neural networks with independent neuron weights to be linearly connected. Finally, we empirically demonstrate the validity of our approach by showing that the dimension of the support of the neurons' weight distribution, which dictates Wasserstein convergence rates, is correlated with linear mode connectivity.

2023-12-31

AISTATS (published)

doi.org

proceedings.mlr.press
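To make "linearly connected modulo a permutation of the weights" concrete for the paper above, here is a toy sketch, not the paper's construction or proof. It assumes a bias-free two-layer ReLU network whose hidden neurons are (w_j, a_j) pairs and a squared-Euclidean cost between neurons; all function names are hypothetical. Aligning two networks' neurons is an optimal transport problem between two uniform empirical measures of equal size, whose optimum can be taken to be a permutation, so brute force over permutations (viable only for tiny widths) solves it exactly.

```python
import itertools


def relu(z):
    return max(0.0, z)


def forward(params, x):
    """Toy two-layer net f(x) = sum_j a_j * relu(<w_j, x>); params is a list of (w_j, a_j)."""
    return sum(a * relu(sum(wi * xi for wi, xi in zip(w, x))) for w, a in params)


def neuron_dist(p, q):
    """Squared Euclidean distance between two neurons' parameters (w, a)."""
    (w1, a1), (w2, a2) = p, q
    return sum((u - v) ** 2 for u, v in zip(w1, w2)) + (a1 - a2) ** 2


def align(params_a, params_b):
    """Permute B's hidden neurons to minimise total squared distance to A's neurons.

    Equivalent to discrete optimal transport between the two uniform empirical
    neuron measures; brute force is exponential in the width.
    """
    n = len(params_a)
    best = min(
        itertools.permutations(range(n)),
        key=lambda perm: sum(neuron_dist(params_a[i], params_b[perm[i]]) for i in range(n)),
    )
    return [params_b[j] for j in best]


def interpolate(params_a, params_b, t):
    """Point (1 - t) * A + t * B on the straight line between the two parameter sets."""
    return [
        ([(1 - t) * u + t * v for u, v in zip(wa, wb)], (1 - t) * aa + t * ab)
        for (wa, aa), (wb, ab) in zip(params_a, params_b)
    ]
```

Permuting hidden neurons leaves the function unchanged, so one would evaluate the loss along `interpolate(A, align(A, B), t)` for t in [0, 1] rather than along the naive path between A and B.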

Quantifying learning-style adaptation in effectiveness of LLM teaching

Ruben Weijers

Gabrielle Fidelis de Castilho

Jean-François Godbout

Reihaneh Rabbany

Kellin Pelrine

This preliminary study investigates whether AI, when prompted based on individual learning styles, can effectively improve comprehension and learning experiences in educational settings. It involves tailoring baseline LLM prompts and comparing the results of a control group receiving standard content with those of an experimental group receiving learning-style-tailored content. Preliminary results suggest that GPT-4 can generate responses aligned with various learning styles, indicating the potential for enhanced engagement and comprehension. However, these results also reveal challenges, including the model’s tendency toward sycophantic behavior and variability in its responses. Our findings suggest that a more sophisticated prompt engineering approach is required for integrating AI into education (AIEd) to improve educational outcomes.

2023-12-31

PERSONALIZE (published)

doi.org

Reinforcement Learning for Blind Stair Climbing with Legged and Wheeled-Legged Robots

Simon Chamorro

Victor Klemm

Miguel de La Iglesia Valls

Christopher Pal

Roland Siegwart

2023-12-31

ICRA (published)

doi.org

arxiv.org

Reinforcement Learning Informed Evolutionary Search for Autonomous Systems Testing

Dmytro Humeniuk

Foutse Khomh

Giuliano Antoniol

Evolutionary search-based techniques are commonly used for testing autonomous robotic systems. However, these approaches often rely on computationally expensive simulator-based models for test scenario evaluation. To improve the computational efficiency of search-based testing, we propose augmenting the evolutionary search (ES) with a reinforcement learning (RL) agent trained using surrogate rewards derived from domain knowledge. In our approach, RIGAA (Reinforcement learning Informed Genetic Algorithm for Autonomous systems testing), we first train an RL agent to learn useful constraints of the problem and then use it to produce part of the initial population of the search algorithm. By incorporating an RL agent into the search process, we aim to guide the algorithm towards promising regions of the search space from the start, enabling more efficient exploration of the solution space. We evaluate RIGAA on two case studies: maze generation for an autonomous ant robot and road topology generation for an autonomous vehicle lane keeping assist system. In both case studies, RIGAA converges faster to fitter solutions and produces a better test suite (in terms of average test scenario fitness and diversity). RIGAA also outperforms state-of-the-art tools for testing vehicle lane keeping assist systems, such as AmbieGen and Frenetic.

2023-12-31

ACM Trans. Softw. Eng. Methodol. (published)

doi.org

arxiv.org
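The key structural idea in RIGAA, an RL agent producing part of the genetic algorithm's initial population, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the trained RL agent is stubbed as a hypothetical `policy_sample` callable, individuals are assumed to be fixed-length integer lists (length at least 2), and the genetic operators (tournament selection, one-point crossover, per-gene mutation, elitism) are generic choices rather than the paper's.

```python
import random
from typing import Callable, List

Individual = List[int]


def init_population(size: int, seed_fraction: float,
                    policy_sample: Callable[[], Individual],
                    random_sample: Callable[[], Individual]) -> List[Individual]:
    """Initial population: a fraction generated by the (hypothetical) RL policy, the rest random."""
    n_seeded = int(round(size * seed_fraction))
    seeded = [policy_sample() for _ in range(n_seeded)]
    return seeded + [random_sample() for _ in range(size - n_seeded)]


def evolve(population: List[Individual], fitness: Callable[[Individual], float],
           generations: int, rng: random.Random,
           mutation_rate: float = 0.1, gene_range=(0, 9)) -> Individual:
    """Generic generational GA; returns the fittest individual found."""
    for _ in range(generations):
        best = max(population, key=fitness)
        next_gen = [best]  # elitism: the best individual always survives
        while len(next_gen) < len(population):
            # Binary tournament selection for each parent.
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            cut = rng.randrange(1, len(p1))  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Per-gene mutation within the allowed gene range.
            child = [rng.randint(*gene_range) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)
```

Seeding only a fraction of the population keeps random individuals for diversity while the policy-generated ones start the search in promising regions, which is the motivation the abstract gives.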

Reproducible Spinal Cord Quantitative MRI Analysis with the Spinal Cord Toolbox

Jan Valosek

Julien Cohen-Adad

The spinal cord plays a pivotal role in the central nervous system, providing communication between the brain and the body and containing critical motor and sensory networks. Recent advances in spinal cord MRI data acquisition and image analysis have shown potential to improve the diagnosis, prognosis, and management of a variety of pathological conditions. In this review, we first discuss the significance of standardized spinal cord MRI acquisition protocols in multi-center and multi-manufacturer studies. Then, we cover open-access spinal cord MRI datasets, which are important for reproducible science and the validation of new methods. Finally, we elaborate on recent advances in spinal cord MRI data analysis techniques implemented in the open-source software package Spinal Cord Toolbox (SCT).

2023-12-31

Magnetic Resonance in Medical Sciences (published)

doi.org

Resilience and Mental-Health Symptoms in ICU Healthcare Professionals Facing Repeated COVID-19 Waves

Elie Azoulay

Frédéric Pochard

Laurent Argaud

Alain Cariou

Raphael Clere-Jehl

Olivier Guisset

Vincent Labbé

Fabienne Tamion

Fabrice Bruneel

Mercé Jourdain

Danielle Reuter

Kada Klouche

Achille Kouatchet

Virginie Souppart

Alexandre Lautrette

Julien Bohé

Antoine Vieillard Baron

Jean Dellamonica

Laurent Papazian

Jean Reignier … (3 more authors)

François Barbier

Guillaume Dumas

Nancy Kentish-Barnes

2023-12-31

American Journal of Respiratory and Critical Care Medicine (published)

doi.org

RGP: Achieving Memory-Efficient Model Fine-tuning Via Randomized Gradient Projection

Ali Saheb Pasand

Pouya Bashivan

2023-12-31

ENLSP (published)

proceedings.mlr.press

SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning

Bac Nguyen

Stefan Uhlich

Fabien Cardinaux

Lukas Mauch

Marzieh Edraki

Aaron Courville

Handling distribution shifts from the training data, known as out-of-distribution (OOD) generalization, poses a significant challenge in the field of machine learning. While a pre-trained vision-language model like CLIP has demonstrated remarkable zero-shot performance, further adapting the model to downstream tasks leads to undesirable degradation on OOD data. In this work, we introduce Sparse Adaptation for Fine-Tuning (SAFT), a method that prevents fine-tuning from forgetting the general knowledge in the pre-trained model. SAFT only updates a small subset of important parameters whose gradient magnitude is large, while keeping the other parameters frozen. SAFT is conceptually simple and straightforward to implement. Extensive experiments show that, by updating only 0.1% of the model parameters, SAFT can significantly improve the performance of CLIP. It consistently outperforms baseline methods across several benchmarks. On the few-shot learning benchmark of ImageNet and its variants, SAFT gives an average gain of 5.15% over conventional fine-tuning in OOD settings.

2023-12-31

ECCV (69) (published)

doi.org

arxiv.org
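The selection rule described in the SAFT abstract (update only the parameters with the largest gradient magnitude, freeze the rest) can be sketched on a flat list of parameters. This is a toy illustration, not the authors' code: it assumes the gradients used for selection are computed once, before fine-tuning, and that the resulting mask then stays fixed; a flat Python list stands in for CLIP's parameter tensors, and the function names are hypothetical.

```python
from typing import List


def saft_mask(grads: List[float], fraction: float) -> List[bool]:
    """Mark the `fraction` of parameters with the largest |gradient| as trainable.

    At least one parameter is always selected (e.g. fraction=0.001 for 0.1%).
    """
    k = max(1, int(len(grads) * fraction))
    ranked = sorted(range(len(grads)), key=lambda i: abs(grads[i]), reverse=True)
    keep = set(ranked[:k])
    return [i in keep for i in range(len(grads))]


def masked_sgd_step(params: List[float], grads: List[float],
                    mask: List[bool], lr: float) -> List[float]:
    """Plain SGD step applied only where mask is True; all other parameters stay frozen."""
    return [p - lr * g if m else p for p, g, m in zip(params, grads, mask)]
```

Because the frozen parameters keep their pre-trained values exactly, most of the model is unchanged after fine-tuning, which is how the abstract motivates SAFT's resistance to forgetting.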

Scaling Laws Do Not Scale

Fernando Diaz

Michael Madaio

Recent work has proposed a power-law relationship, referred to as "scaling laws," between the performance of artificial intelligence (AI) models and aspects of those models' design (e.g., dataset size). In other words, as the size of a dataset (or the number of model parameters, etc.) increases, the performance of a given model trained on that dataset will correspondingly increase. However, while compelling in the aggregate, this scaling law relationship overlooks the ways that metrics used to measure performance may be precarious and contested, or may not correspond with how different groups of people perceive the quality of models' output. In this paper, we argue that as the size of datasets used to train large AI models grows, the number of distinct communities (including demographic groups) whose data is included in a given dataset is likely to grow, each of which may have different values. As a result, there is an increased risk that communities represented in a dataset may have values or preferences not captured by (or, in the worst case, at odds with) the metrics used to evaluate model performance for scaling laws. We end the paper with implications for AI scaling laws: models may not, in fact, continue to improve as the datasets get larger, at least not for all people or communities impacted by those models.

2023-12-31

AIES (1) (published)

doi.org

arxiv.org
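As a rough illustration of the relationship this paper critiques, scaling laws are commonly written as a power law in dataset size, and an aggregate metric can be viewed as a mixture of per-community metrics. The symbols below are assumptions for illustration, not notation from the paper: L is an aggregate loss (lower is better), D is dataset size, a and α are fitted constants, c indexes communities, and π_c(D) is community c's share of the evaluation data.

```latex
\begin{aligned}
L(D) &\approx a\,D^{-\alpha}, \qquad a > 0,\ \alpha > 0
  \quad\Longrightarrow\quad \frac{L(2D)}{L(D)} \approx 2^{-\alpha},\\[4pt]
L(D) &= \sum_{c} \pi_c(D)\, L_c(D), \qquad \sum_{c} \pi_c(D) = 1 .
\end{aligned}
```

A decreasing aggregate L(D) is compatible with some per-community L_c(D) staying flat or increasing, and even then each L_c only reflects what the chosen metric measures, which is the gap the authors argue scaling laws overlook.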

SCIsegV2: A Universal Tool for Segmentation of Intramedullary Lesions in Spinal Cord Injury

Enamundram Naga Karthik

Jan Valosek

Lynn Farner

Dario Pfyffer

Simon Schading-Sassenhausen

Anna Lebret

Gergely David

Andrew C. Smith

Kenneth A. Weber

Maryam Seif

RHSCIR Network Imaging Group

Patrick Freund

Julien Cohen-Adad

2023-12-31

AMAI@MICCAI (published)

doi.org

arxiv.org

Scope Ambiguities in Large Language Models

Gaurav Kamath

Sebastian Schuster

Sowmya Vajjala

Siva Reddy

Sentences containing multiple semantic operators with overlapping scope often create ambiguities in interpretation, known as scope ambiguities. These ambiguities offer rich insights into the interaction between semantic structure and world knowledge in language processing. Despite this, there has been little research into how modern large language models treat them. In this paper, we investigate how different versions of certain autoregressive language models (GPT-2, GPT-3/3.5, Llama 2, and GPT-4) treat scope-ambiguous sentences, and compare this with human judgments. We introduce novel datasets that contain a combined total of almost 1,000 unique scope-ambiguous sentences, which cover interactions between a range of semantic operators and are annotated with human judgments. Using these datasets, we find evidence that several models (i) are sensitive to the meaning ambiguity in these sentences, in a way that patterns well with human judgments, and (ii) can successfully identify human-preferred readings at a high level of accuracy (over 90% in some cases).

2023-12-31

Trans. Assoc. Comput. Linguistics (published)

doi.org

arxiv.org

Self-supervised multimodal learning for group inferences from MRI data: Discovering disorder-relevant brain regions and multimodal links

Alex Fedorov

Eloy Geenjaar

Lei Wu

Tristan Sylvain

Thomas P. DeRamus

Margaux Luck

Maria Misiura

Girish Mittapalle

R. Devon Hjelm

Sergey M. Plis

Vince D. Calhoun

In recent years, deep learning approaches have gained significant attention in predicting brain disorders using neuroimaging data. However, conventional methods often rely on single-modality data and supervised models, which provide only a limited perspective on the intricacies of the highly complex brain. Moreover, the scarcity of accurate diagnostic labels in clinical settings hinders the applicability of supervised models. To address these limitations, we propose a novel self-supervised framework for extracting multiple representations from multimodal neuroimaging data to enhance group inferences and enable analysis without resorting to labeled data during pre-training. Our approach leverages Deep InfoMax (DIM), a self-supervised methodology renowned for its efficacy in learning representations by estimating mutual information without the need for explicit labels. While DIM has shown promise in predicting brain disorders from single-modality MRI data, its potential for multimodal data remains untapped. This work extends DIM to multimodal neuroimaging data, allowing us to identify disorder-relevant brain regions and explore multimodal links. We present compelling evidence of the efficacy of our multimodal DIM analysis in uncovering disorder-relevant brain regions, including the hippocampus, caudate, and insula, and multimodal links with the thalamus, precuneus, and subthalamus/hypothalamus. Our self-supervised representations demonstrate promising capabilities in predicting the presence of brain disorders across a spectrum of Alzheimer's phenotypes. Comparative evaluations against state-of-the-art unsupervised methods based on autoencoders, canonical correlation analysis, and supervised models highlight the superiority of our proposed method in classification performance, capturing joint information, and interpretability.
The computational efficiency of the decoder-free strategy enhances its practical utility, as it saves compute resources without compromising performance. This work offers a significant step forward in addressing the challenge of understanding multimodal links in complex brain disorders, with potential applications in neuroimaging research and clinical diagnosis.

2023-12-31

NeuroImage (published)

doi.org

Mila on Udemy

AI Policy Fellowship Publications

Mila Ventures Launchpad
