Publications

Gradient Dissent in Language Model Training and Saturation

We seek to shed light on language model (LM) saturation from the perspective of learning dynamics. To this end, we define a decomposition of the cross-entropy gradient, which forms a shared low-dimensional basis for analyzing the training dynamics of models across scales. Intuitively, this decomposition consists of attractive and repulsive components that increase the logit of the correct class and decrease the logits of incorrect classes, respectively. Our analysis in this subspace reveals a phenomenon we term gradient dissent, characterized by gradient components becoming systematically opposed such that loss cannot be improved along one component without being degraded along the other. Notably, we find that complete opposition, which we term total dissent, reliably occurs in tandem with the saturation of smaller LMs. Based on these results, we hypothesize that gradient dissent can provide a useful foundation for better understanding and mitigating saturation.
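The attractive/repulsive intuition in the abstract above can be illustrated at the level of the logits. The sketch below is not the paper's decomposition, which the abstract does not spell out and which is presumably defined over model parameters. It only uses the standard identity that the cross-entropy gradient with respect to the logits equals softmax(logits) minus the one-hot target, and splits that vector into a correct-class part and an incorrect-class part. The function names `softmax` and `decompose_ce_gradient` and the example logits are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def decompose_ce_gradient(logits, target):
    """Split d(loss)/d(logits) = softmax(logits) - onehot(target) into two parts.

    Illustrative assumption only: the listed paper may define its
    decomposition differently (for example, in parameter space).
    """
    probs = softmax(logits)

    # Attractive part: nonzero only at the correct class. Its value
    # p_target - 1 is negative, so a gradient-descent step raises that logit.
    attractive = np.zeros_like(probs)
    attractive[target] = probs[target] - 1.0

    # Repulsive part: nonzero only at the incorrect classes. Its values are
    # positive, so a gradient-descent step lowers those logits.
    repulsive = probs.copy()
    repulsive[target] = 0.0

    return attractive, repulsive


# Hypothetical example: four classes, class 0 is the correct one.
logits = np.array([2.0, 0.5, -1.0, 0.1])
attractive, repulsive = decompose_ce_gradient(logits, target=0)

# The two parts sum back to the full logit gradient.
full_gradient = softmax(logits) - np.eye(len(logits))[0]
```

In logit space the two parts occupy disjoint coordinates, so they cannot oppose each other there. Any opposition of the kind the abstract calls gradient dissent would only appear after both parts are propagated back to shared parameters, which this sketch does not model.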

2024-06-15

ICML.cc/2024/Workshop/HiLD (poster)

openreview.net

Inpainting Galaxy Counts onto N-Body Simulations over Multiple Cosmologies and Astrophysics

Matthew Ho

Laurence Perreault-Levasseur

2024-06-15

ICML.cc/2024/Workshop/AI4Science (poster)

openreview.net

Local lateral connectivity is sufficient for replicating cortex-like topographical organization in deep neural networks

Xinyu Qian

Amir Ozhan Dehghani

Asa Borzabadi Farahani

Pouya Bashivan

Across the primate cortex, neurons that perform similar functions tend to be spatially grouped together. This biological principle extends to many other species as well, reflecting a common way of organizing sensory processing across diverse forms of life. In the visual cortex, this biological principle manifests itself as a modular organization of neuronal clusters, each tuned to a specific visual property. The tendency toward short connections is widely believed to explain the existence of such an organization in the brains of many animals. However, the neural mechanisms underlying this phenomenon remain unclear. Here, we use artificial deep neural network models to demonstrate that a topographical organization akin to that in the primary, intermediate, and high-level human visual cortex emerges when units in these models are locally laterally connected and their weight parameters are tuned by top-down credit assignment. The emergence of modular organization without explicit topography-inducing learning rules or objective functions challenges their necessity and suggests that local lateral connectivity alone may suffice for the formation of topographic organization across the cortex. Furthermore, the incorporation of lateral connections in deep convolutional networks enhances their robustness to subtle alterations in visual inputs, such as those designed to deceive the model (i.e., adversarial examples), indicating an additional role for these connections in learning robust representations.

2024-06-15

ICML.cc/2024/Workshop/AI4Science (poster)

doi.org

openreview.net

Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology

Oren Kraus

Kian Kenyon-Dean

Saber Saberian

Maryam Fallah

Peter McLean

Jess Leung

Vasudev Sharma

Ayla Khan

Jia Balakrishnan

Safiye Celik

Dominique Beaini

Maciej Sypetkowski

Chi Vicky Cheng

Kristen Morse

Maureen Makes

Ben Mabey

Berton Earnshaw

2024-06-15

2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (published)

doi.org

arxiv.org

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

Jack Urbanek

Florian Bordes

Pietro Astolfi

Mary Williamson

Vasu Sharma

Adriana Romero

2024-06-15

2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (published)

doi.org

arxiv.org

A Survey on Fairness Without Demographics

Patrik Joslin Kenfack

Éts Montréal

S Ebrahimi Kahou

Ulrich Matchi Aïvodji

The issue of bias in Machine Learning (ML) models is a significant challenge for the machine learning community. Real-world biases can be embedded in the data used to train models, and prior studies have shown that ML models can learn and even amplify these biases. This can result in unfair treatment of individuals based on their inherent characteristics or sensitive attributes such as gender, race, or age. Ensuring fairness is crucial with the increasing use of ML models in high-stakes scenarios and has gained significant attention from researchers in recent years. However, the challenge of ensuring fairness becomes much greater when the assumption of full access to sensitive attributes does not hold. The settings where the hypothesis does not hold include cases where (1) only limited or noisy demographic information is available or (2) demographic information is entirely unobserved due to privacy restrictions. This survey reviews recent research efforts to enforce fairness when sensitive attributes are missing. We propose a taxonomy of existing works and, more importantly, highlight current challenges and future research directions to stimulate research in ML fairness in the setting of missing sensitive attributes.

2024-06-15

TMLR (accepted)

openreview.net

The Butterfly Effect: Tiny Perturbations Cause Neural Network Training to Diverge

Gül Sena Altıntaş

Devin Kwok

David Rolnick

Neural network training begins with a chaotic phase in which the network is sensitive to small perturbations, such as those caused by stochastic gradient descent (SGD). This sensitivity can cause identically initialized networks to diverge both in parameter space and functional similarity. However, the exact degree to which networks are sensitive to perturbation, and the sensitivity of networks as they transition out of the chaotic phase, is unclear. To address this uncertainty, we apply a controlled perturbation at a single point in training time and measure its effect on otherwise identical training trajectories. We find that both the

2024-06-15

ICML.cc/2024/Workshop/HiLD (poster)

openreview.net

TutteNet: Injective 3D Deformations by Composition of 2D Mesh Deformations

Bo Sun

Thibault Groueix

Chen Song

Qixing Huang

Noam Aigerman

2024-06-15

2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (published)

doi.org

arxiv.org

Using neural biomarkers to personalize dosing of vagus nerve stimulation

Antonin Berthon

Lorenz Wernisch

Myrta Stoukidi

Michael Thornton

Olivier Tessier-Larivière

Pascal Fortier-Poisson

Jorin Mamen

Max Pinkney

Susannah Lee

Elvijs Sarkans

Luca Annecchino

Ben Appleton

Philip Garsed

Bret Patterson

Samuel Gonshaw

Matjaž Jakopec

Sudhakaran Shunmugam

Tristan Edwards

Aleksi Tukiainen

Joel Jennings (3 more authors not shown)

Guillaume Lajoie

Emil Hewage

Oliver Armitage

Vagus nerve stimulation (VNS) is an established therapy for treating a variety of chronic diseases, such as epilepsy, depression, and obesity, and for stroke rehabilitation. However, lack of precision and side-effects have hindered its efficacy and extension to new conditions. We aimed to better understand the relationship between VNS parameters and neural and physiological responses, in order to enable the design of personalized dosing procedures that improve the precision and efficacy of VNS therapies. We used biomarkers from recorded evoked neural activity and short-term physiological responses (throat muscle, cardiac and respiratory activity) to understand the response to a wide range of VNS parameters in anaesthetised pigs. Using signal processing, Gaussian processes (GP), and parametric regression models, we analyse the relationship between VNS parameters and neural and physiological responses. Firstly, we observe inter-subject variability for both neural and physiological responses. Secondly, we illustrate how considering multiple stimulation parameters in VNS dosing can improve the efficacy and precision of VNS therapies. Thirdly, we describe the relationship between different VNS parameters and the evoked neural activity and show how spatially selective electrodes can be used to improve fibre recruitment. Fourthly, we provide a detailed exploration of the relationship between the activations of neural fibre types and different physiological effects, and show that recordings of evoked neural activity are powerful biomarkers for predicting the short-term physiological effects of VNS. Finally, based on these results, we discuss how recordings of evoked neural activity can help design VNS dosing procedures that optimize short-term physiological effects safely and efficiently. Understanding evoked neural activity during VNS provides powerful biomarkers that could improve the precision, safety and efficacy of VNS therapies.

2024-06-15

Bioelectronic Medicine (published)

doi.org

Variable Star Light Curves in Koopman Space

Nicolas Mekhaël

Mario Pasquato

Gaia Carenini

Vittorio F. Braga

Piero Trevisan

Giuseppe Bono

Yashar Hezaveh

2024-06-15

ICML.cc/2024/Workshop/AI4Science (spotlight)

openreview.net

The past as an imaginative resource. Hybridity, pattern and adaptation in Alexandru I. Alexandrescu’s historical novel

A.R. Olteanu

2024-06-14

Quaestiones Romanicae (published)

doi.org

Using machine learning to predict student science achievement based on science curriculum type in TIMSS 2019

Yajie Song

Maria Cutumisu

2024-06-14

International Journal of Science Education (published)

doi.org
