Introduction to the special issue on Computational Terminology
Patrick Drouin
Mind the GAP! The Challenges of Scale in Pixel-based Deep Reinforcement Learning
Ghada Sokar
Uncovering a Universal Abstract Algorithm for Modular Addition in Neural Networks
Gavin McCracken
Gabriela Moisescu-Pareja
Vincent Létourneau
Jonathan Love
We propose a testable universality hypothesis, asserting that seemingly disparate neural network solutions observed in the simple task of modular addition are unified under a common abstract algorithm. While prior work interpreted variations in neuron-level representations as evidence for distinct algorithms, we demonstrate, through multi-level analyses spanning neurons, neuron clusters, and entire networks, that multilayer perceptrons and transformers universally implement the abstract algorithm we call the approximate Chinese Remainder Theorem. Crucially, we introduce approximate cosets and show that neurons activate exclusively on them. Furthermore, our theory extends to deep neural networks (DNNs): it predicts that universally learned solutions in DNNs with trainable embeddings or more than one hidden layer require only O(log n) features, a result we confirm empirically. This work thus provides the first theory-backed interpretation of multilayer networks solving modular addition, advances generalizable interpretability, and opens a testable universality hypothesis for group multiplication beyond modular addition.
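As a point of reference, the classical (exact) Chinese Remainder Theorem decomposes addition modulo a composite n into independent additions modulo its coprime factors. The minimal Python sketch below illustrates this for n = 15 = 3 × 5; the paper's approximate CRT, in which learned approximate cosets stand in for exact residues, is defined in the paper itself.

    # Exact CRT decomposition of modular addition (illustrative only;
    # not the paper's approximate algorithm). Moduli must be coprime.
    def crt_add(a, b, moduli=(3, 5)):
        n = 1
        for m in moduli:
            n *= m
        # Add independently in each residue ring.
        residues = [(a + b) % m for m in moduli]
        # Recombine the residues into a single result mod n.
        result = 0
        for r, m in zip(residues, moduli):
            q = n // m
            result += r * q * pow(q, -1, m)  # CRT basis element
        return result % n

    # Sanity check: agrees with direct addition mod 15 on all pairs.
    assert all(crt_add(a, b) == (a + b) % 15
               for a in range(15) for b in range(15))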
Dimension-adapted Momentum Outscales SGD
Damien Ferbach
Katie Everett
Elliot Paquette
We investigate scaling laws for small-batch stochastic momentum algorithms on the power-law random features model, parameterized by data complexity, target complexity, and model size. Our analysis reveals four distinct loss-curve shapes determined by the data-target complexities. While traditional stochastic gradient descent with momentum (SGD-M) yields the same scaling-law exponents as SGD, dimension-adapted Nesterov acceleration (DANA) improves these exponents by scaling its momentum hyperparameters with model size and data complexity. DANA achieves this outscaling phenomenon, which also improves compute-optimal scaling behavior, across a broad range of data and target complexities where traditional methods fall short. Extensive experiments on high-dimensional synthetic quadratics validate our theoretical predictions, and large-scale text experiments with LSTMs show that DANA's improved loss exponents over SGD hold in a practical setting.
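For orientation, the sketch below shows small-batch SGD with heavy-ball momentum on a least-squares objective, together with a hypothetical helper (illustrative_dana_hyperparams) that scales the momentum hyperparameters with model dimension in the spirit of the abstract. DANA's actual schedule, which also depends on data complexity, is specified in the paper.

    import numpy as np

    # Minimal sketch of small-batch SGD with heavy-ball momentum on a
    # least-squares objective.
    def sgd_momentum(X, y, lr, beta, steps, batch=1, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        v = np.zeros(d)  # momentum buffer
        for _ in range(steps):
            idx = rng.integers(0, n, size=batch)
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch  # stochastic gradient
            v = beta * v + grad  # heavy-ball accumulation
            w = w - lr * v
        return w

    def illustrative_dana_hyperparams(d):
        # Hypothetical rule, not the paper's: push momentum toward 1 and
        # shrink the step size as the model dimension d grows.
        beta = 1.0 - 1.0 / np.sqrt(d)
        lr = 0.1 / np.sqrt(d)
        return lr, beta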
Structure-Aligned Protein Language Model
Can Chen
David Heurtel-Depeiges
Robert M. Vernon
Christopher J. Langmead
Quentin Fournier
ImmunoStruct: a multimodal neural network framework for immunogenicity prediction from peptide-MHC sequence, structure, and biochemical properties
Kevin Bijan Givechian
João Felipe Rocha
Edward Yang
Chen Liu
Kerrie Greene
Rex Ying
Etienne Caron
Akiko Iwasaki
Adaptive Cyclic Diffusion for Inference Scaling
Gyubin Lee
Truong Nhat Nguyen Bao
Jaesik Yoon
Dongwoo Lee
Minsu Kim
Sungjin Ahn
Determinants of surgical approach to pediatric appendicitis in Brazil.
Ayla Gerk
Paulo Henrique Moreira Melo
Mohsen Amoei
Shreenik Kundu
Luiza Telles
Justina O. Seyi-Olajide
Dunya Moghul
Gabriel Schnitman
Cristina Camargo
David P. Mooney
Joaquim Bustorff-Silva
Learning and Controlling Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy
Max Schwarzer
Jesse Farebrother
Joshua Greaves
Ekin Dogus Cubuk
Sergei Kalinin
Igor Mordatch
Kevin M Roccapriore
We introduce a machine learning approach to determine the transition dynamics of silicon atoms on a single layer of carbon atoms when stimulated by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition probabilities. These learned transition dynamics are then leveraged to guide a single silicon atom through the lattice to predetermined target destinations. We present empirical analyses demonstrating the efficacy and generality of our approach.
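A minimal sketch of the pipeline the abstract describes, with made-up symbolic features and candidate-site labels standing in for the paper's representations: a small network is trained on observed beam-stimulation outcomes to predict which neighboring site the dopant jumps to.

    import torch
    import torch.nn as nn

    # Illustrative only; the paper's symbolic representation and network
    # are more involved. The model maps a symbolic encoding of the
    # dopant's local neighborhood to a distribution over candidate sites.
    class TransitionModel(nn.Module):
        def __init__(self, n_features=8, n_sites=3):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_features, 64), nn.ReLU(),
                nn.Linear(64, n_sites),  # logits over candidate jump sites
            )

        def forward(self, x):
            return self.net(x)

    model = TransitionModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # Placeholder data: one symbolic feature vector per beam event and
    # the site the atom actually jumped to after stimulation.
    features = torch.randn(256, 8)
    observed_jumps = torch.randint(0, 3, (256,))
    for _ in range(200):
        opt.zero_grad()
        loss_fn(model(features), observed_jumps).backward()
        opt.step()

    # To guide the atom, evaluate candidate beam placements and choose
    # the one whose predicted jump distribution puts the most mass on
    # the neighbor closest to the target lattice site.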