Uncovering a Universal Abstract Algorithm for Modular Addition in Neural Networks
Gavin McCracken
Gabriela Moisescu-Pareja
Vincent Létourneau
Jonathan Love
We propose a testable universality hypothesis, asserting that seemingly disparate neural network solutions observed in the simple task of modular addition are unified under a common abstract algorithm. While prior work interpreted variations in neuron-level representations as evidence for distinct algorithms, we demonstrate, through multi-level analyses spanning neurons, neuron clusters, and entire networks, that multilayer perceptrons and transformers universally implement the abstract algorithm we call the approximate Chinese Remainder Theorem. Crucially, we introduce approximate cosets and show that neurons activate exclusively on them. Furthermore, our theory extends to deep neural networks (DNNs): it predicts that universally learned solutions in DNNs with trainable embeddings or more than one hidden layer require only O(log n) features, a result we confirm empirically. This work thus provides the first theory-backed interpretation of multilayer networks solving modular addition. It advances generalizable interpretability and opens a testable universality hypothesis for group multiplication beyond modular addition.
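The exact Chinese Remainder Theorem that the abstract's approximate variant relaxes can be illustrated in a few lines: addition mod n = 15 decomposes into independent additions mod 3 and mod 5, because 3 and 5 are coprime. This is a minimal sketch of the number-theoretic fact only, not the paper's learned representation; the function names are illustrative.

```python
# Exact CRT illustration: addition mod 15 splits into independent
# additions mod 3 and mod 5, then recombines uniquely.
def crt_split(x, m1=3, m2=5):
    return (x % m1, x % m2)

def crt_combine(r1, r2, m1=3, m2=5):
    # Recover the unique x mod m1*m2 with the given residues
    # (brute-force search, for clarity rather than efficiency).
    for x in range(m1 * m2):
        if x % m1 == r1 and x % m2 == r2:
            return x

a, b, n = 7, 11, 15
ra, rb = crt_split(a), crt_split(b)
# Add component-wise in each small modulus.
summed = ((ra[0] + rb[0]) % 3, (ra[1] + rb[1]) % 5)
assert crt_combine(*summed) == (a + b) % n  # (7 + 11) mod 15 == 3
```

A network exploiting this structure only needs features for the small coprime moduli rather than for the full cyclic group, which is the intuition behind the O(log n) feature count.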
VinePPO: Refining Credit Assignment in RL Training of LLMs
Large language models (LLMs) are increasingly applied to complex reasoning tasks that require executing several complex steps before receiving any reward. Properly assigning credit to these steps is essential for enhancing model performance. Proximal Policy Optimization (PPO), a common reinforcement learning (RL) algorithm used for LLM finetuning, employs value networks to tackle credit assignment. However, recent approaches achieve strong results without them, raising questions about the efficacy of value networks in practice. In this work, we systematically evaluate the efficacy of value networks and reveal their significant shortcomings in reasoning-heavy LLM tasks, showing that they often produce poor estimates of expected return and barely outperform a random baseline when comparing alternative steps. This motivates our key question: can improved credit assignment enhance RL training for LLMs? To address this, we propose VinePPO, a straightforward approach that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates. Our method consistently outperforms PPO and other baselines on the MATH and GSM8K datasets in less wall-clock time (up to 3.0x). Crucially, it achieves higher test accuracy for a given training accuracy, capturing more generalization signal per sample. These results emphasize the importance of accurate credit assignment in RL training of LLMs.
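The core idea of replacing a learned value network with Monte Carlo estimates can be sketched as follows. This is a hypothetical toy, not VinePPO's actual implementation: `sample_completion` and `reward` stand in for the LLM rollout and the answer verifier, and the environment is a trivial stand-in for a reasoning trace.

```python
import random

# Sketch of a Monte Carlo value estimate: from an intermediate reasoning
# state, sample K completions and average their rewards, instead of
# querying a learned value network. All names here are illustrative.
def mc_value(state, sample_completion, reward, k=8):
    return sum(reward(sample_completion(state)) for _ in range(k)) / k

# Toy environment: the state is a partial sum, a completion adds one
# noisy step, and the reward is 1.0 when the final answer is correct.
random.seed(0)
target = 10
sample = lambda s: s + random.choice([2, 3])
rew = lambda final: 1.0 if final == target else 0.0

v = mc_value(7, sample, rew, k=100)  # estimates P(reaching 10 from state 7)
assert 0.0 <= v <= 1.0
```

Because language environments can be reset to any intermediate token prefix, these rollouts are cheap to branch, which is the flexibility the abstract refers to.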
Virtual Cells: Predict, Explain, Discover
Emmanuel Noutahi
Jason Hartford
Prudencio Tossou
Shawn Whitfield
Ali Denton
Cas Wognum
Kristina Ulicna
Michael Craig
Jonathan Hsu
Michael Cuccarese
Christopher Gibson
Daniel Cohen
Berton Earnshaw
When to retrain a machine learning model
Florence Regol
Leo Schwinn
Kyle Sprague
Thomas Markovich
A significant challenge in maintaining real-world machine learning models is responding to the continuous and unpredictable evolution of data. Most practitioners are faced with the difficult question: when should I retrain or update my machine learning model? This seemingly straightforward problem is particularly challenging for three reasons: 1) decisions must be made based on very limited information - we usually have access to only a few examples, 2) the nature, extent, and impact of the distribution shift are unknown, and 3) it involves specifying a cost ratio between retraining and poor performance, which can be hard to characterize. Existing works address certain aspects of this problem, but none offer a comprehensive solution. Distribution shift detection falls short as it cannot account for the cost trade-off; the scarcity of the data, paired with its unusual structure, makes it a poor fit for existing offline reinforcement learning methods, and the online learning formulation overlooks key practical considerations. To address this, we present a principled formulation of the retraining problem and propose an uncertainty-based method that makes decisions by continually forecasting the evolution of model performance evaluated with a bounded metric. Our experiments, addressing classification tasks, show that the method consistently outperforms existing baselines on 7 datasets. We thoroughly assess its robustness to varying cost trade-off values and mis-specified cost trade-offs.
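The cost trade-off the abstract describes can be made concrete with a minimal decision rule: retrain when the forecast performance drop outweighs the retraining cost ratio. This is an illustrative sketch under assumed names, not the paper's uncertainty-based method.

```python
# Hypothetical cost-based retraining rule: retrain when the forecast
# performance loss exceeds the cost ratio between retraining and a unit
# of performance loss. All names are illustrative, not the paper's API.
def should_retrain(forecast_acc, current_acc, cost_ratio):
    """cost_ratio = cost(retrain) / cost(one unit of accuracy lost)."""
    expected_loss = max(0.0, current_acc - forecast_acc)
    return expected_loss > cost_ratio

assert should_retrain(0.80, 0.92, 0.05)        # large forecast drop: retrain
assert not should_retrain(0.91, 0.92, 0.05)    # small drop: keep serving
```

The paper's contribution is in forecasting `forecast_acc` under uncertainty from very few examples; the rule above only shows where the cost ratio enters the decision.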
Caffeine induces age-dependent increases in brain complexity and criticality during sleep
Philipp Thölke
Maxine Arcand-Lavigne
Tarek Lajnef
Sonia Frenette
Julie Carrier
ImmunoStruct: a multimodal neural network framework for immunogenicity prediction from peptide-MHC sequence, structure, and biochemical properties
Kevin Bijan Givechian
João Felipe Rocha
Edward Yang
Chen Liu
Kerrie Greene
Rex Ying
Etienne Caron
Akiko Iwasaki
Rootlets-based registration to the spinal cord PAM50 template
Sandrine Bédard
Jan Valošek
Valeria Oliva
Kenneth A. Weber
JPerfEvo: A Tool for Tracking Method-Level Performance Changes in Java Projects
Kaveh Shahedi
Maxime Lamothe
Heng Li
Performance regressions and improvements are common phenomena in software development, occurring periodically as software evolves and matures. When developers introduce new changes to a program's codebase, unforeseen performance variations may arise. Identifying these changes at the method level, however, can be challenging due to the complexity and scale of modern codebases. In this work, we present JPerfEvo, a tool designed to automate the evaluation of the method-level performance impact of each code commit (i.e., the performance variations between the two versions before and after a commit). Leveraging the Java Microbenchmark Harness (JMH) module for benchmarking the modified methods, JPerfEvo instruments their execution and applies robust statistical evaluations to detect performance changes. The tool classifies these changes as performance improvements, regressions, or neutral (i.e., no change), along with the magnitude of the change. We evaluated JPerfEvo on three popular and mature open-source Java projects, demonstrating its effectiveness in identifying performance changes throughout their development histories.
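The three-way classification (improvement, regression, neutral) can be sketched with a simple median-based comparison of before/after benchmark samples. This is an assumed illustration of the idea only; JPerfEvo's actual statistical tests are more robust than a single relative-change threshold.

```python
from statistics import median

# Illustrative sketch (not JPerfEvo's actual test): classify a commit's
# method-level performance change from before/after runtime samples (ns).
def classify(before_ns, after_ns, threshold=0.05):
    m_before, m_after = median(before_ns), median(after_ns)
    change = (m_after - m_before) / m_before  # relative runtime change
    if change <= -threshold:
        return "improvement", change   # method got faster
    if change >= threshold:
        return "regression", change    # method got slower
    return "neutral", change

label, delta = classify([100, 102, 98, 101], [120, 119, 121, 118])
assert label == "regression" and delta > 0
```

Using medians rather than means reduces sensitivity to the warm-up and GC outliers that are common in JVM microbenchmarks.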
Proceedings of 1st Workshop on Advancing Artificial Intelligence through Theory of Mind
Mouad Abrini
Omri Abend
Dina M. Acklin
Henny Admoni
Gregor Aichinger
Nitay Alon
Zahra Ashktorab
Ashish Atreja
Moises Auron
Alexander Aufreiter
Raghav Awasthi
Soumya Banerjee
Joseph Barnby
Rhea Basappa
Severin Bergsmann
Djallel Bouneffouf
Patrick Callaghan
Marc Cavazza
Thierry Chaminade
Sonia Chernova
Mohamed Chetouan
Moumita Choudhury
Axel Cleeremans
J. Cywinski
Fabio Cuzzolin
Hokin Deng
N'yoma Diamond
C. D. Pasquasio
Max J. van Duijn
Mahapatra Dwarikanath
Qingying Gao
Ashok Goel
Rebecca R. Goldstein
Matthew C. Gombolay
Gabriel Enrique Gonzalez
Amar Halilovic
Tobias Halmdienst
Mahimul Islam
Julian Jara-Ettinger
Natalie Kastel
Renana Keydar
Ashish K. Khanna
Mahdi Khoramshahi
Jihyun Kim
Mihyeon Kim
Youngbin Kim
Senka Krivic
Nikita Krasnytskyi
Arun Kumar
Junehyoung Kwon
EunJu Lee
Shane Lee
Peter R. Lewis
Xue Li
Yijiang Li
Michal Lewandowski
Nathan Lloyd
Matthew B. Luebbers
Dezhi Luo
Haiyun Lyu
Dwarikanath Mahapatra
Kamal Maheshwari
Mallika Mainali
P. Mathur
Patrick Mederitsch
Shuwa Miura
Manuel Preston de Miranda
Reuth Mirsky
Shreya Mishra
Nina M. Moorman
Katelyn Morrison
John Muchovej
Bernhard Nessler
Felix Nessler
Hieu Minh Jord Nguyen
Abby Ortego
F. Papay
Antoine Pasquali
Hamed Rahimi
C. Raghu
Amanda L. Royka
Stefan Sarkadi
Jaelle Scheuerman
Simon Schmid
Paul Schrater
Anik Sen
Zahra Sheikhbahaee
Ke Shi
Reid G. Simmons
Nishant Singh
Mason O. Smith
Ramira van der Meulen
Anthia Solaki
Haoran Sun
Viktor Szolga
Matthew E. Taylor
Travis Taylor
Sanne van Waveren
Juan David Vargas
R. Verbrugge
Eitan Wagner
Justin D. Weisz
Ximing Wen
William Yeoh
Wenlong Zhang
Michelle Zhao
Shlomo Zilberstein
Solving Combinatorial Pricing Problems using Embedded Dynamic Programming Models
Quang Minh Bui
José Neto
The combinatorial pricing problem (CPP) is a bilevel problem in which the leader maximizes their revenue by imposing tolls on certain items that they can control. Based on the tolls set by the leader, the follower selects a subset of items corresponding to an optimal solution of a combinatorial optimization problem. To accomplish the leader's goal, the tolls need to be sufficiently low to discourage the follower from choosing the items offered by the competitors. In this paper, we derive a single-level reformulation for the CPP by rewriting the follower's problem as a longest path problem using a dynamic programming model, and then taking its dual and applying strong duality. We proceed to solve the reformulation in a dynamic fashion with a cutting plane method. We apply this methodology to two distinct dynamic programming models, namely, a novel formulation designated as the selection diagram and the well-known decision diagram. We also produce numerical results to evaluate their performance across three different specializations of the CPP and a closely related problem, the knapsack interdiction problem. Our results showcase the potential of the two proposed reformulations over the natural value function approach, expanding the set of tools to solve combinatorial bilevel programs.
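The bilevel tension the abstract describes, where tolls raise revenue only while the follower still prefers the leader's items, can be shown on a toy instance. This is an illustrative sketch of the problem class, not the paper's dynamic-programming reformulation; the follower here simply buys the single cheapest item.

```python
# Toy pricing instance (illustrative, not the paper's model): the leader
# tolls its item, base cost 1; a competitor sells an equivalent item at 4.
# The follower buys the strictly cheapest option; ties go to the competitor.
def leader_revenue(toll, base=1, competitor=4):
    return toll if base + toll < competitor else 0

# Enumerate integer tolls: revenue rises with the toll until the follower
# switches to the competitor, so the optimum sits just below that point.
best = max(range(10), key=leader_revenue)
assert best == 2 and leader_revenue(best) == 2  # toll 2 -> price 3 < 4
```

In the full CPP the follower solves a combinatorial problem over many items, which is why the paper encodes the follower as a longest-path problem over a dynamic programming diagram before dualizing.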
How Programmers Interact with Multimodal Software Documentation
Deeksha M. Arya
Martin P. Robillard
There is a wide variety of online documentation to learn about a given software technology, and prior research has reported that programmers must invest time and effort to identify one that best suits their needs. We evaluated five modalities for presenting information that enable a software document to cater to the different presentation needs of programmers. We developed a prototype tutorial with these modalities on three topics in Java, namely regular expressions, inheritance, and exception handling. We investigated how people interact with the modalities in the tutorial given a programming topic and a type of task. We conducted a survey study with 56 respondents and confirmed that although text content is most useful for solving conceptual tasks, code examples support deeper comprehension of the underlying concepts. Furthermore, we report that respondents' contradictory preferences for the modalities suggest the need for multiple alternatives in a software tutorial.
A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning
Prateek Yadav
Colin Raffel
Mohammed Muqeeth
Lucas Caccia
Haokun Liu
Tianlong Chen
Mohit Bansal
Leshem Choshen
The availability of performant pre-trained models has led to a proliferation of fine-tuned expert models that are specialized to a particular domain or task. Model MoErging methods aim to recycle expert models to create an aggregate system with improved performance or generalization. A key component of MoErging methods is the creation of a router that decides which expert model(s) to use for a particular input or application. The promise, effectiveness, and large design space of MoErging have spurred the development of many new methods over the past few years. This rapid pace of development has made it challenging to compare different MoErging methods, which are rarely compared to one another and are often validated in different experimental setups. To remedy such gaps, we present a comprehensive survey of MoErging methods that includes a novel taxonomy for cataloging key design choices and clarifying suitable applications for each method. Apart from surveying MoErging research, we inventory software tools and applications that make use of MoErging. We additionally discuss related fields of study such as model merging, multitask learning, and mixture-of-experts models. Taken as a whole, our survey provides a unified overview of existing MoErging methods and creates a solid foundation for future work in this burgeoning field.
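A router in the sense the abstract describes can be sketched minimally: score each expert against the input and dispatch to the best match. This is one assumed design (similarity-based routing over stored task embeddings), not a specific MoErging method from the survey; all names are illustrative.

```python
# Minimal illustrative router: pick the expert whose stored task
# embedding is most similar (cosine) to the input embedding.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def route(x_emb, expert_embs):
    # expert_embs maps expert name -> task embedding vector
    return max(expert_embs, key=lambda name: cosine(x_emb, expert_embs[name]))

experts = {"math": (1.0, 0.0), "code": (0.0, 1.0)}
assert route((0.9, 0.1), experts) == "math"
assert route((0.2, 0.8), experts) == "code"
```

The survey's taxonomy covers many alternatives to this scheme, e.g. learned routers, per-token versus per-example dispatch, and selecting multiple experts whose outputs are merged.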