Publications

Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning

Supriyo Chakraborty

Nima Chitsazan

This work aims to understand how scaling improves language models, specifically in terms of training dynamics. We find that language models undergo loss deceleration early in training; an abrupt slowdown in the rate of loss improvement, resulting in piecewise linear behaviour of the loss curve in log-log space. Scaling up the model mitigates this transition by (1) decreasing the loss at which deceleration occurs, and (2) improving the log-log rate of loss improvement after deceleration. We attribute loss deceleration to a type of degenerate training dynamics we term zero-sum learning (ZSL). In ZSL, per-example gradients become systematically opposed, leading to destructive interference in per-example changes in loss. As a result, improving loss on one subset of examples degrades it on another, bottlenecking overall progress. Loss deceleration and ZSL provide new insights into the training dynamics underlying language model scaling laws, and could potentially be targeted directly to improve language models independent of scale. We make our code and artefacts available at: https://github.com/mirandrom/zsl

2025-01-01

ACL (1) (published)

doi.org

arxiv.org

Training Language Models to Self-Correct via Reinforcement Learning

Aviral Kumar

Vincent Zhuang

Rishabh Agarwal

Yi Su

John D Co-Reyes

Avi Singh

Kate Baumli

Shariq Iqbal

Colton Bishop

Rebecca Roelofs

Lei M Zhang

Kay McKinney

Disha Shrivastava

Cosmin Paduraru

George Tucker

Doina Precup

Feryal Behbahani

Aleksandra Faust

Self-correction is a highly desirable capability of large language models (LLMs), yet it has consistently been found to be largely ineffective in modern LLMs. Existing approaches for training self-correction either require multiple models or rely on a more capable model or other forms of supervision. To this end, we develop a multi-turn online reinforcement learning (RL) approach, SCoRe, that significantly improves an LLM's self-correction ability using entirely self-generated data. To build SCoRe, we first show that variants of supervised fine-tuning (SFT) on offline model-generated correction traces are insufficient for instilling self-correction behavior. In particular, we observe that training via SFT either suffers from a distribution mismatch between the training data and the model's own responses or implicitly prefers only a certain mode of correction behavior that is often not effective at test time. SCoRe addresses these challenges by training under the model's own distribution of self-generated correction traces and using appropriate regularization to steer the learning process into learning a self-correction strategy that is effective at test time as opposed to simply fitting high-reward responses for a given prompt. This regularization prescribes running a first phase of RL on a base model to generate a policy initialization that is less susceptible to collapse and then using a reward bonus to amplify self-correction during training. When applied to Gemini 1.0 Pro and 1.5 Flash models, we find that SCoRe achieves state-of-the-art self-correction performance, improving the base models' self-correction by 15.6% and 9.1% respectively on the MATH and HumanEval benchmarks.

2025-01-01

ICLR (published)

doi.org

openreview.net

Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training

Brian R. Bartoldson

Siddarth Venkatraman

James Diffenderfer

Moksh J. Jain

Tal Ben-Nun

Seanie Lee

Minsu Kim

Johan Samir Obando Ceron

Yoshua Bengio

Bhavya Kailkhura

2025-01-01

arXiv.org (preprint)

doi.org

TransCeption: Enhancing medical image segmentation with an inception-like transformer design for efficient feature fusion

Reza Azad

Yiwei Jia

Ehsan Khodapanah Aghdam

Julien Cohen-Adad

Dorit Merhof

2025-01-01

Computational Visual Media (published)

doi.org

Understanding and Meeting Practitioner Needs When Measuring Representational Harms Caused by LLM-Based Systems

Emma Harvey

Emily Sheng

Su Lin Blodgett

Alexandra Chouldechova

Jean Garcia-Gathright

Alexandra Olteanu

Hanna Wallach

The NLP research community has made publicly available numerous instruments for measuring representational harms caused by large language model (LLM)-based systems. These instruments have taken the form of datasets, metrics, tools, and more. In this paper, we examine the extent to which such instruments meet the needs of practitioners tasked with evaluating LLM-based systems. Via semi-structured interviews with 12 such practitioners, we find that practitioners are often unable to use publicly available instruments for measuring representational harms. We identify two types of challenges. In some cases, instruments are not useful because they do not meaningfully measure what practitioners seek to measure or are otherwise misaligned with practitioner needs. In other cases, instruments - even useful instruments - are not used by practitioners due to practical and institutional barriers impeding their uptake. Drawing on measurement theory and pragmatic measurement, we provide recommendations for addressing these challenges to better meet practitioner needs.

2025-01-01

ACL (Findings) (published)

doi.org

arxiv.org

Variation Matters: from Mitigating to Embracing Zero-Shot NAS Ranking Function Variation

Pavel Rumiantsev

Mark Coates

Neural Architecture Search (NAS) is a powerful automatic alternative to manual design of a neural network. In the zero-shot version, a fast ranking function is used to compare architectures without training them. The outputs of the ranking functions often vary significantly due to different sources of randomness, including the evaluated architecture's weights' initialization or the batch of data used for calculations. A common approach to addressing the variation is to average a ranking function output over several evaluations. We propose taking into account the variation in a different manner, by viewing the ranking function output as a random variable representing a proxy performance metric. During the search process, we strive to construct a stochastic ordering of the performance metrics to determine the best architecture. Our experiments show that the proposed stochastic ordering can effectively boost performance of a search on standard benchmark search spaces.

2025-01-01

Trans. Mach. Learn. Res. (published)

doi.org

openreview.net

Visual Story-Writing: Writing by Manipulating Visual Representations of Stories

Damien Masson

Zixin Zhao

Fanny Chevalier

We define "visual story-writing" as using visual representations of story elements to support writing and revising narrative texts. To demonstrate this approach, we developed a text editor that automatically visualizes a graph of entity interactions, movement between locations, and a timeline of story events. Interacting with these visualizations results in suggested text edits: for example, connecting two characters in the graph creates an interaction between them, moving an entity updates their described location, and rearranging events on the timeline reorganizes the narrative sequence. Through two user studies on narrative text editing and writing, we found that visuals supported participants in planning high-level revisions, tracking story elements, and exploring story variations in ways that encourage creativity. Broadly, our work lays the foundation for writing support, not just through words, but also visuals.

2025-01-01

UIST (published)

doi.org

arxiv.org

Warmup Generations: A Task-Agnostic Approach for Guiding Sequence-to-Sequence Learning with Unsupervised Initial State Generation

Senyu Li

Zipeng Sun

Jiayi Wang

Xue (Steve) Liu

Pontus Stenetorp

Siva Reddy

David Ifeoluwa Adelani

2025-01-01

ACL (1) (published)

doi.org

arxiv.org

Sociodemographic characteristics of SARS-CoV-2 serosurveillance studies with diverse recruitment strategies, Canada, 2020 to 2023

Matthew J Knight

Yuan Yu

Jiacheng Chen

Sheila F O’Brien

David Buckeridge

Carmen Charlton

W Alton Russell

Background. Serological testing was a key component of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) surveillance. Social distancing interventions, resource limitations, and the need for timely data led to serosurveillance studies using a range of recruitment strategies, which likely influenced study representativeness. Characterizing representativeness in surveillance is crucial to identify gaps in sampling coverage and to assess health inequities. Methods. We retrospectively analyzed three pre-existing longitudinal cohorts, two convenience samples using residual blood, and one de novo probabilistic survey conducted in Canada between April 2020 and November 2023. We calculated study specimen counts by age, sex, urbanicity, race/ethnicity, and neighborhood deprivation quintiles. We derived a 'representation ratio' as a simple metric to assess generalizability to a target population and various sociodemographic strata. Results. The six studies included 1,321,675 specimens. When stratifying by age group and sex, 65% of racialized minority subgroups were moderately underrepresented (representation ratio 0.75). Representation was generally higher for older Canadians, urban neighborhoods, and neighborhoods with low material deprivation. Rural representation was highest in a study that used outpatient laboratory blood specimens. Racialized minority representation was highest in a de novo probabilistic survey cohort. Conclusion. While no study had adequate representation of all subgroups, less traditional recruitment strategies were more representative of some population dimensions. Understanding demographic representativeness and barriers to recruitment are important considerations when designing population health surveillance studies.

2024-12-31

medRxiv (preprint)

doi.org

The Romantic Historicism and The Rise of the Historical Novel in the 19th Century Romanian Literature

Alexandra Olteanu

2024-12-31

Philobiblon. Transylvanian Journal of Multidisciplinary Research in the Humanities (published)

doi.org

Medium-scale flexible integrated circuits based on 2D semiconductors

Yalin Peng

Chenyang Cui

Li Li

Yuchen Wang

Qinqin Wang

Jinpeng Tian

Zhiheng Huang

Biying Huang

Yangkun Zhang

Xiuzhen Li

Jian Tang

Yanbang Chu

Wei Yang

Dongxia Shi

Luojun Du

Na Li

Guangyu Zhang

2024-12-30

Nature Communications (published)

doi.org
