Alessandro Sordoni

Instilling Parallel Reasoning into Language Models

Matthew Macfarlane

Minseon Kim

Nebojsa Jojic

Weijia Xu

Lucas Caccia

Xingdi Yuan

Wanru Zhao

Zhengyan Shi

Sequential chain-of-thought reasoning significantly improves the performance of Large language models (LLMs) on complex tasks. However, sequ… (voir plus)ential reasoning has structural limitations: Long chains are expensive due to attention's quadratic complexity, and multiple diverse strategies cannot be considered simultaneously. To address this we propose a method that instills parallel reasoning capabilities in LLMs by distilling parallel reasoning traces from a teacher model. This approach enables models to decompose problems, explore diverse strategies via concurrent reasoning traces, and aggregate trace outputs for the final answer. Evaluating on a variety of math and puzzle benchmarks such as MATH 500, AIME and Countdown, we show our approach can decompose parallelizable problems, and that the performance scales with the number of parallel traces. The resulting model can dynamically allocate reasoning strategies based on problem complexity, outperforming standard sampling methods.

2025-07-09

ICML.cc/2025/Workshop/AI4MATH (poster)

Learning to Solve Complex Problems via Dataset Decomposition

Wanru Zhao

Lucas Caccia

Zhengyan Shi

Minseon Kim

Xingdi Yuan

Weijia Xu

Marc-Alexandre Côté

Curriculum learning is a class of training strategies that organizes the data being exposed to a model by difficulty, gradually from simpler… (voir plus) to more complex examples. This research explores a reverse curriculum generation approach that recursively decomposes complex datasets into simpler, more learnable components. We propose a teacher-student framework where the teacher is equipped with the ability to reason step-by-step, which is used to recursively generate easier versions of examples, enabling the student model to progressively master difficult tasks. We propose a novel scoring system to measure data difficulty based on its structural complexity and conceptual depth, allowing curriculum construction over decomposed data. Experiments on math datasets (MATH and AIME) demonstrate that models trained with curricula generated by our approach exhibit superior performance compared to standard training on original datasets.

2025-07-09

ICML.cc/2025/Workshop/AI4MATH (poster)

Medical Red Teaming Protocol of Language Models: On the Importance of User Perspectives in Healthcare Settings

Jean-Philippe Corbeil

Minseon Kim

Francois Beaulieu

Paul Vozila

2025-07-09

ArXiv (prépublication)

Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers

Morgane M Moss

2025-07-09

ICML.cc/2025/Workshop/AI4MATH (poster)

A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment

Jean-Philippe Corbeil

Amin Dada

Jean-Michel Attendu

Asma Ben Abacha

Lucas Caccia

Franccois Beaulieu

Thomas Lin

Jens Kleesiek

Paul Vozila

2025-05-15

ArXiv (prépublication)

A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment

Jean-Philippe Corbeil

Amin Dada

Jean-Michel Attendu

Asma Ben Abacha

Lucas Caccia

Franccois Beaulieu

Thomas Lin

Jens Kleesiek

Paul Vozila

High computation costs and latency of large language models such as GPT-4 have limited their deployment in clinical settings. Small language… (voir plus) models (SLMs) offer a cost-effective alternative, but their limited capacity requires biomedical domain adaptation, which remains challenging. An additional bottleneck is the unavailability and high sensitivity of clinical data. To address these challenges, we propose a novel framework for adapting SLMs into high-performing clinical models. We introduce the MediPhi collection of 3.8B-parameter SLMs developed with our novel framework: pre-instruction tuning of experts on relevant medical and clinical corpora (PMC, Medical Guideline, MedWiki, etc.), model merging, and clinical-tasks alignment. To cover most clinical tasks, we extended the CLUE benchmark to CLUE+, doubling its size. Our expert models deliver relative improvements on this benchmark over the base model without any task-specific fine-tuning: 64.3% on medical entities, 49.5% on radiology reports, and 44% on ICD-10 coding (outperforming GPT-4-0125 by 14%). We unify the expert models into MediPhi via model merging, preserving gains across benchmarks. Furthermore, we built the MediFlow collection, a synthetic dataset of 2.5 million high-quality instructions on 14 medical NLP tasks, 98 fine-grained document types, and JSON format support. Alignment of MediPhi using supervised fine-tuning and direct preference optimization achieves further gains of 18.9% on average.

2025-05-15

ArXiv (prépublication)

Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers

Morgane M Moss

2025-05-07

ArXiv (prépublication)

A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment

Jean-Philippe Corbeil

Amin Dada

Jean-Michel Attendu

Asma Ben Abacha

Lucas Caccia

Franccois Beaulieu

Thomas Lin

Jens Kleesiek

Paul Vozila

High computation costs and latency of large language models such as GPT-4 have limited their deployment in clinical settings. Small language… (voir plus) models (SLMs) offer a cost-effective alternative, but their limited capacity requires biomedical domain adaptation, which remains challenging. An additional bottleneck is the unavailability and high sensitivity of clinical data. To address these challenges, we propose a novel framework for adapting SLMs into high-performing clinical models. We introduce the MediPhi collection of 3.8B-parameter SLMs developed with our novel framework: pre-instruction tuning of experts on relevant medical and clinical corpora (PMC, Medical Guideline, MedWiki, etc.), model merging, and clinical-tasks alignment. To cover most clinical tasks, we extended the CLUE benchmark to CLUE+, doubling its size. Our expert models deliver relative improvements on this benchmark over the base model without any task-specific fine-tuning: 64.3% on medical entities, 49.5% on radiology reports, and 44% on ICD-10 coding (outperforming GPT-4-0125 by 14%). We unify the expert models into MediPhi via model merging, preserving gains across benchmarks. Furthermore, we built the MediFlow collection, a synthetic dataset of 2.5 million high-quality instructions on 14 medical NLP tasks, 98 fine-grained document types, and JSON format support. Alignment of MediPhi using supervised fine-tuning and direct preference optimization achieves further gains of 18.9% on average.

2025-05-01

arXiv (publié)

VinePPO: Refining Credit Assignment in RL Training of LLMs

Amirhossein Kazemnejad

Milad Aghajohari

Large language models (LLMs) are increasingly applied to complex reasoning tasks that require executing several complex steps before receivi… (voir plus)ng any reward. Properly assigning credit to these steps is essential for enhancing model performance. Proximal Policy Optimization (PPO), a common reinforcement learning (RL) algorithm used for LLM finetuning, employs value networks to tackle credit assignment. However, recent approaches achieve strong results without it, raising questions about the efficacy of value networks in practice. In this work, we systematically evaluate the efficacy of value networks and reveal their significant shortcomings in reasoning-heavy LLM tasks, showing that they often produce poor estimate of expected return and barely outperform a random baseline when comparing alternative steps. This motivates our key question: Can improved credit assignment enhance RL training for LLMs? To address this, we propose VinePPO, a straightforward approach that leverages the flexibility of language environments to compute unbiased Monte Carlo-based estimates. Our method consistently outperforms PPO and other baselines across MATH and GSM8K datasets in less wall-clock time (up to 3.0x). Crucially, it achieves higher test accuracy for a given training accuracy, capturing more generalization signal per sample. These results emphasize the importance of accurate credit assignment in RL training of LLM.

2025-05-01

ICML.cc/2025/Conference (poster)

debug-gym: A Text-Based Environment for Interactive Debugging

Xingdi Yuan

Morgane M Moss

Charbel Feghali

Chinmay Singh

Darya Moldavskaya

Drew MacPhee

Lucas Caccia

Matheus Pereira

Minseon Kim

Marc-Alexandre Côté

2025-03-27

ArXiv (prépublication)

debug-gym: A Text-Based Environment for Interactive Debugging

Xingdi Yuan

Morgane M Moss

Charbel Feghali

Chinmay Singh

Darya Moldavskaya

Drew MacPhee

Lucas Caccia

Matheus Pereira

Minseon Kim

Marc-Alexandre Côté

2025-03-27

ArXiv (prépublication)