Publications

Self-Evolving Curriculum for LLM Reasoning

Nicolas Gontier

Ehsan Kamalloo

Reinforcement learning (RL) has proven effective for fine-tuning large language models (LLMs), significantly enhancing their reasoning abilities in domains such as mathematics and code generation. A crucial factor influencing RL fine-tuning success is the training curriculum: the order in which training problems are presented. While random curricula serve as common baselines, they remain suboptimal; manually designed curricula often rely heavily on heuristics, and online filtering methods can be computationally prohibitive. To address these limitations, we propose Self-Evolving Curriculum (SEC), an automatic curriculum learning method that learns a curriculum policy concurrently with the RL fine-tuning process. Our approach formulates curriculum selection as a non-stationary Multi-Armed Bandit problem, treating each problem category (e.g., difficulty level or problem type) as an individual arm. We leverage the absolute advantage from policy gradient methods as a proxy measure for immediate learning gain. At each training step, the curriculum policy selects categories to maximize this reward signal and is updated using the TD(0) method. Across three distinct reasoning domains (planning, inductive reasoning, and mathematics), our experiments demonstrate that SEC significantly improves models' reasoning capabilities, enabling better generalization to harder, out-of-distribution test problems. Additionally, our approach achieves better skill balance when fine-tuning simultaneously on multiple reasoning domains. These findings highlight SEC as a promising strategy for RL fine-tuning of LLMs.
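As a rough illustration of the bandit formulation described in the abstract above, the following minimal sketch keeps one value estimate per problem category, uses the mean absolute advantage of a batch as the reward, and applies a TD(0)-style update. It is not the authors' implementation: the class name `CurriculumBandit`, the step size `lr`, the `temperature`, and the softmax sampling rule are all assumptions made for this example, since the abstract only names the bandit framing, the absolute-advantage reward, and TD(0).

```python
import math
import random


class CurriculumBandit:
    """Hypothetical sketch of a non-stationary bandit over problem categories."""

    def __init__(self, categories, lr=0.5, temperature=1.0, seed=0):
        # One value estimate per arm (category); all start at zero.
        self.q = {c: 0.0 for c in categories}
        self.lr = lr                    # assumed constant step size
        self.temperature = temperature  # assumed softmax temperature
        self.rng = random.Random(seed)

    def probabilities(self):
        # Softmax over value estimates (an assumed selection rule).
        m = max(self.q.values())
        weights = {c: math.exp((v - m) / self.temperature)
                   for c, v in self.q.items()}
        total = sum(weights.values())
        return {c: w / total for c, w in weights.items()}

    def sample(self):
        # Draw one category according to the softmax probabilities.
        probs = self.probabilities()
        cats = list(probs)
        return self.rng.choices(cats, weights=[probs[c] for c in cats], k=1)[0]

    def update(self, category, advantages):
        # Reward: mean absolute advantage of the rollouts for this category.
        reward = sum(abs(a) for a in advantages) / len(advantages)
        # TD(0)-style update with no successor state: Q <- Q + lr * (r - Q).
        self.q[category] += self.lr * (reward - self.q[category])
        return reward


bandit = CurriculumBandit(["easy", "medium", "hard"])
reward = bandit.update("medium", [0.5, -0.5, 1.0, -1.0])
next_category = bandit.sample()
```

A constant step size makes recent rewards count more than old ones, which is one common way to handle a non-stationary reward; whether SEC uses exactly this choice is not stated in the abstract.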

2025-05-20

ArXiv (preprint)

doi.org

arxiv.org

Virtual Cells: Predict, Explain, Discover

Emmanuel Noutahi

Jason Hartford

Prudencio Tossou

Shawn Whitfield

Ali Denton

Cas Wognum

Kristina Ulicna

Jonathan Hsu

Michael Cuccarese

Emmanuel Bengio

Dominique Beaini

Christopher Gibson

Daniel Cohen

Berton Earnshaw

2025-05-20

ArXiv (preprint)

arxiv.org

Building spatial world models from sparse transitional episodic memories

Zizhan He

Maxime Daigle

Pouya Bashivan

2025-05-19

ArXiv (preprint)

arxiv.org

Calm-Whisper: Reduce Whisper Hallucination On Non-Speech By Calming Crazy Heads Down

Yingzhi Wang

Anas Alhmoud

Saad Alsahly

Muhammad Alqurishi

Mirco Ravanelli

2025-05-19

ArXiv (preprint)

arxiv.org

Field-Level Comparison and Robustness Analysis of Cosmological N-Body Simulations

Adrian E. Bayer

Francisco Villaescusa-Navarro

Sammy Nasser Sharief

Romain Teyssier

Lehman H. Garrison

Laurence Perreault-Levasseur

Greg L. Bryan

Marco Gatti

E. Visbal

2025-05-19

ArXiv (preprint)

doi.org

arxiv.org

Generalizable Imitation Learning Through Pre-Trained Representations

Wei-Di Chang

Francois Hogan

Scott Fujimoto

David Meger

Gregory Dudek

In this paper, we leverage self-supervised vision transformer models and their emergent semantic abilities to improve the generalization abilities of imitation learning policies. We introduce BC-ViT, an imitation learning algorithm that leverages rich DINO pre-trained Visual Transformer (ViT) patch-level embeddings to obtain better generalization when learning through demonstrations. Our learner sees the world by clustering appearance features into semantic concepts, forming stable keypoints that generalize across a wide range of appearance variations and object types. We show that this representation enables generalized behaviour by evaluating imitation learning across a diverse dataset of object manipulation tasks. Our method, data and evaluation approach are made available to facilitate further study of generalization in Imitation Learners.

2025-05-19

2025 IEEE International Conference on Robotics and Automation (ICRA) (published)

doi.org

arxiv.org

Half Search Space is All You Need

Pavel Rumiantsev

Mark Coates

2025-05-19

ArXiv (preprint)

arxiv.org

RobusTAD: reference panel based annotation of nested topologically associating domains

Yanlin Zhang

Rola Dali

Mathieu Blanchette

Topologically associating domains (TADs) are fundamental units of 3D genomes and play essential roles in gene regulation. Hi-C data suggests a hierarchical organization of TADs. Accurately annotating nested TADs from Hi-C data remains challenging, both in terms of the precise identification of boundaries and the correct inference of hierarchies. While domain boundaries are relatively well conserved across cells, few approaches have taken advantage of this fact. Here, we present RobusTAD to annotate TAD hierarchies. It incorporates additional Hi-C data to refine boundaries annotated from the study sample. RobusTAD outperforms existing tools at boundary and domain annotation across several benchmarking tasks. Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-025-03568-9.

2025-05-19

Genome Biology (published)

doi.org

Topological mapping for traversability-aware long-range navigation in off-road terrain

Jean-François Tremblay

Julie Alhosh

Louis Petit

Faraz Lotfi

Lara Landauro

David Meger

Autonomous robots navigating in off-road terrain like forests open new opportunities for automation. While off-road navigation has been studied, existing work often relies on clearly delineated pathways. We present a method allowing for long-range planning, exploration and low-level control in unknown off-trail forest terrain, using vision and GPS only. We represent outdoor terrain with a topological map, which is a set of panoramic snapshots connected with edges containing traversability information. A novel traversability analysis method is demonstrated, predicting the existence of a safe path towards a target in an image. Navigating between nodes is done using goal-conditioned behavior cloning, leveraging the power of a pretrained vision transformer. An exploration planner is presented, efficiently covering an unknown off-road area with unknown traversability using a frontiers-based approach. The approach is successfully deployed to autonomously explore two 400 meters squared forest sites unseen during training, in difficult conditions for navigation.

2025-05-19

2025 IEEE International Conference on Robotics and Automation (ICRA) (published)

doi.org

arxiv.org

Beyond Scalar Rewards: An Axiomatic Framework for Lexicographic MDPs

Mehran Shakerinava

Siamak Ravanbakhsh

Adam M. Oberman

2025-05-17

ArXiv (preprint)

arxiv.org