Publications

Field-Level Comparison and Robustness Analysis of Cosmological N-Body Simulations
Adrian E. Bayer
Francisco Villaescusa-Navarro
Romain Teyssier
Lehman H. Garrison
Greg L. Bryan
Marco Gatti
E. Visbal
Generalizable Imitation Learning Through Pre-Trained Representations
Wei-Di Chang
Francois Hogan
Scott Fujimoto
In this paper we leverage self-supervised vision transformer models and their emergent semantic abilities to improve the generalization abilities of imitation learning policies. We introduce BC-ViT, an imitation learning algorithm that leverages rich DINO pre-trained Visual Transformer (ViT) patch-level embeddings to obtain better generalization when learning through demonstrations. Our learner sees the world by clustering appearance features into semantic concepts, forming stable keypoints that generalize across a wide range of appearance variations and object types. We show that this representation enables generalized behaviour by evaluating imitation learning across a diverse dataset of object manipulation tasks. Our method, data and evaluation approach are made available to facilitate further study of generalization in Imitation Learners.
Half Search Space is All You Need
Pavel Rumiantsev
Learning active tactile perception through belief-space control
Jean-François Tremblay
Johanna Hansen
Francois Hogan
Robots operating in an open world can encounter novel objects with unknown physical properties, such as mass, friction, or size. It is desirable to be able to sense those properties through contact-rich interaction, before performing downstream tasks with the objects. We propose a method for autonomously learning active tactile perception policies, by learning a generative world model leveraging a differentiable Bayesian filtering algorithm, and designing an information-gathering model predictive controller. We test the method on three simulated tasks: mass estimation, height estimation and toppling height estimation. Our method is able to discover policies which gather information about the desired property in an intuitive manner.
RobusTAD: reference panel based annotation of nested topologically associating domains
Yanlin Zhang
Rola Dali
Topologically associating domains (TADs) are fundamental units of 3D genomes and play essential roles in gene regulation. Hi-C data suggests a hierarchical organization of TADs. Accurately annotating nested TADs from Hi-C data remains challenging, both in terms of the precise identification of boundaries and the correct inference of hierarchies. While domain boundaries are relatively well conserved across cells, few approaches have taken advantage of this fact. Here, we present RobusTAD to annotate TAD hierarchies. It incorporates additional Hi-C data to refine boundaries annotated from the study sample. RobusTAD outperforms existing tools at boundary and domain annotation across several benchmarking tasks. Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-025-03568-9.
Topological mapping for traversability-aware long-range navigation in off-road terrain
Jean-François Tremblay
Louis Petit
Faraz Lotfi
Lara Landauro
Autonomous robots navigating in off-road terrain like forests open new opportunities for automation. While off-road navigation has been studied, existing work often relies on clearly delineated pathways. We present a method allowing for long-range planning, exploration and low-level control in unknown off-trail forest terrain, using vision and GPS only. We represent outdoor terrain with a topological map, which is a set of panoramic snapshots connected with edges containing traversability information. A novel traversability analysis method is demonstrated, predicting the existence of a safe path towards a target in an image. Navigating between nodes is done using goal-conditioned behavior cloning, leveraging the power of a pretrained vision transformer. An exploration planner is presented, efficiently covering an unknown off-road area with unknown traversability using a frontiers-based approach. The approach is successfully deployed to autonomously explore two 400 meters squared forest sites unseen during training, in difficult conditions for navigation.
Beyond Scalar Rewards: An Axiomatic Framework for Lexicographic MDPs
FedWeight: mitigating covariate shift of federated learning on electronic health records data through patients re-weighting
Mike He Zhu
Na Li
Xiaoxiao Li
Dianbo Liu
Search-Based Correction of Reasoning Chains for Language Models
Minsu Kim
Jean-Pierre R. Falet
Oliver E. Richardson
Moksh J. Jain
Sungjin Ahn
Sungsoo Ahn
A multi-ancestry genetic reference for the Quebec population
Peyton McClelland
Georgette Femerling
R. Laflamme
Alejandro Mejia-Garcia
Mohadese Sayahian Dehkordi
Hongyu Xiao
Alex Diaz-Papkovich
Justin Pelletier
Jean-Christophe Grenier
Ken Sin Lo
Luke Anderson-Trocmé
Justin Bellavance
Vincent Chapdelaine
Genevieve Gagnon
Annelie De Mori
Gerardo Martinez
Kristen Mohler
Thibault de Malliard
Catherine Labbé
Marjorie Labrecque
Alexandre Montpetit
D. Spiegelman
Guy A. Rouleau
Jean-François Théroux
Hufeng Zhou
Simon L. Girard
Anne-Marie Laberge
Claude Bhérer
Martine Tétreault
Sarah A. Gagliano Taliun
Daniel Taliun
Simon Gravel
Guillaume Lettre
While international efforts have characterized genetic variation in millions of individuals, the interplay of environmental, social, cultural, and genetic factors is poorly understood for most worldwide populations. The province of Quebec in Canada has been the site of numerous genetic studies, often focusing on individual Mendelian diseases in founder sub-populations. Here, we profiled and analyzed genome-wide genotyped variation in 29,337 Quebec residents from the large population-based cohort CARTaGENE (CaG), including rich phenotype and environmental data. We also sequenced the whole genomes of 2,173 CaG participants, including 163 and 132 individuals with grandparents born in Haiti and Morocco, respectively. We use this genetic information to gain insight into Quebec's demography and to help interpret the potential significance of variants identified in clinically important genes. We built an imputation panel by phasing the CaG whole-genome sequence data and showed, using genome-wide association studies (GWAS), how it improves the discovery of phenotype-genotype associations in this population. We provide allele frequency information and GWAS results through dedicated and publicly available websites. The genetic data, paired with phenotypic and environmental information, is also available for research use upon scientific and ethical review.
A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment
Jean-Philippe Corbeil
Amin Dada
Jean-Michel Attendu
Asma Ben Abacha
Lucas Caccia
François Beaulieu
Thomas Lin
Jens Kleesiek
Paul Vozila
High computation costs and latency of large language models such as GPT-4 have limited their deployment in clinical settings. Small language models (SLMs) offer a cost-effective alternative, but their limited capacity requires biomedical domain adaptation, which remains challenging. An additional bottleneck is the unavailability and high sensitivity of clinical data. To address these challenges, we propose a novel framework for adapting SLMs into high-performing clinical models. We introduce the MediPhi collection of 3.8B-parameter SLMs developed with our novel framework: pre-instruction tuning of experts on relevant medical and clinical corpora (PMC, Medical Guideline, MedWiki, etc.), model merging, and clinical-tasks alignment. To cover most clinical tasks, we extended the CLUE benchmark to CLUE+, doubling its size. Our expert models deliver relative improvements on this benchmark over the base model without any task-specific fine-tuning: 64.3% on medical entities, 49.5% on radiology reports, and 44% on ICD-10 coding (outperforming GPT-4-0125 by 14%). We unify the expert models into MediPhi via model merging, preserving gains across benchmarks. Furthermore, we built the MediFlow collection, a synthetic dataset of 2.5 million high-quality instructions on 14 medical NLP tasks, 98 fine-grained document types, and JSON format support. Alignment of MediPhi using supervised fine-tuning and direct preference optimization achieves further gains of 18.9% on average.