Half Search Space is All You Need
Pavel Rumiantsev
RobusTAD: reference panel based annotation of nested topologically associating domains
Yanlin Zhang
Rola Dali
Topologically associating domains (TADs) are fundamental units of 3D genomes and play essential roles in gene regulation. Hi-C data suggests a hierarchical organization of TADs. Accurately annotating nested TADs from Hi-C data remains challenging, both in terms of the precise identification of boundaries and the correct inference of hierarchies. While domain boundaries are relatively well conserved across cells, few approaches have taken advantage of this fact. Here, we present RobusTAD to annotate TAD hierarchies. It incorporates additional Hi-C data to refine boundaries annotated from the study sample. RobusTAD outperforms existing tools at boundary and domain annotation across several benchmarking tasks. Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-025-03568-9.
Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models
Lucas Berry
Axel Brando
Wei-Di Chang
Juan Higuera
Beyond Scalar Rewards: An Axiomatic Framework for Lexicographic MDPs
Mehran Shakerinava
FedWeight: mitigating covariate shift of federated learning on electronic health records data through patients re-weighting
Mike He Zhu
Jun Bai
Na Li
Xiaoxiao Li
Dianbo Liu
Search-Based Correction of Reasoning Chains for Language Models
Minsu Kim
Jean-Pierre R. Falet
Oliver E. Richardson
Xiaoyin Chen
Moksh J. Jain
Sungjin Ahn
Sungsoo Ahn
A multi-ancestry genetic reference for the Quebec population
Peyton McClelland
Georgette Femerling
R. Laflamme
Alejandro Mejia-Garcia
Mohadese Sayahian Dehkordi
Hongyu Xiao
Alex Diaz-Papkovich
Justin Pelletier
Jean-Christophe Grenier
Ken Sin Lo
Luke Anderson-Trocmé
Justin Bellavance
Vincent Chapdelaine
Geneviève Gagnon
Annelie De Mori
Gerardo Martinez
Kristen Mohler
Thibault de Malliard
Catherine Labbé
Marjorie Labrecque
…
Alexandre Montpetit
D. Spiegelman
Guy A. Rouleau
Jean-François Théroux
Hufeng Zhou
Simon L. Girard
Anne-Marie Laberge
C. Bhérer
Martine Tétreault
Sarah A. Gagliano Taliun
Daniel Taliun
Simon Gravel
Guillaume Lettre
While international efforts have characterized genetic variation in millions of individuals, the interplay of environmental, social, cultural, and genetic factors is poorly understood for most worldwide populations. The province of Quebec in Canada has been the site of numerous genetic studies, often focusing on individual Mendelian diseases in founder sub-populations. Here, we profiled and analyzed genome-wide genotyped variation in 29,337 Quebec residents from the large population-based cohort CARTaGENE (CaG), including rich phenotype and environmental data. We also sequenced the whole genomes of 2,173 CaG participants, including 163 and 132 individuals with grandparents born in Haiti and Morocco, respectively. We use this genetic information to gain insight into Quebec's demography and to help interpret the potential significance of variants identified in clinically important genes. We built an imputation panel by phasing the CaG whole-genome sequence data and showed, using genome-wide association studies (GWAS), how it improves the discovery of phenotype-genotype associations in this population. We provide allele frequency information and GWAS results through dedicated and publicly available websites. The genetic data, paired with phenotypic and environmental information, are also available for research use upon scientific and ethical review.
A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment
Jean-Philippe Corbeil
Amin Dada
Jean-Michel Attendu
Asma Ben Abacha
Lucas Caccia
François Beaulieu
Thomas Lin
Jens Kleesiek
Paul Vozila
The high computational costs and latency of large language models such as GPT-4 have limited their deployment in clinical settings. Small language models (SLMs) offer a cost-effective alternative, but their limited capacity requires biomedical domain adaptation, which remains challenging. An additional bottleneck is the unavailability and high sensitivity of clinical data. To address these challenges, we propose a novel framework for adapting SLMs into high-performing clinical models. We introduce the MediPhi collection of 3.8B-parameter SLMs developed with our novel framework: pre-instruction tuning of experts on relevant medical and clinical corpora (PMC, Medical Guideline, MedWiki, etc.), model merging, and clinical-tasks alignment. To cover most clinical tasks, we extended the CLUE benchmark to CLUE+, doubling its size. Our expert models deliver relative improvements on this benchmark over the base model without any task-specific fine-tuning: 64.3% on medical entities, 49.5% on radiology reports, and 44% on ICD-10 coding (outperforming GPT-4-0125 by 14%). We unify the expert models into MediPhi via model merging, preserving gains across benchmarks. Furthermore, we built the MediFlow collection, a synthetic dataset of 2.5 million high-quality instructions on 14 medical NLP tasks, 98 fine-grained document types, and JSON format support. Alignment of MediPhi using supervised fine-tuning and direct preference optimization achieves further gains of 18.9% on average.
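The abstract says the expert models are unified into MediPhi "via model merging" but does not name the merging method. As an illustration only, the sketch below shows the simplest form of model merging, a weighted linear average of matching parameters across expert checkpoints. The function name `merge_linear`, the dict-of-lists parameter layout, and the choice of linear averaging are all assumptions made for this example; this is not presented as the MediPhi procedure.

```python
from typing import Dict, List, Optional

# Hypothetical layout: each expert maps a parameter name to a flat list of
# floats (real checkpoints store tensors; lists keep the sketch dependency-free).
Params = Dict[str, List[float]]


def merge_linear(experts: List[Params], weights: Optional[List[float]] = None) -> Params:
    """Merge experts by a weighted average of parameters with the same name.

    Plain linear merging is an illustrative assumption, not necessarily
    the merging method used for MediPhi.
    """
    if not experts:
        raise ValueError("need at least one expert")
    if weights is None:
        weights = [1.0] * len(experts)  # uniform average by default
    if len(weights) != len(experts):
        raise ValueError("one weight per expert is required")
    total = sum(weights)
    norm = [w / total for w in weights]  # normalise so the weights sum to 1

    # All experts must come from the same base architecture.
    names = experts[0].keys()
    for e in experts[1:]:
        if e.keys() != names:
            raise ValueError("experts must share the same parameter names")

    merged: Params = {}
    for name in names:
        length = len(experts[0][name])
        if any(len(e[name]) != length for e in experts):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise weighted sum across experts.
        merged[name] = [
            sum(w * e[name][i] for w, e in zip(norm, experts))
            for i in range(length)
        ]
    return merged


if __name__ == "__main__":
    a = {"w": [1.0, 3.0]}
    b = {"w": [3.0, 5.0]}
    print(merge_linear([a, b]))  # {'w': [2.0, 4.0]}
```

Linear averaging only makes sense when the experts share an architecture and were fine-tuned from the same base model, which matches the setup described in the abstract (3.8B-parameter experts derived from one base SLM).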
Persistent signs of poisoning after massive drug ingestion: move the ultrasound probe to the stomach.
N. Lautrou-cabasson
H. Pirollet
C. Lombois