Publications

MSR37 Improve Analyst Accuracy in Systematic Literature Reviews Using Reliant Tabular and LLM-Based Relevance Scoring

Christoph R. Schlegel

Sam Work

Marc Gendron-Bellemare

2025-07-01

Value in Health (published)

doi.org

MSR37 Improve Analyst Accuracy in Systematic Literature Reviews Using Reliant Tabular and LLM-Based Relevance Scoring

Christoph R. Schlegel

Sam Work

Marc Gendron-Bellemare

2025-07-01

Value in Health (published)

doi.org

Multi-Armed Sampling Problem and the End of Exploration

Mohammad Pedramfar

Siamak Ravanbakhsh

This paper introduces the framework of multi-armed sampling, as the sampling counterpart to the optimization problem of multi-arm bandits. O… (see more)ur primary motivation is to rigorously examine the exploration-exploitation trade-off in the context of sampling. We systematically define plausible notions of regret for this framework and establish corresponding lower bounds. We then propose a simple algorithm that achieves these optimal regret bounds. Our theoretical results demonstrate that in contrast to optimization, sampling does not require exploration. To further connect our findings with those of multi-armed bandits, we define a continuous family of problems and associated regret measures that smoothly interpolates and unifies multi-armed sampling and multi-armed bandit problems using a temperature parameter. We believe the multi-armed sampling framework, and our findings in this setting can have a foundational role in the study of sampling including recent neural samplers, akin to the role of multi-armed bandits in reinforcement learning. In particular, our work sheds light on the need for exploration and the convergence properties of algorithm for entropy-regularized reinforcement learning, fine-tuning of pretrained models and reinforcement learning with human feedback (RLHF).

2025-07-01

arXiv (published)

doi.org

arxiv.org

Multiscale Neural PDE Surrogates for Prediction and Downscaling: Application to Ocean Currents

Abdessamad El-Kabid

Loubna Benabbou

Redouane Lguensat

Alex Hernandez-Garcia

Accurate modeling of physical systems governed by partial differential equations is a central challenge in scientific computing. In oceanogr… (see more)aphy, high-resolution current data are critical for coastal management, environmental monitoring, and maritime safety. However, available satellite products, such as Copernicus data for sea water velocity at ~0.08 degrees spatial resolution and global ocean models, often lack the spatial granularity required for detailed local analyses. In this work, we (a) introduce a supervised deep learning framework based on neural operators for solving PDEs and providing arbitrary resolution solutions, and (b) propose downscaling models with an application to Copernicus ocean current data. Additionally, our method can model surrogate PDEs and predict solutions at arbitrary resolution, regardless of the input resolution. We evaluated our model on real-world Copernicus ocean current data and synthetic Navier-Stokes simulation datasets.

2025-07-01

arXiv (published)

doi.org

arxiv.org

Optimizers Qualitatively Alter Solutions And We Should Leverage This

Razvan Pascanu

Clare Lyle

Ionut-Vlad Modoranu

Naima Elosegui Borras

Dan Alistarh

Petar Veličković

Sarath Chandar

Soham De

James Martens

Due to the nonlinear nature of Deep Neural Networks (DNNs), one can not guarantee convergence to a unique global minimum of the loss when us… (see more)ing optimizers relying only on local information, such as SGD. Indeed, this was a primary source of skepticism regarding the feasibility of DNNs in the early days of the field. The past decades of progress in deep learning have revealed this skepticism to be misplaced, and a large body of empirical evidence shows that sufficiently large DNNs following standard training protocols exhibit well-behaved optimization dynamics that converge to performant solutions. This success has biased the community to use convex optimization as a mental model for learning, leading to a focus on training efficiency, either in terms of required iteration, FLOPs or wall-clock time, when improving optimizers. We argue that, while this perspective has proven extremely fruitful, another perspective specific to DNNs has received considerably less attention: the optimizer not only influences the rate of convergence, but also the qualitative properties of the learned solutions. Restated, the optimizer can and will encode inductive biases and change the effective expressivity of a given class of models. Furthermore, we believe the optimizer can be an effective way of encoding desiderata in the learning process. We contend that the community should aim at understanding the biases of already existing methods, as well as aim to build new optimizers with the explicit intent of inducing certain properties of the solution, rather than solely judging them based on their convergence rates. We hope our arguments will inspire research to improve our understanding of how the learning process can impact the type of solution we converge to, and lead to a greater recognition of optimizers design as a critical lever that complements the roles of architecture and data in shaping model outcomes.

2025-07-01

arXiv (published)

doi.org

arxiv.org

Parsing Autism Heterogeneity: Transcriptomic Subgrouping of Imaging-Derived Phenotypes in Autism.

Johanna Leyhausen

Caroline Gurr

Lisa M. Berg

Hanna Seelemeyer

Bassem Hermila

Tim Schäfer

Andreas Chiocchetti

Charlotte M. Pretzsch

Eva Loth

Beth Oakley

Jan K. Buitelaar

Christian Beckmann

Tony Charman

Thomas Bourgeron

Eli Barthome

Tobias Banaschewski

Jumana Ahmad

Sara Ambrosino

Bonnie Auyeung

Simon Baron-Cohen … (see 56 more)

Sarah Baumeister

Sven Bölte

Carsten Bours

Michael Brammer

Daniel Brandeis

Claudia Brogna

Yvette de Bruijn

Bhismadev Chakrabarti

Ineke Cornelissen

Daisy Crawley

Flavio Dell’Acqua

Guillaume Dumas

Sarah Durston

Christine Ecker

Jessica Faulkner

Vincent Frouin

Pilar Garcés

David Goyard

Lindsay Ham

Hannah Hayward

Joerg F. Hipp

Rosemary Holt

Mark Johnson

Emily J. H. Jones

Prantik Kundu

Meng-Chuan Lai

Xavier Liogier D’ardhuy

Michael V. Lombardo

David J. Lythgoe

René Mandl

Andre Marquand

Luke Mason

Maarten Mennes

Andreas Meyer-Lindenberg

Carolin Moessnang

Nico Bast

Larry O’Dwyer

Marianne Oldehinkel

Bob Oranje

Gahan Pandina

Antonio Persico

Barbara Ruggeri

Amber N. V. Ruigrok

Jessica Sabet

Roberto Sacco

Antonia San José Cáceres

Emily Simonoff

Will Spooren

Julian Tillmann

Roberto Toro

Heike Tost

Jack Waldman

Steve C. R. Williams

Caroline Wooldridge

Marcel P. Zwiers

Declan Murphy

2025-07-01

Biological Psychiatry: Cognitive Neuroscience and Neuroimaging (published)

doi.org

Perpetua: Multi-Hypothesis Persistence Modeling for Semi-Static Environments

Miguel Saavedra-Ruiz

Samer B. Nashed

Charlie Gauthier

Liam Paull

Many robotic systems require extended deployments in complex, dynamic environments. In such deployments, parts of the environment may change… (see more) between subsequent robot observations. Most robotic mapping or environment modeling algorithms are incapable of representing dynamic features in a way that enables predicting their future state. Instead, they opt to filter certain state observations, either by removing them or some form of weighted averaging. This paper introduces Perpetua, a method for modeling the dynamics of semi-static features. Perpetua is able to: incorporate prior knowledge about the dynamics of the feature if it exists, track multiple hypotheses, and adapt over time to enable predicting of future feature states. Specifically, we chain together mixtures of"persistence"and"emergence"filters to model the probability that features will disappear or reappear in a formal Bayesian framework. The approach is an efficient, scalable, general, and robust method for estimating the states of features in an environment, both in the present as well as at arbitrary future times. Through experiments on simulated and real-world data, we find that Perpetua yields better accuracy than similar approaches while also being online adaptable and robust to missing observations.

2025-07-01

arXiv (published)

doi.org

arxiv.org

Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images

Zahra Tehrani Nasab

Hujun Ni

Amar Kumar

Tal Arbel

2025-07-01

arXiv (published)

doi.org

arxiv.org

Public perceptions of Montréal's streets: Implications for inclusive public space making and management

Rashid A. Mushkani

Hugo Berard

Toumadher Ammar

Shin (Alexandre) Koseki

2025-07-01

Journal of Urban Management (published)

doi.org

RainShift: A Benchmark for Precipitation Downscaling Across Geographies

Paula Harder

Luca Schmidt

Francis Pelletier

Nicole Ludwig 0002

Matthew Chantry

Christian Lessig

Alex Hernandez-Garcia

David Rolnick

Earth System Models (ESM) are our main tool for projecting the impacts of climate change. However, running these models at sufficient resolu… (see more)tion for local-scale risk-assessments is not computationally feasible. Deep learning-based super-resolution models offer a promising solution to downscale ESM outputs to higher resolutions by learning from data. Yet, due to regional variations in climatic processes, these models typically require retraining for each geographical area-demanding high-resolution observational data, which is unevenly available across the globe. This highlights the need to assess how well these models generalize across geographic regions. To address this, we introduce RainShift, a dataset and benchmark for evaluating downscaling under geographic distribution shifts. We evaluate state-of-the-art downscaling approaches including GANs and diffusion models in generalizing across data gaps between the Global North and Global South. Our findings reveal substantial performance drops in out-of-distribution regions, depending on model and geographic area. While expanding the training domain generally improves generalization, it is insufficient to overcome shifts between geographically distinct regions. We show that addressing these shifts through, for example, data alignment can improve spatial generalization. Our work advances the global applicability of downscaling methods and represents a step toward reducing inequities in access to high-resolution climate information.

2025-07-01

arXiv (published)

doi.org

arxiv.org

ReCatcher: Towards LLMs Regression Testing for Code Generation

Altaf Allah Abbassi

Leuson Da Silva

Amin Nikanjam

Foutse Khomh

2025-07-01

arXiv (published)

doi.org