Publications

Optimizers Qualitatively Alter Solutions And We Should Leverage This
Clare Lyle
Ionut-Vlad Modoranu
Naima Elosegui Borras
Dan Alistarh
Petar Veličković
Soham De
James Martens
Due to the nonlinear nature of Deep Neural Networks (DNNs), one cannot guarantee convergence to a unique global minimum of the loss when using optimizers that rely only on local information, such as SGD. Indeed, this was a primary source of skepticism regarding the feasibility of DNNs in the early days of the field. The past decades of progress in deep learning have revealed this skepticism to be misplaced, and a large body of empirical evidence shows that sufficiently large DNNs following standard training protocols exhibit well-behaved optimization dynamics that converge to performant solutions. This success has biased the community to use convex optimization as a mental model for learning, leading to a focus on training efficiency, whether in terms of required iterations, FLOPs, or wall-clock time, when improving optimizers. We argue that, while this perspective has proven extremely fruitful, another perspective specific to DNNs has received considerably less attention: the optimizer influences not only the rate of convergence but also the qualitative properties of the learned solutions. Restated, the optimizer can and will encode inductive biases and change the effective expressivity of a given class of models. Furthermore, we believe the optimizer can be an effective way of encoding desiderata in the learning process. We contend that the community should aim to understand the biases of existing methods, as well as to build new optimizers with the explicit intent of inducing certain properties in the solution, rather than judging them solely on their convergence rates. We hope our arguments will inspire research to improve our understanding of how the learning process can impact the type of solution we converge to, and lead to greater recognition of optimizer design as a critical lever that complements the roles of architecture and data in shaping model outcomes.
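The claim that the optimizer encodes an inductive bias has a classic, easily reproduced instance: on an overparameterized linear regression, gradient descent initialized at zero converges to the minimum-L2-norm interpolating solution, while Adam converges to a different interpolator. The sketch below is not from the paper; all dimensions and hyperparameters are illustrative. Both optimizers reach near-zero training loss, yet end at solutions with different norms.

```python
# Minimal sketch (illustrative, not the paper's code): two optimizers, same
# model and data, qualitatively different solutions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                      # fewer samples than parameters
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def grad(w):                        # gradient of 0.5 * ||Xw - y||^2 / n
    return X.T @ (X @ w - y) / n

def run_gd(steps=20000, lr=0.1):
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def run_adam(steps=20000, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    w, m, v = np.zeros(d), np.zeros(d), np.zeros(d)
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g**2
        mhat, vhat = m / (1 - b1**t), v / (1 - b2**t)
        w -= lr * mhat / (np.sqrt(vhat) + eps)
    return w

w_min = np.linalg.pinv(X) @ y       # minimum-norm interpolator, for reference
for name, w in [("GD", run_gd()), ("Adam", run_adam()), ("min-norm", w_min)]:
    print(f"{name:8s} loss={np.mean((X @ w - y)**2):.2e}  ||w||={np.linalg.norm(w):.3f}")
```

GD from zero stays in the row space of X and so recovers the minimum-norm solution; Adam's coordinate-wise rescaling steers it to a different interpolator, concretely showing that the choice of optimizer, not just its speed, shapes the solution.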
Parsing Autism Heterogeneity: Transcriptomic Subgrouping of Imaging-Derived Phenotypes in Autism.
Johanna Leyhausen
Caroline Gurr
Lisa M. Berg
Hanna Seelemeyer
Bassem Hermila
Tim Schäfer
Andreas Chiocchetti
Charlotte M. Pretzsch
Eva Loth
Beth Oakley
Jan K. Buitelaar
Christian Beckmann
Tony Charman
Thomas Bourgeron
Eli Barthome
Tobias Banaschewski
Jumana Ahmad
Sara Ambrosino
Bonnie Auyeung
Simon Baron-Cohen
Sarah Baumeister
Sven Bölte
Carsten Bours
Michael Brammer
Daniel Brandeis
Claudia Brogna
Yvette de Bruijn
Bhismadev Chakrabarti
Ineke Cornelissen
Daisy Crawley
Flavio Dell’Acqua
Sarah Durston
Christine Ecker
Jessica Faulkner
Vincent Frouin
Pilar Garcés
David Goyard
Lindsay Ham
Hannah Hayward
Joerg F. Hipp
Rosemary Holt
Mark Johnson
Emily J. H. Jones
Prantik Kundu
Meng-Chuan Lai
Xavier Liogier D’ardhuy
Michael V. Lombardo
David J. Lythgoe
René Mandl
Andre Marquand
Luke Mason
Maarten Mennes
Andreas Meyer-Lindenberg
Carolin Moessnang
Nico Bast
Larry O’Dwyer
Marianne Oldehinkel
Bob Oranje
Gahan Pandina
Antonio Persico
Barbara Ruggeri
Amber N. V. Ruigrok
Jessica Sabet
Roberto Sacco
Antonia San José Cáceres
Emily Simonoff
Will Spooren
Julian Tillmann
Roberto Toro
Heike Tost
Jack Waldman
Steve C. R. Williams
Caroline Wooldridge
Marcel P. Zwiers
Declan Murphy
Perpetua: Multi-Hypothesis Persistence Modeling for Semi-Static Environments
Miguel Saavedra-Ruiz
Samer B. Nashed
Many robotic systems require extended deployments in complex, dynamic environments. In such deployments, parts of the environment may change between subsequent robot observations. Most robotic mapping or environment modeling algorithms are incapable of representing dynamic features in a way that enables predicting their future state. Instead, they opt to filter certain state observations, either by removing them or through some form of weighted averaging. This paper introduces Perpetua, a method for modeling the dynamics of semi-static features. Perpetua is able to incorporate prior knowledge about a feature's dynamics when available, track multiple hypotheses, and adapt over time to enable prediction of future feature states. Specifically, we chain together mixtures of "persistence" and "emergence" filters to model the probability that features will disappear or reappear in a formal Bayesian framework. The approach is an efficient, scalable, general, and robust method for estimating the states of features in an environment, both in the present and at arbitrary future times. Through experiments on simulated and real-world data, we find that Perpetua yields better accuracy than similar approaches while also being online-adaptable and robust to missing observations.
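As a rough illustration of the persistence/emergence idea, the sketch below (hypothetical class and parameter names, not the authors' implementation) maintains a Bayesian belief that a feature is currently present, alternating a two-state Markov prediction step (survive/disappear, reappear) with a measurement update under a noisy detector model. Perpetua goes further by chaining mixtures of such filters to track multiple hypotheses over the unknown dynamics.

```python
# Minimal sketch of a present/absent Bayesian filter with persistence and
# emergence rates; all names and parameters are illustrative.
class PresenceFilter:
    def __init__(self, p_disappear, p_emerge, p_detect=0.9, p_false=0.05, prior=0.5):
        self.p_dis = p_disappear    # P(present -> absent) per time step
        self.p_em = p_emerge        # P(absent -> present) per time step
        self.p_det = p_detect       # P(detected | present): true-positive rate
        self.p_fp = p_false         # P(detected | absent): false-positive rate
        self.belief = prior         # current P(feature present)

    def predict(self, dt=1):
        # Propagate the two-state Markov chain forward dt steps.
        for _ in range(dt):
            self.belief = self.belief * (1 - self.p_dis) + (1 - self.belief) * self.p_em

    def update(self, detected):
        # Bayesian measurement update under the noisy detector model.
        like_present = self.p_det if detected else 1 - self.p_det
        like_absent = self.p_fp if detected else 1 - self.p_fp
        num = like_present * self.belief
        self.belief = num / (num + like_absent * (1 - self.belief))

# Example: a semi-static feature (e.g., a parked car) seen, then gone, then back.
f = PresenceFilter(p_disappear=0.02, p_emerge=0.01)
for obs in [True, True, True, False, False, False, True]:
    f.predict()
    f.update(obs)
    print(f"P(present) = {f.belief:.3f}")
```

Because the transition model includes both disappearance and reappearance, the belief can recover after a run of missed detections, which is exactly the semi-static behavior that pure removal or weighted averaging cannot express.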
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images
Public perceptions of Montréal's streets: Implications for inclusive public space making and management
Rashid A. Mushkani
Toumadher Ammar
RainShift: A Benchmark for Precipitation Downscaling Across Geographies
Luca Schmidt
Nicole Ludwig
Matthew Chantry
Christian Lessig
Alex Hernandez-Garcia
Earth System Models (ESMs) are our main tool for projecting the impacts of climate change. However, running these models at sufficient resolution for local-scale risk assessments is not computationally feasible. Deep learning-based super-resolution models offer a promising solution to downscale ESM outputs to higher resolutions by learning from data. Yet, due to regional variations in climatic processes, these models typically require retraining for each geographical area, demanding high-resolution observational data that is unevenly available across the globe. This highlights the need to assess how well these models generalize across geographic regions. To address this, we introduce RainShift, a dataset and benchmark for evaluating downscaling under geographic distribution shifts. We evaluate state-of-the-art downscaling approaches, including GANs and diffusion models, on their ability to generalize across data gaps between the Global North and Global South. Our findings reveal substantial performance drops in out-of-distribution regions, depending on the model and geographic area. While expanding the training domain generally improves generalization, it is insufficient to overcome shifts between geographically distinct regions. We show that addressing these shifts through, for example, data alignment can improve spatial generalization. Our work advances the global applicability of downscaling methods and represents a step toward reducing inequities in access to high-resolution climate information.
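The benchmark's core protocol can be pictured in a few lines: score the same downscaling model on the geography it was fit to and on a held-out geography, then report the gap. The sketch below uses synthetic data and a trivial pixel-repetition "model"; none of the names come from the RainShift codebase.

```python
# Minimal sketch of a per-region downscaling evaluation under geographic
# shift; data layout and region names are illustrative only.
import numpy as np

def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def evaluate_by_region(model, data):
    """data maps region name -> (low_res, high_res) precipitation arrays."""
    return {region: rmse(model(lo), hi) for region, (lo, hi) in data.items()}

# Stand-in 'model': naive 4x upsampling by pixel repetition.
upsample = lambda lo: np.repeat(np.repeat(lo, 4, axis=-2), 4, axis=-1)

rng = np.random.default_rng(0)
data = {
    "europe_in_dist": (rng.gamma(2.0, 1.0, (8, 16, 16)), rng.gamma(2.0, 1.0, (8, 64, 64))),
    "sahel_ood":      (rng.gamma(0.5, 3.0, (8, 16, 16)), rng.gamma(0.5, 3.0, (8, 64, 64))),
}
scores = evaluate_by_region(upsample, data)
print(scores, f"OOD drop: {scores['sahel_ood'] - scores['europe_in_dist']:+.3f}")
```

The different gamma parameters per region stand in for the real phenomenon the paper measures: precipitation statistics differ across geographies, so a model fit to one region's distribution degrades on another's.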
ReCatcher: Towards LLMs Regression Testing for Code Generation
Altaf Allah Abbassi
Leuson Da Silva
Amin Nikanjam
Rethinking Prompt Optimization: Reinforcement, Diversification, and Migration in Blackbox LLMs
MohammadReza Davari
Utkarsh Garg
Weixin Cai
Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning
Behavioral cloning (BC) methods trained with supervised learning (SL) are an effective way to learn policies from human demonstrations in domains like robotics. Goal-conditioning these policies enables a single generalist policy to capture the diverse behaviors contained within an offline dataset. While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e., combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally related states are encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. Hence, encouraging this temporal consistency in the representation space should facilitate combinatorial generalization. Successor representations, which encode the distribution of future states visited from the current state, nicely encapsulate this property. However, previous methods for learning successor representations have relied on contrastive samples, temporal-difference (TD) learning, or both. In this work, we propose a simple yet effective representation learning objective …
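The abstract cuts off before stating the objective, but the family it points to is well known: predict your own future latent against a stop-gradient target, avoiding both contrastive negatives and TD learning. The sketch below is a generic BYOL/SPR-style latent self-prediction loss, with hypothetical module and hyperparameter choices, and is not necessarily the paper's exact objective.

```python
# Minimal sketch of a self-predictive representation loss: an online encoder
# predicts the EMA target encoder's embedding of a future state.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfPredictiveLoss(nn.Module):
    def __init__(self, obs_dim, latent_dim=64, tau=0.995):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.predictor = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                       nn.Linear(128, latent_dim))
        self.target = copy.deepcopy(self.encoder)   # EMA copy, no gradients
        for p in self.target.parameters():
            p.requires_grad_(False)
        self.tau = tau

    def forward(self, s_t, s_tk):
        pred = self.predictor(self.encoder(s_t))     # predict future latent
        with torch.no_grad():
            tgt = self.target(s_tk)                  # stop-gradient target
        # Negative cosine similarity: no negatives, no TD bootstrapping.
        return -F.cosine_similarity(pred, tgt, dim=-1).mean()

    @torch.no_grad()
    def update_target(self):
        for p, tp in zip(self.encoder.parameters(), self.target.parameters()):
            tp.lerp_(p, 1 - self.tau)               # EMA toward online weights

loss_fn = SelfPredictiveLoss(obs_dim=10)
s_t, s_tk = torch.randn(32, 10), torch.randn(32, 10)  # state and future state
loss_fn(s_t, s_tk).backward()
loss_fn.update_target()
```

States reached from one another are pushed toward similar latents, which is the temporal-consistency property the abstract argues should shrink the out-of-distribution gap for novel state-goal pairs.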
Speciation of coral-associated barnacles: generalists versus specialists in the Indo-West Pacific
Lorenzo C. Halasan
Yoko Nozawa
Benny Kwok Kan Chan
On the Fundamental Limitations of Dual Static CVaR Decompositions in Markov Decision Processes
Mathieu Godbout
Using machine learning algorithms to predict students' general self-efficacy in PISA 2018
Bin Tan
Hao-Yue Jin