Publications

Balancing Profit and Fairness in Risk-Based Pricing Markets
Dynamic, risk-based pricing can systematically exclude vulnerable consumer groups from essential resources such as health insurance and consumer credit. We show that a regulator can realign private incentives with social objectives through a learned, interpretable tax schedule. First, we provide a formal proposition that bounding each firm's local demographic gap implicitly bounds the global opt-out disparity, motivating firm-level penalties. Building on this insight we introduce MarketSim -- an open-source, scalable simulator of heterogeneous consumers and profit-maximizing firms -- and train a reinforcement learning (RL) social planner (SP) that selects a bracketed fairness-tax while remaining close to a simple linear prior via an
Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification
Yunzhen Feng
Elvis Dohmatob
Pu Yang
Francois Charton
Julia Kempe
Large Language Models (LLMs) are increasingly trained on data generated by other LLMs, either because generated text and images become part of the pre-training corpus, or because synthesized data is used as a replacement for expensive human annotation. This raises concerns about model collapse, a drop in model performance when training sets include generated data. Considering that it is easier for both humans and machines to distinguish good examples from bad ones than to generate high-quality samples, we investigate the use of verification on synthesized data to prevent model collapse. We provide a theoretical characterization using Gaussian mixtures, linear classifiers, and linear verifiers to derive conditions with measurable proxies to assess whether the verifier can effectively select synthesized data that leads to optimal performance. We experiment with two practical tasks -- computing matrix eigenvalues with transformers and news summarization with LLMs -- which both exhibit model collapse when trained on generated data, and show that verifiers, even imperfect ones, can indeed be harnessed to prevent model collapse and that our proposed proxy measure strongly correlates with performance.
Bidirectional Information Flow (BIF) -- A Sample Efficient Hierarchical Gaussian Process for Bayesian Optimization
Hierarchical Gaussian Process (H-GP) models divide problems into different subtasks, allowing for different models to address each part, making them well-suited for problems with inherent hierarchical structure. However, typical H-GP models do not fully take advantage of this structure, only sending information up or down the hierarchy. This one-way coupling limits sample efficiency and slows convergence. We propose Bidirectional Information Flow (BIF), an efficient H-GP framework that establishes bidirectional information exchange between parent and child models in H-GPs for online training. BIF retains the modular structure of hierarchical models - the parent combines subtask knowledge from children GPs - while introducing top-down feedback to continually refine children models during online learning. This mutual exchange improves sample efficiency, enables robust training, and allows modular reuse of learned subtask models. BIF outperforms conventional H-GP Bayesian Optimization methods, achieving up to 4x and 3x higher
A Biodiversity Observation Network to support conservation action and mainstream knowledge in Canada
Andrew Gonzalez
Mary I. O'Connor
Amanda E. Bates
Kyle Bobiwash
A. Cole Burton
Paul van Dam-Bates
Isaac Eckert
Dominique Gravel
C. Julián Idrobo
Laura Pollock
Andrew D.F. Simon
Margaret A. Slein
Péter Sólymos
Brian M. Starzomski
Jennifer Sunday
Eden Tekwa
Canada has begun an ambitious project to build an observing system to monitor the changing state of its biodiversity and ecosystems. A Canada-wide Biodiversity Observation Network (CAN BON) can support the measurement, mapping, and modelling of biodiversity change—the losses and gains in the diversity of plant, animal, and microbial life—and ecosystem services. This initiative responds to eight challenges presently constraining Canada's capacity to deliver timely and robust knowledge to achieve its biodiversity goals. CAN BON is conceived as a network connecting diverse organizations to support sustained biodiversity monitoring by collaboration among universities, museums, governments, industries, NGOs, community groups, and Indigenous organizations. This inclusive network will “mobilize monitoring data” to (1) combine observation and computing infrastructures and traditional knowledge to track and understand biodiversity losses and gains across the country; and (2) link the accumulated data and knowledge to models to inform the detection and attribution of biodiversity change needed to support biodiversity policy with forecasts from local to national levels. We expect that CAN BON will foster the mainstreaming of biodiversity data and knowledge into other sectors of the economy and society, and thereby support the technical and social innovation in Canada's transition to a nature-positive future.
A Blockchain Framework for Equitable and Secure Task Allocation in Robot Swarms
Alexandre Pacheco
Xue Liu
Marco Dorigo
Recent studies demonstrate the potential of blockchain to enable robots in a swarm to achieve secure consensus about the environment, particularly when robots are homogeneous and perform identical tasks. Typically, robots receive rewards for their contributions to consensus achievement, but no studies have yet targeted heterogeneous swarms, in which the robots have distinct physical capabilities suited to different tasks. We present a novel framework that leverages domain knowledge to decompose the swarm mission into a hierarchy of tasks within smart contracts. This allows the robots to reach a consensus about both the environment and the action plan, allocating tasks among robots with diverse capabilities to improve their performance while maintaining security against faults and malicious behaviors. We refer to this concept as equitable and secure task allocation. Validated in Simultaneous Localization and Mapping missions, our approach not only achieves equitable task allocation among robots with varying capabilities, improving mapping accuracy and efficiency, but also shows resilience against malicious attacks.
Body size and intracranial volume interact with the structure of the central nervous system: A multi-center in vivo neuroimaging study
René Labounek
Monica T. Bondy
Amy L. Paulson
Mihael Abramovic
Eva Alonso-Ortiz
Nicole T. Atcheson
Laura R. Barlow
Robert L. Barry
Markus Barth
Marco Battiston
Christian Büchel
Matthew D. Budde
Virginie Callot
Anna Combes
Benjamin De Leener
Maxime Descoteaux
Paulo Loureiro de Sousa
Marek Dostál
Julien Doyon
Adam V. Dvorak
Falk Eippert
Karla R. Epperson
Kevin S. Epperson
Patrick Freund
Jürgen Finsterbusch
Alexandru Foias
Michela Fratini
Issei Fukunaga
Claudia A.M. Gandini Wheeler-Kingshott
Giancarlo Germani
Guillaume Gilbert
Federico Giove
Francesco Grussu
Akifumi Hagiwara
Pierre-Gilles Henry
Tomáš Horák
Masaaki Hori
James M. Joers
Kouhei Kamiya
Haleh Karbasforoushan
Miloš Keřkovský
Ali Khatibi
Joo-Won Kim
Nawal Kinany
Hagen Kitzler
Shannon Kolind
Yazhuo Kong
Petr Kudlička
Paul Kuntke
Nyoman D. Kurniawan
Slawomir Kusmia
Maria Marcella Laganà
Cornelia Laule
Christine S.W. Law
Tobias Leutritz
Yaou Liu
Sara Llufriu
Sean Mackey
Allan R. Martin
Eloy Martinez-Heras
Loan Mattera
Kristin P. O'Grady
Nico Papinutto
Daniel Papp
Deborah Pareto
Todd B. Parrish
Anna Pichiecchio
Ferran Prados
Àlex Rovira
Marc J. Ruitenberg
Rebecca S. Samson
Giovanni Savini
Maryam Seif
Alan C. Seifert
Alex K. Smith
Seth A. Smith
Zachary A. Smith
Elisabeth Solana
Yuichi Suzuki
George W Tackley
Alexandra Tinnermann
Dimitri Van De Ville
Marios C. Yiannakas
Kenneth A. Weber II
Nikolaus Weiskopf
Richard G. Wise
Patrik O. Wyss
Junqian Xu
Christophe Lenglet
Igor Nestrasil
Clinical research emphasizes the implementation of rigorous and reproducible study designs that rely on between-group matching or controlling for sources of biological variation such as subject’s sex and age. However, corrections for body size (i.e., height and weight) are mostly lacking in clinical neuroimaging designs. This study investigates the importance of body size parameters in their relationship with spinal cord (SC) and brain magnetic resonance imaging (MRI) metrics. Data were derived from a cosmopolitan population of 267 healthy human adults (age 30.1 ± 6.6 years old, 125 females). We show that body height correlates with brain gray matter (GM) volume, cortical GM volume, total cerebellar volume, brainstem volume, and cross-sectional area (CSA) of cervical SC white matter (CSA-WM; 0.44 ≤ r ≤ 0.62). Intracranial volume (ICV) correlates with body height (r = 0.46) and the brain volumes and CSA-WM (0.37 ≤ r ≤ 0.77). In comparison, age correlates with cortical GM volume, precentral GM volume, and cortical thickness (-0.21 ≥ r ≥ -0.27). Body weight correlates with magnetization transfer ratio in the SC WM, dorsal columns, and lateral corticospinal tracts (-0.20 ≥ r ≥ -0.23). Body weight further correlates with the mean diffusivity derived from diffusion tensor imaging (DTI) in SC WM (r = -0.20) and dorsal columns (-0.21), but only in males. CSA-WM correlates with brain volumes (0.39 ≤ r ≤ 0.64), and with precentral gyrus thickness and DTI-based fractional anisotropy in SC dorsal columns and SC lateral corticospinal tracts (-0.22 ≥ r ≥ -0.25). Linear mixture of age, sex, or sex and age, explained 2 ± 2%, 24 ± 10%, or 26 ± 10%, of data variance in brain volumetry and SC CSA. The amount of explained variance increased to 33 ± 11%, 41 ± 17%, or 46 ± 17%, when body height, ICV, or body height and ICV were added into the mixture model.
In females, the explained variances halved suggesting another unidentified biological factor(s) determining females’ central nervous system (CNS) morphology. In conclusion, body size and ICV are significant biological variables. Along with sex and age, body size should therefore be included as a mandatory variable in the design of clinical neuroimaging studies examining SC and brain structure; and body size and ICV should be considered as covariates in statistical analyses. Normalization of different brain regions with ICV diminishes their correlations with body size, but simultaneously amplifies ICV-related variance (r = 0.72 ± 0.07) and suppresses volume variance of the different brain regions (r = 0.12 ± 0.19) in the normalized measurements.
Can We Learn Communication-Efficient Optimizers?
Causal Machine Learning: A Survey and Open Problems
Jean Kaddour
Aengus Lynch
Qi Liu
Matt J. Kusner
Ricardo Silva
Causal Machine Learning (CausalML) is an umbrella term for machine learning methods that formalize the data-generation process as a structural causal model (SCM). This perspective enables us to reason about the effects of changes to this process (interventions) and what would have happened in hindsight (counterfactuals). We categorize work in CausalML into five groups according to the problems they address: (1) causal supervised learning, (2) causal generative modeling, (3) causal explanations, (4) causal fairness, and (5) causal reinforcement learning. We systematically compare the methods in each category and point out open problems. Further, we review data-modality-specific applications in computer vision, natural language processing, and graph representation learning. Finally, we provide an overview of causal benchmarks and a critical discussion of the state of this nascent field, including recommendations for future work.
CAVE: Detecting and Explaining Commonsense Anomalies in Visual Environments
Syrielle Montariol
Angelika Romanou
Beatriz Borges
Antoine Bosselut
Changer le regard des étudiants sur les métiers de la comptabilité : Les effets de la simulation de gestion
Yann QUÉMÉNER
Accounting often unfairly conveys a dull and boring image to the general public and to young students choosing their course of study. In this article, we examine the effect of teaching practices on students' perception of the soft skills expected by employers. To do so, we conduct a quasi-experiment in which we compare students' perceptions depending on whether the course was taught in a conventional format (applying knowledge through exercises corrected by the instructor) or as a management simulation (applying knowledge to make decisions and run a fictitious company). The results show that a management simulation, more than conventional tutorials, gives first-time accounting learners a better perception of the soft skills expected by practitioners and recruiters. Our results underline the importance of giving a realistic representation of the profession (far from clichés) in order to make accounting degree programs more attractive.
Charting the Landscape of African NLP: Mapping Progress and Shaping the Road Ahead
Jesujoba Oluwadara Alabi
Michael A. Hedderich
Dietrich Klakow
Child- and Proxy-reported Differences in Patient-reported Outcome and Experience Measures in Pediatric Surgery: Systematic Review and Meta-analysis
Zanib Nafees
Siena O'Neill
Alexandra Dimmer
Elena Guadagno
Julia Ferreira
Nancy Mayo