Publications

Balancing Profit and Fairness in Risk-Based Pricing Markets

Dynamic, risk-based pricing can systematically exclude vulnerable consumer groups from essential resources such as health insurance and consumer credit. We show that a regulator can realign private incentives with social objectives through a learned, interpretable tax schedule. First, we provide a formal proposition that bounding each firm's local demographic gap implicitly bounds the global opt-out disparity, motivating firm-level penalties. Building on this insight, we introduce MarketSim, an open-source, scalable simulator of heterogeneous consumers and profit-maximizing firms, and train a reinforcement learning (RL) social planner (SP) that selects a bracketed fairness-tax while remaining close to a simple linear prior via an

2024-12-31

arXiv (preprint)

doi.org

arxiv.org

Beyond Model Collapse: Scaling Up with Synthesized Data Requires Verification

Yunzhen Feng

Elvis Dohmatob

Pu Yang

Francois Charton

Julia Kempe

Large Language Models (LLMs) are increasingly trained on data generated by other LLMs, either because generated text and images become part of the pre-training corpus, or because synthesized data is used as a replacement for expensive human annotation. This raises concerns about model collapse, a drop in model performance when their training sets include generated data. Considering that it is easier for both humans and machines to tell good examples from bad ones than to generate high-quality samples, we investigate the use of verification on synthesized data to prevent model collapse. We provide a theoretical characterization using Gaussian mixtures, linear classifiers, and linear verifiers to derive conditions with measurable proxies to assess whether the verifier can effectively select synthesized data that leads to optimal performance. We experiment with two practical tasks -- computing matrix eigenvalues with transformers and news summarization with LLMs -- which both exhibit model collapse when trained on generated data, and show that verifiers, even imperfect ones, can indeed be harnessed to prevent model collapse and that our proposed proxy measure strongly correlates with performance.

2024-12-31

ICLR (published)

doi.org

openreview.net

Bidirectional Information Flow (BIF) -- A Sample Efficient Hierarchical Gaussian Process for Bayesian Optimization

Juan D. Guerra

Thomas Garbay

Guillaume Lajoie

Marco Bonizzato

Hierarchical Gaussian Process (H-GP) models divide problems into different subtasks, allowing for different models to address each part, making them well-suited for problems with inherent hierarchical structure. However, typical H-GP models do not fully take advantage of this structure, only sending information up or down the hierarchy. This one-way coupling limits sample efficiency and slows convergence. We propose Bidirectional Information Flow (BIF), an efficient H-GP framework that establishes bidirectional information exchange between parent and child models in H-GPs for online training. BIF retains the modular structure of hierarchical models - the parent combines subtask knowledge from children GPs - while introducing top-down feedback to continually refine children models during online learning. This mutual exchange improves sample efficiency, enables robust training, and allows modular reuse of learned subtask models. BIF outperforms conventional H-GP Bayesian Optimization methods, achieving up to 4x and 3x higher

2024-12-31

arXiv (preprint)

doi.org

arxiv.org

A Biodiversity Observation Network to support conservation action and mainstream knowledge in Canada

Andrew Gonzalez

Mary I. O'Connor

Amanda E. Bates

Kyle Bobiwash

A. Cole Burton

Paul van Dam-Bates

Isaac Eckert

Dominique Gravel

C. Julián Idrobo

Laura Pollock

Andrew D.F. Simon

Margaret A. Slein

Péter Sólymos

Brian M. Starzomski

Jennifer Sunday

Eden Tekwa

Canada has begun an ambitious project to build an observing system to monitor the changing state of its biodiversity and ecosystems. A Canada-wide Biodiversity Observation Network (CAN BON) can support the measurement, mapping, and modelling of biodiversity change (the losses and gains in the diversity of plant, animal, and microbial life) and ecosystem services. This initiative responds to eight challenges presently constraining Canada's capacity to deliver timely and robust knowledge to achieve its biodiversity goals. CAN BON is conceived as a network connecting diverse organizations to support sustained biodiversity monitoring by collaboration among universities, museums, governments, industries, NGOs, community groups, and Indigenous organizations. This inclusive network will “mobilize monitoring data” to (1) combine observation and computing infrastructures and traditional knowledge to track and understand biodiversity losses and gains across the country; and (2) link the accumulated data and knowledge to models to inform the detection and attribution of biodiversity change needed to support biodiversity policy with forecasts from local to national levels. We expect that CAN BON will foster the mainstreaming of biodiversity data and knowledge into other sectors of the economy and society, and thereby support the technical and social innovation in Canada's transition to a nature-positive future.

2024-12-31

Formal Aspects of Component Software (published)

doi.org

A Blockchain Framework for Equitable and Secure Task Allocation in Robot Swarms

Hanqing Zhao

Alexandre Pacheco

Giovanni Beltrame

Xue Liu

Marco Dorigo

Gregory Dudek

Recent studies demonstrate the potential of blockchain to enable robots in a swarm to achieve secure consensus about the environment, particularly when robots are homogeneous and perform identical tasks. Typically, robots receive rewards for their contributions to consensus achievement, but no studies have yet targeted heterogeneous swarms, in which the robots have distinct physical capabilities suited to different tasks. We present a novel framework that leverages domain knowledge to decompose the swarm mission into a hierarchy of tasks within smart contracts. This allows the robots to reach a consensus about both the environment and the action plan, allocating tasks among robots with diverse capabilities to improve their performance while maintaining security against faults and malicious behaviors. We refer to this concept as equitable and secure task allocation. Validated in Simultaneous Localization and Mapping missions, our approach not only achieves equitable task allocation among robots with varying capabilities, improving mapping accuracy and efficiency, but also shows resilience against malicious attacks.

2024-12-31

IEEE Robotics and Automation Letters (unknown)

doi.org

Body size and intracranial volume interact with the structure of the central nervous system: A multi-center in vivo neuroimaging study

René Labounek

Monica T. Bondy

Amy L. Paulson

Sandrine Bédard

Mihael Abramovic

Eva Alonso-Ortiz

Nicole T. Atcheson

Laura R. Barlow

Robert L. Barry

Markus Barth

Marco Battiston

Christian Büchel

Matthew D. Budde

Virginie Callot

Anna Combes

Benjamin De Leener

Maxime Descoteaux

Paulo Loureiro de Sousa

Marek Dostál

Julien Doyon

Adam V. Dvorak

Falk Eippert

Karla R. Epperson

Kevin S. Epperson

Patrick Freund

Jürgen Finsterbusch

Alexandru Foias

Michela Fratini

Issei Fukunaga

Claudia A.M. Gandini Wheeler-Kingshott

Giancarlo Germani

Guillaume Gilbert

Federico Giove

Francesco Grussu

Akifumi Hagiwara

Pierre-Gilles Henry

Tomáš Horák

Masaaki Hori

James M. Joers

Kouhei Kamiya

Haleh Karbasforoushan

Miloš Keřkovský

Ali Khatibi

Joo-Won Kim

Nawal Kinany

Hagen Kitzler

Shannon Kolind

Yazhuo Kong

Petr Kudlička

Paul Kuntke

Nyoman D. Kurniawan

Slawomir Kusmia

Maria Marcella Laganà

Cornelia Laule

Christine S.W. Law

Tobias Leutritz

Yaou Liu

Sara Llufriu

Sean Mackey

Allan R. Martin

Eloy Martinez-Heras

Loan Mattera

Kristin P. O'Grady

Nico Papinutto

Daniel Papp

Deborah Pareto

Todd B. Parrish

Anna Pichiecchio

Ferran Prados

Àlex Rovira

Marc J. Ruitenberg

Rebecca S. Samson

Giovanni Savini

Maryam Seif

Alan C. Seifert

Alex K. Smith

Seth A. Smith

Zachary A. Smith

Elisabeth Solana

Yuichi Suzuki

George W Tackley

Alexandra Tinnermann

Jan Valosek

Dimitri Van De Ville

Marios C. Yiannakas

Kenneth A. Weber II

Nikolaus Weiskopf

Richard G. Wise

Patrik O. Wyss

Junqian Xu

Julien Cohen-Adad

Christophe Lenglet

Igor Nestrasil

Clinical research emphasizes the implementation of rigorous and reproducible study designs that rely on between-group matching or controlling for sources of biological variation such as subject’s sex and age. However, corrections for body size (i.e., height and weight) are mostly lacking in clinical neuroimaging designs. This study investigates the importance of body size parameters in their relationship with spinal cord (SC) and brain magnetic resonance imaging (MRI) metrics. Data were derived from a cosmopolitan population of 267 healthy human adults (age 30.1 ± 6.6 years old, 125 females). We show that body height correlates with brain gray matter (GM) volume, cortical GM volume, total cerebellar volume, brainstem volume, and cross-sectional area (CSA) of cervical SC white matter (CSA-WM; 0.44 ≤ r ≤ 0.62). Intracranial volume (ICV) correlates with body height (r = 0.46) and the brain volumes and CSA-WM (0.37 ≤ r ≤ 0.77). In comparison, age correlates with cortical GM volume, precentral GM volume, and cortical thickness (-0.21 ≥ r ≥ -0.27). Body weight correlates with magnetization transfer ratio in the SC WM, dorsal columns, and lateral corticospinal tracts (-0.20 ≥ r ≥ -0.23). Body weight further correlates with the mean diffusivity derived from diffusion tensor imaging (DTI) in SC WM (r = -0.20) and dorsal columns (-0.21), but only in males. CSA-WM correlates with brain volumes (0.39 ≤ r ≤ 0.64), and with precentral gyrus thickness and DTI-based fractional anisotropy in SC dorsal columns and SC lateral corticospinal tracts (-0.22 ≥ r ≥ -0.25). Linear mixture of age, sex, or sex and age, explained 2 ± 2%, 24 ± 10%, or 26 ± 10%, of data variance in brain volumetry and SC CSA. The amount of explained variance increased to 33 ± 11%, 41 ± 17%, or 46 ± 17%, when body height, ICV, or body height and ICV were added into the mixture model.
In females, the explained variances halved suggesting another unidentified biological factor(s) determining females’ central nervous system (CNS) morphology. In conclusion, body size and ICV are significant biological variables. Along with sex and age, body size should therefore be included as a mandatory variable in the design of clinical neuroimaging studies examining SC and brain structure; and body size and ICV should be considered as covariates in statistical analyses. Normalization of different brain regions with ICV diminishes their correlations with body size, but simultaneously amplifies ICV-related variance (r = 0.72 ± 0.07) and suppresses volume variance of the different brain regions (r = 0.12 ± 0.19) in the normalized measurements.

2024-12-31

Imaging Neuroscience (published)

doi.org

Can We Learn Communication-Efficient Optimizers?

Charles-Etienne Joseph

2024-12-31

Trans. Mach. Learn. Res. (published)

doi.org

openreview.net

Causal Machine Learning: A Survey and Open Problems

Jean Kaddour

Aengus Lynch

Qi Liu

Matt J. Kusner

Ricardo Silva

Causal Machine Learning (CausalML) is an umbrella term for machine learning methods that formalize the data-generation process as a structur… (see more)al causal model (SCM). This perspective enables us to reason about the effects of changes to this process (interventions) and what would have happened in hindsight (counterfactuals). We categorize work in CausalML into five groups according to the problems they address: (1) causal supervised learning, (2) causal generative modeling, (3) causal explanations, (4) causal fairness, and (5) causal reinforcement learning. We systematically compare the methods in each category and point out open problems. Further, we review data-modality-specific applications in computer vision, natural language processing, and graph representation learning. Finally, we provide an overview of causal benchmarks and a critical discussion of the state of this nascent field, including recommendations for future work.

2024-12-31

Foundations and Trends in Optimization (published)

doi.org

arxiv.org

CAVE: Detecting and Explaining Commonsense Anomalies in Visual Environments

Rishika Bhagwatkar

Syrielle Montariol

Angelika Romanou

Beatriz Borges

Irina Rish

Antoine Bosselut

2024-12-31

EMNLP (published)

doi.org

arxiv.org

Changer le regard des étudiants sur les métiers de la comptabilité : Les effets de la simulation de gestion

Guillaume Dumas

Yann Quéméner

Accounting often, and unfairly, conveys a dull and boring image to the general public and to young students choosing their course of study. In this article, we examine the effect of teaching practices on students' perception of the soft skills expected by employers. To do so, we conduct a quasi-experiment in which we compare students' perceptions depending on whether the course was taught in a conventional format (applying knowledge through exercises corrected by the instructor) or as a management simulation (applying knowledge to make decisions and run a fictitious company). The results show that a management simulation, more than conventional tutorials, gives first-time accounting learners a better perception of the soft skills expected by practitioners and recruiters. Our results underline the importance of giving a realistic representation of the profession, far from clichés, in order to make accounting programs more attractive.

2024-12-31

Finance Contrôle Stratégie (published)

doi.org

Charting the Landscape of African NLP: Mapping Progress and Shaping the Road Ahead

Jesujoba Oluwadara Alabi

Michael A. Hedderich

David Ifeoluwa Adelani

Dietrich Klakow

2024-12-31

EMNLP (published)

doi.org

arxiv.org

Child- and Proxy-reported Differences in Patient-reported Outcome and Experience Measures in Pediatric Surgery: Systematic Review and Meta-analysis

Zanib Nafees

Siena O'Neill

Alexandra Dimmer

Elena Guadagno

Julia Ferreira

Nancy Mayo

Dan Poenaru

2024-12-31

Journal of Pediatric Surgery (published)

doi.org
