Publications

An Effective Theory of Bias Amplification

Arjun Subramonian

Samuel J. Bell

Levent Sagun

Elvis Dohmatob

Machine learning models can capture and amplify biases present in data, leading to disparate test performance across social groups. To better understand, evaluate, and mitigate these biases, a deeper theoretical understanding of how model design choices and data distribution properties contribute to bias is needed. In this work, we contribute a precise analytical theory in the context of ridge regression, both with and without random projections, where the former models feedforward neural networks in a simplified regime. Our theory offers a unified and rigorous explanation of machine learning bias, providing insights into phenomena such as bias amplification and minority-group bias in various feature and parameter regimes. For example, we observe that there may be an optimal regularization penalty or training time to avoid bias amplification, and there can be differences in test error between groups that are not alleviated with increased parameterization. Importantly, our theoretical predictions align with empirical observations reported in the literature on machine learning bias. We extensively empirically validate our theory on synthetic and semi-synthetic datasets.

2024-12-31

ICLR (published)

doi.org

openreview.net

Efficient and Accurate Optimal Transport with Mirror Descent and Conjugate Gradients

Mete Kemertas

Allan Jepson

Amir-massoud Farahmand

2024-12-31

Trans. Mach. Learn. Res. (published)

doi.org

arxiv.org

Embedding Cultural Diversity in Prototype-based Recommender Systems

Popularity bias in recommender systems can increase cultural overrepresentation by favoring norms from dominant cultures and marginalizing underrepresented groups. This issue is critical for platforms offering cultural products, as they influence consumption patterns and human perceptions. In this work, we address popularity bias by identifying demographic biases within prototype-based matrix factorization methods. Using the country of origin as a proxy for cultural identity, we link this demographic attribute to popularity bias by refining the embedding space learning process. First, we propose filtering out irrelevant prototypes to improve representativity. Second, we introduce a regularization technique to enforce a uniform distribution of prototypes within the embedding space. Across four datasets, our results demonstrate a 27% reduction in the average rank of long-tail items and a 2% reduction in the average rank of items from underrepresented countries. Additionally, our model achieves a 2% improvement in HitRatio@10 compared to the state-of-the-art, highlighting that fairness is enhanced without compromising recommendation quality. Moreover, the distribution of prototypes leads to more inclusive explanations by better aligning items with diverse prototypes.

2024-12-31

European Conference on Information Retrieval (published)

doi.org

arxiv.org

An Empirical Study of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration

Hiroki Naganuma

Ryuichiro Hataya

Kotaro Yoshida

Ioannis Mitliagkas

2024-12-31

Trans. Mach. Learn. Res. (published)

openreview.net

EPISeg: Automated segmentation of the spinal cord on echo planar images using open-access multi-center data

Rohan Banerjee

Merve Kaptan

Alexandra Tinnermann

Ali Khatibi

Alice Dabbagh

Christian Büchel

Christian W. Kündig

Christine Law

Dario Pfyffer

David J. Lythgoe

Dimitra Tsivaka

Dimitri Van De Ville

Falk Eippert

Fauziyya Muhammad

Gary H. Glover

Gergely David

Grace Haynes

Jan Haaker

J.C. Brooks

Jürgen Finsterbusch

Katherine T. Martucci

Kimberly J. Hemmerling

Mahdi Mobarak-Abadi

Mark A. Hoggarth

Matthew A. Howard

Molly G. Bright

Nawal Kinany

Olivia S. Kowalczyk

Patrick Freund

Robert Barry

Sean Mackey

Shahabeddin Vahdat

Simon Schading‐Sassenhausen

Stephen B. McMahon

Todd Parish

Véronique Marchand‐Pauvert

Yufen Chen

Zachary A. Smith

Kenneth A. Weber

Benjamin De Leener

Julien Cohen‐Adad

Functional magnetic resonance imaging (fMRI) of the spinal cord is relevant for studying sensation, movement, and autonomic function. Preprocessing of spinal cord fMRI data involves segmentation of the spinal cord on gradient-echo echo planar imaging (EPI) images. Current automated segmentation methods do not work well on these data, due to the low spatial resolution, susceptibility artifacts causing distortions and signal drop-out, ghosting, and motion-related artifacts. Consequently, this segmentation task demands a considerable amount of manual effort which takes time and is prone to user bias. In this work, we (i) gathered a multi-center dataset of spinal cord gradient-echo EPI with ground-truth segmentations and shared it on OpenNeuro https://openneuro.org/datasets/ds005143/versions/1.3.1 and (ii) developed a deep learning-based model, EPISeg, for the automatic segmentation of the spinal cord on gradient-echo EPI data. We observe a significant improvement in terms of segmentation quality compared with other available spinal cord segmentation models. Our model is resilient to different acquisition protocols as well as commonly observed artifacts in fMRI data. The training code is available at https://github.com/sct-pipeline/fmri-segmentation/, and the model has been integrated into the Spinal Cord Toolbox as a command-line tool.

2024-12-31

Imaging Neuroscience (published)

doi.org

Estimating Head Motion in Structural MRI Using a Deep Neural Network Trained on Synthetic Artifacts

C Bricout

S Ebrahimi Kahou

Sylvain Bouix

2024-12-31

arXiv.org (preprint)

doi.org

On Estimating the Strength of Differentially Private Mechanisms in a Black-Box Setting

Daniele Gorla

Louis Jalouzot

Federica Granese

Catuscia Palamidessi

Pablo Piantanida

We analyze to what extent final users can infer information about the level of protection of their data when the data obfuscation mechanism is a priori unknown to them (the so-called “black-box” scenario). In particular, we explore four notions of differential privacy, namely local/central

2024-12-31

IEEE Transactions on Dependable and Secure Computing (unknown)

doi.org

Evaluating machine learning-driven intrusion detection systems in IoT: Performance and energy consumption

Saeid Jamshidi

Kawser Wazed Nafi

Amin Nikanjam

Foutse Khomh

2024-12-31

Comput. Ind. Eng. (published)

doi.org

arxiv.org

Evaluation of machine learning and deep learning models for the classification of a single extracellular vesicles spectral library

C. del Real Mata

Y. Lu

M. Jalali

A. Bocan

M. Khatami

L. Montermini

J. McCormack-llersich

W. W. Reisner

L. Garzia

J. Rak

D. Bzdok

S. Mahshid

Nanostructure-based sensors study extracellular vesicles; optimization of a single-vesicle resolution spectral library to enhance classification for future AI-driven diagnostics.

2024-12-31

Sensors and Diagnostics (published)

doi.org

Evolution of High-Throughput Satellite Systems: A Vision of Programmable Regenerative Payload

Olfa Ben Yahia

Zineb Garroussi

Olivier Bélanger

Brunilde Sansò

Jean-François Frigon

Stéphane Martel

Antoine Lesage-Landry

Gunes Karabulut Kurt

High-throughput satellite (HTS), with its digital payload technology, is expected to play a key role as an enabler of the upcoming sixth-generation (6G) networks. HTS is mainly designed to provide higher data rates and capacities. Fueled by technological advancements, including beamforming, advanced modulation techniques, reconfigurable phased array technologies, and electronically steerable antennas, HTS has emerged as a fundamental component for future network generations. This paper offers a comprehensive state-of-the-art on HTS systems, focusing on standardization, patents, channel multiple access techniques, routing, load balancing, and the role of software-defined networking (SDN). In addition, we provide a vision for next-generation satellite systems that we have named Extremely-HTS (EHTS) toward autonomous satellites supported by the main requirements and key technologies expected for these systems. The EHTS system will be designed to maximize spectrum reuse and data rates and to flexibly steer the capacity to satisfy user demand. We introduce a novel architecture for future programmable regenerative payloads as well.

2024-12-31

IEEE Commun. Surv. Tutorials (published)

doi.org

Exploiting Instruction-Following Retrievers for Malicious Information Retrieval

Parishad BehnamGhader

Nicholas Meade

Siva Reddy

Instruction-following retrievers have been widely adopted alongside LLMs in real-world applications, but little work has investigated the safety risks surrounding their increasing search capabilities. We empirically study the ability of retrievers to satisfy malicious queries, both when used directly and when used in a retrieval augmented generation-based setup. Concretely, we investigate six leading retrievers, including NV-Embed and LLM2Vec, and find that given malicious requests, most retrievers can (for >50% of queries) select relevant harmful passages. For example, LLM2Vec correctly selects passages for 61.35% of our malicious queries. We further uncover an emerging risk with instruction-following retrievers, where highly relevant harmful information can be surfaced by exploiting their instruction-following capabilities. Finally, we show that even safety-aligned LLMs, such as Llama3, can satisfy malicious requests when provided with harmful retrieved passages in-context. In summary, our findings underscore the malicious misuse risks associated with increasing retriever capability.

2024-12-31

ACL (Findings) (published)

doi.org

arxiv.org

A "fine-cuts" approach disentangling psychopathic, autistic and alexithymic traits in their associations with affective, cognitive and motor empathy

Julia Ayache

Nikki Stevenson

Elisha Patel

Alexander Sumich

Guillaume Dumas

Nadja Heym

2024-12-31

Personality and Individual Differences (published)

doi.org
