Publications

Bugs in the Data: How ImageNet Misrepresents Biodiversity

Alexandra Luccioni

ImageNet-1k is a dataset often used for benchmarking machine learning (ML) models and evaluating tasks such as image recognition and object … (see more)detection. Wild animals make up 27% of ImageNet-1k but, unlike classes representing people and objects, these data have not been closely scrutinized. In the current paper, we analyze the 13,450 images from 269 classes that represent wild animals in the ImageNet-1k validation set, with the participation of expert ecologists. We find that many of the classes are ill-defined or overlapping, and that 12% of the images are incorrectly labeled, with some classes having >90% of images incorrect. We also find that both the wildlife-related labels and images included in ImageNet-1k present significant geographical and cultural biases, as well as ambiguities such as artificial animals, multiple species in the same image, or the presence of humans. Our findings highlight serious issues with the extensive use of this dataset for evaluating ML systems, the use of such algorithms in wildlife-related tasks, and more broadly the ways in which ML datasets are commonly created and curated.

2023-01-01

AAAI (published)

doi.org

arxiv.org

Cache-Efficient Dynamic Programming MDP Solver

Jaël Champagne Gareau

Guillaume Gosset

Éric Beaudry

Vladimir Makarenkov

2023-01-01

European Conference on Artificial Intelligence (published)

doi.org

Can AI Read the Minds of Corporate Executives?

Nicolas Chapados

Zhenzhen Fan

Ruslan Goyenko

Issam Hadj Laradji

Fred Liu

Chengyu Zhang

2023-01-01

SSRN Electronic Journal (published)

doi.org

Can Workers Meaningfully Consent to Workplace Wellbeing Technologies?

Shreya Chowdhary

Anna Kawakami

Jina Suh

Mary L Gray

Alexandra Olteanu

Koustuv Saha

2023-01-01

FAccT (published)

doi.org

arxiv.org

A circulating proteome-informed prognostic model of COVID-19 disease activity that relies on 1 routinely available clinical laboratories 2

William Ma

Antoine Soulé

Karine Tremblay

Simon Rousseau

Amin Emad

Abstract

2023-01-01

(published)

www.semanticscholar.org

Conditional Flow Matching: Simulation-Free Dynamic Optimal Transport

Alexander Tong

Nikolay Malkin

Guillaume Huguet

Yanlei Zhang

Jarrid Rector-Brooks

Kilian FATRAS

Guy Wolf

Yoshua Bengio

2023-01-01

arXiv.org (preprint)

doi.org

Constant Memory Attentive Neural Processes

Leo Feng

Frederick Tung

Hossein Hajimirsadeghi

Yoshua Bengio

Mohamed Osama Ahmed

2023-01-01

arXiv.org (preprint)

doi.org

Contrast-agnostic deep learning–based registration pipeline: Validation in spinal cord multimodal MRI data

E. Beal

Julien Cohen-Adad

2023-01-01

Aperture Neuro (published)

doi.org

Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding

Le Zhang

Md. Rabiul Awal

Aishwarya Agrawal

2023-01-01

arXiv.org (preprint)

doi.org

Contrastive Positive Unlabeled Learning

Anish Acharya

Sujay Sanghavi

Li Jing

Bhargav Bhushanam

Michael Rabbat

I. Dhillon

Self-supervised pretraining on unlabeled data followed by supervised fine-tuning on labeled data is a popular paradigm for learning from lim… (see more)ited labeled examples. We extend this paradigm to the classical positive unlabeled (PU) setting, where the task is to learn a binary classifier given only a few labeled positive samples, and (often) a large amount of unlabeled samples (which could be positive or negative). We first propose a simple extension of standard infoNCE family of contrastive losses, to the PU setting; and show that this learns superior representations, as compared to existing unsupervised and supervised approaches. We then develop a simple methodology to pseudo-label the unlabeled samples using a new PU-specific clustering scheme; these pseudo-labels can then be used to train the final (positive vs. negative) classifier. Our method handily outperforms state-of-the-art PU methods over several standard PU benchmark datasets, while not requiring a-priori knowledge of any class prior (which is a common assumption in other PU methods). We also provide a simple theoretical analysis that motivates our methods.

2023-01-01

(published)

www.semanticscholar.org

Convergence of Proximal Point and Extragradient-Based Methods Beyond Monotonicity: the Case of Negative Comonotonicity

Eduard Gorbunov

Adrien Taylor

Samuel Horváth

Gauthier Gidel

Algorithms for min-max optimization and variational inequalities are often studied under monotonicity assumptions. Motivated by non-monotone… (see more) machine learning applications, we follow the line of works (Diakonikolas et al., 2021; Lee & Kim, 2021; Pethick et al., 2022; Bohm,2022) aiming at going beyond monotonicity by considering the weaker *negative comonotonicity* assumption. In this work, we provide tight complexity analyses for the Proximal Point (PP), Extragradient (EG), and Optimistic Gradient (OG) methods in this setup, closing several questions on their working guarantees beyond monotonicity. In particular, we derive the first non-asymptotic convergence rates for PP under negative comonotonicity and star-negative comonotonicity and show their tightness via constructing worst-case examples; we also relax the assumptions for the last-iterate convergence guarantees for EG and OG and prove the tightness of the existing best-iterate guarantees for EG and OG via constructing counter-examples.

2023-01-01

ICML (published)

openreview.net

Cutting Planes from the Branch-and-Bound Tree: Challenges and Opportunities

Claudio Contardo

Andrea Lodi

Andrea Tramontani

2023-01-01

INFORMS J. Comput. (published)

doi.org

NLP in the era of generative AI, cognitive sciences, and societal transformation

AI Policy Compass

Student Life and Resources

Publications

NLP in the era of generative AI, cognitive sciences, and societal transformation

AI Policy Compass

Student Life and Resources

Popular keywords:

Publications