Yann LeCun

Masked Image Modeling (MIM) is a promising self-supervised learning approach that enables learning from unlabeled images. Despite its recent success, learning good representations through MIM remains challenging because it requires predicting the right semantic content in accurate locations. For example, given an incomplete picture of a dog, we can guess that there is a tail, but we cannot determine its exact location. In this work, we propose to incorporate location uncertainty into MIM by using stochastic positional embeddings (StoP). Specifically, we condition the model on stochastic masked token positions drawn from a Gaussian distribution. StoP reduces overfitting to location features and guides the model toward learning features that are more robust to location uncertainties. Quantitatively, StoP improves downstream MIM performance on a variety of downstream tasks, including

2024-05-01

ICML.cc/2024/Conference (poster)

openreview.net
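
The StoP abstract above says the model is conditioned on masked token positions drawn from a Gaussian distribution. A minimal sketch of that idea follows, under assumptions that are not in the abstract: the noise is isotropic, it is centered on the deterministic positional embedding, and the function name `stochastic_positions` and the value of `sigma` are hypothetical. The paper's actual parameterization may differ.

```python
import random


def stochastic_positions(pos_emb, masked_idx, sigma=0.25, seed=None):
    """Return a copy of pos_emb in which masked rows are Gaussian samples.

    pos_emb:    list of equal-length float lists, one row per token position.
    masked_idx: indices of masked tokens that receive noisy positions.
    sigma:      standard deviation of the isotropic noise (assumed value).
    """
    rng = random.Random(seed)
    masked = set(masked_idx)
    out = []
    for i, row in enumerate(pos_emb):
        if i in masked:
            # Sample each coordinate from N(row[j], sigma^2) (assumption).
            out.append([rng.gauss(x, sigma) for x in row])
        else:
            # Visible (context) tokens keep their deterministic position.
            out.append(list(row))
    return out


# Toy positional table: 4 token positions, 2 dimensions each.
pos = [[float(i), float(i) / 2.0] for i in range(4)]
noisy = stochastic_positions(pos, masked_idx=[1, 3], sigma=0.25, seed=0)
```

Only the masked rows are perturbed, so the context tokens still carry exact location information while the prediction targets do not.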

Revisiting Feature Prediction for Learning Visual Representations from Video

Adrien Bardes

Quentin Garrido

Jean Ponce

Xinlei Chen

Michael Rabbat

Mahmoud Assran

Nicolas Ballas

This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders, text, negative examples, reconstruction, or other sources of supervision. The models are trained on 2 million videos collected from public datasets and are evaluated on downstream image and video tasks. Our results show that learning by predicting video features leads to versatile visual representations that perform well on both motion and appearance-based tasks, without adaption of the model's parameters; e.g., using a frozen backbone. Our largest model, a ViT-H/16 trained only on videos, obtains 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet1K.

2024-02-15

ArXiv (preprint)

arxiv.org

Blockwise Self-Supervised Learning at Scale

Shoaib Ahmed Siddiqui

David Scott Krueger

Stephane Deny

2024-01-30

TMLR (accepted)

openreview.net

Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

Mahmoud Assran

Quentin Duval

Ishan Misra

Piotr Bojanowski

This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations. We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images. The idea behind I-JEPA is simple: from a single context block, predict the representations of various target blocks in the same image. A core design choice to guide I-JEPA towards producing semantic representations is the masking strategy; specifically, it is crucial to (a) sample target blocks with sufficiently large scale (semantic), and to (b) use a sufficiently informative (spatially distributed) context block. Empirically, when combined with Vision Transformers, we find I-JEPA to be highly scalable. For instance, we train a ViT-Huge/14 on ImageNet using 16 A100 GPUs in under 72 hours to achieve strong downstream performance across a wide range of tasks, from linear classification to object counting and depth prediction.

2023-06-17

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (published)

arxiv.org

Catalyzing next-generation Artificial Intelligence through NeuroAI

Anthony Zador

Sean Escola

Blake Richards

Bence Ölveczky

Kwabena Boahen

Matthew Botvinick

Dmitri Chklovskii

Anne Churchland

Claudia Clopath

James DiCarlo

Surya Ganguli

Jeff Hawkins

Konrad Paul Kording

Alexei Koulakov

Timothy P. Lillicrap

Adam Marblestone

Bruno Olshausen

Alexandre Pouget

Cristina Savin

Terrence Sejnowski

Eero Simoncelli

Sara Solla

David Sussillo

Andreas S. Tolias

Doris Tsao

2023-03-22

Nature Communications (published)

Blockwise self-supervised learning with Barlow Twins

Shoaib Ahmed Siddiqui

David Scott Krueger

Stephane Deny

Current state-of-the-art deep networks are all powered by backpropagation. In this paper, we explore alternatives to full backpropagation in the form of blockwise learning rules, leveraging the latest developments in self-supervised learning. Notably, we show that a blockwise pretraining procedure consisting of training independently the 4 main blocks of layers of a ResNet-50 with Barlow Twins loss function at each block performs almost as well as end-to-end backpropagation on ImageNet: a linear probe trained on top of our blockwise pretrained model obtains a top-1 classification accuracy of 70.48%, only 1.1% below the accuracy of an end-to-end pretrained network (71.57% accuracy). We perform extensive experiments to understand the impact of different components within our method and explore a variety of adaptations of self-supervised learning to the blockwise paradigm, building an exhaustive understanding of the critical avenues for scaling local learning rules to large networks, with implications ranging from hardware design to neuroscience.

2023-02-01

ICLR.cc/2023/Conference (rejected)

openreview.net
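
The abstract above applies a Barlow Twins loss at the output of each block. As a rough illustration, the sketch below computes a Barlow Twins style loss for one pair of embedding batches in plain Python. It is a simplified sketch and not the authors' implementation: the function name `barlow_twins_loss` and the weight `lam` are assumptions, and the blockwise part (training each block on its own loss with no gradient flowing between blocks) is not shown.

```python
import math


def barlow_twins_loss(z_a, z_b, lam=0.005):
    """Barlow Twins style loss for two batches of embeddings.

    z_a, z_b: lists of n rows, each a list of d floats (two views of a batch).
    lam:      weight of the off-diagonal (redundancy) term (assumed value).
    """
    n, d = len(z_a), len(z_a[0])

    def standardize(z):
        # Zero mean and unit variance per feature, computed over the batch.
        out = [[0.0] * d for _ in range(n)]
        for j in range(d):
            col = [z[i][j] for i in range(n)]
            mean = sum(col) / n
            std = math.sqrt(sum((x - mean) ** 2 for x in col) / n) or 1.0
            for i in range(n):
                out[i][j] = (col[i] - mean) / std
        return out

    a, b = standardize(z_a), standardize(z_b)
    loss = 0.0
    for j in range(d):
        for k in range(d):
            # Entry (j, k) of the cross-correlation matrix between the views.
            c = sum(a[i][j] * b[i][k] for i in range(n)) / n
            # Diagonal entries are pulled toward 1, off-diagonal toward 0.
            loss += (1.0 - c) ** 2 if j == k else lam * c ** 2
    return loss


# Two identical views with decorrelated features give a loss of zero.
z = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
same_view_loss = barlow_twins_loss(z, z)
```

In a blockwise setup, a loss of this form would be evaluated separately on each block's output rather than once at the end of the network.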

Deep learning, reinforcement learning, and world models

Yu Matsuo

Maneesh Sahani

Doina Precup

David Silver

Masashi Sugiyama

Eiji Uchibe

J. Morimoto

2022-08-01

Neural Networks (published)

Biasly: a machine learning based platform for automatic racial discrimination detection in online texts

David Bamman

Chris Dyer

Noah A. Smith

Steven Bird

Ewan Klein

Edward Loper

Jacob Devlin

Ming-Wei Chang

Kenton Lee

Kristina Toutanova

Samuel Gehman

Suchin Gururangan

Maarten Sap

Dan Hendrycks

Kevin Gimpel

Alex Lamb

Di He

Anirudh Goyal

Guolin Ke

Feng Liao

Mirco Ravanelli

Zhenzhong Lan

Mingda Chen

Sebastian Goodman

Bernhard E. Boser

J. Denker

Donnie Henderson

Robin Howard

Wayne Hubbard

Yinhan Liu

Myle Ott

Naman Goyal

Jingfei Du

Mandar Joshi

Danqi Chen

Omer Levy

Mike Lewis

Warning: this paper contains content that may be offensive or upsetting. Detecting hateful, toxic, and otherwise racist or sexist language in user-generated online contents has become an increasingly important task in recent years. Indeed, the anonymity, the transience, the size of messages, and the difficulty of management facilitate the diffusion of racist or hateful messages across the Internet. The critical influence of this cyber-racism is no longer limited to social media, but also has a significant effect on our society: corporate business operation, users' health, crimes, etc. Traditional racist speech reporting channels have proven inadequate due to the enormous explosion of information, so there is an urgent need for a method to automatically and promptly detect texts with racial discrimination. We propose in this work a machine learning-based approach to enable automatic detection of racist text content over the internet. State-of-the-art machine learning models that are able to grasp language structures are adapted in this study. Our main contributions include 1) a large-scale racial discrimination data set collected from three distinct sources and annotated according to a guideline developed by specialists, 2) a set of machine learning models with various architectures for racial discrimination detection, and 3) a web-browser-based software that assists users to debias their texts when using the internet. All these resources are made publicly available.

Toward Next-Generation Artificial Intelligence: Catalyzing the NeuroAI Revolution

Anthony Zador

Blake Richards

Bence Ölveczky

Sean Escola

Kwabena Boahen

Matthew Botvinick

Dmitri Chklovskii

Anne Churchland

Claudia Clopath

James DiCarlo

Surya Ganguli

Jeff Hawkins

Konrad Paul Kording

Alexei Koulakov

Timothy P. Lillicrap

Adam Marblestone

Bruno Olshausen

Alexandre Pouget

Cristina Savin

Terrence Sejnowski

Eero Simoncelli

Sara Solla

David Sussillo

Andreas S. Tolias

Doris Tsao

2022-01-01

arXiv.org (preprint)

Deep learning for AI

Geoffrey Hinton

How can neural networks learn the rich internal representations required for difficult tasks such as recognizing objects or understanding language?

2021-06-21

Communications of the ACM (published)