Publications

Multiple-model coding scheme for electrical signal compression
Corentin Presvôts
Michel Kieffer
Thibault Prevost
Patrick Panciatici
Zuxing Li
Negative Language Transfer Identification in the English Writing of Chinese and Farsi Native Speakers
Mohammad Karimiabdolmaleki
Leticia Farias Wanderley
Mohsen Rezazadeh
Carrie Demmans Epp
Neural Kinematic Bases for Fluids
Yibo Liu
Paul Kry
Kenny Erleben
Sune Darkner
Teseo Schneider
Online Interior-point Methods for Time-varying Equality-constrained Optimization
Jean-Luc Lupien
Iman Shames
Performance Smells in ML and Non-ML Python Projects: A Comparative Study
François Belias
Leuson Da Silva
Cyrine Zid
Prism: Dynamic and Flexible Benchmarking of LLMs Code Generation with Monte Carlo Tree Search
Vahid Majdinasab
Amin Nikanjam
Progressive Multi-Source Domain Adaptation for Personalized Facial Expression Recognition
Muhammad Osama Zeeshan
Alessandro Lameiras Koerich
Eric Grange
Scaling Language-Free Visual Representation Learning
David Fan
Shengbang Tong
Jiachen Zhu
Koustuv Sinha
Zhuang Liu
Xinlei Chen
Nicolas Ballas
Yann LeCun
Amir Bar
Saining Xie
Semantic Commit: Helping Users Update Intent Specifications for AI Memory at Scale
Priyan Vaithilingam
Frida-Cecilia Acosta-Parenteau
Daniel Lee
Amine Mhedhbi
Elena L. Glassman
Sliced-Wasserstein Distance-based Data Selection
We propose a new unsupervised anomaly detection method based on the sliced-Wasserstein distance for training data selection in machine learning approaches. Our filtering technique is well suited to decision-making pipelines that deploy machine learning models in critical sectors, e.g., power systems, as it offers a conservative data selection and an optimal transport interpretation. To ensure the scalability of our method, we provide two efficient approximations. The first approximation processes reduced-cardinality representations of the datasets concurrently. The second makes use of a computationally light Euclidean distance approximation. Additionally, we release the first dataset showcasing localized critical peak rebate demand response in a northern climate. We present the filtering patterns of our method on synthetic datasets and numerically benchmark our method for training data selection. Finally, we employ our method as part of a first forecasting benchmark for our open-source dataset.
TAPNext: Tracking Any Point (TAP) as Next Token Prediction
Carl Doersch
Yi Yang
Skanda Koppula
Viorica Patraucean
Xu Owen He
Ignacio Rocco
Mehdi S. M. Sajjadi