Publications

Multi-ancestry polygenic risk scores using phylogenetic regularization

Elliot Layne

Shadi Zabad

Yue Li

Mathieu Blanchette

2024-02-17

bioRxiv (preprint)

doi.org

Deep Equilibrium Models For Algorithmic Reasoning

Sophie Xhonneux

Yu He

Andreea Deac

Jian Tang

Gauthier Gidel

In this blogpost we discuss the idea of teaching neural networks to reach fixed points when reasoning. Specifically, on the algorithmic reasoning benchmark CLRS the current neural networks are told the number of reasoning steps they need. While a quick fix is to add a termination network that predicts when to stop, a much more salient inductive bias is that the neural network shouldn't change its answer any further once the answer is correct, i.e. it should reach a fixed point. This is supported by denotational semantics, which tells us that while loops that terminate are the minimum fixed points of a function. We implement this idea with the help of deep equilibrium models and discuss several hurdles one encounters along the way. We show on several algorithms from the CLRS benchmark the partial success of this approach and the difficulty in making it work robustly across all algorithms.

2024-02-16

ICLR.cc/2024/BlogPosts (published)

openreview.net

Distributional GFlowNets with Quantile Flows

Dinghuai Zhang

Ling Pan

Ricky T. Q. Chen

Aaron Courville

Yoshua Bengio

Generative Flow Networks (GFlowNets) are a new family of probabilistic samplers where an agent learns a stochastic policy for generating complex combinatorial structure through a series of decision-making steps. Despite being inspired by reinforcement learning, the current GFlowNet framework is relatively limited in its applicability and cannot handle stochasticity in the reward function. In this work, we adopt a distributional paradigm for GFlowNets, turning each flow function into a distribution, thus providing more informative learning signals during training. By parameterizing each edge flow through their quantile functions, our proposed quantile matching GFlowNet learning algorithm is able to learn a risk-sensitive policy, an essential component for handling scenarios with risk uncertainty. Moreover, we find that the distributional approach can achieve substantial improvement on existing benchmarks compared to prior methods due to our enhanced training algorithm, even in settings with deterministic rewards.

2024-02-16

TMLR (accepted)

doi.org

openreview.net

Revisiting Feature Prediction for Learning Visual Representations from Video

Adrien Bardes

Quentin Garrido

Jean Ponce

Xinlei Chen

Michael Rabbat

Yann LeCun

Mahmoud Assran

Nicolas Ballas

This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders, text, negative examples, reconstruction, or other sources of supervision. The models are trained on 2 million videos collected from public datasets and are evaluated on downstream image and video tasks. Our results show that learning by predicting video features leads to versatile visual representations that perform well on both motion and appearance-based tasks, without adaptation of the model's parameters; e.g., using a frozen backbone. Our largest model, a ViT-H/16 trained only on videos, obtains 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet1K.

2024-02-15

arXiv (preprint)

doi.org

arxiv.org

Bidirectional Generative Pre-training for Improving Healthcare Time-series Representation Learning

Ziyang Song

Qincheng Lu

He Zhu

David Buckeridge

Yue Li

Learning time-series representations for discriminative tasks, such as classification and regression, has been a long-standing challenge in the healthcare domain. Current pre-training methods are limited to either unidirectional next-token prediction or randomly masked token prediction. We propose a novel architecture called Bidirectional Timely Generative Pre-trained Transformer (BiTimelyGPT), which pre-trains on biosignals and longitudinal clinical records by both next-token and previous-token prediction in alternating transformer layers. This pre-training task preserves the original distribution and data shapes of the time-series. Additionally, the full-rank forward and backward attention matrices exhibit more expressive representation capabilities. Using biosignals and longitudinal clinical records, BiTimelyGPT demonstrates superior performance in predicting neurological functionality, disease diagnosis, and physiological signs. By visualizing the attention heatmap, we observe that the pre-trained BiTimelyGPT can identify discriminative segments from biosignal time-series sequences, even more so after fine-tuning on the task.

2024-02-14

arXiv (preprint)

arxiv.org

Diagnosis Model for Detection of e-threats Against Soft-Targets

Sónia M. A. Morgado

Margarida Carvalho

Sérgio Felgueiras

2024-02-14

Lecture Notes in Networks and Systems (published)

doi.org

Gaussian-process-based Bayesian optimization for neurostimulation interventions in rats

Léo Choinière

Rose Guay-Hottin

Rémi Picard

Guillaume Lajoie

Marco Bonizzato

Numa Dancause

2024-02-14

STAR Protocols (published)

doi.org

Information Complexity of Stochastic Convex Optimization: Applications to Generalization and Memorization

Idan Attias

Gintare Karolina Dziugaite

Mahdi Haghifam

Roi Livni

Daniel M. Roy

In this work, we investigate the interplay between memorization and learning in the context of stochastic convex optimization (SCO). We define memorization via the information a learning algorithm reveals about its training data points. We then quantify this information using the framework of conditional mutual information (CMI) proposed by Steinke and Zakynthinou (2020). Our main result is a precise characterization of the tradeoff between the accuracy of a learning algorithm and its CMI, answering an open question posed by Livni (2023). We show that, in the

2024-02-14

arXiv (preprint)

doi.org

arxiv.org

Layerwise Proximal Replay: A Proximal Point Method for Online Continual Learning

Jinsoo Yoo

Yunpeng Liu

Frank Wood

Geoff Pleiss

In online continual learning, a neural network incrementally learns from a non-i.i.d. data stream. Nearly all online continual learning methods employ experience replay to simultaneously prevent catastrophic forgetting and underfitting on past data. Our work demonstrates a limitation of this approach: neural networks trained with experience replay tend to have unstable optimization trajectories, impeding their overall accuracy. Surprisingly, these instabilities persist even when the replay buffer stores all previous training examples, suggesting that this issue is orthogonal to catastrophic forgetting. We minimize these instabilities through a simple modification of the optimization geometry. Our solution, Layerwise Proximal Replay (LPR), balances learning from new and replay data while only allowing for gradual changes in the hidden activation of past data. We demonstrate that LPR consistently improves replay-based online continual learning methods across multiple problem settings, regardless of the amount of available replay memory.

2024-02-14

arXiv (preprint)

doi.org

arxiv.org
