Publications
When Do We Need Graph Neural Networks for Node Classification?
Human activity recognition (HAR) is a popular research field in computer vision that has already been widely studied. However, it is still an active research field since it plays an important role in many current and emerging real-world intelligent systems, like visual surveillance and human-computer interaction. Deep reinforcement learning (DRL) has recently been used to address the activity recognition problem with various purposes, such as finding attention in video data or obtaining the best network structure. DRL-based HAR has only been around for a short time, and it is a challenging, novel field of study. Therefore, to facilitate further research in this area, we have constructed a comprehensive survey on activity recognition methods that incorporate DRL. Throughout the article, we classify these methods according to their shared objectives and delve into how they are ingeniously framed within the DRL framework. As we navigate through the survey, we conclude by shedding light on the prominent challenges and lingering questions that await the attention of future researchers, paving the way for further advancements and breakthroughs in this exciting domain.
2024-02-19
IEEE Transactions on Neural Networks and Learning Systems (published)
In this blog post we discuss the idea of teaching neural networks to reach fixed points when reasoning. Specifically, on the algorithmic reasoning benchmark CLRS the current neural networks are told the number of reasoning steps they need. While a quick fix is to add a termination network that predicts when to stop, a much more salient inductive bias is that the neural network shouldn't change its answer any further once the answer is correct, i.e. it should reach a fixed point. This is supported by denotational semantics, which tells us that while loops that terminate are the minimum fixed points of a function. We implement this idea with the help of deep equilibrium models and discuss several hurdles one encounters along the way. We show on several algorithms from the CLRS benchmark the partial success of this approach and the difficulty in making it work robustly across all algorithms.
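As a rough illustration of the fixed-point idea in the abstract above, the sketch below repeats a relaxation step on a tiny graph until the state stops changing, so the number of steps is discovered rather than supplied. It is plain iteration, not a deep equilibrium model and not the authors' implementation; the helper names (`iterate_to_fixed_point`, `relax`) and the example graph `EDGES` are assumptions made for this example.

```python
INF = float("inf")

# Hypothetical toy graph: (source, target, weight) edges.
EDGES = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 5.0)]


def iterate_to_fixed_point(step, state, max_steps=100):
    """Apply `step` until the state stops changing.

    Returns the fixed point and the number of steps that changed the state.
    """
    for n in range(max_steps):
        new_state = step(state)
        if new_state == state:  # answer no longer changes: fixed point
            return state, n
        state = new_state
    raise RuntimeError("no fixed point reached within max_steps")


def relax(dist):
    """One synchronous Bellman-Ford-style relaxation over all edges."""
    new = list(dist)
    for u, v, w in EDGES:
        if dist[u] + w < new[v]:
            new[v] = dist[u] + w
    return tuple(new)


# Shortest distances from node 0; the loop decides by itself when to stop.
dist, steps = iterate_to_fixed_point(relax, (0.0, INF, INF, INF))
print(dist, steps)
```

Once the distances are correct, another call to `relax` leaves them unchanged, which is the property the abstract proposes as an inductive bias.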
Generative Flow Networks (GFlowNets) are a new family of probabilistic samplers where an agent learns a stochastic policy for generating complex combinatorial structures through a series of decision-making steps. Despite being inspired by reinforcement learning, the current GFlowNet framework is relatively limited in its applicability and cannot handle stochasticity in the reward function. In this work, we adopt a distributional paradigm for GFlowNets, turning each flow function into a distribution, thus providing more informative learning signals during training. By parameterizing each edge flow through its quantile function, our proposed quantile matching GFlowNet learning algorithm is able to learn a risk-sensitive policy, an essential component for handling scenarios with risk uncertainty. Moreover, we find that the distributional approach can achieve substantial improvement on existing benchmarks compared to prior methods due to our enhanced training algorithm, even in settings with deterministic rewards.
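To make the role of quantiles in the abstract above more concrete, the sketch below uses the standard pinball (quantile regression) loss to pick out a low, middle, and high quantile of some noisy reward samples. This is generic quantile regression, not the quantile matching GFlowNet algorithm itself; the function names and the sample values are assumptions made for this example.

```python
def pinball_loss(prediction, target, tau):
    """Pinball loss for quantile level tau in (0, 1)."""
    diff = target - prediction
    # Under-prediction is weighted by tau, over-prediction by (1 - tau).
    return max(tau * diff, (tau - 1.0) * diff)


def best_quantile(samples, tau):
    """Return the sample value that minimizes the mean pinball loss."""
    def mean_loss(q):
        return sum(pinball_loss(q, y, tau) for y in samples) / len(samples)
    return min(samples, key=mean_loss)


# Hypothetical noisy rewards observed for one action.
rewards = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

pessimistic = best_quantile(rewards, 0.1)  # low quantile: risk-averse view
median = best_quantile(rewards, 0.5)
optimistic = best_quantile(rewards, 0.9)   # high quantile: risk-seeking view
print(pessimistic, median, optimistic)
```

A policy that ranks actions by a low quantile of the reward distribution behaves more cautiously than one that ranks by a high quantile, which is the sense in which quantile estimates support risk-sensitive decisions.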
This paper explores feature prediction as a stand-alone objective for unsupervised learning from video and introduces V-JEPA, a collection of vision models trained solely using a feature prediction objective, without the use of pretrained image encoders, text, negative examples, reconstruction, or other sources of supervision. The models are trained on 2 million videos collected from public datasets and are evaluated on downstream image and video tasks. Our results show that learning by predicting video features leads to versatile visual representations that perform well on both motion and appearance-based tasks, without adaptation of the model's parameters; e.g., using a frozen backbone. Our largest model, a ViT-H/16 trained only on videos, obtains 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet1K.
Learning time-series representations for discriminative tasks, such as classification and regression, has been a long-standing challenge in the healthcare domain. Current pre-training methods are limited to either unidirectional next-token prediction or randomly masked token prediction. We propose a novel architecture called Bidirectional Timely Generative Pre-trained Transformer (BiTimelyGPT), which pre-trains on biosignals and longitudinal clinical records by both next-token and previous-token prediction in alternating transformer layers. This pre-training task preserves the original distribution and data shapes of the time-series. Additionally, the full-rank forward and backward attention matrices exhibit more expressive representation capabilities. Using biosignals and longitudinal clinical records, BiTimelyGPT demonstrates superior performance in predicting neurological functionality, disease diagnosis, and physiological signs. By visualizing the attention heatmap, we observe that the pre-trained BiTimelyGPT can identify discriminative segments from biosignal time-series sequences, even more so after fine-tuning on the task.