TRAIL : IA responsable pour les professionnels et les leaders
Apprenez à intégrer des pratique d'IA responsable dans votre organisation avec le programme TRAIL. Inscrivez-vous à la prochaine cohorte qui débutera le 15 avril.
Avantage IA : productivité dans la fonction publique
Apprenez à tirer parti de l’IA générative pour soutenir et améliorer votre productivité au travail. La prochaine cohorte se déroulera en ligne les 28 et 30 avril 2026.
Nous utilisons des témoins pour analyser le trafic et l’utilisation de notre site web, afin de personnaliser votre expérience. Vous pouvez désactiver ces technologies à tout moment, mais cela peut restreindre certaines fonctionnalités du site. Consultez notre Politique de protection de la vie privée pour en savoir plus.
Paramètre des cookies
Vous pouvez activer et désactiver les types de cookies que vous souhaitez accepter. Cependant certains choix que vous ferez pourraient affecter les services proposés sur nos sites (ex : suggestions, annonces personnalisées, etc.).
Cookies essentiels
Ces cookies sont nécessaires au fonctionnement du site et ne peuvent être désactivés. (Toujours actif)
Cookies analyse
Acceptez-vous l'utilisation de cookies pour mesurer l'audience de nos sites ?
Lecteur Multimédia
Acceptez-vous l'utilisation de cookies pour afficher et vous permettre de regarder les contenus vidéo hébergés par nos partenaires (YouTube, etc.) ?
Publications
Hypernetworks for Zero-shot Transfer in Reinforcement Learning
In this paper, hypernetworks are trained to generate behaviors across a range of unseen task conditions, via a novel TD-based training objec… (voir plus)tive and data from a set of near-optimal RL solutions for training tasks. This work relates to meta RL, contextual RL, and transfer learning, with a particular focus on zero-shot performance at test time, enabled by knowledge of the task parameters (also known as context). Our technical approach is based upon viewing each RL algorithm as a mapping from the MDP specifics to the near-optimal value function and policy and seek to approximate it with a hypernetwork that can generate near-optimal value functions and policies, given the parameters of the MDP. We show that, under certain conditions, this mapping can be considered as a supervised learning problem. We empirically evaluate the effectiveness of our method for zero-shot transfer to new reward and transition dynamics on a series of continuous control tasks from DeepMind Control Suite. Our method demonstrates significant improvements over baselines from multitask and meta RL approaches.
2023-06-25
Proceedings of the AAAI Conference on Artificial Intelligence (publié)
This work investigates the evolution of latent space when deep learning models are trained incrementally in non-stationary environments that… (voir plus) stem from concept drift. We propose a methodology for visualizing the incurred change in latent representations. We further show that classes not targeted by concept drift can be negatively affected, suggesting that the observation of all classes during learning may regularize the latent space.
2023-06-25
Proceedings of the AAAI Conference on Artificial Intelligence (publié)
This paper studies learning meaningful node representations for signed graphs, where both positive and negative links exist. This problem ha… (voir plus)s been widely studied by meticulously designing expressive signed graph neural networks, as well as capturing the structural information of the signed graph through traditional structure decomposition methods, e.g., spectral graph theory. In this paper, we propose a novel signed graph representation learning framework, called Signed Laplacian Graph Neural Network (SLGNN), which combines the advantages of both. Specifically, based on spectral graph theory and graph signal processing, we first design different low-pass and high-pass graph convolution filters to extract low-frequency and high-frequency information on positive and negative links, respectively, and then combine them into a unified message passing framework. To effectively model signed graphs, we further propose a self-gating mechanism to estimate the impacts of low-frequency and high-frequency information during message passing. We mathematically establish the relationship between the aggregation process in SLGNN and signed Laplacian regularization in signed graphs, and theoretically analyze the expressiveness of SLGNN. Experimental results demonstrate that SLGNN outperforms various competitive baselines and achieves state-of-the-art performance.
2023-06-25
Proceedings of the AAAI Conference on Artificial Intelligence (publié)
Recent studies show that task distribution plays a vital role in the meta-learner's performance. Conventional wisdom is that task diversity … (voir plus)should improve the performance of meta-learning. In this work, we find evidence to the contrary; (i) our experiments draw into question the efficacy of our learned models: similar manifolds can be learned with a subset of the data (lower task diversity). This finding questions the advantage of providing more data to the model, and (ii) adding diversity to the task distribution (higher task diversity) sometimes hinders the model and does not lead to a significant improvement in performance as previously believed. To strengthen our findings, we provide both empirical and theoretical evidence.
2023-06-25
Proceedings of the AAAI Conference on Artificial Intelligence (publié)
Partial Ordered Statistics Decoding with Enhanced Error Patterns
Marwan Jalaleddine
Huayi Zhou
Jiajie Li
Warren J. Gross
Guessing Random Additive Noise Decoding (GRAND) excels at decoding high-rate codes but struggles to decode low-rate codes with reasonable co… (voir plus)mplexity. Ordered Statistics Decoding (OSD) specifically excels in decoding short codes irrespective of rates; however, OSD necessitates the use of Gaussian elimination which introduces additional time, space and computational complexity. Partial Ordered Statistics Decoding (POSD) was proposed to reduce the time, space, and computational complexity of OSD; however, the current partition-based POSD has poor decoding performance since it does not generate test error patterns across partitions. In this paper, we propose to improve the decoding performance of POSD by incorporating test error patterns inspired by GRAND methods. This work offers a trade-off between performance and complexity compared to existing decoders such as GRAND and OSD. We enhance POSD by optimizing the scheduling of Test Error Patterns (TEPs) and show that our technique can be applied to any code in a standard form. At a target BER 10−4 with eBCH (128,64) the enhanced error patterns achieve more than 0.6 dB gain in performance compared to the POSD with partition-based error patterns. Moreover, at a target frame error rate of 10−5, POSD uses 10× less binary operations compared to GRAND when decoding eBCH (128,64) and RLC(128,64) codes. With BCH (127,29) and RLC(128,32), at a target frame error rate of 10−2, POSD with enhanced error patterns with a maximum number of queries (MQ) of 104 achieves up to a 2 dB gain to its GRAND equivalent which is using 107 maximum number of queries.
2023-06-24
2023 IEEE International Symposium on Information Theory (ISIT) (publié)
In recent years, in-silico molecular design has received much attention from the machine learning community. When designing a new compound f… (voir plus)or pharmaceutical applications, there are usually multiple properties of such molecules that need to be optimised: binding energy to the target, synthesizability, toxicity, EC50, and so on. While previous approaches have employed a scalarization scheme to turn the multi-objective problem into a preference-conditioned single objective, it has been established that this kind of reduction may produce solutions that tend to slide towards the extreme points of the objective space when presented with a problem that exhibits a concave Pareto front. In this work we experiment with an alternative formulation of goal-conditioned molecular generation to obtain a more controllable conditional model that can uniformly explore solutions along the entire Pareto front.
Sequential decision-making agents struggle with long horizon tasks, since solving them requires multi-step reasoning. Most reinforcement lea… (voir plus)rning (RL) algorithms address this challenge by improved credit assignment, introducing memory capability, altering the agent's intrinsic motivation (i.e. exploration) or its worldview (i.e. knowledge representation). Many of these components could be learned from offline data. In this work, we follow the hypothesis that exploration and representation learning can be improved by separately learning two different models from a single offline dataset. We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward separately from a single collection of human demonstrations can significantly improve the sample efficiency on the challenging NetHack benchmark. We also ablate various components of our experimental setting and highlight crucial insights.
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process over again once new data becomes a… (voir plus)vailable. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e. updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data typically results in degraded performance on past data. Taking a step towards efficient continual pre-training, in this work, we examine the effect of different warm-up strategies. Our hypothesis is that the learning rate must be re-increased to improve compute efficiency when training on a new dataset. We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule. We conduct all experiments on the Pythia 410M language model architecture and evaluate performance through validation perplexity. We experiment with different pre-training checkpoints, various maximum learning rates, and various warmup lengths. Our results show that while rewarming models first increases the loss on upstream and downstream data, in the longer run it improves the downstream performance, outperforming models trained from scratch