Nous utilisons des témoins pour analyser le trafic et l’utilisation de notre site web, afin de personnaliser votre expérience. Vous pouvez désactiver ces technologies à tout moment, mais cela peut restreindre certaines fonctionnalités du site. Consultez notre Politique de protection de la vie privée pour en savoir plus.
Paramètre des cookies
Vous pouvez activer et désactiver les types de cookies que vous souhaitez accepter. Cependant certains choix que vous ferez pourraient affecter les services proposés sur nos sites (ex : suggestions, annonces personnalisées, etc.).
Cookies essentiels
Ces cookies sont nécessaires au fonctionnement du site et ne peuvent être désactivés. (Toujours actif)
Cookies analyse
Acceptez-vous l'utilisation de cookies pour mesurer l'audience de nos sites ?
Multimedia Player
Acceptez-vous l'utilisation de cookies pour afficher et vous permettre de regarder les contenus vidéo hébergés par nos partenaires (YouTube, etc.) ?
Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities. Further pro… (voir plus)gress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills. We explain why model evaluation is critical for addressing extreme risks. Developers must be able to identify dangerous capabilities (through"dangerous capability evaluations") and the propensity of models to apply their capabilities for harm (through"alignment evaluations"). These evaluations will become critical for keeping policymakers and other stakeholders informed, and for making responsible decisions about model training, deployment, and security.
Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities. Further pro… (voir plus)gress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills. We explain why model evaluation is critical for addressing extreme risks. Developers must be able to identify dangerous capabilities (through"dangerous capability evaluations") and the propensity of models to apply their capabilities for harm (through"alignment evaluations"). These evaluations will become critical for keeping policymakers and other stakeholders informed, and for making responsible decisions about model training, deployment, and security.
Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities. Further pro… (voir plus)gress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills. We explain why model evaluation is critical for addressing extreme risks. Developers must be able to identify dangerous capabilities (through"dangerous capability evaluations") and the propensity of models to apply their capabilities for harm (through"alignment evaluations"). These evaluations will become critical for keeping policymakers and other stakeholders informed, and for making responsible decisions about model training, deployment, and security.
Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities. Further pro… (voir plus)gress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills. We explain why model evaluation is critical for addressing extreme risks. Developers must be able to identify dangerous capabilities (through"dangerous capability evaluations") and the propensity of models to apply their capabilities for harm (through"alignment evaluations"). These evaluations will become critical for keeping policymakers and other stakeholders informed, and for making responsible decisions about model training, deployment, and security.
Realistically distributing object placements in synthetic training data improves the performance of vision-based object detection models
Setareh Dabiri
Vasileios Lioutas
Berend Zwartsenberg
Yunpeng Liu
Matthew Niedoba
Xiaoxuan Liang
Dylan Green
Justice Sefas
Jonathan Wilder Lavington
Frank Wood
Adam Ścibior
When training object detection models on synthetic data, it is important to make the distribution of synthetic data as close as possible to … (voir plus)the distribution of real data. We investigate specifically the impact of object placement distribution, keeping all other aspects of synthetic data fixed. Our experiment, training a 3D vehicle detection model in CARLA and testing on KITTI, demonstrates a substantial improvement resulting from improving the object placement distribution.
Large language model (LLM)-based decision-making agents have shown the ability to generalize across multiple tasks. However, their performan… (voir plus)ce relies on massive data and compute. We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in parameters throughout training. As a result, training on a new task may deteriorate the model's performance on previous tasks. In contrast to LLMs' implicit memory mechanism, the human brain utilizes distributed memory storage, which helps manage and organize multiple skills efficiently, mitigating the forgetting phenomenon. Thus inspired, we propose an internal working memory module to store, blend, and retrieve information for different downstream tasks. Evaluation results show that the proposed method improves training efficiency and generalization in both Atari games and meta-world object manipulation tasks. Moreover, we demonstrate that memory fine-tuning further enhances the adaptability of the proposed architecture.
Think Before You Act: Decision Transformers with Internal Working Memory
Jikun Kang
Romain Laroche
Xingdi Yuan
Adam P. Trischler
Xuefei Liu
Jie Fu
Large language model (LLM)-based decision-making agents have shown the ability to generalize across multiple tasks. However, their performan… (voir plus)ce relies on massive data and compute. We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in parameters throughout training. As a result, training on a new task may deteriorate the model's performance on previous tasks. In contrast to LLMs' implicit memory mechanism, the human brain utilizes distributed memory storage, which helps manage and organize multiple skills efficiently, mitigating the forgetting phenomenon. Thus inspired, we propose an internal working memory module to store, blend, and retrieve information for different downstream tasks. Evaluation results show that the proposed method improves training efficiency and generalization in both Atari games and meta-world object manipulation tasks. Moreover, we demonstrate that memory fine-tuning further enhances the adaptability of the proposed architecture.
Climate simulations are essential in guiding our understanding of climate change and responding to its effects. However, it is computational… (voir plus)ly expensive to resolve complex climate processes at high spatial resolution. As one way to speed up climate simulations, neural networks have been used to downscale climate variables from fast-running low-resolution simulations, but high-resolution training data are often unobtainable or scarce, greatly limiting accuracy. In this work, we propose a downscaling method based on the Fourier neural operator. It trains with data of a small upsampling factor and then can zero-shot downscale its input to arbitrary unseen high resolution. Evaluated both on ERA5 climate model data and on the Navier-Stokes equation solution data, our downscaling model significantly outperforms state-of-the-art convolutional and generative adversarial downscaling models, both in standard single-resolution downscaling and in zero-shot generalization to higher upsampling factors. Furthermore, we show that our method also outperforms state-of-the-art data-driven partial differential equation solvers on Navier-Stokes equations. Overall, our work bridges the gap between simulation of a physical process and interpolation of low-resolution output, showing that it is possible to combine both approaches and significantly improve upon each other.
Missing data is a common problem in many applications. Imputing missing values is a challenging task, as the imputations need to be accurate… (voir plus) and robust to avoid introducing bias in downstream analysis. In this paper, we propose an ensemble method that combines the strengths of a manifold learning-based imputation method called MAGIC and an autoencoder deep learning model. We call our method Deep MAGIC. Deep MAGIC is trained on a linear combination of the mean squared error of the original data and the mean squared error of the MAGIC-imputed data. Experimental results on three benchmark datasets show that Deep MAGIC outperforms several state-of-the-art imputation methods, demonstrating its effectiveness and robustness in handling large amounts of missing data.