Publications

Fine-tuning Happens in Tiny Subspaces: Exploring Intrinsic Task-specific Subspaces of Pre-trained Language Models
Zhong Zhang
Junming Shao
Let the Flows Tell: Solving Graph Combinatorial Optimization Problems with GFlowNets
Dinghuai Zhang
Hanjun Dai
Nikolay Malkin
Ling Pan
Combinatorial optimization (CO) problems are often NP-hard and thus out of reach for exact algorithms, making them a tempting domain to apply machine learning methods. The highly structured constraints in these problems can hinder either optimization or sampling directly in the solution space. On the other hand, GFlowNets have recently emerged as a powerful machinery to efficiently sample from composite unnormalized densities sequentially and have the potential to amortize such solution-searching processes in CO, as well as generate diverse solution candidates. In this paper, we design Markov decision processes (MDPs) for different combinatorial problems and propose to train conditional GFlowNets to sample from the solution space. Efficient training techniques are also developed to benefit long-range credit assignment. Through extensive experiments on a variety of different CO tasks with synthetic and realistic data, we demonstrate that GFlowNet policies can efficiently find high-quality solutions. Our implementation is open-sourced at https://github.com/zdhNarsil/GFlowNet-CombOpt.
Motor cortex latent dynamics encode arm movement direction and urgency independently
Andrea Colins Rodriguez
Lee Miller
Mark D. Humphries
Testing Feedforward Neural Networks Training Programs
Houssem Ben Braiek
An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics
Saba Ahmadi
A hierarchical Bayesian brain parcellation framework for fusion of functional imaging datasets
Da Zhi
Ladan Shahshahani
Caroline Nettekoven
Ana Luísa Pinho
Jörn Diedrichsen
Model evaluation for extreme risks
Toby Shevlane
Sebastian Farquhar
Ben Garfinkel
Mary Phuong
Jess Whittlestone
Jade Leung
Daniel Kokotajlo
Nahema A. Marchal
Markus Anderljung
Noam Kolt
Lewis Ho
Divya Siddarth
Shahar Avin
W. Hawkins
Been Kim
Iason Gabriel
Vijay Bolina
Jack Clark
Paul F. Christiano …
Allan Dafoe
De novo motor learning creates structure in neural activity space that shapes adaptation
Joanna C. Chang
Lee Miller
Juan A. Gallego
Claudia Clopath
Realistically distributing object placements in synthetic training data improves the performance of vision-based object detection models
Setareh Dabiri
Vasileios Lioutas
Berend Zwartsenberg
Yunpeng Liu
Matthew Niedoba
Xiaoxuan Liang
Dylan Green
Justice Sefas
Jonathan Wilder Lavington
Adam Ścibior
When training object detection models on synthetic data, it is important to make the distribution of synthetic data as close as possible to the distribution of real data. We investigate specifically the impact of object placement distribution, keeping all other aspects of synthetic data fixed. Our experiment, training a 3D vehicle detection model in CARLA and testing on KITTI, demonstrates a substantial improvement resulting from improving the object placement distribution.
Think Before You Act: Decision Transformers with Internal Working Memory
Jikun Kang
Romain Laroche
Xingdi Yuan
Adam Trischler
Jie Fu
Large language model (LLM)-based decision-making agents have shown the ability to generalize across multiple tasks. However, their performance relies on massive data and compute. We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in parameters throughout training. As a result, training on a new task may deteriorate the model's performance on previous tasks. In contrast to LLMs' implicit memory mechanism, the human brain utilizes distributed memory storage, which helps manage and organize multiple skills efficiently, mitigating the forgetting phenomenon. Thus inspired, we propose an internal working memory module to store, blend, and retrieve information for different downstream tasks. Evaluation results show that the proposed method improves training efficiency and generalization in both Atari games and meta-world object manipulation tasks. Moreover, we demonstrate that memory fine-tuning further enhances the adaptability of the proposed architecture.
Think Before You Act: Decision Transformers with Internal Working Memory
Jikun Kang
Romain Laroche
Xingdi Yuan
Adam P. Trischler
Xuefei Liu
Jie Fu
Large language model (LLM)-based decision-making agents have shown the ability to generalize across multiple tasks. However, their performance relies on massive data and compute. We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in parameters throughout training. As a result, training on a new task may deteriorate the model's performance on previous tasks. In contrast to LLMs' implicit memory mechanism, the human brain utilizes distributed memory storage, which helps manage and organize multiple skills efficiently, mitigating the forgetting phenomenon. Thus inspired, we propose an internal working memory module to store, blend, and retrieve information for different downstream tasks. Evaluation results show that the proposed method improves training efficiency and generalization in both Atari games and meta-world object manipulation tasks. Moreover, we demonstrate that memory fine-tuning further enhances the adaptability of the proposed architecture.
Fourier Neural Operators for Arbitrary Resolution Climate Data Downscaling
Qidong Yang
Alex Hernandez-Garcia
Paula Harder
Venkatesh Ramesh
Prasanna Sattigeri
D. Szwarcman
C. Watson
Climate simulations are essential in guiding our understanding of climate change and responding to its effects. However, it is computationally expensive to resolve complex climate processes at high spatial resolution. As one way to speed up climate simulations, neural networks have been used to downscale climate variables from fast-running low-resolution simulations, but high-resolution training data are often unobtainable or scarce, greatly limiting accuracy. In this work, we propose a downscaling method based on the Fourier neural operator. It trains with data of a small upsampling factor and then can zero-shot downscale its input to arbitrary unseen high resolution. Evaluated both on ERA5 climate model data and on the Navier-Stokes equation solution data, our downscaling model significantly outperforms state-of-the-art convolutional and generative adversarial downscaling models, both in standard single-resolution downscaling and in zero-shot generalization to higher upsampling factors. Furthermore, we show that our method also outperforms state-of-the-art data-driven partial differential equation solvers on Navier-Stokes equations. Overall, our work bridges the gap between simulation of a physical process and interpolation of low-resolution output, showing that it is possible to combine both approaches and significantly improve upon each other.