Publications

LLMs can learn self-restraint through iterative self-reflection
Alexandre Piché
Aristides Milios
Chris Pal
Unmasking Efficiency: Learning Salient Sparse Models in Non-IID Federated Learning
Riyasat Ohib
Bishal Thapaliya
Jingyu Liu
Vince D. Calhoun
Sergey M. Plis
In this work, we propose Salient Sparse Federated Learning (SSFL), a streamlined approach for sparse federated learning with efficient communication. SSFL identifies a sparse subnetwork prior to training, leveraging parameter saliency scores computed separately on local client data in non-IID scenarios, and then aggregated, to determine a global mask. Only the sparse model weights are communicated each round between the clients and the server. We validate SSFL's effectiveness using standard non-IID benchmarks, noting marked improvements in the sparsity-accuracy trade-offs. Finally, we deploy our method in a real-world federated learning framework and report an improvement in communication time.
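A minimal sketch of the masking step described above, assuming a SNIP-style saliency criterion (|weight × gradient|) and simple mean aggregation across clients; the paper's exact criterion and aggregation rule are not reproduced here:
```python
import numpy as np

def client_saliency(weights, grads):
    # SNIP-style saliency |w * dL/dw|, computed on each client's local data.
    # (The exact saliency criterion SSFL uses is an assumption here.)
    return np.abs(weights * grads)

def global_mask(saliencies, sparsity):
    # Aggregate per-client saliency scores (here: a simple mean) and keep
    # the top (1 - sparsity) fraction of weights as a shared global mask.
    agg = np.mean(np.stack(saliencies), axis=0)
    k = int((1.0 - sparsity) * agg.size)
    threshold = np.partition(agg.ravel(), -k)[-k]
    return (agg >= threshold).astype(np.float32)

# Toy example: 3 clients scoring one 4x4 weight tensor, 80% sparsity.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
scores = [client_saliency(w, rng.normal(size=w.shape)) for _ in range(3)]
mask = global_mask(scores, sparsity=0.8)
print(mask)  # only weights where mask == 1 are communicated each round
```
Because the mask is fixed before training, each round's payload shrinks to the unmasked weights, which is where the reported communication savings come from.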
Best Response Shaping
Milad Aghajohari
Tim Cooijmans
Juan Agustin Duque
Shunichi Akatsuka
We investigate the challenge of multi-agent deep reinforcement learning in partially competitive environments, where traditional methods struggle to foster reciprocity-based cooperation. LOLA and POLA agents learn reciprocity-based cooperative policies by differentiating through a few look-ahead optimization steps of their opponent. However, there is a key limitation in these techniques: because they consider only a few optimization steps, a learning opponent that takes many steps to optimize its return may exploit them. In response, we introduce a novel approach, Best Response Shaping (BRS), which differentiates through an opponent approximating the best response, termed the "detective." To condition the detective on the agent's policy for complex games, we propose a state-aware differentiable conditioning mechanism, facilitated by a question answering (QA) method that extracts a representation of the agent based on its behaviour on specific environment states. To empirically validate our method, we showcase its enhanced performance against a Monte Carlo Tree Search (MCTS) opponent, which serves as an approximation to the best response in the Coin Game. This work expands the applicability of multi-agent RL in partially competitive environments and provides a new pathway towards achieving improved social welfare in general sum games.
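As a rough illustration of the training structure only (not of the paper's Coin Game setup or its QA-based conditioning), the sketch below trains a small "detective" network to best-respond to an agent's policy in a one-shot matrix game, then updates the agent by differentiating its return through that detective. The game, architectures, and learning rates are illustrative assumptions, and a one-shot game cannot exhibit the reciprocity the paper targets:
```python
import torch

# One-shot 2x2 game with prisoner's-dilemma-style payoffs.
# Rows: agent's action (C, D); columns: opponent's action (C, D).
R_AGENT = torch.tensor([[3., 0.], [4., 1.]])
R_OPP   = torch.tensor([[3., 4.], [0., 1.]])

agent_logits = torch.zeros(2, requires_grad=True)
# "Detective": maps the agent's policy to an (approximate) best-response policy.
detective = torch.nn.Sequential(
    torch.nn.Linear(2, 16), torch.nn.Tanh(), torch.nn.Linear(16, 2))
opt_d = torch.optim.Adam(detective.parameters(), lr=1e-2)
opt_a = torch.optim.Adam([agent_logits], lr=1e-2)

for _ in range(2000):
    p_agent = torch.softmax(agent_logits, dim=0)
    # 1) Train the detective to maximize the opponent's return against p_agent.
    p_opp = torch.softmax(detective(p_agent.detach()), dim=0)
    loss_d = -(p_agent.detach() @ R_OPP @ p_opp)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # 2) Update the agent by differentiating its return *through* the detective.
    p_opp = torch.softmax(detective(p_agent), dim=0)
    loss_a = -(p_agent @ R_AGENT @ p_opp)
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
```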
GRouNdGAN: GRN-guided simulation of single-cell RNA-seq data using causal generative adversarial networks
Yazdan Zinati
Abdulrahman Takiddeen
We introduce GRouNdGAN, a gene regulatory network (GRN)-guided causal implicit generative model for simulating single-cell RNA-seq data, performing in-silico perturbation experiments, and benchmarking GRN inference methods. Through the imposition of a user-defined GRN in its architecture, GRouNdGAN simulates steady-state and transient-state single-cell datasets where genes are causally expressed under the control of their regulating transcription factors (TFs). Training on three experimental datasets, we show that our model captures non-linear TF-gene dependencies and preserves gene identities, cell trajectories, pseudo-time ordering, and technical and biological noise, with no user manipulation and only implicit parameterization. Despite imposing rigid causality constraints, it outperforms state-of-the-art simulators in generating realistic cells. GRouNdGAN learns meaningful causal regulatory dynamics, allowing sampling from both observational and interventional distributions. This enables it to synthesize cells under conditions that do not occur in the dataset at inference time, allowing it to perform in-silico TF knockout experiments. Our results show that in-silico knockout of cell type-specific TFs significantly reduces the generation of cells of that type. Interactions imposed through the GRN are emphasized in the simulated datasets, resulting in GRN inference algorithms assigning them much higher scores than interactions not imposed but of equal importance in the experimental training dataset. Benchmarking various GRN inference algorithms reveals that GRouNdGAN effectively bridges the existing gap between simulated and biological data benchmarks, providing gold standard ground truth GRNs and realistic cells corresponding to the biological system of interest. Our results show that GRouNdGAN is a stable, realistic, and effective simulator with various applications in single-cell RNA-seq analysis.
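The core architectural idea, genes generated only from the TFs that regulate them, can be sketched as a weight matrix masked elementwise by a user-supplied GRN adjacency matrix; knocking out a TF then amounts to zeroing its expression at inference time. This is a simplified stand-in for the paper's GAN-based generator (the GRN below and the layer details are hypothetical):
```python
import torch

class GRNCausalLayer(torch.nn.Module):
    # Genes are produced only from their regulating TFs: the weight matrix
    # is masked by a user-supplied GRN adjacency matrix.
    def __init__(self, grn_mask):  # grn_mask: (n_genes, n_tfs), 1 = regulates
        super().__init__()
        self.register_buffer("mask", grn_mask)
        self.weight = torch.nn.Parameter(torch.randn_like(grn_mask) * 0.1)

    def forward(self, tf_expr):  # tf_expr: (batch, n_tfs)
        return torch.relu(tf_expr @ (self.weight * self.mask).T)

# Hypothetical GRN: 3 TFs regulating 5 genes.
grn = torch.tensor([[1., 0., 0.],
                    [1., 1., 0.],
                    [0., 1., 0.],
                    [0., 1., 1.],
                    [0., 0., 1.]])
layer = GRNCausalLayer(grn)
tfs = torch.rand(8, 3)   # TF expression for a batch of 8 cells
cells = layer(tfs)       # gene expression respects the imposed GRN

# In-silico knockout of TF 1: zero its expression at inference time and
# regenerate; only the genes it regulates (rows 1-3 of grn) can change.
tfs_ko = tfs.clone()
tfs_ko[:, 1] = 0.0
cells_ko = layer(tfs_ko)
```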
Imitation Learning from Observation through Optimal Transport
Wei-Di Chang
Scott Fujimoto
More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling
Haque Ishfaq
Yixin Tan
Yu Yang
Qingfeng Lan
Jianfeng Lu
A. Rupam Mahmood
Pan Xu
Preface of UniReps: the First Workshop on Unifying Representations in Neural Models
Marco Fumero
Emanuele Rodolà
Clementine Domine
Francesco Locatello
Karolina Dziugaite
Mathilde Caron
Discover why, when and how distinct learning processes yield similar representations, and the degree to which these can be unified.
Protocol to perform integrative analysis of high-dimensional single-cell multimodal data using an interpretable deep learning technique
Manqi Zhou
Hao Zhang
Zilong Bai
Dylan Mann-Krzisnik
Fei Wang
Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning
Adriana Hugessen
Roger Creus Castanyer
Faisal Mohamed
Both entropy-minimizing and entropy-maximizing (curiosity) objectives for unsupervised reinforcement learning (RL) have been shown to be effective in different environments, depending on the environment's level of natural entropy. However, neither method alone results in an agent that will consistently learn intelligent behavior across environments. In an effort to find a single entropy-based method that will encourage emergent behaviors in any environment, we propose an agent that can adapt its objective online, depending on the entropy conditions, by framing the choice as a multi-armed bandit problem. We devise a novel intrinsic feedback signal for the bandit, which captures the agent's ability to control the entropy in its environment. We demonstrate that such agents can learn to control entropy and exhibit emergent behaviors in both high- and low-entropy regimes and can learn skillful behaviors in benchmark tasks. Videos of the trained agents and summarized findings can be found on our project page: https://sites.google.com/view/surprise-adaptive-agents
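A minimal sketch of the adaptive mechanism: a two-armed bandit (here a simple UCB rule, which is an assumption, as is the placeholder environment) chooses per episode between the entropy-minimizing and entropy-maximizing objectives, and is rewarded by how far the agent moves entropy from a random-policy baseline, i.e. its ability to control entropy:
```python
import numpy as np

rng = np.random.default_rng(0)

# Two arms: 0 = minimize surprise (entropy), 1 = maximize it (curiosity).
counts, values = np.zeros(2), np.zeros(2)

def ucb_arm(t, c=1.0):
    # Pick each arm once, then follow an upper-confidence-bound rule.
    if 0 in counts:
        return int(np.argmin(counts))
    return int(np.argmax(values + c * np.sqrt(np.log(t + 1) / counts)))

def run_episode(arm):
    # Placeholder rollout: a real agent would train an RL policy on the
    # chosen intrinsic reward and return the resulting state entropy.
    natural_entropy = 2.0               # hypothetical environment level
    push = -0.5 if arm == 0 else 0.5    # agent drives entropy down or up
    return natural_entropy + push + rng.normal(scale=0.1)

baseline = 2.0  # e.g. entropy under a random policy
for t in range(200):
    arm = ucb_arm(t)
    entropy = run_episode(arm)
    # Feedback signal: how much the agent moved entropy away from the
    # baseline, i.e. its ability to *control* entropy in this environment.
    reward = abs(entropy - baseline)
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]

print("episodes per objective:", counts)
```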
On the consistency of hyper-parameter selection in value-based deep reinforcement learning
Johan Samir Obando Ceron
João Guilherme Madeira Araújo
Deep reinforcement learning (deep RL) has achieved tremendous success on various domains through a combination of algorithmic design and careful selection of hyper-parameters. Algorithmic improvements are often the result of iterative enhancements built upon prior approaches, while hyper-parameter choices are typically inherited from previous methods or fine-tuned specifically for the proposed technique. Despite their crucial impact on performance, hyper-parameter choices are frequently overshadowed by algorithmic advancements. This paper conducts an extensive empirical study focusing on the reliability of hyper-parameter selection for value-based deep reinforcement learning agents, including the introduction of a new score to quantify the consistency and reliability of various hyper-parameters. Our findings not only help establish which hyper-parameters are most critical to tune, but also help clarify which tunings remain consistent across different training regimes.
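The paper's score is not reproduced here, but one plausible way to quantify consistency of a hyper-parameter across training regimes is the mean pairwise rank agreement of its settings' returns; a sketch with made-up numbers:
```python
import numpy as np
from scipy.stats import kendalltau

# Hypothetical returns for 4 settings of one hyper-parameter (rows)
# under 3 training regimes (columns), e.g. different agents or games.
returns = np.array([[120.,  95., 210.],
                    [150., 110., 260.],
                    [140., 130., 240.],
                    [ 90.,  70., 180.]])

def consistency_score(returns):
    # Mean pairwise Kendall's tau between the hyper-parameter rankings
    # induced by each pair of regimes; 1.0 = identical ranking everywhere.
    n = returns.shape[1]
    taus = [kendalltau(returns[:, i], returns[:, j])[0]
            for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(taus))

print(f"consistency: {consistency_score(returns):.2f}")
```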
What Mechanisms Does Knowledge Distillation Distill?
Cindy Wu
Ekdeep Singh Lubana
Bruno Mlodozeniec
Robert Kirk
Knowledge distillation is a commonly used compression method in ML due to the popularity of increasingly large-scale models, but it is unclear if all the information a teacher model contains is distilled into the smaller student model. We aim to formalize the concept of ‘knowledge’ to investigate how knowledge is transferred during distillation, focusing on shared invariant outputs to counterfactual changes of dataset latent variables (we call these latents mechanisms). We define a student model to be a good stand-in model for a teacher if it shares the teacher’s learned mechanisms, and find that Jacobian matching and contrastive representation learning are viable methods by which to train such models. While these methods do not result in perfect transfer of mechanisms, we show they often improve student fidelity or mitigate simplicity bias (as measured by the teacher-to-student KL divergence and accuracy on various out-of-distribution test datasets), especially on datasets with spurious statistical correlations.
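A minimal sketch of a Jacobian-matching distillation loss of the kind the abstract mentions, combining a standard soft-target KD term with matching of normalized input-gradients; the exact objective, temperature, and weighting used in the paper are assumptions here:
```python
import torch
import torch.nn.functional as F

def jacobian_matching_kd_loss(student, teacher, x, T=4.0, alpha=0.1):
    # Soft-target distillation plus input-Jacobian matching (a sketch;
    # T and alpha are illustrative, not the paper's values).
    x = x.clone().requires_grad_(True)
    t_logits, s_logits = teacher(x), student(x)
    # Standard KD term on temperature-softened outputs.
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                  F.softmax(t_logits.detach() / T, dim=-1),
                  reduction="batchmean") * T * T
    # Input-gradients of the max logit, teacher vs. student.
    g_t = torch.autograd.grad(t_logits.max(dim=1).values.sum(), x,
                              retain_graph=True)[0].detach()
    g_s = torch.autograd.grad(s_logits.max(dim=1).values.sum(), x,
                              create_graph=True)[0]
    jac = F.mse_loss(F.normalize(g_s.flatten(1), dim=1),
                     F.normalize(g_t.flatten(1), dim=1))
    return kd + alpha * jac

# Usage on toy MLPs:
teacher = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.ReLU(),
                              torch.nn.Linear(64, 3))
student = torch.nn.Sequential(torch.nn.Linear(10, 16), torch.nn.ReLU(),
                              torch.nn.Linear(16, 3))
loss = jacobian_matching_kd_loss(student, teacher, torch.randn(8, 10))
loss.backward()
```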
CARTIER: Cartographic lAnguage Reasoning Targeted at Instruction Execution for Robots
Nikhil Kakodkar
Dmitriy Rivkin
Bobak H. Baghi
Francois Hogan