Portrait of Leo Feng is unavailable

Leo Feng

Alumni

Publications

Adaptive Teachers for Amortized Samplers
Amortized inference is the task of training a parametric model, such as a neural network, to approximate a distribution with a given unnorma… (see more)lized density where exact sampling is intractable. When sampling is implemented as a sequential decision-making process, reinforcement learning (RL) methods, such as generative flow networks, can be used to train the sampling policy. Off-policy RL training facilitates the discovery of diverse, high-reward candidates, but existing methods still face challenges in efficient exploration. We propose to use an adaptive training distribution (the \teacher) to guide the training of the primary amortized sampler (the \student). The \teacher, an auxiliary behavior model, is trained to sample high-loss regions of the \student and can generalize across unexplored modes, thereby enhancing mode coverage by providing an efficient training curriculum. We validate the effectiveness of this approach in a synthetic environment designed to present an exploration challenge, two diffusion-based sampling tasks, and four biochemical discovery tasks demonstrating its ability to improve sample efficiency and mode coverage. Source code is available at https://github.com/alstn12088/adaptive-teacher.
Were RNNs All We Needed?
Frederick Tung
Mohamed Osama Ahmed
Hossein Hajimirsadeghi
Were RNNs All We Needed?
Frederick Tung
Mohamed Osama Ahmed
Hossein Hajimirsadeghi
The introduction of Transformers in 2017 reshaped the landscape of deep learning. Originally proposed for sequence modelling, Transformers h… (see more)ave since achieved widespread success across various domains. However, the scalability limitations of Transformers - particularly with respect to sequence length - have sparked renewed interest in novel recurrent models that are parallelizable during training, offer comparable performance, and scale more effectively. In this work, we revisit sequence modelling from a historical perspective, focusing on Recurrent Neural Networks (RNNs), which dominated the field for two decades before the rise of Transformers. Specifically, we examine LSTMs (1997) and GRUs (2014). We demonstrate that by simplifying these models, we can derive minimal versions (minLSTMs and minGRUs) that (1) use fewer parameters than their traditional counterparts, (2) are fully parallelizable during training, and (3) achieve surprisingly competitive performance on a range of tasks, rivalling recent models including Transformers.
Memory Efficient Neural Processes via Constant Memory Attention Block
Frederick Tung
Hossein Hajimirsadeghi
Mohamed Osama Ahmed
Attention as an RNN
Frederick Tung
Hossein Hajimirsadeghi
Mohamed Osama Ahmed
Greg Mori
The advent of Transformers marked a significant breakthrough in sequence modelling, providing a highly performant architecture capable of le… (see more)veraging GPU parallelism. However, Transformers are computationally expensive at inference time, limiting their applications, particularly in low-resource settings (e.g., mobile and embedded devices). Addressing this, we (1) begin by showing that attention can be viewed as a special Recurrent Neural Network (RNN) with the ability to compute its \textit{many-to-one} RNN output efficiently. We then (2) show that popular attention-based models such as Transformers can be viewed as RNN variants. However, unlike traditional RNNs (e.g., LSTMs), these models cannot be updated efficiently with new tokens, an important property in sequence modelling. Tackling this, we (3) introduce a new efficient method of computing attention's \textit{many-to-many} RNN output based on the parallel prefix scan algorithm. Building on the new attention formulation, we (4) introduce \textbf{Aaren}, an attention-based module that can not only (i) be trained in parallel (like Transformers) but also (ii) be updated efficiently with new tokens, requiring only constant memory for inferences (like traditional RNNs). Empirically, we show Aarens achieve comparable performance to Transformers on
Generative Active Learning for the Search of Small-Molecule Protein Binders
Cheng-Hao Liu
Éric Jolicoeur
Edward Ruediger
Andrei Nica
Daniel St-Cyr
Doris Alexandra Schuetz
Victor Ion Butoi
Saikrishna Gottipati
Prateek Gupta
Sasikanth Avancha
William Hamilton
Brooks Paige
Sanchit Misra
Bharat Kaul
José Miguel Hernández-Lobato
Marwin Segler
Michael Bronstein
Anne Marinier
Mike Tyers
Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules which exh… (see more)ibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecules to discover candidates with a desired property. We apply LambdaZero with molecular docking to design novel small molecules that inhibit the enzyme soluble Epoxide Hydrolase 2 (sEH), while enforcing constraints on synthesizability and drug-likeliness. LambdaZero provides an exponential speedup in terms of the number of calls to the expensive molecular docking oracle, and LambdaZero de novo designed molecules reach docking scores that would otherwise require the virtual screening of a hundred billion molecules. Importantly, LambdaZero discovers novel scaffolds of synthesizable, drug-like inhibitors for sEH. In in vitro experimental validation, a series of ligands from a generated quinazoline-based scaffold were synthesized, and the lead inhibitor N-(4,6-di(pyrrolidin-1-yl)quinazolin-2-yl)-N-methylbenzamide (UM0152893) displayed sub-micromolar enzyme inhibition of sEH.
Constant Memory Attention Block
Frederick Tung
Hossein Hajimirsadeghi
Mohamed Osama Ahmed
Latent Bottlenecked Attentive Neural Processes
Hossein Hajimirsadeghi
Mohamed Osama Ahmed
Neural Processes (NPs) are popular methods in meta-learning that can estimate predictive uncertainty on target datapoints by conditioning on… (see more) a context dataset. Previous state-of-the-art method Transformer Neural Processes (TNPs) achieve strong performance but require quadratic computation with respect to the number of context datapoints, significantly limiting its scalability. Conversely, existing sub-quadratic NP variants perform significantly worse than that of TNPs. Tackling this issue, we propose Latent Bottlenecked Attentive Neural Processes (LBANPs), a new computationally efficient sub-quadratic NP variant, that has a querying computational complexity independent of the number of context datapoints. The model encodes the context dataset into a constant number of latent vectors on which self-attention is performed. When making predictions, the model retrieves higher-order information from the context dataset via multiple cross-attention mechanisms on the latent vectors. We empirically show that LBANPs achieve results competitive with the state-of-the-art on meta-regression, image completion, and contextual multi-armed bandits. We demonstrate that LBANPs can trade-off the computational cost and performance according to the number of latent vectors. Finally, we show LBANPs can scale beyond existing attention-based NP variants to larger dataset settings.
Constant Memory Attentive Neural Processes
Frederick Tung
Hossein Hajimirsadeghi
Mohamed Osama Ahmed
Tree Cross Attention
Frederick Tung
Hossein Hajimirsadeghi
Mohamed Osama Ahmed
Cross Attention is a popular method for retrieving information from a set of context tokens for making predictions. At inference time, for e… (see more)ach prediction, Cross Attention scans the full set of
Efficient Queries Transformer Neural Processes
Hossein Hajimirsadeghi
Mohamed Osama Ahmed
Designing Biological Sequences via Meta-Reinforcement Learning and Bayesian Optimization
The ability to accelerate the design of biological sequences can have a substantial impact on the progress of the medical field. The problem… (see more) can be framed as a global optimization problem where the objective is an expensive black-box function such that we can query large batches restricted with a limitation of a low number of rounds. Bayesian Optimization is a principled method for tackling this problem. However, the astronomically large state space of biological sequences renders brute-force iterating over all possible sequences infeasible. In this paper, we propose MetaRLBO where we train an autoregressive generative model via Meta-Reinforcement Learning to propose promising sequences for selection via Bayesian Optimization. We pose this problem as that of finding an optimal policy over a distribution of MDPs induced by sampling subsets of the data acquired in the previous rounds. Our in-silico experiments show that meta-learning over such ensembles provides robustness against reward misspecification and achieves competitive results compared to existing strong baselines.