Publications

From Points to Functions: Infinite-dimensional Representations in Diffusion Models
Sarthak Mittal
Stefan Bauer
Arash Mehrjou
Diffusion-based generative models learn to iteratively transfer unstructured noise to a complex target distribution, as opposed to Generative Adversarial Networks (GANs) or the decoder of Variational Autoencoders (VAEs), which produce samples from the target distribution in a single step. Thus, in diffusion models every sample is naturally connected to a random trajectory which is a solution to a learned stochastic differential equation (SDE). Generative models are only concerned with the final state of this trajectory, which delivers samples from the desired distribution. Abstreiter et al. showed that these stochastic trajectories can be seen as continuous filters that wash out information along the way. Consequently, it is reasonable to ask if there is an intermediate time step at which the preserved information is optimal for a given downstream task. In this work, we show that a combination of information content from different time steps gives a strictly better representation for the downstream task. We introduce attention- and recurrence-based modules that ``learn to mix'' the information content of various time steps such that the resultant representation leads to superior performance in downstream tasks.
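As a minimal, hypothetical sketch of the ``learn to mix'' idea (the names and shapes here are illustrative, not the paper's implementation): given one encoding per diffusion time step, a learned query can attention-pool them into a single downstream representation.

```python
import numpy as np

def softmax(x):
    x = x - x.max()  # numerical stability
    e = np.exp(x)
    return e / e.sum()

def mix_timestep_codes(codes, query):
    """Attention-pool per-time-step codes into one representation.

    codes: (T, D) array, one encoding per diffusion time step.
    query: (D,) query vector (learned in the paper; a plain array here).
    Returns the mixed (D,) representation and the (T,) attention weights.
    """
    scores = codes @ query / np.sqrt(codes.shape[1])  # scaled dot-product
    weights = softmax(scores)                         # per-time-step mixing weights
    return weights @ codes, weights
```

In the paper the mixing module is trained end-to-end with the downstream objective; here the query is simply a fixed vector.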
Improving Source Separation by Explicitly Modeling Dependencies between Sources
Ethan Manilow
Curtis Hawthorne
Bryan A. Pardo
Jesse Engel
We propose a new method for training a supervised source separation system that aims to learn the interdependent relationships between all combinations of sources in a mixture. Rather than independently estimating each source from a mix, we reframe the source separation problem as an Orderless Neural Autoregressive Density Estimator (NADE), and estimate each source from both the mix and a random subset of the other sources. We adapt a standard source separation architecture, Demucs, with additional inputs for each individual source, in addition to the input mixture. We randomly mask these input sources during training so that the network learns the conditional dependencies between the sources. By pairing this training method with a blocked Gibbs sampling procedure at inference time, we demonstrate that the network can iteratively improve its separation performance by conditioning a source estimate on its earlier source estimates. Experiments on two source separation datasets show that training a Demucs model with an Orderless NADE approach and using Gibbs sampling (up to 512 steps) at inference time strongly outperforms a Demucs baseline that uses a standard regression loss and direct (one step) estimation of sources.
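A toy sketch of the two ingredients, random masking of conditioning sources at training time and blocked Gibbs sampling at inference, with a hypothetical `toy_separator` standing in for the conditioned Demucs network (for additive toy sources, the residual of the mix given the conditioned sources is a valid conditional estimate):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_source_mask(n_sources):
    """Training-time mask: each other-source input is revealed at random,
    so the network learns p(source_i | mix, random subset of other sources)."""
    return rng.random(n_sources) < 0.5

def toy_separator(mix, others):
    """Stand-in for the conditioned separator: with additive toy sources,
    the residual of the mix given the conditioned sources is one valid estimate."""
    return mix - sum(others)

def blocked_gibbs(mix, n_sources, steps=8):
    """Iteratively re-estimate each source conditioned on the current
    estimates of the others, as in the paper's inference procedure."""
    estimates = [np.zeros_like(mix) for _ in range(n_sources)]
    for _ in range(steps):
        for i in range(n_sources):
            others = [estimates[j] for j in range(n_sources) if j != i]
            estimates[i] = toy_separator(mix, others)
    return estimates
```

After each full sweep the estimates are mutually consistent with the mix; the real system uses a trained network in place of `toy_separator`.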
Learning What You Need from What You Did: Product Taxonomy Expansion with User Behaviors Supervision
Sijie Cheng
Zhouhong Gu
Rui Xie
Wei Wu
Yanghua Xiao
Taxonomies have been widely used in various domains to underpin numerous applications. In particular, product taxonomies serve an essential role in the e-commerce domain for recommendation, browsing, and query understanding. However, taxonomies need to constantly capture newly emerged terms or concepts in e-commerce platforms to stay up-to-date, which is expensive and labor-intensive if it relies on manual maintenance and updates. Therefore, we target the taxonomy expansion task to attach new concepts to existing taxonomies automatically. In this paper, we present a self-supervised and user behavior-oriented product taxonomy expansion framework to append new concepts to existing taxonomies. Our framework extracts hyponymy relations that conform to users' intentions and cognition. Specifically, i) to fully exploit user behavioral information, we extract candidate hyponymy relations that match user interests from query-click concepts; ii) to enhance the semantic information of new concepts and better detect hyponymy relations, we model concepts and relations through both user-generated content and structural information in existing taxonomies and user click logs, leveraging Pre-trained Language Models and Graph Neural Networks combined with Contrastive Learning; iii) to reduce the cost of dataset construction and overcome data skews, we construct a high-quality and balanced training dataset from the existing taxonomy with no supervision. Extensive experiments on real-world product taxonomies from Meituan, a leading Chinese vertical e-commerce platform for ordering take-out with more than 70 million daily active users, demonstrate the superiority of our proposed framework over state-of-the-art methods. Notably, our method enlarges real-world product taxonomies from 39,263 to 94,698 relations with 88% precision. Our implementation is available at: https://github.com/AdaCheng/Product_Taxonomy_Expansion.
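Step i), mining candidate hyponymy relations from query-click behavior, can be caricatured as simple support counting over (query concept, clicked node) pairs; the function and threshold below are illustrative assumptions, not the paper's actual extraction rule.

```python
from collections import Counter

def candidate_hyponymy_pairs(click_log, min_support=2):
    """click_log: iterable of (query_concept, clicked_taxonomy_node) pairs.
    Returns (child, parent) candidates whose click support clears a threshold,
    approximating 'relations that match user interests from query-click concepts'.
    """
    support = Counter(click_log)
    return [(q, node) for (q, node), c in support.items() if c >= min_support]
```

The framework then scores such candidates with its language-model and graph-based relation detector; this sketch covers only the behavioral filtering.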
Forgetting Enhances Episodic Control With Structured Memories
Annik Yalnizyan-Carson
Forgetting is a normal process in healthy brains, and evidence suggests that the mammalian brain forgets more than is required based on limitations of mnemonic capacity. Episodic memories, in particular, are liable to be forgotten over time. Researchers have hypothesized that it may be beneficial for decision making to forget episodic memories over time. Reinforcement learning offers a normative framework in which to test such hypotheses. Here, we show that a reinforcement learning agent that uses an episodic memory cache to find rewards in maze environments can forget a large percentage of older memories without any performance impairments, if it utilizes mnemonic representations that contain structural information about space. Moreover, we show that some forgetting can actually provide a benefit in performance compared to agents with unbounded memories. Our analyses of the agents show that forgetting reduces the influence of outdated information and of infrequently visited states on the policies produced by the episodic control system. These results support the hypothesis that some degree of forgetting can be beneficial for decision making, which can help to explain why the brain forgets more than is required by capacity limitations.
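A minimal sketch of an episodic control cache with forgetting, assuming a FIFO forgetting rule and the usual max-over-returns write of episodic control; the class and its details are illustrative, not the agents' exact memory model.

```python
from collections import OrderedDict

class EpisodicCache:
    """Bounded episodic memory: when full, the oldest memory is forgotten,
    which prunes outdated values for rarely revisited states."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # state -> best return observed so far

    def write(self, state, value):
        if state in self.store:
            self.store.move_to_end(state)  # refresh recency on revisit
        self.store[state] = max(value, self.store.get(state, value))
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)  # forget the oldest memory

    def read(self, state, default=0.0):
        return self.store.get(state, default)
```

The paper's key point is what the cache keys look like: forgetting is harmless, or even helpful, when states are represented with structural information about space.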
Inductive Biases for Relational Tasks
Current deep learning approaches have shown good in-distribution performance but struggle in out-of-distribution settings. This is especially true in the case of tasks involving abstract relations like recognizing rules in sequences, as required in many intelligence tests. In contrast, our brains are remarkably flexible at such tasks, an attribute that is likely linked to anatomical constraints on computations. Inspired by this, recent work has explored how enforcing that relational representations remain distinct from sensory representations can help artificial systems. Building on this work, we further explore and formalize the advantages afforded by ``partitioned'' representations of relations and sensory details. We investigate inductive biases that ensure abstract relations are learned and represented distinctly from sensory data across several neural network architectures and show that they outperform existing architectures on out-of-distribution generalization for various relational tasks. These results show that partitioning relational representations from other information streams may be a simple way to augment existing network architectures' robustness when performing relational computations.
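The partitioning idea can be sketched as follows: the rule detector is given only the pairwise similarities between item encodings, never the encodings themselves, so an abstract rule such as ABA transfers to unseen items. The functions and the margin below are hypothetical, not one of the paper's architectures.

```python
import numpy as np

def pairwise_relations(items):
    """Return only the pairwise cosine similarities of a sequence of item
    encodings; sensory content stays confined to the encodings themselves."""
    x = items / np.linalg.norm(items, axis=1, keepdims=True)
    return x @ x.T

def matches_aba(items, margin=0.5):
    """A toy abstract rule ('first and third items are the same') read off
    the relation matrix alone, so it applies to items never seen before."""
    r = pairwise_relations(items)
    return r[0, 2] - max(r[0, 1], r[1, 2]) > margin
```

Because `matches_aba` never touches raw features, swapping in entirely new item encodings leaves the rule check unchanged, which is the out-of-distribution benefit the paper formalizes.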
Neurobiological Correlates of Change in Adaptive Behavior in Autism.
Charlotte M. Pretzsch
Tim Schäfer
Michael V. Lombardo
Varun Warrier
Caroline Mann
Anke Bletsch
Chris H. Chatham
Dorothea L. Floris
Julian Tillmann
Afsheen Yousaf
Emily J. H. Jones
Tony Charman
Sara Ambrosino
Thomas Bourgeron
Eva Loth
Beth Oakley
Jan K. Buitelaar
Freddy Cliquet
Claire Leblond …
Simon Baron-Cohen
Christian Beckmann
Tobias Banaschewski
Sarah Durston
Christine M. Freitag
Declan Murphy
Christine Ecker
CLIP-Mesh: Generating textured meshes from text using pretrained image-text models
Nasir M. Khalid
Tianhao Xie
Tiberiu S. Popa
We present a technique for zero-shot generation of a 3D model using only a target text prompt. Without any 3D supervision, our method deforms the control shape of a limit subdivided surface along with its texture map and normal map to obtain a 3D asset that corresponds to the input text prompt and can be easily deployed into games or modeling applications. We rely only on a pre-trained CLIP model that compares the input text prompt with differentiably rendered images of our 3D model. While previous works have focused on stylization or required training of generative models, we perform optimization on mesh parameters directly to generate shape, texture, or both. To constrain the optimization to produce plausible meshes and textures we introduce a number of techniques using image augmentations and the use of a pretrained prior that generates CLIP image embeddings given a text embedding.
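The overall loop, optimizing asset parameters directly against a CLIP-style similarity averaged over random augmentations, can be caricatured with stand-in functions; everything below (the "embedding", the score, the analytic gradient) is a toy assumption, not CLIP or a differentiable renderer.

```python
import numpy as np

rng = np.random.default_rng(1)

TARGET = np.array([0.8, -0.2])  # stand-in for the text prompt's CLIP embedding

def render_embed(params, jitter):
    """Stand-in for differentiable rendering + CLIP image encoding:
    here the 'image embedding' is just the parameters plus augmentation noise."""
    return params + jitter

def clip_score(emb):
    """Toy similarity: negative squared distance to the target embedding."""
    return -np.sum((emb - TARGET) ** 2)

def optimize(params, steps=200, lr=0.1, n_aug=4):
    """Gradient ascent on the augmentation-averaged similarity,
    mirroring the structure (not the content) of the CLIP-Mesh loop."""
    for _ in range(steps):
        grad = np.zeros_like(params)
        for _ in range(n_aug):  # average over random image augmentations
            jitter = 0.01 * rng.standard_normal(params.shape)
            grad += -2.0 * (render_embed(params, jitter) - TARGET)
        params = params + lr * grad / n_aug
    return params
```

In the real system `params` are the subdivision control shape, texture map, and normal map, and the gradient flows through a differentiable renderer and the frozen CLIP encoder.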
Monoallelic Heb/Tcf12 Deletion Reduces the Requirement for NOTCH1 Hyperactivation in T-Cell Acute Lymphoblastic Leukemia
Diogo F. T. Veiga
Mathieu Tremblay
Bastien Gerby
Sabine Herblot
André Haman
Patrick Gendron
Juan Carlos Zúñiga-Pflücker
Josée Hébert
Joseph Paul Cohen
Trang Hoang
Early T-cell development is precisely controlled by E proteins, which indistinguishably include the HEB/TCF12 and E2A/TCF3 transcription factors, together with NOTCH1 and pre-T cell receptor (TCR) signalling. Importantly, perturbations of early T-cell regulatory networks are implicated in leukemogenesis. NOTCH1 gain of function mutations invariably lead to T-cell acute lymphoblastic leukemia (T-ALL), whereas inhibition of E proteins accelerates leukemogenesis. Thus, NOTCH1, pre-TCR, E2A and HEB functions are intertwined, but how these pathways contribute individually or synergistically to leukemogenesis remains to be documented. To directly address these questions, we leveraged Cd3e-deficient mice in which pre-TCR signaling and progression through β-selection is abrogated to dissect and decouple the roles of pre-TCR, NOTCH1, E2A and HEB in SCL/TAL1-induced T-ALL, via the use of Notch1 gain of function transgenic (Notch1ICtg) and Tcf12+/- or Tcf3+/- heterozygote mice. As a result, we now provide evidence that both HEB and E2A restrain cell proliferation at the β-selection checkpoint while the clonal expansion of SCL-LMO1-induced pre-leukemic stem cells in T-ALL is uniquely dependent on Tcf12 gene dosage. At the molecular level, HEB protein levels are decreased via proteasomal degradation at the leukemic stage, pointing to a reversible loss of function mechanism. Moreover, in SCL-LMO1-induced T-ALL, loss of one Tcf12 allele is sufficient to bypass pre-TCR signaling which is required for Notch1 gain of function mutations and for progression to T-ALL. In contrast, Tcf12 monoallelic deletion does not accelerate Notch1IC-induced T-ALL, indicating that Tcf12 and Notch1 operate in the same pathway. Finally, we identify a tumor suppressor gene set downstream of HEB, exhibiting significantly lower expression levels in pediatric T-ALL compared to B-ALL and brain cancer samples, the three most frequent pediatric cancers.
In summary, our results indicate a tumor suppressor function of HEB/TCF12 in T-ALL to mitigate cell proliferation controlled by NOTCH1 in pre-leukemic stem cells and prevent NOTCH1-driven progression to T-ALL.
Probing Representation Forgetting in Supervised and Unsupervised Continual Learning
MohammadReza Davari
Nader Asadi
Sudhir Mudur
Rahaf Aljundi
Continual Learning (CL) research typically focuses on tackling the phenomenon of catastrophic forgetting in neural networks. Catastrophic forgetting is associated with an abrupt loss of knowledge previously learned by a model when the task, or more broadly the data distribution, being trained on changes. In supervised learning problems this forgetting, resulting from a change in the model's representation, is typically measured or observed by evaluating the decrease in old task performance. However, a model's representation can change without losing knowledge about prior tasks. In this work we consider the concept of representation forgetting, observed by using the difference in performance of an optimal linear classifier before and after a new task is introduced. Using this tool we revisit a number of standard continual learning benchmarks and observe that, through this lens, model representations trained without any explicit control for forgetting often experience small representation forgetting and can sometimes be comparable to methods which explicitly control for forgetting, especially in longer task sequences. We also show that representation forgetting can lead to new insights on the effect of model capacity and loss function used in continual learning. Based on our results, we show that a simple yet competitive approach is to learn representations continually with standard supervised contrastive learning while constructing prototypes of class samples when queried on old samples. The code to reproduce our results is publicly available at: https://github.com/rezazzr/Probing-Representation-Forgetting
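The probing tool can be sketched directly: fit an optimal linear classifier on frozen features before and after the representation changes, and take the accuracy drop as representation forgetting. The data and the feature corruption below are synthetic illustrations, not the paper's benchmarks.

```python
import numpy as np

def linear_probe_accuracy(feats, labels):
    """Accuracy of the optimal least-squares linear classifier on frozen
    features -- the probe used to measure representation forgetting."""
    X = np.hstack([feats, np.ones((len(feats), 1))])
    w, *_ = np.linalg.lstsq(X, 2.0 * labels - 1.0, rcond=None)
    return float(np.mean((X @ w > 0) == labels))

# Toy old task: feature 0 is an uninformative index, feature 1 separates classes.
idx = np.arange(10, dtype=float)
before = np.column_stack([np.concatenate([idx, idx]),
                          np.concatenate([-2 * np.ones(10), 2 * np.ones(10)])])
labels = np.array([0] * 10 + [1] * 10)

# After 'training on a new task', the discriminative feature is corrupted
# for a fifth of the samples: old-task knowledge degrades without vanishing.
after = before.copy()
after[8:10, 1] *= -1
after[18:20, 1] *= -1

forgetting = (linear_probe_accuracy(before, labels)
              - linear_probe_accuracy(after, labels))
```

Unlike raw old-task accuracy of the full model, this probe isolates how much task-relevant information the representation itself still carries.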
Mapping parallelism in a functional IR through constraint satisfaction: a case study on convolution for mobile GPUs
Naums Mogers
Lu Li
Valentin Radu
Graphics Processing Units (GPUs) are notoriously hard to optimize for manually. What is needed are good automatic code generators and optimizers. Accelerate, Futhark and Lift demonstrated that a functional approach is well suited for this challenge. Lift, for instance, uses a system of rewrite rules with a multi-stage approach. Algorithmic optimizations are first explored, followed by hardware-specific optimizations such as using shared memory and mapping parallelism. While the algorithmic exploration leads to correct transformed programs by construction, it is not necessarily true for the latter phase. Exploiting shared memory and mapping parallelism while ensuring correct synchronization is a delicate balancing act, and is hard to encode in a rewrite system. Currently, Lift relies on heuristics with ad-hoc mechanisms to check for correctness. Although this practical approach eventually produces high-performance code, it is not an ideal state of affairs. This paper proposes to extract parallelization constraints automatically from a functional IR and use a solver to identify valid rewrites. Using a convolutional neural network on a mobile GPU as a use case, this approach matches the performance of the ARM Compute Library GEMM convolution and the TVM-generated kernel consuming between 2.7x and 3.6x less memory on average. Furthermore, a speedup of 12x is achieved over the ARM Compute Library direct convolution implementation.
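A miniature of the proposed approach: treat the mapping of parallel loops to GPU execution levels as a constraint-satisfaction problem and keep only assignments that satisfy every extracted constraint. The loop names, levels, and constraints below are illustrative, not Lift's actual IR or constraint language.

```python
from itertools import product

LEVELS = ["workgroup", "local", "sequential"]

def solve(loops, constraints):
    """Enumerate level assignments for each parallel loop and return those
    satisfying every constraint -- legal mappings come from the solver,
    not from heuristics with ad-hoc correctness checks."""
    solutions = []
    for combo in product(LEVELS, repeat=len(loops)):
        assignment = dict(zip(loops, combo))
        if all(c(assignment) for c in constraints):
            solutions.append(assignment)
    return solutions

# Example constraints extracted (hypothetically) from a convolution kernel:
order = {"workgroup": 0, "local": 1, "sequential": 2}
constraints = [
    lambda a: a["reduce"] == "sequential",          # reductions need an ordering
    lambda a: order[a["rows"]] < order[a["cols"]],  # outer loop maps coarser
]
sols = solve(["rows", "cols", "reduce"], constraints)
```

A real solver would also encode shared-memory and barrier-placement constraints; the point is that every returned assignment is valid by construction.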