Portrait de Eugene Belilovsky

Eugene Belilovsky

Membre académique associé
Professeur adjoint, Concordia University, Département d'informatique et de génie logiciel
Professeur associé, Université de Montréal, Département d'informatique et de recherche opérationnelle
Sujets de recherche
Apprentissage continu
Apprentissage fédéré
Apprentissage profond
Grands modèles de langage (LLM)
Optimisation

Biographie

Eugene Belilovsky est professeur adjoint au Département d'informatique et de génie logiciel de l'Université Concordia. Il est également membre associé de Mila – Institut québécois d’intelligence artificielle et professeur adjoint à l'Université de Montréal. Ses travaux se concentrent sur la vision par ordinateur et l'apprentissage profond. Ses intérêts de recherche actuels comprennent l'apprentissage continu, l'apprentissage à partir de peu de données (few-shot learning) et leurs applications au carrefour de la vision par ordinateur et du traitement du langage.

Étudiants actuels

Doctorat - Concordia
Co-superviseur⋅e :
Maîtrise recherche - Concordia
Co-superviseur⋅e :
Doctorat - Concordia
Co-superviseur⋅e :
Doctorat - Concordia
Co-superviseur⋅e :
Maîtrise recherche - Concordia
Co-superviseur⋅e :
Doctorat - Concordia
Postdoctorat - Concordia
Co-superviseur⋅e :
Doctorat - Concordia
Co-superviseur⋅e :
Doctorat - Concordia
Co-superviseur⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Collaborateur·rice alumni - UdeM
Superviseur⋅e principal⋅e :
Doctorat - Concordia
Co-superviseur⋅e :

Publications

CLIP-Mesh: Generating textured meshes from text using pretrained image-text models
Nasir Mohammad Khalid
Tianhao Xie
Tiberiu Popa
We present a technique for zero-shot generation of a 3D model using only a target text prompt. Without any 3D supervision our method deforms… (voir plus) the control shape of a limit subdivided surface along with its texture map and normal map to obtain a 3D asset that corresponds to the input text prompt and can be easily deployed into games or modeling applications. We rely only on a pre-trained CLIP model that compares the input text prompt with differentiably rendered images of our 3D model. While previous works have focused on stylization or required training of generative models we perform optimization on mesh parameters directly to generate shape, texture or both. To constrain the optimization to produce plausible meshes and textures we introduce a number of techniques using image augmentations and the use of a pretrained prior that generates CLIP image embeddings given a text embedding.
Towards Scaling Difference Target Propagation by Learning Backprop Targets
The development of biologically-plausible learning algorithms is important for understanding learning in the brain, but most of them fail to… (voir plus) scale-up to real-world tasks, limiting their potential as explanations for learning by real brains. As such, it is important to explore learning algorithms that come with strong theoretical guarantees and can match the performance of backpropagation (BP) on complex tasks. One such algorithm is Difference Target Propagation (DTP), a biologically-plausible learning algorithm whose close relation with Gauss-Newton (GN) optimization has been recently established. However, the conditions under which this connection rigorously holds preclude layer-wise training of the feedback pathway synaptic weights (which is more biologically plausible). Moreover, good alignment between DTP weight updates and loss gradients is only loosely guaranteed and under very specific conditions for the architecture being trained. In this paper, we propose a novel feedback weight training scheme that ensures both that DTP approximates BP and that layer-wise feedback weight training can be restored without sacrificing any theoretical guarantees. Our theory is corroborated by experimental results and we report the best performance ever achieved by DTP on CIFAR-10 and ImageNet 32
Probing Representation Forgetting in Supervised and Unsupervised Continual Learning
MohammadReza Davari
Sudhir Mudur
Continual Learning research typically focuses on tackling the phenomenon of catastrophic forgetting in neural networks. Catastrophic forgett… (voir plus)ing is associated with an abrupt loss of knowledge previously learned by a model when the task, or more broadly the data distribution, being trained on changes. In supervised learning problems this forgetting, resulting from a change in the model's representation, is typically measured or observed by evaluating the decrease in old task performance. However, a model's representation can change without losing knowledge about prior tasks. In this work we consider the concept of representation forgetting, observed by using the difference in performance of an optimal linear classifier before and after a new task is introduced. Using this tool we revisit a number of standard continual learning benchmarks and observe that, through this lens, model representations trained without any explicit control for forgetting often experience small representation forgetting and can sometimes be comparable to methods which explicitly control for forgetting, especially in longer task sequences. We also show that representation forgetting can lead to new insights on the effect of model capacity and loss function used in continual learning. Based on our results, we show that a simple yet competitive approach is to learn representations continually with standard supervised contrastive learning while constructing prototypes of class samples when queried on old samples.
Revisiting Learnable Affines for Batch Norm in Few-Shot Transfer Learning
Muawiz Sajjad Chaudhary
Christian Desrosiers
S Ebrahimi Kahou
Batch normalization is a staple of computer vision models, including those employed in few-shot learning. Batch nor-malization layers in con… (voir plus)volutional neural networks are composed of a normalization step, followed by a shift and scale of these normalized features applied via the per-channel trainable affine parameters
Parametric Scattering Networks
The wavelet scattering transform creates geometric invariants and deformation stability. In multiple signal domains, it has been shown to yi… (voir plus)eld more discriminative representations compared to other non-learned representations and to outperform learned representations in certain tasks, particularly on limited labeled data and highly structured signals. The wavelet filters used in the scattering transform are typically selected to create a tight frame via a parameterized mother wavelet. In this work, we investigate whether this standard wavelet filterbank construction is optimal. Focusing on Morlet wavelets, we propose to learn the scales, orientations, and aspect ratios of the filters to produce problem-specific parameterizations of the scattering transform. We show that our learned versions of the scattering transform yield significant performance gains in small-sample classification settings over the standard scattering transform. Moreover, our empirical results suggest that traditional filterbank constructions may not always be necessary for scattering transforms to extract effective representations.
Local Learning with Neuron Groups
Adeetya Patel
Michael Eickenberg
New Insights on Reducing Abrupt Representation Change in Online Continual Learning
In the online continual learning paradigm, agents must learn from a changing distribution while respecting memory and compute constraints. E… (voir plus)xperience Replay (ER), where a small subset of past data is stored and replayed alongside new data, has emerged as a simple and effective learning strategy. In this work, we focus on the change in representations of observed data that arises when previously unobserved classes appear in the incoming data stream, and new classes must be distinguished from previous ones. We shed new light on this question by showing that applying ER causes the newly added classes' representations to overlap significantly with the previous classes, leading to highly disruptive parameter updates. Based on this empirical analysis, we propose a new method which mitigates this issue by shielding the learned representations from drastic adaptation to accommodate new classes. We show that using an asymmetric update rule pushes new classes to adapt to the older ones (rather than the reverse), which is more effective especially at task boundaries, where much of the forgetting typically occurs. Empirical results show significant gains over strong baselines on standard continual learning benchmarks
Generative Compositional Augmentations for Scene Graph Prediction
Cătălina Cangea
Graham W. Taylor
Inferring objects and their relationships from an image in the form of a scene graph is useful in many applications at the intersection of v… (voir plus)ision and language. We consider a challenging problem of compositional generalization that emerges in this task due to a long tail data distribution. Current scene graph generation models are trained on a tiny fraction of the distribution corresponding to the most frequent compositions, e.g. . However, test images might contain zero- and few-shot compositions of objects and relationships, e.g. . Despite each of the object categories and the predicate (e.g. 'on') being frequent in the training data, the models often fail to properly understand such unseen or rare compositions. To improve generalization, it is natural to attempt increasing the diversity of the training distribution. However, in the graph domain this is non-trivial. To that end, we propose a method to synthesize rare yet plausible scene graphs by perturbing real ones. We then propose and empirically study a model based on conditional generative adversarial networks (GANs) that allows us to generate visual features of perturbed scene graphs and learn from them in a joint fashion. When evaluated on the Visual Genome dataset, our approach yields marginal, but consistent improvements in zero- and few-shot metrics. We analyze the limitations of our approach indicating promising directions for future research.
Graph Density-Aware Losses for Novel Compositions in Scene Graph Generation
Cătălina Cangea
Graham W. Taylor
Scene graph generation (SGG) aims to predict graph-structured descriptions of input images, in the form of objects and relationships between… (voir plus) them. This task is becoming increasingly useful for progress at the interface of vision and language. Here, it is important - yet challenging - to perform well on novel (zero-shot) or rare (few-shot) compositions of objects and relationships. In this paper, we identify two key issues that limit such generalization. Firstly, we show that the standard loss used in this task is unintentionally a function of scene graph density. This leads to the neglect of individual edges in large sparse graphs during training, even though these contain diverse few-shot examples that are important for generalization. Secondly, the frequency of relationships can create a strong bias in this task, such that a blind model predicting the most frequent relationship achieves good performance. Consequently, some state-of-the-art models exploit this bias to improve results. We show that such models can suffer the most in their ability to generalize to rare compositions, evaluating two different models on the Visual Genome dataset and its more recent, improved version, GQA. To address these issues, we introduce a density-normalized edge loss, which provides more than a two-fold improvement in certain generalization metrics. Compared to other works in this direction, our enhancements require only a few lines of code and no added computational cost. We also highlight the difficulty of accurately evaluating models using existing metrics, especially on zero/few shots, and introduce a novel weighted metric.
Online Continual Learning with Maximally Interfered Retrieval
Lucas Caccia
Massimo Caccia
Tinne Tuytelaars
Continual learning, the setting where a learning agent is faced with a never ending stream of data, continues to be a great challenge for mo… (voir plus)dern machine learning systems. In particular the online or "single-pass through the data" setting has gained attention recently as a natural setting that is difficult to tackle. Methods based on replay, either generative or from a stored memory, have been shown to be effective approaches for continual learning, matching or exceeding the state of the art in a number of standard benchmarks. These approaches typically rely on randomly selecting samples from the replay memory or from a generative model, which is suboptimal. In this work, we consider a controlled sampling of memories for replay. We retrieve the samples which are most interfered, i.e. whose prediction will be most negatively impacted by the foreseen parameters update. We show a formulation for this sampling criterion in both the generative replay and the experience replay setting, producing consistent gains in performance and greatly reduced forgetting. We release an implementation of our method at this https URL.
VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering
Cătălina Cangea
Pietro Lio
Embodied Question Answering (EQA) is a recently proposed task, where an agent is placed in a rich 3D environment and must act based solely o… (voir plus)n its egocentric input to answer a given question. The desired outcome is that the agent learns to combine capabilities such as scene understanding, navigation and language understanding in order to perform complex reasoning in the visual world. However, initial advancements combining standard vision and language methods with imitation and reinforcement learning algorithms have shown EQA might be too complex and challenging for these techniques. In order to investigate the feasibility of EQA-type tasks, we build the VideoNavQA dataset that contains pairs of questions and videos generated in the House3D environment. The goal of this dataset is to assess question-answering performance from nearly-ideal navigation paths, while considering a much more complete variety of questions than current instantiations of the EQA task. We investigate several models, adapted from popular VQA methods, on this new benchmark. This establishes an initial understanding of how well VQA-style methods can perform within this novel EQA paradigm.
Blindfold Baselines for Embodied QA
We explore blindfold (question-only) baselines for Embodied Question Answering. The EmbodiedQA task requires an agent to answer a question b… (voir plus)y intelligently navigating in a simulated environment, gathering necessary visual information only through first-person vision before finally answering. Consequently, a blindfold baseline which ignores the environment and visual information is a degenerate solution, yet we show through our experiments on the EQAv1 dataset that a simple question-only baseline achieves state-of-the-art results on the EmbodiedQA task in all cases except when the agent is spawned extremely close to the object.