Portrait of Eugene Belilovsky is unavailable

Eugene Belilovsky

Associate Academic Member
Assistant Professor, Concordia University, Department of Computer Science and Software Engineering
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Deep Learning
Distributed Systems
Optimization

Biography

Eugene Belilovsky is an assistant professor in the Department of Computer Science and Software Engineering at Concordia University.

He is also an associate academic member of Mila – Quebec Artificial Intelligence Institute and an adjunct professor at Université de Montréal.

Belilovsky’s research specialties lie in computer vision and deep learning. His current interests include continual learning and few-shot learning, along with applications of these aspects at the intersection of computer vision and language processing.

Current Students

PhD - Concordia University
Master's Research - Concordia University
Co-supervisor :
PhD - Concordia University
Co-supervisor :
Master's Research - Université de Montréal
Co-supervisor :
Master's Research - Concordia University
Co-supervisor :
PhD - Concordia University
Co-supervisor :
Master's Research - Concordia University
Co-supervisor :
PhD - Concordia University
PhD - Concordia University
Postdoctorate - Concordia University
Co-supervisor :
PhD - Concordia University
Co-supervisor :
PhD - Concordia University
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher - Université de Montréal
Principal supervisor :
PhD - Concordia University
Co-supervisor :
Master's Research - Concordia University

Publications

CLIP-Mesh: Generating textured meshes from text using pretrained image-text models
Nasir M. Khalid
Tianhao Xie
Tiberiu Popa
Towards Scaling Difference Target Propagation by Learning Backprop Targets
The development of biologically-plausible learning algorithms is important for understanding learning in the brain, but most of them fail to… (see more) scale-up to real-world tasks, limiting their potential as explanations for learning by real brains. As such, it is important to explore learning algorithms that come with strong theoretical guarantees and can match the performance of backpropagation (BP) on complex tasks. One such algorithm is Difference Target Propagation (DTP), a biologically-plausible learning algorithm whose close relation with Gauss-Newton (GN) optimization has been recently established. However, the conditions under which this connection rigorously holds preclude layer-wise training of the feedback pathway synaptic weights (which is more biologically plausible). Moreover, good alignment between DTP weight updates and loss gradients is only loosely guaranteed and under very specific conditions for the architecture being trained. In this paper, we propose a novel feedback weight training scheme that ensures both that DTP approximates BP and that layer-wise feedback weight training can be restored without sacrificing any theoretical guarantees. Our theory is corroborated by experimental results and we report the best performance ever achieved by DTP on CIFAR-10 and ImageNet 32
Parametric Scattering Networks
Shanel Gauthier
Laurent Alséne-Racicot
Michael Eickenberg
The wavelet scattering transform creates geometric in-variants and deformation stability. In multiple signal do-mains, it has been shown to … (see more)yield more discriminative rep-resentations compared to other non-learned representations and to outperform learned representations in certain tasks, particularly on limited labeled data and highly structured signals. The wavelet filters used in the scattering trans-form are typically selected to create a tight frame via a pa-rameterized mother wavelet. In this work, we investigate whether this standard wavelet filterbank construction is op-timal. Focusing on Morlet wavelets, we propose to learn the scales, orientations, and aspect ratios of the filters to produce problem-specific parameterizations of the scattering transform. We show that our learned versions of the scattering transform yield significant performance gains in small-sample classification settings over the standard scat-tering transform. Moreover, our empirical results suggest that traditional filterbank constructions may not always be necessary for scattering transforms to extract effective rep-resentations.
Probing Representation Forgetting in Supervised and Unsupervised Continual Learning
MohammadReza Davari
Nader Asadi
Sudhir Mudur
Rahaf Aljundi
Continual Learning (CL) research typically focuses on tackling the phenomenon of catastrophic forgetting in neural networks. Catastrophic fo… (see more)rgetting is associated with an abrupt loss of knowledge previously learned by a model when the task, or more broadly the data distribution, being trained on changes. In supervised learning problems this forgetting, resulting from a change in the model's representation, is typically measured or observed by evaluating the decrease in old task performance. However, a model's representation can change without losing knowledge about prior tasks. In this work we consider the concept of representation forgetting, observed by using the difference in performance of an optimal linear classifier before and after a new task is introduced. Using this tool we revisit a number of standard continual learning benchmarks and observe that, through this lens, model representations trained without any explicit control for forgetting often experience small representation forgetting and can sometimes be comparable to methods which explicitly control for forgetting, especially in longer task sequences. We also show that representation forgetting can lead to new insights on the effect of model capacity and loss function used in continual learning. Based on our results, we show that a simple yet competitive approach is to learn representations continually with standard supervised contrastive learning while constructing prototypes of class samples when queried on old samples.11The code to reproduce our results is publicly available at: https://github.com/rezazzr/Probing-Representation-Forgetting
Revisiting Learnable Affines for Batch Norm in Few-Shot Transfer Learning
Moslem Yazdanpanah
Christian Desrosiers
Mohammad Havaei
Batch normalization is a staple of computer vision models, including those employed in few-shot learning. Batch nor-malization layers in con… (see more)volutional neural networks are composed of a normalization step, followed by a shift and scale of these normalized features applied via the per-channel trainable affine parameters
Local Learning with Neuron Groups
Adeetya Patel
Michael Eickenberg
CLIP-Mesh: Generating textured meshes from text using pretrained image-text models
Nasir M. Khalid
Tianhao Xie
Tiberiu S. Popa
We present a technique for zero-shot generation of a 3D model using only a target text prompt. Without any 3D supervision our method deforms… (see more) the control shape of a limit subdivided surface along with its texture map and normal map to obtain a 3D asset that corresponds to the input text prompt and can be easily deployed into games or modeling applications. We rely only on a pre-trained CLIP model that compares the input text prompt with differentiably rendered images of our 3D model. While previous works have focused on stylization or required training of generative models we perform optimization on mesh parameters directly to generate shape, texture or both. To constrain the optimization to produce plausible meshes and textures we introduce a number of techniques using image augmentations and the use of a pretrained prior that generates CLIP image embeddings given a text embedding.
Probing Representation Forgetting in Supervised and Unsupervised Continual Learning
MohammadReza Davari
Nader Asadi
Sudhir Mudur
Rahaf Aljundi
Continual Learning (CL) research typically focuses on tackling the phenomenon of catastrophic forgetting in neural networks. Catastrophic fo… (see more)rgetting is associated with an abrupt loss of knowledge previously learned by a model when the task, or more broadly the data distribution, being trained on changes. In supervised learning problems this forgetting, resulting from a change in the model's representation, is typically measured or observed by evaluating the decrease in old task performance. However, a model's representation can change without losing knowledge about prior tasks. In this work we consider the concept of representation forgetting, observed by using the difference in performance of an optimal linear classifier before and after a new task is introduced. Using this tool we revisit a number of standard continual learning benchmarks and observe that, through this lens, model representations trained without any explicit control for forgetting often experience small representation forgetting and can sometimes be comparable to methods which explicitly control for forgetting, especially in longer task sequences. We also show that representation forgetting can lead to new insights on the effect of model capacity and loss function used in continual learning. Based on our results, we show that a simple yet competitive approach is to learn representations continually with standard supervised contrastive learning while constructing prototypes of class samples when queried on old samples.11The code to reproduce our results is publicly available at: https://github.com/rezazzr/Probing-Representation-Forgetting
New Insights on Reducing Abrupt Representation Change in Online Continual Learning
Lucas Caccia
Rahaf Aljundi
Nader Asadi
Tinne Tuytelaars
In the online continual learning paradigm, agents must learn from a changing distribution while respecting memory and compute constraints. E… (see more)xperience Replay (ER), where a small subset of past data is stored and replayed alongside new data, has emerged as a simple and effective learning strategy. In this work, we focus on the change in representations of observed data that arises when previously unobserved classes appear in the incoming data stream, and new classes must be distinguished from previous ones. We shed new light on this question by showing that applying ER causes the newly added classes’ representations to overlap significantly with the previous classes, leading to highly disruptive parameter updates. Based on this empirical analysis, we propose a new method which mitigates this issue by shielding the learned representations from drastic adaptation to accommodate new classes. We show that using an asymmetric update rule pushes new classes to adapt to the older ones (rather than the reverse), which is more effective especially at task boundaries, where much of the forgetting typically occurs. Empirical results show significant gains over strong baselines on standard continual learning benchmarks.
Generative Compositional Augmentations for Scene Graph Prediction
Harm de Vries
Cătălina Cangea
Graham W. Taylor
Inferring objects and their relationships from an image in the form of a scene graph is useful in many applications at the intersection of v… (see more)ision and language. We consider a challenging problem of compositional generalization that emerges in this task due to a long tail data distribution. Current scene graph generation models are trained on a tiny fraction of the distribution corresponding to the most frequent compositions, e.g. . However, test images might contain zero- and few-shot compositions of objects and relationships, e.g. . Despite each of the object categories and the predicate (e.g. ‘on’) being frequent in the training data, the models often fail to properly understand such unseen or rare compositions. To improve generalization, it is natural to attempt increasing the diversity of the training distribution. However, in the graph domain this is non-trivial. To that end, we propose a method to synthesize rare yet plausible scene graphs by perturbing real ones. We then propose and empirically study a model based on conditional generative adversarial networks (GANs) that allows us to generate visual features of perturbed scene graphs and learn from them in a joint fashion. When evaluated on the Visual Genome dataset, our approach yields marginal, but consistent improvements in zero- and few-shot metrics. We analyze the limitations of our approach indicating promising directions for future research.
Parametric Scattering Networks
Shanel Gauthier
Benjamin Th'erien
Laurent Alséne-Racicot
Michael Eickenberg
The wavelet scattering transform creates geometric in-variants and deformation stability. In multiple signal do-mains, it has been shown to … (see more)yield more discriminative rep-resentations compared to other non-learned representations and to outperform learned representations in certain tasks, particularly on limited labeled data and highly structured signals. The wavelet filters used in the scattering trans-form are typically selected to create a tight frame via a pa-rameterized mother wavelet. In this work, we investigate whether this standard wavelet filterbank construction is op-timal. Focusing on Morlet wavelets, we propose to learn the scales, orientations, and aspect ratios of the filters to produce problem-specific parameterizations of the scattering transform. We show that our learned versions of the scattering transform yield significant performance gains in small-sample classification settings over the standard scat-tering transform. Moreover, our empirical results suggest that traditional filterbank constructions may not always be necessary for scattering transforms to extract effective rep-resentations.
Graph Density-Aware Losses for Novel Compositions in Scene Graph Generation
Harm de Vries
Cătălina Cangea
Graham W. Taylor