Eugene Belilovsky

Associate Academic Member
Assistant Professor, Concordia University, Department of Computer Science and Software Engineering
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Deep Learning
Distributed Systems
Optimization

Biography

Eugene Belilovsky is an assistant professor in the Department of Computer Science and Software Engineering at Concordia University.

He is also an associate academic member of Mila – Quebec Artificial Intelligence Institute and an adjunct professor at Université de Montréal.

Belilovsky’s research specialties lie in computer vision and deep learning. His current interests include continual learning and few-shot learning, along with their applications at the intersection of computer vision and language processing.

Current Students

PhD - Concordia University
Master's Research - Concordia University
Co-supervisor :
PhD - Concordia University
Co-supervisor :
Master's Research - Université de Montréal
Co-supervisor :
Master's Research - Concordia University
Co-supervisor :
PhD - Concordia University
Co-supervisor :
Master's Research - Concordia University
Co-supervisor :
PhD - Concordia University
PhD - Concordia University
Postdoctorate - Concordia University
Co-supervisor :
PhD - Concordia University
Co-supervisor :
PhD - Concordia University
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher - Université de Montréal
Principal supervisor :
PhD - Concordia University
Co-supervisor :
Master's Research - Concordia University

Publications

Non-Uniform Parameter-Wise Model Merging
Albert M. Orozco Camacho
Stefan Horoi
Combining multiple machine learning models has long been a technique for enhancing performance, particularly in distributed settings. Traditional approaches, such as model ensembles, work well, but are expensive in terms of memory and compute. Recently, methods based on averaging model parameters have achieved good results in some settings and have gained popularity. However, merging models initialized differently that do not share a part of their training trajectories can yield worse results than simply using the base models, even after aligning their neurons. In this paper, we introduce a novel approach, Non-uniform Parameter-wise Model Merging, or NP Merge, which merges models by learning the contribution of each parameter to the final model using gradient-based optimization. We empirically demonstrate the effectiveness of our method for merging models of various architectures in multiple settings, outperforming past methods. We also extend NP Merge to handle the merging of multiple models, showcasing its scalability and robustness.
Sketch-guided Cage-based 3D Gaussian Splatting Deformation
Tianhao Xie
Tiberiu Popa
3D Gaussian Splatting (GS) is one of the most promising novel 3D representations that has received great interest in computer graphics and computer vision. While various systems have introduced editing capabilities for 3D GS, such as those guided by text prompts, fine-grained control over deformation remains an open challenge. In this work, we present a novel sketch-guided 3D GS deformation system that allows users to intuitively modify the geometry of a 3D GS model by drawing a silhouette sketch from a single viewpoint. Our approach introduces a new deformation method that combines cage-based deformations with a variant of Neural Jacobian Fields, enabling precise, fine-grained control. Additionally, it leverages large-scale 2D diffusion priors and ControlNet to ensure the generated deformations are semantically plausible. Through a series of experiments, we demonstrate the effectiveness of our method and showcase its ability to animate static 3D GS models as one of its key applications.
ACCO: Accumulate while you Communicate, Hiding Communications in Distributed LLM Training
Adel Nabli
Louis Fournier
Pierre Erbacher
Louis Serrano
Edouard Oyallon
$\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers
Benjamin Thérien
Charles-Étienne Joseph
Boris Knyazev
Edouard Oyallon
Controlling Forgetting with Test-Time Data in Continual Learning
Vaibhav Singh
Rahaf Aljundi
Foundational vision-language models excel in various tasks but require updates as new tasks or domains emerge. Current Continual Learning (CL) methods, which focus on supervised training, often suffer from significant forgetting, performing worse than the original models in zero-shot scenarios. This work proposes leveraging test-time, unsupervised data in a self-supervised manner to refresh the model’s memory of previously learned tasks, minimizing forgetting without additional labeling. By introducing a student-teacher framework with gradient-based sparse parameter updates, the approach enhances performance on prior tasks and reduces reliance on offline memory buffers, effectively improving continual learning outcomes.
Understanding Permutation Based Model Merging with Feature Visualizations
Congshu Zou
Géraldin Nanfack
Stefan Horoi
Linear mode connectivity (LMC) has become a topic of great interest in recent years. It has been empirically demonstrated that popular deep learning models trained from different initializations exhibit linear mode connectivity up to permutation. Based on this, several approaches for finding a permutation of the model's features or weights have been proposed, leading to several popular methods for model merging. These methods enable the simple averaging of two models to create a new high-performance model. However, besides accuracy, the properties of these models and their relationships to the representations of the models they derive from are poorly understood. In this work, we study the inner mechanisms behind LMC in model merging through the lens of classic feature visualization methods. Focusing on convolutional neural networks (CNNs), we make several observations that shed light on the underlying mechanisms of model merging by permute and average.
WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average
Louis Fournier
Adel Nabli
Masih Aminbeidokhti
Edouard Oyallon
The performance of deep neural networks is enhanced by ensemble methods, which average the output of several models. However, this comes at an increased cost at inference. Weight averaging methods aim at balancing the generalization of ensembling and the inference speed of a single model by averaging the parameters of an ensemble of models. Yet, naive averaging results in poor performance as models converge to different loss basins, and aligning the models to improve the performance of the average is challenging. Alternatively, inspired by distributed training, methods like DART and PAPA have been proposed to train several models in parallel such that they will end up in the same basin, resulting in good averaging accuracy. However, these methods either compromise ensembling accuracy or demand significant communication between models during training. In this paper, we introduce WASH, a novel distributed method for training model ensembles for weight averaging that achieves state-of-the-art image classification accuracy. WASH maintains models within the same basin by randomly shuffling a small percentage of weights during training, resulting in diverse models and lower communication costs compared to standard parameter averaging methods.
Not Only the Last-Layer Features for Spurious Correlations: All Layer Deep Feature Reweighting
Humza Wajid Hameed
Géraldin Nanfack
Spurious correlations are a major source of errors for machine learning models, in particular when aiming for group-level fairness. It has been recently shown that a powerful approach to combat spurious correlations is to re-train the last layer on a balanced validation dataset, isolating robust features for the predictor. However, key attributes can sometimes be discarded by neural networks towards the last layer. In this work, we thus consider retraining a classifier on a set of features derived from all layers. We utilize a recently proposed feature selection strategy to select unbiased features from all the layers. We observe this approach gives significant improvements in worst-group accuracy on several standard benchmarks.
Accelerating Training with Neuron Interaction and Nowcasting Networks
Neural network training can be accelerated when a learnable update rule is used in lieu of classic adaptive optimizers (e.g. Adam). However, learnable update rules can be costly and unstable to train and use. Recently, Jang et al. (2023) proposed a simpler approach to accelerate training based on weight nowcaster networks (WNNs). In their approach, Adam is used for most of the optimization steps and periodically, only every few steps, a WNN nowcasts (predicts near future) parameters. We improve WNNs by proposing neuron interaction and nowcasting (NiNo) networks. In contrast to WNNs, NiNo leverages neuron connectivity and graph neural networks to more accurately nowcast parameters. We further show that in some networks, such as Transformers, modeling neuron connectivity accurately is challenging. We address this and other limitations, which allows NiNo to accelerate Adam training by up to 50% in vision and language tasks.
Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis
Stefan Horoi
Albert Manuel Orozco Camacho
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Adam Ibrahim
Benjamin Thérien
Kshitij Gupta
Mats Leon Richter
Quentin Gregory Anthony
Timothee Lesort