Portrait of Eugene Belilovsky is unavailable

Eugene Belilovsky

Associate Academic Member
Assistant Professor, Concordia University, Department of Computer Science and Software Engineering
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Deep Learning
Distributed Systems
Optimization

Biography

Eugene Belilovsky is an assistant professor in the Department of Computer Science and Software Engineering at Concordia University.

He is also an associate academic member of Mila – Quebec Artificial Intelligence Institute and an adjunct professor at Université de Montréal.

Belilovsky’s research specialties lie in computer vision and deep learning. His current interests include continual learning and few-shot learning, along with applications of these aspects at the intersection of computer vision and language processing.

Current Students

Collaborating Alumni
Co-supervisor :
Master's Research - Concordia University
PhD - Concordia University
Research Intern - Concordia University
Master's Research - Concordia University
PhD - Concordia University
Co-supervisor :
Master's Research - Université de Montréal
Co-supervisor :
Master's Research - Concordia University
Co-supervisor :
PhD - Concordia University
Co-supervisor :
Master's Research - Concordia University
Co-supervisor :
Research Intern - Concordia University University
PhD - Concordia University
PhD - Concordia University
Postdoctorate - Concordia University
Co-supervisor :
PhD - Concordia University
Co-supervisor :
Collaborating researcher - Concordia University
Co-supervisor :
PhD - Concordia University
Co-supervisor :
PhD - Université de Montréal
Principal supervisor :
Collaborating researcher - Université de Montréal
Principal supervisor :
Master's Research - Concordia University
PhD - Concordia University
Master's Research - Concordia University

Publications

Understanding Permutation Based Model Merging with Feature Visualizations
Congshu Zou
geraldin nanfack
Stefan Horoi
Linear mode connectivity (LMC) has become a topic of great interest in recent years. It has been empirically demonstrated that popular deep … (see more)learning models trained from different initializations exhibit linear model connectivity up to permutation. Based on this, several approaches for finding a permutation of the model's features or weights have been proposed leading to several popular methods for model merging. These methods enable the simple averaging of two models to create a new high-performance model. However, besides accuracy, the properties of these models and their relationships to the representations of the models they derive from are poorly understood. In this work, we study the inner mechanisms behind LMC in model merging through the lens of classic feature visualization methods. Focusing on convolutional neural networks (CNNs) we make several observations that shed light on the underlying mechanisms of model merging by permute and average.
Not Only the Last-Layer Features for Spurious Correlations: All Layer Deep Feature Reweighting
Humza Wajid Hameed
G'eraldin Nanfack
Spurious correlations are a major source of errors for machine learning models, in particular when aiming for group-level fairness. It has b… (see more)een recently shown that a powerful approach to combat spurious correlations is to re-train the last layer on a balanced validation dataset, isolating robust features for the predictor. However, key attributes can sometimes be discarded by neural networks towards the last layer. In this work, we thus consider retraining a classifier on a set of features derived from all layers. We utilize a recently proposed feature selection strategy to select unbiased features from all the layers. We observe this approach gives significant improvements in worst-group accuracy on several standard benchmarks.
Accelerating Training with Neuron Interaction and Nowcasting Networks
Neural network training can be accelerated when a learnable update rule is used in lieu of classic adaptive optimizers (e.g. Adam). However,… (see more) learnable update rules can be costly and unstable to train and use. A simpler recently proposed approach to accelerate training is to use Adam for most of the optimization steps and periodically, only every few steps, nowcast (predict future) parameters. We improve this approach by Neuron interaction and Nowcasting (NiNo) networks. NiNo leverages neuron connectivity and graph neural networks to more accurately nowcast parameters by learning in a supervised way from a set of training trajectories over multiple tasks. We show that in some networks, such as Transformers, neuron connectivity is non-trivial. By accurately modeling neuron connectivity, we allow NiNo to accelerate Adam training by up to 50\% in vision and language tasks.
Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis
Stefan Horoi
Albert Manuel Orozco Camacho
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Adam Ibrahim
Benjamin Thérien
Kshitij Gupta
Mats Leon Richter
Quentin Gregory Anthony
Timothee LESORT
Model Breadcrumbs: Scalable Upcycling of Finetuned Foundation Models via Sparse Task Vectors Merging
MohammadReza Davari
Simulating federated learning for steatosis detection using ultrasound images
Yue Qi
Pedro Vianna
Alexandre Cadrin-Chênevert
Katleen Blanchet
Emmanuel Montagnon
Louis-Antoine Mullie
Guy Cloutier
Michael Chassé
An Tang
PETRA: Parallel End-to-end Training with Reversible Architectures
Stephane Rivaud
Louis Fournier
Thomas Pumir
Michael Eickenberg
Edouard Oyallon
Reversible architectures have been shown to be capable of performing on par with their non-reversible architectures, being applied in deep l… (see more)earning for memory savings and generative modeling. In this work, we show how reversible architectures can solve challenges in parallelizing deep model training. We introduce PETRA, a novel alternative to backpropagation for parallelizing gradient computations. PETRA facilitates effective model parallelism by enabling stages (i.e., a set of layers) to compute independently on different devices, while only needing to communicate activations and gradients between each other. By decoupling the forward and backward passes and keeping a single updated version of the parameters, the need for weight stashing is also removed. We develop a custom autograd-like training framework for PETRA, and we demonstrate its effectiveness on CIFAR-10, ImageNet32, and ImageNet, achieving competitive accuracies comparable to backpropagation using ResNet-18, ResNet-34, and ResNet-50 models.
ACCO: Accumulate while you Communicate, Hiding Communications in Distributed LLM Training
Adel Nabli
Louis Fournier
Pierre Erbacher
Louis Serrano
Edouard Oyallon
From Feature Visualization to Visual Circuits: Effect of Adversarial Model Manipulation
G'eraldin Nanfack
Michael Eickenberg
Understanding the inner working functionality of large-scale deep neural networks is challenging yet crucial in several high-stakes applicat… (see more)ions. Mechanistic inter- pretability is an emergent field that tackles this challenge, often by identifying human-understandable subgraphs in deep neural networks known as circuits. In vision-pretrained models, these subgraphs are usually interpreted by visualizing their node features through a popular technique called feature visualization. Recent works have analyzed the stability of different feature visualization types under the adversarial model manipulation framework. This paper starts by addressing limitations in existing works by proposing a novel attack called ProxPulse that simultaneously manipulates the two types of feature visualizations. Surprisingly, when analyzing these attacks under the umbrella of visual circuits, we find that visual circuits show some robustness to ProxPulse. We, therefore, introduce a new attack based on ProxPulse that unveils the manipulability of visual circuits, shedding light on their lack of robustness. The effectiveness of these attacks is validated using pre-trained AlexNet and ResNet-50 models on ImageNet.
$\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers
Benjamin Thérien
Charles-Étienne Joseph
Boris Knyazev
Edouard Oyallon
WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average
Louis Fournier
Adel Nabli
Masih Aminbeidokhti
Marco Pedersoli
Edouard Oyallon
The performance of deep neural networks is enhanced by ensemble methods, which average the output of several models. However, this comes at … (see more)an increased cost at inference. Weight averaging methods aim at balancing the generalization of ensembling and the inference speed of a single model by averaging the parameters of an ensemble of models. Yet, naive averaging results in poor performance as models converge to different loss basins, and aligning the models to improve the performance of the average is challenging. Alternatively, inspired by distributed training, methods like DART and PAPA have been proposed to train several models in parallel such that they will end up in the same basin, resulting in good averaging accuracy. However, these methods either compromise ensembling accuracy or demand significant communication between models during training. In this paper, we introduce WASH, a novel distributed method for training model ensembles for weight averaging that achieves state-of-the-art image classification accuracy. WASH maintains models within the same basin by randomly shuffling a small percentage of weights during training, resulting in diverse models and lower communication costs compared to standard parameter averaging methods.