Portrait de Eugene Belilovsky n'est pas disponible

Eugene Belilovsky

Membre académique associé
Professeur adjoint, Concordia University, Département d'informatique et de génie logiciel
Professeur associé, Université de Montréal, Département d'informatique et de recherche opérationnelle
Sujets de recherche
Apprentissage profond
Optimisation
Systèmes distribués

Biographie

Eugene Belilovsky est professeur adjoint au Département d'informatique et de génie logiciel de l'Université Concordia. Il est également membre associé de Mila – Institut québécois d’intelligence artificielle et professeur adjoint à l'Université de Montréal. Ses travaux se concentrent sur la vision par ordinateur et l'apprentissage profond. Ses intérêts de recherche actuels comprennent l'apprentissage continu, l'apprentissage à partir de peu de données (few-shot learning) et leurs applications au carrefour de la vision par ordinateur et du traitement du langage.

Étudiants actuels

Doctorat - Concordia
Maîtrise recherche - Concordia
Co-superviseur⋅e :
Doctorat - Concordia
Co-superviseur⋅e :
Maîtrise recherche - UdeM
Co-superviseur⋅e :
Maîtrise recherche - Concordia
Co-superviseur⋅e :
Doctorat - Concordia
Co-superviseur⋅e :
Maîtrise recherche - Concordia
Co-superviseur⋅e :
Stagiaire de recherche - Concordia University
Doctorat - Concordia
Doctorat - Concordia
Co-superviseur⋅e :
Doctorat - Concordia
Co-superviseur⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Collaborateur·rice de recherche - UdeM
Superviseur⋅e principal⋅e :
Maîtrise recherche - Concordia
Doctorat - Concordia
Co-superviseur⋅e :
Maîtrise recherche - Concordia

Publications

Learning Versatile Optimizers on a Compute Diet
Abhinav Moudgil
Boris Knyazev
Learned optimization has emerged as a promising alternative to hand-crafted optimizers, with the potential to discover stronger learned upda… (voir plus)te rules that enable faster, hyperparameter-free training of neural networks. A critical element for practically useful learned optimizers, that can be used off-the-shelf after meta-training, is strong meta-generalization: the ability to apply the optimizers to new tasks. Recent state-of-the-art work in learned optimizers, VeLO (Metz et al., 2022), requires a large number of highly diverse meta-training tasks along with massive computational resources, 4000 TPU months, to achieve meta-generalization. This makes further improvements to such learned optimizers impractical. In this work, we identify several key elements in learned optimizer architectures and meta-training procedures that can lead to strong meta-generalization. We also propose evaluation metrics to reliably assess quantitative performance of an optimizer at scale on a set of evaluation tasks. Our proposed approach, Celo, makes a significant leap in improving the meta-generalization performance of learned optimizers and also outperforms tuned state-of-the-art optimizers on a diverse set of out-of-distribution tasks, despite being meta-trained for just 24 GPU hours.
PETRA: Parallel End-to-end Training with Reversible Architectures
Stephane Rivaud
Louis Fournier
Thomas Pumir
Michael Eickenberg
Edouard Oyallon
Reversible architectures have been shown to be capable of performing on par with their non-reversible architectures, being applied in deep l… (voir plus)earning for memory savings and generative modeling. In this work, we show how reversible architectures can solve challenges in parallelizing deep model training. We introduce PETRA, a novel alternative to backpropagation for parallelizing gradient computations. PETRA facilitates effective model parallelism by enabling stages (i.e., a set of layers) to compute independently on different devices, while only needing to communicate activations and gradients between each other. By decoupling the forward and backward passes and keeping a single updated version of the parameters, the need for weight stashing is also removed. We develop a custom autograd-like training framework for PETRA, and we demonstrate its effectiveness on CIFAR-10, ImageNet32, and ImageNet, achieving competitive accuracies comparable to backpropagation using ResNet-18, ResNet-34, and ResNet-50 models.
Unsupervised Test-Time Adaptation for Hepatic Steatosis Grading Using Ultrasound B-Mode Images.
Pedro Vianna
Paria Mehrbod
Muawiz Chaudhary
Michael Eickenberg
An Tang
Guy Cloutier
Ultrasound is considered a key modality for the clinical assessment of hepatic steatosis (i.e., fatty liver) due to its non-invasiveness and… (voir plus) availability. Deep learning methods have attracted considerable interest in this field, as they are capable of learning patterns in a collection of images and achieve clinically comparable levels of accuracy in steatosis grading. However, variations in patient populations, acquisition protocols, equipment, and operator expertise across clinical sites can introduce domain shifts that reduce model performance when applied outside the original training setting. In response, unsupervised domain adaptation techniques are being investigated to address these shifts, allowing models to generalize more effectively across diverse clinical environments. In this work, we propose a test-time batch normalization technique designed to handle domain shift, especially for changes in label distribution, by adapting selected features of batch normalization layers in a trained convolutional neural network model. This approach operates in an unsupervised manner, allowing robust adaptation to new distributions without access to label data. The method was evaluated on two abdominal ultrasound datasets collected at different institutions, assessing its capability in mitigating domain shift for hepatic steatosis classification. The proposed method reduced the mean absolute error in steatosis grading by 37% and improved the area under the receiver operating characteristic curve for steatosis detection from 0.78 to 0.97, compared to non-adapted models. These findings demonstrate the potential of the proposed method to address domain shift in ultrasound-based hepatic steatosis diagnosis, minimizing risks associated with deploying trained models in various clinical settings.
Non-Uniform Parameter-Wise Model Merging
Albert Manuel Orozco Camacho
Stefan Horoi
Combining multiple machine learning models has long been a technique for enhancing performance, particularly in distributed settings. Tradit… (voir plus)ional approaches, such as model ensembles, work well, but are expensive in terms of memory and compute. Recently, methods based on averaging model parameters have achieved good results in some settings and have gained popularity. However, merging models initialized differently that do not share a part of their training trajectories can yield worse results than simply using the base models, even after aligning their neurons. In this paper, we introduce a novel approach, Non-uniform Parameter-wise Model Merging, or NP Merge, which merges models by learning the contribution of each parameter to the final model using gradient-based optimization. We empirically demonstrate the effectiveness of our method for merging models of various architectures in multiple settings, outperforming past methods. We also extend NP Merge to handle the merging of multiple models, showcasing its scalability and robustness.
Non-Uniform Parameter-Wise Model Merging
Albert Manuel Orozco Camacho
Stefan Horoi
Combining multiple machine learning models has long been a technique for enhancing performance, particularly in distributed settings. Tradit… (voir plus)ional approaches, such as model ensembles, work well, but are expensive in terms of memory and compute. Recently, methods based on averaging model parameters have achieved good results in some settings and have gained popularity. However, merging models initialized differently that do not share a part of their training trajectories can yield worse results than simply using the base models, even after aligning their neurons. In this paper, we introduce a novel approach, Non-uniform Parameter-wise Model Merging, or NP Merge, which merges models by learning the contribution of each parameter to the final model using gradient-based optimization. We empirically demonstrate the effectiveness of our method for merging models of various architectures in multiple settings, outperforming past methods. We also extend NP Merge to handle the merging of multiple models, showcasing its scalability and robustness.
Non-Uniform Parameter-Wise Model Merging
Albert M. Orozco Camacho
Stefan Horoi
Combining multiple machine learning models has long been a technique for enhancing performance, particularly in distributed settings. Tradit… (voir plus)ional approaches, such as model ensembles, work well, but are expensive in terms of memory and compute. Recently, methods based on averaging model parameters have achieved good results in some settings and have gained popularity. However, merging models initialized differently that do not share a part of their training trajectories can yield worse results than simply using the base models, even after aligning their neurons. In this paper, we introduce a novel approach, Non-uniform Parameter-wise Model Merging, or NP Merge, which merges models by learning the contribution of each parameter to the final model using gradient-based optimization. We empirically demonstrate the effectiveness of our method for merging models of various architectures in multiple settings, outperforming past methods. We also extend NP Merge to handle the merging of multiple models, showcasing its scalability and robustness.
Sketch-guided Cage-based 3D Gaussian Splatting Deformation
Tianhao Xie
Tiberiu Popa
3D Gaussian Splatting (GS) is one of the most promising novel 3D representations that has received great interest in computer graphics and c… (voir plus)omputer vision. While various systems have introduced editing capabilities for 3D GS, such as those guided by text prompts, fine-grained control over deformation remains an open challenge. In this work, we present a novel sketch-guided 3D GS deformation system that allows users to intuitively modify the geometry of a 3D GS model by drawing a silhouette sketch from a single viewpoint. Our approach introduces a new deformation method that combines cage-based deformations with a variant of Neural Jacobian Fields, enabling precise, fine-grained control. Additionally, it leverages large-scale 2D diffusion priors and ControlNet to ensure the generated deformations are semantically plausible. Through a series of experiments, we demonstrate the effectiveness of our method and showcase its ability to animate static 3D GS models as one of its key applications.
ACCO: Accumulate while you Communicate, Hiding Communications in Distributed LLM Training
Adel Nabli
Louis Fournier
Pierre ERBACHER
Louis Serrano
Edouard Oyallon
$\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers
Benjamin Thérien
Charles-Étienne Joseph
Boris Knyazev
Edouard Oyallon
Controlling Forgetting with Test-Time Data in Continual Learning
Vaibhav Singh
Rahaf Aljundi
Foundational vision-language models excel in various tasks but require updates as new tasks or domains emerge. Current Continual Learning (C… (voir plus)L) methods, which focus on supervised training, often suffer from significant forgetting, performing worse than the original models in zero-shot scenarios. This work proposes leveraging test-time, unsupervised data in a self-supervised manner to refresh the model’s memory of previously learned tasks, minimizing forgetting without additional labeling. By introducing a student-teacher framework with gradient-based sparse parameter updates, the approach enhances performance on prior tasks and reduces reliance on offline memory buffers, effectively improving continual learning outcomes.
Understanding Permutation Based Model Merging with Feature Visualizations
Congshu Zou
geraldin nanfack
Stefan Horoi
Linear mode connectivity (LMC) has become a topic of great interest in recent years. It has been empirically demonstrated that popular deep … (voir plus)learning models trained from different initializations exhibit linear model connectivity up to permutation. Based on this, several approaches for finding a permutation of the model's features or weights have been proposed leading to several popular methods for model merging. These methods enable the simple averaging of two models to create a new high-performance model. However, besides accuracy, the properties of these models and their relationships to the representations of the models they derive from are poorly understood. In this work, we study the inner mechanisms behind LMC in model merging through the lens of classic feature visualization methods. Focusing on convolutional neural networks (CNNs) we make several observations that shed light on the underlying mechanisms of model merging by permute and average.
Understanding Permutation Based Model Merging with Feature Visualizations
Congshu Zou
geraldin nanfack
Stefan Horoi
Linear mode connectivity (LMC) has become a topic of great interest in recent years. It has been empirically demonstrated that popular deep … (voir plus)learning models trained from different initializations exhibit linear model connectivity up to permutation. Based on this, several approaches for finding a permutation of the model's features or weights have been proposed leading to several popular methods for model merging. These methods enable the simple averaging of two models to create a new high-performance model. However, besides accuracy, the properties of these models and their relationships to the representations of the models they derive from are poorly understood. In this work, we study the inner mechanisms behind LMC in model merging through the lens of classic feature visualization methods. Focusing on convolutional neural networks (CNNs) we make several observations that shed light on the underlying mechanisms of model merging by permute and average.