
Eugene Belilovsky

Associate Academic Member
Assistant Professor, Concordia University, Department of Computer Science and Software Engineering
Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research

Biography

Eugene Belilovsky is an assistant professor in the Department of Computer Science and Software Engineering at Concordia University. He is also an associate member of Mila – Institut québécois d’intelligence artificielle and an adjunct professor at Université de Montréal. His work focuses on computer vision and deep learning. His current research interests include continual learning, few-shot learning, and their applications at the intersection of computer vision and natural language processing.

Current Students

PhD - Concordia University
PhD - Concordia University
Co-supervisor:
Research Master's - Concordia University
PhD - Université de Montréal
Principal supervisor:
Research Master's - Université de Montréal
Co-supervisor:
Research Master's - Concordia University
Research Collaborator - Concordia University
Co-supervisor:
Postdoctorate - Concordia University
Co-supervisor:
PhD - Concordia University
Co-supervisor:
Research Master's - Concordia University
Research Intern - Concordia University
Research Master's - Concordia University
Co-supervisor:
Alumni Collaborator
Co-supervisor:
Research Master's - Concordia University
Research Master's - Concordia University
Research Collaborator - Université de Montréal
Principal supervisor:
PhD - Concordia University
Co-supervisor:
Research Master's - Concordia University

Publications

Guiding The Last Layer in Federated Learning with Pre-Trained Models
Gwen Legate
Nicolas Bernier
Lucas Caccia
Edouard Oyallon
A²CiD²: Accelerating Asynchronous Communication in Decentralized Deep Learning
Adel Nabli
Edouard Oyallon
Automated liver segmentation and steatosis grading using deep learning on B-mode ultrasound images
Pedro Vianna
Merve Kulbay
Pamela Boustros
Sara-Ivana Calce
Cassandra Larocque-Rigney
Laurent Patry-Beaudoin
Yi Hui Luo
Muawiz Chaudhary
Samuel Kadoury
Bich Nguyen
Emmanuel Montagnon
Michaël Chassé
An Tang
Guy Cloutier
Early detection of nonalcoholic fatty liver disease (NAFLD) is crucial to avoid further complications. Ultrasound is often used for screening and monitoring of hepatic steatosis; however, it is limited by the subjective interpretation of images. Computer-assisted diagnosis could help radiologists achieve objective grading, and artificial intelligence approaches have been tested across various medical applications. In this study, we evaluated the performance of a two-stage deep learning framework for hepatic steatosis detection, with a first step of liver segmentation and a subsequent step of hepatic steatosis classification. We evaluated the models on internal and external datasets to assess the generalizability of the framework. On the external dataset, our segmentation model achieved a Dice score of 0.92 (95% CI: 0.78, 1.00), and our classification model achieved an area under the receiver operating characteristic curve of 0.84 (95% CI: 0.79, 0.89). Our findings highlight the potential benefits of applying artificial intelligence models in NAFLD assessment.
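The Dice score reported above measures the overlap between a predicted and a reference segmentation mask. A minimal sketch of the standard formula, Dice = 2|P ∩ T| / (|P| + |T|), assuming binary masks flattened into sequences of 0/1 values; the function name and the convention of returning 1.0 when both masks are empty are our own choices, not taken from the study.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient between two binary masks flattened to 0/1 sequences.

    Dice = 2 * |P & T| / (|P| + |T|). Returning 1.0 when both masks are
    empty is an assumed convention, not something the study specifies.
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Foreground pixel counts of each mask, summed.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    # Hypothetical 2x3 masks, flattened row by row.
    predicted = [1, 1, 0,
                 0, 1, 0]
    reference = [1, 1, 0,
                 0, 0, 0]
    print(dice_score(predicted, reference))  # 2*2 / (3+2) = 0.8
```

For real 2D or 3D masks, flattening them (for example row by row) before the call gives the same score, since Dice only counts pixels.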
Can Forward Gradient Match Backpropagation?
Louis Fournier
Stephane Rivaud
Michael Eickenberg
Edouard Oyallon
Forward Gradients, the idea of using directional derivatives in forward differentiation mode, have recently been shown to be usable for neural network training while avoiding problems generally associated with backpropagation gradient computation, such as locking and memorization requirements. The cost is the requirement to guess the step direction, which is hard in high dimensions. While current solutions rely on weighted averages over isotropic guess vector distributions, we propose to strongly bias our gradient guesses toward much more promising directions, such as feedback obtained from small, local auxiliary networks. For a standard computer vision neural network, we conduct a rigorous study systematically covering a variety of combinations of gradient targets and gradient guesses, including those previously presented in the literature. We find that using gradients obtained from a local loss as a candidate direction drastically improves on random noise in Forward Gradient methods.
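A forward gradient replaces backpropagation with a single directional derivative: for a guess direction v, forward-mode differentiation yields ∇L·v, and (∇L·v)v serves as the update direction. The sketch below assumes a hypothetical quadratic toy loss and uses plain Python dual numbers in place of a real automatic-differentiation framework. It compares an isotropic random guess with a guess biased toward the true gradient; the biased guess is a made-up stand-in for the local auxiliary-network feedback the abstract mentions. Because the cosine between (∇L·v)v and ∇L equals |cos(∇L, v)|, the estimate is only as good as the guess direction.

```python
import math
import random


class Dual:
    """Forward-mode AD number: a value plus a tangent (directional derivative)."""

    def __init__(self, val: float, tan: float = 0.0):
        self.val = val
        self.tan = tan

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (a + a'e)(b + b'e) = ab + (ab' + a'b)e
        return Dual(self.val * other.val, self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__


def toy_loss(theta):
    """Hypothetical ill-conditioned quadratic: sum_i (i + 1) * theta_i^2 / 2."""
    total = 0.0
    for i, t in enumerate(theta):
        total = total + 0.5 * (i + 1) * t * t
    return total


def jvp(theta, v):
    """Directional derivative grad(L)(theta) . v from one forward pass."""
    return toy_loss([Dual(t, d) for t, d in zip(theta, v)]).tan


def forward_gradient(theta, v):
    """Forward-gradient estimate (grad . v) * v for a guess direction v."""
    d = jvp(theta, v)
    return [d * vi for vi in v]


def true_gradient(theta):
    # Analytic gradient of toy_loss, used only for comparison.
    return [(i + 1) * t for i, t in enumerate(theta)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 1000
    theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    g = true_gradient(theta)
    # Isotropic random guess, as in standard forward-gradient methods.
    v_random = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Biased guess: a noisy copy of the true gradient (hypothetical stand-in
    # for the gradient of a local auxiliary loss).
    v_biased = [gi + rng.gauss(0.0, 1.0) for gi in g]
    print("random guess:", cosine(forward_gradient(theta, v_random), g))
    print("biased guess:", cosine(forward_gradient(theta, v_biased), g))
```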
Continual Pre-Training of Large Language Models: How to (re)warm your model?
Kshitij Gupta
Benjamin Thérien
Adam Ibrahim
Mats Leon Richter
Quentin Gregory Anthony
Timothée Lesort
Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process once new data becomes available. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e., updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data typically results in degraded performance on past data. Taking a step towards efficient continual pre-training, in this work we examine the effect of different warmup strategies. Our hypothesis is that the learning rate must be re-increased to improve compute efficiency when training on a new dataset. We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule. We conduct all experiments on the Pythia 410M language model architecture and evaluate performance through validation perplexity. We experiment with different pre-training checkpoints, various maximum learning rates, and various warmup lengths. Our results show that while rewarming models first increases the loss on upstream and downstream data, in the longer run it improves downstream performance, outperforming models trained from scratch.
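The linear warmup and cosine decay schedule mentioned above can be written as a single function of the step count. This is an illustration, not the paper's configuration: the parameter names, the default warmup starting value of 0 and the example numbers are hypothetical. Under this sketch, "rewarming" for continual pre-training amounts to restarting the schedule at step 0 when training moves to the new dataset.

```python
import math


def warmup_cosine_lr(step: int, max_lr: float, min_lr: float,
                     warmup_steps: int, total_steps: int,
                     warmup_start_lr: float = 0.0) -> float:
    """Linear warmup to max_lr, then cosine decay to min_lr at total_steps."""
    if step < warmup_steps:
        # Linear ramp from warmup_start_lr towards max_lr.
        frac = step / warmup_steps
        return warmup_start_lr + frac * (max_lr - warmup_start_lr)
    decay_steps = max(1, total_steps - warmup_steps)
    # Fraction of the decay phase completed, clamped so the lr stays at min_lr afterwards.
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Hypothetical values: warm up over 1% of a 10,000-step run.
    for s in (0, 50, 100, 5_000, 10_000):
        print(s, warmup_cosine_lr(s, max_lr=3e-4, min_lr=3e-5,
                                  warmup_steps=100, total_steps=10_000))
```

Varying `max_lr` and `warmup_steps` in this function corresponds to the two knobs the abstract says were swept.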
Learning to Optimize with Recurrent Hierarchical Transformers
Abhinav Moudgil
Boris Knyazev
Simulated Annealing in Early Layers Leads to Better Generalization
Amir M. Sarfi
Zahra Karimpour
Muawiz Chaudhary
Nasir M. Khalid
Sudhir Mudur
Recently, a number of iterative learning methods have been introduced to improve generalization. These typically rely on training for longer periods of time in exchange for improved generalization. LLF (later-layer-forgetting) is a state-of-the-art method in this category. It strengthens learning in early layers by periodically re-initializing the last few layers of the network. Our principal innovation in this work is to use Simulated annealing in EArly Layers (SEAL) of the network in place of re-initialization of later layers. Essentially, later layers go through the normal gradient descent process, while the early layers go through short stints of gradient ascent followed by gradient descent. Extensive experiments on the popular Tiny-ImageNet benchmark and a series of transfer learning and few-shot learning tasks show that we outperform LLF by a significant margin. We further show that, compared to normal training, LLF features, although improving on the target task, degrade transfer learning performance across all datasets we explored. In comparison, our method outperforms LLF across the same target datasets by a large margin. We also show that the prediction depth of our method is significantly lower than that of LLF and normal training, indicating on average better prediction performance. The code to reproduce our results is publicly available at: https://github.com/amiiir-sarfi/SEAL
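The core mechanism described above, ordinary gradient descent for later layers while early layers alternate short stints of gradient ascent with descent, can be sketched as a per-layer update rule. This is a simplified illustration under our own assumptions: plain SGD, an ascent stint at the start of every `period` steps, and hypothetical names (`seal_style_step`, `is_early`, `ascent_steps`). The SEAL repository linked in the abstract is the reference for the actual schedule.

```python
from typing import Dict, List


def seal_style_step(params: Dict[str, List[float]],
                    grads: Dict[str, List[float]],
                    is_early: Dict[str, bool],
                    step: int, lr: float,
                    period: int, ascent_steps: int) -> Dict[str, List[float]]:
    """One SGD update where early layers periodically take gradient-ascent steps.

    Early layers ascend for the first `ascent_steps` steps of every `period`
    steps and descend otherwise; later layers always descend.
    """
    in_ascent = (step % period) < ascent_steps
    updated = {}
    for name, weights in params.items():
        ascend = is_early[name] and in_ascent
        # +lr * grad moves uphill (ascent); -lr * grad moves downhill (descent).
        direction = 1.0 if ascend else -1.0
        updated[name] = [w + direction * lr * g for w, g in zip(weights, grads[name])]
    return updated


if __name__ == "__main__":
    # Hypothetical two-layer model: "conv1" is early, "fc" is late.
    params = {"conv1": [1.0, 1.0], "fc": [1.0, 1.0]}
    grads = {"conv1": [0.5, -0.5], "fc": [0.5, -0.5]}
    is_early = {"conv1": True, "fc": False}
    for step in (0, 5):
        print(step, seal_style_step(params, grads, is_early, step,
                                    lr=0.1, period=10, ascent_steps=2))
```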
Preventing Dimensional Collapse in Contrastive Local Learning with Subsampling
Louis Fournier
Adeetya Patel
Michael Eickenberg
Edouard Oyallon
Reliability of CKA as a Similarity Measure in Deep Learning
MohammadReza Davari
Stefan Horoi
Amine Natik
Comparing learned neural representations in neural networks is a challenging but important problem, which has been approached in different ways. The Centered Kernel Alignment (CKA) similarity metric, particularly its linear variant, has recently become a popular approach and has been widely used to compare representations of a network's different layers, of architecturally similar networks trained differently, or of models with different architectures trained on the same data. A wide variety of claims about the similarity and dissimilarity of these various representations have been made using CKA results. In this work we present an analysis that formally characterizes CKA's sensitivity to a large class of simple transformations, which can naturally occur in the context of modern machine learning. This provides a concrete explanation for CKA's sensitivity to outliers, which has been observed in past works, and to transformations that preserve the linear separability of the data, an important generalization attribute. We empirically investigate several weaknesses of the CKA similarity metric, demonstrating situations in which it gives unexpected or counterintuitive results. Finally, we study approaches for modifying representations to maintain functional behaviour while changing the CKA value. Our results illustrate that, in many cases, the CKA value can be easily manipulated without substantial changes to the functional behaviour of the models, and call for caution when leveraging activation alignment metrics.
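For reference, linear CKA between two representation matrices X (n × p) and Y (n × q) of the same n examples is ‖YᵀX‖²_F / (‖XᵀX‖_F ‖YᵀY‖_F), computed after centering each feature column. The dependency-free sketch below implements that standard formula (the function names are our own). It shows the invariance to isotropic scaling and lets one check how moving a single example far away, i.e. introducing an outlier, shifts the score.

```python
import math
from typing import List

Matrix = List[List[float]]


def center_columns(m: Matrix) -> Matrix:
    """Subtract each column's mean (rows are examples, columns are features)."""
    n = len(m)
    means = [sum(row[j] for row in m) / n for j in range(len(m[0]))]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_gram(a: Matrix, b: Matrix) -> Matrix:
    """Return A^T B, with shape (features of a) x (features of b)."""
    return [[sum(ra[j] * rb[k] for ra, rb in zip(a, b)) for k in range(len(b[0]))]
            for j in range(len(a[0]))]


def frobenius_sq(m: Matrix) -> float:
    return sum(v * v for row in m for v in row)


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA between x (n x p) and y (n x q) computed on the same n examples."""
    xc, yc = center_columns(x), center_columns(y)
    numerator = frobenius_sq(cross_gram(yc, xc))
    denominator = (math.sqrt(frobenius_sq(cross_gram(xc, xc)))
                   * math.sqrt(frobenius_sq(cross_gram(yc, yc))))
    return numerator / denominator


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 1.0], [0.0, 5.0], [2.0, 2.0]]
    scaled = [[3.0 * v for v in row] for row in x]
    # Hypothetical outlier: push only the first example far away.
    outlier = [[100.0 * v for v in row] if i == 0 else row for i, row in enumerate(x)]
    print(linear_cka(x, scaled))   # isotropic scaling: 1.0 up to rounding
    print(linear_cka(x, outlier))  # one outlier drags the score well below 1
```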
Prototype-Sample Relation Distillation: Towards Replay-Free Continual Learning
Nader Asadi
MohammadReza Davari
Sudhir Mudur
Rahaf Aljundi
In continual learning (CL), balancing effective adaptation against catastrophic forgetting is a central challenge. Many of the recent best-performing methods utilize various forms of prior task data, e.g., a replay buffer, to tackle the catastrophic forgetting problem. Having access to previous task data can be restrictive in many real-world scenarios, for example when task data is sensitive or proprietary. To overcome the necessity of using previous tasks' data, in this work we start with strong representation learning methods that have been shown to be less prone to forgetting. We propose a holistic approach to jointly learn the representation and class prototypes while maintaining the relevance of old class prototypes and their embedded similarities. Specifically, samples are mapped to an embedding space where the representations are learned using a supervised contrastive loss. Class prototypes are evolved continually in the same latent space, enabling learning and prediction at any point. To continually adapt the prototypes without keeping any prior task data, we propose a novel distillation loss that constrains class prototypes to maintain relative similarities as compared to new task data. This method yields state-of-the-art performance in the task-incremental setting, outperforming methods relying on large amounts of data, and provides strong performance in the class-incremental setting without using any stored data points.
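The distillation idea, keeping the similarities between old class prototypes and new-task samples stable, can be illustrated as a KL divergence between two softmax distributions over sample-prototype similarities: one computed with a frozen snapshot of the prototypes taken before the new task, the other with the current prototypes. This is a hypothetical simplification for illustration only (dot-product similarity on embeddings assumed to be L2-normalized, a made-up temperature, no contrastive term), not the paper's exact loss.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(scores: List[float], temperature: float) -> List[float]:
    scaled = [s / temperature for s in scores]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def relation_distillation_loss(samples: List[Vector],
                               old_prototypes: List[Vector],
                               current_prototypes: List[Vector],
                               temperature: float = 0.1) -> float:
    """Mean KL(p_old || p_current) over new-task samples (hypothetical sketch).

    p_old / p_current: softmax over a sample's similarities to the frozen
    (pre-task) and current class prototypes. Only the prototype snapshot is
    kept; no past-task samples are stored.
    """
    total = 0.0
    for z in samples:
        p_old = softmax([dot(z, c) for c in old_prototypes], temperature)
        p_cur = softmax([dot(z, c) for c in current_prototypes], temperature)
        # Skip zero-probability terms; clamp p_cur to avoid log(0).
        total += sum(po * (math.log(po) - math.log(max(pc, 1e-12)))
                     for po, pc in zip(p_old, p_cur) if po > 0.0)
    return total / len(samples)
```

In training, a term like this would be added to the representation-learning loss so that drifting prototypes are penalized only through their relations to the current task's samples.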
Re-Weighted Softmax Cross-Entropy to Control Forgetting in Federated Learning
Gwen Legate
Lucas Caccia
In Federated Learning, a global model is learned by aggregating model updates computed at a set of independent client nodes. To reduce communication costs, multiple gradient steps are performed at each node prior to aggregation. A key challenge in this setting is data heterogeneity across clients, which results in differing local objectives. This can lead clients to overly minimize their own local objective, consequently diverging from the global solution. We demonstrate that individual client models experience catastrophic forgetting with respect to data from other clients and propose an efficient approach that modifies the cross-entropy objective on a per-client basis by re-weighting the softmax logits prior to computing the loss. This approach shields classes outside a client's label set from abrupt representation change, and we empirically demonstrate that it can alleviate client forgetting and provide consistent improvements to standard federated learning algorithms. Our method is particularly beneficial under the most challenging federated learning settings, where data heterogeneity is high and client participation in each round is low.
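Re-weighting the softmax logits per client can be sketched as follows, assuming (our assumption, for illustration) that each class's exponentiated logit is multiplied by the client's empirical label proportion: p_c = w_c·exp(z_c) / Σ_j w_j·exp(z_j). This is equivalent to adding log w_c to each logit. Classes with w_c = 0 then drop out of the normalizer and receive no gradient, which is one way to shield classes outside a client's label set. The function names and the optional `eps` smoothing are hypothetical.

```python
import math
from typing import List, Sequence


def client_class_weights(labels: Sequence[int], num_classes: int,
                         eps: float = 0.0) -> List[float]:
    """Empirical label proportions on one client, optionally smoothed by eps."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    return [c / len(labels) + eps for c in counts]


def reweighted_softmax_ce(logits: Sequence[float], target: int,
                          weights: Sequence[float]) -> float:
    """-log( w_t e^{z_t} / sum_j w_j e^{z_j} ), computed stably in log space."""
    if weights[target] <= 0.0:
        raise ValueError("target class must have positive weight on this client")
    # log(w * e^z) = z + log(w); zero-weight classes become -inf and vanish.
    adjusted = [z + math.log(w) if w > 0.0 else -math.inf
                for z, w in zip(logits, weights)]
    top = max(adjusted)
    log_norm = top + math.log(sum(math.exp(a - top) for a in adjusted))
    return log_norm - adjusted[target]


if __name__ == "__main__":
    # Hypothetical client that only holds classes 0 and 1 out of 4.
    w = client_class_weights([0, 0, 1, 0, 1], num_classes=4)
    print(reweighted_softmax_ce([2.0, 0.5, 3.0, -1.0], target=0, weights=w))
```

With uniform weights the shift log w cancels out and the function reduces to the ordinary softmax cross-entropy, so the re-weighting only changes the loss on clients whose label distribution is skewed.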