Portrait de Eugene Belilovsky n'est pas disponible

Eugene Belilovsky

Membre académique associé
Professeur adjoint, Concordia University, Département d'informatique et de génie logiciel
Professeur associé, Université de Montréal, Département d'informatique et de recherche opérationnelle

Biographie

Eugene Belilovsky est professeur adjoint au Département d'informatique et de génie logiciel de l'Université Concordia. Il est également membre associé de Mila – Institut québécois d’intelligence artificielle et professeur adjoint à l'Université de Montréal. Ses travaux se concentrent sur la vision par ordinateur et l'apprentissage profond. Ses intérêts de recherche actuels comprennent l'apprentissage continu, l'apprentissage à partir de peu de données (few-shot learning) et leurs applications au carrefour de la vision par ordinateur et du traitement du langage.

Étudiants actuels

Collaborateur·rice alumni
Co-superviseur⋅e :
Maîtrise recherche - Concordia
Maîtrise recherche - UdeM
Co-superviseur⋅e :
Maîtrise recherche - Concordia
Doctorat - Concordia
Co-superviseur⋅e :
Postdoctorat - Concordia
Co-superviseur⋅e :
Maîtrise recherche - Concordia
Co-superviseur⋅e :
Collaborateur·rice de recherche - Concordia
Co-superviseur⋅e :
Doctorat - Concordia
Co-superviseur⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Collaborateur·rice de recherche - UdeM
Superviseur⋅e principal⋅e :
Maîtrise recherche - Concordia
Maîtrise recherche - Concordia

Publications

New Insights on Reducing Abrupt Representation Change in Online Continual Learning
Lucas Caccia
Rahaf Aljundi
Nader Asadi
Tinne Tuytelaars
In the online continual learning paradigm, agents must learn from a changing distribution while respecting memory and compute constraints. E… (voir plus)xperience Replay (ER), where a small subset of past data is stored and replayed alongside new data, has emerged as a simple and effective learning strategy. In this work, we focus on the change in representations of observed data that arises when previously unobserved classes appear in the incoming data stream, and new classes must be distinguished from previous ones. We shed new light on this question by showing that applying ER causes the newly added classes’ representations to overlap significantly with the previous classes, leading to highly disruptive parameter updates. Based on this empirical analysis, we propose a new method which mitigates this issue by shielding the learned representations from drastic adaptation to accommodate new classes. We show that using an asymmetric update rule pushes new classes to adapt to the older ones (rather than the reverse), which is more effective especially at task boundaries, where much of the forgetting typically occurs. Empirical results show significant gains over strong baselines on standard continual learning benchmarks.
Parametric Scattering Networks
Shanel Gauthier
Benjamin Th'erien
Laurent Alséne-Racicot
Michael Eickenberg
The wavelet scattering transform creates geometric in-variants and deformation stability. In multiple signal do-mains, it has been shown to … (voir plus)yield more discriminative rep-resentations compared to other non-learned representations and to outperform learned representations in certain tasks, particularly on limited labeled data and highly structured signals. The wavelet filters used in the scattering trans-form are typically selected to create a tight frame via a pa-rameterized mother wavelet. In this work, we investigate whether this standard wavelet filterbank construction is op-timal. Focusing on Morlet wavelets, we propose to learn the scales, orientations, and aspect ratios of the filters to produce problem-specific parameterizations of the scattering transform. We show that our learned versions of the scattering transform yield significant performance gains in small-sample classification settings over the standard scat-tering transform. Moreover, our empirical results suggest that traditional filterbank constructions may not always be necessary for scattering transforms to extract effective rep-resentations.
Online Continual Learning with Maximally Interfered Retrieval
Rahaf Aljundi
Lucas Caccia
Massimo Caccia
Min Lin
Tinne Tuytelaars
Continual learning, the setting where a learning agent is faced with a never ending stream of data, continues to be a great challenge for mo… (voir plus)dern machine learning systems. In particular the online or "single-pass through the data" setting has gained attention recently as a natural setting that is difficult to tackle. Methods based on replay, either generative or from a stored memory, have been shown to be effective approaches for continual learning, matching or exceeding the state of the art in a number of standard benchmarks. These approaches typically rely on randomly selecting samples from the replay memory or from a generative model, which is suboptimal. In this work, we consider a controlled sampling of memories for replay. We retrieve the samples which are most interfered, i.e. whose prediction will be most negatively impacted by the foreseen parameters update. We show a formulation for this sampling criterion in both the generative replay and the experience replay setting, producing consistent gains in performance and greatly reduced forgetting. We release an implementation of our method at this https URL.
Learning Optimizers for Local SGD
Charles-Étienne Joseph
Benjamin Thérien
Abhinav Moudgil
Boris Knyazev
Communication-efficient variants of SGD, specifically local SGD, have received a great deal of interest in recent years. These approaches co… (voir plus)mpute multiple gradient steps locally, that is on each worker, before averaging model parameters, helping relieve the critical communication bottleneck in distributed deep learning training. Although many variants of these approaches have been proposed, they can sometimes lag behind state-of-the-art optimizers for deep learning. In this work, we incorporate local optimizers that compute multiple updates into a learned optimization framework, allowing to meta-learn potentially more efficient local SGD algorithms. Our results demonstrate that local learned optimizers can substantially outperform local SGD and its sophisticated variants while maintaining their communication efficiency. We show that the learned optimizers can generalize to new datasets and architectures, demonstrating the potential of learned optimizers for improving communication-efficient distributed learning.