Portrait de Eugene Belilovsky

Eugene Belilovsky

Membre académique associé
Professeur adjoint, Concordia University, Département d'informatique et de génie logiciel
Professeur associé, Université de Montréal, Département d'informatique et de recherche opérationnelle
Sujets de recherche
Apprentissage continu
Apprentissage fédéré
Apprentissage profond
Grands modèles de langage (LLM)
Optimisation

Biographie

Eugene Belilovsky est professeur adjoint au Département d'informatique et de génie logiciel de l'Université Concordia. Il est également membre associé de Mila – Institut québécois d’intelligence artificielle et professeur adjoint à l'Université de Montréal. Ses travaux se concentrent sur la vision par ordinateur et l'apprentissage profond. Ses intérêts de recherche actuels comprennent l'apprentissage continu, l'apprentissage à partir de peu de données (few-shot learning) et leurs applications au carrefour de la vision par ordinateur et du traitement du langage.

Étudiants actuels

Doctorat - Concordia
Co-superviseur⋅e :
Maîtrise recherche - Concordia
Co-superviseur⋅e :
Doctorat - Concordia
Co-superviseur⋅e :
Maîtrise recherche - Concordia
Co-superviseur⋅e :
Doctorat - Concordia
Co-superviseur⋅e :
Maîtrise recherche - Concordia
Co-superviseur⋅e :
Doctorat - Concordia
Postdoctorat - Concordia
Co-superviseur⋅e :
Doctorat - Concordia
Co-superviseur⋅e :
Doctorat - Concordia
Co-superviseur⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Postdoctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - Concordia
Co-superviseur⋅e :

Publications

New Insights on Reducing Abrupt Representation Change in Online Continual Learning
In the online continual learning paradigm, agents must learn from a changing distribution while respecting memory and compute constraints. E… (voir plus)xperience Replay (ER), where a small subset of past data is stored and replayed alongside new data, has emerged as a simple and effective learning strategy. In this work, we focus on the change in representations of observed data that arises when previously unobserved classes appear in the incoming data stream, and new classes must be distinguished from previous ones. We shed new light on this question by showing that applying ER causes the newly added classes' representations to overlap significantly with the previous classes, leading to highly disruptive parameter updates. Based on this empirical analysis, we propose a new method which mitigates this issue by shielding the learned representations from drastic adaptation to accommodate new classes. We show that using an asymmetric update rule pushes new classes to adapt to the older ones (rather than the reverse), which is more effective especially at task boundaries, where much of the forgetting typically occurs. Empirical results show significant gains over strong baselines on standard continual learning benchmarks
Generative Compositional Augmentations for Scene Graph Prediction
Cătălina Cangea
Graham W. Taylor
Inferring objects and their relationships from an image in the form of a scene graph is useful in many applications at the intersection of v… (voir plus)ision and language. We consider a challenging problem of compositional generalization that emerges in this task due to a long tail data distribution. Current scene graph generation models are trained on a tiny fraction of the distribution corresponding to the most frequent compositions, e.g. . However, test images might contain zero- and few-shot compositions of objects and relationships, e.g. . Despite each of the object categories and the predicate (e.g. 'on') being frequent in the training data, the models often fail to properly understand such unseen or rare compositions. To improve generalization, it is natural to attempt increasing the diversity of the training distribution. However, in the graph domain this is non-trivial. To that end, we propose a method to synthesize rare yet plausible scene graphs by perturbing real ones. We then propose and empirically study a model based on conditional generative adversarial networks (GANs) that allows us to generate visual features of perturbed scene graphs and learn from them in a joint fashion. When evaluated on the Visual Genome dataset, our approach yields marginal, but consistent improvements in zero- and few-shot metrics. We analyze the limitations of our approach indicating promising directions for future research.
Graph Density-Aware Losses for Novel Compositions in Scene Graph Generation
Cătălina Cangea
Graham W. Taylor
Scene graph generation (SGG) aims to predict graph-structured descriptions of input images, in the form of objects and relationships between… (voir plus) them. This task is becoming increasingly useful for progress at the interface of vision and language. Here, it is important - yet challenging - to perform well on novel (zero-shot) or rare (few-shot) compositions of objects and relationships. In this paper, we identify two key issues that limit such generalization. Firstly, we show that the standard loss used in this task is unintentionally a function of scene graph density. This leads to the neglect of individual edges in large sparse graphs during training, even though these contain diverse few-shot examples that are important for generalization. Secondly, the frequency of relationships can create a strong bias in this task, such that a blind model predicting the most frequent relationship achieves good performance. Consequently, some state-of-the-art models exploit this bias to improve results. We show that such models can suffer the most in their ability to generalize to rare compositions, evaluating two different models on the Visual Genome dataset and its more recent, improved version, GQA. To address these issues, we introduce a density-normalized edge loss, which provides more than a two-fold improvement in certain generalization metrics. Compared to other works in this direction, our enhancements require only a few lines of code and no added computational cost. We also highlight the difficulty of accurately evaluating models using existing metrics, especially on zero/few shots, and introduce a novel weighted metric.
Online Continual Learning with Maximally Interfered Retrieval
Lucas Caccia
Massimo Caccia
Tinne Tuytelaars
Continual learning, the setting where a learning agent is faced with a never ending stream of data, continues to be a great challenge for mo… (voir plus)dern machine learning systems. In particular the online or "single-pass through the data" setting has gained attention recently as a natural setting that is difficult to tackle. Methods based on replay, either generative or from a stored memory, have been shown to be effective approaches for continual learning, matching or exceeding the state of the art in a number of standard benchmarks. These approaches typically rely on randomly selecting samples from the replay memory or from a generative model, which is suboptimal. In this work, we consider a controlled sampling of memories for replay. We retrieve the samples which are most interfered, i.e. whose prediction will be most negatively impacted by the foreseen parameters update. We show a formulation for this sampling criterion in both the generative replay and the experience replay setting, producing consistent gains in performance and greatly reduced forgetting. We release an implementation of our method at this https URL.
VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering
Cătălina Cangea
Pietro Lio
Embodied Question Answering (EQA) is a recently proposed task, where an agent is placed in a rich 3D environment and must act based solely o… (voir plus)n its egocentric input to answer a given question. The desired outcome is that the agent learns to combine capabilities such as scene understanding, navigation and language understanding in order to perform complex reasoning in the visual world. However, initial advancements combining standard vision and language methods with imitation and reinforcement learning algorithms have shown EQA might be too complex and challenging for these techniques. In order to investigate the feasibility of EQA-type tasks, we build the VideoNavQA dataset that contains pairs of questions and videos generated in the House3D environment. The goal of this dataset is to assess question-answering performance from nearly-ideal navigation paths, while considering a much more complete variety of questions than current instantiations of the EQA task. We investigate several models, adapted from popular VQA methods, on this new benchmark. This establishes an initial understanding of how well VQA-style methods can perform within this novel EQA paradigm.
Blindfold Baselines for Embodied QA
We explore blindfold (question-only) baselines for Embodied Question Answering. The EmbodiedQA task requires an agent to answer a question b… (voir plus)y intelligently navigating in a simulated environment, gathering necessary visual information only through first-person vision before finally answering. Consequently, a blindfold baseline which ignores the environment and visual information is a degenerate solution, yet we show through our experiments on the EQAv1 dataset that a simple question-only baseline achieves state-of-the-art results on the EmbodiedQA task in all cases except when the agent is spawned extremely close to the object.