Eugene Belilovsky

Paul Janson

PhD - Concordia University

Co-supervisor :

Charles-Etienne Joseph

Master's Research - Université de Montréal

Co-supervisor :

Zafir Khalid

Master's Research - Concordia University

Co-supervisor :

Irina Rish

Website

Gwen Legate

PhD - Concordia University

Co-supervisor :

Master's Research - Concordia University

Co-supervisor :

Abhinav Moudgil

PhD - Concordia University

Website

Google Scholar

Adel Nabli

PhD - Concordia University

Google Scholar

Geraldin Nanfack

Postdoctorate - Concordia University

Co-supervisor :

geraldin.nanfack@mila.quebec

Website

Google Scholar

Albert Orozco Camacho

PhD - Concordia University

Co-supervisor :

PhD - Concordia University

Co-supervisor :

Irina Rish

Benjamin Therien

PhD - Université de Montréal

Principal supervisor :

Collaborating researcher - Université de Montréal

Principal supervisor :

PhD - Concordia University

Co-supervisor :

Congshu Zou

Master's Research - Concordia University

Publications

Non-Uniform Parameter-Wise Model Merging

Albert M. Orozco Camacho

Stefan Horoi

Combining multiple machine learning models has long been a technique for enhancing performance, particularly in distributed settings. Traditional approaches, such as model ensembles, work well, but are expensive in terms of memory and compute. Recently, methods based on averaging model parameters have achieved good results in some settings and have gained popularity. However, merging models initialized differently that do not share a part of their training trajectories can yield worse results than simply using the base models, even after aligning their neurons. In this paper, we introduce a novel approach, Non-uniform Parameter-wise Model Merging, or NP Merge, which merges models by learning the contribution of each parameter to the final model using gradient-based optimization. We empirically demonstrate the effectiveness of our method for merging models of various architectures in multiple settings, outperforming past methods. We also extend NP Merge to handle the merging of multiple models, showcasing its scalability and robustness.

2024-12-15

ArXiv (preprint)

Sketch-guided Cage-based 3D Gaussian Splatting Deformation

Tianhao Xie

Noam Aigerman

Tiberiu Popa

3D Gaussian Splatting (GS) is one of the most promising novel 3D representations that has received great interest in computer graphics and computer vision. While various systems have introduced editing capabilities for 3D GS, such as those guided by text prompts, fine-grained control over deformation remains an open challenge. In this work, we present a novel sketch-guided 3D GS deformation system that allows users to intuitively modify the geometry of a 3D GS model by drawing a silhouette sketch from a single viewpoint. Our approach introduces a new deformation method that combines cage-based deformations with a variant of Neural Jacobian Fields, enabling precise, fine-grained control. Additionally, it leverages large-scale 2D diffusion priors and ControlNet to ensure the generated deformations are semantically plausible. Through a series of experiments, we demonstrate the effectiveness of our method and showcase its ability to animate static 3D GS models as one of its key applications.

2024-11-19

ArXiv (preprint)

ACCO: Accumulate while you Communicate, Hiding Communications in Distributed LLM Training

Adel Nabli

Louis Fournier

Pierre Erbacher

Louis Serrano

Edouard Oyallon

2024-10-10

NeurIPS.cc/2024/Workshop/OPT (published)

μLO: Compute-Efficient Meta-Generalization of Learned Optimizers

Benjamin Thérien

Charles-Étienne Joseph

Boris Knyazev

Edouard Oyallon

Irina Rish

2024-10-10

NeurIPS.cc/2024/Workshop/OPT (published)

Controlling Forgetting with Test-Time Data in Continual Learning

Vaibhav Singh

Rahaf Aljundi

Foundational vision-language models excel in various tasks but require updates as new tasks or domains emerge. Current Continual Learning (CL) methods, which focus on supervised training, often suffer from significant forgetting, performing worse than the original models in zero-shot scenarios. This work proposes leveraging test-time, unsupervised data in a self-supervised manner to refresh the model’s memory of previously learned tasks, minimizing forgetting without additional labeling. By introducing a student-teacher framework with gradient-based sparse parameter updates, the approach enhances performance on prior tasks and reduces reliance on offline memory buffers, effectively improving continual learning outcomes.

2024-10-10

NeurIPS.cc/2024/Workshop/AFM (poster)

Understanding Permutation Based Model Merging with Feature Visualizations

Congshu Zou

Geraldin Nanfack

Stefan Horoi

Linear mode connectivity (LMC) has become a topic of great interest in recent years. It has been empirically demonstrated that popular deep learning models trained from different initializations exhibit linear mode connectivity up to permutation. Based on this, several approaches for finding a permutation of the model's features or weights have been proposed, leading to several popular methods for model merging. These methods enable the simple averaging of two models to create a new high-performance model. However, besides accuracy, the properties of these models and their relationships to the representations of the models they derive from are poorly understood. In this work, we study the inner mechanisms behind LMC in model merging through the lens of classic feature visualization methods. Focusing on convolutional neural networks (CNNs), we make several observations that shed light on the underlying mechanisms of model merging by permute and average.

2024-10-10

NeurIPS.cc/2024/Workshop/UniReps (accepted)

WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average

Louis Fournier

Adel Nabli

Masih Aminbeidokhti

Marco Pedersoli

Edouard Oyallon

The performance of deep neural networks is enhanced by ensemble methods, which average the output of several models. However, this comes at an increased cost at inference. Weight averaging methods aim at balancing the generalization of ensembling and the inference speed of a single model by averaging the parameters of an ensemble of models. Yet, naive averaging results in poor performance as models converge to different loss basins, and aligning the models to improve the performance of the average is challenging. Alternatively, inspired by distributed training, methods like DART and PAPA have been proposed to train several models in parallel such that they will end up in the same basin, resulting in good averaging accuracy. However, these methods either compromise ensembling accuracy or demand significant communication between models during training. In this paper, we introduce WASH, a novel distributed method for training model ensembles for weight averaging that achieves state-of-the-art image classification accuracy. WASH maintains models within the same basin by randomly shuffling a small percentage of weights during training, resulting in diverse models and lower communication costs compared to standard parameter averaging methods.

2024-10-10

NeurIPS.cc/2024/Workshop/OPT (published)

Not Only the Last-Layer Features for Spurious Correlations: All Layer Deep Feature Reweighting

Humza Wajid Hameed

Géraldin Nanfack

Spurious correlations are a major source of errors for machine learning models, in particular when aiming for group-level fairness. It has been recently shown that a powerful approach to combat spurious correlations is to re-train the last layer on a balanced validation dataset, isolating robust features for the predictor. However, key attributes can sometimes be discarded by neural networks towards the last layer. In this work, we thus consider retraining a classifier on a set of features derived from all layers. We utilize a recently proposed feature selection strategy to select unbiased features from all the layers. We observe this approach gives significant improvements in worst-group accuracy on several standard benchmarks.

2024-09-23

ArXiv (preprint)

Accelerating Training with Neuron Interaction and Nowcasting Networks

Boris Knyazev

Abhinav Moudgil

Guillaume Lajoie

Simon Lacoste-Julien

Neural network training can be accelerated when a learnable update rule is used in lieu of classic adaptive optimizers (e.g. Adam). However, learnable update rules can be costly and unstable to train and use. Recently, Jang et al. (2023) proposed a simpler approach to accelerate training based on weight nowcaster networks (WNNs). In their approach, Adam is used for most of the optimization steps and periodically, only every few steps, a WNN nowcasts (predicts near future) parameters. We improve WNNs by proposing neuron interaction and nowcasting (NiNo) networks. In contrast to WNNs, NiNo leverages neuron connectivity and graph neural networks to more accurately nowcast parameters. We further show that in some networks, such as Transformers, modeling neuron connectivity accurately is challenging. We address this and other limitations, which allows NiNo to accelerate Adam training by up to 50% in vision and language tasks.

2024-09-06

ArXiv (preprint)

Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis

Stefan Horoi

Albert Manuel Orozco Camacho

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)