Boris Knyazev

Can We Learn Communication-Efficient Optimizers?

Charles-Étienne Joseph

Benjamin Thérien

Abhinav Moudgil

Boris Knyazev

Eugene Belilovsky

2023-12-02

ArXiv (preprint)

doi.org

arxiv.org

Learning Optimizers for Local SGD

Charles-Étienne Joseph

Benjamin Thérien

Abhinav Moudgil

Boris Knyazev

Eugene Belilovsky

2023-10-27

NeurIPS.cc/2023/Workshop/Federated_Learning (poster)

openreview.net

Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?

Boris Knyazev

Doha Hwang

Simon Lacoste-Julien

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (published)

doi.org

openreview.net

Learning to Optimize with Recurrent Hierarchical Transformers

Abhinav Moudgil

Boris Knyazev

Guillaume Lajoie

Eugene Belilovsky

2023-06-19

ICML.cc/2023/Workshop/Frontiers4LCD (published)

openreview.net

Pretrained Language Models to Solve Graph Tasks in Natural Language

Frederik Wenkel

Guy Wolf

Boris Knyazev

Pretrained large language models (LLMs) are powerful learners in a variety of language tasks. We explore whether LLMs can learn from graph-structured data when the graphs are described using natural language. We also investigate data augmentation and pretraining specific to the graph domain and show that LLMs such as GPT-2 and GPT-3 are promising alternatives to graph neural networks.

2023-06-19

ICML.cc/2023/Workshop/SPIGM (poster)

openreview.net

Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?

Boris Knyazev

Doha Hwang

Simon Lacoste-Julien

Pretraining a neural network on a large dataset is becoming a cornerstone in machine learning that is within the reach of only a few communities with large resources. We aim at the ambitious goal of democratizing pretraining. Towards that goal, we train and release a single neural network that can predict high-quality ImageNet parameters of other neural networks. By using predicted parameters for initialization, we are able to boost training of diverse ImageNet models available in PyTorch. When transferred to other datasets, models initialized with predicted parameters also converge faster and reach competitive final performance.

2023-04-24

ICML.cc/2023/Conference (poster)

doi.org

openreview.net

Learning Optimizers for Local SGD

Charles-Étienne Joseph

Benjamin Thérien

Abhinav Moudgil

Boris Knyazev

Eugene Belilovsky

Communication-efficient variants of SGD, specifically local SGD, have received a great deal of interest in recent years. These approaches compute multiple gradient steps locally, that is, on each worker, before averaging model parameters, helping relieve the critical communication bottleneck in distributed deep learning training. Although many variants of these approaches have been proposed, they can sometimes lag behind state-of-the-art optimizers for deep learning. In this work, we incorporate local optimizers that compute multiple updates into a learned optimization framework, allowing us to meta-learn potentially more efficient local SGD algorithms. Our results demonstrate that local learned optimizers can substantially outperform local SGD and its sophisticated variants while maintaining their communication efficiency. We show that the learned optimizers can generalize to new datasets and architectures, demonstrating the potential of learned optimizers for improving communication-efficient distributed learning.

2000-01-01

(published)

www.semanticscholar.org
