Learn how to leverage generative AI to support and improve your productivity at work. The next cohort will take place online on April 28 and 30, 2026, in French.
We use cookies to analyze the browsing and usage of our website and to personalize your experience. You can disable these technologies at any time, but this may limit certain functionalities of the site. Read our Privacy Policy for more information.
Setting cookies
You can enable and disable the types of cookies you wish to accept. However certain choices you make could affect the services offered on our sites (e.g. suggestions, personalised ads, etc.).
Essential cookies
These cookies are necessary for the operation of the site and cannot be deactivated. (Still active)
Analytics cookies
Do you accept the use of cookies to measure the audience of our sites?
Multimedia Player
Do you accept the use of cookies to display and allow you to watch the video content hosted by our partners (YouTube, etc.)?
Learned optimizers (LOs) have the potential to significantly reduce the wall-clock training time of neural networks. However, they can strug… (see more)gle to optimize unseen tasks (*meta-generalize*), especially when training networks wider than those seen during meta-training. To address this, we derive the Maximal Update Parametrization (
2025-12-31
International Conference on Learning Representations (Accept (Poster))
The ever-growing availability of unlabeled data presents both opportunities and challenges for training artificial intelligence systems. Whi… (see more)le self-supervised learning (SSL) has emerged as a powerful paradigm for extracting meaningful representations from vast amounts of unlabeled data, existing methods still struggle to adapt to the non-stationary, non-IID nature of real-world data streams without forgetting previously learned knowledge. Recent works have adopted a repeated cosine annealing schedule for large-scale continual pre-training; however, these schedules (1) inherently cause forgetting during the re-warming phase and (2) have not been systematically compared to existing continual SSL methods. In this work, we systematically compare the widely used cosine schedule with the recently proposed infinite learning rate schedule and empirically find the latter to be a more effective alternative. Our extensive empirical evaluation across diverse image and language datasets demonstrates that the infinite learning rate schedule consistently enhances continual pre-training performance compared to a repeated cosine decay without being restricted to a fixed iteration budget. For instance, in a small-scale MAE pre-training setup, it outperforms several strong baselines from the literature. We then scale up our experiments to larger MAE pre-training and autoregressive language model pre-training. Our results show that the infinite learning rate schedule remains effective at scale, surpassing repeated cosine decay for both MAE pre-training and zero-shot LM benchmarks.
Learned optimizers have been an active research topic over the past decade, with increasing progress toward practical, general-purpose optim… (see more)izers that can serve as drop-in replacements for widely used methods like Adam. However, recent advances -- such as VeLO, which was meta-trained for 4000 TPU-months -- remain largely inaccessible to the broader community, in part due to their reliance on JAX and the absence of user-friendly packages for applying the optimizers after meta-training. To address this gap, we introduce PyLO, a PyTorch-based library that brings learned optimizers to the broader machine learning community through familiar, widely adopted workflows. Unlike prior work focused on synthetic or convex tasks, our emphasis is on applying learned optimization to real-world large-scale pre-training tasks. Our release includes a CUDA-accelerated version of the small_fc_lopt learned optimizer architecture from (Metz et al., 2022a), delivering substantial speedups -- from 39.36 to 205.59 samples/sec throughput for training ViT B/16 with batch size 32. PyLO also allows us to easily combine learned optimizers with existing optimization tools such as learning rate schedules and weight decay. When doing so, we find that learned optimizers can substantially benefit. Our code is available at https://github.com/Belilovsky-Lab/pylo
Background Screening for nonalcoholic fatty liver disease (NAFLD) is suboptimal due to the subjective interpretation of US images. Purpose T… (see more)o evaluate the agreement and diagnostic performance of radiologists and a deep learning model in grading hepatic steatosis in NAFLD at US, with biopsy as the reference standard. Materials and Methods This retrospective study included patients with NAFLD and control patients without hepatic steatosis who underwent abdominal US and contemporaneous liver biopsy from September 2010 to October 2019. Six readers visually graded steatosis on US images twice, 2 weeks apart. Reader agreement was assessed with use of κ statistics. Three deep learning techniques applied to B-mode US images were used to classify dichotomized steatosis grades. Classification performance of human radiologists and the deep learning model for dichotomized steatosis grades (S0, S1, S2, and S3) was assessed with area under the receiver operating characteristic curve (AUC) on a separate test set. Results The study included 199 patients (mean age, 53 years ± 13 [SD]; 101 men). On the test set (n = 52), radiologists had fair interreader agreement (0.34 [95% CI: 0.31, 0.37]) for classifying steatosis grades S0 versus S1 or higher, while AUCs were between 0.49 and 0.84 for radiologists and 0.85 (95% CI: 0.83, 0.87) for the deep learning model. For S0 or S1 versus S2 or S3, radiologists had fair interreader agreement (0.30 [95% CI: 0.27, 0.33]), while AUCs were between 0.57 and 0.76 for radiologists and 0.73 (95% CI: 0.71, 0.75) for the deep learning model. For S2 or lower versus S3, radiologists had fair interreader agreement (0.37 [95% CI: 0.33, 0.40]), while AUCs were between 0.52 and 0.81 for radiologists and 0.67 (95% CI: 0.64, 0.69) for the deep learning model. Conclusion Deep learning approaches applied to B-mode US images provided comparable performance with human readers for detection and grading of hepatic steatosis. Published under a CC BY 4.0 license. Supplemental material is available for this article. See also the editorial by Tuthill in this issue.