Publications

Recent Advances in Reinforcement Learning

Joelle Pineau

2008-01-01

Lecture Notes in Computer Science (published)

doi.org

Advances in Information Retrieval

Diane Kelly

Fernando Diaz

Nicholas J. Belkin

James Allan

2004-04-05

Lecture Notes in Computer Science (published)

doi.org

Advances in Information Retrieval

Diane Kelly

Fernando Diaz

Nicholas J. Belkin

James Allan

2004-01-01

ECIR (published)

doi.org

Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions

Stefano Massaroli

Michael Poli

Daniel Y. Fu

Hermann Kumbong

Rom N. Parnichkun

Aman Timalsina

David W. Romero

Quinn McIntyre

Beidi Chen

Atri Rudra

Ce Zhang

Christopher Re

Stefano Ermon

Yoshua Bengio

Recent advances in attention-free sequence models rely on convolutions as alternatives to the attention operator at the core of Transformers. In particular, long convolution sequence models have achieved state-of-the-art performance in many domains, but incur a significant cost during auto-regressive inference workloads – naively requiring a full pass (or caching of activations) over the input sequence for each generated token – similarly to attention-based models. In this paper, we seek to enable O(1) compute and memory cost per token in any pre-trained long convolution architecture to reduce memory footprint and increase throughput during generation. Concretely, our methods consist in extracting low-dimensional linear state-space models from each convolution layer, building upon rational interpolation and model-order reduction techniques. We further introduce architectural improvements to convolution-based layers such as Hyena: by weight-tying the filters across channels into heads, we achieve higher pre-training quality and reduce the number of filters to be distilled. The resulting model achieves 10× higher throughput than Transformers and 1.5× higher than Hyena at 1.3B parameters, without any loss in quality after distillation.

2000-01-01

(published)

www.semanticscholar.org
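
The abstract above rests on the fact that a linear state-space model produces the same output as a causal convolution with its impulse response, while needing only a fixed-size state per generated token. The following is a minimal sketch of that equivalence only, not of the paper's distillation procedure (rational interpolation and model-order reduction are not shown). It assumes a diagonal state-space model; the function names and the parameters `a`, `b`, `c` are hypothetical and chosen for illustration.

```python
from typing import List, Tuple


def ssm_filter(a: List[float], b: List[float], c: List[float], length: int) -> List[float]:
    """Impulse response h[k] = sum_i c[i] * a[i]**k * b[i] of a diagonal SSM (assumed form)."""
    return [
        sum(ci * ai ** k * bi for ai, bi, ci in zip(a, b, c))
        for k in range(length)
    ]


def causal_conv(h: List[float], u: List[float]) -> List[float]:
    """Direct causal convolution: y[t] = sum_{k<=t} h[k] * u[t-k]. Cost grows with t."""
    return [
        sum(h[k] * u[t - k] for k in range(t + 1))
        for t in range(len(u))
    ]


def recurrent_step(
    state: List[float], u_t: float, a: List[float], b: List[float], c: List[float]
) -> Tuple[List[float], float]:
    """One recurrent update: x <- a*x + b*u_t, y = sum(c*x). Cost is independent of t."""
    new_state = [ai * xi + bi * u_t for ai, xi, bi in zip(a, state, b)]
    y_t = sum(ci * xi for ci, xi in zip(c, new_state))
    return new_state, y_t


def generate(u: List[float], a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Run the recurrence over a whole input sequence, one token at a time."""
    state = [0.0] * len(a)
    outputs = []
    for u_t in u:
        state, y_t = recurrent_step(state, u_t, a, b, c)
        outputs.append(y_t)
    return outputs
```

Calling `generate(u, a, b, c)` and `causal_conv(ssm_filter(a, b, c, len(u)), u)` on the same input should agree up to floating-point error; the recurrent path keeps only `len(a)` numbers of state regardless of sequence length.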

Cognitive Models as Simulators: Using Cognitive Models to Tap into Implicit Human Feedback

Ardavan S Nobandegani

Thomas R. Shultz

Irina Rish

In this work, we substantiate the idea of cognitive models as simulators, which is to have AI systems interact with, and collect feedback from, cognitive models instead of humans, thereby making the training process safer, cheaper, and faster. We leverage this idea in the context of learning a fair behavior toward a counterpart exhibiting various emotional states, as implicit human feedback. As a case study, we adopt the Ultimatum game (UG), a canonical task in behavioral and brain sciences for studying fairness. We show that our reinforcement learning (RL) agents learn to exhibit differential, rationally-justified behaviors under various emotional states of their UG counterpart. We discuss the implications of our work for AI and cognitive science research, and its potential for interactive learning with implicit human feedback.

2000-01-01

(published)

www.semanticscholar.org

Deep PDE Solvers for Subgrid Modelling and Out-of-Distribution Generalization

Patrick Chatain

Adam M. Oberman

Climate and weather modelling (CWM) is an important area where ML models are used for subgrid modelling: making predictions of processes occurring at scales too small to be resolved by standard solution methods (Brasseur & Jacob, 2017). These models are expected to make accurate predictions, even on out-of-distribution (OOD) data, and are additionally expected to respect important physical constraints of the ground truth model (Kashinath et al., 2021). While many specialized ML PDE solvers have been developed, the particular requirements of CWM models have not been addressed so far. The goal of this work is to address them. We propose and develop a novel architecture, which matches or exceeds the performance of standard ML models, and which demonstrably succeeds in OOD generalization. The architecture is based on expert knowledge of the structure of PDE solution operators, which permits the model to also obey important physical constraints.

2000-01-01

(published)

www.semanticscholar.org

Learning Optimizers for Local SGD

Charles-Étienne Joseph

Benjamin Thérien

Abhinav Moudgil

Boris Knyazev

Eugene Belilovsky

Communication-efficient variants of SGD, specifically local SGD, have received a great deal of interest in recent years. These approaches compute multiple gradient steps locally, that is, on each worker, before averaging model parameters, helping relieve the critical communication bottleneck in distributed deep learning training. Although many variants of these approaches have been proposed, they can sometimes lag behind state-of-the-art optimizers for deep learning. In this work, we incorporate local optimizers that compute multiple updates into a learned optimization framework, allowing us to meta-learn potentially more efficient local SGD algorithms. Our results demonstrate that local learned optimizers can substantially outperform local SGD and its sophisticated variants while maintaining their communication efficiency. We show that the learned optimizers can generalize to new datasets and architectures, demonstrating the potential of learned optimizers for improving communication-efficient distributed learning.

2000-01-01

(published)

www.semanticscholar.org
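
The local SGD scheme summarized in the abstract above (several gradient steps on each worker, then one parameter average) can be sketched as follows. This is a toy illustration of plain local SGD only, not of the learned optimizers the paper proposes. It assumes each worker holds a scalar quadratic loss `0.5 * (w - c)**2` with its own target `c`; the function name `local_sgd` and all parameters are hypothetical.

```python
from typing import List


def local_sgd(
    targets: List[float],
    rounds: int = 50,
    local_steps: int = 4,
    lr: float = 0.1,
    w0: float = 0.0,
) -> float:
    """Plain local SGD on per-worker losses 0.5 * (w - c)**2 (toy assumption).

    Each communication round: every worker starts from the shared parameter,
    takes `local_steps` gradient steps on its own loss, and the results are
    averaged. Only the averaging step would require communication.
    """
    w = w0
    for _ in range(rounds):
        local_models = []
        for c in targets:
            w_k = w  # worker copies the current shared parameter
            for _ in range(local_steps):
                grad = w_k - c  # gradient of 0.5 * (w_k - c)**2
                w_k -= lr * grad
            local_models.append(w_k)
        # One communication per round: average the worker parameters.
        w = sum(local_models) / len(local_models)
    return w
```

In this toy setting the averaged parameter approaches the mean of the worker targets, which is the minimizer of the average loss. Raising `local_steps` reduces the number of averaging rounds needed for a given amount of local computation.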

Physics-Informed Transformer Networks

F. Dos Santos

Tara Akhound-Sadegh

Siamak Ravanbakhsh

Physics-informed neural networks (PINNs) have been recognized as a viable alternative to conventional numerical solvers for Partial Differential Equations (PDEs). The main appeal of PINNs is that since they directly enforce the PDE equation, one does not require access to costly ground truth solutions for training the model. However, a key challenge is their limited generalization across varied initial conditions. Addressing this, our study presents a novel Physics-Informed Transformer (PIT) model for learning the solution operator for PDEs. Using the attention mechanism, PIT learns to leverage the relationships between its initial condition and query points, resulting in a significant improvement in generalization. Moreover, in contrast to existing physics-informed networks, our model is invariant to the discretization of the input domain, providing great flexibility in problem specification and training. We validated our proposed method on the 1D Burgers’ and the 2D Heat equations, demonstrating notable improvement over standard PINN models for operator learning with negligible computational overhead.

2000-01-01

(published)

www.semanticscholar.org

On the Varied Faces of Overparameterization in Supervised and Self-Supervised Learning

Matteo Gamba

Arna Ghosh

Kumar Krishna Agrawal

Blake A. Richards

Hossein Azizpour

Mårten Björkman

The quality of the representations learned by neural networks depends on several factors, including the loss function, learning algorithm, and model architecture. In this work, we use information geometric measures to assess the representation quality in a principled manner. We demonstrate that the sensitivity of learned representations to input perturbations, measured by the spectral norm of the feature Jacobian, provides valuable information about downstream generalization. On the other hand, measuring the coefficient of spectral decay observed in the eigen-spectrum of feature covariance provides insights into the global representation geometry. First, we empirically establish an equivalence between these notions of representation quality and show that they are inversely correlated. Second, our analysis reveals the varying roles that overparameterization plays in improving generalization. Unlike supervised learning, we observe that increasing model width leads to higher discriminability and less smoothness in the self-supervised regime. Furthermore, we report that there is no observable double descent phenomenon in SSL with non-contrastive objectives for commonly used parameterization regimes, which opens up new opportunities for tight asymptotic analysis. Taken together, our results provide a loss-aware characterization of the different role of overparameterization in supervised and self-supervised learning.

2000-01-01

(published)

www.semanticscholar.org
