Publications

A Lightweight Library for Energy-Based Joint-Embedding Predictive Architectures
Randall Balestriero
Megi Dervishi
David Fan
Quentin Garrido
Tushar Nagarajan
Wancong Zhang
Michael G. Rabbat
Amir Bar
We present EB-JEPA, an open-source library for learning representations and world models using Joint-Embedding Predictive Architectures (JEPAs). JEPAs learn to predict in representation space rather than pixel space, avoiding the pitfalls of generative modeling while capturing semantically meaningful features suitable for downstream tasks. Our library provides modular, self-contained implementations that illustrate how representation learning techniques developed for image-level self-supervised learning can transfer to video, where temporal dynamics add complexity, and ultimately to action-conditioned world models, where the model must additionally learn to predict the effects of control inputs. Each example is designed for single-GPU training within a few hours, making energy-based self-supervised learning accessible for research and education. We provide ablations of JEPA components on CIFAR-10. Probing these representations yields 91% accuracy, indicating that the model learns useful features. Extending to video, we include a multi-step prediction example on Moving MNIST that demonstrates how the same principles scale to temporal modeling. Finally, we show how these representations can drive action-conditioned world models, achieving a 97% planning success rate on the Two Rooms navigation task. Comprehensive ablations reveal the critical importance of each regularization component for preventing representation collapse. Code is available at https://github.com/facebookresearch/eb_jepa.
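A minimal numpy sketch of the JEPA idea the abstract describes: predict a target's representation from a context's representation, with a variance regularizer (VICReg-style) standing in for the collapse-prevention mechanisms the abstract ablates. The encoder, predictor, and loss weights here are illustrative assumptions, not the library's API.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W):
    # toy linear encoder mapping inputs to representation space
    return x @ W

def jepa_loss(x_ctx, x_tgt, W, P, var_coef=1.0, eps=1e-4):
    """Toy JEPA-style objective: a prediction loss in representation
    space, plus a hinge on per-dimension standard deviation that
    discourages all representations collapsing to a constant."""
    z_ctx = encoder(x_ctx, W)
    z_tgt = encoder(x_tgt, W)        # in practice often a separate/EMA target encoder
    pred = z_ctx @ P                 # predictor operates in representation space
    pred_loss = np.mean((pred - z_tgt) ** 2)
    std = np.sqrt(z_ctx.var(axis=0) + eps)
    var_loss = np.mean(np.maximum(0.0, 1.0 - std))
    return pred_loss + var_coef * var_loss

x_ctx = rng.normal(size=(32, 8))
x_tgt = x_ctx + 0.1 * rng.normal(size=x_ctx.shape)  # a slightly perturbed "view"
W = rng.normal(size=(8, 4))
P = np.eye(4)
loss = jepa_loss(x_ctx, x_tgt, W, P)
```

Because the loss never compares pixels, a trivial constant encoder would drive the prediction term to zero; the variance term is what rules that solution out.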
Lloyd's $K$-Means Clustering Algorithm is Frank-Wolfe in Disguise
Michael Pokojovy
J. Marcus Jobe
Lloyd's …
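As background for the title's claim, a minimal numpy sketch of Lloyd's classical iterations (assignment step, then centroid update); the paper's Frank-Wolfe correspondence itself is not reproduced here.

```python
import numpy as np

def lloyd_kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd iterations: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # update step: centroid = mean of its cluster (keep old if empty)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# two well-separated Gaussian blobs as a toy input
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
centroids, labels = lloyd_kmeans(X, k=2)
```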
Local Inconsistency Resolution: The Interplay between Attention and Control in Probabilistic Models
We present a generic algorithm for learning and approximate inference with an intuitive epistemic interpretation: iteratively focus on a subset of the model and resolve inconsistencies using the parameters under control. This framework, which we call Local Inconsistency Resolution (LIR), is built upon Probabilistic Dependency Graphs (PDGs), which provide a flexible representational foundation capable of capturing inconsistent beliefs. We show how LIR unifies and generalizes a wide variety of important algorithms in the literature, including the Expectation-Maximization (EM) algorithm, belief propagation, adversarial training, GANs, and GFlowNets. Each of these methods can be recovered as a specific instance of LIR by choosing a procedure to direct focus (attention and control). We implement this algorithm for discrete PDGs and study its properties on synthetically generated PDGs, comparing its behavior to the global optimization semantics of the full PDG.
Observational Study of Maternal and Fetal Outcome in Posterior Reversible Encephalopathy Syndrome in Eclamptic Women in a Tertiary Care Institute
Prerna Kailashchand Gupta
Meenal Shailesh Sarmalkar
Madhuri A Mehendale
Reward Redistribution for CVaR MDPs using a Bellman Operator on L-infinity
Tail-end risk measures such as static conditional value-at-risk (CVaR) are used in safety-critical applications to prevent rare, yet catastrophic events. Unlike risk-neutral objectives, the static CVaR of the return depends on entire trajectories without admitting a recursive Bellman decomposition in the underlying Markov decision process. A classical resolution relies on state augmentation with a continuous variable. However, unless restricted to a specialized class of admissible value functions, this formulation induces sparse rewards and degenerate fixed points. In this work, we propose a novel formulation of the static CVaR objective based on augmentation. Our alternative approach leads to a Bellman operator with: (1) dense per-step rewards; (2) contracting properties on the full space of bounded value functions. Building on this theoretical foundation, we develop risk-averse value iteration and model-free Q-learning algorithms that rely on discretized augmented states. We further provide convergence guarantees and approximation error bounds due to discretization. Empirical results demonstrate that our algorithms successfully learn CVaR-sensitive policies and achieve effective performance-safety trade-offs.
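For concreteness, a small sketch of the risk measure the abstract targets: the empirical static CVaR of a return distribution is the mean of its worst alpha-fraction of outcomes (sign conventions vary; here lower returns are worse). This illustrates the objective only, not the paper's augmented Bellman operator.

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Empirical CVaR_alpha of the return: the mean of the worst
    alpha-fraction of sampled outcomes (lower return = worse)."""
    returns = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(returns))))
    return returns[:k].mean()

samples = np.array([1.0, 2.0, 3.0, 4.0, -10.0, 5.0, 6.0, 7.0, 8.0, 9.0])
print(cvar(samples, alpha=0.2))   # mean of the two worst returns -> -4.5
```

Because this quantity averages over whole-trajectory returns, it has no per-step recursive form in the original MDP, which is what motivates the augmentation discussed above.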
Robust Intervention Learning from Emergency Stop Interventions
Ethan Pronovost
Siddhartha Srinivasa
The Role of Causal Features in Strategic Classification for Robustness and Alignment
In strategic classification, an institution (e.g., a bank) anticipates adaptation from users who change their features to increase utility in a classification task (e.g., loan repayment). Since a key challenge is the distribution shift induced by users, we turn to causal models, which have been shown to bound the worst-case out-of-distribution (OOD) risk, and establish several new results that link causality and strategic classification. First, we show that causal classification leads to optimal classification error after any sufficiently large adaptation, when the noise is bounded in a certain way. Second, when these assumptions do not hold, we show OOD cross-entropy risk of optimal classifiers decomposes into an OOD bias term and a term arising from not using all observable features, allowing us to determine when causal classifiers have an advantage. Finally, we show that causal classifiers can align long-term incentives between institutions and users, contrasting with previous work that highlights social costs of such approaches. We validate our theory empirically on synthetic data, finding that our results predict behavior in practice.
On the Role of Depth in the Expressivity of RNNs
The benefits of depth in feedforward neural networks (FNNs) are well known: composing multiple layers of linear transformations with nonlinear activations enables complex computations. While similar effects are expected in recurrent neural networks (RNNs), it remains unclear how depth interacts with recurrence to shape expressive power. Here, we formally show that depth increases RNNs’ memory capacity efficiently with respect to parameters, enhancing expressivity both by enabling more complex input transformations and improving the retention of past information. We extend our analysis to 2RNNs, a generalization of RNNs with multiplicative interactions between inputs and hidden states. Unlike RNNs, which remain linear without nonlinear activations, 2RNNs perform polynomial transformations whose maximal degree grows with depth. We further show that multiplicative interactions cannot, in general, be replaced by layerwise nonlinearities. Finally, we validate these insights empirically on synthetic and real-world tasks.
Tractable Shapley Values and Interactions via Tensor Networks
Farzaneh Heidari
Chao Li
We show how to replace the …
How Notations Evolve: A Historical Analysis with Implications for Supporting User-Defined Abstractions
J.D. Zamfirescu-Pereira
Elena L. Glassman
Damien Masson
Opposite impact of thermal expansion and phonon anharmonicity on the phonon-limited resistivity of elemental metals from first principles
Ao Wang
Junwen Yin
Félix Antoine Goudreault
Olle Hellman
Samuel Poncé
Understanding electrical resistivity in metals remains a central challenge in quantifying charge transport at finite temperature. Current first-principles calculations based on the Boltzmann transport equation often match experiments, yet they almost always neglect the effect of thermal expansion and phonon anharmonicity. We show that both effects exert an opposite impact on electron-phonon coupling and on electrical resistivity. Thermal expansion enhances the coupling and leads to overestimation of resistivity, whereas anharmonic effects reduce it. By explicitly incorporating both effects, we establish a more complete description of resistivity in elemental metals, demonstrated here for Pb, Nb, and Al.
Sub-optimality bounds for certainty equivalent policies in partially observed systems
Ashutosh Nayyar
Yi Ouyang
In this paper, we present a generalization of the certainty equivalence principle of stochastic control. One interpretation of the classical certainty equivalence principle for linear systems with output feedback and quadratic costs is as follows: the optimal action at each time is obtained by evaluating the optimal state-feedback policy of the stochastic linear system at the minimum mean square error (MMSE) estimate of the state. Motivated by this interpretation, we consider certainty equivalent policies for general (non-linear) partially observed stochastic systems that allow for any state estimate rather than restricting to MMSE estimates. In such settings, the certainty equivalent policy is not optimal. For models where the cost and the dynamics are smooth in an appropriate sense, we derive upper bounds on the sub-optimality of certainty equivalent policies. We present several examples to illustrate the results.
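A minimal sketch of the classical interpretation the abstract starts from, for a scalar linear-quadratic system: solve the Riccati equation for the optimal state-feedback gain, then evaluate that feedback law at a state estimate (classically the MMSE estimate; here any estimate x_hat). The system parameters are illustrative; the paper's sub-optimality bounds are not reproduced.

```python
import numpy as np

def lqr_gain_scalar(a, b, q, r, iters=200):
    """Solve the scalar discrete-time Riccati equation by fixed-point
    iteration and return the optimal state-feedback gain K."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

# certainty equivalent control: apply the optimal state-feedback law,
# not at the true (unobserved) state, but at a state estimate x_hat
a, b, q, r = 1.0, 1.0, 1.0, 1.0
K = lqr_gain_scalar(a, b, q, r)
x_hat = 0.7                    # some state estimate produced by a filter
u = -K * x_hat                 # certainty equivalent action
```

In the linear-quadratic-Gaussian case with the MMSE estimate, this action is exactly optimal; the abstract's point is to bound how far it is from optimal for general systems and general estimates.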