Prakash Panangaden

Christopher G. Lucas

David Abel

Stefano V Albrecht

2025-01-21

ICLR.cc/2025/Conference (poster)

openreview.net

Conditions on Preference Relations that Guarantee the Existence of Optimal Policies

Jonathan Colaço Carr

Doina Precup

Learning from Preferential Feedback (LfPF) plays an essential role in training Large Language Models, as well as certain types of interactive learning agents. However, a substantial gap exists between the theory and application of LfPF algorithms. Current results guaranteeing the existence of optimal policies in LfPF problems assume that both the preferences and transition dynamics are determined by a Markov Decision Process. We introduce the Direct Preference Process, a new framework for analyzing LfPF problems in partially-observable, non-Markovian environments. Within this framework, we establish conditions that guarantee the existence of optimal policies by considering the ordinal structure of the preferences. We show that a decision-making problem can have optimal policies -- that are characterized by recursive optimality equations -- even when no reward function can express the learning goal. These findings underline the need to explore preference-based learning strategies which do not assume that preferences are generated by reward.

2024-04-17

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (published)

proceedings.mlr.press

Polynomial Lawvere Logic

Giorgio Bacci

Radu Mardare

Gordon D. Plotkin

2024-02-04

ArXiv (preprint)

Optimal Approximate Minimization of One-Letter Weighted Finite Automata

Clara Lacroce

Borja Balle

Guillaume Rabusseau

In this paper, we study the approximate minimization problem of weighted finite automata (WFAs): to compute the best possible approximation of a WFA given a bound on the number of states. By reformulating the problem in terms of Hankel matrices, we leverage classical results on the approximation of Hankel operators, namely the celebrated Adamyan-Arov-Krein (AAK) theory. We solve the optimal spectral-norm approximate minimization problem for irredundant WFAs with real weights, defined over a one-letter alphabet. We present a theoretical analysis based on AAK theory, and bounds on the quality of the approximation in the spectral norm and

2023-12-31

Mathematical Structures in Computer Science (published)

openreview.net

Policy Gradient Methods in the Presence of Symmetries and State Abstractions

Sahand Rezaei-Shoshtari

Rosie Zhao

David Meger

Doina Precup

Reinforcement learning (RL) on high-dimensional and complex problems relies on abstraction for improved efficiency and generalization. In this paper, we study abstraction in the continuous-control setting, and extend the definition of Markov decision process (MDP) homomorphisms to the setting of continuous state and action spaces. We derive a policy gradient theorem on the abstract MDP for both stochastic and deterministic policies. Our policy gradient results allow for leveraging approximate symmetries of the environment for policy optimization. Based on these theorems, we propose a family of actor-critic algorithms that are able to learn the policy and the MDP homomorphism map simultaneously, using the lax bisimulation metric. Finally, we introduce a series of environments with continuous symmetries to further demonstrate the ability of our algorithm for action abstraction in the presence of such symmetries. We demonstrate the effectiveness of our method on our environments, as well as on challenging visual control tasks from the DeepMind Control Suite. Our method's ability to utilize MDP homomorphisms for representation learning leads to improved performance, and the visualizations of the latent space clearly demonstrate the structure of the learned abstraction.

2023-12-31

J. Mach. Learn. Res. (published)

Sum and Tensor of Quantitative Effects

Giorgio Bacci

Radu Mardare

Gordon Plotkin

2023-12-31

Log. Methods Comput. Sci. (published)

Behavioural pseudometrics for continuous-time diffusions

Linan Chen

Florence Clerc

2023-12-26

ArXiv (preprint)

Propositional Logics for the Lawvere Quantale

Giorgio Bacci

Radu Mardare

Gordon Plotkin

2023-11-22

Electronic Notes in Theoretical Informatics and Computer Science (published)

Behavioural equivalences for continuous-time Markov processes

Linan Chen

Florence Clerc

2023-03-29

Mathematical Structures in Computer Science (published)

A Kernel Perspective on Behavioural Metrics for Markov Decision Processes

Pablo Samuel Castro

Tyler Kastner

Mark Rowland

We present a novel perspective on behavioural metrics for Markov decision processes via the use of positive definite kernels. We define a new metric under this lens that is provably equivalent to the recently introduced MICo distance (Castro et al., 2021). The kernel perspective enables us to provide new theoretical results, including value-function bounds and low-distortion finite-dimensional Euclidean embeddings, which are crucial when using behavioural metrics for reinforcement learning representations. We complement our theory with strong empirical results that demonstrate the effectiveness of these methods in practice.

2022-12-31

Trans. Mach. Learn. Res. (published)

openreview.net

Towards an AAK Theory Approach to Approximate Minimization in the Multi-Letter Case

Clara Lacroce

Guillaume Rabusseau

We study the approximate minimization problem of weighted finite automata (WFAs): given a WFA, we want to compute its optimal approximation when restricted to a given size. We reformulate the problem as a rank-minimization task in the spectral norm, and propose a framework to apply Adamyan-Arov-Krein (AAK) theory to the approximation problem. This approach has already been successfully applied to the case of WFAs and language modelling black boxes over one-letter alphabets \citep{AAK-WFA,AAK-RNN}. Extending the result to multi-letter alphabets requires solving the following two steps. First, we need to reformulate the approximation problem in terms of noncommutative Hankel operators and noncommutative functions, in order to apply results from multivariable operator theory. Secondly, to obtain the optimal approximation we need a version of noncommutative AAK theory that is constructive. In this paper, we successfully tackle the first step, while the second challenge remains open.

2022-05-31

ArXiv (preprint)

Augmenting Human Selves Through Artificial Agents – Lessons From the Brain

Georg Northoff

Maia Fraser

John Griffiths

Dimitris A. Pinotsis

Rosalyn Moran

Karl Friston

Much of current artificial intelligence (AI) and the drive toward artificial general intelligence (AGI) focuses on developing machines for functional tasks that humans accomplish. These may be narrowly specified tasks as in AI, or more general tasks as in AGI – but typically these tasks do not target higher-level human cognitive abilities, such as consciousness or morality; these are left to the realm of so-called “strong AI” or “artificial consciousness.” In this paper, we focus on how a machine can augment humans rather than do what they do, and we extend this beyond AGI-style tasks to augmenting peculiarly personal human capacities, such as wellbeing and morality. We base this proposal on associating such capacities with the “self,” which we define as the “environment-agent nexus”; namely, a fine-tuned interaction of brain with environment in all its relevant variables. We consider richly adaptive architectures that have the potential to implement this interaction by taking lessons from the brain. In particular, we suggest conjoining the free energy principle (FEP) with the dynamic temporo-spatial (TSD) view of neuro-mental processes. Our proposed integration of FEP and TSD – in the implementation of artificial agents – offers a novel, expressive, and explainable way for artificial agents to adapt to different environmental contexts. The targeted applications are broad: from adaptive intelligence augmenting agents (IA’s) that assist psychiatric self-regulation to environmental disaster prediction and personal assistants. This reflects the central role of the mind and moral decision-making in most of what we do as humans.

2021-12-31

Frontiers in Computational Neuroscience (published)