Weikai Wang

PhD - HEC

Principal supervisor

Erick Delage

Research topics

Reinforcement learning

Optimization

Publications

Planning and Learning in Average Risk-aware MDPs

Weikai Wang

Erick Delage

For continuing tasks, average cost Markov decision processes have well-documented value and can be solved using efficient algorithms. However, this framework explicitly assumes that the agent is risk-neutral. In this work, we extend risk-neutral algorithms to accommodate the more general class of dynamic risk measures. Specifically, we propose a relative value iteration (RVI) algorithm for planning and design two model-free Q-learning algorithms, namely a generic algorithm based on the multi-level Monte Carlo (MLMC) method, and an off-policy algorithm dedicated to utility-based shortfall risk measures. Both the RVI and MLMC-based Q-learning algorithms are proven to converge to optimality. Numerical experiments validate our analysis, confirm empirically the convergence of the off-policy algorithm, and demonstrate that our approach enables the identification of policies that are finely tuned to the intricate risk-awareness of the agent that they serve.
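To make the planning idea in the abstract concrete, here is a minimal sketch of relative value iteration in which the usual expectation is swapped for a one-step risk mapping. It is not the authors' implementation. It assumes a small finite MDP with a known transition array `P[a, s, s']` and cost array `c[s, a]`. The function names, the reference-state normalization, and the use of the entropic risk measure as the risk-aware example are all illustrative assumptions, and convergence of this simple loop is not guaranteed for every MDP (for instance, periodic chains).

```python
import numpy as np


def expectation(values, probs):
    """Risk-neutral one-step mapping: the plain expected value."""
    return float(np.dot(probs, values))


def entropic_risk(values, probs, beta=1.0):
    """Entropic risk (1/beta) * log E[exp(beta * X)], computed stably."""
    m = np.max(values)
    return float(m + np.log(np.dot(probs, np.exp(beta * (values - m)))) / beta)


def relative_value_iteration(P, c, risk=expectation, ref_state=0,
                             tol=1e-10, max_iter=10_000):
    """Hypothetical helper: RVI with a pluggable one-step risk mapping.

    P[a, s, s2] holds transition probabilities and c[s, a] holds costs.
    Returns (gain estimate, relative value function h, greedy policy).
    """
    n_actions, n_states, _ = P.shape
    h = np.zeros(n_states)
    gain = 0.0
    q = np.zeros((n_states, n_actions))
    for _ in range(max_iter):
        # Risk-adjusted cost-to-go of each (state, action) pair.
        q = np.array([[risk(c[s, a] + h, P[a, s]) for a in range(n_actions)]
                      for s in range(n_states)])
        Th = q.min(axis=1)
        # Subtract the value at the reference state to keep h bounded.
        gain = Th[ref_state]
        h_new = Th - gain
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    return gain, h, q.argmin(axis=1)
```

Passing `risk=expectation` gives the standard risk-neutral relative value iteration, while passing `risk=entropic_risk` yields a risk-averse variant whose gain estimate is never below the risk-neutral one on the same problem.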

2025-09-17

NeurIPS.cc/2025/Conference (poster)

doi.org

openreview.net

