Opening Conference | Building Safer AI for Youth Mental Health
On March 16, starting at 9 AM, join leading AI researchers, clinical experts, and voices from the ground for an event exploring the frameworks needed to design AI that is not only powerful, but also safe for mental health.
TRAIL: Responsible AI for Professionals and Leaders
Learn how to integrate responsible AI practices into your organization with TRAIL. Join our information session on March 12, where you’ll discover the program in detail and have the chance to ask all your questions.
We use cookies to analyze the browsing and usage of our website and to personalize your experience. You can disable these technologies at any time, but this may limit certain functionalities of the site. Read our Privacy Policy for more information.
Setting cookies
You can enable and disable the types of cookies you wish to accept. However certain choices you make could affect the services offered on our sites (e.g. suggestions, personalised ads, etc.).
Essential cookies
These cookies are necessary for the operation of the site and cannot be deactivated. (Still active)
Analytics cookies
Do you accept the use of cookies to measure the audience of our sites?
Multimedia Player
Do you accept the use of cookies to display and allow you to watch the video content hosted by our partners (YouTube, etc.)?
For continuing tasks, average cost Markov decision processes have well-documented value and can be solved using efficient algorithms. Howeve… (see more)r, it explicitly assumes that the agent is risk-neutral. In this work, we extend risk-neutral algorithms to accommodate the more general class of dynamic risk measures. Specifically, we propose a relative value iteration (RVI) algorithm for planning and design two model-free Q-learning algorithms, namely a generic algorithm based on the multi-level Monte Carlo (MLMC) method, and an off-policy algorithm dedicated to utility-based shortfall risk measures. Both the RVI and MLMC-based Q-learning algorithms are proven to converge to optimality. Numerical experiments validate our analysis, confirm empirically the convergence of the off-policy algorithm, and demonstrate that our approach enables the identification of policies that are finely tuned to the intricate risk-awareness of the agent that they serve.