Erick Delage

Postdoctorat - HEC

Abhilash Chenreddy

Doctorat - HEC

Esther Derman

Collaborateur·rice alumni - UdeM

Superviseur⋅e principal⋅e :

Yudong Luo

Postdoctorat - HEC

Lidiia Shchichko

Doctorat - Concordia

Xiaohui Tu

Doctorat - HEC

Weikai Wang

Doctorat - HEC

Publications

What Matters when Modeling Human Behavior using Imitation Learning?

As AI systems become increasingly embedded in human decision-making process, aligning their behavior with human values is critical to ensuri… (voir plus)ng safe and trustworthy deployment. A central approach to AI Alignment called Imitation Learning (IL), trains a learner to directly mimic desirable human behaviors from expert demonstrations. However, standard IL methods assume that (1) experts act to optimize expected returns; (2) expert policies are Markovian. Both assumptions are inconsistent with empirical findings from behavioral economics, according to which humans are (1) risk-sensitive; and (2) make decisions based on past experience. In this work, we examine the implications of risk sensitivity for IL and show that standard approaches do not capture all optimal policies under risk-sensitive decision criteria. By characterizing these expert policies, we identify key limitations of existing IL algorithms in replicating expert performance in risk-sensitive settings. Our findings underscore the need for new IL frameworks that account for both risk-aware preferences and temporal dependencies to faithfully align AI behavior with human experts.

2025-06-10

ICML.cc/2025/Workshop/MoFA (poster)

Fair Resource Allocation in Weakly Coupled Markov Decision Processes

Xiaohui Tu

Yossiri Adulyasak

We consider fair resource allocation in sequential decision-making environments modeled as weakly coupled Markov decision processes, where r… (voir plus)esource constraints couple the action spaces of

2025-04-23

Proceedings of The 28th International Conference on Artificial Intelligence and Statistics (publié)

Planning and Learning in Risk-Aware Restless Multi-Arm Bandits

Yossiri Adulyasak

2025-04-23

Proceedings of The 28th International Conference on Artificial Intelligence and Statistics (publié)

Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis

Jia Lin Hau

Esther Derman

Mohammad Ghavamzadeh

Marek Petrik

2025-04-23

Proceedings of The 28th International Conference on Artificial Intelligence and Statistics (publié)

Fair Resource Allocation in Weakly Coupled Markov Decision Processes

Xiaohui Tu

Yossiri Adulyasak

2025-01-22

aistats.org/AISTATS/2025/Conference (poster)

Planning and Learning in Risk-Aware Restless Multi-Arm Bandits

Yossiri Adulyasak

In restless multi-arm bandits, a central agent is tasked with optimally distributing limited resources across several bandits (arms), with e… (voir plus)ach arm being a Markov decision process. In this work, we generalize the traditional restless multi-arm bandit problem with a risk-neutral objective by incorporating risk-awareness. We establish indexability conditions for the case of a risk-aware objective and provide a solution based on Whittle index. In addition, we address the learning problem when the true transition probabilities are unknown by proposing a Thompson sampling approach and show that it achieves bounded regret that scales sublinearly with the number of episodes and quadratically with the number of arms. The efficacy of our method in reducing risk exposure in restless multi-arm bandits is illustrated through a set of numerical experiments in the contexts of machine replacement and patient scheduling applications under both planning and learning setups.

2025-01-22

aistats.org/AISTATS/2025/Conference (poster)

Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis

Jia Lin Hau

Esther Derman

Mohammad Ghavamzadeh

Marek Petrik

In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are a standard metric for modeling RL agents' preferences … (voir plus)for certain outcomes. This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees. The algorithm leverages a new, simple dynamic program (DP) decomposition for quantile MDPs. Compared with prior work, our DP decomposition requires neither known transition probabilities nor solving complex saddle point equations and serves as a suitable foundation for other model-free RL algorithms. Our numerical results in tabular domains show that our Q-learning algorithm converges to its DP variant and outperforms earlier algorithms.

2025-01-22

aistats.org/AISTATS/2025/Conference (poster)

A Survey of Contextual Optimization Methods for Decision Making under Uncertainty

Utsav Sadana

Abhilash Reddy Chenreddy

Alexandre Forel

Emma Frejinger

Thibaut Vidal

2025-01-01

European Journal of Operational Research (publié)

doi.org

arxiv.org

Conformal Inverse Optimization

Bo Lin

Timothy Chan

Inverse optimization has been increasingly used to estimate unknown parameters in an optimization model based on decision data. We show that… (voir plus) such a point estimation is insufficient in a prescriptive setting where the estimated parameters are used to prescribe new decisions. The prescribed decisions may be low-quality and misaligned with human intuition and thus are unlikely to be adopted. To tackle this challenge, we propose conformal inverse optimization, which seeks to learn an uncertainty set for the unknown parameters and then solve a robust optimization model to prescribe new decisions. Under mild assumptions, we show that our method enjoys provable guarantees on solution quality, as evaluated using both the ground-truth parameters and the decision maker's perception of the unknown parameters. Our method demonstrates strong empirical performance compared to classic inverse optimization.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

End-to-end Conditional Robust Optimization

Abhilash Reddy Chenreddy

2024-09-12

Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence (publié)

Robust Data-driven Prescriptiveness Optimization

Mehran Poursoltani

Angelos Georghiou

The abundance of data has led to the emergence of a variety of optimization techniques that attempt to leverage available side information t… (voir plus)o provide more anticipative decisions. The wide range of methods and contexts of application have motivated the design of a universal unitless measure of performance known as the coefficient of prescriptiveness. This coefficient was designed to quantify both the quality of contextual decisions compared to a reference one and the prescriptive power of side information. To identify policies that maximize the former in a data-driven context, this paper introduces a distributionally robust contextual optimization model where the coefficient of prescriptiveness substitutes for the classical empirical risk minimization objective. We present a bisection algorithm to solve this model, which relies on solving a series of linear programs when the distributional ambiguity set has an appropriate nested form and polyhedral structure. Studying a contextual shortest path problem, we evaluate the robustness of the resulting policies against alternative methods when the out-of-sample dataset is subject to varying amounts of distribution shift.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (publié)

doi.org

Robust Data-driven Prescriptiveness Optimization

Mehran Poursoltani

Angelos Georghiou

2024-05-01

ICML.cc/2024/Conference (poster)

doi.org