
Erick Delage

Associate Academic Member
Full Professor, HEC Montréal, Department of Decision Sciences
Research Topics
Optimization
Reinforcement Learning

Biography

Erick Delage is a professor in the Department of Decision Sciences at HEC Montréal, the Canada Research Chair in Decision Making Under Uncertainty, and a member of the College of New Scholars, Artists and Scientists of the Royal Society of Canada.

Delage’s research interests span robust and stochastic optimization, decision analysis, machine learning, reinforcement learning and risk management. He focuses on applying these methods to portfolio optimization, inventory management, and energy and transportation problems.

Current Students

Postdoctorate - HEC Montréal
PhD - HEC Montréal
Postdoctorate - Université de Montréal
Principal supervisor:
Postdoctorate - HEC Montréal
PhD - Concordia University
PhD - HEC Montréal
PhD - HEC Montréal
PhD - HEC Montréal

Publications

What Matters when Modeling Human Behavior using Imitation Learning?
As AI systems become increasingly embedded in human decision-making processes, aligning their behavior with human values is critical to ensuring safe and trustworthy deployment. A central approach to AI alignment, called Imitation Learning (IL), trains a learner to directly mimic desirable human behaviors from expert demonstrations. However, standard IL methods assume that (1) experts act to optimize expected returns and (2) expert policies are Markovian. Both assumptions are inconsistent with empirical findings from behavioral economics, according to which humans (1) are risk-sensitive and (2) make decisions based on past experience. In this work, we examine the implications of risk sensitivity for IL and show that standard approaches do not capture all optimal policies under risk-sensitive decision criteria. By characterizing these expert policies, we identify key limitations of existing IL algorithms in replicating expert performance in risk-sensitive settings. Our findings underscore the need for new IL frameworks that account for both risk-aware preferences and temporal dependencies to faithfully align AI behavior with human experts.
Fair Resource Allocation in Weakly Coupled Markov Decision Processes
We consider fair resource allocation in sequential decision-making environments modeled as weakly coupled Markov decision processes, where resource constraints couple the action spaces of
Planning and Learning in Risk-Aware Restless Multi-Arm Bandits
Yossiri Adulyasak
In restless multi-arm bandits, a central agent is tasked with optimally distributing limited resources across several bandits (arms), with each arm being a Markov decision process. In this work, we generalize the traditional restless multi-arm bandit problem with a risk-neutral objective by incorporating risk-awareness. We establish indexability conditions for the case of a risk-aware objective and provide a solution based on the Whittle index. In addition, we address the learning problem when the true transition probabilities are unknown by proposing a Thompson sampling approach and show that it achieves bounded regret that scales sublinearly with the number of episodes and quadratically with the number of arms. The efficacy of our method in reducing risk exposure in restless multi-arm bandits is illustrated through a set of numerical experiments in the contexts of machine replacement and patient scheduling applications under both planning and learning setups.
Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis
Jia Lin Hau
Mohammad Ghavamzadeh
Marek Petrik
In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are a standard metric for modeling RL agents' preferences for certain outcomes. This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees. The algorithm leverages a new, simple dynamic program (DP) decomposition for quantile MDPs. Compared with prior work, our DP decomposition requires neither known transition probabilities nor solving complex saddle point equations and serves as a suitable foundation for other model-free RL algorithms. Our numerical results in tabular domains show that our Q-learning algorithm converges to its DP variant and outperforms earlier algorithms.
Conformal Inverse Optimization
Bo Lin
Timothy Chan
Inverse optimization has been increasingly used to estimate unknown parameters in an optimization model based on decision data. We show that such a point estimation is insufficient in a prescriptive setting where the estimated parameters are used to prescribe new decisions. The prescribed decisions may be low-quality and misaligned with human intuition and thus are unlikely to be adopted. To tackle this challenge, we propose conformal inverse optimization, which seeks to learn an uncertainty set for the unknown parameters and then solve a robust optimization model to prescribe new decisions. Under mild assumptions, we show that our method enjoys provable guarantees on solution quality, as evaluated using both the ground-truth parameters and the decision maker's perception of the unknown parameters. Our method demonstrates strong empirical performance compared to classic inverse optimization.
End-to-end Conditional Robust Optimization
Abhilash Reddy Chenreddy
Robust Data-driven Prescriptiveness Optimization
Mehran Poursoltani
Angelos Georghiou
The abundance of data has led to the emergence of a variety of optimization techniques that attempt to leverage available side information to provide more anticipative decisions. The wide range of methods and contexts of application have motivated the design of a universal unitless measure of performance known as the coefficient of prescriptiveness. This coefficient was designed to quantify both the quality of contextual decisions compared to a reference one and the prescriptive power of side information. To identify policies that maximize the former in a data-driven context, this paper introduces a distributionally robust contextual optimization model where the coefficient of prescriptiveness substitutes for the classical empirical risk minimization objective. We present a bisection algorithm to solve this model, which relies on solving a series of linear programs when the distributional ambiguity set has an appropriate nested form and polyhedral structure. Studying a contextual shortest path problem, we evaluate the robustness of the resulting policies against alternative methods when the out-of-sample dataset is subject to varying amounts of distribution shift.
Crowdkeeping in Last-mile Delivery
Okan Arslan