Erick Delage

Associate Academic Member
Full Professor, HEC Montréal, Department of Decision Sciences
Research Topics
Reinforcement Learning
Optimization

Biography

Erick Delage is a professor in the Department of Decision Sciences at HEC Montréal, holder of the Canada Research Chair in Decision Making Under Uncertainty, and a member of the College of New Scholars, Artists and Scientists of the Royal Society of Canada. His research interests span robust and stochastic optimization, decision analysis, machine learning, reinforcement learning, and risk management, with applications in portfolio optimization, inventory management, and problems related to energy and transportation.

Current Students

Postdoctorate - HEC
PhD - HEC
Postdoctorate - UdeM
Principal supervisor:
Postdoctorate - HEC
PhD - Concordia
PhD - HEC
PhD - HEC
PhD - HEC

Publications

What Matters when Modeling Human Behavior using Imitation Learning?
As AI systems become increasingly embedded in human decision-making processes, aligning their behavior with human values is critical to ensuring safe and trustworthy deployment. A central approach to AI alignment, called Imitation Learning (IL), trains a learner to directly mimic desirable human behaviors from expert demonstrations. However, standard IL methods assume that (1) experts act to optimize expected returns, and (2) expert policies are Markovian. Both assumptions are inconsistent with empirical findings from behavioral economics, according to which humans are (1) risk-sensitive and (2) make decisions based on past experience. In this work, we examine the implications of risk sensitivity for IL and show that standard approaches do not capture all optimal policies under risk-sensitive decision criteria. By characterizing these expert policies, we identify key limitations of existing IL algorithms in replicating expert performance in risk-sensitive settings. Our findings underscore the need for new IL frameworks that account for both risk-aware preferences and temporal dependencies to faithfully align AI behavior with human experts.
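For context, the standard IL baseline this abstract critiques can be illustrated with a minimal behavior-cloning sketch (toy data and a hypothetical policy network; a generic illustration of risk-neutral, Markovian imitation, not the paper's method):

```python
import torch
import torch.nn as nn

# Minimal behavior cloning: fit a Markovian policy pi(a|s) to expert
# (state, action) pairs by maximum likelihood. This is the standard
# baseline; the paper argues such learners cannot represent all
# risk-sensitive, history-dependent expert policies.

states = torch.randn(1000, 4)            # toy expert states (placeholder data)
actions = torch.randint(0, 3, (1000,))   # toy expert actions (3 discrete actions)

policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 3))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(policy(states), actions)  # negative log-likelihood of expert actions
    loss.backward()
    opt.step()
```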
Fair Resource Allocation in Weakly Coupled Markov Decision Processes
We consider fair resource allocation in sequential decision-making environments modeled as weakly coupled Markov decision processes, where resource constraints couple the action spaces of …
Planning and Learning in Risk-Aware Restless Multi-Arm Bandits
Yossiri Adulyasak
In restless multi-arm bandits, a central agent is tasked with optimally distributing limited resources across several bandits (arms), with each arm being a Markov decision process. In this work, we generalize the traditional restless multi-arm bandit problem with a risk-neutral objective by incorporating risk-awareness. We establish indexability conditions for the case of a risk-aware objective and provide a solution based on the Whittle index. In addition, we address the learning problem when the true transition probabilities are unknown by proposing a Thompson sampling approach, and we show that it achieves bounded regret that scales sublinearly with the number of episodes and quadratically with the number of arms. The efficacy of our method in reducing risk exposure in restless multi-arm bandits is illustrated through a set of numerical experiments on machine replacement and patient scheduling applications under both planning and learning setups.
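As a rough illustration of index-based planning in restless bandits (a generic Whittle-style activation rule; the `whittle_index` values below are placeholders, not the paper's risk-aware index):

```python
import numpy as np

# Generic index policy for a restless bandit: each arm reports an index
# for its current state, and the agent activates the k arms with the
# highest indices. The paper derives a risk-aware Whittle index; here the
# indices are random placeholders just to show the allocation mechanics.

rng = np.random.default_rng(0)
n_arms, budget = 10, 3

def whittle_index(arm_state):
    # Placeholder: a real implementation solves for the subsidy that makes
    # activating and resting the arm equally attractive in this state.
    return rng.normal()

states = rng.integers(0, 5, size=n_arms)    # current state of each arm
indices = np.array([whittle_index(s) for s in states])
active = np.argsort(indices)[-budget:]      # activate the top-k arms
print("activated arms:", sorted(active.tolist()))
```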
Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis
Jia Lin Hau
Mohammad Ghavamzadeh
Marek Petrik
In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are a standard metric for modeling RL agents' preferences for certain outcomes. This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees. The algorithm leverages a new, simple dynamic program (DP) decomposition for quantile MDPs. Compared with prior work, our DP decomposition requires neither known transition probabilities nor solving complex saddle point equations, and it serves as a suitable foundation for other model-free RL algorithms. Our numerical results in tabular domains show that our Q-learning algorithm converges to its DP variant and outperforms earlier algorithms.
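For readers unfamiliar with the model-free setting the paper builds on, a standard tabular Q-learning loop looks as follows (a generic risk-neutral skeleton on a toy random MDP; the paper's quantile DP decomposition, which replaces the expected-return target, is not reproduced here):

```python
import numpy as np

# Standard tabular Q-learning on a toy 5-state, 2-action MDP with random
# dynamics. The update Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
# targets the expected return; quantile MDPs replace this risk-neutral
# target with a quantile-based one.

rng = np.random.default_rng(1)
n_states, n_actions, gamma, alpha = 5, 2, 0.95, 0.1
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s,a] -> dist over s'
R = rng.normal(size=(n_states, n_actions))                        # reward table

Q = np.zeros((n_states, n_actions))
s = 0
for t in range(50_000):
    a = rng.integers(n_actions) if rng.random() < 0.1 else int(Q[s].argmax())
    s_next = rng.choice(n_states, p=P[s, a])
    Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
    s = s_next
print(Q.round(2))
```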
Conformal Inverse Optimization
Bo Lin
Timothy Chan
Inverse optimization has been increasingly used to estimate unknown parameters in an optimization model based on decision data. We show that such a point estimation is insufficient in a prescriptive setting where the estimated parameters are used to prescribe new decisions. The prescribed decisions may be of low quality and misaligned with human intuition, and thus are unlikely to be adopted. To tackle this challenge, we propose conformal inverse optimization, which seeks to learn an uncertainty set for the unknown parameters and then solve a robust optimization model to prescribe new decisions. Under mild assumptions, we show that our method enjoys provable guarantees on solution quality, as evaluated using both the ground-truth parameters and the decision maker's perception of the unknown parameters. Our method demonstrates strong empirical performance compared to classic inverse optimization.
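A minimal sketch of the two-stage pattern the abstract describes, under simplifying assumptions (linear costs, a finite candidate decision set, and an uncertainty set calibrated from sampled parameter estimates; the nonconformity score and set construction below are illustrative, not the paper's):

```python
import numpy as np

# Stage 1: turn a collection of estimated cost vectors into an uncertainty
# set by keeping those within a calibrated distance of their mean (a crude
# stand-in for a conformal quantile over a nonconformity score).
# Stage 2: prescribe the decision minimizing worst-case cost over that set.

rng = np.random.default_rng(2)
estimates = rng.normal(loc=[1.0, 2.0], scale=0.3, size=(100, 2))  # sampled cost vectors

center = estimates.mean(axis=0)
scores = np.linalg.norm(estimates - center, axis=1)   # nonconformity score
radius = np.quantile(scores, 0.9)                     # 90% calibration level
uncertainty_set = estimates[scores <= radius]

decisions = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # candidate decisions
worst_case = (decisions @ uncertainty_set.T).max(axis=1)    # max cost over the set
best = decisions[worst_case.argmin()]
print("robust decision:", best)
```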
End-to-end Conditional Robust Optimization
Abhilash Reddy Chenreddy
Robust Data-driven Prescriptiveness Optimization
Mehran Poursoltani
Angelos Georghiou
The abundance of data has led to the emergence of a variety of optimization techniques that attempt to leverage available side information to provide more anticipative decisions. The wide range of methods and contexts of application has motivated the design of a universal, unitless measure of performance known as the coefficient of prescriptiveness. This coefficient was designed to quantify both the quality of contextual decisions compared to a reference one and the prescriptive power of side information. To identify policies that maximize the former in a data-driven context, this paper introduces a distributionally robust contextual optimization model in which the coefficient of prescriptiveness substitutes for the classical empirical risk minimization objective. We present a bisection algorithm to solve this model, which relies on solving a series of linear programs when the distributional ambiguity set has an appropriate nested form and polyhedral structure. Studying a contextual shortest path problem, we evaluate the robustness of the resulting policies against alternative methods when the out-of-sample dataset is subject to varying amounts of distribution shift.
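The bisection scheme the abstract mentions follows a standard pattern: test whether a given performance level is achievable via a subproblem (a linear program in the paper's setting) and halve the search interval accordingly. A generic skeleton, with a toy monotone oracle standing in for the paper's LPs:

```python
# Generic bisection over a target level rho: `feasible(rho)` would be a
# linear program in the paper's setting; here it is a toy monotone oracle.

def feasible(rho: float) -> bool:
    # Placeholder oracle: achievability degrades monotonically in rho.
    return rho <= 0.7321

def bisect(lo: float = 0.0, hi: float = 1.0, tol: float = 1e-6) -> float:
    """Return the largest achievable level, assuming feasible() is monotone."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            lo = mid  # level achievable: search higher
        else:
            hi = mid  # level not achievable: search lower
    return lo

print(round(bisect(), 4))  # ~0.7321
```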
Crowdkeeping in Last-mile Delivery
Okan Arslan