Erick Delage

Associate Academic Member
Full Professor, HEC Montréal, Department of Decision Sciences
Research Topics
Optimization
Reinforcement Learning

Biography

Erick Delage is a professor in the Department of Decision Sciences at HEC Montréal, the Canada Research Chair in Decision Making Under Uncertainty, and a member of the College of New Scholars, Artists and Scientists of the Royal Society of Canada.

Delage’s research interests span the areas of robust and stochastic optimization, decision analysis, machine learning, reinforcement learning and risk management. He focuses on the application of these methods to portfolio optimization, inventory management, and energy and transportation problems.

Current Students

Postdoctorate - HEC Montréal
Collaborating Alumni - Université de Montréal
Principal supervisor:
Postdoctorate - HEC Montréal
PhD - Concordia University
PhD - HEC Montréal
PhD - HEC Montréal

Publications

Epistemic Robust Offline Reinforcement Learning
Abhilash Reddy Chenreddy
Offline reinforcement learning learns policies from fixed datasets without further environment interaction. A key challenge in this setting is epistemic uncertainty, arising from limited or biased data coverage, particularly when the behavior policy systematically avoids certain actions. This can lead to inaccurate value estimates and unreliable generalization. Ensemble-based methods like SAC-N mitigate this by conservatively estimating Q-values using the ensemble minimum, but they require large ensembles and often conflate epistemic with aleatoric uncertainty. To address these limitations, we propose a unified and generalizable framework that replaces discrete ensembles with compact uncertainty sets over Q-values. We further introduce an Epinet-based model that directly shapes the uncertainty sets to optimize the cumulative reward under the robust Bellman objective without relying on ensembles. We also introduce a benchmark for evaluating offline RL algorithms under risk-sensitive behavior policies, and demonstrate that our method achieves improved robustness and generalization over ensemble-based baselines across both tabular and continuous state domains.
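The ensemble-minimum baseline that the abstract contrasts against can be illustrated with a minimal sketch. This is a generic illustration of the SAC-N-style conservative estimate, not code from the paper; the function name and array shapes are assumptions.

```python
import numpy as np

def conservative_q(q_heads):
    """SAC-N-style conservative value estimate: the elementwise
    minimum over an ensemble of Q-value heads (shape: N x actions)."""
    return np.min(np.asarray(q_heads), axis=0)

# Three Q-heads disagreeing on a rarely seen second action:
# the minimum penalizes the action whose value is most uncertain.
q_heads = [[1.0, 3.0], [1.1, 0.5], [0.9, 2.0]]
print(conservative_q(q_heads))  # [0.9 0.5]
```

The paper's proposal replaces this discrete minimum over N heads with a compact uncertainty set over Q-values.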
Joint Rolling Stock and Crew Scheduling with Multi-train Composition in Urban Rail Networks
Entai Wang
Lixing Yang
Jean-François Cordeau
Ziyou Gao
Yossiri Adulyasak
Rolling stock scheduling and crew scheduling are two fundamental problems that arise in the planning of urban rail operations and that are especially important in the case of flexible operations in real-world networks. These problems are often solved separately and sequentially in different planning stages, resulting in limited options to adjust crew schedules after rolling stock decisions have been made. To better adjust these two decision-making processes and achieve better solutions, this paper studies a joint rolling stock and crew scheduling problem in urban rail networks. A novel optimization model is formulated with the aim of reducing the operational cost of rolling stock units and crew members. In addition, the multi-train composition mode is considered to adequately match different frequency requirements and rolling stock transport capacities. To solve the model, a customized branch-and-price-and-cut solution algorithm is proposed to find the optimal schedule schemes, in which Benders decomposition is used to solve the linear programming relaxation of the path-based reformulation. Two customized column generation methods with label correcting are embedded to solve the master problem and pricing subproblem for generating paths (columns) corresponding to rolling stock units and crew groups, respectively. Finally, a branch-and-bound procedure with several acceleration techniques is proposed to find integer solutions. To demonstrate the computational performance and the robustness of the proposed approaches, a series of numerical experiments are performed on real-world instances of the Beijing urban rail network under different settings. The computational results confirm the high efficiency of the solution methodology and the benefits of the flexible operation schemes based on the solutions found by the proposed methods. Funding: This work was supported by National Natural Science Foundation of China [Grants 72288101, 72322022, 72371015].
The first author sincerely thanks the China Scholarship Council for supporting his visiting PhD program [Grant 202407090173]. Supplemental Material: The electronic companion is available at https://doi.org/10.1287/trsc.2024.0905 .
Reward Redistribution for CVaR MDPs using a Bellman Operator on L-infinity
Tail-end risk measures such as static conditional value-at-risk (CVaR) are used in safety-critical applications to prevent rare, yet catastrophic events. Unlike risk-neutral objectives, the static CVaR of the return depends on entire trajectories without admitting a recursive Bellman decomposition in the underlying Markov decision process. A classical resolution relies on state augmentation with a continuous variable. However, unless restricted to a specialized class of admissible value functions, this formulation induces sparse rewards and degenerate fixed points. In this work, we propose a novel formulation of the static CVaR objective based on augmentation. Our alternative approach leads to a Bellman operator with: (1) dense per-step rewards; (2) contracting properties on the full space of bounded value functions. Building on this theoretical foundation, we develop risk-averse value iteration and model-free Q-learning algorithms that rely on discretized augmented states. We further provide convergence guarantees and approximation error bounds due to discretization. Empirical results demonstrate that our algorithms successfully learn CVaR-sensitive policies and achieve effective performance-safety trade-offs.
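The static CVaR objective discussed in this abstract can be estimated from sampled returns with a short sketch. This is a generic empirical estimator for illustration, not the paper's Bellman-operator algorithm; the function name is an assumption.

```python
import numpy as np

def empirical_cvar(returns, alpha=0.1):
    """Empirical CVaR_alpha of the return: the mean of the worst
    alpha-fraction of sampled returns (lower tail = bad outcomes)."""
    sorted_r = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(sorted_r))))
    return sorted_r[:k].mean()

returns = np.arange(1.0, 11.0)       # ten sampled returns: 1..10
print(empirical_cvar(returns, 0.2))  # mean of the two worst -> 1.5
```

Because this quantity depends on the tail of the whole-trajectory return distribution, it does not decompose recursively per step, which is the difficulty the paper's augmented formulation addresses.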
Boosting CVaR Policy Optimization with Quantile Gradients
Optimizing Conditional Value-at-Risk (CVaR) using policy gradient (a.k.a. CVaR-PG) faces significant challenges of sample inefficiency. This inefficiency stems from the fact that it focuses on tail-end performance and overlooks many sampled trajectories. We address this problem by augmenting CVaR with an expected quantile term. Quantile optimization admits a dynamic programming formulation that leverages all sampled data and thus improves sample efficiency. This does not alter the CVaR objective, since CVaR corresponds to the expectation of the quantile over the tail. Empirical results in domains with verifiable risk-averse behavior show that our algorithm within the Markovian policy class substantially improves upon CVaR-PG and consistently outperforms other existing methods.
Planning and Learning in Average Risk-aware MDPs
For continuing tasks, average cost Markov decision processes have well-documented value and can be solved using efficient algorithms. However, these formulations explicitly assume that the agent is risk-neutral. In this work, we extend risk-neutral algorithms to accommodate the more general class of dynamic risk measures. Specifically, we propose a relative value iteration (RVI) algorithm for planning and design two model-free Q-learning algorithms, namely a generic algorithm based on the multi-level Monte Carlo (MLMC) method, and an off-policy algorithm dedicated to utility-based shortfall risk measures. Both the RVI and MLMC-based Q-learning algorithms are proven to converge to optimality. Numerical experiments validate our analysis, confirm empirically the convergence of the off-policy algorithm, and demonstrate that our approach enables the identification of policies that are finely tuned to the intricate risk-awareness of the agent that they serve.
What Matters when Modeling Human Behavior using Imitation Learning?
As AI systems become increasingly embedded in human decision-making processes, aligning their behavior with human values is critical to ensuring safe and trustworthy deployment. A central approach to AI alignment, called imitation learning (IL), trains a learner to directly mimic desirable human behaviors from expert demonstrations. However, standard IL methods assume that (1) experts act to optimize expected returns; and (2) expert policies are Markovian. Both assumptions are inconsistent with empirical findings from behavioral economics, according to which humans are (1) risk-sensitive; and (2) make decisions based on past experience. In this work, we examine the implications of risk sensitivity for IL and show that standard approaches do not capture all optimal policies under risk-sensitive decision criteria. By characterizing these expert policies, we identify key limitations of existing IL algorithms in replicating expert performance in risk-sensitive settings. Our findings underscore the need for new IL frameworks that account for both risk-aware preferences and temporal dependencies to faithfully align AI behavior with human experts.
Fair Resource Allocation in Weakly Coupled Markov Decision Processes
We consider fair resource allocation in sequential decision-making environments modeled as weakly coupled Markov decision processes, where resource constraints couple the action spaces of …
Planning and Learning in Risk-Aware Restless Multi-Arm Bandits
Yossiri Adulyasak
Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis
Jia Lin Hau
Mohammad Ghavamzadeh
Marek Petrik
In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are a standard metric for modeling RL agents' preferences for certain outcomes. This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees. The algorithm leverages a new, simple dynamic program (DP) decomposition for quantile MDPs. Compared with prior work, our DP decomposition requires neither known transition probabilities nor solving complex saddle point equations and serves as a suitable foundation for other model-free RL algorithms. Our numerical results in tabular domains show that our Q-learning algorithm converges to its DP variant and outperforms earlier algorithms.
Learning-to-Optimize for Consolidation and Transshipment in Multi-store Order Delivery
Okan Arslan
Jean-François Cordeau
A Survey of Contextual Optimization Methods for Decision-Making under Uncertainty
Utsav Sadana
Alexandre Forel
Thibaut Vidal
Conformal Inverse Optimization
Timothy Chan
Inverse optimization has been increasingly used to estimate unknown parameters in an optimization model based on decision data. We show that such point estimation is insufficient in a prescriptive setting where the estimated parameters are used to prescribe new decisions. The prescribed decisions may be low-quality and misaligned with human intuition and thus are unlikely to be adopted. To tackle this challenge, we propose conformal inverse optimization, which seeks to learn an uncertainty set for the unknown parameters and then solve a robust optimization model to prescribe new decisions. Under mild assumptions, we show that our method enjoys provable guarantees on solution quality, as evaluated using both the ground-truth parameters and the decision maker's perception of the unknown parameters. Our method demonstrates strong empirical performance compared to classic inverse optimization.