Erick Delage

As AI systems become increasingly embedded in human decision-making process, aligning their behavior with human values is critical to ensuri… (see more)ng safe and trustworthy deployment. A central approach to AI Alignment called Imitation Learning (IL), trains a learner to directly mimic desirable human behaviors from expert demonstrations. However, standard IL methods assume that (1) experts act to optimize expected returns; (2) expert policies are Markovian. Both assumptions are inconsistent with empirical findings from behavioral economics, according to which humans are (1) risk-sensitive; and (2) make decisions based on past experience. In this work, we examine the implications of risk sensitivity for IL and show that standard approaches do not capture all optimal policies under risk-sensitive decision criteria. By characterizing these expert policies, we identify key limitations of existing IL algorithms in replicating expert performance in risk-sensitive settings. Our findings underscore the need for new IL frameworks that account for both risk-aware preferences and temporal dependencies to faithfully align AI behavior with human experts.

2025-06-10

ICML.cc/2025/Workshop/MoFA (poster)

Fair Resource Allocation in Weakly Coupled Markov Decision Processes

Xiaohui Tu

Yossiri Adulyasak

Nima Akbarzadeh

We consider fair resource allocation in sequential decision-making environments modeled as weakly coupled Markov decision processes, where r… (see more)esource constraints couple the action spaces of

2025-01-22

aistats.org/AISTATS/2025/Conference (poster)

Planning and Learning in Risk-Aware Restless Multi-Arm Bandits

Nima Akbarzadeh

Yossiri Adulyasak

In restless multi-arm bandits, a central agent is tasked with optimally distributing limited resources across several bandits (arms), with e… (see more)ach arm being a Markov decision process. In this work, we generalize the traditional restless multi-arm bandit problem with a risk-neutral objective by incorporating risk-awareness. We establish indexability conditions for the case of a risk-aware objective and provide a solution based on Whittle index. In addition, we address the learning problem when the true transition probabilities are unknown by proposing a Thompson sampling approach and show that it achieves bounded regret that scales sublinearly with the number of episodes and quadratically with the number of arms. The efficacy of our method in reducing risk exposure in restless multi-arm bandits is illustrated through a set of numerical experiments in the contexts of machine replacement and patient scheduling applications under both planning and learning setups.

2025-01-22

aistats.org/AISTATS/2025/Conference (poster)

Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis

Jia Lin Hau

Esther Derman

Mohammad Ghavamzadeh

Marek Petrik

In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are a standard metric for modeling RL agents' preferences … (see more)for certain outcomes. This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees. The algorithm leverages a new, simple dynamic program (DP) decomposition for quantile MDPs. Compared with prior work, our DP decomposition requires neither known transition probabilities nor solving complex saddle point equations and serves as a suitable foundation for other model-free RL algorithms. Our numerical results in tabular domains show that our Q-learning algorithm converges to its DP variant and outperforms earlier algorithms.

2025-01-22

aistats.org/AISTATS/2025/Conference (poster)

Conformal Inverse Optimization

Bo Lin

Timothy Chan

Inverse optimization has been increasingly used to estimate unknown parameters in an optimization model based on decision data. We show that… (see more) such a point estimation is insufficient in a prescriptive setting where the estimated parameters are used to prescribe new decisions. The prescribed decisions may be low-quality and misaligned with human intuition and thus are unlikely to be adopted. To tackle this challenge, we propose conformal inverse optimization, which seeks to learn an uncertainty set for the unknown parameters and then solve a robust optimization model to prescribe new decisions. Under mild assumptions, we show that our method enjoys provable guarantees on solution quality, as evaluated using both the ground-truth parameters and the decision maker's perception of the unknown parameters. Our method demonstrates strong empirical performance compared to classic inverse optimization.

2024-09-25

NeurIPS.cc/2024/Conference (poster)

End-to-end Conditional Robust Optimization

Abhilash Reddy Chenreddy

2024-09-12

Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence (published)

proceedings.mlr.press

Robust Data-driven Prescriptiveness Optimization

Mehran Poursoltani

Angelos Georghiou

The abundance of data has led to the emergence of a variety of optimization techniques that attempt to leverage available side information t… (see more)o provide more anticipative decisions. The wide range of methods and contexts of application have motivated the design of a universal unitless measure of performance known as the coefficient of prescriptiveness. This coefficient was designed to quantify both the quality of contextual decisions compared to a reference one and the prescriptive power of side information. To identify policies that maximize the former in a data-driven context, this paper introduces a distributionally robust contextual optimization model where the coefficient of prescriptiveness substitutes for the classical empirical risk minimization objective. We present a bisection algorithm to solve this model, which relies on solving a series of linear programs when the distributional ambiguity set has an appropriate nested form and polyhedral structure. Studying a contextual shortest path problem, we evaluate the robustness of the resulting policies against alternative methods when the out-of-sample dataset is subject to varying amounts of distribution shift.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

Robust Data-driven Prescriptiveness Optimization

Mehran Poursoltani

Angelos Georghiou

2024-05-01

ICML.cc/2024/Conference (poster)

Crowdkeeping in Last-mile Delivery

Xin Wang

Okan Arslan

2024-02-29

Transportation Science (published)

Crowdkeeping in Last-Mile Delivery

Xin Wang

Okan Arslan

In order to improve the efficiency of the last-mile delivery system when customers are possibly absent for deliveries, we propose the idea o… (see more)f employing the crowd to work as keepers and to provide storage services for their neighbors. Crowd keepers have extra flexibility, more availability, and lower costs than fixed storage options such as automated lockers, and this leads to a more efficient and a more profitable system for last-mile deliveries. We present a bilevel program that jointly determines the assignment, routing, and pricing decisions while considering customer preferences, keeper behaviors, and platform operations. We develop an equivalent single-level program, a mixed-integer linear program with subtour elimination constraints, that can be solved to optimality using a row generation algorithm. To improve the efficiency of the solution procedure, we further derive exact best response sets for both customers and keepers, and approximate optimal travel times using linear regression. We present a numerical study using a real-world data set from Amazon. The fixed-storage and no-storage systems are used as benchmarks to assess the performance of the crowdkeeping system. The results show that the crowdkeeping delivery system has the potential to generate higher profits because of its ability to consolidate deliveries and to eliminate failed deliveries. Funding: Funding provided by the Natural Sciences and Engineering Research Council of Canada [Grants 2022-04979 and 2022-05261], the Canada Research Chair program [Grant CRC-2018-00105], and the China Scholarship Council [Grant 202006190051] is gratefully acknowledged. Supplemental Material: The online appendix is available at https://doi.org/10.1287/trsc.2022.0323 .

2024-02-29

Transportation Science (published)

A Column Generation Scheme for Distributionally Robust Multi-Item Newsvendor Problems

Shanshan Wang

This paper studies a distributionally robust multi-item newsvendor problem, where the demand distribution is unknown but specified with a ge… (see more)neral event-wise ambiguity set. Using the event-wise affine decision rules, we can obtain a conservative approximation formulation of the problem, which can typically be further reformulated as a linear program. In order to efficiently solve the resulting large-scale linear program, we develop a column generation-based decomposition scheme and speed up the computational efficiency by exploiting a special column selection strategy and stopping early based on a Karush-Kuhn-Tucker condition test. Focusing on the Wasserstein ambiguity set and the event-wise mean absolute deviation set, a computational study demonstrates both the computational efficiency of the proposed algorithm, which significantly outperforms a commercial solver and a Benders decomposition method, and the out-of-sample superiority of distributionally robust solutions relative to their sample average approximation counterparts. History: Accepted by Nicola Secomandi, Area Editor for Stochastic Models & Reinforcement Learning. Funding: This work was supported by the Natural Sciences and Engineering Research Council of Canada [492997-2016, RGPIN-2016-05208], the National Natural Science Foundation of China [71972012], Alliance de recherche numérique du Canada, and Canada Research Chairs [CRC-2018-00105]. It was also supported by Groupe d’études et de recherche en analyse des décisions (GERAD). Finally, this research was enabled in part by support provided by Digital Research Alliance of Canada ( https://alliancecan.ca/en ). Supplemental Material: The software that supports the findings of this study is available within the paper and its supplemental information ( https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2022.0010 ) as well as from the IJOC GitHub software repository ( https://github.com/INFORMSJoC/2022.0010 ). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/ .

2024-01-03

INFORMS Journal on Computing (published)

Distributional Robustness and Inequity Mitigation in Disaster Preparedness of Humanitarian Operations

Hongming Li

Ning Zhu

Michael Pinedo

Shoufeng Ma

Problem definition: In this paper, we study a predisaster relief network design problem with uncertain demands. The aim is to determine the … (see more)prepositioning and reallocation of relief supplies. Motivated by the call of the International Federation of Red Cross and Red Crescent Societies (IFRC) to leave no one behind, we consider three important practical aspects of humanitarian operations: shortages, equity, and uncertainty. Methodology/results: We first employ a form of robust satisficing measure, which we call the shortage severity measure, to evaluate the severity of the shortage caused by uncertain demand in a context with limited distribution information. Because shortages often raise concerns about equity, we then formulate a mixed-integer lexicographic optimization problem with nonconvex objectives and design a new branch-and-bound algorithm to identify the exact solution. We also propose two approaches for identifying optimal postdisaster adaptable resource reallocation: an exact approach and a conservative approximation that is more computationally efficient. Our case study considers the 2010 Yushu earthquake, which occurred in northwestern China, and demonstrates the value of our methodology in mitigating geographical inequities and reducing shortages. Managerial implications: In our case study, we show that (i) incorporating equity in both predisaster deployment and postdisaster reallocation can produce substantially more equitable shortage prevention strategies while sacrificing only a reasonable amount of total shortage; (ii) increasing donations/budgets may not necessarily alleviate the shortage suffered by the most vulnerable individuals if equity is not fully considered; and (iii) exploiting disaster magnitude information when quantifying uncertainty can help alleviate geographical inequities caused by uncertain relief demands. Funding: This work was supported by the Natural Sciences and Engineering Research Council of Canada [Grant RGPIN-2016-05208], the National Natural Science Foundation of China [Grants 71971154, 72010107004, 72091214, and 72122015], and the Canada Research Chairs [Grant CRC-2018-00105]. Supplemental Material: The online appendices are available at https://doi.org/10.1287/msom.2023.1230 .

2023-10-23

Manufacturing & Service Operations Management (published)