Emma Frejinger

Andrea Lodi

2024-03-16

Journal of Revenue and Pricing Management (published)

One-Shot Learning for MIPs with SOS1 Constraints

Charly Robinson La Rocca

Jean-François Cordeau

2024-03-14

ArXiv (preprint)

Decoupling regularization from the action space

Sobhan Mohammadpour

Pierre-Luc Bacon

Regularized reinforcement learning (RL), particularly the entropy-regularized kind, has gained traction in optimal control and inverse RL. While standard unregularized RL methods remain unaffected by changes in the number of actions, we show that it can severely impact their regularized counterparts. This paper demonstrates the importance of decoupling the regularizer from the action space: that is, to maintain a consistent level of regularization regardless of how many actions are involved to avoid over-regularization. Whereas the problem can be avoided by introducing a task-specific temperature parameter, it is often undesirable and cannot solve the problem when action spaces are state-dependent. In the state-dependent action context, different states with varying action spaces are regularized inconsistently. We introduce two solutions: a static temperature selection approach and a dynamic counterpart, universally applicable where this problem arises. Implementing these changes improves performance on the DeepMind control suite in static and dynamic temperature regimes and a biological design task.

2024-01-16

ICLR.cc/2024/Conference (poster)

openreview.net

Maximum entropy GFlowNets with soft Q-learning

Sobhan Mohammadpour

Emmanuel Bengio

Pierre-Luc Bacon

2024-01-01

AISTATS (published)

Pseudo-random Instance Generators in C++ for Deterministic and Stochastic Multi-commodity Network Design Problems

Eric Larsen

Serge Bisaillon

Jean-François Cordeau

Network design problems constitute an important family of combinatorial optimization problems for which numerous exact and heuristic algorithms have been developed over the last few decades. Two central problems in this family are the multi-commodity, capacitated, fixed charge network design problem (MCFNDP) and its stochastic counterpart, the two-stage MCFNDP with recourse. These are standard problems that often serve as work benches for devising and testing models and algorithms in stylized but close-to-realistic settings. The purpose of this paper is to introduce two flexible, high-speed generators capable of simulating a wide range of settings for both the deterministic and stochastic MCFNDPs. We hope that, by facilitating systematic experimentation with new and larger sets of instances, these generators will lead to a more thorough assessment of the performance achieved by exact and heuristic solution methods in both deterministic and stochastic settings. We also hope that making these generators available will promote the reproducibility and comparability of published research.

2023-12-17

ArXiv (preprint)

A Survey of Contextual Optimization Methods for Decision Making under Uncertainty

Utsav Sadana

Abhilash Reddy Chenreddy

Erick Delage

Alexandre Forel

Thibaut Vidal

2023-06-17

ArXiv (preprint)

Scope Restriction for Scalable Real-Time Railway Rescheduling: An Exploratory Study

Erik L. Nygren

Christian Eichenberger

With the aim to stimulate future research, we describe an exploratory study of a railway rescheduling problem. A widely used approach in practice and state of the art is to decompose these complex problems by geographical scope. Instead, we propose defining a core problem that restricts a rescheduling problem in response to a disturbance to only trains that need to be rescheduled, hence restricting the scope in both time and space. In this context, the difficulty resides in defining a scoper that can predict a subset of train services that will be affected by a given disturbance. We report preliminary results using the Flatland simulation environment that highlight the potential and challenges of this idea. We provide an extensible playground open-source implementation based on the Flatland railway environment and Answer-Set Programming.

2023-05-05

ArXiv (preprint)

Optimising Electric Vehicle Charging Station Placement Using Advanced Discrete Choice Models

Steven Lamontagne

Margarida Carvalho

Bernard Gendron

Miguel F. Anjos

Ribal Atallah

Département d'informatique et de recherche opérationnelle

U. Montréal

S. O. Mathematics

U. Edinburgh

Institut de Recherche d'Hydro-Québec

We present a new model for finding the optimal placement of electric vehicle charging stations across a multiperiod time frame so as to maximise electric vehicle adoption. Via the use of stochastic discrete choice models and user classes, this work allows for a granular modelling of user attributes and their preferences in regard to charging station characteristics. We adopt a simulation approach and precompute error terms for each option available to users for a given number of scenarios. This results in a bilevel optimisation model that is, however, intractable for all but the simplest instances. Our major contribution is a reformulation into a maximum covering model, which uses the precomputed error terms to calculate the users covered by each charging station. This allows solutions to be found more efficiently than for the bilevel formulation. The maximum covering formulation remains intractable in some instances, so we propose rolling horizon, greedy, and greedy randomised adaptive search procedure heuristics to obtain good-quality solutions more efficiently. Extensive computational results are provided, and they compare the maximum covering formulation with the current state of the art for both exact solutions and the heuristic methods. History: Accepted by Andrea Lodi, Area Editor for Design & Analysis of Algorithms–Discrete. Funding: This work was supported by Hydro-Québec and the Natural Sciences and Engineering Research Council of Canada [Discovery grant 2017-06054; Collaborative Research and Development Grant CRDPJ 536757–19]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/ijoc.2022.0185 .

2023-01-01

INFORMS J. Comput. (published)

The load planning and sequencing problem for double-stack trains

Moritz Ruf

Jean-François Cordeau

2022-09-01

Journal of Rail Transport Planning & Management (published)

Predicting Tactical Solutions to Operational Planning Problems under Imperfect Information

Eric Larsen

Sébastien Lachapelle

Yoshua Bengio

Simon Lacoste-Julien

Andrea Lodi

This paper offers a methodological contribution at the intersection of machine learning and operations research. Namely, we propose a methodology to quickly predict expected tactical descriptions of operational solutions (TDOSs). The problem we address occurs in the context of two-stage stochastic programming, where the second stage is demanding computationally. We aim to predict at a high speed the expected TDOS associated with the second-stage problem, conditionally on the first-stage variables. This may be used in support of the solution to the overall two-stage problem by avoiding the online generation of multiple second-stage scenarios and solutions. We formulate the tactical prediction problem as a stochastic optimal prediction program, whose solution we approximate with supervised machine learning. The training data set consists of a large number of deterministic operational problems generated by controlled probabilistic sampling. The labels are computed based on solutions to these problems (solved independently and offline), employing appropriate aggregation and subselection methods to address uncertainty. Results on our motivating application on load planning for rail transportation show that deep learning models produce accurate predictions in very short computing time (milliseconds or less). The predictive accuracy is close to the lower bounds calculated based on sample average approximation of the stochastic prediction programs.

2022-01-01

INFORMS Journal on Computing (published)

Assessing the Impact: Does an Improvement to a Revenue Management System Lead to an Improved Revenue?

Greta Laage

Andrea Lodi

Guillaume Rabusseau

2021-01-13

ArXiv (preprint)