Elaine Lau

QGFN: Controllable Greediness with Action Values

Stephen Zhewen Lu

Generative Flow Networks (GFlowNets; GFNs) are a family of energy-based generative methods for combinatorial objects, capable of generating diverse and high-utility samples. However, consistently biasing GFNs towards producing high-utility samples is non-trivial. In this work, we leverage connections between GFNs and reinforcement learning (RL) and propose to combine the GFN policy with an action-value estimate,

2024-09-24

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

DGFN: Double Generative Flow Networks

2023-10-26

NeurIPS.cc/2023/Workshop/GenBio (poster)

doi.org

openreview.net

An Empirical Study of the Effectiveness of Using a Replay Buffer on Mode Discovery in GFlowNets

Nikhil Murali Vemgal

Elaine Lau

Doina Precup

Reinforcement Learning (RL) algorithms aim to learn an optimal policy by iteratively sampling actions to learn how to maximize the total expected return,

2023-06-18

ICML.cc/2023/Workshop/SPIGM (poster)

doi.org

openreview.net

Towards Safe Mechanical Ventilation Treatment Using Deep Offline Reinforcement Learning

Jacob Shkrob

My Duc Tran

Doina Precup

Sumana Basu

Mechanical ventilation is a key form of life support for patients with pulmonary impairment. Healthcare workers are required to continuously adjust ventilator settings for each patient, a challenging and time-consuming task. Hence, it would be beneficial to develop an automated decision support tool to optimize ventilation treatment. We present DeepVent, a Conservative Q-Learning (CQL) based offline Deep Reinforcement Learning (DRL) agent that learns to predict the optimal ventilator parameters for a patient to promote 90-day survival. We design a clinically relevant intermediate reward that encourages continuous improvement of the patient vitals and addresses the challenge of sparse reward in RL. We find that DeepVent recommends ventilation parameters within safe ranges, as outlined in recent clinical trials. The CQL algorithm offers additional safety by mitigating the overestimation of the value estimates of out-of-distribution states/actions. We evaluate our agent using Fitted Q Evaluation (FQE) and demonstrate that it outperforms physicians from the MIMIC-III dataset.

2022-10-04

ArXiv (preprint)

doi.org

arxiv.org

Policy Gradients Incorporating the Future

David Venuto

Elaine Lau

Doina Precup

Ofir Nachum

Reasoning about the future -- understanding how decisions in the present time affect outcomes in the future -- is one of the central challenges for reinforcement learning (RL), especially in highly stochastic or partially observable environments. While predicting the future directly is hard, in this work we introduce a method that allows an agent to "look into the future" without explicitly predicting it. Namely, we propose to allow an agent, during its training on past experience, to observe what actually happened in the future at that time, while enforcing an information bottleneck to prevent the agent from relying too heavily on this privileged information. This gives our agent the opportunity to utilize rich and useful information about the future trajectory dynamics in addition to the present. Our method, Policy Gradients Incorporating the Future (PGIF), is easy to implement and versatile, being applicable to virtually any policy gradient algorithm. We apply our proposed method to a number of off-the-shelf RL algorithms and show that PGIF is able to achieve higher reward faster in a variety of online and offline RL domains, as well as sparse-reward and partially observable environments.

2022-01-27

ICLR.cc/2022/Conference (poster)

doi.org

openreview.net

