Publications

Discrete Probabilistic Inference as Control in Multi-path Environments
Tristan Deleu
Padideh Nouri
Nikolay Malkin
We consider the problem of sampling from a discrete and structured distribution as a sequential decision problem, where the objective is to … (voir plus)find a stochastic policy such that objects are sampled at the end of this sequential process proportionally to some predefined reward. While we could use maximum entropy Reinforcement Learning (MaxEnt RL) to solve this problem for some distributions, it has been shown that in general, the distribution over states induced by the optimal policy may be biased in cases where there are multiple ways to generate the same object. To address this issue, Generative Flow Networks (GFlowNets) learn a stochastic policy that samples objects proportionally to their reward by approximately enforcing a conservation of flows across the whole Markov Decision Process (MDP). In this paper, we extend recent methods correcting the reward in order to guarantee that the marginal distribution induced by the optimal MaxEnt RL policy is proportional to the original reward, regardless of the structure of the underlying MDP. We also prove that some flow-matching objectives found in the GFlowNet literature are in fact equivalent to well-established MaxEnt RL algorithms with a corrected reward. Finally, we study empirically the performance of multiple MaxEnt RL and GFlowNet algorithms on multiple problems involving sampling from discrete distributions.
Neural Active Learning Meets the Partial Monitoring Framework
Maxime Heuillet
Ola Ahmad
Neural Active Learning Meets the Partial Monitoring Framework
Maxime Heuillet
Ola Ahmad
We focus on the online-based active learning (OAL) setting where an agent operates over a stream of observations and trades-off between the … (voir plus)costly acquisition of information (labelled observations) and the cost of prediction errors. We propose a novel foundation for OAL tasks based on partial monitoring, a theoretical framework specialized in online learning from partially informative actions. We show that previously studied binary and multi-class OAL tasks are instances of partial monitoring. We expand the real-world potential of OAL by introducing a new class of cost-sensitive OAL tasks. We propose NeuralCBP, the first PM strategy that accounts for predictive uncertainty with deep neural networks. Our extensive empirical evaluation on open source datasets shows that NeuralCBP has favorable performance against state-of-the-art baselines on multiple binary, multi-class and cost-sensitive OAL tasks.
Penalty weight tuning in high dose rate brachytherapy using multi-objective Bayesian optimization.
Hossein Jafarzadeh
Majd Antaki
Ximeng Mao
Marie Duclos
Farhad Maleki
OBJECTIVE Treatment plan optimization in high dose rate (HDR) brachytherapy often requires manual fine-tuning of penalty weights for each ob… (voir plus)jective, which can be time-consuming and dependent on the planner's experience. To automate this process, this study used a multi-criteria approach called multi-objective Bayesian optimization with q-noisy expected hypervolume improvement as its acquisition function (MOBO-qNEHVI). Approach: The treatment plans of 13 prostate cancer patients were retrospectively imported to a research treatment planning system, RapidBrachyMTPS, where fast mixed integer optimization (FMIO) performs dwell time optimization given a set of penalty weights to deliver 15 Gy to the target volume. MOBO-qNEHVI was used to find patient-specific Pareto optimal penalty weight vectors that yield clinically acceptable dose volume histogram metrics. The relationship between the number of MOBO-qNEHVI iterations and the number of clinically acceptable plans per patient (acceptance rate) was investigated. The performance time was obtained for various parameter configurations. Main results: MOBO-qNEHVI found clinically acceptable treatment plans for all patients. With increasing the number of MOBO-qNEHVI iterations, the acceptance rate grew logarithmically while the performance time grew exponentially. Fixing the penalty weight of the tumour volume to maximum value, adding the target dose as a parameter, initiating MOBO-qNEHVI with 25 parallel sampling of FMIO, and running 6 MOBO-qNEHVI iterations found solutions that delivered 15 Gy to the hottest 95% of the clinical target volume while respecting the dose constraints to the organs at risk. The average acceptance rate for each patient was 89.74% ± 8.11%, and performance time was 66.6 ± 12.6 seconds. The initiation took 22.47 ± 7.57 s, and each iteration took 7.35 ± 2.45 s to find one Pareto solution. Significance: MOBO-qNEHVI can automatically explore the trade-offs between treatment plan objectives in a patient-specific manner within a minute. This approach can reduce the dependency of plan quality on planner's experience.
Autoregressive Networks with Dependent Edges
Jinyuan Chang
Qin Fang
Peter W. MacDonald
Qiwei Yao
Radiation hardness of open Fabry-Pérot microcavities
Fernanda C. Rodrigues-Machado
Erika Janitz
Simon Bernard
H. Bekerat
Malcolm McEwen
James Renaud
Lilian Childress
Jack C Sankey
SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
Ankit Vani
Bac Nguyen
Samuel Lavoie
Ranjay Krishna
Selective attention helps us focus on task-relevant aspects in the constant flood of our sensory input. This constraint in our perception al… (voir plus)lows us to robustly generalize under distractions and to new compositions of perceivable concepts. Transformers employ a similar notion of attention in their architecture, but representation learning models with transformer backbones like CLIP and DINO often fail to demonstrate robustness and compositionality. We highlight a missing architectural prior: unlike human perception, transformer encodings do not separately attend over individual concepts. In response, we propose SPARO, a read-out mechanism that partitions encodings into separately-attended slots, each produced by a single attention head. Using SPARO with CLIP imparts an inductive bias that the vision and text modalities are different views of a shared compositional world with the same corresponding concepts. Using SPARO, we demonstrate improvements on downstream recognition, robustness, retrieval, and compositionality benchmarks with CLIP (up to +14% for ImageNet, +4% for SugarCrepe), and on nearest neighbors and linear probe for ImageNet with DINO (+3% each). We also showcase a powerful ability to intervene and select individual SPARO concepts to further improve downstream task performance (up from +4% to +9% for SugarCrepe) and use this ability to study the robustness of SPARO's representation structure. Finally, we provide insights through ablation experiments and visualization of learned concepts.
Universal Adversarial Triggers Are Not Universal
Nicholas Meade
Arkil Patel
Fairness Incentives in Response to Unfair Dynamic Pricing
Jesse Thibodeau
Hadi Nekoei
Afaf Taïk
Janarthanan Rajendran
The use of dynamic pricing by profit-maximizing firms gives rise to demand fairness concerns, measured by discrepancies in consumer groups' … (voir plus)demand responses to a given pricing strategy. Notably, dynamic pricing may result in buyer distributions unreflective of those of the underlying population, which can be problematic in markets where fair representation is socially desirable. To address this, policy makers might leverage tools such as taxation and subsidy to adapt policy mechanisms dependent upon their social objective. In this paper, we explore the potential for AI methods to assist such intervention strategies. To this end, we design a basic simulated economy, wherein we introduce a dynamic social planner (SP) to generate corporate taxation schedules geared to incentivizing firms towards adopting fair pricing behaviours, and to use the collected tax budget to subsidize consumption among underrepresented groups. To cover a range of possible policy scenarios, we formulate our social planner's learning problem as a multi-armed bandit, a contextual bandit and finally as a full reinforcement learning (RL) problem, evaluating welfare outcomes from each case. To alleviate the difficulty in retaining meaningful tax rates that apply to less frequently occurring brackets, we introduce FairReplayBuffer, which ensures that our RL agent samples experiences uniformly across a discretized fairness space. We find that, upon deploying a learned tax and redistribution policy, social welfare improves on that of the fairness-agnostic baseline, and approaches that of the analytically optimal fairness-aware baseline for the multi-armed and contextual bandit settings, and surpassing it by 13.19% in the full RL setting.
Learning Control Barrier Functions and their application in Reinforcement Learning: A Survey
Maeva Guerrier
Hassan Fouad
Reinforcement learning is a powerful technique for developing new robot behaviors. However, typical lack of safety guarantees constitutes a … (voir plus)hurdle for its practical application on real robots. To address this issue, safe reinforcement learning aims to incorporate safety considerations, enabling faster transfer to real robots and facilitating lifelong learning. One promising approach within safe reinforcement learning is the use of control barrier functions. These functions provide a framework to ensure that the system remains in a safe state during the learning process. However, synthesizing control barrier functions is not straightforward and often requires ample domain knowledge. This challenge motivates the exploration of data-driven methods for automatically defining control barrier functions, which is highly appealing. We conduct a comprehensive review of the existing literature on safe reinforcement learning using control barrier functions. Additionally, we investigate various techniques for automatically learning the Control Barrier Functions, aiming to enhance the safety and efficacy of Reinforcement Learning in practical robot applications.
Foliar spectra accurately distinguish most temperate tree species and show strong phylogenetic signal
Florence Blanchard
Anne Bruneau
BACS: Background Aware Continual Semantic Segmentation
Mostafa ElAraby
Ali Harakeh
Semantic segmentation plays a crucial role in enabling comprehensive scene understanding for robotic systems. However, generating annotation… (voir plus)s is challenging, requiring labels for every pixel in an image. In scenarios like autonomous driving, there's a need to progressively incorporate new classes as the operating environment of the deployed agent becomes more complex. For enhanced annotation efficiency, ideally, only pixels belonging to new classes would be annotated. This approach is known as Continual Semantic Segmentation (CSS). Besides the common problem of classical catastrophic forgetting in the continual learning setting, CSS suffers from the inherent ambiguity of the background, a phenomenon we refer to as the"background shift'', since pixels labeled as background could correspond to future classes (forward background shift) or previous classes (backward background shift). As a result, continual learning approaches tend to fail. This paper proposes a Backward Background Shift Detector (BACS) to detect previously observed classes based on their distance in the latent space from the foreground centroids of previous steps. Moreover, we propose a modified version of the cross-entropy loss function, incorporating the BACS detector to down-weight background pixels associated with formerly observed classes. To combat catastrophic forgetting, we employ masked feature distillation alongside dark experience replay. Additionally, our approach includes a transformer decoder capable of adjusting to new classes without necessitating an additional classification head. We validate BACS's superior performance over existing state-of-the-art methods on standard CSS benchmarks.