Publications

A Look at Value-Based Decision-Time vs. Background Planning Methods Across Different Settings

Safa Alver

In model-based reinforcement learning (RL), an agent can leverage a learned model to improve its way of behaving in different ways. Two of the prevalent ways to do this are through decision-time and background planning methods. In this study, we are interested in understanding how the value-based versions of these two planning methods will compare against each other across different settings. Towards this goal, we first consider the simplest instantiations of value-based decision-time and background planning methods and provide theoretical results on which one will perform better in the regular RL and transfer learning settings. Then, we consider the modern instantiations of them and provide hypotheses on which one will perform better in the same settings. Finally, we perform illustrative experiments to validate these theoretical results and hypotheses. Overall, our findings suggest that even though value-based versions of the two planning methods perform on par in their simplest instantiations, the modern instantiations of value-based decision-time planning methods can perform on par or better than the modern instantiations of value-based background planning methods in both the regular RL and transfer learning settings.

2024-08-01

EWRL/2024/Workshop (accepted)

openreview.net

Multi Teacher Privileged Knowledge Distillation for Multimodal Expression Recognition

Muhammad Haseeb Aslam

Marco Pedersoli

Alessandro Lameiras Koerich

Eric Granger

Human emotion is a complex phenomenon conveyed and perceived through facial expressions, vocal tones, body language, and physiological signals. Multimodal emotion recognition systems can perform well because they can learn complementary and redundant semantic information from diverse sensors. In real-world scenarios, only a subset of the modalities employed for training may be available at test time. Learning privileged information allows a model to exploit data from additional modalities that are only available during training. SOTA methods for PKD have been proposed to distill information from a teacher model (with privileged modalities) to a student model (without privileged modalities). However, such PKD methods utilize point-to-point matching and do not explicitly capture the relational information. Recently, methods have been proposed to distill the structural information. However, PKD methods based on structural similarity are primarily confined to learning from a single joint teacher representation, which limits their robustness, accuracy, and ability to learn from diverse multimodal sources. In this paper, a multi-teacher PKD (MT-PKDOT) method with self-distillation is introduced to align diverse teacher representations before distilling them to the student. MT-PKDOT employs a structural similarity KD mechanism based on a regularized optimal transport (OT) for distillation. The proposed MT-PKDOT method was validated on the Affwild2 and Biovid datasets. Results indicate that our proposed method can outperform SOTA PKD methods. It improves the visual-only baseline on Biovid data by 5.5%. On the Affwild2 dataset, the proposed method improves 3% and 5% over the visual-only baseline for valence and arousal respectively. Allowing the student to learn from multiple diverse sources is shown to increase the accuracy and implicitly avoids negative transfer to the student model.

2024-08-01

arXiv (published)

doi.org

arxiv.org

Neural differential equations for temperature control in buildings under demand response programs

Vincent Taboga

Clement Gehring

Mathieu Le Cam

Hanane Dagdougui

Pierre-Luc Bacon

2024-08-01

Applied Energy (published)

doi.org

Noise covariance estimation in multi-task high-dimensional linear models

Kai Tan

Gabriel Romon

Lune Bellec

2024-08-01

Bernoulli (published)

doi.org

arxiv.org

Penalty Learning for Optimal Partitioning using Multilayer Perceptron

Tung L. Nguyen

Toby Dylan Hocking

Changepoint detection is a technique used to identify significant shifts in sequences and is widely used in fields such as finance, genomics, and medicine. To identify the changepoints, dynamic programming (DP) algorithms, particularly Optimal Partitioning (OP) family, are widely used. To control the changepoints count, these algorithms use a fixed penalty to penalize the changepoints presence. To predict the optimal value of that penalty, existing methods used simple models such as linear or tree-based, which may limit predictive performance. To address this issue, this study proposes using a multilayer perceptron (MLP) with a ReLU activation function to predict the penalty. The proposed model generates continuous predictions -- as opposed to the stepwise ones in tree-based models -- and handles non-linearity better than linear models. Experiments on large benchmark genomic datasets demonstrate that the proposed model improves accuracy and F1 score compared to existing models.

2024-08-01

arXiv (preprint)

arxiv.org

Perfectly Accurate Membership Inference by a Dishonest Central Server in Federated Learning

Georg Pichler

Marco Romanelli

Leonardo Rey Vega

Pablo Piantanida

2024-08-01

IEEE Transactions on Dependable and Secure Computing (published)

doi.org

arxiv.org

Periodic agent-state based Q-learning for POMDPs

Amit Sinha

Matthieu Geist

Aditya Mahajan

The standard approach for Partially Observable Markov Decision Processes (POMDPs) is to convert them to a fully observed belief-state MDP. However, the belief state depends on the system model and is therefore not viable in reinforcement learning (RL) settings. A widely used alternative is to use an agent state, which is a model-free, recursively updateable function of the observation history. Examples include frame stacking and recurrent neural networks. Since the agent state is model-free, it is used to adapt standard RL algorithms to POMDPs. However, standard RL algorithms like Q-learning learn a stationary policy. Our main thesis that we illustrate via examples is that because the agent state does not satisfy the Markov property, non-stationary agent-state based policies can outperform stationary ones. To leverage this feature, we propose PASQL (periodic agent-state based Q-learning), which is a variant of agent-state-based Q-learning that learns periodic policies. By combining ideas from periodic Markov chains and stochastic approximation, we rigorously establish that PASQL converges to a cyclic limit and characterize the approximation error of the converged periodic policy. Finally, we present a numerical experiment to highlight the salient features of PASQL and demonstrate the benefit of learning periodic policies over stationary policies.

2024-08-01

EWRL/2024/Workshop (accepted)

openreview.net

Satellite Sunroof: High-res Digital Surface Models and Roof Segmentation for Global Solar Mapping

Vishal Batchu

A. Wilson

Betty Peng

Carl D. Elkin

Umangi Jain

Christopher Van Arsdale

Ross Goroshin

Varun Gulshan

The transition to renewable energy, particularly solar, is key to mitigating climate change. Google's Solar API aids this transition by estimating solar potential from aerial imagery, but its impact is constrained by geographical coverage. This paper proposes expanding the API's reach using satellite imagery, enabling global solar potential assessment. We tackle challenges involved in building a Digital Surface Model (DSM) and roof instance segmentation from lower resolution and single oblique views using deep learning models. Our models, trained on aligned satellite and aerial datasets, produce 25cm DSMs and roof segments. With ~1m DSM MAE on buildings, ~5deg roof pitch error and ~56% IOU on roof segmentation, they significantly enhance the Solar API's potential to promote solar adoption.

2024-08-01

arXiv (published)

doi.org

arxiv.org

The effect of gestational age on short- and long-term complications following primary esophageal atresia repair

Mathias Johansen

Samuel Wasserman

Dan Poenaru

Jean-Martin Laberge

Sam J. Daniel

Thomas Engelhardt

2024-08-01

Brazilian Journal of Anesthesiology (published)

doi.org

The Harmonic Exponential Filter for Nonparametric Estimation on Motion Groups

Miguel Saavedra-Ruiz

Steven A. Parkison

Ria Arora

James Richard Forbes

Liam Paull

Bayesian estimation is a vital tool in robotics as it allows systems to update the robot state belief using incomplete information from noisy sensors. To render the state estimation problem tractable, many systems assume that the motion and measurement noise, as well as the state distribution, are all unimodal and Gaussian. However, there are numerous scenarios and systems that do not comply with these assumptions. Existing nonparametric filters that are used to model multimodal distributions have drawbacks that limit their ability to represent a diverse set of distributions. This letter introduces a novel approach to nonparametric Bayesian filtering on motion groups, designed to handle multimodal distributions using harmonic exponential distributions. This approach leverages two key insights of harmonic exponential distributions: a) the product of two distributions can be expressed as the element-wise addition of their log-likelihood Fourier coefficients, and b) the convolution of two distributions can be efficiently computed as the tensor product of their Fourier coefficients. These observations enable the development of an efficient and asymptotically exact solution to the Bayes filter up to the band limit of a Fourier transform. We demonstrate our filter's superior performance compared with established nonparametric filtering methods across a range of simulated and real-world localization tasks.

2024-08-01

arXiv (preprint)

doi.org

arxiv.org

The Harmonic Exponential Filter for Nonparametric Estimation on Motion Groups

Miguel Saavedra-Ruiz

Steven A. Parkison

Ria Arora

James Richard Forbes

Liam Paull

Bayesian estimation is a vital tool in robotics as it allows systems to update the robot state belief using incomplete information from noisy sensors. To render the state estimation problem tractable, many systems assume that the motion and measurement noise, as well as the state distribution, are unimodal and Gaussian. However, there are numerous scenarios and systems that do not comply with these assumptions. Existing nonparametric filters that are used to model multimodal distributions have drawbacks that limit their ability to represent a diverse set of distributions. This paper introduces a novel approach to nonparametric Bayesian filtering on motion groups, designed to handle multimodal distributions using harmonic exponential distributions. This approach leverages two key insights of harmonic exponential distributions: a) the product of two distributions can be expressed as the element-wise addition of their log-likelihood Fourier coefficients, and b) the convolution of two distributions can be efficiently computed as the tensor product of their Fourier coefficients. These observations enable the development of an efficient and asymptotically exact solution to the Bayes filter up to the band limit of a Fourier transform. We demonstrate our filter's performance compared with established nonparametric filtering methods across simulated and real-world localization tasks.

2024-08-01

arXiv (preprint)

doi.org

arxiv.org
