Publications

Maximum entropy GFlowNets with soft Q-learning

Sobhan Mohammadpour

Generative Flow Networks (GFNs) have emerged as a powerful tool for sampling discrete objects from unnormalized distributions, offering a scalable alternative to Markov Chain Monte Carlo (MCMC) methods. While GFNs draw inspiration from maximum entropy reinforcement learning (RL), the connection between the two has largely been unclear and seemingly applicable only in specific cases. This paper addresses the connection by constructing an appropriate reward function, thereby establishing an exact relationship between GFNs and maximum entropy RL. This construction allows us to introduce maximum entropy GFNs, which, in contrast to GFNs with uniform backward policy, achieve the maximum entropy attainable by GFNs without constraints on the state space.

2023-12-31

AISTATS (published)

doi.org

proceedings.mlr.press

McGill NLP Group Submission to the MRL 2024 Shared Task: Ensembling Enhances Effectiveness of Multilingual Small LMs

Senyu Li

Hao Yu

Jessica Ojo

David Ifeoluwa Adelani

We present our systems for the three tasks and five languages included in the MRL 2024 Shared Task on Multilingual Multi-task Information Retrieval: (1) Named Entity Recognition, (2) Free-form Question Answering, and (3) Multiple-choice Question Answering. For each task, we explored the impact of selecting different multilingual language models for fine-tuning across various target languages, and implemented an ensemble system that generates final outputs based on predictions from multiple fine-tuned models. All models are large language models fine-tuned on task-specific data. Our experimental results show that a more balanced dataset would yield better results. However, when training data for certain languages are scarce, fine-tuning on a large amount of English data supplemented by a small amount of “triggering data” in the target language can produce decent results.

2023-12-31

MRL (published)

doi.org

Metric Flow Matching for Smooth Interpolations on the Data Manifold

Kacper Kapuśniak

Peter Potaptchik

Teodora Reu

Leo Zhang

Alexander Tong

Michael Bronstein

Avishek Joey Bose

Francesco Di Giovanni

Matching objectives underpin the success of modern generative models and rely on constructing conditional paths that transform a source distribution into a target distribution. Despite being a fundamental building block, conditional paths have been designed principally under the assumption of Euclidean geometry, resulting in straight interpolations. However, this can be particularly restrictive for tasks such as trajectory inference, where straight paths might lie outside the data manifold, thus failing to capture the underlying dynamics giving rise to the observed marginals. In this paper, we propose Metric Flow Matching (MFM), a novel simulation-free framework for conditional flow matching where interpolants are approximate geodesics learned by minimizing the kinetic energy of a data-induced Riemannian metric. This way, the generative model matches vector fields on the data manifold, which corresponds to lower uncertainty and more meaningful interpolations. We prescribe general metrics to instantiate MFM, independent of the task, and test it on a suite of challenging problems including LiDAR navigation, unpaired image translation, and modeling cellular dynamics. We observe that MFM outperforms the Euclidean baselines, particularly achieving SOTA on single-cell trajectory prediction.

2023-12-31

NeurIPS (published)

doi.org

openreview.net

Minimax Exploiter: A Data Efficient Approach for Competitive Self-Play

Gabriel Robert

Recent advances in Competitive Self-Play (CSP) have achieved, or even surpassed, human-level performance in complex game environments such as Dota 2 and StarCraft II using Distributed Multi-Agent Reinforcement Learning (MARL). One core component of these methods relies on creating a pool of learning agents -- consisting of the Main Agent, past versions of this agent, and Exploiter Agents -- where Exploiter Agents learn counter-strategies to the Main Agents. A key drawback of these approaches is the large computational cost and physical time that is required to train the system, making them impractical to deploy in highly iterative real-life settings such as video game productions. In this paper, we propose the Minimax Exploiter, a game-theoretic approach to exploiting Main Agents that leverages knowledge of its opponents, leading to significant increases in data efficiency. We validate our approach in a diversity of settings, including simple turn-based games, the arcade learning environment, and For Honor, a modern video game. The Minimax Exploiter consistently outperforms strong baselines, demonstrating improved stability and data efficiency, leading to a robust CSP-MARL method that is both flexible and easy to deploy.

2023-12-31

AAMAS (published)

doi.org

arxiv.org

Mirror Descent Algorithms with Nearly Dimension-Independent Rates for Differentially-Private Stochastic Saddle-Point Problems

Tomás González

Cristóbal Guzmán

Courtney Paquette

2023-12-31

COLT (published)

doi.org

arxiv.org

Mitigating Translationese in Low-resource Languages: The Storyboard Approach

Garry Kuwanto

Eno-Abasi Urua

Priscilla A. Amuok

Shamsuddeen Hassan Muhammad

Aremu Anuoluwapo

Verrah Akinyi Otiende

Loice Emma Nanyanga

T. Nyoike

A. D. Akpan

Nsima Ab Udouboh

Idongesit Udeme Archibong

Idara Effiong Moses

Ifeoluwatayo A. Ige

Benjamin A. Ajibade

Olumide Benjamin Awokoya

Idris Abdulmumin

Saminu Mohammad Aliyu

Ruqayya Nasir Iro

Ibrahim Ahmad

Deontae Smith

Praise-EL Michaels

David Ifeoluwa Adelani

Derry Tanti Wijaya

Anietie U Andy

Low-resource languages often face challenges in acquiring high-quality language data due to the reliance on translation-based methods, which can introduce the translationese effect. This phenomenon results in translated sentences that lack fluency and naturalness in the target language. In this paper, we propose a novel approach for data collection by leveraging storyboards to elicit more fluent and natural sentences. Our method involves presenting native speakers with visual stimuli in the form of storyboards and collecting their descriptions without direct exposure to the source text. We conducted a comprehensive evaluation comparing our storyboard-based approach with traditional text translation-based methods in terms of accuracy and fluency. Human annotators and quantitative metrics were used to assess translation quality. The results indicate a preference for text translation in terms of accuracy, while our method demonstrates worse accuracy but better fluency in the language focused.

2023-12-31

International Conference on Language Resources and Evaluation (published)

doi.org

arxiv.org

Mixture of Experts in a Mixture of RL settings

Timon Willi

Johan Samir Obando Ceron

Jakob Nicolaus Foerster

Gintare Karolina Dziugaite

Pablo Samuel Castro

2023-12-31

RLC (published)

doi.org

openreview.net

Model-based graph reinforcement learning for inductive traffic signal control

François-Xavier Devailly

Denis Larocque

Laurent Charlin

Most reinforcement learning methods for adaptive-traffic-signal-control require training from scratch to be applied on any new intersection or after any modification to the road network, traffic distribution, or behavioral constraints experienced during training. Considering 1) the massive amount of experience required to train such methods, and 2) that experience must be gathered by interacting in an exploratory fashion with real road-network-users, such a lack of transferability limits experimentation and applicability. Recent approaches enable learning policies that generalize for unseen road-network topologies and traffic distributions, partially tackling this challenge. However, the literature remains divided between the learning of cyclic (the evolution of connectivity at an intersection must respect a cycle) and acyclic (less constrained) policies, and these transferable methods 1) are only compatible with cyclic constraints and 2) do not enable coordination. We introduce a new model-based method, MuJAM, which, on top of enabling explicit coordination at scale for the first time, pushes generalization further by allowing a generalization to the controllers' constraints. In a zero-shot transfer setting involving both road networks and traffic settings never experienced during training, and in a larger transfer experiment involving the control of 3,971 traffic signal controllers in Manhattan, we show that MuJAM, using both cyclic and acyclic constraints, outperforms domain-specific baselines as well as another transferable approach.

2023-12-31

IEEE Open Journal of Intelligent Transportation Systems (published)

doi.org

arxiv.org

A Model-Based Solution to the Offline Multi-Agent Reinforcement Learning Coordination Problem

Paul Barde

Jakob Foerster

Derek Nowrouzezahrai

Amy Zhang

Training multiple agents to coordinate is an essential problem with applications in robotics, game theory, economics, and social sciences. However, most existing Multi-Agent Reinforcement Learning (MARL) methods are online and thus impractical for real-world applications in which collecting new interactions is costly or dangerous. While these algorithms should leverage offline data when available, doing so gives rise to what we call the offline coordination problem. Specifically, we identify and formalize the strategy agreement (SA) and the strategy fine-tuning (SFT) coordination challenges, two issues at which current offline MARL algorithms fail. Concretely, we reveal that the prevalent model-free methods are severely deficient and cannot handle coordination-intensive offline multi-agent tasks in either toy or MuJoCo domains. To address this setback, we emphasize the importance of inter-agent interactions and propose the very first model-based offline MARL method. Our resulting algorithm, Model-based Offline Multi-Agent Proximal Policy Optimization (MOMA-PPO), generates synthetic interaction data and enables agents to converge on a strategy while fine-tuning their policies accordingly. This simple model-based solution solves the coordination-intensive offline tasks, significantly outperforming the prevalent model-free methods even under severe partial observability and with learned world models.

2023-12-31

AAMAS (published)

doi.org

arxiv.org

Motivating Users to Attend to Privacy: A Theory-Driven Design Study

Varun Shiri

Maggie Xiong

Jinghui Cheng

Jin L.C. Guo

In modern technology environments, raising users’ privacy awareness is crucial. Existing efforts largely focused on privacy policy presentation and failed to systematically address a radical challenge of user motivation for initiating privacy awareness. Leveraging the Protection Motivation Theory (PMT), we proposed design ideas and categories dedicated to motivating users to engage with privacy-related information. Using these design ideas, we created a conceptual prototype, enhancing the current App Store product page. Results from an online experiment and follow-up interviews showed that our design effectively motivated participants to attend to privacy issues, raising both the threat appraisal and coping appraisal, two main factors in PMT. Our work indicated that effective design should consider combining PMT components, calibrating information content, and integrating other design elements, such as visual cues and user familiarity. Overall, our study contributes valuable design considerations driven by the PMT to amplify the motivational aspect of privacy communication.

2023-12-31

Conference on Designing Interactive Systems (published)

doi.org

arxiv.org

Multidomain Object Detection Framework Using Feature Domain Knowledge Distillation

Da-Wei Jaw

Shih-Chia Huang

Zhihui Lu

Benjamin C. M. Fung

Sy-Yen Kuo

Object detection techniques have been widely studied, utilized in various works, and have exhibited robust performance on images with sufficient luminance. However, these approaches typically struggle to extract valuable features from low-luminance images, which often exhibit blurriness and dim appearance, leading to detection failures. To overcome this issue, we introduce an innovative unsupervised feature domain knowledge distillation (KD) framework. The proposed framework enhances the generalization capability of neural networks across both low- and high-luminance domains without incurring additional computational costs during testing. This improvement is made possible through the integration of generative adversarial networks and our proposed unsupervised KD process. Furthermore, we introduce a region-based multiscale discriminator designed to discern feature domain discrepancies at the object level rather than from the global context. This bolsters the joint learning process of object detection and feature domain distillation tasks. Both qualitative and quantitative assessments show that the proposed method, empowered by the region-based multiscale discriminator and the unsupervised feature domain distillation process, can effectively extract beneficial features from low-luminance images, outperforming other state-of-the-art approaches in both low- and sufficient-luminance domains.

2023-12-31

IEEE Trans. Cybern. (published)

doi.org

Multi-objective PSO semi-supervised random forest method for dioxin soft sensor

Wen Xu

Jian Tang

Heng Xia

Wen Yu

JunFei Qiao

2023-12-31

Eng. Appl. Artif. Intell. (published)

doi.org
