Publications

Towards General-Purpose Model-Free Reinforcement Learning

Scott Fujimoto

Pierluca D'Oro

Amy Zhang

Yuandong Tian

Michael G. Rabbat

Reinforcement learning (RL) promises a framework for near-universal problem-solving. In practice, however, RL algorithms are often tailored to specific benchmarks, relying on carefully tuned hyperparameters and algorithmic choices. Recently, powerful model-based RL methods have shown impressive general results across benchmarks but come at the cost of increased complexity and slow run times, limiting their broader applicability. In this paper, we attempt to find a unifying model-free deep RL algorithm that can address a diverse class of domains and problem settings. To achieve this, we leverage model-based representations that approximately linearize the value function, taking advantage of the denser task objectives used by model-based RL while avoiding the costs associated with planning or simulated trajectories. We evaluate our algorithm, MR.Q, on a variety of common RL benchmarks with a single set of hyperparameters and show competitive performance against domain-specific and general baselines, providing a concrete step towards building general-purpose model-free deep RL algorithms.

2025-01-21

ICLR.cc/2025/Conference (spotlight)

Towards Improving Exploration Through Sibling Augmented GFlowNets

2025-01-21

ICLR.cc/2025/Conference (poster)

Towards Interpreting Visual Information Processing in Vision-Language Models

Clement Neo

Luke Ong

Philip Torr

Mor Geva

David M. Krueger

Fazl Barez

2025-01-21

ICLR.cc/2025/Conference (poster)

Towards whole-genome inference of polygenic scores with fast and memory-efficient algorithms

Shadi Zabad

Chirayu Anant Haryan

Simon Gravel

Sanchit Misra

Yuemei Li

2025-01-21

bioRxiv (preprint)

A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning

Khimya Khetarpal

Zhaohan Daniel Guo

Bernardo Avila Pires

Yunhao Tang

Clare Lyle

Mark Rowland

Nicolas Heess

Diana Borsa

Arthur Guez

Will Dabney

2025-01-21

aistats.org/AISTATS/2025/Conference (poster)

proceedings.mlr.press

Enhancing Privacy in the Early Detection of Sexual Predators Through Federated Learning and Differential Privacy

Khaoula Chehbouni

Martine De Cock

Gilles Caporossi

Afaf Taïk

Reihaneh Rabbany

Golnoosh Farnadi

The increased screen time and isolation caused by the COVID-19 pandemic have led to a significant surge in cases of online grooming, which is the use of strategies by predators to lure children into sexual exploitation. Previous efforts to detect grooming in industry and academia have involved accessing and monitoring private conversations through centrally trained models or sending private conversations to a global server. In this work, we implement a privacy-preserving pipeline for the early detection of sexual predators. We leverage federated learning and differential privacy in order to create safer online spaces for children while respecting their privacy. We investigate various privacy-preserving implementations and discuss their benefits and shortcomings. Our extensive evaluation using real-world data proves that privacy and utility can coexist with only a slight reduction in utility.
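The abstract does not specify the exact privacy mechanism. The sketch below assumes a DP-FedAvg-style scheme, a common way to combine federated learning with differential privacy, though not necessarily the authors' pipeline. Each client's model update is clipped to a fixed L2 norm, and the server adds Gaussian noise to the sum before averaging. The function names and the `max_norm` and `noise_multiplier` parameters are illustrative assumptions. A real deployment would also need a privacy accountant to report the resulting (ε, δ), which is not shown.

```python
import math
import random
from typing import List

Vector = List[float]


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def clip(update: Vector, max_norm: float) -> Vector:
    """Scale a client update down so its L2 norm is at most max_norm."""
    norm = l2_norm(update)
    if norm <= max_norm:
        return list(update)
    scale = max_norm / norm
    return [x * scale for x in update]


def dp_fedavg_round(global_model: Vector, client_updates: List[Vector],
                    max_norm: float, noise_multiplier: float,
                    rng: random.Random) -> Vector:
    """One hypothetical round: clip each update, add Gaussian noise to the sum, average."""
    n = len(client_updates)
    clipped = [clip(u, max_norm) for u in client_updates]
    # Coordinate-wise sum of the clipped updates (raw client data never leaves the client).
    summed = [sum(vals) for vals in zip(*clipped)]
    # Noise is calibrated to the per-client sensitivity, which clipping bounds by max_norm.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    avg = [x / n for x in noisy]
    return [w + d for w, d in zip(global_model, avg)]


rng = random.Random(42)
model = dp_fedavg_round([0.0, 0.0], [[3.0, 4.0], [0.0, 1.0]],
                        max_norm=1.0, noise_multiplier=1.1, rng=rng)
print(model)
```

Clipping bounds how much any single client can move the aggregate. That bound is what lets a fixed amount of Gaussian noise mask an individual contribution, at the cost of some utility.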

2025-01-20

ArXiv (preprint)

Supervised Large Neighbourhood Search for MIPs

Charly Robinson La Rocca

Jean-François Cordeau

Emma Frejinger

Large Neighbourhood Search (LNS) is a powerful heuristic framework for solving Mixed-Integer Programming (MIP) problems. However, designing effective variable selection strategies in LNS remains challenging, especially for diverse sets of problems. In this paper, we propose an approach that integrates Machine Learning (ML) within the destroy operator of LNS for MIPs with a focus on minimal offline training. We implement a modular LNS matheuristic as a test bench to compare different LNS heuristics, including our ML-enhanced LNS. Experimental results on the MIPLIB 2017 dataset demonstrate that the matheuristic can significantly improve the performance of state-of-the-art solvers like Gurobi and SCIP. We conduct analyses on noisy oracles to explore the impact of prediction accuracy on solution quality. Additionally, we develop techniques to enhance the ML model through loss adjustments and sampling routines. Our findings suggest that while random LNS remains competitive, our Supervised LNS (SLNS) outperforms other baselines and helps set the foundation for future research on ML for LNS methods that are both efficient and general.
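To make the destroy/repair idea concrete, here is a toy sketch of LNS on a small 0/1 knapsack problem rather than a general MIP. It rests on two assumptions:

- The destroy step frees `k` variables chosen uniformly at random. This is the point where SLNS would instead use an ML model to select variables.
- The repair step re-optimizes the freed variables exactly by enumeration, standing in for a MIP solver such as Gurobi or SCIP.

All names here are hypothetical.

```python
import itertools
import random
from typing import List, Sequence


def total(x: Sequence[int], coeffs: Sequence[float]) -> float:
    """Sum of the coefficients of the selected items (x[i] == 1)."""
    return sum(c for c, xi in zip(coeffs, x) if xi)


def repair(x: List[int], free: Sequence[int], values: Sequence[float],
           weights: Sequence[float], capacity: float) -> List[int]:
    """Re-optimize the freed variables exactly; all other variables stay fixed."""
    best, best_val = list(x), total(x, values)  # the incumbent is feasible
    for assignment in itertools.product((0, 1), repeat=len(free)):
        cand = list(x)
        for i, a in zip(free, assignment):
            cand[i] = a
        if total(cand, weights) <= capacity:
            v = total(cand, values)
            if v > best_val:
                best, best_val = cand, v
    return best


def lns_knapsack(values: Sequence[float], weights: Sequence[float],
                 capacity: float, k: int, iterations: int,
                 seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    n = len(values)
    x = [0] * n  # trivially feasible start: select nothing
    for _ in range(iterations):
        # Destroy: free k variables at random (an ML model would score them here).
        free = rng.sample(range(n), k)
        # Repair: re-solve the small sub-problem over the freed variables.
        x = repair(x, free, values, weights, capacity)
    return x


values = [10, 5, 15, 7, 6, 18, 3]
weights = [2, 3, 5, 7, 1, 4, 1]
x = lns_knapsack(values, weights, capacity=15, k=3, iterations=50)
print(x, total(x, values), total(x, weights))
```

The current assignment is always among the candidates the repair step considers. Each iteration therefore keeps a feasible solution whose value never decreases. With small `k`, however, the search can stall in a local optimum, which is why the choice of variables to free matters.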

2025-01-17

ArXiv (preprint)

CHIRP: A Fine-Grained Benchmark for Open-Ended Response Evaluation in Vision-Language Models

Daniel Z Kaplan

Qirui Sun

Jonathan Siu Chi Lim

Quentin Gregory Anthony

Edwin Fennell

Irina Rish

The proliferation of Vision-Language Models (VLMs) in the past several years calls for rigorous and comprehensive evaluation methods and benchmarks. This work analyzes existing VLM evaluation techniques, including automated metrics, AI-based assessments, and human evaluations across diverse tasks. We first introduce Robin, a novel suite of VLMs that we built by combining Large Language Models (LLMs) and Vision Encoders (VEs) at multiple scales, and use Robin to identify shortcomings of current evaluation approaches across scales. Next, to overcome the identified limitations, we introduce CHIRP, a new long-form response benchmark we developed for more robust and complete VLM evaluation. We provide open access to the Robin training code, model suite, and CHIRP benchmark to promote reproducibility and advance VLM research.

2025-01-15

ArXiv (preprint)

AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages

Shamsuddeen Hassan Muhammad

Idris Abdulmumin

Abinew Ayele

David Ifeoluwa Adelani

Ibrahim Ahmad

Saminu Mohammad Aliyu

Nelson Odhiambo Onyango

Lilian D. A. Wanzare

Samuel Rutunda

Lukman Jibril Aliyu

Esubalew Alemneh

Oumaima Hourrane

Hagos Gebremichael

Elyas Abdi Ismail

Meriem Beloucif

Ebrahim Chekol Jibril

Andiswa Bukula

Rooweither Mabuya

Salomey Osei

Abigail Oppong … (7 more)

Tadesse Belay

Tadesse Kebede Guge

Tesfa Tegegne Asfaw

Chiamaka Ijeoma Chukwuneke

Paul Rottger

Seid Muhie Yimam

Nedjma Ousidhoum

Hate speech and abusive language are global phenomena that need socio-cultural background knowledge to be understood, identified, and moderated. However, in many regions of the Global South, there have been several documented occurrences of (1) absence of moderation and (2) censorship due to the reliance on keyword spotting out of context. Further, high-profile individuals have frequently been at the center of the moderation process, while large and targeted hate speech campaigns against minorities have been overlooked. These limitations are mainly due to the lack of high-quality data in the local languages and the failure to include local communities in the collection, annotation, and moderation processes. To address this issue, we present AfriHate: a multilingual collection of hate speech and abusive language datasets in 15 African languages. Each instance in AfriHate is annotated by native speakers familiar with the local culture. We report the challenges related to the construction of the datasets and present various classification baseline results with and without using LLMs. The datasets, individual annotations, and hate speech and offensive language lexicons are available at https://github.com/AfriHate/AfriHate

2025-01-13

ArXiv (preprint)

Integrating food webs in species distribution models can improve ecological niche estimation and predictions

Giovanni Poggiato

Jérémy Andréoletti

Laura J. Pollock

Wilfried Thuiller

2025-01-13

Ecography (published)

Multi-agent deep reinforcement learning with online and fair optimal dispatch of EV aggregators

Arian Shah Kamrani

Anoosh Dini

Hanane Dagdougui

Keyhan Sheshyekani

The growing popularity of electric vehicles (EVs) and the unpredictable behavior of EV owners have drawn attention to real-time coordination of EV charging management. This paper presents a hierarchical structure for charging management of EVs by integrating fairness and efficiency concepts within the operations of the distribution system operator (DSO) while utilizing a multi-agent deep reinforcement learning (MADRL) framework to tackle the complexities of energy purchasing and distribution among EV aggregators (EVAs). At the upper level, the DSO calculates the maximum allowable power for each EVA based on power flow constraints to ensure grid safety. Then, it finds the optimal Efficiency-Jain Tradeoff (EJT) point, where it sells the highest energy amount while ensuring equitable energy distribution. At the lower level, each EVA first acts as an agent employing a double deep Q-network (DDQN) with adaptive learning rates and prioritized experience replay to determine optimal energy purchases from the DSO. Then, the real-time smart dispatch (RSD) controller prioritizes EVs for energy dispatch based on relevant EV information. Findings indicate the proposed enhanced DDQN outperforms deep deterministic policy gradient (DDPG) and proximal policy optimization (PPO) in cumulative rewards and convergence speed. Finally, the framework’s performance is evaluated against uncontrolled charging and the first-come, first-served (FCFS) scenario using the 118-bus distribution system, demonstrating superior performance in maintaining safe operation of the grid while reducing charging costs for EVAs. Additionally, the framework’s integration with renewable energy sources (RESs), such as photovoltaic (PV), demonstrates its potential to enhance grid reliability.

• Introduces a scalable MADRL framework for real-time EV charging and energy distribution.
• Ensures fairness via an Efficiency-Jain Tradeoff (EJT) strategy at the DSO level.
• Enhances agent convergence with DDQN using adaptive learning rates and prioritized replay.
• Preserves stakeholder privacy with decentralized control and minimal data sharing.
• Balances grid reliability with equitable energy allocation under dynamic uncertainties.
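Two of the building blocks named above have standard textbook forms. The abstract does not give the exact EJT formulation, so this minimal sketch assumes EJT builds on Jain's fairness index, J(x) = (Σ x_i)² / (n Σ x_i²). It also shows the standard double DQN target, in which the online network selects the next action and the target network evaluates it. Adaptive learning rates, prioritized replay, and the paper's network architecture are not shown. Function names are illustrative.

```python
from typing import Sequence


def jain_index(allocations: Sequence[float]) -> float:
    """Jain's fairness index: 1/n (one agent gets everything) up to 1 (equal shares)."""
    n = len(allocations)
    total = sum(allocations)
    squares = sum(a * a for a in allocations)
    if squares == 0:
        return 1.0  # convention assumed here: an all-zero allocation counts as equal
    return (total * total) / (n * squares)


def double_dqn_target(reward: float, done: bool, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float]) -> float:
    """Double DQN target: the online net picks the argmax action, the target net scores it."""
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    bootstrap = 0.0 if done else q_target_next[best_action]
    return reward + gamma * bootstrap


# Equal energy shares among four aggregators are perfectly fair.
print(jain_index([5.0, 5.0, 5.0, 5.0]))  # 1.0
```

In `double_dqn_target(1.0, False, 0.9, [1.0, 3.0], [5.0, 2.0])` the target network's value for the online greedy action (2.0) is used, not the target network's own maximum (5.0). Decoupling action selection from evaluation in this way is what reduces the overestimation bias of standard DQN.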

2025-01-08

Machine Learning with Applications (published)