Sparse Decomposition of Graph Neural Networks
Yaochen Hu
Mai Zeng
Ge Zhang
Pavel Rumiantsev
Liheng Ma
Yingxue Zhang
Negotiative Alignment: Embracing Disagreement to Achieve Fairer Outcomes -- Insights from Urban Studies
Rashid A. Mushkani
Hugo Berard
Sample Compression for Continual Learning
Jacob Comeau
Mathieu Bazinet
Sample Compression for Self Certified Continual Learning
Jacob Comeau
Mathieu Bazinet
Continual learning algorithms aim to learn from a sequence of tasks, making the training distribution non-stationary. The majority of existing continual learning approaches in the literature rely on heuristics and do not provide learning guarantees. In this paper, we present a new method called Continual Pick-to-Learn (CoP2L), which is able to retain the most representative samples for each task in an efficient way. CoP2L combines the Pick-to-Learn algorithm (rooted in the sample compression theory) and the experience replay continual learning scheme. This allows us to provide non-vacuous upper bounds on the generalization loss of the learned predictors, numerically computable after each task. We empirically evaluate our approach on several standard continual learning benchmarks across Class-Incremental, Task-Incremental, and Domain-Incremental settings. Our results show that CoP2L is highly competitive across all setups, often outperforming existing baselines, and significantly mitigating catastrophic forgetting compared to vanilla experience replay in the Class-Incremental setting. It is possible to leverage the bounds provided by CoP2L in practical scenarios to certify the predictor reliability on previously learned tasks, in order to improve the trustworthiness of the continual learning algorithm.
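To make the idea in the abstract above concrete, here is a minimal toy sketch of a pick-to-learn style selection loop combined with a replay buffer. It is not the authors' CoP2L implementation: the 1-nearest-neighbour learner, the `margin` score, and the names `pick_to_learn` and `continual_pick` are all assumptions chosen for illustration, and no generalization bound is computed.

```python
from math import dist, inf


def margin(point, label, kept):
    """Score > 0 means a 1-NN rule over `kept` would misclassify (point, label)."""
    same = min((dist(point, p) for p, y in kept if y == label), default=inf)
    other = min((dist(point, p) for p, y in kept if y != label), default=inf)
    if same == inf and other == inf:
        return inf  # nothing stored yet: treat the example as maximally wrong
    return same - other


def pick_to_learn(data):
    """Greedily add the worst-handled example until every example is handled.

    Returns the retained subset (the "compression set" of this toy learner).
    """
    picked = []
    if not data:
        return picked
    while True:
        worst = max(data, key=lambda ex: margin(ex[0], ex[1], picked))
        if margin(worst[0], worst[1], picked) <= 0:
            return picked
        picked.append(worst)


def continual_pick(tasks):
    """Assumed replay scheme: replay the stored samples alongside each new task."""
    buffer = []
    for task in tasks:
        buffer = pick_to_learn(buffer + list(task))
    return buffer
```

In this sketch the buffer plays the role of the retained samples: after each task, only the examples the selection loop needed are kept, and they are replayed when the next task arrives.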
Exploiting Instruction-Following Retrievers for Malicious Information Retrieval
Parishad BehnamGhader
Nicholas Meade
Learning Decision Trees as Amortized Structure Inference
Mohammed Mahfoud
Ghait Boukachab
Michał Koziarski
Alex Hernandez-Garcia
Stefan Bauer
Nikolay Malkin
Relative biological effectiveness of 31 meV thermal neutrons in peripheral blood lymphocytes
Laura C Paterson
Fawaz Ali
Mohsen Naseri
David Perez Loureiro
Amy Festarini
Marilyne Stuart
Chad Boyer
Ronald Rogge
Christie Costello
Norma Ybarra
Richard B Richardson
SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection
Shamsuddeen Hassan Muhammad
Nedjma OUSIDHOUM
Idris Abdulmumin
Seid Muhie Yimam
Jan Philip Wahle
Terry Lima Ruas
Meriem Beloucif
Christine de Kock
Tadesse Belay
Ibrahim Ahmad
Nirmal Surange
Daniela Teodorescu
Alham Fikri Aji
Felermino Ali
Vladimir Araujo
Abinew Ayele
Oana Ignat
Alexander Panchenko
Yi Zhou …
Saif M. Mohammad
Understanding the impact of IoT security patterns on CPU usage and energy consumption: a dynamic approach for selecting patterns with deep reinforcement learning
Saeid Jamshidi
Amin Nikanjam
Kawser Wazed Nafi
Spectral State Space Model for Rotation-Invariant Visual Representation Learning
Sahar Dastani
Ali Bahri
Moslem Yazdanpanah
Mehrdad Noori
David Osowiechi
Gustavo Adolfo Vargas Hakim
Farzad Beizaee
Milad Cheraghalikhani
Arnab Kumar Mondal
Christian Desrosiers
The BrowserGym Ecosystem for Web Agent Research
Thibault Le Sellier de Chezelles
Alexandre Lacoste
Massimo Caccia
Léo Boisvert
Megh Thakkar
Tom Marty
Rim Assouel
Sahar Omidi Shayegan
Lawrence Keunho Jang
Xing Han Lu
Ori Yoran
Dehan Kong
Frank F. Xu
Graham Neubig
Russ Salakhutdinov
The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those leveraging automation and Large Language Models (LLMs) for web interaction tasks. Many existing benchmarks suffer from fragmentation and inconsistent evaluation methodologies, making it challenging to achieve reliable comparisons and reproducible results. BrowserGym aims to solve this by providing a unified, gym-like environment with well-defined observation and action spaces, facilitating standardized evaluation across diverse benchmarks. Combined with AgentLab, a complementary framework that aids in agent creation, testing, and analysis, BrowserGym offers flexibility for integrating new benchmarks while ensuring consistent evaluation and comprehensive experiment management. This standardized approach seeks to reduce the time and complexity of developing web agents, supporting more reliable comparisons and facilitating in-depth analysis of agent behaviors, and could result in more adaptable, capable agents, ultimately accelerating innovation in LLM-driven automation. As supporting evidence, we conduct the first large-scale, multi-benchmark web agent experiment and compare the performance of 6 state-of-the-art LLMs across all benchmarks currently available in BrowserGym. Among other findings, our results highlight a large discrepancy between OpenAI's and Anthropic's latest models, with Claude-3.5-Sonnet leading the way on almost all benchmarks, except on vision-related tasks where GPT-4o is superior. Despite these advancements, our results emphasize that building robust and efficient web agents remains a significant challenge, due to the inherent complexity of real-world web environments and the limitations of current models.
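The "unified, gym-like environment" described in the abstract above can be illustrated with a small self-contained sketch. It does not use the BrowserGym or AgentLab packages: `ToyClickEnv`, `run_episode`, and both agents are hypothetical names, and the only convention borrowed is the Gymnasium-style `reset()` / `step()` signature. The point is that one evaluation loop can score any agent on any task that exposes the same interface.

```python
class ToyClickEnv:
    """Hypothetical stand-in for a web task: click the button named `target`."""

    def __init__(self, buttons, target):
        self.buttons = list(buttons)
        self.target = target

    def _observation(self):
        return {"goal": f"click {self.target}", "buttons": self.buttons}

    def reset(self):
        # Gymnasium-style return value: (observation, info)
        return self._observation(), {}

    def step(self, action):
        # Gymnasium-style return value:
        # (observation, reward, terminated, truncated, info)
        success = action == self.target
        reward = 1.0 if success else 0.0
        return self._observation(), reward, success, False, {}


def run_episode(env, agent, max_steps=5):
    """Shared evaluation loop: works for any env/agent pair with this interface."""
    obs, _ = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, terminated, truncated, _ = env.step(agent(obs))
        total += reward
        if terminated or truncated:
            break
    return total


def goal_reading_agent(obs):
    """Clicks whatever button the goal text names."""
    return obs["goal"].split()[-1]


def first_button_agent(obs):
    """Baseline that always clicks the first listed button."""
    return obs["buttons"][0]
```

Because both toy tasks and both agents go through the same `run_episode` loop, their scores are directly comparable, which is the property the abstract attributes to a standardized environment.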