Learning Decision Trees as Amortized Structure Inference
Mohammed Mahfoud
Ghait Boukachab
Michał Koziarski
Alex Hernandez-Garcia
Stefan Bauer
Nikolay Malkin
Relative biological effectiveness of 31 meV thermal neutrons in peripheral blood lymphocytes
Laura C Paterson
Fawaz Ali
Mohsen Naseri
David Perez Loureiro
Amy Festarini
Marilyne Stuart
Chad Boyer
Ronald Rogge
Christie Costello
Norma Ybarra
Richard B Richardson
SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection
Shamsuddeen Hassan Muhammad
Nedjma OUSIDHOUM
Idris Abdulmumin
Seid Muhie Yimam
Jan Philip Wahle
Terry Lima Ruas
Meriem Beloucif
Christine de Kock
Tadesse Belay
Ibrahim Ahmad
Nirmal Surange
Daniela Teodorescu
Alham Fikri Aji
Felermino Ali
Vladimir Araujo
Abinew Ayele
Oana Ignat
Alexander Panchenko
Yi Zhou … (1 more author)
Saif M. Mohammad
Understanding the impact of IoT security patterns on CPU usage and energy consumption: a dynamic approach for selecting patterns with deep reinforcement learning
Saeid Jamshidi
Amin Nikanjam
Kawser Wazed Nafi
Spectral State Space Model for Rotation-Invariant Visual Representation Learning
Sahar Dastani
Ali Bahri
Moslem Yazdanpanah
Mehrdad Noori
David Osowiechi
Gustavo Adolfo Vargas Hakim
Farzad Beizaee
Milad Cheraghalikhani
Arnab Kumar Mondal
Christian Desrosiers
The BrowserGym Ecosystem for Web Agent Research
Thibault Le Sellier de Chezelles
Alexandre Lacoste
Massimo Caccia
Léo Boisvert
Megh Thakkar
Tom Marty
Rim Assouel
Sahar Omidi Shayegan
Lawrence Keunho Jang
Xing Han Lu
Ori Yoran
Dehan Kong
Frank F. Xu
Graham Neubig
Russ Salakhutdinov
The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those leveraging automation and Large Language Models (LLMs) for web interaction tasks. Many existing benchmarks suffer from fragmentation and inconsistent evaluation methodologies, making it challenging to achieve reliable comparisons and reproducible results. BrowserGym aims to solve this by providing a unified, gym-like environment with well-defined observation and action spaces, facilitating standardized evaluation across diverse benchmarks. Combined with AgentLab, a complementary framework that aids in agent creation, testing, and analysis, BrowserGym offers flexibility for integrating new benchmarks while ensuring consistent evaluation and comprehensive experiment management. This standardized approach seeks to reduce the time and complexity of developing web agents, supporting more reliable comparisons and facilitating in-depth analysis of agent behaviors, and could result in more adaptable, capable agents, ultimately accelerating innovation in LLM-driven automation. As supporting evidence, we conduct the first large-scale, multi-benchmark web agent experiment and compare the performance of 6 state-of-the-art LLMs across all benchmarks currently available in BrowserGym. Among other findings, our results highlight a large discrepancy between OpenAI's and Anthropic's latest models, with Claude-3.5-Sonnet leading the way on almost all benchmarks, except on vision-related tasks where GPT-4o is superior. Despite these advancements, our results emphasize that building robust and efficient web agents remains a significant challenge, due to the inherent complexity of real-world web environments and the limitations of current models.
Unveiling Inefficiencies in LLM-Generated Code: Toward a Comprehensive Taxonomy
Altaf Allah Abbassi
Leuson Da Silva
Amin Nikanjam
NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild
Shikhar Murty
Hao Zhu
Christopher D Manning
We introduce NNetNav, a method for unsupervised interaction with websites that generates synthetic demonstrations for training browser agents. Given any website, NNetNav produces these demonstrations by retroactively labeling action sequences from an exploration policy. Most work on training browser agents has relied on expensive human supervision, and the limited prior work on such interaction-based techniques has failed to provide effective search through the exponentially large space of exploration. In contrast, NNetNav exploits the hierarchical structure of language instructions to make this search more tractable: Complex instructions are typically decomposable into simpler sub-tasks, allowing NNetNav to automatically prune interaction episodes when an intermediate trajectory cannot be annotated with a meaningful sub-task. \texttt{LLama-3.1-8b} finetuned on 10k NNetNav self-generated demonstrations obtains over 16\% success rate on WebArena, and 35\% on WebVoyager, an improvement of 15pts and 31pts respectively over zero-shot \texttt{LLama-3.1-8b}, outperforming zero-shot GPT-4 and reaching the state-of-the-art among unsupervised methods, for both benchmarks.
Towards Graph Foundation Models: A Study on the Generalization of Positional and Structural Encodings
Billy Joe Franks
Moshe Eliasof
Semih Cantürk
Carola-Bibiane Schönlieb
Sophie Fellenz
Marius Kloft
Recent advances in integrating positional and structural encodings (PSEs) into graph neural networks (GNNs) have significantly enhanced their performance across various graph learning tasks. However, the general applicability of these encodings and their potential to serve as foundational representations for graphs remain uncertain. This paper investigates the fine-tuning efficiency, scalability with sample size, and generalization capability of learnable PSEs across diverse graph datasets. Specifically, we evaluate their potential as universal pre-trained models that can be easily adapted to new tasks with minimal fine-tuning and limited data. Furthermore, we assess the expressivity of the learned representations, particularly when used to augment downstream GNNs. We demonstrate through extensive benchmarking and empirical analysis that PSEs generally enhance downstream models. However, some datasets may require specific PSE-augmentations to achieve optimal performance. Nevertheless, our findings highlight their significant potential to become integral components of future graph foundation models. We provide new insights into the strengths and limitations of PSEs, contributing to the broader discourse on foundation models in graph learning.
Tractable Representations for Convergent Approximation of Distributional HJB Equations
Julie Alhosh
Harley Wiltzer