Publications

CALE: Continuous Arcade Learning Environment
Jesse Farebrother
We introduce the Continuous Arcade Learning Environment (CALE), an extension of the well-known Arcade Learning Environment (ALE) [Bellemare et al., 2013]. The CALE uses the same underlying emulator of the Atari 2600 gaming system (Stella), but adds support for continuous actions. This enables the benchmarking and evaluation of continuous-control agents (such as PPO [Schulman et al., 2017] and SAC [Haarnoja et al., 2018]) and value-based agents (such as DQN [Mnih et al., 2015] and Rainbow [Hessel et al., 2018]) on the same environment suite. We provide a series of open questions and research directions that CALE enables, as well as initial baseline results using Soft Actor-Critic. CALE is available as part of the ALE at https://github.com/Farama-Foundation/Arcade-Learning-Environment.
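As a rough illustration, here is how one might create a continuous-action Atari environment through the ALE's Gymnasium bindings. This is a minimal sketch: the `continuous=True` keyword, the `register_envs` call, and the Box action layout are assumptions based on the repository above rather than a confirmed API, so consult its documentation before relying on them.

```python
# Minimal sketch, assuming CALE is exposed through the ALE's Gymnasium
# bindings via a `continuous=True` flag (check the repository for the
# exact interface and required ale-py / Gymnasium versions).
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)  # registers the ALE/* environment IDs

env = gym.make("ALE/Breakout-v5", continuous=True)
obs, info = env.reset(seed=0)

# With continuous actions the action space is a Box rather than
# Discrete, so continuous-control agents such as SAC can act directly.
action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)
env.close()
```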
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
David Orlando Romero Mogrovejo
Chenyang Lyu
Haryo Akbarianto Wibowo
Santiago Góngora
Aishik Mandal
Sukannya Purkayastha
Jesus-German Ortiz-Barajas
Emilio Villa Cueva
Jinheon Baek
Soyeong Jeong
Injy Hamed
Zheng Xin Yong
Zheng Wei Lim
Paula Mónica Silva
Jocelyn Dunstan
Mélanie Jouitteau
David Le Meur
Joan Nwatu
Ganzorig Batnasan
Munkh-Erdene Otgonbold
Munkhjargal Gochoo
Guido Ivetta
Luciana Benotti
Laura Alonso Alemany
Hernán Maina
Jiahui Geng
Tiago Timponi Torrent
Frederico Belcavello
Marcelo Viridiano
Jan Christian Blaise Cruz
Dan John Velasco
Oana Ignat
Zara Burzo
Chenxi Whitehouse
Artem Abzaliev
Teresa Clifford
Gráinne Caulfield
Teresa Lynn
Christian Salamea-Palacios
Vladimir Araujo
Yova Kementchedjhieva
Mihail Minkov Mihaylov
Israel Abebe Azime
Henok Biadglign Ademtew
Bontu Fufa Balcha
Naome Etori
Rada Mihalcea
Atnafu Lambebo Tonja
Maria Camila Buitrago Cabrera
Gisela Vallejo
Holy Lovenia
Ruochen Zhang
Marcos Estecha-Garitagoitia
Mario Rodríguez-Cantelar
Toqeer Ehsan
Rendi Chevi
Muhammad Farid Adilazuarda
Ryandito Diandaru
Samuel Cahyawijaya
Fajri Koto
Tatsuki Kuribayashi
Haiyue Song
Aditya Nanda Kishore Khandavally
Thanmay Jayakumar
Raj Dabre
Mohamed Fazli Mohamed Imam
Kumaranage Ravindu Yasas Nagasinghe
Alina Dragonetti
Luis Fernando D'Haro
Olivier Niyomugisha
Jay Gala
Pranjal A Chitale
Fauzan Farooqui
Thamar Solorio
Alham Fikri Aji
Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection
Charles Guille-Escuret
Pierre-Andre Noel
David Vazquez
Joao Monteiro
Learning Action and Reasoning-Centric Image Editing from Videos and Simulation
Benno Krojer
Dheeraj Vattikonda
Luis Lara
Varun Jampani
Eva Portelance
LogiCity: Advancing Neuro-Symbolic AI with Abstract Urban Simulation
Bowen Li
Zhaoyu Li
Qiwei Du
Jinqi Luo
Wenshan Wang
Yaqi Xie
Simon Stepputtis
Chen Wang
Katia P. Sycara
Pradeep Kumar Ravikumar
Alexander G. Gray
Sebastian Scherer
Recent years have witnessed the rapid development of Neuro-Symbolic (NeSy) AI systems, which integrate symbolic reasoning into deep neural networks. However, most of the existing benchmarks for NeSy AI fail to provide long-horizon reasoning tasks with complex multi-agent interactions. Furthermore, they are usually constrained by fixed and simplistic logical rules over limited entities, making them far from real-world complexities. To address these crucial gaps, we introduce LogiCity, the first simulator based on customizable first-order logic (FOL) for an urban-like environment with multiple dynamic agents. LogiCity models diverse urban elements using semantic and spatial concepts.
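To make the setup concrete, the sketch below illustrates, in plain Python rather than LogiCity's actual API, how grounded semantic and spatial predicates can be combined into an FOL-style behavior rule; every name here is hypothetical.

```python
# Illustrative sketch only; the predicates and the rule are hypothetical
# and do not reflect LogiCity's actual API.
def is_ambulance(agent):
    """Semantic concept: the agent's type is 'ambulance'."""
    return agent["type"] == "ambulance"

def is_close(a, b, radius=2.0):
    """Spatial concept: Manhattan distance between two agents."""
    return abs(a["x"] - b["x"]) + abs(a["y"] - b["y"]) <= radius

def must_yield(agent, others):
    """FOL-style rule: exists o. IsAmbulance(o) AND IsClose(o, agent) -> Yield(agent)."""
    return any(is_ambulance(o) and is_close(o, agent) for o in others)

# Example grounding over two agents on a grid.
car = {"type": "car", "x": 0, "y": 0}
ambulance = {"type": "ambulance", "x": 1, "y": 1}
assert must_yield(car, [ambulance])
```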
Reactzyme: A Benchmark for Enzyme-Reaction Prediction
Chenqing Hua
Bozitao Zhong
Sitao Luan
Liang Hong
Shuangjia Zheng
Reconstructing Spatio-Temporal Trajectories of Visual Object Memories in the Human Brain
Julia Lifanov
Benjamin J. Griffiths
Juan Linde-Domingo
Catarina S. Ferreira
Martin Wilson
Stephen D. Mayhew
Maria Wimber
RedPajama: an Open Dataset for Training Large Language Models
Maurice Weber
Daniel Y Fu
Quentin Gregory Anthony
Yonatan Oren
Shane Adams
Anton Alexandrov
Xiaozhong Lyu
Huu Nguyen
Xiaozhe Yao
Virginia Adams
Ben Athiwaratkun
Rahul Chalamala
Kezhen Chen
Max Ryabinin
Tri Dao
Percy Liang
Christopher Re
Ce Zhang
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
Joao Monteiro
Pierre-Andre Noel
Étienne Marcotte
Sai Rajeswar
Valentina Zantedeschi
David Vazquez
Perouz Taslakian
Large Language Models (LLMs) are trained on vast amounts of data, most of which is automatically scraped from the internet. This data includes encyclopedic documents that harbor a vast amount of general knowledge (*e.g.*, Wikipedia) but also potentially overlap with benchmark datasets used for evaluating LLMs. Consequently, evaluating models on test splits that might have leaked into the training set is prone to misleading conclusions. To foster sound evaluation of language models, we introduce a new test dataset named RepLiQA, suited for question-answering and topic retrieval tasks. RepLiQA is a collection of five splits of test sets, four of which have not been released to the internet or exposed to LLM APIs prior to this publication. Each sample in RepLiQA comprises (1) a reference document crafted by a human annotator and depicting an imaginary scenario (*e.g.*, a news article) absent from the internet; (2) a question about the document’s topic; (3) a ground-truth answer derived directly from the information in the document; and (4) the paragraph extracted from the reference document containing the answer. As such, accurate answers can only be generated if a model can find relevant content within the provided document. We run a large-scale benchmark comprising several state-of-the-art LLMs to uncover differences in performance across models of various types and sizes in a context-conditional language modeling setting. Released splits of RepLiQA can be found here: https://huggingface.co/datasets/ServiceNow/repliqa.
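As a quick-start sketch, one might load the released splits from the URL above with the Hugging Face datasets library. The split name and field names below are assumptions for illustration, so check the dataset card for the actual schema.

```python
# Minimal sketch of loading RepLiQA; split and field names are assumed,
# not confirmed (see https://huggingface.co/datasets/ServiceNow/repliqa).
from datasets import load_dataset

ds = load_dataset("ServiceNow/repliqa", split="repliqa_0")  # hypothetical split name

sample = ds[0]
# Each sample should pair a human-written reference document with a
# question, the ground-truth answer, and the supporting paragraph.
for field in ("document_extracted", "question", "answer"):  # hypothetical field names
    print(f"{field}: {str(sample[field])[:100]}")
```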
TGB 2.0: A Benchmark for Learning on Temporal Knowledge Graphs and Heterogeneous Graphs
Julia Gastinger
Shenyang Huang
Mikhail Galkin
Erfan Loghmani
Ali Parviz
Farimah Poursafaei
Jacob Danovitch
Emanuele Rossi
Ioannis Koutis
Heiner Stuckenschmidt
The State of Data Curation at NeurIPS: An Assessment of Dataset Development Practices in the Datasets and Benchmarks Track
Eshta Bhardwaj
Harshit Gujral
Siyi Wu
Ciara Zogheib
Christoph Becker
Data curation is a field with origins in librarianship and archives, whose scholarship and thinking on data issues go back centuries, if not millennia. The field of machine learning is increasingly observing the importance of data curation to the advancement of both applications and fundamental understanding of machine learning models - evidenced not least by the creation of the Datasets and Benchmarks track itself. This work provides an analysis of dataset development practices at NeurIPS through the lens of data curation. We present an evaluation framework for dataset documentation, consisting of a rubric and toolkit developed through a literature review of data curation principles. We use the framework to assess the strengths and weaknesses in current dataset development practices of 60 datasets published in the NeurIPS Datasets and Benchmarks track from 2021-2023. We summarize key findings and trends. Results indicate greater need for documentation about environmental footprint, ethical considerations, and data management. We suggest targeted strategies and resources to improve documentation in these areas and provide recommendations for the NeurIPS peer-review process that prioritize rigorous data curation in ML. Finally, we provide results in the format of a dataset that showcases aspects of recommended data curation practices. Our rubric and results are of interest for improving data curation practices broadly in the field of ML as well as to data curation and science and technology studies scholars studying practices in ML. Our aim is to support continued improvement in interdisciplinary research on dataset practices, ultimately improving the reusability and reproducibility of new datasets and benchmarks, enabling standardized and informed human oversight, and strengthening the foundation of rigorous and responsible ML research.
Using Unity to Help Solve Reinforcement Learning
Connor Brennan
Andrew Robert Williams
Omar G. Younis
Vedant Vyas
Daria Yasafova
Leveraging the depth and flexibility of XLand as well as the rapid prototyping features of the Unity engine, we present the United Unity Universe (U3), an open-source toolkit designed to accelerate the creation of innovative reinforcement learning environments. This toolkit includes a robust implementation of XLand 2.0 complemented by a user-friendly interface that allows users to modify the details of procedurally generated terrains and task rules with ease. Additionally, we provide a curated selection of terrains and rule sets, accompanied by implementations of reinforcement learning baselines to facilitate quick experimentation with novel architectural designs for adaptive agents. Furthermore, we illustrate how U3 serves as a high-level language that enables researchers to develop diverse and endlessly variable 3D environments within a unified framework. This functionality establishes U3 as an essential tool for advancing the field of reinforcement learning, especially in the development of adaptive and generalizable learning systems.