Longitudinal bi-criteria framework for assessing national healthcare responses to pandemic outbreaks
Adel Guitouni
Nabil Belacel
Belaid Moa
Munire Erman
Halim Abdul
CALE: Continuous Arcade Learning Environment
Jesse Farebrother
We introduce the Continuous Arcade Learning Environment (CALE), an extension of the well-known Arcade Learning Environment (ALE) [Bellemare et al., 2013]. The CALE uses the same underlying emulator of the Atari 2600 gaming system (Stella), but adds support for continuous actions. This enables the benchmarking and evaluation of continuous-control agents (such as PPO [Schulman et al., 2017] and SAC [Haarnoja et al., 2018]) and value-based agents (such as DQN [Mnih et al., 2015] and Rainbow [Hessel et al., 2018]) on the same environment suite. We provide a series of open questions and research directions that CALE enables, as well as initial baseline results using Soft Actor-Critic. CALE is available as part of the ALE at https://github.com/Farama-Foundation/Arcade-Learning-Environment.
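As a minimal sketch of how an agent might interact with CALE through the Gymnasium bindings shipped in the ALE repository: the `continuous=True` flag and the shape of the resulting action space are assumptions inferred from the abstract, not verified API details.

```python
# Minimal interaction loop with CALE via Gymnasium (sketch).
# Assumption: the ALE's Gymnasium registration exposes a `continuous` flag
# that switches the action space from Discrete to a continuous Box of
# joystick controls, as the CALE abstract describes.
import gymnasium as gym
import ale_py  # provides the ALE/* environments

gym.register_envs(ale_py)  # explicit registration (newer ale-py versions)

env = gym.make("ALE/Breakout-v5", continuous=True)
obs, info = env.reset(seed=0)

for _ in range(1000):
    action = env.action_space.sample()  # a continuous-control agent would act here
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```

Because the action space is continuous, the same loop works unchanged for SAC- or PPO-style agents, which is the cross-paradigm comparison the benchmark is designed to enable.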
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
David Le Meur
David Orlando Romero Mogrovejo
Chenyang Lyu
Haryo Akbarianto Wibowo
Teresa Lynn
Injy Hamed
Aditya Nanda Kishore Khandavally
Aishik Mandal
Alina Dragonetti
Artem Abzaliev
Atnafu Lambebo Tonja
Bontu Fufa Balcha
Chenxi Whitehouse
Christian Salamea-Palacios
Dan John Velasco
Emilio Villa Cueva
Fajri Koto
Fauzan Farooqui
Frederico Belcavello
Ganzorig Batnasan
Gisela Vallejo
Gráinne Caulfield
Guido Ivetta
Haiyue Song
Henok Biadglign Ademtew
Hernán Maina
Holy Lovenia
Israel Abebe Azime
Jan Christian Blaise Cruz
Jay Gala
Jiahui Geng
Jesus-German Ortiz-Barajas
Jinheon Baek
Jocelyn Dunstan
Laura Alonso Alemany
Teresa Clifford
Kumaranage Ravindu Yasas Nagasinghe
Luciana Benotti
Luis Fernando D'Haro
Marcelo Viridiano
Marcos Estecha-Garitagoitia
Maria Camila Buitrago Cabrera
Mario Rodríguez-Cantelar
Mélanie Jouitteau
Mihail Minkov Mihaylov
Mohamed Fazli Mohamed Imam
Muhammad Farid Adilazuarda
Munkhjargal Gochoo
Munkh-Erdene Otgonbold
Naome Etori
Olivier Niyomugisha
Paula Mónica Silva
Pranjal A Chitale
Raj Dabre
Rendi Chevi
Ruochen Zhang
Ryandito Diandaru
Samuel Cahyawijaya
Santiago Góngora
Soyeong Jeong
Sukannya Purkayastha
Tatsuki Kuribayashi
Thanmay Jayakumar
Tiago Timponi Torrent
Toqeer Ehsan
Vladimir Araujo
Yova Kementchedjhieva
Zara Burzo
Zheng Wei Lim
Zheng Xin Yong
Oana Ignat
Joan Nwatu
Rada Mihalcea
Thamar Solorio
Alham Fikri Aji
Development of AI-assisted microscopy frameworks through realistic simulation in pySTED
Anthony Bilodeau
Albert Michaud-Gagnon
Julia Chabbert
Benoit Turcotte
Jörn Heine
Flavie Lavoie-Cardinal
The integration of artificial intelligence into microscopy systems significantly enhances performance, optimizing both the image acquisition and analysis phases. Development of artificial intelligence (AI)-assisted super-resolution microscopy is often limited by access to large biological datasets, as well as by the difficulty of benchmarking and comparing approaches on heterogeneous samples. We demonstrate the benefits of a realistic STED simulation platform, pySTED, for the development and deployment of AI strategies for super-resolution microscopy. The simulation environment provided by pySTED allows the augmentation of data for training deep neural networks, the development of online optimization strategies, and the training of reinforcement learning models that can be deployed successfully on a real microscope.
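To illustrate how a simulator like pySTED can stand in for a real microscope during reinforcement-learning development, here is a hedged sketch of a gym-style training loop; the `PySTEDImagingEnv` wrapper, its observation and reward conventions, and the fictitious quality model are hypothetical placeholders, not the pySTED API.

```python
# Hypothetical gym-style wrapper around a simulated STED acquisition (sketch).
# None of these names come from pySTED; they only illustrate the sim-to-real
# workflow described in the abstract: train imaging-parameter policies in
# simulation, then deploy them on a real microscope.
import numpy as np


class PySTEDImagingEnv:
    """Toy environment: the agent picks a laser power, the env returns an image-quality score."""

    def reset(self):
        self.budget = 10  # illustrative photobleaching budget
        return np.zeros(2)  # observation: [last quality, remaining budget]

    def step(self, laser_power: float):
        # In a real setup, the STED simulation would render the acquisition here.
        quality = 1.0 - abs(laser_power - 0.3)  # fictitious optimum at 0.3
        self.budget -= 1
        done = self.budget <= 0
        return np.array([quality, self.budget]), quality, done


env = PySTEDImagingEnv()
obs, done = env.reset(), False
while not done:
    action = np.random.uniform(0.0, 1.0)  # an RL policy would choose this
    obs, reward, done = env.step(action)
```

The point of the design is that the loop above is agnostic to whether the `step` call hits a simulator or real hardware, which is what makes deployment on a physical microscope possible after training in simulation.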
Expecting The Unexpected: Towards Broad Out-Of-Distribution Detection
Charles Guille-Escuret
Pierre-André Noël
David Vazquez
Joao Monteiro
Learning Action and Reasoning-Centric Image Editing from Videos and Simulation
Benno Krojer
Dheeraj Vattikonda
Luis Lara
Varun Jampani
LogiCity: Advancing Neuro-Symbolic AI with Abstract Urban Simulation
Bowen Li
Zhaoyu Li
Qiwei Du
Jinqi Luo
Wenshan Wang
Yaqi Xie
Simon Stepputtis
Chen Wang
Katia P. Sycara
Pradeep Kumar Ravikumar
Alexander G. Gray
Sebastian Scherer
Recent years have witnessed the rapid development of Neuro-Symbolic (NeSy) AI systems, which integrate symbolic reasoning into deep neural networks. However, most existing benchmarks for NeSy AI fail to provide long-horizon reasoning tasks with complex multi-agent interactions. Furthermore, they are usually constrained by fixed and simplistic logical rules over limited entities, placing them far from real-world complexity. To address these crucial gaps, we introduce LogiCity, the first simulator based on customizable first-order logic (FOL) for an urban-like environment with multiple dynamic agents. LogiCity models diverse urban elements using semantic and spatial concepts.
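To make the idea of customizable FOL rules over semantic and spatial concepts concrete, here is a hedged sketch of grounding a single traffic rule over a handful of agents; the predicates and the rule itself are invented for illustration and may not match LogiCity's actual concept or rule definitions.

```python
# Illustrative grounding of a first-order-logic traffic rule (sketch).
# The predicates and the rule are invented; LogiCity's actual concepts
# and rule syntax may differ.
agents = ["car_1", "car_2", "pedestrian_1"]
is_pedestrian = {"pedestrian_1"}                 # semantic concept
is_close = {("car_1", "pedestrian_1")}           # spatial concept: proximity


def must_stop(agent: str) -> bool:
    """Rule: forall X. (exists Y. IsPedestrian(Y) and IsClose(X, Y)) -> Stop(X)."""
    return any(
        y in is_pedestrian and (agent, y) in is_close
        for y in agents
    )


for x in agents:
    print(x, "stops" if must_stop(x) else "may proceed")
```

Because the rule set is data, not code, swapping in different predicates or quantified rules reconfigures agent behavior without touching the simulator, which is what "customizable first-order logic" enables.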
Reactzyme: A Benchmark for Enzyme-Reaction Prediction
Chenqing Hua
Bozitao Zhong
Sitao Luan
Liang Hong
Shuangjia Zheng
Reconstructing Spatio-Temporal Trajectories of Visual Object Memories in the Human Brain
Julia Lifanov
Benjamin J. Griffiths
Juan Linde-Domingo
Catarina S. Ferreira
Martin Wilson
Stephen D. Mayhew
Maria Wimber
RedPajama: an Open Dataset for Training Large Language Models
Maurice Weber
Daniel Y Fu
Quentin Gregory Anthony
Yonatan Oren
Shane Adams
Anton Alexandrov
Xiaozhong Lyu
Huu Nguyen
Xiaozhe Yao
Virginia Adams
Ben Athiwaratkun
Rahul Chalamala
Kezhen Chen
Max Ryabinin
Tri Dao
Percy Liang
Christopher Ré
Ce Zhang
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
Joao Monteiro
Pierre-André Noël
Étienne Marcotte
Sai Rajeswar
Valentina Zantedeschi
David Vazquez
Perouz Taslakian
Large Language Models (LLMs) are trained on vast amounts of data, most of which is automatically scraped from the internet. This data includes encyclopedic documents that harbor a vast amount of general knowledge (*e.g.*, Wikipedia) but also potentially overlap with benchmark datasets used for evaluating LLMs. Consequently, evaluating models on test splits that might have leaked into the training set is prone to misleading conclusions. To foster sound evaluation of language models, we introduce a new test dataset named RepLiQA, suited for question-answering and topic retrieval tasks. RepLiQA is a collection of five splits of test sets, four of which have not been released to the internet or exposed to LLM APIs prior to this publication. Each sample in RepLiQA comprises (1) a reference document crafted by a human annotator and depicting an imaginary scenario (*e.g.*, a news article) absent from the internet; (2) a question about the document’s topic; (3) a ground-truth answer derived directly from the information in the document; and (4) the paragraph extracted from the reference document containing the answer. As such, accurate answers can only be generated if a model can find relevant content within the provided document. We run a large-scale benchmark comprising several state-of-the-art LLMs to uncover differences in performance across models of various types and sizes in a context-conditional language modeling setting. Released splits of RepLiQA can be found here: https://huggingface.co/datasets/ServiceNow/repliqa.
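A minimal sketch of loading a released RepLiQA split with the Hugging Face `datasets` library; the dataset ID comes from the URL above, but the split name and the per-record field names are assumptions inferred from the abstract's description of each sample, not verified dataset metadata.

```python
# Loading a released RepLiQA split (sketch).
# Assumptions: the split name "repliqa_0" follows the five-split scheme the
# abstract describes, and each record carries a reference document, a
# question, a ground-truth answer, and the supporting paragraph.
from datasets import load_dataset

ds = load_dataset("ServiceNow/repliqa", split="repliqa_0")

sample = ds[0]
print(sample.keys())  # inspect the actual field names first

# A context-conditional evaluation would prompt the model with the reference
# document plus the question, then score the output against the ground-truth
# answer; since the documents are fictional, the model cannot answer from
# memorized pretraining data alone.
```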
TGB 2.0: A Benchmark for Learning on Temporal Knowledge Graphs and Heterogeneous Graphs
Julia Gastinger
Shenyang Huang
Mikhail Galkin
Erfan Loghmani
Ali Parviz
Farimah Poursafaei
Jacob Danovitch
Emanuele Rossi
Ioannis Koutis
Heiner Stuckenschmidt