Publications

ReactZyme: A Benchmark for Enzyme-Reaction Prediction

Bozitao Zhong

Liang Hong

Shuangjia Zheng

Enzymes, with their specific catalyzed reactions, are necessary for all aspects of life, enabling diverse biological processes and adaptatio… (voir plus)ns. Predicting enzyme functions is essential for understanding biological pathways, guiding drug development, enhancing bioproduct yields, and facilitating evolutionary studies. Addressing the inherent complexities, we introduce a new approach to annotating enzymes based on their catalyzed reactions. This method provides detailed insights into specific reactions and is adaptable to newly discovered reactions, diverging from traditional classifications by protein family or expert-derived reaction classes. We employ machine learning algorithms to analyze enzyme reaction datasets, delivering a much more refined view on the functionality of enzymes. Our evaluation leverages the largest enzyme-reaction dataset to date, derived from the SwissProt and Rhea databases with entries up to January 8, 2024. We frame the enzyme-reaction prediction as a retrieval problem, aiming to rank enzymes by their catalytic ability for specific reactions. With our model, we can recruit proteins for novel reactions and predict reactions in novel proteins, facilitating enzyme discovery and function annotation (https://github.com/WillHua127/ReactZyme).

2024-09-25

NeurIPS.cc/2024/Datasets_and_Benchmarks_Track (poster)

doi.org

openreview.net

Reconstructing Spatio-Temporal Trajectories of Visual Object Memories in the Human Brain

Julia Lifanov

Benjamin J. Griffiths

Juan Linde-Domingo

Catarina S. Ferreira

Martin Wilson

Stephen D. Mayhew

Ian Charest

Maria Wimber

2024-09-25

eNeuro (publié)

doi.org

RedPajama: an Open Dataset for Training Large Language Models

Maurice Weber

Daniel Y Fu

Quentin Gregory Anthony

Yonatan Oren

Shane Adams

Anton Alexandrov

Xiaozhong Lyu

Huu Nguyen

Xiaozhe Yao

Virginia Adams

Ben Athiwaratkun

Rahul Chalamala

Kezhen Chen

Max Ryabinin

Tri Dao

Percy Liang

Christopher Re

Irina Rish

Ce Zhang

2024-09-25

NeurIPS.cc/2024/Datasets_and_Benchmarks_Track (spotlight)

doi.org

openreview.net

Repliqa: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content

Joao Monteiro

Pierre-Andre Noel

Étienne Marcotte

Sai Rajeswar

Valentina Zantedeschi

David Vázquez

Nicolas Chapados

Christopher Pal

Perouz Taslakian

Large Language Models (LLMs) are trained on vast amounts of data, most of which is automatically scraped from the internet. This data includ… (voir plus)es encyclopedic documents that harbor a vast amount of general knowledge (e.g., Wikipedia) but also potentially overlap with benchmark datasets used for evaluating LLMs. Consequently, evaluating models on test splits that might have leaked into the training set is prone to misleading conclusions. To foster sound evaluation of language models, we introduce a new test dataset named RepLiQA, suited for question-answering and topic retrieval tasks. RepLiQA is a collection of five splits of test sets, four of which have not been released to the internet or exposed to LLM APIs prior to this publication. Each sample in RepLiQA comprises (1) a reference document crafted by a human annotator and depicting an imaginary scenario (e.g., a news article) absent from the internet; (2) a question about the document's topic; (3) a ground-truth answer derived directly from the information in the document; and (4) the paragraph extracted from the reference document containing the answer. As such, accurate answers can only be generated if a model can find relevant content within the provided document. We run a large-scale benchmark comprising several state-of-the-art LLMs to uncover differences in performance across models of various types and sizes in a context-conditional language modeling setting. Released splits of RepLiQA can be found here: https://huggingface.co/datasets/ServiceNow/repliqa.

2024-09-25

NeurIPS.cc/2024/Datasets_and_Benchmarks_Track (poster)

doi.org

openreview.net

TGB 2.0: A Benchmark for Learning on Temporal Knowledge Graphs and Heterogeneous Graphs

Erfan Loghmani

Emanuele Rossi

Ioannis Koutis

Heiner Stuckenschmidt

Reihaneh Rabbany

Guillaume Rabusseau

Multi-relational temporal graphs are powerful tools for modeling real-world data, capturing the evolving and interconnected nature of entiti… (voir plus)es over time. Recently, many novel models are proposed for ML on such graphs intensifying the need for robust evaluation and standardized benchmark datasets. However, the availability of such resources remains scarce and evaluation faces added complexity due to reproducibility issues in experimental protocols. To address these challenges, we introduce Temporal Graph Benchmark 2.0 (TGB 2.0), a novel benchmarking framework tailored for evaluating methods for predicting future links on Temporal Knowledge Graphs and Temporal Heterogeneous Graphs with a focus on large-scale datasets, extending the Temporal Graph Benchmark. TGB 2.0 facilitates comprehensive evaluations by presenting eight novel datasets spanning five domains with up to 53 million edges. TGB 2.0 datasets are significantly larger than existing datasets in terms of number of nodes, edges, or timestamps. In addition, TGB 2.0 provides a reproducible and realistic evaluation pipeline for multi-relational temporal graphs. Through extensive experimentation, we observe that 1) leveraging edge-type information is crucial to obtain high performance, 2) simple heuristic baselines are often competitive with more complex methods, 3) most methods fail to run on our largest datasets, highlighting the need for research on more scalable methods.

2024-09-25

Datasets and Benchmarks Track @ Neural Information Processing Systems (poster)

doi.org

openreview.net

The State of Data Curation at NeurIPS: An Assessment of Dataset Development Practices in the Datasets and Benchmarks Track

Eshta Bhardwaj

Harshit Gujral

Siyi Wu

Ciara Zogheib

Tegan Maharaj

Christoph Becker

Data curation is a field with origins in librarianship and archives, whose scholarship and thinking on data issues go back centuries, if not… (voir plus) millennia. The field of machine learning is increasingly observing the importance of data curation to the advancement of both applications and fundamental understanding of machine learning models - evidenced not least by the creation of the Datasets and Benchmarks track itself. This work provides an analysis of dataset development practices at NeurIPS through the lens of data curation. We present an evaluation framework for dataset documentation, consisting of a rubric and toolkit developed through a literature review of data curation principles. We use the framework to assess the strengths and weaknesses in current dataset development practices of 60 datasets published in the NeurIPS Datasets and Benchmarks track from 2021-2023. We summarize key findings and trends. Results indicate greater need for documentation about environmental footprint, ethical considerations, and data management. We suggest targeted strategies and resources to improve documentation in these areas and provide recommendations for the NeurIPS peer-review process that prioritize rigorous data curation in ML. Finally, we provide results in the format of a dataset that showcases aspects of recommended data curation practices. Our rubric and results are of interest for improving data curation practices broadly in the field of ML as well as to data curation and science and technology studies scholars studying practices in ML. Our aim is to support continued improvement in interdisciplinary research on dataset practices, ultimately improving the reusability and reproducibility of new datasets and benchmarks, enabling standardized and informed human oversight, and strengthening the foundation of rigorous and responsible ML research.

2024-09-25

NeurIPS.cc/2024/Datasets_and_Benchmarks_Track (spotlight)

doi.org

openreview.net

Using Unity to Help Solve Reinforcement Learning

Connor Brennan

Andrew Robert Williams

Omar G. Younis

Vedant Vyas

Daria Yasafova

Irina Rish

Leveraging the depth and flexibility of XLand as well as the rapid prototyping features of the Unity engine, we present the United Unity Uni… (voir plus)verse — an open-source toolkit designed to accelerate the creation of innovative reinforcement learning environments. This toolkit includes a robust implementation of XLand 2.0 complemented by a user-friendly interface which allows users to modify the details of procedurally generated terrains and task rules with ease. Additionally, we provide a curated selection of terrains and rule sets, accompanied by implementations of reinforcement learning baselines to facilitate quick experimentation with novel architectural designs for adaptive agents. Furthermore, we illustrate how the United Unity Universe serves as a high-level language that enables researchers to develop diverse and endlessly variable 3D environments within a unified framework. This functionality establishes the United Unity Universe (U3) as an essential tool for advancing the field of reinforcement learning, especially in the development of adaptive and generalizable learning systems.

2024-09-25

NeurIPS.cc/2024/Datasets_and_Benchmarks_Track (poster)

openreview.net

WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks

Léo Boisvert

Megh Thakkar

Maxime Gasse

Massimo Caccia

Thibault Le Sellier De Chezelles

Quentin Cappart

Nicolas Chapados

Alexandre Lacoste

Alexandre Drouin

The ability of large language models (LLMs) to mimic human-like intelligence has led to a surge in LLM-based autonomous agents. Though recen… (voir plus)t LLMs seem capable of planning and reasoning given user instructions, their effectiveness in applying these capabilities for autonomous task solving remains underexplored. This is especially true in enterprise settings, where automated agents hold the promise of a high impact. To fill this gap, we propose WorkArena++, a novel benchmark consisting of 682 tasks corresponding to realistic workflows routinely performed by knowledge workers. WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents. Our empirical studies across state-of-the-art LLMs and vision-language models (VLMs), as well as human workers, reveal several challenges for such models to serve as useful assistants in the workplace. In addition to the benchmark, we provide a mechanism to effortlessly generate thousands of ground-truth observation/action traces, which can be used for fine-tuning existing models. Overall, we expect this work to serve as a useful resource to help the community progress toward capable autonomous agents. The benchmark can be found at https://github.com/ServiceNow/WorkArena.

2024-09-25

Datasets and Benchmarks Track @ Neural Information Processing Systems (poster)

doi.org

openreview.net

4+3 Phases of Compute-Optimal Neural Scaling Laws

Elliot Paquette

Courtney Paquette

Lechao Xiao

Jeffrey Pennington

We consider the solvable neural scaling model with three parameters: data complexity, target complexity, and model-parameter-count. We use t… (voir plus)his neural scaling model to derive new predictions about the compute-limited, infinite-data scaling law regime. To train the neural scaling model, we run one-pass stochastic gradient descent on a mean-squared loss. We derive a representation of the loss curves which holds over all iteration counts and improves in accuracy as the model parameter count grows. We then analyze the compute-optimal model-parameter-count, and identify 4 phases (+3 subphases) in the data-complexity/target-complexity phase-plane. The phase boundaries are determined by the relative importance of model capacity, optimizer noise, and embedding of the features. We furthermore derive, with mathematical proof and extensive numerical evidence, the scaling-law exponents in all of these phases, in particular computing the optimal model-parameter-count as a function of floating point operation budget.

2024-09-24

NeurIPS.cc/2024/Conference (spotlight)

doi.org

openreview.net

Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning

Harley Wiltzer

Bellemare Marc-Emmanuel

David Meger

Patrick Shafto

Yash Jhaveri

When decisions are made at high frequency, traditional reinforcement learning (RL) methods struggle to accurately estimate action values. In… (voir plus) turn, their performance is inconsistent and often poor. Whether the performance of distributional RL (DRL) agents suffers similarly, however, is unknown. In this work, we establish that DRL agents are sensitive to the decision frequency. We prove that action-conditioned return distributions collapse to their underlying policy's return distribution as the decision frequency increases. We quantify the rate of collapse of these return distributions and exhibit that their statistics collapse at different rates. Moreover, we define distributional perspectives on action gaps and advantages. In particular, we introduce the superiority as a probabilistic generalization of the advantage -- the core object of approaches to mitigating performance issues in high-frequency value-based RL. In addition, we build a superiority-based DRL algorithm. Through simulations in an option-trading domain, we validate that proper modeling of the superiority distribution produces improved controllers at high decision frequencies.

2024-09-24

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Adaptive Exploration for Data-Efficient General Value Function Evaluations

Arushi Jain

Josiah P. Hanna

Doina Precup

General Value Functions (GVFs) (Sutton et al., 2011) represent predictive knowledge in reinforcement learning. Each GVF computes the expecte… (voir plus)d return for a given policy, based on a unique reward. Existing methods relying on fixed behavior policies or pre-collected data often face data efficiency issues when learning multiple GVFs in parallel using off-policy methods. To address this, we introduce GVFExplorer, which adaptively learns a single behavior policy that efficiently collects data for evaluating multiple GVFs in parallel. Our method optimizes the behavior policy by minimizing the total variance in return across GVFs, thereby reducing the required environmental interactions. We use an existing temporal-difference-style variance estimator to approximate the return variance. We prove that each behavior policy update decreases the overall mean squared error in GVF predictions. We empirically show our method's performance in tabular and nonlinear function approximation settings, including Mujoco environments, with stationary and non-stationary reward signals, optimizing data usage and reducing prediction errors across multiple GVFs.

2024-09-24

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Amortizing intractable inference in diffusion models for vision, language, and control

Nikolay Malkin

Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors … (voir plus)in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data,

2024-09-24

Neural Information Processing Systems (poster)

doi.org

openreview.net

Publications du Fellowship en politiques de l'IA

La plateforme Mila Ventures

Boussole des politiques en IA

Publications

Publications du Fellowship en politiques de l'IA

La plateforme Mila Ventures

Boussole des politiques en IA

Mots-clés populaires:

Publications