
Christopher Morris

Alumni

Publications

Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets
Joao Alex Cunha
Zhiyi Li
Samuel Maddrell-Mander
Callum McLean
Jama Hussein Mohamud
Michael Craig
Cristian Gabellini
Kerstin Klaser
Josef Dean
Maciej Sypetkowski
Ioannis Koutis
Hadrien Mary
Therence Bois
Andrew William Fitzgibbon
Blazej Banaszewski
Chad Martin
Dominic Masters
Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, where datasets are often hand-curated, and hence typically small, the lack of datasets with labeled features, and codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by size into three distinct categories: ToyMix, LargeMix and UltraLarge. These datasets push the boundaries in both the scale and the diversity of supervised labels for molecular learning. They cover nearly 100 million molecules and over 3000 sparsely defined tasks, totaling more than 13 billion individual labels of both quantum and biological nature. In comparison, our datasets contain 300 times more data points than the widely used OGB-LSC PCQM4Mv2 dataset, and 13 times more than the quantum-only QM1B dataset. In addition, to support the development of foundational models based on our proposed datasets, we present the Graphium graph machine learning library, which simplifies the process of building and training molecular machine learning models for multi-task and multi-level molecular datasets. Finally, we present a range of baseline results as a starting point for multi-task and multi-level training on these datasets. Empirically, we observe that performance on low-resource biological datasets improves when training also includes large amounts of quantum data. This indicates that there may be potential in multi-task and multi-level training of a foundation model and fine-tuning it to resource-constrained downstream tasks. The Graphium library is publicly available on GitHub and the dataset links are available in Part 1 and Part 2.
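The multi-task setting described above, where only a small fraction of the roughly 3000 task labels are defined for any given molecule, is typically handled by masking the loss over missing labels. The sketch below is a generic PyTorch illustration of that idea under assumed tensor shapes; it is not the Graphium API, and the function name, shapes, and masking scheme are hypothetical.

```python
import torch
import torch.nn.functional as F

def masked_multitask_loss(preds, targets, label_mask):
    # Per-element squared error; reduction is deferred so missing labels can be masked out.
    per_label = F.mse_loss(preds, targets, reduction="none")
    per_label = per_label * label_mask  # zero contribution from unlabeled (molecule, task) pairs
    return per_label.sum() / label_mask.sum().clamp(min=1)  # average over observed labels only

# Toy usage: 4 molecules, 3000 sparsely defined tasks, roughly 1% of labels observed.
preds = torch.randn(4, 3000, requires_grad=True)
targets = torch.randn(4, 3000)
label_mask = (torch.rand(4, 3000) < 0.01).float()
loss = masked_multitask_loss(preds, targets, label_mask)
loss.backward()
```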
The Machine Learning for Combinatorial Optimization Competition (ML4CO): Results and Insights
Simon Bowly
Jonas Charfreitag
Didier Chételat
Antonia Chmiela
Justin Dumouchelle
Ambros Gleixner
Aleksandr Kazachkov
Elias Boutros Khalil
Paweł Lichocki
Andrea Lodi
Miles Lubin
Chris J. Maddison
D. Papageorgiou
Augustin Parjadis
Sebastian Pokutta
Lara Scavuzzo
Linxin Yang
Sha Lai
Akang Wang
Xiaodong Luo
Xiang Zhou
Haohan Huang
Sheng Cheng Shao
Yuanming Zhu
Dong Dong Zhang
Tao Manh Quan
Zixuan Cao
Yang Xu
Zhewei Huang
Shuchang Zhou
C. Binbin
He Minggui
Haoren Ren Hao
Zhang Zhiyu
An Zhiwu
Mao Kun
Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning as a new approach for solving combinatorial problems, either directly as solvers or by enhancing exact solvers. In this context, the ML4CO competition aims at improving state-of-the-art combinatorial optimization solvers by replacing key heuristic components. The competition featured three challenging tasks: finding the best feasible solution, producing the tightest optimality certificate, and giving an appropriate solver configuration. Three realistic datasets were considered: balanced item placement, workload apportionment, and maritime inventory routing. This last dataset was kept anonymous for the contestants.