
Christopher Morris

Alumni

Publications

Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets
Joao Alex Cunha
Zhiyi Li
Samuel Maddrell-Mander
Callum McLean
Jama Hussein Mohamud
Michael Craig
Cristian Gabellini
Kerstin Klaser
Josef Dean
Maciej Sypetkowski
Ioannis Koutis
Hadrien Mary
Therence Bois
Andrew William Fitzgibbon
Blazej Banaszewski
Chad Martin
Dominic Masters
Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, where datasets are often hand-curated, and hence typically small, the lack of datasets with labeled features, and codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by size into three distinct categories: ToyMix, LargeMix and UltraLarge. These datasets push the boundaries in both the scale and the diversity of supervised labels for molecular learning. They cover nearly 100 million molecules and over 3000 sparsely defined tasks, totaling more than 13 billion individual labels of both quantum and biological nature. In comparison, our datasets contain 300 times more data points than the widely used OGB-LSC PCQM4Mv2 dataset, and 13 times more than the quantum-only QM1B dataset. In addition, to support the development of foundational models based on our proposed datasets, we present the Graphium graph machine learning library, which simplifies the process of building and training molecular machine learning models for multi-task and multi-level molecular datasets. Finally, we present a range of baseline results as a starting point for multi-task and multi-level training on these datasets. Empirically, we observe that performance on low-resource biological datasets improves when training also includes large amounts of quantum data. This indicates that there may be potential in multi-task and multi-level training of a foundation model and fine-tuning it to resource-constrained downstream tasks. The Graphium library is publicly available on GitHub and the dataset links are available in Part 1 and Part 2.
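The multi-task setting described above, where only a small fraction of the roughly 3000 task labels are defined for any given molecule, is typically handled by masking the loss over missing labels. The sketch below is a generic PyTorch illustration of that idea under assumed tensor shapes; it is not the Graphium API, and the function name, shapes, and masking scheme are hypothetical.

```python
import torch
import torch.nn.functional as F

def masked_multitask_loss(preds, targets, label_mask):
    # Per-element squared error; reduction is deferred so missing labels can be masked out.
    per_label = F.mse_loss(preds, targets, reduction="none")
    per_label = per_label * label_mask  # zero contribution from unlabeled (molecule, task) pairs
    return per_label.sum() / label_mask.sum().clamp(min=1)  # average over observed labels only

# Toy usage: 4 molecules, 3000 sparsely defined tasks, roughly 1% of labels observed.
preds = torch.randn(4, 3000, requires_grad=True)
targets = torch.randn(4, 3000)
label_mask = (torch.rand(4, 3000) < 0.01).float()
loss = masked_multitask_loss(preds, targets, label_mask)
loss.backward()
```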
The Machine Learning for Combinatorial Optimization Competition (ML4CO): Results and Insights
Simon Bowly
Jonas Charfreitag
Didier Chételat
Antonia Chmiela
Justin Dumouchelle
Ambros Gleixner
Aleksandr Kazachkov
Elias Boutros Khalil
Paweł Lichocki
Andrea Lodi
Miles Lubin
Chris J. Maddison
D. Papageorgiou
Augustin Parjadis
Sebastian Pokutta
Lara Scavuzzo
Linxin Yang
Sha Lai
Akang Wang
Xiaodong Luo
Xiang Zhou
Haohan Huang
Sheng Cheng Shao
Yuanming Zhu
Dong Dong Zhang
Tao Manh Quan
Zixuan Cao
Yang Xu
Zhewei Huang
Shuchang Zhou
C. Binbin
He Minggui
Haoren Ren Hao
Zhang Zhiyu
An Zhiwu
Mao Kun
Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning as a new approach for solving combinatorial problems, either directly as solvers or by enhancing exact solvers. In this context, the ML4CO competition aims at improving state-of-the-art combinatorial optimization solvers by replacing key heuristic components. The competition featured three challenging tasks: finding the best feasible solution, producing the tightest optimality certificate, and giving an appropriate solver configuration. Three realistic datasets were considered: balanced item placement, workload apportionment, and maritime inventory routing. This last dataset was kept anonymous for the contestants.