On the Stability of Iterative Retraining of Generative Models on their own Data
Quentin Bertrand
Alexandre Duplessis
Marco Jiralerspong
Deep generative models have made tremendous progress in modeling complex data, often exhibiting generation quality that surpasses a typical human's ability to discern the authenticity of samples. Undeniably, a key driver of this success is the massive amount of web-scale data consumed by these models. Due to these models' striking performance and ease of availability, the web will inevitably be increasingly populated with synthetic content. This directly implies that future iterations of generative models will be trained on both clean and artificially generated data from past models. In this paper, we develop a framework to rigorously study the impact of training generative models on mixed datasets, from classical training on real data to self-consuming generative models trained on purely synthetic data. We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough and the proportion of clean training data (w.r.t. synthetic data) is large enough. We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models on CIFAR10 and FFHQ.
Towards Foundation Models for Knowledge Graph Reasoning
Mikhail Galkin
Xinyu Yuan
Hesham Mostafa
Zhaocheng Zhu
Foundation models in language and vision have the ability to run inference on any textual and visual inputs thanks to transferable representations, such as a vocabulary of tokens in language. Knowledge graphs (KGs) have different entity and relation vocabularies that generally do not overlap. The key challenge of designing foundation models on KGs is to learn such transferable representations that enable inference on any graph with arbitrary entity and relation vocabularies. In this work, we make a step towards such foundation models and present ULTRA, an approach for learning universal and transferable graph representations. ULTRA builds relational representations as a function conditioned on their interactions. Such a conditioning strategy allows a pre-trained ULTRA model to inductively generalize to any unseen KG with any relation vocabulary and to be fine-tuned on any graph. Conducting link prediction experiments on 57 different KGs, we find that the zero-shot inductive inference performance of a single pre-trained ULTRA model on unseen graphs of various sizes is often on par with or better than strong baselines trained on specific graphs. Fine-tuning further boosts the performance.
Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets
Shenyang Huang
Joao Alex Cunha
Zhiyi Li
Gabriela Moisescu-Pareja
Oleksandr Dymov
Samuel Maddrell-Mander
Callum McLean
Frederik Wenkel
Luis Müller
Jama Hussein Mohamud
Ali Parviz
Michael Craig
Michał Koziarski
Jiarui Lu
Zhaocheng Zhu
Cristian Gabellini
Kerstin Klaser
Josef Dean
Cas Wognum
Maciej Sypetkowski
Christopher Morris
Ioannis Koutis
Prudencio Tossou
Hadrien Mary
Therence Bois
Andrew William Fitzgibbon
Blazej Banaszewski
Chad Martin
Dominic Masters
Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, where datasets are often hand-curated, and hence typically small, the lack of datasets with labeled features, and codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by size into three distinct categories: ToyMix, LargeMix and UltraLarge. These datasets push the boundaries in both the scale and the diversity of supervised labels for molecular learning. They cover nearly 100 million molecules and over 3000 sparsely defined tasks, totaling more than 13 billion individual labels of both quantum and biological nature. In comparison, our datasets contain 300 times more data points than the widely used OGB-LSC PCQM4Mv2 dataset, and 13 times more than the quantum-only QM1B dataset. In addition, to support the development of foundational models based on our proposed datasets, we present the Graphium graph machine learning library, which simplifies the process of building and training molecular machine learning models for multi-task and multi-level molecular datasets. Finally, we present a range of baseline results as a starting point for multi-task and multi-level training on these datasets. Empirically, we observe that performance on low-resource biological datasets improves when also training on large amounts of quantum data. This indicates that there may be potential in multi-task and multi-level training of a foundation model and fine-tuning it to resource-constrained downstream tasks. The Graphium library is publicly available on GitHub and the dataset links are available in Part 1 and Part 2.
Tree Cross Attention
Leo Feng
Frederick Tung
Hossein Hajimirsadeghi
Mohamed Osama Ahmed
Cross Attention is a popular method for retrieving information from a set of context tokens for making predictions. At inference time, for each prediction, Cross Attention scans the full set of
Würstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models
Pablo Pernias
Dominic Rampas
Mats Leon Richter
Marc Aubreville
BCG immunization induces CX3CR1hi effector memory T cells to provide cross-protection via IFN-γ-mediated trained immunity.
Kim A. Tran
Erwan Pernet
Mina Sadeghi
Jeffrey Downey
Julia Chronopoulos
Elizabeth Lapshina
Oscar Tsai
Eva Kaufmann
Maziar Divangahi
Computational pathology: A survey review and the way forward
Mahdi S. Hosseini
Babak Ehteshami Bejnordi
Vincent Quoc-Huy Trinh
Danial Hasan
Xingwen Li
Taehyo Kim
Haochen Zhang
Theodore Wu
Kajanan Chinniah
Sina Maghsoudlou
Ryan Zhang
Stephen Yang
Jiadai Zhu
Lyndon Chan
Samir Khaki
Andrei Buin
Fatemeh Chaji
Ala Salehi
Alejandra Zambrano Luna
Bich Ngoc Nguyen
Dimitris Samaras
Konstantinos N. Plataniotis
Assessing the quality and value of metabolic chart data for capturing core outcomes for pediatric medium-chain acyl-CoA dehydrogenase (MCAD) deficiency
Ryan Iverson
Monica Taljaard
Michael T. Geraghty
Michael Pugliese
Kylie Tingley
Doug Coyle
Jonathan B. Kronick
Kumanan Wilson
Valerie Austin
Catherine Brunel-Guitton
Daniela Buhas
Nancy J. Butcher
Alicia K. J. Chan
Sarah Dyack
Sharan Goobie
Cheryl Greenberg
Shailly Jain-Ghai
Michal Inbar-Feigenberg
Natalya Karp
Mariya Kozenko
Erica Langley
Matthew Lines
Julian Little
Jennifer MacKenzie
Bruno Maranda
Saadet Mercimek-Andrews
Aizeddin Mhanni
John J. Mitchell
Laura Nagy
Martin Offringa
Amy Pender
Murray Potter
Chitra Prasad
Suzanne Ratko
Ramona Salvarinova
Andreas Schulze
Komudi Siriwardena
Neal Sondheimer
Rebecca Sparkes
Sylvia Stockler-Ipsiroglu
Kendra Tapscott
Lesley Turner
Clara Van Karnebeek
Anthony Vandersteen
Jagdeep S. Walia
Brenda J. Wilson
Andrea C. Yu
Beth K. Potter
Pranesh Chakraborty
Combining Confidence Elicitation and Sample-based Methods for Uncertainty Quantification in Misinformation Mitigation
Mauricio Rivera
Kellin Pelrine
Large Language Models have emerged as prime candidates to tackle misinformation mitigation. However, existing approaches struggle with hallucinations and overconfident predictions. We propose an uncertainty quantification framework that leverages both direct confidence elicitation and sample-based consistency methods to provide better calibration for NLP misinformation mitigation solutions. We first investigate the calibration of sample-based consistency methods that exploit distinct features of consistency across sample sizes and stochastic levels. Next, we evaluate the performance and distributional shift of a robust numeric verbalization prompt across single-step vs. two-step confidence elicitation procedures. We also compare the performance of the same prompt with different versions of GPT and different numerical scales. Finally, we combine the sample-based consistency and verbalized methods to propose a hybrid framework that yields better uncertainty estimation for GPT models. Overall, our work proposes novel uncertainty quantification methods that will improve the reliability of Large Language Models in misinformation mitigation applications.