Publications

TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series

Arjun Ashok

Étienne Marcotte

Valentina Zantedeschi

Nicolas Chapados

Alexandre Drouin

We introduce a new model for multivariate probabilistic time series prediction, designed to flexibly address a range of tasks including forecasting, interpolation, and their combinations. Building on copula theory, we propose a simplified objective for the recently-introduced transformer-based attentional copulas (TACTiS), wherein the number of distributional parameters now scales linearly with the number of variables instead of factorially. The new objective requires the introduction of a training curriculum, which goes hand-in-hand with necessary changes to the original architecture. We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks, while maintaining the flexibility of prior work, such as seamless handling of unaligned and unevenly-sampled time series. Code is made available at https://github.com/ServiceNow/TACTiS.

2024-01-15

ICLR.cc/2024/Conference (poster)

The Curse of Diversity in Ensemble-Based Exploration

We uncover a surprising phenomenon in deep reinforcement learning: training a diverse ensemble of data-sharing agents -- a well-established exploration strategy -- can significantly impair the performance of the individual ensemble members when compared to standard single-agent training. Through careful analysis, we attribute the degradation in performance to the low proportion of self-generated data in the shared training data for each ensemble member, as well as the inefficiency of the individual ensemble members to learn from such highly off-policy data. We thus name this phenomenon the curse of diversity. We find that several intuitive solutions -- such as a larger replay buffer or a smaller ensemble size -- either fail to consistently mitigate the performance loss or undermine the advantages of ensembling. Finally, we demonstrate the potential of representation learning to counteract the curse of diversity with a novel method named Cross-Ensemble Representation Learning (CERL) in both discrete and continuous control domains. Our work offers valuable insights into an unexpected pitfall in ensemble-based exploration and raises important caveats for future applications of similar approaches.

2024-01-15

ICLR.cc/2024/Conference (poster)

On the Stability of Iterative Retraining of Generative Models on Their Own Data

Avishek (Joey) Bose

Deep generative models have made tremendous progress in modeling complex data, often exhibiting generation quality that surpasses a typical human's ability to discern the authenticity of samples. Undeniably, a key driver of this success is enabled by the massive amounts of web-scale data consumed by these models. Due to these models' striking performance and ease of availability, the web will inevitably be increasingly populated with synthetic content. Such a fact directly implies that future iterations of generative models will be trained on both clean and artificially generated data from past models. In this paper, we develop a framework to rigorously study the impact of training generative models on mixed datasets -- from classical training on real data to self-consuming generative models trained on purely synthetic data. We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough and the proportion of clean training data (w.r.t. synthetic data) is large enough. We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models on CIFAR10 and FFHQ.

2024-01-15

ICLR.cc/2024/Conference (spotlight)

Towards Foundation Models for Knowledge Graph Reasoning

Hesham Mostafa

Foundation models in language and vision have the ability to run inference on any textual and visual inputs thanks to the transferable representations such as a vocabulary of tokens in language. Knowledge graphs (KGs) have different entity and relation vocabularies that generally do not overlap. The key challenge of designing foundation models on KGs is to learn such transferable representations that enable inference on any graph with arbitrary entity and relation vocabularies. In this work, we make a step towards such foundation models and present ULTRA, an approach for learning universal and transferable graph representations. ULTRA builds relational representations as a function conditioned on their interactions. Such a conditioning strategy allows a pre-trained ULTRA model to inductively generalize to any unseen KG with any relation vocabulary and to be fine-tuned on any graph. Conducting link prediction experiments on 57 different KGs, we find that the zero-shot inductive inference performance of a single pre-trained ULTRA model on unseen graphs of various sizes is often on par or better than strong baselines trained on specific graphs. Fine-tuning further boosts the performance.

2024-01-15

ICLR.cc/2024/Conference (poster)

Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets

Dominique Beaini

Shenyang Huang

Joao Alex Cunha

Zhiyi Li

Gabriela Moisescu-Pareja

Oleksandr Dymov

Samuel Maddrell-Mander

Callum McLean

Jama Hussein Mohamud

Michael Craig

Cristian Gabellini

Kerstin Klasers

Josef Dean

Cas Wognum

Maciej Sypetkowski

Ioannis Koutis

Hadrien Mary

Therence Bois

Andrew Fitzgibbon

Błażej Banaszewski

Chad Martin

Dominic Masters

Recently, pre-trained foundation models have shown significant advancements in multiple fields. However, the lack of datasets with labeled features and codebases has hindered the development of a supervised foundation model for molecular tasks. Here, we have carefully curated seven datasets specifically tailored for node- and graph-level prediction tasks to facilitate supervised learning on molecules. Moreover, to support the development of multi-task learning on our proposed datasets, we created the Graphium graph machine learning library. Our dataset collection encompasses two distinct categories. Firstly, the TOYMIX category modifies three small existing datasets with additional data for multi-task learning. Secondly, the LARGEMIX category includes four large-scale datasets with 344M graph-level data points and 409M node-level data points from ∼5M unique molecules. Finally, the ultra-large dataset contains 2,210M graph-level data points and 2,031M node-level data points coming from 86M molecules. Hence our datasets represent an order of magnitude increase in data volume compared to other 2D-GNN datasets. In addition, recognizing that molecule-related tasks often span multiple levels, we have designed our library to explicitly support multi-tasking, offering a diverse range of multi-level representations, i.e., representations at the graph, node, edge, and node-pair level. We equipped the library with an extensive collection of models and features to cover different levels of molecule analysis. By combining our curated datasets with this versatile library, we aim to accelerate the development of molecule foundation models. Datasets and code are available at https://github.com/datamol-io/graphium.

2024-01-15

ICLR.cc/2024/Conference (poster)

BCG immunization induces CX3CR1hi effector memory T cells to provide cross-protection via IFN-γ-mediated trained immunity.

Kim A. Tran

Erwan Pernet

Mina Sadeghi

Jeffrey Downey

Julia Chronopoulos

Elizabeth Lapshina

Oscar Tsai

Eva Kaufmann

Jun Ding

Maziar Divangahi

2024-01-14

Nature Immunology (published)

Self-Supervised Anomaly Detection: A Survey and Outlook

Hadi Hojjati

Thi Kieu Khanh Ho

Narges Armanfard

Anomaly detection (AD) plays a crucial role in various domains, including cybersecurity, finance, and healthcare, by identifying patterns or events that deviate from normal behaviour. In recent years, significant progress has been made in this field due to the remarkable growth of deep learning models. Notably, the advent of self-supervised learning has sparked the development of novel AD algorithms that outperform the existing state-of-the-art approaches by a considerable margin. This paper aims to provide a comprehensive review of the current methodologies in self-supervised anomaly detection. We present technical details of the standard methods and discuss their strengths and drawbacks. We also compare the performance of these models against each other and other state-of-the-art anomaly detection models. Finally, the paper concludes with a discussion of future directions for self-supervised anomaly detection, including the development of more effective and efficient algorithms and the integration of these techniques with other related fields, such as multi-modal learning.

2024-01-14

Neural Networks (unknown)

Computational pathology: A survey review and the way forward

Mahdi S. Hosseini

Babak Ehteshami Bejnordi

Vincent Quoc-Huy Trinh

Danial Hasan

Xingwen Li

Taehyo Kim

Haochen Zhang

Theodore Wu

Kajanan Chinniah

Sina Maghsoudlou

Ryan Zhang

Stephen Yang

Jiadai Zhu

Lyndon Chan

Samir Khaki

Andrei Buin

Fatemeh Chaji

Ala Salehi

Alejandra Zambrano Luna

Bich Ngoc Nguyen

Dimitris Samaras

Konstantinos N. Plataniotis

2024-01-13

Journal of Pathology Informatics (published)

Assessing the quality and value of metabolic chart data for capturing core outcomes for pediatric medium-chain acyl-CoA dehydrogenase (MCAD) deficiency

Ryan Iverson

Monica Taljaard

Michael T. Geraghty

Michael Pugliese

Kylie Tingley

Doug Coyle

Jonathan B. Kronick

Kumanan Wilson

Valerie Austin

Catherine Brunel-Guitton

Daniela Buhas

Nancy J. Butcher

Alicia K. J. Chan

Sarah Dyack

Sharan Goobie

Cheryl Greenberg

Shailly Jain-Ghai

Michal Inbar-Feigenberg

Natalya Karp

Mariya Kozenko

Erica Langley

Matthew Lines

Julian Little

Jennifer MacKenzie

Bruno Maranda

Saadet Mercimek-Andrews

Aizeddin Mhanni

John J. Mitchell

Laura Nagy

Martin Offringa

Amy Pender

Murray Potter

Chitra Prasad

Suzanne Ratko

Ramona Salvarinova

Andreas Schulze

Komudi Siriwardena

Neal Sondheimer

Rebecca Sparkes

Sylvia Stockler-Ipsiroglu

Kendra Tapscott

Yannis Trakadis

Lesley Turner

Clara Van Karnebeek

Anthony Vandersteen

Jagdeep S. Walia

Brenda J. Wilson

Andrea C. Yu

Beth K. Potter

Pranesh Chakraborty

2024-01-12

BMC Pediatrics (published)

Combining Confidence Elicitation and Sample-based Methods for Uncertainty Quantification in Misinformation Mitigation

Mauricio Rivera

Jean-François Godbout

Reihaneh Rabbany

Kellin Pelrine

2024-01-12

ArXiv (preprint)

Comparing GPT-4 and Open-Source Language Models in Misinformation Mitigation

Tyler Vergho

Jean-François Godbout

Reihaneh Rabbany

Kellin Pelrine

Recent large language models (LLMs) have been shown to be effective for misinformation detection. However, the choice of LLMs for experiments varies widely, leading to uncertain conclusions. In particular, GPT-4 is known to be strong in this domain, but it is closed source, potentially expensive, and can show instability between different versions. Meanwhile, alternative LLMs have given mixed results. In this work, we show that Zephyr-7b presents a consistently viable alternative, overcoming key limitations of commonly used approaches like Llama-2 and GPT-3.5. This provides the research community with a solid open-source option and shows open-source models are gradually catching up on this task. We then highlight how GPT-3.5 exhibits unstable performance, such that this very widely used model could provide misleading results in misinformation detection. Finally, we validate new tools including approaches to structured output and the latest version of GPT-4 (Turbo), showing they do not compromise performance, thus unlocking them for future research and potentially enabling more complex pipelines for misinformation mitigation.

2024-01-11

ArXiv (preprint)