Publications

Task-Agnostic Graph Neural Network Evaluation via Adversarial Collaboration

Xiangyu Zhao

Hannes Stärk

Dominique Beaini

Pietro Lio

Yiren Zhao

2023-03-05

ICLR.cc/2023/Workshop/MLDD (poster)

Improved Robustness Against Adaptive Attacks With Ensembles and Error-Correcting Output Codes

Thomas Philippon

Christian Gagné

Neural network ensembles have been studied extensively in the context of adversarial robustness and most ensemble-based approaches remain vulnerable to adaptive attacks. In this paper, we investigate the robustness of Error-Correcting Output Codes (ECOC) ensembles through architectural improvements and ensemble diversity promotion. We perform a comprehensive robustness assessment against adaptive attacks and investigate the relationship between ensemble diversity and robustness. Our results demonstrate the benefits of ECOC ensembles for adversarial robustness compared to regular ensembles of convolutional neural networks (CNNs) and show why the robustness of previous implementations is limited. We also propose an adversarial training method specific to ECOC ensembles that allows to further improve robustness to adaptive attacks.

2023-03-03

ArXiv (preprint)

Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations

Shashank Shekhar

Florian Bordes

P Vincent

Ari S. Morcos

Joint-embedding based learning (e.g., SimCLR, MoCo, DINO) and reconstruction-based learning (e.g., BEiT, SimMIM, MAE) are the two leading paradigms for self-supervised learning of vision transformers, but they differ substantially in their transfer performance. Here, we aim to explain these differences by analyzing the impact of these objectives on the structure and transferability of their representations. Our analysis reveals that reconstruction-based learning features are significantly dissimilar to joint-embedding based learning features and that models trained with similar objectives learn similar features even across architectures. These differences arise early in the network, primarily driven by attention and normalization layers. We find that joint-embedding features yield better linear probe transfer for classification because the different objectives drive different distributions of information and invariances in the representation. These differences explain opposite trends in transfer performance for downstream tasks that require spatial specificity in features. Finally, we address how fine-tuning changes reconstructive representations to enable better transfer, showing that it re-organizes the information to be more similar to pre-trained joint embedding models.

2023-03-03

ICLR.cc/2023/Workshop/ME-FoMo (spotlight)

Out-of-context Meta-learning in Large Language Models

Dmitrii Krasheninnikov

Egor Krasheninnikov

David M. Krueger

Brown et al. (2020) famously introduced the phenomenon of in-context meta-learning in large language models (LLMs). Our work establishes the existence of a phenomenon we call out-of-context meta-learning via carefully designed synthetic experiments with large language models. We argue that out-of-context meta-learning is an important and surprising capability of LLMs, which may lead them to more readily "internalize" the semantic content of text that is, or appears to be, broadly useful (such as true statements, or text from authoritative sources) and apply it in appropriate contexts. We also raise the question of how this phenomenon emerges, and discuss two possible explanations: one relying on the way LLMs store knowledge in their parameters, and another suggesting that the implicit gradient alignment bias of gradient-descent-based methods may be responsible. Finally, we reflect on what our results might imply about capabilities of future AI systems, and discuss potential risks.

2023-03-03

ICLR.cc/2023/Workshop/ME-FoMo (poster)

Robustifying Language Models with Test-Time Adaptation

Noah Thomas McDermott

Junfeng Yang

Chengzhi Mao

Large-scale language models achieved state-of-the-art performance over a number of language tasks. However, they fail on adversarial language examples, which are sentences optimized to fool the language models but with similar semantic meanings for humans. While prior work focuses on making the language model robust at training time, retraining for robustness is often unrealistic for large-scale foundation models. Instead, we propose to make the language models robust at test time. By dynamically adapting the input sentence with predictions from masked words, we show that we can reverse many language adversarial attacks. Since our approach does not require any training, it works for novel tasks at test time and can adapt to novel adversarial corruptions. Visualizations and empirical results on two popular sentence classification datasets demonstrate that our method can repair adversarial language attacks over 65% o

2023-03-03

ICLR.cc/2023/Workshop/Trustworthy_ML (poster)

Identifying Different Student Clusters in Functional Programming Assignments: From Quick Learners to Struggling Students

Chuqin Geng

Wenwen Xu

Yingjie Xu

Brigitte Pientka

Xujie Si

Instructors and students alike are often focused on the grade in programming assignments as a key measure of how well a student is mastering the material and whether a student is struggling. This can be, however, misleading. Especially when students have access to auto-graders, their grades may be heavily skewed. In this paper, we analyze student assignment submission data collected from a functional programming course taught at McGill University incorporating a wide range of features. In addition to the grade, we consider activity time data, time spent, and the number of static errors. This allows us to identify four clusters of students: "Quick-learning", "Hardworking", "Satisficing", and "Struggling" through cluster algorithms. We then analyze how work habits, working duration, the range of errors, and the ability to fix errors impact different clusters of students. This structured analysis provides valuable insights for instructors to actively help different types of students and emphasize different aspects of their overall course design. It also provides insights for students themselves to understand which aspects they still struggle with and allows them to seek clarification and adjust their work habits.

2023-03-02

Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1 (published)

The end game: respecting major sources of population diversity

Jakub Kopal

Lucina Q. Uddin

Danilo Bzdok

Human neuroscience is enjoying burgeoning population data resources: large-scale cohorts with thousands of participant profiles of gene expression, brain scanning and sociodemographic measures. The depth of phenotyping puts us in a better position than ever to fully embrace major sources of population diversity as effects of interest to illuminate mechanisms underlying brain health.

2023-03-02

Nature Methods (published)

Towards Democratizing Joint-Embedding Self-Supervised Learning

Florian Bordes

Randall Balestriero

P Vincent

Joint Embedding Self-Supervised Learning (JE-SSL) has seen rapid developments in recent years, due to its promise to effectively leverage large unlabeled data. The development of JE-SSL methods was driven primarily by the search for ever increasing downstream classification accuracies, using huge computational resources, and typically built upon insights and intuitions inherited from a close parent JE-SSL method. This has led unwittingly to numerous pre-conceived ideas that carried over across methods e.g. that SimCLR requires very large mini batches to yield competitive accuracies; that strong and computationally slow data augmentations are required. In this work, we debunk several such ill-formed a priori ideas in the hope to unleash the full potential of JE-SSL free of unnecessary limitations. In fact, when carefully evaluating performances across different downstream tasks and properly optimizing hyper-parameters of the methods, we most often -- if not always -- see that these widespread misconceptions do not hold. For example we show that it is possible to train SimCLR to learn useful representations, while using a single image patch as negative example, and simple Gaussian noise as the only data augmentation for the positive pair. Along these lines, in the hope to democratize JE-SSL and to allow researchers to easily make more extensive evaluations of their methods, we introduce an optimized PyTorch library for SSL.

2023-03-02

ArXiv (preprint)

Rare CNVs and phenome-wide profiling highlight brain structural divergence and phenotypical convergence

Jakub Kopal

Kuldeep Kumar

Karin Saltoun

Claudia Modenato

Clara A. Moreau

Sandra Martin-Brevet

Guillaume Huguet

Martineau Jean-Louis

Charles-Olivier Martin

Zohra Saci

Nadine Younis

Petra Tamer

Elise Douard

Anne M. Maillard

Borja Rodriguez-Herreros

Aurélie Pain

Sonia Richetin

Leila Kushan

Ana I. Silva

Marianne B. M. van den Bree

David E. J. Linden

Michael J. Owen

Jeremy Hall

Sarah Lippé

Bogdan Draganski

Ida E. Sønderby

Ole A. Andreassen

David C. Glahn

Paul M. Thompson

Carrie E. Bearden

Sébastien Jacquemont

Danilo Bzdok

Copy number variations (CNVs) are rare genomic deletions and duplications that can affect brain and behaviour. Previous reports of CNV pleiotropy imply that they converge on shared mechanisms at some level of pathway cascades, from genes to large-scale neural circuits to the phenome. However, existing studies have primarily examined single CNV loci in small clinical cohorts. It remains unknown, for example, how distinct CNVs escalate vulnerability for the same developmental and psychiatric disorders. Here we quantitatively dissect the associations between brain organization and behavioural differentiation across 8 key CNVs. In 534 CNV carriers, we explored CNV-specific brain morphology patterns. CNVs were characteristic of disparate morphological changes involving multiple large-scale networks. We extensively annotated these CNV-associated patterns with ~1,000 lifestyle indicators through the UK Biobank resource. The resulting phenotypic profiles largely overlap and have body-wide implications, including the cardiovascular, endocrine, skeletal and nervous systems. Our population-level investigation established brain structural divergences and phenotypical convergences of CNVs, with direct relevance to major brain disorders.

2023-03-01

Nature Human Behaviour (unknown)

Ternary Quantization: A Survey

Danyang Liu

Xue Liu

Inference time, model size, and accuracy are critical for deploying deep neural network models. Numerous research efforts have been made to compress neural network models with faster inference and higher accuracy. Pruning and quantization are mainstream methods to this end. During model quantization, converting individual float values of layer weights to low-precision ones can substantially reduce the computational overhead and improve the inference speed. Many quantization methods have been studied, for example, vector quantization, low-bit quantization, and binary/ternary quantization. This survey focuses on ternary quantization. We review the evolution of ternary quantization and investigate the relationships among existing ternary quantization methods from the perspective of projection function and optimization methods.

2023-03-01

ArXiv (preprint)