Publications

Out-of-context Meta-learning in Large Language Models
Dmitrii Krasheninnikov
Egor Krasheninnikov
Brown et al. (2020) famously introduced the phenomenon of in-context meta-learning in large language models (LLMs). Our work establishes the existence of a phenomenon we call out-of-context meta-learning via carefully designed synthetic experiments with large language models. We argue that out-of-context meta-learning is an important and surprising capability of LLMs, which may lead them to more readily "internalize" the semantic content of text that is, or appears to be, broadly useful (such as true statements, or text from authoritative sources) and apply it in appropriate contexts. We also raise the question of how this phenomenon emerges, and discuss two possible explanations: one relying on the way LLMs store knowledge in their parameters, and another suggesting that the implicit gradient alignment bias of gradient-descent-based methods may be responsible. Finally, we reflect on what our results might imply about the capabilities of future AI systems, and discuss potential risks.
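The gradient-alignment explanation mentioned above can be made concrete with a toy measurement. The sketch below (ours, not the paper's code) computes the cosine similarity between the loss gradients of two training examples; the tiny linear model and random data are placeholder assumptions.

```python
# A minimal sketch of measuring gradient alignment: the cosine similarity
# between the loss gradients of two training examples. High alignment means
# a gradient step on one example also reduces loss on the other, which is
# the bias hypothesized above. Model and data are toy stand-ins for an LLM.
import torch
import torch.nn as nn

model = nn.Linear(16, 4)  # stand-in for a language model
loss_fn = nn.CrossEntropyLoss()

def example_grad(x, y):
    """Flattened gradient of the loss on a single example."""
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

x1, y1 = torch.randn(1, 16), torch.tensor([0])
x2, y2 = torch.randn(1, 16), torch.tensor([0])
g1, g2 = example_grad(x1, y1), example_grad(x2, y2)
alignment = torch.nn.functional.cosine_similarity(g1, g2, dim=0)
print(f"gradient alignment: {alignment.item():.3f}")
```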
Robustifying Language Models with Test-Time Adaptation
Noah Thomas McDermott
Junfeng Yang
Chengzhi Mao
Large-scale language models have achieved state-of-the-art performance on a number of language tasks. However, they fail on adversarial language examples, which are sentences optimized to fool the language models while retaining similar semantic meaning for humans. While prior work focuses on making the language model robust at training time, retraining for robustness is often unrealistic for large-scale foundation models. Instead, we propose to make the language models robust at test time. By dynamically adapting the input sentence with predictions from masked words, we show that we can reverse many language adversarial attacks. Since our approach does not require any training, it works for novel tasks at test time and can adapt to novel adversarial corruptions. Visualizations and empirical results on two popular sentence classification datasets demonstrate that our method can repair adversarial language attacks over 65% of the time.
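To illustrate the general idea of adapting an input with masked-word predictions, here is a minimal sketch. It is not the authors' exact procedure: the bert-base-uncased model, the word-level masking loop, and the 0.9 confidence threshold are all our assumptions.

```python
# A minimal sketch of test-time repair via masked-word predictions:
# mask each word in turn and let a masked language model propose a
# replacement, so adversarially perturbed tokens tend to be repaired.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def repair(sentence: str) -> str:
    words = sentence.split()
    for i in range(len(words)):
        masked = " ".join(words[:i] + [fill_mask.tokenizer.mask_token] + words[i + 1:])
        best = fill_mask(masked)[0]  # top prediction for the masked slot
        # Substitute only when the model is very confident in its word.
        if best["score"] > 0.9:
            words[i] = best["token_str"]
    return " ".join(words)

print(repair("this movei was surprisingly graet"))
```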
Identifying Different Student Clusters in Functional Programming Assignments: From Quick Learners to Struggling Students
Chuqin Geng
Wenwen Xu
Yingjie Xu
Brigitte Pientka
Instructors and students alike are often focused on the grade in programming assignments as a key measure of how well a student is mastering the material and whether a student is struggling. This can, however, be misleading, especially when students have access to auto-graders, which may heavily skew their grades. In this paper, we analyze student assignment submission data collected from a functional programming course taught at McGill University, incorporating a wide range of features: in addition to the grade, we consider activity time data, time spent, and the number of static errors. Using clustering algorithms, this allows us to identify four clusters of students: "Quick-learning", "Hardworking", "Satisficing", and "Struggling". We then analyze how work habits, working duration, the range of errors, and the ability to fix errors impact the different clusters of students. This structured analysis provides valuable insights that help instructors actively support different types of students and emphasize different aspects of their overall course design. It also helps students understand which aspects they still struggle with, seek clarification, and adjust their work habits.
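As a rough illustration of this kind of analysis, the sketch below runs k-means with four clusters on a hypothetical per-student feature matrix; the feature choices, toy data, and use of k-means are our assumptions, not necessarily the paper's setup.

```python
# A minimal sketch of clustering students by assignment features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# hypothetical per-student features: [grade, hours worked, static error count]
rng = np.random.default_rng(0)
X = np.clip(rng.normal(loc=[75, 8, 12], scale=[15, 4, 8], size=(200, 3)), 0, None)

# standardize so no single feature dominates the distance metric
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X)
)
for k in range(4):
    members = X[labels == k]
    print(f"cluster {k}: n={len(members)}, mean grade={members[:, 0].mean():.1f}")
```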
The end game: respecting major sources of population diversity
Jakub Kopal
Lucina Q. Uddin
Towards Democratizing Joint-Embedding Self-Supervised Learning
Florian Bordes
Randall Balestriero
Joint Embedding Self-Supervised Learning (JE-SSL) has seen rapid development in recent years, due to its promise to effectively leverage large unlabeled data. The development of JE-SSL methods was driven primarily by the search for ever-increasing downstream classification accuracies, using huge computational resources, and typically built upon insights and intuitions inherited from a close parent JE-SSL method. This has unwittingly led to numerous preconceived ideas that carried over across methods, e.g., that SimCLR requires very large mini-batches to yield competitive accuracies, or that strong and computationally slow data augmentations are required. In this work, we debunk several such ill-formed a priori ideas in the hope of unleashing the full potential of JE-SSL free of unnecessary limitations. In fact, when carefully evaluating performance across different downstream tasks and properly optimizing the hyper-parameters of the methods, we most often -- if not always -- find that these widespread misconceptions do not hold. For example, we show that it is possible to train SimCLR to learn useful representations while using a single image patch as negative example and simple Gaussian noise as the only data augmentation for the positive pair. Along these lines, in the hope of democratizing JE-SSL and allowing researchers to easily make more extensive evaluations of their methods, we introduce an optimized PyTorch library for SSL.
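The setup described above can be sketched in a few lines. The following is a minimal illustration (not the paper's library): a SimCLR-style InfoNCE loss where the positive pair is the same image under simple Gaussian noise; the toy encoder, batch, and temperature are placeholder assumptions.

```python
# A minimal sketch of contrastive learning with Gaussian noise as the
# only augmentation, in the spirit of the experiment described above.
import torch
import torch.nn.functional as F

encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 128))

def nt_xent(z1, z2, tau=0.5):
    """InfoNCE loss over a batch of positive pairs (z1[i], z2[i])."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float("-inf"))  # exclude self-similarity
    n = z1.shape[0]
    # each embedding's positive is the other view of the same image
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

x = torch.randn(16, 3, 32, 32)         # toy image batch
view1 = x + 0.1 * torch.randn_like(x)  # Gaussian noise is the
view2 = x + 0.1 * torch.randn_like(x)  # only data augmentation
loss = nt_xent(encoder(view1), encoder(view2))
loss.backward()
```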
Rare CNVs and phenome-wide profiling highlight brain structural divergence and phenotypical convergence
Jakub Kopal
Kuldeep Kumar
Karin Saltoun
Claudia Modenato
Clara A. Moreau
Sandra Martin-Brevet
Guillaume Huguet
Martineau Jean-Louis
Charles-Olivier Martin
Zohra Saci
Nadine Younis
Petra Tamer
Elise Douard
Anne M. Maillard
Borja Rodriguez-Herreros
Aurélie Pain
Sonia Richetin
Leila Kushan
Ana I. Silva
Marianne B.M. van den Bree
David E.J. Linden
M. J. Owen
Jeremy Hall
Sarah Lippé
Bogdan Draganski
Ida E. Sønderby
Ole A. Andreassen
David C. Glahn
Paul M. Thompson
Carrie E. Bearden
Sébastien Jacquemont
Ternary Quantization: A Survey
Danyang Liu
Inference time, model size, and accuracy are critical for deploying deep neural network models. Numerous research efforts have been made to compress neural network models for faster inference and higher accuracy. Pruning and quantization are the mainstream methods to this end. During model quantization, converting the individual float values of layer weights to low-precision ones can substantially reduce the computational overhead and improve inference speed. Many quantization methods have been studied, for example vector quantization, low-bit quantization, and binary/ternary quantization. This survey focuses on ternary quantization. We review the evolution of ternary quantization and investigate the relationships among existing ternary quantization methods from the perspectives of projection functions and optimization methods.
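As a concrete example of a projection function, here is a minimal sketch of one classic threshold-based rule in the style of Ternary Weight Networks (Li et al., 2016), which projects weights to {-a, 0, +a}; the 0.7 threshold heuristic follows that paper, and the rest is our toy code, not the survey's.

```python
# A minimal sketch of a threshold-based ternary projection function.
import torch

def ternarize(w: torch.Tensor) -> torch.Tensor:
    delta = 0.7 * w.abs().mean()      # heuristic threshold on magnitude
    mask = (w.abs() > delta).float()  # which weights stay nonzero
    # scale factor: mean magnitude of the surviving weights
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1)
    return alpha * torch.sign(w) * mask  # values in {-alpha, 0, +alpha}

w = torch.randn(4, 4)
print(ternarize(w))
```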
A108: Automated Detection of Ileocecal Valve, Appendiceal Orifice, and Polyp During Colonoscopy Using a Deep Learning Model
Mahsa Taghiakbari
Sina Hamidi Ghalehjegh
E. Jehanno
Tess Berthier
Lisa Di Jorio
Alan Barkun
Eric Deslandres
Simon Bouchard
Sacha Sidani
Daniel von Renteln
Assessing the Impact of Aircraft Arrival on Ambient Ultrafine Particle Number Concentrations in Near-Airport Communities in Boston, Massachusetts
Chloe S. Chung
Chloe S. Kim
Kevin James Lane
Flannery Black-Ingersoll
Claire Schollaert
Sijia Li
Matthew C. Simon
Jonathan I. Levy
A Convex Reformulation and an Outer Approximation for a Large Class of Binary Quadratic Programs
Borzou Rostami
Fausto Errico
Design and Implementation of Smooth Renewable Power in Cloud Data Centers
Xinxin Liu
Yu Hua
Ling Yang
Yuanyuan Sun
Renewable power is widely used in modern cloud data centers, which otherwise incur large electricity bills and negative environmental impacts. However, the frequent fluctuation and intermittency of renewable power often challenge the stability of both the electricity grid and data centers, and decrease the utilization of renewable power. Existing schemes fail to alleviate renewable power fluctuation, which is caused by the essential properties of renewable power. To address this problem, we propose an efficient and easy-to-use smooth renewable power-aware scheme, called Smoother, which consists of Flexible Smoothing (FS) and Active Delay (AD). First, to smooth the fluctuation of renewable power, FS carries out optimized charge/discharge operations by computing the minimum variance of the renewable power supplied to data centers per interval. Second, AD improves the utilization of renewable power by actively adjusting the execution time of deferrable workloads. Extensive experiments on traces of real-world data centers demonstrate that Smoother significantly reduces the negative impact of renewable power fluctuations on data centers and improves the utilization of renewable power by 250.88 percent on average. We have released the source code for public use.
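To make the smoothing intuition concrete, here is a toy sketch (not the paper's FS algorithm): a battery charges on power above a target level and discharges below it, reducing the variance of the renewable power supplied per interval. The capacity, the mean-based target rule, and the synthetic trace are our assumptions.

```python
# A toy sketch of variance-reducing charge/discharge smoothing.
import numpy as np

def smooth(renewable: np.ndarray, capacity: float = 50.0) -> np.ndarray:
    target = renewable.mean()   # per-interval target supply level
    soc = capacity / 2          # battery state of charge
    supplied = []
    for p in renewable:
        if p > target:          # charge the battery with the surplus
            charge = min(p - target, capacity - soc)
            soc += charge
            supplied.append(p - charge)
        else:                   # discharge to cover the deficit
            discharge = min(target - p, soc)
            soc -= discharge
            supplied.append(p + discharge)
    return np.array(supplied)

trace = np.abs(np.random.default_rng(1).normal(30, 10, size=48))
print(f"variance before: {trace.var():.1f}, after: {smooth(trace).var():.1f}")
```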