Publications

Lifelong Online Learning from Accumulated Knowledge
Changjian Shui
William Wang
Ihsen Hedhli
Chi Man Wong
Feng Wan
Boyu Wang
In this article, we formulate lifelong learning as an online transfer learning procedure over consecutive tasks, where learning a given task depends on the accumulated knowledge. We propose a novel, theoretically principled framework, lifelong online learning, in which each task is learned incrementally. Specifically, our framework is composed of two levels of prediction: the prediction based solely on the current task, and the prediction drawn from the knowledge base accumulated over previous tasks. Moreover, this article tackles several fundamental challenges: an arbitrary or even non-stationary task generation process, an unknown number of instances in each task, and the construction of an efficient accumulated knowledge base. Notably, we provide a provable bound for the proposed algorithm, which offers insights into how the accumulated knowledge improves the predictions. Finally, empirical evaluations on both synthetic and real datasets validate the effectiveness of the proposed algorithm.
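The two-level design can be pictured with a minimal sketch: a per-task online learner whose prediction is mixed with a prediction from a knowledge base of previously learned task models. All names (LifelongOnlinePredictor, mix, etc.), the simple averaging of stored models, and the gradient-step update below are illustrative assumptions, not the paper's actual update rules or bound.

```python
import numpy as np

class LifelongOnlinePredictor:
    """Illustrative two-level online predictor: a per-task model combined
    with a knowledge base built from previously learned tasks.
    (Hypothetical sketch; not the paper's exact algorithm.)"""

    def __init__(self, dim, lr=0.1, mix=0.5):
        self.dim = dim
        self.lr = lr            # online gradient step size
        self.mix = mix          # weight on the knowledge-base prediction
        self.knowledge = []     # weight vectors of finished tasks

    def start_task(self):
        self.w = np.zeros(self.dim)   # fresh per-task model

    def predict(self, x):
        p_task = self.w @ x
        p_kb = np.mean([w @ x for w in self.knowledge]) if self.knowledge else 0.0
        return (1 - self.mix) * p_task + self.mix * p_kb

    def update(self, x, y):
        # online least-squares gradient step on the current-task model
        err = self.predict(x) - y
        self.w -= self.lr * err * x

    def finish_task(self):
        self.knowledge.append(self.w.copy())

# usage: a stream of tasks, each a sequence of (x, y) pairs
model = LifelongOnlinePredictor(dim=5)
tasks = [[(np.random.randn(5), 1.0) for _ in range(20)] for _ in range(3)]
for task in tasks:
    model.start_task()
    for x, y in task:
        y_hat = model.predict(x)   # predict before observing the label
        model.update(x, y)
    model.finish_task()
```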
OptEmbed: Learning Optimal Embedding Table for Click-through Rate Prediction
Fuyuan Lyu
Xing Tang
Hong Zhu
Huifeng Guo
Yingxue Zhang
Ruiming Tang
A click-through rate (CTR) prediction model usually consists of three components: an embedding table, a feature interaction layer, and a classifier. Learning the embedding table plays a fundamental role in CTR prediction, from the perspective of both model performance and memory usage. The embedding table is a two-dimensional tensor, with its axes indicating the number of feature values and the embedding dimension, respectively. To learn an efficient and effective embedding table, recent works either assign different embedding dimensions to feature fields, reduce the number of embeddings, or mask the embedding table parameters. However, none of these existing works obtains an optimal embedding table. On the one hand, varying the embedding dimensions still requires a large amount of memory due to the vast number of features in the dataset. On the other hand, decreasing the number of embeddings usually causes performance degradation, which is intolerable in CTR prediction. Finally, pruning embedding parameters leads to a sparse embedding table, which is hard to deploy. To this end, we propose an optimal embedding table learning framework, OptEmbed, which provides a practical and general method to find an optimal embedding table for various base CTR models. Specifically, we propose pruning redundant embeddings according to the importance of the corresponding features, using learnable pruning thresholds. Furthermore, we treat each assignment of embedding dimensions to fields as a single candidate architecture. To efficiently search the optimal embedding dimensions, we design a uniform embedding dimension sampling scheme that trains all candidate architectures equally, so that architecture-related parameters and learnable thresholds are trained simultaneously in one supernet. We then propose an evolution search method based on the supernet to find the optimal embedding dimension for each field. Experiments on public datasets show that OptEmbed can learn a compact embedding table that further improves model performance.
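A compressed sketch of the two main ingredients, learnable pruning thresholds and uniform embedding dimension sampling within one supernet, is given below. The class and argument names are illustrative, and the hard pruning mask is simplified: the actual method would need something like a straight-through estimator for the thresholds to receive gradients.

```python
import torch
import torch.nn as nn

class PrunableEmbedding(nn.Module):
    """Illustrative OptEmbed-style embedding table with learnable pruning
    thresholds and uniform dimension sampling (hypothetical simplification,
    not the authors' code)."""

    def __init__(self, num_values, max_dim):
        super().__init__()
        self.table = nn.Embedding(num_values, max_dim)
        # one learnable threshold per feature value; embeddings whose L1
        # norm falls below it are pruned (zeroed out)
        self.threshold = nn.Parameter(torch.zeros(num_values))
        self.max_dim = max_dim

    def forward(self, ids, sampled_dim=None):
        emb = self.table(ids)                     # (batch, max_dim)
        norm = emb.abs().sum(dim=-1)              # importance proxy per feature value
        # hard mask for clarity; a straight-through estimator would let
        # gradients reach the thresholds during training
        keep = (norm - self.threshold[ids] > 0).float().unsqueeze(-1)
        emb = emb * keep
        if sampled_dim is not None:               # one candidate architecture
            mask = torch.zeros(self.max_dim, device=emb.device)
            mask[:sampled_dim] = 1.0
            emb = emb * mask
        return emb

# usage: sample a dimension uniformly each step so all candidates get trained
layer = PrunableEmbedding(num_values=1000, max_dim=64)
ids = torch.randint(0, 1000, (32,))
dim = int(torch.randint(1, 65, (1,)))
out = layer(ids, sampled_dim=dim)
```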
Inductive biases for deep learning of higher-level cognition
Anirudh Goyal
Lookback for Learning to Branch
Prateek Gupta
Elias Boutros Khalil
Didier Chételat
M. Pawan Kumar
Dissecting adaptive methods in GANs
Samy Jelassi
David Dobre
Arthur Mensch
Yuanzhi Li
Adaptive methods are a crucial component widely used for training generative adversarial networks (GANs). While there has been some work to pinpoint the “marginal value of adaptive methods” in standard tasks, it remains unclear why they are still critical for GAN training. In this paper, we formally study how adaptive methods help train GANs; inspired by the grafting method proposed in Agarwal et al. (2020), we separate the magnitude and direction components of the Adam updates, and graft them to the direction and magnitude of SGDA updates respectively. By considering an update rule with the magnitude of the Adam update and the normalized direction of SGD, we empirically show that the adaptive magnitude of Adam is key for GAN training. This motivates us to have a closer look at the class of normalized stochastic gradient descent ascent (nSGDA) methods in the context of GAN training. We propose a synthetic theoretical framework to compare the performance of nSGDA and SGDA for GAN training with neural networks. We prove that in that setting, GANs trained with nSGDA recover all the modes of the true distribution, whereas the same networks trained with SGDA (and any learning rate configuration) suffer from mode collapse. The critical insight in our analysis is that normalizing the gradients forces the discriminator and generator to be updated at the same pace. We also experimentally show that for several datasets, Adam’s performance can be recovered with nSGDA methods.
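The nSGDA update studied here is simple to state: each player takes a step along its gradient normalized to unit length, so generator and discriminator move at the same pace. The sketch below shows that update rule in PyTorch; it is an illustrative implementation of the rule, not the authors' training code, and the loss construction and learning rates are left to the caller.

```python
import torch

def nsgda_step(gen, disc, gen_loss, disc_loss, lr_g=1e-3, lr_d=1e-3, eps=1e-8):
    """One normalized SGDA step: each network moves along its gradient
    rescaled to unit norm (illustrative sketch of the update rule)."""
    # descent on the generator loss with a normalized gradient
    g_grads = torch.autograd.grad(gen_loss, list(gen.parameters()), retain_graph=True)
    g_norm = torch.sqrt(sum((g ** 2).sum() for g in g_grads)) + eps
    with torch.no_grad():
        for p, g in zip(gen.parameters(), g_grads):
            p -= lr_g * g / g_norm
    # descent on the discriminator loss (i.e. ascent on its objective),
    # also with a normalized gradient, so both players update at the same pace
    d_grads = torch.autograd.grad(disc_loss, list(disc.parameters()))
    d_norm = torch.sqrt(sum((g ** 2).sum() for g in d_grads)) + eps
    with torch.no_grad():
        for p, g in zip(disc.parameters(), d_grads):
            p -= lr_d * g / d_norm
```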
PipeBERT: High-throughput BERT Inference for ARM Big.LITTLE Multi-core Processors
Hung-Yang Chang
Seyyed Hasan Mozafari
Cheng Chen
James J. Clark
Brett Meyer
Novice Type Error Diagnosis with Natural Language Models
Chuqin Geng
Haolin Ye
Yixuan Li
Tianyu Han
Brigitte Pientka
Strong static type systems help programmers eliminate many errors without much burden of supplying type annotations. However, this flexibility makes it highly non-trivial to diagnose ill-typed programs, especially for novice programmers. Compared to classic constraint-solving and optimization-based approaches, the data-driven approach has shown great promise in identifying the root causes of type errors with higher accuracy. Instead of relying on hand-engineered features, this work explores natural language models for type error localization, which can be trained in an end-to-end fashion without requiring any features. We demonstrate that, for novice type error diagnosis, the language model-based approach significantly outperforms the previous state-of-the-art data-driven approach. Specifically, our model predicts type errors correctly 62% of the time, outperforming Nate, the state-of-the-art data-driven model, by 11% under a more rigorous accuracy metric. Furthermore, we also apply structural probes to explain the performance difference between different language models.
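As a rough illustration of what "end-to-end, feature-free" type error localization can look like, the sketch below fine-tunes a pretrained code language model as a token classifier that marks which source tokens are blamed for an error. The choice of microsoft/codebert-base, the two-label scheme, and the toy program are assumptions for illustration, not the models or setup used in the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical setup: a pretrained code LM fine-tuned to tag each token
# as "blamed" (1) or "not blamed" (0) for a type error.
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForTokenClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=2)

# an ill-typed OCaml program a novice might write
program = "let rec fac n = if n = 0 then true else n * fac (n - 1)"
inputs = tokenizer(program, return_tensors="pt", truncation=True)

# toy supervision: every token unblamed except one arbitrarily chosen span
labels = torch.zeros_like(inputs["input_ids"])
labels[0, 10] = 1

outputs = model(**inputs, labels=labels)
outputs.loss.backward()   # an optimizer step would follow during fine-tuning

# at inference time, the highest-scoring tokens are the predicted error locations
pred = outputs.logits.argmax(dim=-1)
```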
Functional connectivity subtypes associate robustly with ASD diagnosis
Sebastian G. W. Urchs
Angela Tam
Pierre Orban
Clara A. Moreau
Yassine Benhajali
Hien Duy Nguyen
Alan C. Evans
Our understanding of the changes in functional brain organization in autism is hampered by the extensive heterogeneity that characterizes this neurodevelopmental disorder. Data-driven clustering offers a straightforward way to decompose autism heterogeneity into subtypes of connectivity and promises an unbiased framework to investigate behavioral symptoms and causative genetic factors. Yet, the robustness and generalizability of functional connectivity subtypes are unknown. Here, we show that a simple hierarchical cluster analysis can robustly relate a given individual and brain network to a connectivity subtype, but that continuous assignments are more robust than discrete ones. We also found that functional connectivity subtypes are moderately associated with the clinical diagnosis of autism, and these associations generalize to independent replication data. We systematically explored 18 different brain networks, as we expected them to associate with different behavioral profiles as well as different key regions. Contrary to this prediction, autism functional connectivity subtypes converged on a common topography across different networks, consistent with a compression of the primary gradient of functional brain organization, as previously reported in the literature. Our results support the use of data-driven clustering as a reliable data dimensionality reduction technique, where any given dimension only associates moderately with clinical manifestations.
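A rough sketch of the analysis described here: hierarchical clustering of subjects' connectivity maps yields discrete subtypes, while correlating each subject with the subtype templates yields the continuous assignments reported as more robust. The function and variable names below (connectivity_subtypes, templates, etc.) are illustrative, and the pipeline is a simplification of the actual preprocessing and statistics.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def connectivity_subtypes(conn, n_subtypes=5):
    """Derive connectivity subtypes for one brain network and return both
    discrete and continuous assignments. `conn` is a
    (subjects x connections) matrix. (Hypothetical simplification.)"""
    # discrete subtypes from a hierarchical cluster analysis
    Z = linkage(conn, method="ward")
    labels = fcluster(Z, t=n_subtypes, criterion="maxclust")
    # subtype templates: mean connectivity map of each cluster
    templates = np.stack([conn[labels == k].mean(axis=0)
                          for k in range(1, n_subtypes + 1)])
    # continuous assignment: correlation of each subject with each template
    conn_c = conn - conn.mean(axis=1, keepdims=True)
    tmpl_c = templates - templates.mean(axis=1, keepdims=True)
    weights = (conn_c @ tmpl_c.T) / (
        np.linalg.norm(conn_c, axis=1, keepdims=True)
        * np.linalg.norm(tmpl_c, axis=1))
    return labels, weights

# usage on random data: 100 subjects, 2080 connections
labels, weights = connectivity_subtypes(np.random.randn(100, 2080))
```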
Protective effectiveness of prior SARS-CoV-2 infection and hybrid immunity against Omicron infection and severe disease: a systematic review and meta-regression
Niklas Bobrovitz
Harriet Ware
Xiaomeng Ma
Zihan Li
Reza Hosseini
Christian Cao
Anabel Selemon
Mairead Whelan
Zahra Premji
Hanane Issa
Brianna Cheng
L. Abu-Raddad
M. D. Kerkhove
Vanessa Piechotta
Melissa M Higdon
Annelies Wilder-Smith
Isabel Bergeri
Daniel R Feikin
Rahul K. Arora
Minal K Patel
Lorenzo Subissi
Background: We aimed to systematically review the magnitude and duration of the protective effectiveness of prior infection (PE) and hybrid immunity (HE) against Omicron infection and severe disease.

Methods: We searched pre-print and peer-reviewed electronic databases for controlled studies from January 1, 2020, to June 1, 2022. Risk of bias (RoB) was assessed using the Risk of Bias In Non-Randomized Studies of Interventions (ROBINS-I) tool. We used random-effects meta-regression to estimate the magnitude of protection at 1-month intervals and the average change in protection since the last vaccine dose or infection from 3 months to 6 or 12 months. We compared our estimates of PE and HE to previously published estimates of the magnitude and durability of vaccine effectiveness (VE) against Omicron.

Findings: Eleven studies of prior infection and 15 studies of hybrid immunity were included. For prior infection, there were 97 estimates (27 at moderate RoB and 70 at serious RoB), with the longest follow-up at 15 months. PE against hospitalization or severe disease was 82.5% [71.8-89.7%] at 3 months, and 74.6% [63.1-83.5%] at 12 months. PE against reinfection was 65.2% [52.9-75.9%] at 3 months, and 24.7% [16.4-35.5%] at 12 months. For HE, there were 153 estimates (78 at moderate RoB and 75 at serious RoB), with the longest follow-up at 11 months for primary series vaccination and 4 months for first booster vaccination. Against hospitalization or severe disease, HE involving either primary series vaccination or first booster vaccination was consistently >95% for the available follow-up. Against reinfection, HE involving primary series vaccination was 69.0% [58.9-77.5%] at 3 months after the most recent infection or vaccination, and 41.8% [31.5-52.8%] at 12 months, while HE involving first booster vaccination was 68.6% [58.8-76.9%] at 3 months, and 46.5% [36.0-57.3%] at 6 months. Against hospitalization or severe disease at 6 months, hybrid immunity with first booster vaccination (effectiveness 95.3% [81.9-98.9%]) or with primary series alone (96.5% [90.2-98.8%]) provided significantly greater protection than prior infection alone (80.1% [70.3-87.2%]), first booster vaccination alone (76.7% [72.5-80.4%]), or primary series alone (64.6% [54.5-73.6%]). Results for protection against reinfection were similar.

Interpretation: Prior infection and hybrid immunity both provided greater and more sustained protection against Omicron than vaccination alone. All protection estimates waned quickly against infection but remained high for hospitalization or severe disease. Individuals with hybrid immunity had the highest magnitude and durability of protection against all outcomes, reinforcing the global imperative for vaccination.
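The meta-regression step can be pictured with a generic sketch: pool study-level protection estimates and regress them on months since infection or vaccination, with a between-study variance term re-estimated from the residual heterogeneity. The code below is a plain DerSimonian-Laird-style random-effects meta-regression on illustrative inputs; the actual analysis (effect scale, RoB strata, separate outcomes) is more elaborate.

```python
import numpy as np

def dl_meta_regression(effects, variances, months):
    """Random-effects meta-regression of protection estimates on months
    since infection/vaccination, with a moment estimator for the
    between-study variance. (Generic sketch, not the paper's exact model.)"""
    y = np.asarray(effects, float)       # study-level effect estimates
    v = np.asarray(variances, float)     # within-study variances
    X = np.column_stack([np.ones_like(y), months])
    # fixed-effect (inverse-variance) fit to measure residual heterogeneity
    W = np.diag(1.0 / v)
    beta_fe = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    resid = y - X @ beta_fe
    Q = float(resid @ W @ resid)
    k, p = X.shape
    # moment estimator of between-study variance tau^2
    P = W - W @ X @ np.linalg.solve(X.T @ W @ X, X.T @ W)
    tau2 = max(0.0, (Q - (k - p)) / np.trace(P))
    # random-effects fit with updated weights
    W_re = np.diag(1.0 / (v + tau2))
    beta_re = np.linalg.solve(X.T @ W_re @ X, X.T @ W_re @ y)
    return beta_re, tau2
```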
A Reproducible and Realistic Evaluation of Partial Domain Adaptation Methods
Unsupervised Domain Adaptation (UDA) aims at classifying unlabeled target images leveraging source labeled ones. In this work, we consider the Partial Domain Adaptation (PDA) variant, where we have extra source classes not present in the target domain. Most successful algorithms use model selection strategies that rely on target labels to find the best hyper-parameters and/or models along training. However, these strategies violate the main assumption in PDA: only unlabeled target domain samples are available. Moreover, there are also inconsistencies in the experimental settings - architecture, hyper-parameter tuning, number of runs - yielding unfair comparisons. The main goal of this work is to provide a realistic evaluation of PDA methods with the different model selection strategies under a consistent evaluation protocol. We evaluate 7 representative PDA algorithms on 2 different real-world datasets using 7 different model selection strategies. Our two main findings are: (i) without target labels for model selection, the accuracy of the methods decreases up to 30 percentage points; (ii) only one method and model selection pair performs well on both datasets. Experiments were performed with our PyTorch framework, BenchmarkPDA, which we open source.
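One way to select models without target labels, as the PDA setting requires, is an unsupervised criterion computed on unlabeled target data. The sketch below illustrates entropy-based selection among checkpoints; it is offered as one plausible example of such a strategy, not necessarily one of the seven strategies evaluated in the paper.

```python
import torch
import torch.nn.functional as F

def mean_prediction_entropy(model, target_loader, device="cpu"):
    """Average prediction entropy on unlabeled target data; lower values
    indicate more confident (and often better adapted) models."""
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for x, *_ in target_loader:          # labels, if present, are ignored
            probs = F.softmax(model(x.to(device)), dim=1)
            ent = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
            total += ent.sum().item()
            count += x.size(0)
    return total / count

# usage: pick the checkpoint with the lowest target entropy
# best = min(checkpoints, key=lambda m: mean_prediction_entropy(m, target_loader))
```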
Revisiting the Impact of Anti-patterns on Fault-Proneness: A Differentiated Replication
Aurel Ikama
Vincent Du
Philippe Belias
Biruk Asmare Muse
Mohammad Hamdaqa
Anti-patterns manifesting in software code through code smells have been investigated in terms of their prevalence, detection, refactoring, and impact on software quality attributes. In particular, leveraging heuristics to identify fault-fixing commits, Khomh et al. found that anti-patterns and code smells have an impact on the fault-proneness of a software system. Similarly, Saboury et al. found a relationship between anti-pattern occurrences and fault-proneness, using heuristics to identify fault-fixing commits and fault-inducing changes. However, recent studies question the accuracy of these heuristics, and thus the validity of empirical studies that leverage them. Hence, in this work, we investigate to what extent the results of empirical studies that use heuristics to identify bug-fix commits are affected by the limitations of the heuristics-based approach, using manually validated bug-fix commits as a ground truth. In particular, we conduct a differentiated replication of the work by Khomh et al. We focus on the impact of anti-patterns on fault-proneness, as it is the only dependent variable that may be affected by noise in the collected fault data. In our differentiated replication study, (1) we expanded the number of subject systems from 5 to 38, (2) utilized a manually validated dataset of bug-fixing commits from the work of Herbold et al., (3) answered research questions from Khomh et al. that are related to the relationship between anti-pattern occurrences and fault-proneness, and (4) added an additional research question to investigate whether combining results from several heuristic-based approaches could help reduce the impact of noise. Our findings show that the impact of the noise generated by the heuristic-based automatic algorithm is negligible for the studied subject systems, meaning that the relation observed on noisy data still holds on the clean data. However, we also observed that combining results from several heuristic-based approaches does not reduce this noise; quite the contrary.