Publications

High-Order Pooling for Graph Neural Networks with Tensor Decomposition

Chenqing Hua

Guillaume Rabusseau

Jian Tang

openreview.net

IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages

Emanuele Bugliarello

Fangyu Liu

Jonas Pfeiffer

Siva Reddy

Desmond Elliott

Edoardo Ponti

Ivan Vulic

Reliable evaluation benchmarks designed for replicability and comprehensiveness have driven progress in machine learning. Due to the lack of… (see more) a multilingual benchmark, however, vision-and-language research has mostly focused on English language tasks. To fill this gap, we introduce the Image-Grounded Language Understanding Evaluation benchmark. IGLUE brings together - by both aggregating pre-existing datasets and creating new ones - visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages. Our benchmark enables the evaluation of multilingual multimodal models for transfer learning, not only in a zero-shot setting, but also in newly defined few-shot learning setups. Based on the evaluation of the available state-of-the-art models, we find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks. Moreover, downstream performance is partially explained by the amount of available unlabelled textual data for pretraining, and only weakly by the typological distance of target-source languages. We hope to encourage future research efforts in this area by releasing the benchmark to the community.

2022-01-01

ICML (published)

proceedings.mlr.press

arxiv.org

Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions

Courtney Paquette

Elliot Paquette

Ben Adlam

Jeffrey Pennington

Stochastic gradient descent (SGD) is a pillar of modern machine learning, serving as the go-to optimization algorithm for a diverse array of… (see more) problems. While the empirical success of SGD is often attributed to its computational efficiency and favorable generalization behavior, neither effect is well understood and disentangling them remains an open problem. Even in the simple setting of convex quadratic problems, worst-case analyses give an asymptotic convergence rate for SGD that is no better than full-batch gradient descent (GD), and the purported implicit regularization effects of SGD lack a precise explanation. In this work, we study the dynamics of multi-pass SGD on high-dimensional convex quadratics and establish an asymptotic equivalence to a stochastic differential equation, which we call homogenized stochastic gradient descent (HSGD), whose solutions we characterize explicitly in terms of a Volterra integral equation. These results yield precise formulas for the learning and risk trajectories, which reveal a mechanism of implicit conditioning that explains the efficiency of SGD relative to GD. We also prove that the noise from SGD negatively impacts generalization performance, ruling out the possibility of any type of implicit regularization in this context. Finally, we show how to adapt the HSGD formalism to include streaming SGD, which allows us to produce an exact prediction for the excess risk of multi-pass SGD relative to that of streaming SGD (bootstrap risk).

openreview.net

Improved DC-Free Run-Length Limited 4B6B Codes for Concatenated Schemes

Elie Ngomseu Mambou

Thibaud Tonnellier

Warren Gross

In this letter, we introduce a class of improved DC-free 4B6B codes in terms of error correction capabilities for a serially concatenated ar… (see more)chitecture. There are billions of different codebooks that can be derived from the 16 codewords contained in the traditional 4B6B code as per the IEEE 802.15.7 standard for visible light communication (VLC). These codebooks can be classified based on distances properties which determine their error correction performances. The traditional 4B6B code is suitable for hard-decision decoding, however, when a soft decoder is used like in a serially concatenated architecture, that code becomes obsolete. Simulations show that the proposed 4B6B code concatenated with forward error correction (FEC) codes, has better performance compared to state-of-the-art schemes such as the original 4B6B code, the enhanced Miller code, the Manchester code, the 5B10B code and the (0,4) 2/3 RLL code.

2022-01-01

IEEE Access (published)

doi.org

Improved DC-Free Run-Length Limited 4B6B Codes for Concatenated Schemes

Elie Ngomseu Mambou

Thibaud Tonnellier

Warren Gross

In this letter, we introduce a class of improved DC-free 4B6B codes in terms of error correction capabilities for a serially concatenated ar… (see more)chitecture. There are billions of different codebooks that can be derived from the 16 codewords contained in the traditional 4B6B code as per the IEEE 802.15.7 standard for visible light communication (VLC). These codebooks can be classified based on distances properties which determine their error correction performances. The traditional 4B6B code is suitable for hard-decision decoding, however, when a soft decoder is used like in a serially concatenated architecture, that code becomes obsolete. Simulations show that the proposed 4B6B code concatenated with forward error correction (FEC) codes, has better performance compared to state-of-the-art schemes such as the original 4B6B code, the enhanced Miller code, the Manchester code, the 5B10B code and the (0,4) 2/3 RLL code.

2022-01-01

IEEE Access (published)

doi.org

Investigating the Performance of Transformer-Based NLI Models on Presuppositional Inferences

Jad Kabbara

Jackie Cheung

Presuppositions are assumptions that are taken for granted by an utterance, and identifying them is key to a pragmatic interpretation of lan… (see more)guage. In this paper, we investigate the capabilities of transformer models to perform NLI on cases involving presupposition. First, we present simple heuristics to create alternative “contrastive” test cases based on the ImpPres dataset and investigate the model performance on those test cases. Second, to better understand how the model is making its predictions, we analyze samples from sub-datasets of ImpPres and examine model performance on them. Overall, our findings suggest that NLI-trained transformer models seem to be exploiting specific structural and lexical cues as opposed to performing some kind of pragmatic reasoning.

2022-01-01

COLING (published)

dblp.uni-trier.de

KNIFE: Kernelized-Neural Differential Entropy Estimation

Georg Pichler

Pierre Colombo

Malik Boudiaf

Gunther Koliander

Pablo Piantanida

Mutual Information (MI) has been widely used as a loss regularizer for training neural networks. This has been particularly effective when l… (see more)earn dis-entangled or compressed representations of high dimensional data. However, differential entropy (DE), another fundamental measure of information, has not found widespread use in neural network training. Although DE offers a potentially wider range of applications than MI, off-the-shelf DE estimators are either non differentiable, computationally intractable or fail to adapt to changes in the underlying distribution. These drawbacks prevent them from being used as regularizers in neural networks training. To address shortcomings in previously proposed estimators for DE, here we introduce K NIFE , a fully parameterized, differentiable kernel-based estimator of DE. The ﬂexibility of our approach also allows us to construct K NIFE -based estimators for conditional (on either discrete or continuous variables) DE, as well as MI. We empirically validate our method on high-dimensional synthetic data and further apply it to guide the training of neural networks for real-world tasks. Our experiments on a large variety of tasks, including visual domain adaptation, textual fair classiﬁcation, and textual ﬁne-tuning demonstrate the effectiveness of K NIFE - based estimation. Code can be found at https: //github.com/g-pichler/knife .

2022-01-01

arXiv.org (preprint)

dblp.uni-trier.de

Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty

Thomas George

Guillaume Lajoie

Aristide Baratin

Among attempts at giving a theoretical account of the success of deep neural networks, a recent line of work has identified a so-called `laz… (see more)y' training regime in which the network can be well approximated by its linearization around initialization. Here we investigate the comparative effect of the lazy (linear) and feature learning (non-linear) regimes on subgroups of examples based on their difficulty. Specifically, we show that easier examples are given more weight in feature learning mode, resulting in faster training compared to more difficult ones. In other words, the non-linear dynamics tends to sequentialize the learning of examples of increasing difficulty. We illustrate this phenomenon across different ways to quantify example difficulty, including c-score, label noise, and in the presence of easy-to-learn spurious correlations. Our results reveal a new understanding of how deep networks prioritize resources across example difficulty.

2022-01-01

Trans. Mach. Learn. Res. (published)

doi.org

openreview.net

On Learning Fairness and Accuracy on Multiple Subgroups

Changjian Shui

Gezheng Xu

Qi CHEN

Jiaqi Li

Charles Ling

Tal Arbel

Boyu Wang

Christian Gagné

We propose an analysis in fair learning that preserves the utility of the data while reducing prediction disparities under the criteria of g… (see more)roup sufficiency. We focus on the scenario where the data contains multiple or even many subgroups, each with limited number of samples. As a result, we present a principled method for learning a fair predictor for all subgroups via formulating it as a bilevel objective. Specifically, the subgroup specific predictors are learned in the lower-level through a small amount of data and the fair predictor. In the upper-level, the fair predictor is updated to be close to all subgroup specific predictors. We further prove that such a bilevel objective can effectively control the group sufficiency and generalization error. We evaluate the proposed framework on real-world datasets. Empirical evidence suggests the consistently improved fair predictions, as well as the comparable accuracy to the baselines.

openreview.net

Learning Inter-Modal Correspondence and Phenotypes From Multi-Modal Electronic Health Records

Kejing Yin

William K. Cheung

Benjamin Fung

Jonathan Poon

Non-negative tensor factorization has been shown a practical solution to automatically discover phenotypes from the electronic health record… (see more)s (EHR) with minimal human supervision. Such methods generally require an input tensor describing the inter-modal interactions to be pre-established; however, the correspondence between different modalities (e.g., correspondence between medications and diagnoses) can often be missing in practice. Although heuristic methods can be applied to estimate them, they inevitably introduce errors, and leads to sub-optimal phenotype quality. This is particularly important for patients with complex health conditions (e.g., in critical care) as multiple diagnoses and medications are simultaneously present in the records. To alleviate this problem and discover phenotypes from EHR with unobserved inter-modal correspondence, we propose the collective hidden interaction tensor factorization (cHITF) to infer the correspondence between multiple modalities jointly with the phenotype discovery. We assume that the observed matrix for each modality is marginalization of the unobserved inter-modal correspondence, which are reconstructed by maximizing the likelihood of the observed matrices. Extensive experiments conducted on the real-world MIMIC-III dataset demonstrate that cHITF effectively infers clinically meaningful inter-modal correspondence, discovers phenotypes that are more clinically relevant and diverse, and achieves better predictive performance compared with a number of state-of-the-art computational phenotyping models.

2022-01-01

IEEE Transactions on Knowledge and Data Engineering (published)

doi.org

arxiv.org

A Learning Metaheuristic Algorithm for a Scheduling Application

Nazgol Niroumandrad

Nadia Lahrichi

Andrea Lodi

2022-01-01

Modeling, Identification and Control (published)

doi.org

Learning Representations for New Sound Classes With Continual Self-Supervised Learning

Zhepei Wang

Cem (Yusuf) Subakan

Xilin Jiang

Junkai Wu

Efthymios Tzinis

Mirco Ravanelli

Paris Smaragdis

In this article, we work on a sound recognition system that continually incorporates new sound classes. Our main goal is to develop a framew… (see more)ork where the model can be updated without relying on labeled data. For this purpose, we propose adopting representation learning, where an encoder is trained using unlabeled data. This learning framework enables the study and implementation of a practically relevant use case where only a small amount of the labels is available in a continual learning context. We also make the empirical observation that a similarity-based representation learning method within this framework is robust to forgetting even if no explicit mechanism against forgetting is employed. We show that this approach obtains similar performance compared to several distillation-based continual learning methods when employed on self-supervised representation learning methods.

2022-01-01

IEEE Signal Processing Letters (published)

doi.org

arxiv.org

AI Research Driven by Real-World Problems

AI Policy Compass

Student Life and Resources

Publications

AI Research Driven by Real-World Problems

AI Policy Compass

Student Life and Resources

Popular keywords:

Publications