Publications

Disentanglement via Mechanism Sparsity Regularization: A New Principle for Nonlinear ICA

Pau Rodríguez

Yash Sharma

Katie E Everett

Rémi Le Priol

Alexandre Lacoste

This work introduces a novel principle we call disentanglement via mechanism sparsity regularization, which can be applied when the latent f… (voir plus)actors of interest depend sparsely on past latent factors and/or observed auxiliary variables. We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors and the sparse causal graphical model that relates them. We develop a rigorous identifiability theory, building on recent nonlinear independent component analysis (ICA) results, that formalizes this principle and shows how the latent variables can be recovered up to permutation if one regularizes the latent mechanisms to be sparse and if some graph connectivity criterion is satisfied by the data generating process. As a special case of our framework, we show how one can leverage unknown-target interventions on the latent factors to disentangle them, thereby drawing further connections between ICA and causality. We propose a VAE-based method in which the latent mechanisms are learned and regularized via binary masks, and validate our theory by showing it learns disentangled representations in simulations.

2021-12-31

CLeaR (publié)

proceedings.mlr.press

Distinguishing between fake news and satire with transformers

Jwen Fai Low

Benjamin C. M. Fung

Farkhund Iqbal

Shih-Chia Huang

2021-12-31

Expert systems with applications (publié)

doi.org

DsMLP: A Learning-Based Multi-Layer Perception for MIMO Detection Implemented by Dynamic Stochastic Computing

Qidie Wu

Jinsheng Kuang

Jiyun Tao

Jienan Chen

Warren J. Gross

As the number of antennas increases in multi-input and multi-output (MIMO) systems, even linear detection methods suffer from sharply increa… (voir plus)sing complexity. This paper proposes a learning-based multi-layer perception (MLP), named dynamic stochastic multi-layer perception (DsMLP), which is implemented by dynamic stochastic computing (DSC). We first establish a similar form between the MLP structure and minimum mean square error (MMSE) matrix operations. Consequently, DsMLP transforms the complex computation problem into an optimization problem of MLP training. Due to the specific design of MLP structure, e.g., same input/output dimension and single layer without activation function, the mathematical representation of DsMLP is identical to the MMSE matrix operations. Therefore, DsMLP guarantees sound model explainability in mathematics, fast convergence in training, and low complexity in computation. Furthermore, we transform the MLP training process to the DSC domain and propose a hardware-efficient scheme for DsMLP. Compared with other state-of-the-art MIMO detectors, DsMLP achieves 1.2× energy efficiency and 1.74× area efficiency.

2021-12-31

IEEE Transactions on Signal Processing (publié)

doi.org

DyG2Vec: Representation Learning for Dynamic Graphs with Self-Supervision

Mohammad Alomrani

Mahdi Biparva

Yingxue Zhang

Mark J. Coates

Temporal graph neural networks have shown promising results in learning inductive representations by automatically extracting temporal patte… (voir plus)rns. However, previous works often rely on complex memory modules or inefficient random walk methods to construct temporal representations. In addition, the existing dynamic graph encoders are non-trivial to adapt to self-supervised paradigms, which prevents them from utilizing unlabeled data. To address these limitations, we present an efficient yet effective attention-based encoder that leverages temporal edge encodings and window-based subgraph sampling to generate task-agnostic embeddings. Moreover, we propose a joint-embedding architecture using non-contrastive SSL to learn rich temporal embeddings without labels. Experimental results on 7 benchmark datasets indicate that on average, our model outperforms SoTA baselines on the future link prediction task by 4.23% for the transductive setting and 3.30% for the inductive setting while only requiring 5-10x less training/inference time. Additionally, we empirically validate the SSL pre-training significance under two probings commonly used in language and vision modalities. Lastly, different aspects of the proposed framework are investigated through experimental analysis and ablation studies.

2021-12-31

arXiv.org (prépublication)

doi.org

openreview.net

Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution

Antonio Orvieto

Simon Lacoste-Julien

Nicolas Loizou

Recently, Loizou et al. (2021), proposed and analyzed stochastic gradient descent (SGD) with stochastic Polyak stepsize (SPS). The proposed … (voir plus)SPS comes with strong convergence guarantees and competitive performance; however, it has two main drawbacks when it is used in non-over-parameterized regimes: (i) It requires a priori knowledge of the optimal mini-batch losses, which are not available when the interpolation condition is not satisfied (e.g., regularized objectives), and (ii) it guarantees convergence only to a neighborhood of the solution. In this work, we study the dynamics and the convergence properties of SGD equipped with new variants of the stochastic Polyak stepsize and provide solutions to both drawbacks of the original SPS. We first show that a simple modification of the original SPS that uses lower bounds instead of the optimal function values can directly solve issue (i). On the other hand, solving issue (ii) turns out to be more challenging and leads us to valuable insights into the method's behavior. We show that if interpolation is not satisfied, the correlation between SPS and stochastic gradients introduces a bias, which effectively distorts the expectation of the gradient signal near minimizers, leading to non-convergence - even if the stepsize is scaled down during training. To fix this issue, we propose DecSPS, a novel modification of SPS, which guarantees convergence to the exact minimizer - without a priori knowledge of the problem parameters. For strongly-convex optimization problems, DecSPS is the first stochastic adaptive optimization method that converges to the exact solution without restrictive assumptions like bounded iterates/gradients.

2021-12-31

Advances in Neural Information Processing Systems 35 (NeurIPS 2022) (publié)

doi.org

openreview.net

Efficient Continual Learning Ensembles in Neural Network Subspaces

Thang Doan

Seyed Iman Mirzadeh

Joelle Pineau

Mehrdad Farajtabar

A growing body of research in continual learning focuses on the catastrophic forgetting problem. While many attempts have been made to allev… (voir plus)iate this problem, the majority of the methods assume a single model in the continual learning setup. In this work, we question this assumption and show that employing ensemble models can be a simple yet eﬀective method to improve continual performance. However, the training and inference cost of ensembles can increase linearly with the number of models. Motivated by this limitation, we leverage the recent advances in the deep learning optimization literature, such as mode connectivity and neural network subspaces, to derive a new method that is both computationally advantageous and can outperform the state-of-the-art continual learning algorithms

2021-12-31

arXiv.org (prépublication)

dblp.uni-trier.de

Enhanced Biomedical Knowledge Discovery From Unstructured Text Using Contextual Embeddings

Iz Beltagy

Kyle Lo

Arman Cohan. 2019

Scib-500

Yoshua Bengio

R´ejean Ducharme

P Vincent

Rishi Bommasani

Kelly Davis

Claire Cardie

Billy Chiu

Sampo Pyysalo

Ivan Vuli´c

Extracting knowledge from large, unstruc-001 tured text corpora presents a challenge. Re-002 cently, authors have utilized unsupervised, 003… (voir plus) static word embeddings to uncover "latent 004 knowledge" contained within domain-speciﬁc 005 scientiﬁc corpora. Here semantic-similarity 006 measures between representations of concepts, 007 objects or entities were used to predict re-008 lationships, which were later veriﬁed using 009 physical methods. Static language models 010 have recently been surpassed at most down-011 stream tasks by massively pre-trained, contex-012 tual language models like BERT. Some have 013 postulated that contextualized embeddings po-014 tentially yield word representations superior 015 to static ones for knowledge-discovery pur-016 poses. In an effort to address this ques-017 tion, two biomedically-trained BERT models 018 (BioBERT, SciBERT) were used to encode 019 n = 500, 1000 or 5000 sentences containing 020 words of interest extracted from a biomedical 021 corpus (Coronavirus Open Research Dataset). 022 The n representations for the words of inter-023 est were subsequently extracted and then ag-024 gregated to yield static-equivalent word rep-025 resentations. These words belonged to the 026 vocabularies of intrinsic benchmarking tools 027 for the biomedical domain (Bio-SimVerb and 028 Bio-SimLex), which assess quality of word 029 representations using semantic-similarity and 030 relatedness measures. Using intrinsic bench-031 marking tasks, feasibility of using contextual-032 ized word representations for knowledge dis-033 covery tasks can be assessed: Word represen-034 tations that better encode described reality are 035 expected to perform better (i.e. closer to do-036 main experts). As postulated, BERT embed-037 dings outperform static counterparts

2021-12-31

(publié)

www.semanticscholar.org

Equivariant Networks for Crystal Structures

Sékou-Oumar Kaba

Siamak Ravanbakhsh

Supervised learning with deep models has tremendous potential for applications in materials science. Recently, graph neural networks have be… (voir plus)en used in this context, drawing direct inspiration from models for molecules. However, materials are typically much more structured than molecules, which is a feature that these models do not leverage. In this work, we introduce a class of models that are equivariant with respect to crystalline symmetry groups. We do this by defining a generalization of the message passing operations that can be used with more general permutation groups, or that can alternatively be seen as defining an expressive convolution operation on the crystal graph. Empirically, these models achieve competitive results with state-of-the-art on property prediction tasks.

2021-12-31

Advances in Neural Information Processing Systems 35 (NeurIPS 2022) (publié)

doi.org

openreview.net

Extended Abstract Track

Amin Mansouri

Jason Hartford

Kartik Ahuja

Yoshua Bengio

Christian Shewmake

Simone Azeglio

Arianna Di Bernardo

Nina Miolane

2021-12-31

(publié)

www.semanticscholar.org

Extracting Person Names from User Generated Text: Named-Entity Recognition for Combating Human Trafficking

2021-12-31

Findings (publié)

doi.org

Feeding What You Need by Understanding What You Learned

Xiaoqiang Wang

Bang Liu

Fangli Xu

Bo Long

Siliang Tang

Lingfei Wu

2021-12-31

ACL (1) (publié)

doi.org

arxiv.org

Few-Shot Pidgin Text Adaptation via Contrastive Fine-Tuning

Ernie Chang

Jesujoba Oluwadara Alabi

David Ifeoluwa Adelani

Vera Demberg

The surging demand for multilingual dialogue systems often requires a costly labeling process for each language addition. For low resource l… (voir plus)anguages, human annotators are continuously tasked with the adaptation of resource-rich language utterances for each new domain. However, this prohibitive and impractical process can often be a bottleneck for low resource languages that are still without proper translation systems nor parallel corpus. In particular, it is difficult to obtain task-specific low resource language annotations for the English-derived creoles (e.g. Nigerian and Cameroonian Pidgin). To address this issue, we utilize the pretrained language models i.e. BART which has shown great potential in language generation/understanding – we propose to finetune the BART model to generate utterances in Pidgin by leveraging the proximity of the source and target languages, and utilizing positive and negative examples in constrastive training objectives. We collected and released the first parallel Pidgin-English conversation corpus in two dialogue domains and showed that this simple and effective technique is suffice to yield impressive results for English-to-Pidgin generation, which are two closely-related languages.

2021-12-31

COLING (publié)

dblp.uni-trier.de

TRAIL : IA responsable pour les professionnels et les leaders

Fondateur en résidence Mila Ventures

Avantage IA : productivité dans la fonction publique

Publications

TRAIL : IA responsable pour les professionnels et les leaders

Fondateur en résidence Mila Ventures

Avantage IA : productivité dans la fonction publique

Mots-clés populaires:

Publications