Publications

GraphMix: Improved Training of Graph Neural Networks for Semi-Supervised Learning

Vikas Verma

Meng Qu

Alex Lamb

Juho Kannala

We present GraphMix , a regularized training scheme for Graph Neural Network based semi-supervised object classiﬁcation, leveraging the re… (see more)cent advances in the regularization of classical deep neural networks. Speciﬁcally, we pro-pose a uniﬁed approach in which we train a fully-connected network jointly with the graph neural network via parameter sharing, interpolation-based regularization and self-predicted-targets. Our proposed method is architecture agnostic in the sense that it can be applied to any variant of graph neural networks which applies a parametric transformation to the features of the graph nodes. Despite its simplicity, with GraphMix we can consistently improve results and achieve or closely match state-of-the-art performance using even simpler architectures such as Graph Convolutional Networks, across three established graph benchmarks: Cora, Citeseer and Pubmed citation network datasets, as well as three newly proposed datasets :Cora-Full, Co-author-CS and Co-author-Physics.

2020-01-01

(published)

www.semanticscholar.org

How to make your optimizer generalize better

Sharan Vaswani

Reza Babenzhad

Sait AI Lab

Montreal.

Jose Gallego

Aaron Mishkin

Simon Lacoste-Julien

Nicolas Le Roux

We study the implicit regularization of optimization methods for linear models interpolating the training data in the under-parametrized and… (see more) over-parametrized regimes. For over-parameterized linear regression, where there are inﬁnitely many interpolating solutions, different optimization methods can converge to solutions with varying generalization performance. In this setting, we show that projections onto linear spans can be used to move between solutions. Furthermore, via a simple reparameterization, we can ensure that an arbitrary optimizer converges to the minimum (cid:96) 2 -norm solution with favourable generalization properties. For under-parameterized linear clas-siﬁcation, optimizers can converge to different decision boundaries separating the data. We prove that for any such classiﬁer, there exists a family of quadratic norms (cid:107)·(cid:107) P such that the classiﬁer’s direction is the same as that of the maximum P -margin solution. We argue that analyzing convergence to the standard maximum (cid:96) 2 -margin is arbitrary and show that minimizing the norm induced by the data can result in better generalization. We validate our theoretical results via experiments on synthetic and real datasets.

2020-01-01

(published)

www.semanticscholar.org

Hybrid Models for Learning to Branch

Prateek Gupta

Maxime Gasse

Elias Boutros Khalil

Pawan Mudigonda

M. Pawan Kumar

Andrea Lodi

Yoshua Bengio

arxiv.org

Intelligent Tools for Precision Public Health.

Anya Okhmatovskaia

David Buckeridge

2020-01-01

MIE (published)

doi.org

Investigating the Barriers to Physician Adoption of an Artificial Intelligence- Based Decision Support System in Emergency Care: An Interpretative Qualitative Study.

Cécile Petitgand

Aude Motulsky

Jean-Louis Denis

Catherine Régis

2020-01-01

MIE (published)

doi.org

Investigating the Influence of Selected Linguistic Features on Authorship Attribution using German News Articles

Manuel Sage

Pietro Cruciata

Raed Abdo

Jackie Cheung

Yaoyao Fiona Zhao

In this work, we perform authorship attri-bution on a new dataset of German news articles. We seek to classify over 3,700 articles to their … (see more)ﬁve corresponding authors, using four conventional machine learning approaches (na¨ıve Bayes, logistic regression, SVM and kNN) and a convolutional neural network. We analyze the effect of character and word n-grams on the prediction accuracy, as well as the inﬂuence of stop words, punctuation, numbers, and lowercasing when preprocessing raw text. The experiments show that higher order character n-grams (n = 5,6) perform better than lower orders and word n-grams slightly outperform those with characters. Combining both in fusion models further improves results up to 92% for SVM. A multilayer convolutional structure allows the CNN to achieve 90.5% accuracy. We found stop words and punctuation to be important features for author identiﬁcation; removing them leads to a measurable decrease in performance. Finally, we evaluate the topic dependency of the algorithms by gradually replacing named entities, nouns, verbs and eventually all to-kens in the dataset according to their POS-tags.

2020-01-01

SwissText/KONVENS (published)

dblp.uni-trier.de

Investigating the interconnections between human, technology and context in the implementation of a AI-based health information technology: a dynamic technological frame perspective

Catherine Régis

2020-01-01

(published)

www.semanticscholar.org

Joint Learning of Generative Translator and Classifier for Visually Similar Classes

Byungin Yoo

Tristan Sylvain

Yoshua Bengio

Junmo Kim

In this paper, we propose a Generative Translation Classification Network (GTCN) for improving visual classification accuracy in settings wh… (see more)ere classes are visually similar and data is scarce. For this purpose, we propose joint learning from a scratch to train a classifier and a generative stochastic translation network end-to-end. The translation network is used to perform on-line data augmentation across classes, whereas previous works have mostly involved domain adaptation. To help the model further benefit from this data-augmentation, we introduce an adaptive fade-in loss and a quadruplet loss. We perform experiments on multiple datasets to demonstrate the proposed method’s performance in varied settings. Of particular interest, training on 40% of the dataset is enough for our model to surpass the performance of baselines trained on the full dataset. When our architecture is trained on the full dataset, we achieve comparable performance with state-of-the-art methods despite using a light-weight architecture.

2020-01-01

IEEE Access (published)

doi.org

arxiv.org

Language GANs Falling Short

Massimo Caccia

Lucas Caccia

William Fedus

Hugo Larochelle

Joelle Pineau

Laurent Charlin

Generating high-quality text with sufficient diversity is essential for a wide range of Natural Language Generation (NLG) tasks. Maximum-Lik… (see more)elihood (MLE) models trained with teacher forcing have consistently been reported as weak baselines, where poor performance is attributed to exposure bias (Bengio et al., 2015; Ranzato et al., 2015); at inference time, the model is fed its own prediction instead of a ground-truth token, which can lead to accumulating errors and poor samples. This line of reasoning has led to an outbreak of adversarial based approaches for NLG, on the account that GANs do not suffer from exposure bias. In this work, we make several surprising observations which contradict common beliefs. First, we revisit the canonical evaluation framework for NLG, and point out fundamental flaws with quality-only evaluation: we show that one can outperform such metrics using a simple, well-known temperature parameter to artificially reduce the entropy of the model's conditional distributions. Second, we leverage the control over the quality / diversity trade-off given by this parameter to evaluate models over the whole quality-diversity spectrum and find MLE models constantly outperform the proposed GAN variants over the whole quality-diversity space. Our results have several implications: 1) The impact of exposure bias on sample quality is less severe than previously thought, 2) temperature tuning provides a better quality / diversity trade-off than adversarial training while being easier to train, easier to cross-validate, and less computationally expensive. Code to reproduce the experiments is available at github.com/pclucas14/GansFallingShort

2020-01-01

ICLR (published)

openreview.net