Publications

The Impact of Time Interval between Extubation and Reintubation on Death or Bronchopulmonary Dysplasia in Extremely Preterm Infants
Wissam Shalish
Lara Kanbar
Lajos Kovacs
Sanjay Chawla
Martin Keszler
Smita Rao
Bogdan Panaitescu
Alyse Laliberte
Karen Brown
Robert E. Kearney
Guilherme M. Sant'Anna
Author Correction: Why rankings of biomedical image analysis competitions should be interpreted with care
Lena Maier-Hein
Matthias Eisenmann
Annika Reinke
Sinan Onogur
Marko Stankovic
Patrick Scholz
Hrvoje Bogunovic
Andrew P. Bradley
Aaron Carass
Carolin Feldmann
Alejandro F. Frangi
Peter M. Full
Bram van Ginneken
Allan Hanbury
Katrin Honauer
Michal Kozubek
Bennett Landman
Keno März
Oskar Maier
Klaus Maier-Hein
Bjoern Menze
Henning Müller
Peter F. Neher
Wiro Niessen
Nasir Rajpoot
Gregory C. Sharp
Korsuk Sirinukunwattana
Stefanie Speidel
Christian Stock
Danail Stoyanov
Abdel Aziz Taha
Fons van der Sommen
Ching-Wei Wang
Marc-André Weber
Guoyan Zheng
Pierre Jannin
Annette Kopp-Schneider
Session-Based Social Recommendation via Dynamic Graph Attention Networks
Zhiping Xiao
Yifan Wang
Ming Zhang
Online communities such as Facebook and Twitter are enormously popular and have become an essential part of the daily life of many of their users. Through these platforms, users can discover and create information that others will then consume. In that context, recommending relevant information to users becomes critical for viability. However, recommendation in online communities is a challenging problem: 1) users' interests are dynamic, and 2) users are influenced by their friends. Moreover, the influencers may be context-dependent. That is, different friends may be relied upon for different topics. Modeling both signals is therefore essential for recommendations. We propose a recommender system for online communities based on a dynamic-graph-attention neural network. We model dynamic user behaviors with a recurrent neural network, and context-dependent social influence with a graph-attention neural network, which dynamically infers the influencers based on users' current interests. The whole model can be efficiently fit on large-scale data. Experimental results on several real-world data sets demonstrate the effectiveness of our proposed approach over several competitive baselines including state-of-the-art models.
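The idea of context-dependent influence can be illustrated with a single attention step. This is a simplified sketch, not the paper's layer: it assumes the user's current interest is already available as a vector (in the paper it comes from a recurrent network), it uses plain scaled dot-product attention instead of a learned graph-attention layer, and the helper name `social_attention` and the toy embeddings are hypothetical.

```python
import numpy as np

def social_attention(interest, friend_embs):
    """Weight each friend by relevance to the user's current interest (hypothetical helper)."""
    d = interest.shape[0]
    scores = friend_embs @ interest / np.sqrt(d)     # scaled dot-product score, one per friend
    scores = scores - scores.max()                   # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over friends
    context = weights @ friend_embs                  # attention-weighted social context vector
    return weights, context

# Toy friend embeddings: each friend is "about" a different topic axis.
friends = np.array([[2.0, 0.0, 0.0, 0.0],
                    [0.0, 2.0, 0.0, 0.0],
                    [0.0, 0.0, 2.0, 0.0]])

w_topic0, ctx = social_attention(np.array([1.0, 0.0, 0.0, 0.0]), friends)
w_topic1, _ = social_attention(np.array([0.0, 1.0, 0.0, 0.0]), friends)
print(np.round(w_topic0, 3), np.round(w_topic1, 3))
```

Changing only the interest vector moves most of the attention weight to a different friend, which is the sense in which the inferred influencers depend on the user's current interest.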
Maximum Entropy Generators for Energy-Based Models
Maximum likelihood estimation of energy-based models is a challenging problem due to the intractability of the log-likelihood gradient. In this work, we propose learning both the energy function and an amortized approximate sampling mechanism using a neural generator network, which provides an efficient approximation of the log-likelihood gradient. The resulting objective requires maximizing entropy of the generated samples, which we perform using recently proposed nonparametric mutual information estimators. Finally, to stabilize the resulting adversarial game, we use a zero-centered gradient penalty derived as a necessary condition from the score matching literature. The proposed technique can generate sharp images with Inception and FID scores competitive with recent GAN techniques, does not suffer from mode collapse, and is competitive with state-of-the-art anomaly detection techniques.
What comes next? Extractive summarization by next-sentence prediction
Jingyun Liu
Jackie CK Cheung
Annie Priyadarshini Louis
Existing approaches to automatic summarization assume that a length limit for the summary is given, and view content selection as an optimization problem to maximize informativeness and minimize redundancy within this budget. This framework ignores the fact that human-written summaries have rich internal structure which can be exploited to train a summarization system. We present NEXTSUM, a novel approach to summarization based on a model that predicts the next sentence to include in the summary using not only the source article, but also the summary produced so far. We show that such a model successfully captures summary-specific discourse moves, and leads to better content selection performance, in addition to automatically predicting how long the target summary should be. We perform experiments on the New York Times Annotated Corpus of summaries, where NEXTSUM outperforms lead and content-model summarization baselines by significant margins. We also show that the lengths of summaries produced by our system correlate with the lengths of the human-written gold standards.
The Benefits of Over-parameterization at Initialization in Deep ReLU Networks
It has been noted in the existing literature that over-parameterization generally improves the performance of ReLU networks. While several factors may be involved, we prove some desirable theoretical properties that ReLU networks enjoy at initialization. Specifically, it is known that He initialization in deep ReLU networks asymptotically preserves the variance of activations in the forward pass and the variance of gradients in the backward pass for infinitely wide networks, thus preserving the flow of information in both directions. Our paper goes beyond these results and shows novel properties that hold under He initialization: i) the norm of the hidden activation of each layer is equal to the norm of the input, and ii) the norm of the weight gradient of each layer is equal to the product of the norm of the input vector and the error at the output layer. These results are derived using the PAC analysis framework and hold for finitely sized datasets, provided the width of the ReLU network exceeds a certain finite lower bound. As we show, this lower bound depends on the depth of the network and the number of samples, and, by virtue of its being a lower bound, over-parameterized ReLU networks are endowed with these desirable properties. For the hidden activation norm property under He initialization, we further extend our theory and show that this property holds for a finite-width network even when the number of data samples is infinite. Thus we overcome several limitations of existing work and show new properties of deep ReLU networks at initialization.
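Property i) can be checked numerically. The sketch below is a Monte Carlo illustration, not the paper's PAC analysis; it assumes a bias-free, fully connected ReLU network in which every layer has the same width as the input, He (Kaiming) normal initialization with variance 2/fan_in, and a hypothetical helper name `he_norm_ratios`.

```python
import numpy as np

def he_norm_ratios(x, depth, seed=0):
    """Ratio ||h_l|| / ||x|| after each layer of a bias-free ReLU MLP under He init.

    Assumption of this sketch: every layer keeps width equal to len(x).
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    h = x
    ratios = []
    for _ in range(depth):
        # He (Kaiming) normal init: W_ij ~ N(0, 2 / fan_in), with fan_in = n here
        W = rng.normal(0.0, np.sqrt(2.0 / n), size=(n, n))
        h = np.maximum(W @ h, 0.0)  # ReLU, no bias
        ratios.append(np.linalg.norm(h) / np.linalg.norm(x))
    return np.array(ratios)

x = np.random.default_rng(1).normal(size=2048)
ratios = he_norm_ratios(x, depth=5)
print(np.round(ratios, 3))
```

At this width the ratios cluster near 1.0. Rerunning with a much smaller width (for example 32) gives visibly larger deviations, consistent with the abstract's point that the property needs the width to exceed a finite lower bound.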
Searching for Big-Oh in the Data: Inferring Asymptotic Complexity from Experiments
Catherine McGeoch
Peter Sanders
Rudolf Fleischer
Paul R. Cohen
Adversarial Domain Adaptation for Stable Brain-Machine Interfaces
Ali Farshchian
Juan A. Gallego
Lee Miller
Sara Solla
Brain-Machine Interfaces (BMIs) have recently emerged as a clinically viable option to restore voluntary movements after paralysis. These devices are based on the ability to extract information about movement intent from neural signals recorded using multi-electrode arrays chronically implanted in the motor cortices of the brain. However, the inherent loss and turnover of recorded neurons requires repeated recalibrations of the interface, which can potentially alter the day-to-day user experience. The resulting need for continued user adaptation interferes with the natural, subconscious use of the BMI. Here, we introduce a new computational approach that decodes movement intent from a low-dimensional latent representation of the neural data. We implement various domain adaptation methods to stabilize the interface over long periods of time. These include Canonical Correlation Analysis (CCA), used to align the latent variables across days; this method requires prior point-to-point correspondence of the time series across domains. Alternatively, we match the empirical probability distributions of the latent variables across days by minimizing their Kullback-Leibler divergence. These two methods provide significant and comparable improvements in the performance of the interface. However, an Adversarial Domain Adaptation Network trained to match the empirical probability distributions of the residuals of the reconstructed neural signals outperforms both latent-variable methods, while requiring remarkably few data points to solve the domain adaptation problem.
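Distribution matching differs from CCA in that it compares only the statistics of each day's latents, so the samples need not be paired in time. A minimal sketch of that idea follows, assuming each day's latent variables are approximately Gaussian. It uses the closed-form KL divergence between Gaussian fits and an affine moment-matching map that drives that divergence to zero. This closed-form Gaussian shortcut is a swapped-in simplification, not the paper's procedure; the helper names (`fit_gaussian`, `gaussian_kl`, `moment_match`) and the synthetic "drift" data are hypothetical.

```python
import numpy as np

def fit_gaussian(Z):
    """Mean and sample covariance of latent samples Z with shape (n_samples, k)."""
    return Z.mean(axis=0), np.cov(Z, rowvar=False)

def gaussian_kl(mu0, S0, mu1, S1):
    """Closed-form KL( N(mu0, S0) || N(mu1, S1) )."""
    k = mu0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - k + logdet1 - logdet0)

def moment_match(Z_new, Z_ref):
    """Affine map taking the Gaussian fit of Z_new onto that of Z_ref (hypothetical helper)."""
    mu_n, S_n = fit_gaussian(Z_new)
    mu_r, S_r = fit_gaussian(Z_ref)
    # A = L_r L_n^{-1} satisfies A S_n A^T = S_r for Cholesky factors L_n, L_r
    A = np.linalg.cholesky(S_r) @ np.linalg.inv(np.linalg.cholesky(S_n))
    return (Z_new - mu_n) @ A.T + mu_r

rng = np.random.default_rng(0)
day0 = rng.normal(size=(2000, 3))  # reference-day latents
drift = np.array([[1.5, 0.3, 0.0],
                  [0.0, 0.7, 0.2],
                  [0.1, 0.0, 1.2]])
# Later day: independent samples (no time pairing) with a linear drift and offset
day_k = rng.normal(size=(2000, 3)) @ drift + np.array([0.5, -1.0, 0.2])

kl_before = gaussian_kl(*fit_gaussian(day_k), *fit_gaussian(day0))
kl_after = gaussian_kl(*fit_gaussian(moment_match(day_k, day0)), *fit_gaussian(day0))
print(round(kl_before, 3), round(kl_after, 6))
```

Because `day0` and `day_k` are drawn independently, no point-to-point correspondence is used; only the fitted means and covariances are aligned.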
On Adversarial Mixup Resynthesis
In this paper, we explore new approaches to combining information encoded within the learned representations of auto-encoders. We explore models that are capable of combining the attributes of multiple inputs such that a resynthesised output is trained to fool an adversarial discriminator for real versus synthesised data. Furthermore, we explore the use of such an architecture in the context of semi-supervised learning, where we learn a mixing function whose objective is to produce interpolations of hidden states, or masked combinations of latent representations that are consistent with a conditioned class label. We show quantitative and qualitative evidence that such a formulation is an interesting avenue of research.
Adversarial Mixup Resynthesizers
In this paper, we explore new approaches to combining information encoded within the learned representations of autoencoders. We explore models that are capable of combining the attributes of multiple inputs such that a resynthesised output is trained to fool an adversarial discriminator for real versus synthesised data. Furthermore, we explore the use of such an architecture in the context of semi-supervised learning, where we learn a mixing function whose objective is to produce interpolations of hidden states, or masked combinations of latent representations that are consistent with a conditioned class label. We show quantitative and qualitative evidence that such a formulation is an interesting avenue of research.
Artificial Intelligence Cytometer in Blood
Geoffrey Hinton
Attraction-Repulsion Actor-Critic for Continuous Control Reinforcement Learning
Thang Doan
R Devon Hjelm
Continuous control tasks in reinforcement learning are important because they provide a framework for learning in high-dimensional state spaces with deceptive rewards, where the agent can easily become trapped in suboptimal solutions. One way to avoid local optima is to use a population of agents to ensure coverage of the policy space, yet learning a population with the "best" coverage is still an open problem. In this work, we present a novel approach to population-based RL in continuous control that leverages properties of normalizing flows to perform attractive and repulsive operations between current members of the population and previously observed policies. Empirical results on the MuJoCo suite demonstrate substantial performance gains for our algorithm compared to prior work, including Soft Actor-Critic (SAC).