Publications

Sim-to-Real Transfer with Neural-Augmented Robot Simulation
Despite the recent successes of deep reinforcement learning, teaching complex motor skills to a physical robot remains a hard problem. While learning directly on a real system is usually impractical, doing so in simulation has proven to be fast and safe. Nevertheless, because of the "reality gap," policies trained in simulation often perform poorly when deployed on a real system. In this work, we introduce a method for training a recurrent neural network on the differences between simulated and real robot trajectories and then using this model to augment the simulator. This Neural-Augmented Simulation (NAS) can be used to learn control policies that transfer significantly better to real environments than policies learned on existing simulators. We demonstrate the potential of our approach through a set of experiments on the Mujoco simulator with added backlash and the Poppy Ergo Jr robot. NAS allows us to learn policies that are competitive with ones that would have been learned directly on the real robot.
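As a rough illustration of the idea of augmenting a simulator with a model of sim-vs-real differences, the sketch below uses a toy 1-D system. Everything in it is an assumption made for illustration: the functions `sim_step`, `real_step`, `fit_correction`, and `augmented_step` are hypothetical, the dynamics are invented, and a one-parameter least-squares fit stands in for the recurrent neural network described in the abstract.

```python
# Toy 1-D illustration; all names and dynamics are hypothetical.

def sim_step(x, a):
    """Idealized simulator: the action moves the state by exactly `a`."""
    return x + a


def real_step(x, a):
    """Stand-in for the real robot: only 80% of the action takes effect."""
    return x + 0.8 * a


def fit_correction(transitions):
    """Least-squares fit of d = w * a, where d is the difference between
    the real next state and the simulated next state.
    (A linear model replaces the recurrent network of the abstract.)"""
    num = sum(a * (x_real - sim_step(x, a)) for x, a, x_real in transitions)
    den = sum(a * a for _, a, _ in transitions)
    return num / den


def augmented_step(x, a, w):
    """Simulator prediction plus the learned correction term."""
    return sim_step(x, a) + w * a


# Collect a few (state, action, real next state) transitions and fit.
actions = [-1.0, -0.5, 0.5, 1.0]
data = [(0.0, a, real_step(0.0, a)) for a in actions]
w = fit_correction(data)

# The augmented simulator now tracks the "real" system on a new transition.
gap_before = abs(sim_step(2.0, 0.3) - real_step(2.0, 0.3))
gap_after = abs(augmented_step(2.0, 0.3, w) - real_step(2.0, 0.3))
```

A policy would then be trained against `augmented_step` rather than `sim_step`; the toy omits the temporal dependence that motivates a recurrent model.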
BabyAI: First Steps Towards Grounded Language Learning With a Human In the Loop
Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and scientific reasons, but given the poor data efficiency of the current learning methods, this goal may require substantial research efforts. Here, we introduce the BabyAI research platform to support investigations towards including humans in the loop for grounded language learning. The BabyAI platform comprises an extensible suite of 19 levels of increasing difficulty. The levels gradually lead the agent towards acquiring a combinatorially rich synthetic language which is a proper subset of English. The platform also provides a heuristic expert agent for the purpose of simulating a human teacher. We report baseline results and estimate the amount of human involvement that would be required to train a neural network-based agent on some of the BabyAI levels. We put forward strong evidence that current deep learning methods are not yet sufficiently sample efficient when it comes to learning a language with compositional properties.
Deep Learning. Das umfassende Handbuch
Visual Reasoning with Multi-hop Feature Modulation
Mathieu Seurin
Jérémie Mary
P. Preux
Olivier Pietquin
BanditSum: Extractive Summarization as a Contextual Bandit
Herke van Hoof
Jackie CK Cheung
In this work, we propose a novel method for training neural networks to perform single-document extractive summarization without heuristically-generated extractive labels. We call our approach BanditSum as it treats extractive summarization as a contextual bandit (CB) problem, where the model receives a document to summarize (the context), and chooses a sequence of sentences to include in the summary (the action). A policy gradient reinforcement learning algorithm is used to train the model to select sequences of sentences that maximize ROUGE score. We perform a series of experiments demonstrating that BanditSum is able to achieve ROUGE scores that are better than or comparable to the state-of-the-art for extractive summarization, and converges using significantly fewer update steps than competing approaches. In addition, we show empirically that BanditSum performs significantly better than competing approaches when good summary sentences appear late in the source document.
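The contextual-bandit framing can be sketched with a plain REINFORCE update. This is a simplified illustration, not the BanditSum implementation: the action is a single sentence rather than a sequence, the policy is a table of per-sentence scores rather than a neural network, and `unigram_f1` is a crude hypothetical stand-in for ROUGE. All function names are assumptions.

```python
import math
import random


def unigram_f1(summary_sents, reference):
    """Crude stand-in for ROUGE: F1 over unique lowercase unigrams."""
    cand = {w for s in summary_sents for w in s.lower().split()}
    ref = set(reference.lower().split())
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    p = overlap / len(cand)
    r = overlap / len(ref)
    return 2 * p * r / (p + r)


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_step(scores, sentences, reference, lr=0.5, baseline=0.0, rng=random):
    """One policy-gradient update: sample a sentence (the action),
    score it against the reference (the reward), and move the scores
    along reward * grad log pi(action)."""
    probs = softmax(scores)
    i = rng.choices(range(len(sentences)), weights=probs)[0]
    reward = unigram_f1([sentences[i]], reference)
    # d log pi(i) / d score_j = 1[j == i] - probs[j]
    return [
        s + lr * (reward - baseline) * ((1.0 if j == i else 0.0) - probs[j])
        for j, s in enumerate(scores)
    ]


sentences = ["the cat sat on the mat", "stocks fell sharply today", "a dog barked"]
reference = "the cat sat"
rng = random.Random(0)
scores = [0.0, 0.0, 0.0]
for _ in range(200):
    scores = reinforce_step(scores, sentences, reference, rng=rng)
```

After the loop, the score of the first sentence (the only one overlapping the reference) dominates, so the policy concentrates on it without ever having been given an extractive label.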
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Zhilin Yang
Peng Qi
William W. Cohen
Russ Salakhutdinov
Christopher D Manning
Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems’ ability to extract relevant facts and perform necessary comparison. We show that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.
A Knowledge Hunting Framework for Common Sense Reasoning
Noelia De La Cruz
Adam Trischler
Kaheer Suleman
Jackie CK Cheung
We introduce an automatic system that achieves state-of-the-art results on the Winograd Schema Challenge (WSC), a common sense reasoning task that requires diverse, complex forms of inference and knowledge. Our method uses a knowledge hunting module to gather text from the web, which serves as evidence for candidate problem resolutions. Given an input problem, our system generates relevant queries to send to a search engine, then extracts and classifies knowledge from the returned results and weighs them to make a resolution. Our approach improves F1 performance on the full WSC by 0.21 over the previous best and represents the first system to exceed 0.5 F1. We further demonstrate that the approach is competitive on the Choice of Plausible Alternatives (COPA) task, which suggests that it is generally applicable.
Introduction to NIPS 2017 Competition Track
Sergio Escalera
Markus Weimer
Mikhail Burtsev
Valentin Malykh
Varvara Logacheva
Iulian V. Serban
Alexander Rudnicky
Alan W. Black
Shrimai Prabhumoye
Łukasz Kidziński
Sharada Prasanna Mohanty
Carmichael F. Ong
Jennifer L. Hicks
Sergey Levine
Marcel Salathé
Scott Delp
Iker Huerga
Alexander Grigorenko
Leifur Thorbergsson
Anasuya Das
Kyla Nemitz
Jenna Sandker
Stephen King
Alexander S. Ecker
Leon A. Gatys
Matthias Bethge
Jordan Boyd-Graber
Shi Feng
Pedro Rodriguez
Mohit Iyyer
He He
Hal Daumé III
Sean McGregor
Amir Banifatemi
Alexey Kurakin
Ian G Goodfellow
The First Conversational Intelligence Challenge
Mikhail Burtsev
Varvara Logacheva
Valentin Malykh
Iulian V. Serban
Shrimai Prabhumoye
Alan W. Black
Alexander Rudnicky
Combining adaptive algorithms and hypergradient method: a performance and robustness study
Nicolas Roux
Convergence Properties of Deep Neural Networks on Separable Data
Remi Tachet des Combes
Samira Shabanian
While a lot of progress has been made in recent years, the dynamics of learning in deep nonlinear neural networks remain to this day largely misunderstood. In this work, we study the case of binary classification and prove various properties of learning in such networks under strong assumptions such as linear separability of the data. Extending existing results from the linear case, we confirm empirical observations by proving that the classification error also follows a sigmoidal shape in nonlinear architectures. We show that given proper initialization, learning expounds parallel independent modes and that certain regions of parameter space might lead to failed training. We also demonstrate that input norm and features’ frequency in the dataset lead to distinct convergence speeds which might shed some light on the generalization capabilities of deep neural networks. We provide a comparison between the dynamics of learning with cross-entropy and hinge losses, which could prove useful to understand recent progress in the training of generative adversarial networks. Finally, we identify a phenomenon that we baptize gradient starvation where the most frequent features in a dataset prevent the learning of other less frequent but equally informative features.
Deep Graph Infomax
William Fedus
William L. Hamilton
Pietro Lio
R Devon Hjelm
We present Deep Graph Infomax (DGI), a general approach for learning node representations within graph-structured data in an unsupervised manner. DGI relies on maximizing mutual information between patch representations and corresponding high-level summaries of graphs, both derived using established graph convolutional network architectures. The learnt patch representations summarize subgraphs centered around nodes of interest, and can thus be reused for downstream node-wise learning tasks. In contrast to most prior approaches to unsupervised learning with GCNs, DGI does not rely on random walk objectives, and is readily applicable to both transductive and inductive learning setups. We demonstrate competitive performance on a variety of node classification benchmarks, which at times even exceeds the performance of supervised learning.