Stanislaw Jastrzebski

Alumni

Publications

A Closer Look at Memorization in Deep Networks
We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that for appropriately tuned explicit regularization (e.g., dropout) we can degrade DNN training performance on noise datasets without compromising generalization on real data. Our analysis suggests that the notions of effective capacity which are dataset independent are unlikely to explain the generalization performance of deep networks when trained with gradient-based methods because training data itself plays an important role in determining the degree of memorization.
Learning to Compute Word Embeddings on the Fly
Tom Bosc
Stanisław Jastrzębski
Edward Grefenstette
Yoshua Bengio
Words in natural language follow a Zipfian distribution whereby some words are frequent but most are rare. Learning representations for words in the "long tail" of this distribution requires enormous amounts of data. Representations of rare words trained directly on end tasks are usually poor, requiring us to pre-train embeddings on external data, or treat all rare words as out-of-vocabulary words with a unique representation. We provide a method for predicting embeddings of rare words on the fly from small amounts of auxiliary data with a network trained end-to-end for the downstream task. We show that this improves results against baselines where embeddings are trained on the end task for reading comprehension, recognizing textual entailment and language modeling.
Deep Nets Don't Learn Via Memorization