RandomOut: Using a convolutional gradient norm to rescue convolutional filters

Generic Methods to Improve Training and Generalization
Feb 2016

Filters in convolutional neural networks are sensitive to their initialization. The random numbers used to initialize filters act as a bias and determine whether you will "win" and converge to a satisfactory local minimum, so we call this The Filter Lottery. We observe that the 28×28 Inception-V3 model without Batch Normalization fails to train 26% of the time when varying the random seed alone. This problem affects the trial-and-error process of designing a network: because random seeds have a large impact, it is hard to evaluate a network design without trying many different random starting weights. This work aims to reduce the bias imposed by the initial weights so that a network converges more consistently. We propose to evaluate and replace specific convolutional filters that have little impact on the prediction. We use the gradient norm to evaluate the impact of a filter on the error, and re-initialize a filter when the gradient norm of its weights falls below a specific threshold. This consistently improves accuracy on the 28×28 Inception-V3, with a median increase of +3.3%. In effect, our method, RandomOut, increases the number of filters explored without increasing the size of the network. We observe that the RandomOut method has more consistent generalization performance, with a standard deviation of 1.3% instead of 2% when varying random seeds, and does so faster and with fewer parameters.
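The re-initialization rule described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`filter_gradient_norms`, `randomout_step`), the use of an L2 norm, the Gaussian re-initialization with `init_scale`, and the threshold value are all assumptions chosen for illustration. The abstract only states that a filter is re-initialized when the gradient norm of its weights falls below a threshold.

```python
import numpy as np


def filter_gradient_norms(grad):
    """Return one L2 norm per filter.

    grad has shape (num_filters, in_channels, kernel_h, kernel_w).
    Using the L2 norm is an assumption; the abstract only says "gradient norm".
    """
    flat = grad.reshape(grad.shape[0], -1)
    return np.sqrt((flat ** 2).sum(axis=1))


def randomout_step(weights, grad, threshold, rng, init_scale=0.05):
    """Re-initialize every filter whose gradient norm is below `threshold`.

    Returns the updated weights and a boolean mask of replaced filters.
    The Gaussian re-initialization and `init_scale` are hypothetical choices.
    """
    norms = filter_gradient_norms(grad)
    replaced = norms < threshold
    new_weights = weights.copy()
    n_replaced = int(replaced.sum())
    if n_replaced:
        new_shape = (n_replaced,) + weights.shape[1:]
        new_weights[replaced] = rng.normal(0.0, init_scale, size=new_shape)
    return new_weights, replaced


# Toy example: 4 filters of shape 3x3x3; filter 2 receives no gradient.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.05, size=(4, 3, 3, 3))
grad = rng.normal(0.0, 1.0, size=weights.shape)
grad[2] = 0.0

new_weights, replaced = randomout_step(weights, grad, threshold=1e-3, rng=rng)
print(replaced)
```

In this toy example only the filter with a zero gradient is replaced, and the layer keeps the same number of filters, which is how the method can explore more filters without growing the network.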

Cohen, Joseph Paul, et al. “RandomOut: Using a Convolutional Gradient Norm to Rescue Convolutional Filters.” International Conference on Learning Representations Workshop, 2016, http://arxiv.org/abs/1602.05931.

Reference

[arXiv:1602.05931] [ICLRW2016 Workshop]
