Scaling laws for neural networks, in which the loss decays as a power law in the number of parameters, data, and compute, depend fundamentally on the spectral structure of the data covariance, with power-law eigenvalue decay appearing ubiquitously in vision and language tasks. A central question is whether this spectral structure is preserved or destroyed when data passes through the basic building block of a neural network: a random linear projection followed by a nonlinear activation. We study this question for the random feature model: given data