Join us on November 19 for the third edition of Mila’s science popularization contest, where students will present their complex research in just three minutes before a jury.
We use cookies to analyze the browsing and usage of our website and to personalize your experience. You can disable these technologies at any time, but this may limit certain functionalities of the site. Read our Privacy Policy for more information.
Setting cookies
You can enable and disable the types of cookies you wish to accept. However certain choices you make could affect the services offered on our sites (e.g. suggestions, personalised ads, etc.).
Essential cookies
These cookies are necessary for the operation of the site and cannot be deactivated. (Still active)
Analytics cookies
Do you accept the use of cookies to measure the audience of our sites?
Multimedia Player
Do you accept the use of cookies to display and allow you to watch the video content hosted by our partners (YouTube, etc.)?
Neural network training is inherently sensitive to initialization and the randomness induced by stochastic gradient descent. However, it is … (see more)unclear to what extent such effects lead to meaningfully different networks, either in terms of the models' weights or the underlying functions that were learned. In this work, we show that during the initial "chaotic" phase of training, even extremely small perturbations reliably causes otherwise identical training trajectories to diverge-an effect that diminishes rapidly over training time. We quantify this divergence through (i)
2025-10-06
Proceedings of the 42nd International Conference on Machine Learning (published)
Neural network training is inherently sensitive to initialization and the randomness induced by stochastic gradient descent. However, it is … (see more)unclear to what extent such effects lead to meaningfully different networks, either in terms of the models’ weights or the underlying functions that were learned. In this work, we show that during the initial "chaotic" phase of training, even extremely small perturbations reliably causes otherwise identical training trajectories to diverge-an effect that diminishes rapidly over training time. We quantify this divergence through (i)
2025-10-06
Proceedings of the 42nd International Conference on Machine Learning (published)
Neural network training begins with a chaotic phase in which the network is sensitive to small perturbations, such as those caused by stocha… (see more)stic gradient descent (SGD). This sensitivity can cause identically initialized networks to diverge both in parameter space and functional similarity.
However, the exact degree to which networks are sensitive to perturbation, and the sensitivity of networks as they transition out of the chaotic phase, is unclear.
To address this uncertainty, we apply a controlled perturbation at a single point in training time and measure its effect on otherwise identical training trajectories.
We find that both the