Humans and animals learn much better when the examples are not randomly presented but organized in a meaningful order which illus- trates gradually more concepts, and gradu- ally more complex ones. Here, we formal- ize such training strategies in the context of machine learning, and call them “curricu- lum learning”. In the context of recent re- search studying the difficulty of training in the presence of non-convex training criteria (for deep deterministic and stochastic neu- ral networks), we explore curriculum learn- ing in various set-ups. The experiments show that significant improvements in generaliza- tion can be achieved. We hypothesize that curriculum learning has both an effect on the speed of convergence of the training process to a minimum and, in the case of non-convex criteria, on the quality of the local minima obtained: curriculum learning can be seen as a particular form of continuation method (a general strategy for global optimization of non-convex functions).