Développez des compétences fondamentales en intelligence artificielle (IA) responsable grâce à des cours autodirigés, animés par des expert·e·s de Mila reconnu·e·s à l’échelle internationale.
Le Fellowship Mila en politiques de l'IA transforme l'expertise approfondie en IA en politiques rigoureuses d'intérêt public. Découvrez la dernière publication Combler la disparité en matière d’expertise : mécanismes de transfert des connaissances pour la réglementation de l’IA par Moritz von Knebel.
Ce programme soutient les startups spécialisées en IA à tout moment de l'année. Bénéficiez de ressources de pointe et d'un accompagnement sur mesure pour accélérer le développement de votre technologie.
Nous utilisons des témoins pour analyser le trafic et l’utilisation de notre site web, afin de personnaliser votre expérience. Vous pouvez désactiver ces technologies à tout moment, mais cela peut restreindre certaines fonctionnalités du site. Consultez notre Politique de protection de la vie privée pour en savoir plus.
Paramètre des cookies
Vous pouvez activer et désactiver les types de cookies que vous souhaitez accepter. Cependant certains choix que vous ferez pourraient affecter les services proposés sur nos sites (ex : suggestions, annonces personnalisées, etc.).
Cookies essentiels
Ces cookies sont nécessaires au fonctionnement du site et ne peuvent être désactivés. (Toujours actif)
Cookies analyse
Acceptez-vous l'utilisation de cookies pour mesurer l'audience de nos sites ?
Lecteur Multimédia
Acceptez-vous l'utilisation de cookies pour afficher et vous permettre de regarder les contenus vidéo hébergés par nos partenaires (YouTube, etc.) ?
Language models transfer to time-series forecasting, but it is unclear whether this reflects reusable internal structure or rapid relearning… (voir plus) under a familiar architecture.
We study this transfer directly by comparing pretrained and randomly initialized versions of the same model on a forecasting objective whose inputs have little semantic overlap with text but still require autoregressive sequential structure.
Across Qwen3-0.6B finetuning experiments, language initialization gives coherent per-example gradients from the first update, while random initialization first passes through a low-alignment warmup phase.
Effective-rank and hidden-state analyses show that finetuning selectively reshapes an existing representation geometry rather than constructing the simpler temporal geometry found by models trained from scratch.
Cross-domain sparse features and causal ablations then expose candidate transferred primitives, including a Layer~1 head--MLP circuit whose ablation selectively increases loss on periodic forecasting and repetitive language passages.
These results support an account of cross-modal transfer in which autoregressive pretraining creates temporal feature geometry that can be selected and specialized outside language.
Predicting how a sequence will continue is a basic problem for intelligent systems. We show that large language models contain usable
foreca… (voir plus)sting structure before any explicit time-series supervision. A
single linear readout from frozen Qwen3-0.6B hidden states maps ordinary text
sequences to numerical trajectories that resemble real time series, and those
trajectories can be used for straightforward forecasts. The distribution over output tokens also gives coherent, non-crossing probabilistic forecasts in a single forward pass. After time-series
specialization, pretrained models show aligned gradients and improve
immediately, whereas randomly initialized models spend early training in a
destructive-interference regime. These findings suggest that auto-regressive
pretraining already shapes representations around temporal continuation; and
finetuning adapts that structure to numerical forecasting rather than
creating it from scratch.
Can language-pretrained transformers become effective time-series forecasters, and why? In this paper, we show that cross-modal transfer ari… (voir plus)ses because language pretraining preconditions time series training with a reusable manifold. A linear probe on frozen LLM states decodes realistic time-series trajectories without paired supervision, and retrieval in this projected space yields competitive forecasts, showing that structure and dynamics exist before finetuning. Pretrained initialization also improves optimization, producing coherent gradients and a highly anisotropic loss landscape unlike random initialization. Finetuning then acts as low-dimensional alignment, reusing existing directions rather than learning temporal primitives from scratch, as evidenced by low-rank updates, subspace alignment, and shared features for periodicity, trend, and repetition. Together, these results support a geometric account of LLM-to-time-series transfer: language pretraining builds the manifold, and finetuning projects numerical dynamics onto task-relevant directions.
The proliferation of Vision-Language Models (VLMs) in the past several years calls for rigorous and comprehensive evaluation methods and ben… (voir plus)chmarks. This work analyzes existing VLM evaluation techniques, including automated metrics, AI-based assessments, and human evaluations across diverse tasks. We first introduce Robin - a novel suite of VLMs that we built by combining Large Language Models (LLMs) and Vision Encoders (VEs) at multiple scales, and use Robin to identify shortcomings of current evaluation approaches across scales. Next, to overcome the identified limitations, we introduce CHIRP - a new long form response benchmark we developed for more robust and complete VLM evaluation. We provide open access to the Robin training code, model suite, and CHIRP benchmark to promote reproducibility and advance VLM research.
Despite significant progress, Vision-Language Models (VLMs) still struggle with hallucinations, especially in long-form responses. Existing … (voir plus)strategies have had limited successes in specific cases, and long-form generation remains problematic.
In this work we attempt to establish the link between the data used to train the model and the hallucinations in the model's output.
To this end, we examine hallucinations through data corruption. We develop a method to corrupt training data and then train models with this data to see the effect on performance. We will show that corrupting only a small portion of the long-form training data significantly impairs the performance of the model on long-form tasks, while leaving simpler tasks like visual question-answering and multiple choice relatively intact. All training code and models are released for reproducibility and future research.
Vision-Language Models (VLMs) have witnessed a surge in both research and real-world applications. However, as they becoming increasingly pr… (voir plus)evalent, ensuring their robustness against adversarial attacks is paramount. This work systematically investigates the impact of model design choices on the adversarial robustness of VLMs against image-based attacks. Additionally, we introduce novel, cost-effective approaches to enhance robustness through prompt formatting. By rephrasing questions and suggesting potential adversarial perturbations, we demonstrate substantial improvements in model robustness against strong image-based attacks such as Auto-PGD. Our findings provide important guidelines for developing more robust VLMs, particularly for deployment in safety-critical environments.