VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text
Tianyu Zhang
Suyuchen Wang
Lu Li
Ge Zhang
Perouz Taslakian
Sai Rajeswar
Jie Fu
We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially obscured … (voir plus)texts using pixel-level hints within images through complex reasoning. This task stems from the observation that text embedded in images intrinsically differs from common visual elements and text due to the need to align the modalities of vision, text, and text embedded in images. While many works incorporate text into images for visual question answering, they mostly rely on OCR or masked language modeling, reducing the task to text-based processing. However, text-based processing becomes ineffective in VCR as accurate text restoration depends on the combined information from provided images, context, and subtle cues from the tiny, exposed areas of masked texts. We develop a pipeline to generate synthetic images for the VCR task using image-caption pairs, with adjustable caption visibility to control the task difficulty. With this pipeline, we construct VCR-WIKI for VCR using Wikipedia images with captions, including 2.11M English and 346K Chinese training entities, plus 5K validation and 5K test entities in both languages, each in easy and hard configurations. We also make a hidden test set, VCR-HIDDEN, to avoid potential overfitting on VCR-WIKI. Our results reveal that current vision-language models significantly lag behind human performance in the VCR task, and merely fine-tuning the models on our dataset does not lead to notable improvements. We release VCR-WIKI and the data construction code to facilitate future research.
What Secrets Do Your Manifolds Hold? Understanding the Local Geometry of Generative Models
Ahmed Imtiaz Humayun
Ibtihel Amara
Cristina Nader Vasconcelos
Candice Schumann
Deepak Ramachandran
Junfeng He
Mohammad Havaei
Katherine A Heller
What Secrets Do Your Manifolds Hold? Understanding the Local Geometry of Generative Models
Ahmed Imtiaz Humayun
Ibtihel Amara
Cristina Nader Vasconcelos
Deepak Ramachandran
Candice Schumann
Junfeng He
Katherine A Heller
Mohammad Havaei
Deep Generative Models are frequently used to learn continuous representations of complex data distributions using a finite number of sample… (voir plus)s. For any generative model, including pre-trained foundation models with GAN, Transformer or Diffusion architectures, generation performance can vary significantly based on which part of the learned data manifold is sampled. In this paper we study the post-training local geometry of the learned manifold and its relationship to generation outcomes for models ranging from toy settings to the latent decoder of the near state-of-the-art Stable Diffusion 1.4 Text-to-Image model. Building on the theory of continuous piecewise-linear (CPWL) generators, we characterize the local geometry in terms of three geometric descriptors - scaling (
The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws
Tian Jin
Ahmed Imtiaz Humayun
Utku Evci
Suvinay Subramanian
Amir Yazdanbakhsh
Dan Alistarh
Pruning eliminates unnecessary parameters in neural networks; it offers a promising solution to the growing computational demands of large l… (voir plus)anguage models (LLMs). While many focus on post-training pruning, sparse pre-training--which combines pruning and pre-training into a single phase--provides a simpler alternative. In this work, we present the first systematic exploration of optimal sparse pre-training configurations for LLMs through an examination of 80 unique pruning schedules across different sparsity levels and training durations. We find that initiating pruning at 25% of total training compute and concluding at 75% achieves near-optimal final evaluation loss. These findings provide valuable insights for efficient and effective sparse pre-training of LLMs. Furthermore, we propose a new scaling law that modifies the Chinchilla scaling law to use the average parameter count over pre-training. Through empirical and theoretical validation, we demonstrate that this modified scaling law accurately models evaluation loss for both sparsely and densely pre-trained LLMs, unifying scaling laws across pre-training paradigms. Our findings indicate that while sparse pre-training achieves the same final model quality as dense pre-training for equivalent compute budgets, it provides substantial benefits through reduced model size, enabling significant potential computational savings during inference.
Artificial Neural Networks for Magnetoencephalography: A review of an emerging field
Arthur Dehgan
Hamza Abdelhedi
Vanessa Hadid
Magnetoencephalography (MEG) is a cutting-edge neuroimaging technique that measures the intricate brain dynamics underlying cognitive proces… (voir plus)ses with an unparalleled combination of high temporal and spatial precision. MEG data analytics has always relied on advanced signal processing and mathematical and statistical tools for various tasks ranging from data cleaning to probing the signals' rich dynamics and estimating the neural sources underlying the surface-level recordings. Like in most domains, the surge in Artificial Intelligence (AI) has led to the increased use of Machine Learning (ML) methods for MEG data classification. More recently, an emerging trend in this field is using Artificial Neural Networks (ANNs) to address many MEG-related tasks. This review provides a comprehensive overview of how ANNs are being used with MEG data from three vantage points: First, we review work that employs ANNs for MEG signal classification, i.e., for brain decoding. Second, we report on work that has used ANNs as putative models of information processing in the human brain. Finally, we examine studies that use ANNs as techniques to tackle methodological questions in MEG, including artifact correction and source estimation. Furthermore, we assess the current strengths and limitations of using ANNs with MEG and discuss future challenges and opportunities in this field. Finally, by establishing a detailed portrait of the field and providing practical recommendations for the future, this review seeks to provide a helpful reference for both seasoned MEG researchers and newcomers to the field who are interested in using ANNs to enhance the exploration of the complex dynamics of the human brain with MEG.
Artificial Neural Networks for Magnetoencephalography: A review of an emerging field
Arthur Dehgan
Hamza Abdelhedi
Vanessa Hadid
Magnetoencephalography (MEG) is a cutting-edge neuroimaging technique that measures the intricate brain dynamics underlying cognitive proces… (voir plus)ses with an unparalleled combination of high temporal and spatial precision. MEG data analytics has always relied on advanced signal processing and mathematical and statistical tools for various tasks ranging from data cleaning to probing the signals' rich dynamics and estimating the neural sources underlying the surface-level recordings. Like in most domains, the surge in Artificial Intelligence (AI) has led to the increased use of Machine Learning (ML) methods for MEG data classification. More recently, an emerging trend in this field is using Artificial Neural Networks (ANNs) to address many MEG-related tasks. This review provides a comprehensive overview of how ANNs are being used with MEG data from three vantage points: First, we review work that employs ANNs for MEG signal classification, i.e., for brain decoding. Second, we report on work that has used ANNs as putative models of information processing in the human brain. Finally, we examine studies that use ANNs as techniques to tackle methodological questions in MEG, including artifact correction and source estimation. Furthermore, we assess the current strengths and limitations of using ANNs with MEG and discuss future challenges and opportunities in this field. Finally, by establishing a detailed portrait of the field and providing practical recommendations for the future, this review seeks to provide a helpful reference for both seasoned MEG researchers and newcomers to the field who are interested in using ANNs to enhance the exploration of the complex dynamics of the human brain with MEG.
Can Safety Fine-Tuning Be More Principled? Lessons Learned from Cybersecurity
David Williams-King
Linh Le
Adam Oberman
As LLMs develop increasingly advanced capabilities, there is an increased need to minimize the harm that could be caused to society by certa… (voir plus)in model outputs; hence, most LLMs have safety guardrails added, for example via fine-tuning. In this paper, we argue the position that current safety fine-tuning is very similar to a traditional cat-and-mouse game (or arms race) between attackers and defenders in cybersecurity. Model jailbreaks and attacks are patched with bandaids to target the specific attack mechanism, but many similar attack vectors might remain. When defenders are not proactively coming up with principled mechanisms, it becomes very easy for attackers to sidestep any new defenses. We show how current defenses are insufficient to prevent new adversarial jailbreak attacks, reward hacking, and loss of control problems. In order to learn from past mistakes in cybersecurity, we draw analogies with historical examples and develop lessons learned that can be applied to LLM safety. These arguments support the need for new and more principled approaches to designing safe models, which are architected for security from the beginning. We describe several such approaches from the AI literature.
Supervised Large Neighbourhood Search for MIPs
Charly Robinson La Rocca
Jean-François Cordeau
Large Neighbourhood Search (LNS) is a powerful heuristic framework for solving Mixed-Integer Programming (MIP) problems. However, designing … (voir plus)effective variable selection strategies in LNS remains challenging, especially for diverse sets of problems. In this paper, we propose an approach that integrates Machine Learning (ML) within the destroy operator of LNS for MIPs with a focus on minimal offline training. We implement a modular LNS matheuristic as a test bench to compare different LNS heuristics, including our ML-enhanced LNS. Experimental results on the MIPLIB 2017 dataset demonstrate that the matheuristic can significantly improve the performance of state-of-the-art solvers like Gurobi and SCIP. We conduct analyses on noisy oracles to explore the impact of prediction accuracy on solution quality. Additionally, we develop techniques to enhance the ML model through loss adjustments and sampling routines. Our findings suggest that while random LNS remains competitive, our Supervised LNS (SLNS) outperforms other baselines and helps set the foundation for future research on ML for LNS methods that are both efficient and general.
Supervised Large Neighbourhood Search for MIPs
Charly Robinson La Rocca
Jean-François Cordeau
Large Neighbourhood Search (LNS) is a powerful heuristic framework for solving Mixed-Integer Programming (MIP) problems. However, designing … (voir plus)effective variable selection strategies in LNS remains challenging, especially for diverse sets of problems. In this paper, we propose an approach that integrates Machine Learning (ML) within the destroy operator of LNS for MIPs with a focus on minimal offline training. We implement a modular LNS matheuristic as a test bench to compare different LNS heuristics, including our ML-enhanced LNS. Experimental results on the MIPLIB 2017 dataset demonstrate that the matheuristic can significantly improve the performance of state-of-the-art solvers like Gurobi and SCIP. We conduct analyses on noisy oracles to explore the impact of prediction accuracy on solution quality. Additionally, we develop techniques to enhance the ML model through loss adjustments and sampling routines. Our findings suggest that while random LNS remains competitive, our Supervised LNS (SLNS) outperforms other baselines and helps set the foundation for future research on ML for LNS methods that are both efficient and general.
Multi-center benchmarking of cervical spinal cord RF coils for 7 T MRI: A traveling spines study
Eva Alonso‐Ortiz
Daniel Papp
Robert L. Barry
Kyota Poëti
Alan C. Seifert
Kyle M. Gilbert
Nibardo Lopez‐Rios
Jan Paska
Falk Eippert
Nikolaus Weiskopf
Laura Beghini
Nadine Graedel
Robert Trampel
Martina F Callaghan
Christoph S. Aigner
Patrick Freund
Maryam Seif
Aurélien Destruel
Virginie Callot
Johanna Vannesjo … (voir 1 de plus)
Multi-center benchmarking of cervical spinal cord RF coils for 7 T MRI: A traveling spines study
Eva Alonso‐Ortiz
Daniel Papp
Robert L. Barry
Kyota Poëti
Alan C. Seifert
Kyle M. Gilbert
Nibardo Lopez‐Rios
Jan Paska
Falk Eippert
N. Weiskopf
Laura Beghini
Nadine Graedel
Robert Trampel
M. F. Callaghan
Christoph S Aigner
Patrick Freund
Maryam Seif
A. Destruel
Virginie Callot
Johanna Vannesjo … (voir 1 de plus)
Purpose The depth within the body, small diameter, long length, and varying tissue surrounding the spinal cord impose specific consideration… (voir plus)s when designing radiofrequency coils. The optimal coil configuration for 7 T cervical spinal cord MRI is unknown and, currently, there are very few coil options. The purpose of this work was (1) to establish a quality control protocol for evaluating 7 T cervical spinal cord coils and (2) to use that protocol to evaluate the performance of 4 different coil designs. Methods Three healthy volunteers and a custom anthropomorphic phantom (the traveling spines cohort) were scanned at seven 7 T imaging centers using a common protocol and each center’s specific cervical spinal cord coil. Four different coil designs were tested (two in-house, one Rapid Biomedical, and one MRI.TOOLS design). Results The Rapid Biomedical coil was found to have the highest B1+ efficiency, whereas one of the in-house designs (NeuroPoly Lab) had the highest SNR and the largest spinal cord coverage. The MRI.TOOLS coil had the most uniform B1+ profile along the cervical spinal cord; however, it was limited in its ability to provide the requested flip angles (especially for larger individuals). The latter was also the case for the second in-house coil (MSSM). Conclusion The results of this study serve as a guide for the spinal cord MRI community in selecting the most suitable coil based on specific requirements and offer a standardized protocol for assessing future coils.
Maximizing Data and Hardware Reuse for HLS with Early-Stage Symbolic Partitioning
Tzung-Han Juang