What Secrets Do Your Manifolds Hold? Understanding the Local Geometry of Generative Models
Ahmed Imtiaz Humayun
Ibtihel Amara
Cristina Nader Vasconcelos
Deepak Ramachandran
Candice Schumann
Junfeng He
Katherine A Heller
Mohammad Havaei
Deep Generative Models are frequently used to learn continuous representations of complex data distributions using a finite number of samples. For any generative model, including pre-trained foundation models with GAN, Transformer, or Diffusion architectures, generation performance can vary significantly based on which part of the learned data manifold is sampled. In this paper we study the post-training local geometry of the learned manifold and its relationship to generation outcomes for models ranging from toy settings to the latent decoder of the near-state-of-the-art Stable Diffusion 1.4 text-to-image model. Building on the theory of continuous piecewise-linear (CPWL) generators, we characterize the local geometry in terms of three geometric descriptors: scaling (…)
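To make the descriptors concrete, below is a minimal sketch of how the local geometry of a piecewise-linear generator can be probed through its Jacobian at a latent point, assuming PyTorch and a generator G that maps a 1-D latent vector to a flattened output vector; the quantities computed are common notions of local scaling and rank, not necessarily the paper's exact definitions.

```python
# Hedged sketch: probe the local geometry of a generator G at latent z.
# Assumes G: R^d -> R^n takes and returns 1-D tensors (PyTorch).
import torch

def local_geometry(G, z, eps=1e-6):
    # Jacobian of the generator at z; for a CPWL generator this is the
    # slope of the affine piece containing z. Shape: (out_dim, latent_dim).
    J = torch.autograd.functional.jacobian(G, z)
    s = torch.linalg.svdvals(J)              # singular values of J
    scaling = s.clamp_min(eps).log().sum()   # log local volume change
    rank = int((s > eps * s.max()).sum())    # numerical rank of the local chart
    return scaling.item(), rank
```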
The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws
Tian Jin
Ahmed Imtiaz Humayun
Utku Evci
Suvinay Subramanian
Amir Yazdanbakhsh
Dan Alistarh
Pruning eliminates unnecessary parameters in neural networks; it offers a promising solution to the growing computational demands of large language models (LLMs). While much prior work focuses on post-training pruning, sparse pre-training, which combines pruning and pre-training into a single phase, provides a simpler alternative. In this work, we present the first systematic exploration of optimal sparse pre-training configurations for LLMs through an examination of 80 unique pruning schedules across different sparsity levels and training durations. We find that initiating pruning at 25% of total training compute and concluding at 75% achieves near-optimal final evaluation loss. These findings provide valuable insights for efficient and effective sparse pre-training of LLMs. Furthermore, we propose a new scaling law that modifies the Chinchilla scaling law to use the average parameter count over pre-training. Through empirical and theoretical validation, we demonstrate that this modified scaling law accurately models evaluation loss for both sparsely and densely pre-trained LLMs, unifying scaling laws across pre-training paradigms. Our findings indicate that while sparse pre-training achieves the same final model quality as dense pre-training for equivalent compute budgets, it provides substantial benefits through reduced model size, enabling significant potential computational savings during inference.
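As a hedged illustration of the modified scaling law, the sketch below replaces the parameter count in a Chinchilla-style loss with its average over a pruning schedule; the schedule shape and all constants are placeholders, not the paper's fitted values.

```python
# Illustrative sketch only; constants are placeholders, not fitted values.
def average_param_count(n_dense, sparsity, start=0.25, end=0.75):
    """Average parameter count for a schedule that stays dense until `start`
    of training, ramps linearly to the target sparsity by `end`, then stays
    sparse (mirroring the near-optimal 25%-75% window described above)."""
    n_sparse = n_dense * (1.0 - sparsity)
    return (start * n_dense                               # dense phase
            + (end - start) * 0.5 * (n_dense + n_sparse)  # linear ramp
            + (1.0 - end) * n_sparse)                     # sparse phase

def modified_chinchilla(n_avg, tokens, E=1.69, A=406.4, B=410.7, a=0.34, b=0.28):
    """Chinchilla-style loss with N replaced by the average parameter count."""
    return E + A / n_avg**a + B / tokens**b
```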
Artificial Neural Networks for Magnetoencephalography: A review of an emerging field
Arthur Dehgan
Hamza Abdelhedi
Vanessa Hadid
Magnetoencephalography (MEG) is a cutting-edge neuroimaging technique that measures the intricate brain dynamics underlying cognitive processes with an unparalleled combination of high temporal and spatial precision. MEG data analytics has always relied on advanced signal processing and mathematical and statistical tools for various tasks ranging from data cleaning to probing the signals' rich dynamics and estimating the neural sources underlying the surface-level recordings. Like in most domains, the surge in Artificial Intelligence (AI) has led to the increased use of Machine Learning (ML) methods for MEG data classification. More recently, an emerging trend in this field is using Artificial Neural Networks (ANNs) to address many MEG-related tasks. This review provides a comprehensive overview of how ANNs are being used with MEG data from three vantage points: First, we review work that employs ANNs for MEG signal classification, i.e., for brain decoding. Second, we report on work that has used ANNs as putative models of information processing in the human brain. Third, we examine studies that use ANNs as techniques to tackle methodological questions in MEG, including artifact correction and source estimation. We then assess the current strengths and limitations of using ANNs with MEG and discuss future challenges and opportunities in this field. Finally, by establishing a detailed portrait of the field and providing practical recommendations for the future, this review seeks to provide a helpful reference for both seasoned MEG researchers and newcomers interested in using ANNs to enhance the exploration of the complex dynamics of the human brain with MEG.
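For readers new to the first vantage point (brain decoding), the sketch below shows what a minimal ANN decoder for MEG epochs might look like, assuming PyTorch and epochs shaped (channels, time); it is a generic illustration with placeholder sizes, not a model from the review.

```python
# Generic illustration of an MEG epoch decoder; sizes are placeholders.
import torch.nn as nn

class MEGDecoder(nn.Module):
    def __init__(self, n_channels=272, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3),  # temporal filters
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # pool over the time axis (any epoch length)
            nn.Flatten(),
            nn.Linear(32, n_classes),  # class logits per epoch
        )

    def forward(self, x):  # x: (batch, n_channels, n_times)
        return self.net(x)
```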
Can Safety Fine-Tuning Be More Principled? Lessons Learned from Cybersecurity
David Williams-King
Linh Le
Adam Oberman
As LLMs develop increasingly advanced capabilities, there is a growing need to minimize the harm that certain model outputs could cause to society; hence, most LLMs have safety guardrails added, for example via fine-tuning. In this paper, we argue that current safety fine-tuning closely resembles the traditional cat-and-mouse game (or arms race) between attackers and defenders in cybersecurity. Model jailbreaks and attacks are patched with band-aid fixes that target the specific attack mechanism, but many similar attack vectors remain. When defenders do not proactively develop principled mechanisms, it becomes easy for attackers to sidestep any new defenses. We show how current defenses are insufficient to prevent new adversarial jailbreak attacks, reward hacking, and loss-of-control problems. To learn from past mistakes in cybersecurity, we draw analogies with historical examples and distill lessons learned that can be applied to LLM safety. These arguments support the need for new, more principled approaches to designing safe models that are architected for security from the beginning. We describe several such approaches from the AI literature.
Supervised Large Neighbourhood Search for MIPs
Charly Robinson La Rocca
Jean-François Cordeau
Large Neighbourhood Search (LNS) is a powerful heuristic framework for solving Mixed-Integer Programming (MIP) problems. However, designing effective variable selection strategies in LNS remains challenging, especially for diverse sets of problems. In this paper, we propose an approach that integrates Machine Learning (ML) within the destroy operator of LNS for MIPs with a focus on minimal offline training. We implement a modular LNS matheuristic as a test bench to compare different LNS heuristics, including our ML-enhanced LNS. Experimental results on the MIPLIB 2017 dataset demonstrate that the matheuristic can significantly improve the performance of state-of-the-art solvers like Gurobi and SCIP. We conduct analyses on noisy oracles to explore the impact of prediction accuracy on solution quality. Additionally, we develop techniques to enhance the ML model through loss adjustments and sampling routines. Our findings suggest that while random LNS remains competitive, our Supervised LNS (SLNS) outperforms other baselines and helps set the foundation for future research on ML for LNS methods that are both efficient and general.
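A hedged sketch of the destroy/repair loop with a learned variable scorer is shown below; `score_vars` (standing in for the trained ML model) and `solve_submip` (standing in for a solver call with some variables fixed) are hypothetical callables, not the authors' implementation or a real solver API.

```python
# Hedged sketch of ML-guided LNS; `score_vars` and `solve_submip` are
# hypothetical callables supplied by the user.
def supervised_lns(int_vars, incumbent, score_vars, solve_submip,
                   destroy_frac=0.2, iterations=50):
    best_sol, best_obj = incumbent
    for _ in range(iterations):
        # Destroy: free the variables the model deems most likely to
        # change in an improved solution.
        scores = score_vars(best_sol)
        k = max(1, int(destroy_frac * len(int_vars)))
        freed = set(sorted(int_vars, key=lambda v: -scores[v])[:k])
        fixed = {v: best_sol[v] for v in int_vars if v not in freed}
        # Repair: re-optimize the sub-MIP with the remaining variables fixed.
        sol, obj = solve_submip(fixed)
        if obj < best_obj:  # minimization
            best_sol, best_obj = sol, obj
    return best_sol, best_obj
```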
Multi-center benchmarking of cervical spinal cord RF coils for 7 T MRI: A traveling spines study
Eva Alonso‐Ortiz
Daniel Papp
Robert L. Barry
Kyota Poëti
Alan C. Seifert
Kyle M. Gilbert
Nibardo Lopez‐Rios
Jan Paska
Falk Eippert
Nikolaus Weiskopf
Laura Beghini
Nadine Graedel
Robert Trampel
Martina F Callaghan
Christoph S. Aigner
Patrick Freund
Maryam Seif
Aurélien Destruel
Virginie Callot
Johanna Vannesjo …
Purpose: The depth within the body, small diameter, long length, and varying tissue surrounding the spinal cord impose specific considerations when designing radiofrequency coils. The optimal coil configuration for 7 T cervical spinal cord MRI is unknown and, currently, there are very few coil options. The purpose of this work was (1) to establish a quality control protocol for evaluating 7 T cervical spinal cord coils and (2) to use that protocol to evaluate the performance of 4 different coil designs.
Methods: Three healthy volunteers and a custom anthropomorphic phantom (the traveling spines cohort) were scanned at seven 7 T imaging centers using a common protocol and each center's specific cervical spinal cord coil. Four different coil designs were tested (two in-house, one Rapid Biomedical, and one MRI.TOOLS design).
Results: The Rapid Biomedical coil was found to have the highest B1+ efficiency, whereas one of the in-house designs (NeuroPoly Lab) had the highest SNR and the largest spinal cord coverage. The MRI.TOOLS coil had the most uniform B1+ profile along the cervical spinal cord; however, it was limited in its ability to provide the requested flip angles (especially for larger individuals). The latter was also the case for the second in-house coil (MSSM).
Conclusion: The results of this study serve as a guide for the spinal cord MRI community in selecting the most suitable coil based on specific requirements and offer a standardized protocol for assessing future coils.
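As one concrete piece of such a quality control protocol, SNR is commonly estimated from a signal region and a noise-only background region, as in the hedged sketch below; this is a standard ROI-based estimate, not necessarily the study's exact procedure.

```python
# Standard ROI-based SNR estimate for coil QC; not the study's exact protocol.
import numpy as np

def roi_snr(image, signal_mask, noise_mask):
    """SNR = mean signal in a cord ROI / std of a noise-only background ROI.
    `image` is a 2-D magnitude image; masks are boolean arrays of the same shape."""
    return image[signal_mask].mean() / image[noise_mask].std()
```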
Maximizing Data and Hardware Reuse for HLS with Early-Stage Symbolic Partitioning
Tzung-Han Juang
While traditional HLS (High-Level Synthesis) converts “high-level” C-like programs into hardware automatically, producing high-performance designs still requires hardware expertise. Optimizations such as data partitioning can have a large impact on performance, since they directly affect data reuse patterns and the ability to reuse hardware. However, optimizing partitioning is a difficult process, since minor changes in the parameter choices can lead to totally unpredictable performance. Functional array-based languages have been proposed instead of C-based approaches, as they offer stronger performance guarantees. This paper proposes to follow a similar approach and exposes a divide-and-conquer primitive at the algorithmic level to let users partition any arbitrary computation. The compiler is then free to explore different partition shapes to maximize both data and hardware reuse automatically. The main challenge is that the impact of partitioning is only known much later in the compilation flow, due to the hard-to-predict effects of the many optimizations applied during compilation. To solve this problem, the partitioning is expressed using a set of symbolic tunable parameters introduced early in the compilation pipeline. A symbolic performance model is then used in the last compilation stage to predict performance based on the possible values of the tunable parameters. Using this approach, a design space exploration is conducted on an Intel Arria 10 FPGA (Field Programmable Gate Array), and competitive performance is achieved on the classical VGG and TinyYolo neural networks.
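To illustrate the idea of symbolic tunable parameters evaluated by a late-stage cost model, here is a hedged toy sketch in Python with SymPy; the cost expression and tile-size candidates are made-up stand-ins for the compiler's actual symbolic performance model.

```python
# Toy stand-in for a symbolic performance model over partition parameters.
import sympy as sp

Tm, Tn = sp.symbols("Tm Tn", positive=True, integer=True)  # symbolic tile sizes
M, N = 224, 224                                            # illustrative problem size

# Made-up cost: number of tiles times per-tile latency, where larger tiles
# amortize a fixed overhead but must fit an on-chip buffer budget.
tiles = sp.ceiling(M / Tm) * sp.ceiling(N / Tn)
cost = tiles * (Tm * Tn + 100)

def feasible(tm, tn, mem_budget=4096):
    return tm * tn <= mem_budget  # on-chip memory constraint

# Late-stage design space exploration: evaluate the symbolic model on
# concrete candidate values of the tunable parameters.
best = min(
    ((tm, tn) for tm in (8, 16, 32, 64) for tn in (8, 16, 32, 64)
     if feasible(tm, tn)),
    key=lambda p: cost.subs({Tm: p[0], Tn: p[1]}),
)
print("best tile shape:", best)
```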