Publications

Structured Pruning of Neural Networks for Constraints Learning

Matteo Cacciola

Antonio Frangioni

Andrea Lodi

In recent years, the integration of Machine Learning (ML) models with Operations Research (OR) tools has gained popularity across diverse applications, including cancer treatment, algorithmic configuration, and chemical process optimization. In this domain, the combination of ML and OR often relies on representing the ML model output using Mixed Integer Programming (MIP) formulations. Numerous studies in the literature have developed such formulations for many ML predictors, with a particular emphasis on Artificial Neural Networks (ANNs) due to their relevance in many applications. However, ANNs frequently contain a large number of parameters, resulting in MIP formulations that are impractical to solve, thereby impeding scalability. In fact, the ML community has already introduced several techniques to reduce the parameter count of ANNs without compromising their performance, since the substantial size of modern ANNs presents challenges for ML applications: it significantly increases computational effort during training and requires substantial memory for storage. In this paper, we showcase the effectiveness of pruning, one of these techniques, when applied to ANNs prior to their integration into MIPs. By pruning the ANN, we achieve significant improvements in the speed of the solution process. We discuss why pruning is more suitable in this context than other ML compression techniques, and we identify the most appropriate pruning strategies. To highlight the potential of this approach, we conduct experiments using multi-layer feed-forward neural networks to construct adversarial examples. Our results demonstrate that pruning offers remarkable reductions in solution times without hindering the quality of the final decision, enabling the resolution of previously unsolvable instances.

2023-07-14

ArXiv (preprint)
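The abstract does not state which pruning criterion the authors use. As a minimal sketch, assuming a simple structured criterion (drop whole hidden neurons whose incoming weights have the smallest L1 norm), the snippet below illustrates why structured pruning shrinks the MIP: in a common big-M encoding of a ReLU network, each hidden neuron contributes one binary variable, so every removed neuron removes a binary and its constraints. The helper names `prune_neurons` and `big_m_binaries` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows = neurons of a layer, columns = that layer's inputs


def prune_neurons(
    w_in: Matrix, b_in: List[float], w_out: Matrix, keep_ratio: float
) -> Tuple[Matrix, List[float], Matrix]:
    """Structured pruning sketch: remove whole hidden neurons.

    Assumed criterion (illustrative only): rank neurons by the L1 norm of their
    incoming weights and keep the top `keep_ratio` fraction.
    """
    n = len(w_in)
    k = max(1, round(keep_ratio * n))
    scores = [sum(abs(x) for x in row) for row in w_in]
    # Indices of the k highest-scoring neurons, restored to original order.
    keep = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:k])
    new_w_in = [w_in[i] for i in keep]
    new_b_in = [b_in[i] for i in keep]
    # The next layer loses the matching input columns.
    new_w_out = [[row[i] for i in keep] for row in w_out]
    return new_w_in, new_b_in, new_w_out


def big_m_binaries(hidden_sizes: List[int]) -> int:
    """A common big-M MIP encoding of ReLU uses one binary per hidden neuron."""
    return sum(hidden_sizes)
```

For example, keeping half of a 4-neuron hidden layer halves the number of binaries in such an encoding (4 to 2). In practice a pruned network would typically be fine-tuned before being embedded in the MIP; that step is omitted here.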

The default network dominates neural responses to evolving movie stories

Enning Yang

Filip Milisav

Jakub Kopal

Avram J. Holmes

Georgios D. Mitsis

Bratislav Mišić

Emily S. Finn

Danilo Bzdok

2023-07-14

Nature Communications (published)

An 8‐channel Tx dipole and 20‐channel Rx loop coil array for MRI of the cervical spinal cord at 7 Tesla

Nibardo Lopez‐Rios

Kyle M. Gilbert

Daniel Papp

Gaspard Cereza

Alexandru Foias

Deshpande Rangaprakash

Markus W. May

Bastien Guerin

Lawrence L. Wald

Boris Keil

Jason P. Stockmann

Robert L. Barry

Julien Cohen-Adad

The quality of cervical spinal cord images can be improved by the use of tailored radiofrequency (RF) coil solutions for ultrahigh field imaging; however, very few commercial and research 7‐T RF coils currently exist for the spinal cord, particularly those with parallel transmission (pTx) capabilities. This work presents the design, testing, and validation of a pTx/Rx coil for the human neck and cervical/upper thoracic spinal cord. The pTx portion is composed of eight dipoles to ensure high homogeneity over this large region of the spinal cord. The Rx portion is made up of twenty semiadaptable overlapping loops to produce high signal‐to‐noise ratio (SNR) across the patient population. The coil housing is designed to facilitate patient positioning and comfort, while also being tight fitting to ensure high sensitivity. We demonstrate RF shimming capabilities to optimize B1+ uniformity, power efficiency, and/or specific absorption rate efficiency. B1+ homogeneity, SNR, and g‐factor were evaluated in adult volunteers and demonstrated excellent performance from the occipital lobe down to the T4‐T5 level. We compared the proposed coil with two state‐of‐the‐art head and head/neck coils, confirming its superiority in the cervical and upper thoracic regions of the spinal cord. This coil solution therefore provides a convincing platform for producing the high image quality necessary for clinical and research scanning of the upper spinal cord.

2023-07-13

NMR in Biomedicine (published)

Transformers in Reinforcement Learning: A Survey

Pranav Agarwal

Aamer Abdul Rahman

Pierre-Luc St-Charles

Simon J. D. Prince

Samira Ebrahimi Kahou

Transformers have significantly impacted domains like natural language processing, computer vision, and robotics, where they improve performance compared to other neural networks. This survey explores how transformers are used in reinforcement learning (RL), where they are seen as a promising solution for addressing challenges such as unstable training, credit assignment, lack of interpretability, and partial observability. We begin by providing a brief domain overview of RL, followed by a discussion of the challenges of classical RL algorithms. Next, we delve into the properties of the transformer and its variants and discuss the characteristics that make them well-suited to address the challenges inherent in RL. We examine the application of transformers to various aspects of RL, including representation learning, transition and reward function modeling, and policy optimization. We also discuss recent research that aims to enhance the interpretability and efficiency of transformers in RL, using visualization techniques and efficient training strategies. Often, the transformer architecture must be tailored to the specific needs of a given application. We present a broad overview of how transformers have been adapted for several applications, including robotics, medicine, language modeling, cloud computing, and combinatorial optimization. We conclude by discussing the limitations of using transformers in RL and assessing their potential for catalyzing future breakthroughs in this field.

2023-07-12

ArXiv (preprint)

AI For Global Climate Cooperation 2023 Competition Proceedings

Prateek Arun Gupta

Lu Li

Soham R. Phade

Sunil Srinivasa

Andrew Williams

Tianyu Zhang

Yangtian Zhang

Stephan Tao Zheng

The international community must collaborate to mitigate climate change and sustain economic growth. However, collaboration is hard to achieve, partly because no global authority can ensure compliance with international climate agreements. Combining AI with climate-economic simulations offers a promising solution to design international frameworks, including negotiation protocols and climate agreements, that promote and incentivize collaboration. In addition, these frameworks should also fulfill policy goals and sustain commitment, taking into account climate-economic dynamics and strategic behaviors. These challenges require an interdisciplinary approach across machine learning, economics, climate science, law, policy, ethics, and other fields. Towards this objective, we organized AI for Global Climate Cooperation, a Mila competition in which teams submitted proposals and analyses of international frameworks, based on (modifications of) RICE-N, an AI-driven integrated assessment model (IAM). In particular, RICE-N supports modeling regional decision-making using AI agents. Furthermore, the IAM then models the climate-economic impact of those decisions into the future. Whereas the first track focused only on performance metrics, the proposals submitted to the second track were evaluated both quantitatively and qualitatively. The quantitative evaluation focused on a combination of (i) the degree of mitigation of global temperature rise and (ii) the increase in economic productivity. On the other hand, an interdisciplinary panel of human experts in law, policy, sociology, economics and environmental science evaluated the solutions qualitatively. In particular, the panel considered the effectiveness, simplicity, feasibility, ethics, and notions of climate justice of the protocols. In the third track, the participants were asked to critique and improve RICE-N.

2023-07-10

ArXiv (preprint)

Generative Flow Networks: a Markov Chain Perspective

Tristan Deleu

2023-07-04

ArXiv (preprint)

CrossSplit: Mitigating Label Noise Memorization through Data Splitting

Jihye Kim

Aristide Baratin

Yan Zhang

Simon Lacoste-Julien

We approach the problem of improving the robustness of deep learning algorithms in the presence of label noise. Building upon existing label correction and co-teaching methods, we propose a novel training procedure to mitigate the memorization of noisy labels, called CrossSplit, which uses a pair of neural networks trained on two disjoint parts of the labeled dataset. CrossSplit combines two main ingredients: (i) Cross-split label correction. The idea is that, since the model trained on one part of the data cannot memorize example-label pairs from the other part, the training labels presented to each network can be smoothly adjusted by using the predictions of its peer network; (ii) Cross-split semi-supervised training. A network trained on one part of the data also uses the unlabeled inputs of the other part. Extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and mini-WebVision datasets demonstrate that our method can outperform the current state-of-the-art in a wide range of noise ratios. The project page is at https://rlawlgul.github.io/.

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (published)
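The abstract describes cross-split label correction only qualitatively. A minimal sketch, assuming the adjusted label is a convex mix of the given one-hot label and the peer network's softmax prediction with a hypothetical mixing weight `beta` (the paper's actual weighting and schedule are not stated here), could look like the following. `split_indices` mirrors the "two disjoint parts of the labeled dataset"; the semi-supervised ingredient is omitted, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def split_indices(n: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Partition example indices into two disjoint halves, one per network."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    half = n // 2
    return sorted(idx[:half]), sorted(idx[half:])


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def corrected_label(label: int, peer_logits: Sequence[float], beta: float) -> List[float]:
    """Blend a (possibly noisy) one-hot label with the peer network's prediction.

    The peer network never trained on this example, so its prediction cannot
    have memorized the noisy label. `beta` is an assumed mixing weight in [0, 1].
    """
    peer = softmax(peer_logits)
    one_hot = [1.0 if c == label else 0.0 for c in range(len(peer))]
    return [(1.0 - beta) * y + beta * p for y, p in zip(one_hot, peer)]
```

For instance, with `beta = 0.5` and an uninformative peer prediction over two classes, a label of class 0 becomes the soft target `[0.75, 0.25]`, so a single noisy label is softened rather than trusted outright.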

Discovering Object-Centric Generalized Value Functions From Pixels

Somjit Nath

Gopeshh Subbaraj

Khimya Khetarpal

Samira Ebrahimi Kahou

Deep Reinforcement Learning has shown significant progress in extracting useful representations from high-dimensional inputs, albeit using hand-crafted auxiliary tasks and pseudo rewards. Automatically learning such representations in an object-centric manner geared towards control and fast adaptation remains an open research problem. In this paper, we introduce a method that tries to discover meaningful features from objects, translating them to temporally coherent "question" functions and leveraging the subsequent learned general value functions for control. We compare our approach with state-of-the-art techniques alongside other ablations and show competitive performance in both stationary and non-stationary settings. Finally, we also investigate the discovered general value functions and through qualitative analysis show that the learned representations are not only interpretable but also centered around objects that are invariant to changes across tasks, facilitating fast adaptation.

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (published)

Discrete Key-Value Bottleneck

Frederik Träuble

Anirudh Goyal

Nasim Rahaman

Michael Curtis Mozer

Kenji Kawaguchi

Bernhard Schölkopf

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (published)

FAENet: Frame Averaging Equivariant GNN for Materials Modeling

Alexandre AGM Duval

Victor Schmidt

Alex Hernandez-Garcia

Santiago Miret

Fragkiskos D. Malliaros

David Rolnick

Applications of machine learning techniques for materials modeling typically involve functions known to be equivariant or invariant to specific symmetries. While graph neural networks (GNNs) have proven successful in such tasks, they enforce symmetries via the model architecture, which often reduces their expressivity, scalability and comprehensibility. In this paper, we introduce (1) a flexible framework relying on stochastic frame-averaging (SFA) to make any model E(3)-equivariant or invariant through data transformations, and (2) FAENet: a simple, fast and expressive GNN, optimized for SFA, that processes geometric information without any symmetry-preserving design constraints. We prove the validity of our method theoretically and empirically demonstrate its superior accuracy and computational scalability in materials modeling on the OC20 dataset (S2EF, IS2RE) as well as common molecular modeling tasks (QM9, QM7-X). A package implementation is available at https://faenet.readthedocs.io.

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (published)
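The abstract says SFA obtains symmetry through data transformations rather than architecture. As a hedged illustration, the sketch below uses PCA-based frames, the construction commonly used for frame averaging, simplified to 2D so the principal axis has a closed form; FAENet itself works in 3D, where the sign ambiguity of three eigenvectors gives 8 frames rather than 4. Centering removes translations, projecting onto the principal axes removes rotations and reflections up to that sign ambiguity, and the stochastic variant samples one frame per pass instead of averaging over all of them. The function names are hypothetical, and the degenerate case of equal eigenvalues is not handled.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Frame = Tuple[Point, Point]


def pca_frames(points: Sequence[Point]) -> Tuple[Point, List[Frame]]:
    """Return the centroid and the 4 sign-ambiguous PCA frames of a 2D point cloud."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    # Entries of the (unnormalized) covariance matrix [[a, b], [b, c]].
    a = sum((x - cx) ** 2 for x, _ in points)
    b = sum((x - cx) * (y - cy) for x, y in points)
    c = sum((y - cy) ** 2 for _, y in points)
    # Closed-form angle of the eigenvector with the largest eigenvalue (2x2 case).
    theta = 0.5 * math.atan2(2 * b, a - c)
    u1 = (math.cos(theta), math.sin(theta))
    u2 = (-math.sin(theta), math.cos(theta))
    frames = [
        ((s1 * u1[0], s1 * u1[1]), (s2 * u2[0], s2 * u2[1]))
        for s1 in (1, -1)
        for s2 in (1, -1)
    ]
    return (cx, cy), frames


def project(points: Sequence[Point], centroid: Point, frame: Frame) -> List[Point]:
    """Express centered points in the coordinates of the given frame."""
    e1, e2 = frame
    out = []
    for x, y in points:
        dx, dy = x - centroid[0], y - centroid[1]
        out.append((dx * e1[0] + dy * e1[1], dx * e2[0] + dy * e2[1]))
    return out


def sfa_input(points: Sequence[Point], rng: random.Random) -> List[Point]:
    """Stochastic frame averaging: feed the model the cloud in one sampled frame."""
    centroid, frames = pca_frames(points)
    return project(points, centroid, rng.choice(frames))
```

Rotating and translating the input permutes the set of projected clouds, so a model that sees a randomly sampled frame is trained on inputs whose distribution does not depend on the input's pose.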

GFlowNet-EM for Learning Compositional Latent Variable Models

Edward J Hu

Nikolay Malkin

Moksh J. Jain

Katie E Everett

Alexandros Graikos

Latent variable models (LVMs) with discrete compositional latents are an important but challenging setting due to a combinatorially large number of possible configurations of the latents. A key tradeoff in modeling the posteriors over latents is between expressivity and tractable optimization. For algorithms based on expectation-maximization (EM), the E-step is often intractable without restrictive approximations to the posterior. We propose the use of GFlowNets, algorithms for sampling from an unnormalized density by learning a stochastic policy for sequential construction of samples, for this intractable E-step. By training GFlowNets to sample from the posterior over latents, we take advantage of their strengths as amortized variational inference algorithms for complex distributions over discrete structures. Our approach, GFlowNet-EM, enables the training of expressive LVMs with discrete compositional latents, as shown by experiments on non-context-free grammar induction and on images using discrete variational autoencoders (VAEs) without conditional independence enforced in the encoder.

2023-07-03

Proceedings of the 40th International Conference on Machine Learning (published)