Publications

Robustness and Sample Complexity of Model-Based MARL for General-Sum Markov Games

Jayakumar Subramanian

Amit Sinha

2023-01-21

Dynamic Games and Applications (published)

Disentangling poststroke cognitive deficits and their neuroanatomical correlates through combined multivariable and multioutcome lesion‐symptom mapping

Nick A. Weaver

Muhammad Hasnain Mamdani

Jae‐Sung Lim

J. Matthijs Biesbroek

Geert Jan Biessels

Irene M. C. Huenges Wajer

Yeonwook Kang

Beom Joon Kim

Byung‐Chul Lee

Keon‐Joo Lee

Kyung‐Ho Yu

Hee-Joon Bae

Danilo Bzdok

Hugo J. Kuijf

2023-01-20

Human Brain Mapping (published)

doi.org

lo-fi: distributed fine-tuning without communication

Mitchell Wortsman

Suchin Gururangan

Shen Li

Ali Farhadi

Ludwig Schmidt

Michael Rabbat

Ari S. Morcos

When fine-tuning large neural networks, it is common to use multiple nodes and to communicate gradients at each optimization step. By contrast, we investigate completely local fine-tuning, which we refer to as lo-fi. During lo-fi, each node fine-tunes independently without any communication. Then, the weights are averaged across nodes at the conclusion of fine-tuning. When fine-tuning DeiT-base and DeiT-large on ImageNet, this procedure matches accuracy in-distribution and improves accuracy under distribution shift compared to the baseline, which observes the same amount of data but communicates gradients at each step. We also observe that lo-fi matches the baseline's performance when fine-tuning OPT language models (up to 1.3B parameters) on Common Crawl. By removing the communication requirement, lo-fi reduces resource barriers for fine-tuning large models and enables fine-tuning in settings with prohibitive communication cost.
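The final averaging step described in the abstract can be illustrated with a minimal sketch. It assumes each node's fine-tuned weights are available as a plain dictionary of parameter name to flat list of floats; the function name `average_weights` and the toy parameter names are hypothetical and are not taken from the paper or its code.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def average_weights(node_weights: List[Weights]) -> Weights:
    """Return the element-wise mean of the parameters held by each node."""
    if not node_weights:
        raise ValueError("need at least one set of node weights")
    n = len(node_weights)
    averaged: Weights = {}
    for name in node_weights[0]:
        # Group the i-th value of this parameter across all nodes.
        columns = zip(*(weights[name] for weights in node_weights))
        averaged[name] = [sum(column) / n for column in columns]
    return averaged


# Two toy "nodes" that were fine-tuned independently (illustrative values).
node_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
node_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = average_weights([node_a, node_b])
```

In this sketch the nodes exchange nothing until the single averaging call at the end, which is the property the abstract emphasizes.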

2023-01-20

TMLR (accepted)

doi.org

openreview.net

A Framework for Obtaining Accurate Posteriors of Strong Gravitational Lensing Parameters with Flexible Priors and Implicit Likelihoods Using Density Estimation

Ronan Legin

Yashar Hezaveh

Laurence Perreault-Levasseur

Benjamin Wandelt

We report the application of implicit likelihood inference to the prediction of the macroparameters of strong lensing systems with neural networks. This allows us to perform deep-learning analysis of lensing systems within a well-defined Bayesian statistical framework to explicitly impose desired priors on lensing variables, obtain accurate posteriors, and guarantee convergence to the optimal posterior in the limit of perfect performance. We train neural networks to perform a regression task to produce point estimates of lensing parameters. We then interpret these estimates as compressed statistics in our inference setup and model their likelihood function using mixture density networks. We compare our results with those of approximate Bayesian neural networks, discuss their significance, and point to future directions. Based on a test set of 100,000 strong lensing simulations, our amortized model produces accurate posteriors for any arbitrary confidence interval, with a maximum percentage deviation of 1.4% at the 21.8% confidence level, without the need for any added calibration procedure. In total, inferring 100,000 different posteriors takes a day on a single GPU, showing that the method scales well to the thousands of lenses expected to be discovered by upcoming sky surveys.

2023-01-19

The Astrophysical Journal (published)

doi.org

arxiv.org

Label fusion and training methods for reliable representation of inter-rater uncertainty

Andreanne Lemay

Charley Gros

Julien Cohen-Adad

Enamundram Naga Karthik

2023-01-18

Machine Learning for Biomedical Imaging (published)

doi.org

arxiv.org

From IID to the Independent Mechanisms assumption in continual learning

Oleksiy Ostapenko

Pau Rodriguez

Alexandre Lacoste

Laurent Charlin

2023-01-11

AAAI.org/2023/Bridge/CCBridge (accepted)

proceedings.mlr.press

openreview.net

From IID to the Independent Mechanisms assumption in continual learning

Oleksiy Ostapenko

Pau Rodriguez

Alexandre Lacoste

Laurent Charlin

Current machine learning algorithms are successful in learning clearly defined tasks from large i.i.d. data. Continual learning (CL) requires learning without iid-ness and developing algorithms capable of knowledge retention and transfer; the latter can be boosted through systematic generalization. Dropping the i.i.d. assumption requires replacing it with another hypothesis. While there are several candidates, here we advocate that the independent mechanism assumption (IM) (Schölkopf et al., 2012) is a useful hypothesis for representing knowledge in a form that makes it easy to adapt to new tasks in CL. Specifically, we review several types of distribution shifts that are common in CL and point out in which way a system that represents knowledge in the form of causal modules may outperform monolithic counterparts in CL. Intuitively, the efficacy of the IM solution emerges since (i) causal modules learn mechanisms invariant across domains; (ii) if causal mechanisms must be updated, modularity can enable efficient and sparse updates.

2023-01-11

AAAI.org/2023/Bridge/CCBridge (published)

openreview.net

Studying Logging Practice in Machine Learning-based Applications

Patrick Loic Foalem

Foutse Khomh

Heng Li

Logging is a common practice in traditional software development. Several research works have investigated the different characteristics of logging practices in traditional software systems (e.g., Android applications, JAVA applications, C/C++ applications). Nowadays, we are witnessing more and more development of Machine Learning-based applications (ML-based applications). Today, there are many popular libraries that facilitate and contribute to the development of such applications, among which we can mention: Pytorch, Tensorflow, Theano, MXNet, Scikit-Learn, Caffe, and Keras. Despite the popularity of ML, we don't have a clear understanding of logging practices in ML applications. In this paper, we aim to fill this knowledge gap and help ML practitioners understand the characteristics of logging in ML-based applications. In particular, we conduct an empirical study on 110 open-source ML-based applications. Through a quantitative analysis, we find that logging practice in ML-based applications is less pervasive than in traditional applications including Android, JAVA, and C/C++ applications. Furthermore, the majority of logging statements in ML-based applications are at the info and warn levels, whereas in traditional applications info accounts for the majority of logging statements in C/C++ applications and the debug and error levels account for the majority in Android applications. We also perform a quantitative and qualitative analysis of a random sample of logging statements to understand where ML developers put most of their logging statements and examine why and how they are using logging. These analyses led to the following observations: (i) ML developers put most of the logging statements in model training, and in non-ML components. (ii) Data and model management appear to be the main reason behind the introduction of logging statements in ML-based applications.

2023-01-10

ArXiv (preprint)

doi.org

arxiv.org

SantaCoder: don't reach for the stars!

Loubna Ben Allal

Raymond Li

Denis Kocetkov

Chenghao Mou

Christopher Akiki

Carlos Muñoz Ferrandis

Niklas Muennighoff

Mayank Mishra

Alex Gu

Manan Dey

Logesh Kumar Umapathi

Carolyn Jane Anderson

Yangtian Zi

Joel Lamy Poirier

Hailey Schoelkopf

S. Troshin

Dmitry Abulkhanov

Manuel L. Romero

M. Lappert

Francesco De Toni

Bernardo García del Río

Qian Liu

Shamik Bose

Urvashi Bhattacharyya

Terry Yue Zhuo

Ian Yu

Paulo Villegas

Marco Zocca

Sourab Mangrulkar

D. Lansky

Huu Nguyen

Danish Contractor

Luisa Villa

Jia Li

Dzmitry Bahdanau

Yacine Jernite

Sean Christopher Hughes

Daniel Fried

Arjun Guha

Harm de Vries

Leandro Von Werra

The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code. This tech report describes the progress of the collaboration until December 2022, outlining the current state of the Personally Identifiable Information (PII) redaction pipeline, the experiments conducted to de-risk the model architecture, and the experiments investigating better preprocessing methods for the training data. We train 1.1B parameter models on the Java, JavaScript, and Python subsets of The Stack and evaluate them on the MultiPL-E text-to-code benchmark. We find that more aggressive filtering of near-duplicates can further boost performance and, surprisingly, that selecting files from repositories with 5+ GitHub stars deteriorates performance significantly. Our best model outperforms previous open-source multilingual code generation models (InCoder-6.7B and CodeGen-Multi-2.7B) in both left-to-right generation and infilling on the Java, JavaScript, and Python portions of MultiPL-E, despite being a substantially smaller model. All models are released under an OpenRAIL license at https://hf.co/bigcode.

2023-01-09

ArXiv (preprint)

doi.org

arxiv.org

SmOOD: Smoothness-based Out-of-Distribution Detection Approach for Surrogate Neural Networks in Aircraft Design

Houssem Ben Braiek

Ali Tfaily

Foutse Khomh

Thomas Reid

Ciro Guida

2023-01-05

Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (published)

doi.org

arxiv.org

HiFormer: Hierarchical Multi-scale Representations Using Transformers for Medical Image Segmentation

Moein Heidari

Amirhossein Kazerouni

Milad Soltany

Reza Azad

Ehsan Khodapanah Aghdam

Julien Cohen-Adad

Dorit Merhof

Convolutional neural networks (CNNs) have been the consensus for medical image segmentation tasks. However, they suffer from the limitation in modeling long-range dependencies and spatial correlations due to the nature of convolution operation. Although transformers were first developed to address this issue, they fail to capture low-level features. In contrast, it is demonstrated that both local and global features are crucial for dense prediction, such as segmenting in challenging contexts. In this paper, we propose HiFormer, a novel method that efficiently bridges a CNN and a transformer for medical image segmentation. Specifically, we design two multi-scale feature representations using the seminal Swin Transformer module and a CNN-based encoder. To secure a fine fusion of global and local features obtained from the two aforementioned representations, we propose a Double-Level Fusion (DLF) module in the skip connection of the encoder-decoder structure. Extensive experiments on various medical image segmentation datasets demonstrate the effectiveness of HiFormer over other CNN-based, transformer-based, and hybrid methods in terms of computational complexity, quantitative and qualitative results. Our code is publicly available at GitHub.

2023-01-02

2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (published)

doi.org

arxiv.org

Hyperspherical Quantization: Toward Smaller and More Accurate Models

Dan Liu

Xi Chen

Chen Ma

Xue (Steve) Liu

Model quantization enables the deployment of deep neural networks under resource-constrained devices. Vector quantization aims at reducing the model size by indexing model weights with full-precision embeddings, i.e., codewords, while the index needs to be restored to 32-bit during computation. Binary and other low-precision quantization methods can reduce the model size up to 32×, however, at the cost of a considerable accuracy drop. In this paper, we propose an efficient framework for ternary quantization to produce smaller and more accurate compressed models. By integrating hyperspherical learning, pruning and reinitialization, our proposed Hyperspherical Quantization (HQ) method reduces the cosine distance between the full-precision and ternary weights, thus reducing the bias of the straight-through gradient estimator during ternary quantization. Compared with existing work at similar compression levels (~30×, ~40×), our method significantly improves the test accuracy and reduces the model size.
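The quantity the abstract refers to, the cosine distance between full-precision and ternary weights, can be illustrated with a small sketch. The fixed magnitude threshold used to ternarize here is an assumption chosen for illustration and is not the HQ procedure, which the abstract says combines hyperspherical learning, pruning and reinitialization; the function names and toy values are likewise hypothetical.

```python
import math
from typing import List


def ternarize(weights: List[float], threshold: float) -> List[int]:
    """Map each weight to -1, 0 or +1 using a fixed magnitude threshold
    (an illustrative rule, not the method proposed in the paper)."""
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy full-precision weights (illustrative values only).
weights = [0.9, -0.8, 0.05, -0.02]
ternary = ternarize(weights, threshold=0.1)
similarity = cosine_similarity(weights, [float(t) for t in ternary])
cosine_distance = 1.0 - similarity
```

A smaller `cosine_distance` means the ternary vector points in nearly the same direction as the full-precision one, which is the sense in which the abstract ties this distance to the bias of the straight-through estimator.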

2023-01-02

2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (published)

doi.org

arxiv.org
