Publications

SCIsegV2: A Universal Tool for Segmentation of Intramedullary Lesions in Spinal Cord Injury

Enamundram Naga Karthik

Jan Valošek

Lynn Farner

Dario Pfyffer

Simon Schading-Sassenhausen

Anna Lebret

Gergely David

Andrew C. Smith

Kenneth A. Weber

Maryam Seif

Rhscir Network Imaging Group

Patrick Freund

Julien Cohen-Adad

2024-01-01

AMAI@MICCAI (publié)

doi.org

arxiv.org

Scope Ambiguities in Large Language Models

Gaurav Kamath

Sebastian Schuster

Sowmya Vajjala

Siva Reddy

2024-01-01

Trans. Assoc. Comput. Linguistics (publié)

doi.org

arxiv.org

Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Generation.

Guillaume Huguet

James Vuckovic

Kilian FATRAS

Eric Laufer

Pablo Lemos

Riashat Islam

Cheng-Hao Liu

Jarrid Rector-Brooks

Tara Akhound-Sadegh

Michael M. Bronstein

Alexander Tong

Joey Bose

2024-01-01

Neural Information Processing Systems (publié)

dblp.uni-trier.de

Sharpness-Aware Minimization Scaled by Outlier Normalization for Robust DNNs on In-Memory Computing Accelerators

Sébastien Henwood

Goncalo Mordido

Yvon Savaria

Sarath Chandar

François Leduc-Primeau

Many deep neural network (DNN) models consume a significant amount of energy at inference time, in large part due to energy consumed by memo… (voir plus)ry access. In-memory computing addresses this problem by eliminating many memory accesses, but exposes model weights to noise and circuit variations. While several methods have been proposed to train DNNs robust to weight noise they typically require knowledge of the noise distribution, or degrade the DNN performance in noiseless setting. In this work, we first show that applying sharpness-aware training, by optimizing for both the loss value and loss sharpness, significantly improves robustness to noisy weights at inference time. Then, we propose a new adaptive sharpness-aware method that conditions the worst-case perturbation of a given weight not only on its magnitude but also on the range of the weight distribution. This is achieved by performing sharpness-aware minimization scaled by outlier normalization (SAMSON). Results on computer-vision benchmarks show that SAMSON increases model robustness to noisy weights without compromising generalization performance in noiseless regimes.

2024-01-01

IEEECONF (publié)

doi.org

SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects

David Ifeoluwa Adelani

Hannah Liu

Xiaoyu Shen

Nikita Vassilyev

Jesujoba Oluwadara Alabi

Yanke Mao

Haonan Gao

Annie En-Shiun Lee

2024-01-01

EACL (1) (publié)

doi.org

arxiv.org

Simulation-Free Schrödinger Bridges via Score and Flow Matching

Alexander Tong

Nikolay Malkin

Kilian FATRAS

Lazar Atanackovic

Yanlei Zhang

Guillaume Huguet

Guy Wolf

Yoshua Bengio

We present simulation-free score and flow matching ([SF]…

2024-01-01

AISTATS (publié)

doi.org

openreview.net

Simultaneous linear connectivity of neural networks modulo permutation

Ekansh Sharma

Devin Kwok

Tom Denton

Daniel M. Roy

David Rolnick

Gintare Karolina Dziugaite

2024-01-01

ECML/PKDD (7) (publié)

doi.org

arxiv.org

softmax is not enough (for sharp out-of-distribution)

Petar Veličković

Christos Perivolaropoulos

Federico Barbero

Razvan Pascanu

A key property of reasoning systems is the ability to make sharp decisions on their input data. For contemporary AI systems, a key carrier o… (voir plus)f sharp behaviour is the softmax function, with its capability to perform differentiable query-key lookups. It is a common belief that the predictive power of networks leveraging softmax arises from "circuits" which sharply perform certain kinds of computations consistently across many diverse inputs. However, for these circuits to be robust, they would need to generalise well to arbitrary valid inputs. In this paper, we dispel this myth: even for tasks as simple as finding the maximum key, any learned circuitry must disperse as the number of items grows at test time. We attribute this to a fundamental limitation of the softmax function to robustly approximate sharp functions, prove this phenomenon theoretically, and propose adaptive temperature as an ad-hoc technique for improving the sharpness of softmax at inference time.

2024-01-01

arXiv.org (prépublication)

doi.org

arxiv.org

Source-Free Domain Adaptation for YOLO Object Detection

Simon Varailhon

Masih Aminbeidokhti

Marco Pedersoli

Eric Granger

Source-free domain adaptation (SFDA) is a challenging problem in object detection, where a pre-trained source model is adapted to a new targ… (voir plus)et domain without using any source domain data for privacy and efficiency reasons. Most state-of-the-art SFDA methods for object detection have been proposed for Faster-RCNN, a detector that is known to have high computational complexity. This paper focuses on domain adaptation techniques for real-world vision systems, particularly for the YOLO family of single-shot detectors known for their fast baselines and practical applications. Our proposed SFDA method - Source-Free YOLO (SF-YOLO) - relies on a teacher-student framework in which the student receives images with a learned, target domain-specific augmentation, allowing the model to be trained with only unlabeled target data and without requiring feature alignment. A challenge with self-training using a mean-teacher architecture in the absence of labels is the rapid decline of accuracy due to noisy or drifting pseudo-labels. To address this issue, a teacher-to-student communication mechanism is introduced to help stabilize the training and reduce the reliance on annotated target data for model selection. Despite its simplicity, our approach is competitive with state-of-the-art detectors on several challenging benchmark datasets, even sometimes outperforming methods that use source data for adaptation.

2024-01-01

ECCV Workshops (18) (publié)

doi.org

arxiv.org

SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision

Ankit Vani

Bac Nguyen

Samuel Lavoie

Ranjay Krishna

Aaron Courville

Selective attention helps us focus on task-relevant aspects in the constant flood of our sensory input. This constraint in our perception al… (voir plus)lows us to robustly generalize under distractions and to new compositions of perceivable concepts. Transformers employ a similar notion of attention in their architecture, but representation learning models with transformer backbones like CLIP and DINO often fail to demonstrate robustness and compositionality. We highlight a missing architectural prior: unlike human perception, transformer encodings do not separately attend over individual concepts. In response, we propose SPARO, a read-out mechanism that partitions encodings into separately-attended slots, each produced by a single attention head. Using SPARO with CLIP imparts an inductive bias that the vision and text modalities are different views of a shared compositional world with the same corresponding concepts. Using SPARO, we demonstrate improvements on downstream recognition, robustness, retrieval, and compositionality benchmarks with CLIP (up to +14% for ImageNet, +4% for SugarCrepe), and on nearest neighbors and linear probe for ImageNet with DINO (+3% each). We also showcase a powerful ability to intervene and select individual SPARO concepts to further improve downstream task performance (up from +4% to +9% for SugarCrepe) and use this ability to study the robustness of SPARO's representation structure. Finally, we provide insights through ablation experiments and visualization of learned concepts.

2024-01-01

ECCV (66) (publié)

doi.org

arxiv.org

Stochastic Frank-Wolfe: Unified Analysis and Zoo of Special Cases

Ruslan Nazykov

Aleksandr Shestakov

Vladimir Solodkin

Aleksandr Beznosikov

Gauthier Gidel

Alexander Gasnikov

The Conditional Gradient (or Frank-Wolfe) method is one of the most well-known methods for solving constrained optimization problems appeari… (voir plus)ng in various machine learning tasks. The simplicity of iteration and applicability to many practical problems helped the method to gain popularity in the community. In recent years, the Frank-Wolfe algorithm received many different extensions, including stochastic modifications with variance reduction and coordinate sampling for training of huge models or distributed variants for big data problems. In this paper, we present a unified convergence analysis of the Stochastic Frank-Wolfe method that covers a large number of particular practical cases that may have completely different nature of stochasticity, intuitions and application areas. Our analysis is based on a key parametric assumption on the variance of the stochastic gradients. But unlike most works on unified analysis of other methods, such as SGD, we do not assume an unbiasedness of the real gradient estimation. We conduct analysis for convex and non-convex problems due to the popularity of both cases in machine learning. With this general theoretical framework, we not only cover rates of many known methods, but also develop numerous new methods. This shows the flexibility of our approach in developing new algorithms based on the Conditional Gradient approach. We also demonstrate the properties of the new methods through numerical experiments.

2024-01-01

International Conference on Artificial Intelligence and Statistics (publié)

proceedings.mlr.press

arxiv.org

Stochastic Simulated Quantum Annealing for Fast Solution of Combinatorial Optimization Problems

Naoya Onizawa

Ryoma Sasaki

Duckgyu Shin

Warren Gross

Takahiro Hanyu

In this paper, we introduce stochastic simulated quantum annealing (SSQA) for large-scale combinatorial optimization problems. SSQA is desig… (voir plus)ned based on stochastic computing and quantum Monte Carlo, which can simulate quantum annealing (QA) by using multiple replicas of spins (probabilistic bits) in classical computing. The use of stochastic computing leads to an efficient parallel spin-state update algorithm, enabling quick search for a solution around the global minimum energy. Therefore, SSQA realizes quantum-like annealing for large-scale problems and can handle fully connected models in combinatorial optimization, unlike QA. The proposed method is evaluated in MATLAB on graph isomorphism problems, which are typical combinatorial optimization problems. The proposed method achieves a convergence speed an order of magnitude faster than a conventional stochastic simulaated annealing method. Additionally, it can handle a 100-times larger problem size compared to QA and a 25-times larger problem size compared to a traditional SA method, respectively, for similar convergence probabilities.

2024-01-01

IEEE Access (publié)

doi.org

arxiv.org

Avantage IA

Mettre à profit l'IA pour un avenir durable

Bourse Mila en politiques de l'IA

Avantage IA

Mettre à profit l'IA pour un avenir durable

Publications

Avantage IA

Mettre à profit l'IA pour un avenir durable

Bourse Mila en politiques de l'IA

Avantage IA

Mettre à profit l'IA pour un avenir durable

Mots-clés populaires:

Publications