Empirical Study on Optimizer Selection for Out-of-Distribution Generalization
Hiroki Naganuma
Kartik Ahuja
Shiro Takagi
Tetsuya Motokawa
Rio Yokota
Kohta Ishikawa
Ikuro Sato
Modern deep learning systems do not generalize well when the test data distribution is slightly different from the training data distribution. While much promising work has been accomplished to address this fragility, a systematic study of the role of optimizers and their out-of-distribution generalization performance has not been undertaken. In this study, we examine the performance of popular first-order optimizers for different classes of distributional shift under empirical risk minimization and invariant risk minimization. We address this question for image and text classification using DomainBed, WILDS, and Backgrounds Challenge as testbeds for studying different types of shifts, namely correlation and diversity shift. We search over a wide range of hyperparameters and examine classification accuracy (in-distribution and out-of-distribution) for over 20,000 models. We arrive at the following findings, which we expect to be helpful for practitioners: i) adaptive optimizers (e.g., Adam) perform worse than non-adaptive optimizers (e.g., SGD, momentum SGD) on out-of-distribution performance. In particular, even though there is no significant difference in in-distribution performance, we show a measurable difference in out-of-distribution performance. ii) in-distribution performance and out-of-distribution performance exhibit three types of behavior depending on the dataset: linear returns, increasing returns, and diminishing returns. For example, in the training of natural language data using Adam, fine-tuning the performance of in-distribution performance does not significantly contribute to the out-of-distribution generalization performance.
Interpretable deep learning architectures for improving drug response prediction performance: myth or reality?
Yihui Li
David Earl Hostallero
Motivation: Recent advances in deep learning model development have enabled more accurate prediction of drug response in cancer. However, the black-box nature of these models still remains a hurdle in their adoption for precision cancer medicine. Recent efforts have focused on making these models interpretable by incorporating signaling pathway information in model architecture. While these models improve interpretability, it is unclear whether this higher interpretability comes at the cost of less accurate predictions, or a prediction improvement can also be obtained. Results: In this study, we comprehensively and systematically assessed four state-of-the-art interpretable models developed for drug response prediction to answer this question using three pathway collections. Our results showed that models that explicitly incorporate pathway information in the form of a latent layer perform worse compared to models that incorporate this information implicitly. Moreover, in most evaluation setups the best performance is achieved using a simple black-box model. In addition, replacing the signaling pathways with randomly generated pathways shows a comparable performance for the majority of these interpretable models. Our results suggest that new interpretable models are necessary to improve the drug response prediction performance. In addition, the current study provides different baseline models and evaluation setups necessary for such new models to demonstrate their superior prediction performance. Availability and Implementation: Implementations of all methods are provided at https://github.com/Emad-COMBINE-lab/InterpretableAI_for_DRP. Generated uniform datasets are at https://zenodo.org/record/7101665#.YzS79HbMKUk. Contact: amin.emad@mcgill.ca Supplementary Information: Online-only supplementary data is available at the journal’s website.
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering
Rabiul Awal
Le Zhang
In this paper, we explore effective prompting techniques to enhance zero- and few-shot Visual Question Answering (VQA) performance in contemporary Vision-Language Models (VLMs). Central to our investigation is the role of question templates in guiding VLMs to generate accurate answers. We identify that specific templates significantly influence VQA outcomes, underscoring the need for strategic template selection. Another pivotal aspect of our study is augmenting VLMs with image captions, providing them with additional visual cues alongside direct image features in VQA tasks. Surprisingly, this augmentation significantly improves the VLMs' performance in many cases, even though VLMs "see" the image directly! We explore chain-of-thought (CoT) reasoning and find that while standard CoT reasoning causes drops in performance, advanced methods like self-consistency can help recover it. Furthermore, we find that text-only few-shot examples enhance VLMs' alignment with the task format, particularly benefiting models prone to verbose zero-shot answers. Lastly, to mitigate the challenges associated with evaluating free-form open-ended VQA responses using string-matching based VQA metrics, we introduce a straightforward LLM-guided pre-processing technique to adapt the model responses to the expected ground-truth answer distribution. In summary, our research sheds light on the intricacies of prompting strategies in VLMs for VQA, emphasizing the synergistic use of captions, templates, and pre-processing to enhance model efficacy.
Preventing Dimensional Collapse in Contrastive Local Learning with Subsampling
Louis Fournier
Adeetya Patel
Michael Eickenberg
Edouard Oyallon
Block-State Transformers
Mahan Fathi
Jonathan Pilault
Orhan Firat
State space models (SSMs) have shown impressive results on tasks that require modeling long-range dependencies and efficiently scale to long sequences owing to their subquadratic runtime complexity. Originally designed for continuous signals, SSMs have shown superior performance on a plethora of tasks, in vision and audio; however, SSMs still lag Transformer performance in Language Modeling tasks. In this work, we propose a hybrid layer named Block-State Transformer (BST), that internally combines an SSM sublayer for long-range contextualization, and a Block Transformer sublayer for short-term representation of sequences. We study three different, and completely parallelizable, variants that integrate SSMs and block-wise attention. We show that our model outperforms similar Transformer-based architectures on language modeling perplexity and generalizes to longer sequences. In addition, the Block-State Transformer demonstrates more than tenfold increase in speed at the layer level compared to the Block-Recurrent Transformer when model parallelization is employed.
GEANT4-DNA simulation of temperature-dependent and pH-dependent yields of chemical radiolytic species
Jingyi Bian
Juan Duran
Wook-Geun Shin
Jose Ramos-Méndez
Jack C Sankey
Lilian Childress
Jan Seuntjens
A solution algorithm for chance-constrained problems with integer second-stage recourse decisions
Andrea Lodi
Enrico Malaguti
Michele Monaci
Giacomo Nannicini
Paolo Paronuzzi
A2CiD2: Accelerating Asynchronous Communication in Decentralized Deep Learning
Adel Nabli
Edouard Oyallon