Publications

DA Wand: Distortion-Aware Selection Using Neural Mesh Parameterization
Richard Liu
Vladimir Kim
Rana Hanocka
We present a neural technique for learning to select a local sub-region around a point which can be used for mesh parameterization. The moti… (see more)vation for our framework is driven by interactive workflows used for decaling, texturing, or painting on surfaces. Our key idea is to incorporate segmentation probabilities as weights of a classical parameterization method, implemented as a novel differentiable parameterization layer within a neural network framework. We train a segmentation network to select 3D regions that are parameterized into 2D and penalized by the resulting distortion, giving rise to segmentations which are distortion-aware. Following training, a user can use our system to interactively select a point on the mesh and obtain a large, meaningful region around the selection which induces a low-distortion parameterization. Our code11https://github.com/threedle/DA-Wand and project22https://threedle.github.io/DA-Wand/ are publicly available.
Empirical Study on Optimizer Selection for Out-of-Distribution Generalization
Hiroki Naganuma
Kartik Ahuja
Shiro Takagi
Tetsuya Motokawa
Rio Yokota
Kohta Ishikawa
Ikuro Sato
Modern deep learning systems do not generalize well when the test data distribution is slightly different to the training data distribution.… (see more) While much promising work has been accomplished to address this fragility, a systematic study of the role of optimizers and their out-of-distribution generalization performance has not been undertaken. In this study, we examine the performance of popular first-order optimizers for different classes of distributional shift under empirical risk minimization and invariant risk minimization. We address this question for image and text classification using DomainBed, WILDS, and Backgrounds Challenge as testbeds for studying different types of shifts---namely correlation and diversity shift. We search over a wide range of hyperparameters and examine classification accuracy (in-distribution and out-of-distribution) for over 20,000 models. We arrive at the following findings, which we expect to be helpful for practitioners: i) adaptive optimizers (e.g., Adam) perform worse than non-adaptive optimizers (e.g., SGD, momentum SGD) on out-of-distribution performance. In particular, even though there is no significant difference in in-distribution performance, we show a measurable difference in out-of-distribution performance. ii) in-distribution performance and out-of-distribution performance exhibit three types of behavior depending on the dataset---linear returns, increasing returns, and diminishing returns. For example, in the training of natural language data using Adam, fine-tuning the performance of in-distribution performance does not significantly contribute to the out-of-distribution generalization performance.
Interpretable deep learning architectures for improving drug response prediction performance: myth or reality?
Yihui Li
David Earl Hostallero
Motivation: Recent advances in deep learning model development have enabled more accurate prediction of drug response in cancer. However, th… (see more)e black-box nature of these models still remains a hurdle in their adoption for precision cancer medicine. Recent efforts have focused on making these models interpretable by incorporating signaling pathway information in model architecture. While these models improve interpretability, it is unclear whether this higher interpretability comes at the cost of less accurate predictions, or a prediction improvement can also be obtained. Results: In this study, we comprehensively and systematically assessed four state-of-the-art interpretable models developed for drug response prediction to answer this question using three pathway collections. Our results showed that models that explicitly incorporate pathway information in the form of a latent layer perform worse compared to models that incorporate this information implicitly. Moreover, in most evaluation setups the best performance is achieved using a simple black-box model. In addition, replacing the signaling pathways with randomly generated pathways shows a comparable performance for the majority of these interpretable models. Our results suggest that new interpretable models are necessary to improve the drug response prediction performance. In addition, the current study provides different baseline models and evaluation setups necessary for such new models to demonstrate their superior prediction performance. Availability and Implementation: Implementation of all methods are provided in https://github.com/Emad-COMBINE-lab/InterpretableAI_for_DRP. Generated uniform datasets are in https://zenodo.org/record/7101665#.YzS79HbMKUk. Contact: amin.emad@mcgill.ca Supplementary Information: Online-only supplementary data is available at the journal’s website.
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering
Md. Rabiul Awal
Le Zhang
In this paper, we explore effective prompting techniques to enhance zero- and few-shot Visual Question Answering (VQA) performance in contem… (see more)porary Vision-Language Models (VLMs). Central to our investigation is the role of question templates in guiding VLMs to generate accurate answers. We identify that specific templates significantly influence VQA outcomes, underscoring the need for strategic template selection. Another pivotal aspect of our study is augmenting VLMs with image captions, providing them with additional visual cues alongside direct image features in VQA tasks. Surprisingly, this augmentation significantly improves the VLMs' performance in many cases, even though VLMs"see"the image directly! We explore chain-of-thought (CoT) reasoning and find that while standard CoT reasoning causes drops in performance, advanced methods like self-consistency can help recover it. Furthermore, we find that text-only few-shot examples enhance VLMs' alignment with the task format, particularly benefiting models prone to verbose zero-shot answers. Lastly, to mitigate the challenges associated with evaluating free-form open-ended VQA responses using string-matching based VQA metrics, we introduce a straightforward LLM-guided pre-processing technique to adapt the model responses to the expected ground-truth answer distribution. In summary, our research sheds light on the intricacies of prompting strategies in VLMs for VQA, emphasizing the synergistic use of captions, templates, and pre-processing to enhance model efficacy.
Preventing Dimensional Collapse in Contrastive Local Learning with Subsampling
Louis Fournier
Adeetya Patel
Michael Eickenberg
Edouard Oyallon
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding
Le Zhang
Rabiul Awal
Vision-Language Models (VLMs), such as CLIP, exhibit strong image-text comprehension abilities, facilitating advances in several downstream … (see more)tasks such as zero-shot image classification, image-text retrieval, and text-to-image generation. However, the compositional reasoning abilities of existing VLMs remains subpar. The root of this limitation lies in the inadequate alignment between the images and captions in the pretraining datasets. Additionally, the current contrastive learning objective fails to focus on fine-grained grounding components like relations, actions, and attributes, resulting in"bag-of-words"representations. We introduce a simple and effective method to improve compositional reasoning in VLMs. Our method better leverages available datasets by refining and expanding the standard image-text contrastive learning framework. Our approach does not require specific annotations and does not incur extra parameters. When integrated with CLIP, our technique yields notable improvement over state-of-the-art baselines across five vision-language compositional benchmarks. We open-source our code at https://github.com/lezhang7/Enhance-FineGrained.
GEANT4-DNA simulation of temperature-dependent and pH-dependent yields of chemical radiolytic species
Jingyi Bian
Juan Duran
Wook-Geun Shin
Jose Ramos-Méndez
Jack C Sankey
Lilian Childress
Jan Seuntjens
LEAD: Min-Max Optimization from a Physical Perspective
Reyhane Askari Hemmat
Amartya Mitra
Adversarial formulations have rekindled interest in two-player min-max games. A central obstacle in the optimization of such games is the ro… (see more)tational dynamics that hinder their convergence. In this paper, we show that game optimization shares dynamic properties with particle systems subject to multiple forces, and one can leverage tools from physics to improve optimization dynamics. Inspired by the physical framework, we propose LEAD, an optimizer for min-max games. Next, using Lyapunov stability theory from dynamical systems as well as spectral analysis, we study LEAD’s convergence properties in continuous and discrete time settings for a class of quadratic min-max games to demonstrate linear convergence to the Nash equilibrium. Finally, we empirically evaluate our method on synthetic setups and CIFAR-10 image generation to demonstrate improvements in GAN training.
A solution algorithm for chance-constrained problems with integer second-stage recourse decisions
Enrico Malaguti
Michele Monaci
Giacomo Nannicini
Paolo
Paronuzzi
A2CiD2: Accelerating Asynchronous Communication in Decentralized Deep Learning
Adel Nabli
Edouard Oyallon
Best-Case Retrieval Evaluation: Improving the Sensitivity of Reciprocal Rank with Lexicographic Precision
Across a variety of ranking tasks, researchers use reciprocal rank to measure the effectiveness for users interested in exactly one relevant… (see more) item. Despite its widespread use, evidence suggests that reciprocal rank is brittle when discriminating between systems. This brittleness, in turn, is compounded in modern evaluation settings where current, high-precision systems may be difficult to distinguish. We address the lack of sensitivity of reciprocal rank by introducing and connecting it to the concept of best-case retrieval, an evaluation method focusing on assessing the quality of a ranking for the most satisfied possible user across possible recall requirements. This perspective allows us to generalize reciprocal rank and define a new preference-based evaluation we call lexicographic precision or lexiprecision. By mathematical construction, we ensure that lexiprecision preserves differences detected by reciprocal rank, while empirically improving sensitivity and robustness across a broad set of retrieval and recommendation tasks.
Who Controlled the Evidence? Question Answering for Disclosure Information Extraction
Hardy Hardy
Derek Ruths
Nicholas B King
Conflict of interest (COI) disclosure statements provide rich information to support trans-parency and reduce bias in research. We introduce… (see more) a novel task to identify relationships between sponsoring entities and the research studies they sponsor from the disclosure statement. This task is challenging due to the complexity of recognizing all potential relationship patterns and the hierarchical nature of identifying entities first and then extracting their relationships to the study. To overcome these challenges, in this paper, we also constructed a new annotated dataset and proposed a Question Answering-based method to recognize entities and extract relationships. Our method has demonstrated robustness in handling diverse relationship patterns, and it remains effective even when trained on a low-resource dataset.