Instant3dit: Multiview Inpainting for Fast Editing of 3D Objects
Amir Barda
Matheus Gadelha
Vladimir Kim
Amit H. Bermano
Thibault Groueix
We propose a generative technique to edit 3D shapes, represented as meshes, NeRFs, or Gaussian Splats, in approximately 3 seconds, without t… (see more)he need for running an SDS type of optimization. Our key insight is to cast 3D editing as a multiview image inpainting problem, as this representation is generic and can be mapped back to any 3D representation using the bank of available Large Reconstruction Models. We explore different fine-tuning strategies to obtain both multiview generation and inpainting capabilities within the same diffusion model. In particular, the design of the inpainting mask is an important factor of training an inpainting model, and we propose several masking strategies to mimic the types of edits a user would perform on a 3D shape. Our approach takes 3D generative editing from hours to seconds and produces higher-quality results compared to previous works.
Adversarial Bounding Boxes Generation (ABBG) Attack against Visual Object Trackers
Fatemeh Nourilenjan Nokabadi
Jean-Francois Lalonde
Adversarial perturbations aim to deceive neural networks into predicting inaccurate results. For visual object trackers, adversarial attacks… (see more) have been developed to generate perturbations by manipulating the outputs. However, transformer trackers predict a specific bounding box instead of an object candidate list, which limits the applicability of many existing attack scenarios. To address this issue, we present a novel white-box approach to attack visual object trackers with transformer backbones using only one bounding box. From the tracker predicted bounding box, we generate a list of adversarial bounding boxes and compute the adversarial loss for those bounding boxes. Experimental results demonstrate that our simple yet effective attack outperforms existing attacks against several robust transformer trackers, including TransT-M, ROMTrack, and MixFormer, on popular benchmark tracking datasets such as GOT-10k, UAV123, and VOT2022STS.
Adversarial Bounding Boxes Generation (ABBG) Attack against Visual Object Trackers
Fatemeh Nourilenjan Nokabadi
Jean-Francois Lalonde
Adversarial perturbations aim to deceive neural networks into predicting inaccurate results. For visual object trackers, adversarial attacks… (see more) have been developed to generate perturbations by manipulating the outputs. However, transformer trackers predict a specific bounding box instead of an object candidate list, which limits the applicability of many existing attack scenarios. To address this issue, we present a novel white-box approach to attack visual object trackers with transformer backbones using only one bounding box. From the tracker predicted bounding box, we generate a list of adversarial bounding boxes and compute the adversarial loss for those bounding boxes. Experimental results demonstrate that our simple yet effective attack outperforms existing attacks against several robust transformer trackers, including TransT-M, ROMTrack, and MixFormer, on popular benchmark tracking datasets such as GOT-10k, UAV123, and VOT2022STS.
Tracing Optimization for Performance Modeling and Regression Detection
Kaveh Shahedi
Heng Li
Maxime Lamothe
Software performance modeling plays a crucial role in developing and maintaining software systems. A performance model analytically describe… (see more)s the relationship between the performance of a system and its runtime activities. This process typically examines various aspects of a system's runtime behavior, such as the execution frequency of functions or methods, to forecast performance metrics like program execution time. By using performance models, developers can predict expected performance and thereby effectively identify and address unexpected performance regressions when actual performance deviates from the model's predictions. One common and precise method for capturing performance behavior is software tracing, which involves instrumenting the execution of a program, either at the kernel level (e.g., system calls) or application level (e.g., function calls). However, due to the nature of tracing, it can be highly resource-intensive, making it impractical for production environments where resources are limited. In this work, we propose statistical approaches to reduce tracing overhead by identifying and excluding performance-insensitive code regions, particularly application-level functions, from tracing while still building accurate performance models that can capture performance degradations. By selecting an optimal set of functions to be traced, we can construct optimized performance models that achieve an R-2 score of up to 99% and, sometimes, outperform full tracing models (models using non-optimized tracing data), while significantly reducing the tracing overhead by more than 80% in most cases. Our optimized performance models can also capture performance regressions in our studied programs effectively, demonstrating their usefulness in real-world scenarios. Our approach is fully automated, making it ready to be used in production environments with minimal human effort.
Tracing Optimization for Performance Modeling and Regression Detection
Kaveh Shahedi
Heng Li
Maxime Lamothe
Software performance modeling plays a crucial role in developing and maintaining software systems. A performance model analytically describe… (see more)s the relationship between the performance of a system and its runtime activities. This process typically examines various aspects of a system's runtime behavior, such as the execution frequency of functions or methods, to forecast performance metrics like program execution time. By using performance models, developers can predict expected performance and thereby effectively identify and address unexpected performance regressions when actual performance deviates from the model's predictions. One common and precise method for capturing performance behavior is software tracing, which involves instrumenting the execution of a program, either at the kernel level (e.g., system calls) or application level (e.g., function calls). However, due to the nature of tracing, it can be highly resource-intensive, making it impractical for production environments where resources are limited. In this work, we propose statistical approaches to reduce tracing overhead by identifying and excluding performance-insensitive code regions, particularly application-level functions, from tracing while still building accurate performance models that can capture performance degradations. By selecting an optimal set of functions to be traced, we can construct optimized performance models that achieve an R-2 score of up to 99% and, sometimes, outperform full tracing models (models using non-optimized tracing data), while significantly reducing the tracing overhead by more than 80% in most cases. Our optimized performance models can also capture performance regressions in our studied programs effectively, demonstrating their usefulness in real-world scenarios. Our approach is fully automated, making it ready to be used in production environments with minimal human effort.
Words Matter: Leveraging Individual Text Embeddings for Code Generation in CLIP Test-Time Adaptation
Shambhavi Mishra
Julio Silva-Rodríguez
Ismail Ben Ayed
Jose Dolz
Vision-language foundation models, such as CLIP, have shown unprecedented zero-shot performance across a wide range of tasks. Nevertheless, … (see more)these models may be unreliable under distributional shifts, as their performance is significantly degraded. In this work, we explore how to efficiently leverage class text information to mitigate these distribution drifts encountered by large pre-trained vision-language models (VLMs) during test-time inference. In particular, we propose to generate pseudo-labels for the test-time samples by exploiting generic class text embeddings as fixed centroids of a label assignment problem, which is efficiently solved with Optimal Transport. Furthermore, the proposed adaptation method (CLIP-OT) integrates a multiple template knowledge distillation approach, which replicates multi-view contrastive learning strategies in unsupervised representation learning but without incurring additional computational complexity. Extensive experiments on multiple popular test-time adaptation benchmarks presenting diverse complexity empirically show the superiority of CLIP-OT, achieving performance gains of up to 7% over recent state-of-the-art methods, yet being computationally and memory efficient.
Bidirectional Generative Pre-training for Improving Healthcare Time-series Representation Learning
Ziyang Song
Qincheng Lu
He Zhu
Learning time-series representations for discriminative tasks, such as classification and regression, has been a long-standing challenge in … (see more)the healthcare domain. Current pre-training methods are limited in either unidirectional next-token prediction or randomly masked token prediction. We propose a novel architecture called Bidirectional Timely Generative Pre-trained Transformer (BiTimelyGPT), which pre-trains on biosignals and longitudinal clinical records by both next-token and previous-token prediction in alternating transformer layers. This pre-training task preserves original distribution and data shapes of the time-series. Additionally, the full-rank forward and backward attention matrices exhibit more expressive representation capabilities. Using biosignals and longitudinal clinical records, BiTimelyGPT demonstrates superior performance in predicting neurological functionality, disease diagnosis, and physiological signs. By visualizing the attention heatmap, we observe that the pre-trained BiTimelyGPT can identify discriminative segments from biosignal time-series sequences, even more so after fine-tuning on the task.
Bidirectional Generative Pre-training for Improving Healthcare Time-series Representation Learning
Ziyang Song
Qincheng Lu
Mike He Zhu
Evaluating the effectiveness of the Smart About Meds (SAM) mobile application among patients discharged from hospital: protocol of a randomised controlled trial
Robyn Tamblyn
Bettina Habib
Daniala L Weir
Elizaveta Frolova
Rolan Alattar
Jessica Rogozinsky
Caroline Beauchamp
Rosalba Pupo
Susan J Bartlett
Emily McDonald
Modality Translation for Object Detection Adaptation Without Forgetting Prior Knowledge
Heitor Rapela Medeiros
Masih Aminbeidokhti
Fidel A. Guerrero Peña
David Latortue
Eric Granger
A common practice in deep learning involves training large neural networks on massive datasets to achieve high accuracy across various domai… (see more)ns and tasks. While this approach works well in many application areas, it often fails drastically when processing data from a new modality with a significant distribution shift from the data used to pre-train the model. This paper focuses on adapting a large object detection model trained on RGB images to new data extracted from IR images with a substantial modality shift. We propose Modality Translator (ModTr) as an alternative to the common approach of fine-tuning a large model to the new modality. ModTr adapts the IR input image with a small transformation network trained to directly minimize the detection loss. The original RGB model can then work on the translated inputs without any further changes or fine-tuning to its parameters. Experimental results on translating from IR to RGB images on two well-known datasets show that our simple approach provides detectors that perform comparably or better than standard fine-tuning, without forgetting the knowledge of the original model. This opens the door to a more flexible and efficient service-based detection pipeline, where a unique and unaltered server, such as an RGB detector, runs constantly while being queried by different modalities, such as IR with the corresponding translations model. Our code is available at: https://github.com/heitorrapela/ModTr.
Gaps Between Research and Practice When Measuring Representational Harms Caused by LLM-Based Systems
Emma Harvey
Emily Sheng
Su Lin Blodgett
Alexandra Chouldechova
Jean Garcia-Gathright
Hanna Wallach
To facilitate the measurement of representational harms caused by large language model (LLM)-based systems, the NLP research community has p… (see more)roduced and made publicly available numerous measurement instruments, including tools, datasets, metrics, benchmarks, annotation instructions, and other techniques. However, the research community lacks clarity about whether and to what extent these instruments meet the needs of practitioners tasked with developing and deploying LLM-based systems in the real world, and how these instruments could be improved. Via a series of semi-structured interviews with practitioners in a variety of roles in different organizations, we identify four types of challenges that prevent practitioners from effectively using publicly available instruments for measuring representational harms caused by LLM-based systems: (1) challenges related to using publicly available measurement instruments; (2) challenges related to doing measurement in practice; (3) challenges arising from measurement tasks involving LLM-based systems; and (4) challenges specific to measuring representational harms. Our goal is to advance the development of instruments for measuring representational harms that are well-suited to practitioner needs, thus better facilitating the responsible development and deployment of LLM-based systems.
Gaps Between Research and Practice When Measuring Representational Harms Caused by LLM-Based Systems
Emma Harvey
Emily Sheng
Su Lin Blodgett
Alexandra Chouldechova
Jean Garcia-Gathright
Hanna Wallach
To facilitate the measurement of representational harms caused by large language model (LLM)-based systems, the NLP research community has p… (see more)roduced and made publicly available numerous measurement instruments, including tools, datasets, metrics, benchmarks, annotation instructions, and other techniques. However, the research community lacks clarity about whether and to what extent these instruments meet the needs of practitioners tasked with developing and deploying LLM-based systems in the real world, and how these instruments could be improved. Via a series of semi-structured interviews with practitioners in a variety of roles in different organizations, we identify four types of challenges that prevent practitioners from effectively using publicly available instruments for measuring representational harms caused by LLM-based systems: (1) challenges related to using publicly available measurement instruments; (2) challenges related to doing measurement in practice; (3) challenges arising from measurement tasks involving LLM-based systems; and (4) challenges specific to measuring representational harms. Our goal is to advance the development of instruments for measuring representational harms that are well-suited to practitioner needs, thus better facilitating the responsible development and deployment of LLM-based systems.