Publications

Towards Adversarially Robust Vision-Language Models: Insights from Design Choices and Prompt Formatting Techniques
Rishika Bhagwatkar
Shravan Nayak
Reza Bayat
Alexis Roger
Daniel Z Kaplan
Vision-Language Models (VLMs) have witnessed a surge in both research and real-world applications. However, as they becoming increasingly pr… (voir plus)evalent, ensuring their robustness against adversarial attacks is paramount. This work systematically investigates the impact of model design choices on the adversarial robustness of VLMs against image-based attacks. Additionally, we introduce novel, cost-effective approaches to enhance robustness through prompt formatting. By rephrasing questions and suggesting potential adversarial perturbations, we demonstrate substantial improvements in model robustness against strong image-based attacks such as Auto-PGD. Our findings provide important guidelines for developing more robust VLMs, particularly for deployment in safety-critical environments.
Voices Unheard: NLP Resources and Models for Yor\`ub\'a Regional Dialects
Orevaoghene Ahia
Aremu Anuoluwapo
Diana Abagyan
Hila Gonen
Daud Abolade
Noah A. Smith
Yulia Tsvetkov
A Context-Driven Approach for Co-Auditing Smart Contracts with The Support of GPT-4 code interpreter
Mohamed Salah Bouafif
Chen Zheng
Ilham Qasse
Ed Zulkoski
Mohammad Hamdaqa
The surge in the adoption of smart contracts necessitates rigorous auditing to ensure their security and reliability. Manual auditing, altho… (voir plus)ugh comprehensive, is time-consuming and heavily reliant on the auditor's expertise. With the rise of Large Language Models (LLMs), there is growing interest in leveraging them to assist auditors in the auditing process (co-auditing). However, the effectiveness of LLMs in smart contract co-auditing is contingent upon the design of the input prompts, especially in terms of context description and code length. This paper introduces a novel context-driven prompting technique for smart contract co-auditing. Our approach employs three techniques for context scoping and augmentation, encompassing code scoping to chunk long code into self-contained code segments based on code inter-dependencies, assessment scoping to enhance context description based on the target assessment goal, thereby limiting the search space, and reporting scoping to force a specific format for the generated response. Through empirical evaluations on publicly available vulnerable contracts, our method demonstrated a detection rate of 96\% for vulnerable functions, outperforming the native prompting approach, which detected only 53\%. To assess the reliability of our prompting approach, manual analysis of the results was conducted by expert auditors from our partner, Quantstamp, a world-leading smart contract auditing company. The experts' analysis indicates that, in unlabeled datasets, our proposed approach enhances the proficiency of the GPT-4 code interpreter in detecting vulnerabilities.
Data harmonization for Advancing research on Personalized Rehabilitation Interventions for Patients with Traumatic Brain Injury and Stroke: A proof of concept
Dorra Rakia Allegue
Despoina Petsani
Nathalie Ponthon
Evdokimos Konstantinidis
Panagiotis Bamidis
Eva Kehayia
Sara Ahmed
Stroke and traumatic brain injury (TBI) are leading causes of morbidity and mortality, affecting survivors’ mobility and social participat… (voir plus)ion. Although personalized interventions could positively impact survivors' recovery, the effectiveness of such interventions remains unclear. Open-access data repositories can provide access to multiple shared data which could help uncover new evidence of effective interventions; however, harmonizing data between different studies requires many steps to make it possible given the various methods of data collection, intervention characteristics and population sociodemographic profile. This proof-of-concept study aimed to describe the steps and anchors that contributed to the development of guiding frameworks to harmonize data across different studies. Data were extracted from the Federal Interagency Traumatic Brain Injury Research (FITBIR) repository and stored on an online cloud platform. The outcome measures were mapped to mobility determinants using the International Classification of Functioning, Disability, and Health (ICF) and Webber framework. The intervention's effect was categorized according to the Minimal Clinically Important Difference (MCID)s of the measures administered. The study proposed a novel framework for intervention features, which aims to enhance our understanding of the mechanisms of action and potential impact of rehabilitation interventions. The framework classified interventions based on their nature, context, specific body systems, dosage, caregiver assistance, and behaviour change strategies. In conclusion, this study demonstrated the feasibility of harmonizing data extracted from different sources in the FITBIR repository. Leveraging existing open databases offers tremendous opportunities to advance research on personalized interventions for patients with TBI and stroke and inform decision-making during transitions.
Detecting Brittle Decisions for Free: Leveraging Margin Consistency in Deep Robust Classifiers
Jonas Ngnaw'e
Sabyasachi Sahoo
Yann Pequignot
Frederic Precioso
Detecting Brittle Decisions for Free: Leveraging Margin Consistency in Deep Robust Classifiers
Jonas Ngnaw'e
Sabyasachi Sahoo
Yann Batiste Pequignot
Fr'ed'eric Precioso
Despite extensive research on adversarial training strategies to improve robustness, the decisions of even the most robust deep learning mod… (voir plus)els can still be quite sensitive to imperceptible perturbations, creating serious risks when deploying them for high-stakes real-world applications. While detecting such cases may be critical, evaluating a model's vulnerability at a per-instance level using adversarial attacks is computationally too intensive and unsuitable for real-time deployment scenarios. The input space margin is the exact score to detect non-robust samples and is intractable for deep neural networks. This paper introduces the concept of margin consistency -- a property that links the input space margins and the logit margins in robust models -- for efficient detection of vulnerable samples. First, we establish that margin consistency is a necessary and sufficient condition to use a model's logit margin as a score for identifying non-robust samples. Next, through comprehensive empirical analysis of various robustly trained models on CIFAR10 and CIFAR100 datasets, we show that they indicate strong margin consistency with a strong correlation between their input space margins and the logit margins. Then, we show that we can effectively use the logit margin to confidently detect brittle decisions with such models and accurately estimate robust accuracy on an arbitrarily large test set by estimating the input margins only on a small subset. Finally, we address cases where the model is not sufficiently margin-consistent by learning a pseudo-margin from the feature representation. Our findings highlight the potential of leveraging deep representations to efficiently assess adversarial vulnerability in deployment scenarios.
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Alexander Khazatsky
Karl Pertsch
Suraj Nair
Ashwin Balakrishna
Sudeep Dasari
Siddharth Karamcheti
Soroush Nasiriany
Mohan Kumar Srirama
Lawrence Yunliang Chen
Kirsty Ellis
Peter David Fagan
Joey Hejna
Masha Itkina
Marion Lepert
Yecheng Jason Ma
Ye Ma
Patrick Tree Miller
Jimmy Wu
Suneel Belkhale
Shivin Dass … (voir 80 de plus)
Huy Ha
Arhan Jain
Abraham Lee
Youngwoon Lee
Marius Memmel
Sungjae Park
Ilija Radosavovic
Kaiyuan Wang
Albert Zhan
Kevin Black
Cheng Chi
Kyle Beltran Hatch
Shan Lin
Jingpei Lu
Jean Mercat
Abdul Rehman
Pannag R Sanketi
Archit Sharma
Cody Simpson
Quan Vuong
Homer Rich Walke
Blake Wulfe
Ted Xiao
Jonathan Heewon Yang
Arefeh Yavary
Tony Z. Zhao
Christopher Agia
Rohan Baijal
Mateo Guaman Castro
Daphne Chen
Qiuyu Chen
Trinity Chung
Jaimyn Drake
Ethan Paul Foster
Jensen Gao
David Antonio Herrera
Minho Heo
Kyle Hsu
Jiaheng Hu
Donovon Jackson
Charlotte Le
Yunshuang Li
K. Lin
Roy Lin
Zehan Ma
Abhiram Maddukuri
Suvir Mirchandani
Daniel Morton
Tony Khuong Nguyen
Abigail O'Neill
Rosario Scalise
Derick Seale
Victor Son
Stephen Tian
Emi Tran
Andrew E. Wang
Yilin Wu
Annie Xie
Jingyun Yang
Patrick Yin
Yunchu Zhang
Osbert Bastani
Jeannette Bohg
Ken Goldberg
Abhinav Gupta
Abhishek Gupta
Dinesh Jayaraman
Joseph J Lim
Jitendra Malik
Roberto Martín-Martín
Subramanian Ramamoorthy
Dorsa Sadigh
Shuran Song
Jiajun Wu
Michael C. Yip
Yuke Zhu
Thomas Kollar
Sergey Levine
Chelsea Finn
The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and … (voir plus)robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup.
Implicit Diffusion: Efficient Optimization through Stochastic Sampling
Pierre Marion
Anna Korba
Peter Bartlett
Mathieu Blondel
Valentin De Bortoli
Arnaud Doucet
Felipe Llinares-López
Quentin Berthet
Learning to Design Data-structures: A Case Study of Nearest Neighbor Search
Omar Salemohamed
Vatsal Sharan
Shivam Garg
Gregory Valiant
We propose a general framework for automating data-structure design and apply it to the problem of nearest neighbor search. Our model adapts… (voir plus) to the underlying data distribution and provides fine-grained control over query and space complexity, enabling the discovery of solutions tailored to problem-specific constraints. We are able to reverse-engineer learned algorithms in several settings. In 1D, the model discovers optimal distribution (in)dependent algorithms such as binary search and variants of interpolation search. In higher dimensions, the model learns solutions that resemble K-d trees in some regimes, while in others, have elements of locality-sensitive hashing.
Mixture of Experts in a Mixture of RL settings
Timon Willi
Johan Samir Obando Ceron
Jakob Nicolaus Foerster
Mixtures of Experts (MoEs) have gained prominence in (self-)supervised learning due to their enhanced inference efficiency, adaptability to … (voir plus)distributed training, and modularity. Previous research has illustrated that MoEs can significantly boost Deep Reinforcement Learning (DRL) performance by expanding the network's parameter count while reducing dormant neurons, thereby enhancing the model's learning capacity and ability to deal with non-stationarity. In this work, we shed more light on MoEs' ability to deal with non-stationarity and investigate MoEs in DRL settings with"amplified"non-stationarity via multi-task training, providing further evidence that MoEs improve learning capacity. In contrast to previous work, our multi-task results allow us to better understand the underlying causes for the beneficial effect of MoE in DRL training, the impact of the various MoE components, and insights into how best to incorporate them in actor-critic-based DRL networks. Finally, we also confirm results from previous work.
BayTTA: Uncertainty-aware medical image classification with optimized test-time augmentation using Bayesian model averaging
Zeinab Sherkatghanad
Moloud Abdar
Mohammadreza Bakhtyari
Test-time augmentation (TTA) is a well-known technique employed during the testing phase of computer vision tasks. It involves aggregating m… (voir plus)ultiple augmented versions of input data. Combining predictions using a simple average formulation is a common and straightforward approach after performing TTA. This paper introduces a novel framework for optimizing TTA, called BayTTA (Bayesian-based TTA), which is based on Bayesian Model Averaging (BMA). First, we generate a model list associated with different variations of the input data created through TTA. Then, we use BMA to combine model predictions weighted by their respective posterior probabilities. Such an approach allows one to take into account model uncertainty, and thus to enhance the predictive performance of the related machine learning or deep learning model. We evaluate the performance of BayTTA on various public data, including three medical image datasets comprising skin cancer, breast cancer, and chest X-ray images and two well-known gene editing datasets, CRISPOR and GUIDE-seq. Our experimental results indicate that BayTTA can be effectively integrated into state-of-the-art deep learning models used in medical image analysis as well as into some popular pre-trained CNN models such as VGG-16, MobileNetV2, DenseNet201, ResNet152V2, and InceptionRes-NetV2, leading to the enhancement in their accuracy and robustness performance.
On the consistency of hyper-parameter selection in value-based deep reinforcement learning
Johan Samir Obando Ceron
J. G. Ara'ujo
Deep reinforcement learning (deep RL) has achieved tremendous success on various domains through a combination of algorithmic design and car… (voir plus)eful selection of hyper-parameters. Algorithmic improvements are often the result of iterative enhancements built upon prior approaches, while hyper-parameter choices are typically inherited from previous methods or fine-tuned specifically for the proposed technique. Despite their crucial impact on performance, hyper-parameter choices are frequently overshadowed by algorithmic advancements. This paper conducts an extensive empirical study focusing on the reliability of hyper-parameter selection for value-based deep reinforcement learning agents, including the introduction of a new score to quantify the consistency and reliability of various hyper-parameters. Our findings not only help establish which hyper-parameters are most critical to tune, but also help clarify which tunings remain consistent across different training regimes.