Publications

TrackPGD: Efficient Adversarial Attack using Object Binary Masks against Robust Transformer Trackers

Fatemeh Nourilenjan Nokabadi

Yann Batiste Pequignot

Jean-Francois Lalonde

Adversarial perturbations can deceive neural networks by adding small, imperceptible noise to the input. Recent object trackers with transformer backbones have shown strong performance on tracking datasets, but their adversarial robustness has not been thoroughly evaluated. While transformer trackers are resilient to black-box attacks, existing white-box adversarial attacks are not universally applicable to these new transformer trackers because of differences in backbone architecture. In this work, we introduce TrackPGD, a novel white-box attack that uses predicted object binary masks to target robust transformer trackers. Built upon the powerful segmentation attack SegPGD, TrackPGD effectively influences the decisions of transformer-based trackers. Our method addresses two primary challenges in adapting a segmentation attack to trackers: the limited number of classes and extreme pixel class imbalance. TrackPGD uses the same number of iterations as other attack methods for tracker networks and produces competitive adversarial examples that mislead transformer and non-transformer trackers such as MixFormerM, OSTrackSTS, TransT-SEG, and RTS on datasets including VOT2022STS, DAVIS2016, UAV123, and GOT-10k.

2024-10-15

NeurIPS.cc/2024/Workshop/AdvML-Frontiers (published)

openreview.net
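The TrackPGD abstract describes a PGD-style white-box attack driven by a binary-mask loss that has to cope with extreme foreground/background pixel imbalance. As a rough illustration only, the sketch below runs a generic L-infinity projected gradient descent (PGD) attack against a hypothetical toy per-pixel linear mask predictor, using a class-balanced binary cross-entropy so that the few foreground pixels weigh as much as the background. The toy model, the balancing scheme, and the names `balanced_bce`, `toy_mask_logits`, and `pgd_mask_attack` are assumptions for illustration; this is not the TrackPGD or SegPGD loss, and a real tracker would supply gradients through automatic differentiation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def balanced_bce(logits, mask):
    """Class-balanced BCE: foreground and background each contribute half the loss.

    Returns the scalar loss and its gradient w.r.t. the logits.
    Assumes the mask has at least one foreground and one background pixel.
    """
    fg = mask == 1
    weights = np.where(fg, 0.5 / fg.sum(), 0.5 / (~fg).sum())
    # Numerically stable BCE-with-logits: softplus(z) - y * z.
    per_pixel = np.logaddexp(0.0, logits) - mask * logits
    loss = float(np.sum(weights * per_pixel))
    grad_logits = weights * (sigmoid(logits) - mask)
    return loss, grad_logits


def toy_mask_logits(x, w, b):
    """Hypothetical stand-in for a tracker's mask head: one logit per pixel."""
    return w * x + b


def pgd_mask_attack(x, mask, w, b, eps=8 / 255, alpha=2 / 255, steps=10):
    """L-infinity PGD that maximizes the balanced mask loss."""
    x_adv = x.copy()
    for _ in range(steps):
        _, grad_logits = balanced_bce(toy_mask_logits(x_adv, w, b), mask)
        grad_x = grad_logits * w                  # chain rule through the toy model
        x_adv = x_adv + alpha * np.sign(grad_x)   # gradient-sign ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep a valid image
    return x_adv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, size=64)
    mask = np.zeros(64)
    mask[:6] = 1.0  # small foreground object, heavy class imbalance
    w, b = rng.normal(size=64), rng.normal(size=64)
    x_adv = pgd_mask_attack(x, mask, w, b)
    before, _ = balanced_bce(toy_mask_logits(x, w, b), mask)
    after, _ = balanced_bce(toy_mask_logits(x_adv, w, b), mask)
    print(f"loss before: {before:.3f}, after: {after:.3f}")
```

Without the balancing weights, the 58 background pixels would dominate the loss and the attack would barely need to touch the object; weighting each class to half the loss is one simple way to keep the small foreground relevant.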

Active Semantic Mapping and Pose Graph Spectral Analysis for Robot Exploration

Rongge Zhang

Haechan Mark Bong

Giovanni Beltrame

2024-10-14

2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (published)

doi.org

arxiv.org

Local Linearity is All You Need (in Data-Driven Teleoperation)

Michael Przystupa

Gauthier Gidel

Matthew E. Taylor

Martin Jägersand

Justus Piater

Samuele Tosatto

One of the critical aspects of assistive robotics is providing control of a high-dimensional robot from a low-dimensional user input (e.g., a 2D joystick). Data-driven teleoperation seeks to provide an intuitive user interface, called an action map, that maps the low-dimensional input to robot velocities learned from human demonstrations. Action maps are machine learning models trained on robotic demonstration data to map user input directly to desired movements rather than to aspects of robot pose ("move to cup or pour content" vs. "move along x- or y-axis"). Many works have investigated nonlinear action maps with multi-layer perceptrons, but recent work suggests that local-linear neural approximations provide better control of the system. However, local-linear models assume that actions lie on a linear subspace and may not capture nuanced motions in the training data. In this work, we hypothesize that local-linear neural networks are effective because they make the action map odd with respect to the user input, enhancing the intuitiveness of the controller. Based on this hypothesis, we propose two nonlinear means of encoding odd behavior that do not constrain the action map to a local-linear function. However, our analysis reveals that these models effectively behave like local-linear models for relevant mappings between user joysticks and robot movements. We support this claim in simulation and show, on a real-world use case, that there is no statistical benefit to using nonlinear maps according to the users' experience. These negative results suggest that further investigation into model architectures beyond local-linear models may offer diminishing returns for improving user experience in data-driven teleoperation systems.

2024-10-14

2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (published)

doi.org
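The key property in the Local Linearity abstract is that a local-linear action map a = A(s) u is odd in the user input u, i.e. a(-u, s) = -a(u, s), so reversing the joystick reverses the motion. The sketch below checks this numerically and shows one generic way to obtain a nonlinear map that is still odd: take the odd part (h(u, s) - h(-u, s)) / 2 of an arbitrary network h. The dimensions, random weights, and the odd-part construction are illustrative assumptions and are not claimed to be either of the two constructions proposed in the paper.

```python
import numpy as np

# Hypothetical sizes: 2D joystick, 7-DoF arm velocities, 4-D state features.
USER_DIM, ROBOT_DIM, STATE_DIM, HIDDEN = 2, 7, 4, 16

rng = np.random.default_rng(0)
W_A = rng.normal(size=(ROBOT_DIM * USER_DIM, STATE_DIM))
W1 = rng.normal(size=(HIDDEN, USER_DIM + STATE_DIM))
W2 = rng.normal(size=(ROBOT_DIM, HIDDEN))


def local_linear_map(u, s):
    """a = A(s) @ u: the state selects a matrix, the joystick enters linearly."""
    A = np.tanh(W_A @ s).reshape(ROBOT_DIM, USER_DIM)
    return A @ u


def mlp_map(u, s):
    """Unconstrained nonlinear map h(u, s); in general it is not odd in u."""
    return W2 @ np.tanh(W1 @ np.concatenate([u, s]) + 1.0)


def odd_part_map(u, s):
    """(h(u, s) - h(-u, s)) / 2: odd in u by construction, yet nonlinear in u."""
    return 0.5 * (mlp_map(u, s) - mlp_map(-u, s))


def is_odd(f, u, s):
    """Numerically check f(-u, s) == -f(u, s) at one input."""
    return np.allclose(f(-u, s), -f(u, s))


if __name__ == "__main__":
    u, s = rng.normal(size=USER_DIM), rng.normal(size=STATE_DIM)
    for f in (local_linear_map, mlp_map, odd_part_map):
        print(f.__name__, "odd:", is_odd(f, u, s))
```

The odd-part trick shows why oddness alone does not force linearity: `odd_part_map(2 * u, s)` generally differs from `2 * odd_part_map(u, s)`, even though the map always flips sign with the joystick.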

PhotoBot: Reference-Guided Interactive Photography via Natural Language

Oliver Limoyo

Jimmy Li

Dmitriy Rivkin

Jonathan Kelly

Gregory Dudek

We introduce PhotoBot, a framework for fully automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer. We propose communicating photography suggestions to the user via reference images selected from a curated gallery. We leverage a visual language model (VLM) and an object detector to characterize the reference images via textual descriptions, and then use a large language model (LLM) to retrieve relevant reference images based on a user's language query through text-based reasoning. To establish correspondences between the reference image and the observed scene, we exploit pre-trained features from a vision transformer capable of capturing semantic similarity across marked appearance variations. Using these features, we compute pose adjustments for an RGB-D camera by solving a perspective-n-point (PnP) problem. We demonstrate our approach using a manipulator equipped with a wrist camera. Our user studies show that photos taken by PhotoBot are often more aesthetically pleasing than those taken by users themselves, as measured by human feedback. We also show that PhotoBot can generalize to other reference sources such as paintings.

2024-10-14

2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (published)

doi.org

arxiv.org
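PhotoBot computes its camera pose adjustment by solving a perspective-n-point (PnP) problem from matched features. For readers unfamiliar with PnP, the sketch below implements the classic linear Direct Linear Transform (DLT) variant in NumPy, assuming known intrinsics `K`, noise-free and already-matched 3D-2D correspondences, and at least six non-coplanar points. It is not PhotoBot's solver; a practical system would typically use a robust solver such as OpenCV's `cv2.solvePnP` or `cv2.solvePnPRansac` and would have to handle outliers from feature matching. The helper names are hypothetical.

```python
import numpy as np


def project(P, X):
    """Project N x 3 world points with a 3 x 4 camera matrix P to N x 2 pixels."""
    Xh = np.hstack([X, np.ones((len(X), 1))])
    x = Xh @ P.T
    return x[:, :2] / x[:, 2:3]


def dlt_pnp(K, X, uv):
    """Linear PnP: estimate [R | t] from >= 6 non-coplanar 3D-2D matches."""
    # Undo the intrinsics so the unknown 3 x 4 matrix is [R | t] itself.
    uv_h = np.hstack([uv, np.ones((len(uv), 1))])
    xn = (np.linalg.inv(K) @ uv_h.T).T
    rows = []
    zeros = np.zeros(4)
    for Xw, (u, v, _) in zip(X, xn):
        Xh = np.append(Xw, 1.0)
        # Each match gives two equations: p1.X - u p3.X = 0 and p2.X - v p3.X = 0.
        rows.append(np.concatenate([Xh, zeros, -u * Xh]))
        rows.append(np.concatenate([zeros, Xh, -v * Xh]))
    # Solution: right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(np.vstack(rows))
    Rt = Vt[-1].reshape(3, 4)
    # Fix the arbitrary scale and sign: unit-norm rotation rows and det(R) > 0.
    Rt = Rt / np.linalg.norm(Rt[2, :3])
    if np.linalg.det(Rt[:, :3]) < 0:
        Rt = -Rt
    return Rt


def rot_zx(a, b):
    """Composite rotation Rz(a) @ Rx(b), angles in radians."""
    ca, sa, cb, sb = np.cos(a), np.sin(a), np.cos(b), np.sin(b)
    Rz = np.array([[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cb, -sb], [0.0, sb, cb]])
    return Rz @ Rx


if __name__ == "__main__":
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    R, t = rot_zx(0.3, -0.2), np.array([0.1, -0.2, 4.0])
    X = np.random.default_rng(1).uniform(-1.0, 1.0, size=(10, 3))
    uv = project(K @ np.hstack([R, t[:, None]]), X)
    Rt = dlt_pnp(K, X, uv)
    print("max pose error:", np.abs(Rt - np.hstack([R, t[:, None]])).max())
```

With noisy matches, the rotation block of the linear estimate is only approximately orthonormal; it is usually projected onto a valid rotation and then refined by minimizing reprojection error.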

Working Backwards: Learning to Place by Picking

Oliver Limoyo

Abhisek Konar

Trevor Ablett

Jonathan Kelly

Francois Hogan

Gregory Dudek

2024-10-14

2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (published)

doi.org

arxiv.org

Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces

DiJia Su

Sainbayar Sukhbaatar

Michael Rabbat

Yuandong Tian

Qinqing Zheng

In human cognition theory, human thinking is governed by two systems: the fast and intuitive System 1 and the slower but more deliberative System 2. Recent studies have shown that incorporating System 2 processes into Transformers, including large language models (LLMs), significantly enhances their reasoning capabilities. Nevertheless, models that purely resemble System 2 thinking incur substantially higher computational costs and are much slower to respond. To address this challenge, we present Dualformer, a single Transformer model that seamlessly integrates both the fast and slow reasoning modes. Dualformer is obtained by training on data with randomized reasoning traces, where different parts of the traces are dropped during training. The dropping strategies are specifically tailored to the trace structure, analogous to analyzing our thinking process and creating shortcuts with patterns. At inference time, our model can be configured to output only the solution (fast mode), to output both the reasoning chain and the final solution (slow mode), or to decide automatically which mode to engage (auto mode). In all cases, Dualformer outperforms the corresponding baseline models in both performance and computational efficiency: (1) in slow mode, Dualformer optimally solves unseen 30 x 30 maze navigation tasks 97.6% of the time, surpassing the 93.3% achieved by the Searchformer baseline (trained on data with complete reasoning traces), while using 45.5% fewer reasoning steps; (2) in fast mode, Dualformer completes those tasks with an 80% optimal rate, significantly outperforming the Solution-Only model (trained on solution-only data), which has an optimal rate of only 30%. For math problems, our techniques also improve performance when fine-tuning LLMs, showing that they generalize beyond task-specific models.

2024-10-13

arXiv (preprint)

doi.org

arxiv.org
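Dualformer's training data mixes complete, partially dropped, and fully dropped reasoning traces. The sketch below shows that idea in its simplest generic form: for each example, sample whether to keep the whole trace, drop it entirely (a solution-only example, like fast-mode output), or drop a random subset of steps while preserving their order. The probabilities, the `Example` type, and uniform per-step dropping are assumptions; the paper's dropping strategies are tailored to the trace structure, which this sketch does not model.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    prompt: str
    trace: list  # ordered reasoning steps
    solution: str


def randomize_trace(ex, rng, p_full=0.25, p_none=0.25, p_step=0.5):
    """Return a copy of `ex` with a randomized reasoning trace.

    With prob. p_full keep every step, with prob. p_none drop the whole trace,
    otherwise drop each step independently with prob. p_step (order preserved).
    The solution is never dropped.
    """
    r = rng.random()
    if r < p_full:
        trace = list(ex.trace)
    elif r < p_full + p_none:
        trace = []
    else:
        trace = [step for step in ex.trace if rng.random() >= p_step]
    return Example(ex.prompt, trace, ex.solution)


def to_sequence(ex):
    """Flatten an example into one hypothetical training string."""
    return " ".join([ex.prompt, "<trace>", *ex.trace, "</trace>", ex.solution])


if __name__ == "__main__":
    rng = random.Random(0)
    ex = Example("start->goal", [f"s{i}" for i in range(6)], "path")
    for _ in range(4):
        print(to_sequence(randomize_trace(ex, rng)))
```

Because one model sees all three kinds of sequences, it can later be prompted to emit a full trace, a shortened one, or none at all before the solution.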

Dynamic Abstractions: Building the Next Generation of Cognitive Tools and Interfaces

Sangho Suh

Hai Dang

Ryan Yen

Josh M. Pollock

Ian Arawjo

Rubaiat Habib Kazi

Hariharan Subramonyam

Jingyi Li

Nazmus Saquib

Arvind Satyanarayan

2024-10-13

The 37th Annual ACM Symposium on User Interface Software and Technology (published)

doi.org

Effective Protein-Protein Interaction Exploration with PPIretrieval

Chenqing Hua

Connor W. Coley

Guy Wolf

Doina Precup

Shuangjia Zheng

2024-10-13

NeurIPS.cc/2024/Workshop/AIDrugX (poster)

doi.org

openreview.net

EnzymeFlow: Generating Reaction-specific Enzyme Catalytic Pockets through Flow Matching and Co-Evolutionary Dynamics

Chenqing Hua

Yong Liu

Dinghuai Zhang

Odin Zhang

Sitao Luan

Kevin K Yang

Guy Wolf

Doina Precup

Shuangjia Zheng

2024-10-13

NeurIPS.cc/2024/Workshop/AIDrugX (poster)

doi.org

openreview.net

Molphenix: A Multimodal Foundation Model for PhenoMolecular Retrieval

Philip Fradkin

Puria Azadi Moghadam

Karush Suri

Frederik Wenkel

Maciej Sypetkowski

Dominique Beaini

Predicting molecular impact on cellular function is a core challenge in therapeutic design. Phenomic experiments, designed to capture cellular morphology, use microscopy-based techniques and offer a high-throughput way to uncover molecular impact on the cell. In this work, we learn a joint latent space between molecular structures and microscopy phenomic experiments, aligning paired samples with contrastive learning. Specifically, we study the problem of Contrastive PhenoMolecular Retrieval, which consists of zero-shot molecular structure identification conditioned on phenomic experiments. We assess challenges in multi-modal learning of phenomic and molecular modalities, such as experimental batch effects, inactive molecule perturbations, and encoding perturbation concentration. We demonstrate improved retrieval by multi-modal learners through (1) a uni-modal pre-trained phenomics model, (2) a novel inter-sample similarity-aware loss, and (3) models conditioned on a representation of molecular concentration. Following this recipe, we propose MolPhenix, a molecular phenomics model. MolPhenix leverages a pre-trained phenomics model to demonstrate significant performance gains across perturbation concentrations, molecular scaffolds, and activity thresholds. In particular, we demonstrate an 8.1

2024-10-13

NeurIPS.cc/2024/Workshop/AIDrugX (poster)

openreview.net
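MolPhenix aligns paired phenomic and molecular embeddings with contrastive learning and then retrieves molecules zero-shot by similarity. As background, the sketch below implements the standard symmetric InfoNCE (CLIP-style) objective and a top-1 retrieval check in NumPy. It is only the common baseline: the paper's inter-sample similarity-aware loss, its concentration conditioning, and its encoders are not reproduced, and the temperature value is an arbitrary assumption.

```python
import numpy as np


def l2_normalize(z):
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def log_softmax(logits):
    """Row-wise log-softmax, stabilized by subtracting the row maximum."""
    m = logits.max(axis=1, keepdims=True)
    return logits - (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))


def symmetric_info_nce(z_pheno, z_mol, temperature=0.1):
    """CLIP-style loss: row i of each batch is the positive pair of row i of the other."""
    a, b = l2_normalize(z_pheno), l2_normalize(z_mol)
    logits = a @ b.T / temperature
    idx = np.arange(len(a))
    loss_p2m = -log_softmax(logits)[idx, idx].mean()    # phenomics -> molecule
    loss_m2p = -log_softmax(logits.T)[idx, idx].mean()  # molecule -> phenomics
    return 0.5 * (loss_p2m + loss_m2p)


def top1_retrieval_accuracy(z_pheno, z_mol):
    """Fraction of phenomic queries whose most similar molecule is the true pair."""
    sims = l2_normalize(z_pheno) @ l2_normalize(z_mol).T
    return float(np.mean(sims.argmax(axis=1) == np.arange(len(sims))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_pheno = rng.normal(size=(8, 32))
    z_mol = z_pheno + 0.1 * rng.normal(size=z_pheno.shape)  # well-aligned pairs
    print("loss:", symmetric_info_nce(z_pheno, z_mol))
    print("top-1:", top1_retrieval_accuracy(z_pheno, z_mol))
```

Every other molecule in the batch acts as a negative for a given phenomic query; the paper's similarity-aware loss reportedly refines how such negatives are treated, which this baseline does not attempt.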
