Publications

The Non-Local Model Merging Problem: Permutation Symmetries and Variance Collapse

Ekansh Sharma

Daniel M. Roy

Gintare Karolina Dziugaite

2024-10-16

ArXiv (preprint)

doi.org

arxiv.org

Adversarial Bounding Boxes Generation (ABBG) Attack against Visual Object Trackers

Fatemeh Nourilenjan Nokabadi

Jean-Francois Lalonde

Christian Gagné

Adversarial perturbations aim to deceive neural networks into predicting inaccurate results. For visual object trackers, adversarial attacks have been developed to generate perturbations by manipulating the outputs. However, transformer trackers predict a specific bounding box instead of an object candidate list, which limits the applicability of many existing attack scenarios. To address this issue, we present a novel white-box approach to attack visual object trackers with transformer backbones using only one bounding box. From the tracker's predicted bounding box, we generate a list of adversarial bounding boxes and compute the adversarial loss for those bounding boxes. Experimental results demonstrate that our simple yet effective attack outperforms existing attacks against several robust transformer trackers, including TransT-M, ROMTrack, and MixFormer, on popular benchmark tracking datasets such as GOT-10k, UAV123, and VOT2022STS.

2024-10-15

NeurIPS.cc/2024/Workshop/AdvML-Frontiers (published)

doi.org

openreview.net

Learning to Forget using Hypernetworks

Jose Miguel Lara Rangel

Usman Anwar

Stefan Schoepf

Jack Foster

David Scott Krueger

Machine unlearning is gaining increasing attention as a way to remove adversarial data poisoning attacks from already trained models and to comply with privacy and AI regulations. The objective is to unlearn the effect of undesired data from a trained model while maintaining performance on the remaining data. This paper introduces HyperForget, a novel machine unlearning framework that leverages hypernetworks (neural networks that generate parameters for other networks) to dynamically sample models that lack knowledge of targeted data while preserving essential capabilities. Leveraging diffusion models, we implement two Diffusion HyperForget Networks and use them to sample unlearned models in proof-of-concept experiments. The unlearned models obtained zero accuracy on the forget set while preserving good accuracy on the retain sets, highlighting the potential of HyperForget for dynamic targeted data removal and a promising direction for developing adaptive machine unlearning algorithms.

2024-10-15

NeurIPS.cc/2024/Workshop/AdvML-Frontiers (published)

openreview.net

Structure-function coupling and decoupling during movie-watching and resting-state: Novel insights bridging EEG and structural imaging

Venkatesh Subramani

Giulia Lioi

Karim Jerbi

Nicolas Farrugia

The intricate structural and functional architecture of the brain enables a wide range of cognitive processes ranging from perception and action to higher-order abstract thinking. Despite important progress, the relationship between the brain’s structural and functional properties is not yet fully established. In particular, the way the brain’s anatomy shapes its electrophysiological dynamics remains elusive. The electroencephalography (EEG) activity recorded during naturalistic tasks is thought to exhibit patterns of coupling with the underlying brain structure that vary as a function of behavior. Yet these patterns have not been sufficiently quantified. We address this gap by jointly examining individual Diffusion-Weighted Imaging (DWI) scans and continuous EEG recorded during video-watching and resting state, using a Graph Signal Processing (GSP) framework. By decomposing the structural graph into eigenmodes and expressing the EEG activity as an extension of anatomy, GSP provides a way to quantify the structure-function coupling. We elucidate how the structure shapes function during naturalistic tasks such as movie-watching and how this association is modulated by tasks. We quantify the coupling relationship in a region-, time-, and frequency-resolved manner. First, our findings indicate that the EEG activity in the sensorimotor cortex is strongly coupled with brain structure, while the activity in higher-order systems is less constrained by anatomy, i.e., shows more flexibility. In addition, we found that watching videos was associated with stronger structure-function coupling in the sensorimotor cortex, as compared to resting-state data. Second, time-resolved analysis revealed that the unimodal systems undergo minimal temporal fluctuation in structure-function association, and the transmodal system displays the highest temporal fluctuations, with the exception of the PCC, which shows low fluctuations. Lastly, our frequency-resolved analysis revealed a consistent topography across different EEG rhythms, suggesting a similar relationship with the anatomical structure across frequency bands. Together, this unprecedented characterization of the link between structure and function using continuous EEG during naturalistic behavior underscores the role of anatomy in shaping ongoing cognitive processes. By combining the temporal and spectral resolution of EEG and the methodological advantages of GSP, our work sheds new light onto the anatomo-functional organization of the brain.

2024-10-15

bioRxiv (preprint)

doi.org

TrackPGD: Efficient Adversarial Attack using Object Binary Masks against Robust Transformer Trackers

Fatemeh Nourilenjan Nokabadi

Yann Batiste Pequignot

Jean-Francois Lalonde

Christian Gagné

Adversarial perturbations can deceive neural networks by adding small, imperceptible noise to the input. Recent object trackers with transformer backbones have shown strong performance on tracking datasets, but their adversarial robustness has not been thoroughly evaluated. While transformer trackers are resilient to black-box attacks, existing white-box adversarial attacks are not universally applicable against these new transformer trackers due to differences in backbone architecture. In this work, we introduce TrackPGD, a novel white-box attack that utilizes predicted object binary masks to target robust transformer trackers. Built upon the powerful segmentation attack SegPGD, our proposed TrackPGD effectively influences the decisions of transformer-based trackers. Our method addresses two primary challenges in adapting a segmentation attack for trackers: limited class numbers and extreme pixel class imbalance. TrackPGD uses the same number of iterations as other attack methods for tracker networks and produces competitive adversarial examples that mislead transformer and non-transformer trackers such as MixFormerM, OSTrackSTS, TransT-SEG, and RTS on datasets including VOT2022STS, DAVIS2016, UAV123, and GOT-10k.

2024-10-15

NeurIPS.cc/2024/Workshop/AdvML-Frontiers (published)

openreview.net

Active Semantic Mapping and Pose Graph Spectral Analysis for Robot Exploration

Rongge Zhang

Haechan Mark Bong

Giovanni Beltrame

2024-10-14

2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (published)

doi.org

arxiv.org

Local Linearity is All You Need (in Data-Driven Teleoperation)

Michael Przystupa

Gauthier Gidel

Matthew E. Taylor

Martin Jägersand

Justus Piater

Samuele Tosatto

One of the critical aspects of assistive robotics is to provide a control system of a high-dimensional robot from a low-dimensional user input (e.g., a 2D joystick). Data-driven teleoperation seeks to provide an intuitive user interface called an action map to map the low-dimensional input to robot velocities from human demonstrations. Action maps are machine learning models trained on robotic demonstration data to map user input directly to desired movements as opposed to aspects of robot pose ("move to cup or pour content" vs. "move along x- or y-axis"). Many works have investigated nonlinear action maps with multi-layer perceptrons, but recent work suggests that local-linear neural approximations provide better control of the system. However, local-linear models assume actions exist on a linear subspace and may not capture nuanced motions in training data. In this work, we hypothesize that local-linear neural networks are effective because they make the action map odd w.r.t. the user input, enhancing the intuitiveness of the controller. Based on this assumption, we propose two nonlinear means of encoding odd behavior that do not constrain the action map to a local-linear function. However, our analysis reveals that these models effectively behave like local-linear models for relevant mappings between user joysticks and robot movements. We support this claim in simulation, and show on a real-world use case that there is no statistical benefit of using nonlinear maps, according to the users' experience. These negative results suggest that further investigation into model architectures beyond local-linear models may offer diminishing returns for improving user experience in data-driven teleoperation systems.

2024-10-14

2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (published)

doi.org

PhotoBot: Reference-Guided Interactive Photography via Natural Language

Oliver Limoyo

Jimmy Li

Dmitriy Rivkin

Jonathan Kelly

Gregory Dudek

We introduce PhotoBot, a framework for fully automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer. We propose to communicate photography suggestions to the user via reference images that are selected from a curated gallery. We leverage a visual language model (VLM) and an object detector to characterize the reference images via textual descriptions and then use a large language model (LLM) to retrieve relevant reference images based on a user's language query through text-based reasoning. To establish correspondences between the reference image and the observed scene, we exploit pre-trained features from a vision transformer capable of capturing semantic similarity across marked appearance variations. Using these features, we compute pose adjustments for an RGB-D camera by solving a perspective-n-point (PnP) problem. We demonstrate our approach using a manipulator equipped with a wrist camera. Our user studies show that photos taken by PhotoBot are often more aesthetically pleasing than those taken by users themselves, as measured by human feedback. We also show that PhotoBot can generalize to other reference sources such as paintings.

2024-10-14

2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (published)

doi.org

arxiv.org
