Publications

Training Compute-Optimal Vision Transformers for Brain Encoding
Sana Ahmadi
François Paugam
Tristan Glatard
Lune P Bellec
The optimal training of a vision transformer for brain encoding depends on three factors: model size, data size, and computational resources. This study investigates these three pillars, focusing on the effects of data scaling, model scaling, and high-performance computing on brain encoding results. Using VideoGPT to extract efficient spatiotemporal features from videos and training a Ridge model to predict brain activity from these features, we conducted benchmark experiments with varying data sizes (10k, 100k, 1M, 6M) and different model configurations of GPT-2, including hidden layer dimensions, number of layers, and number of attention heads. We also evaluated the effects of training models with 32-bit vs. 16-bit floating point representations. Our results demonstrate that increasing the hidden layer dimensions significantly improves brain encoding performance, as evidenced by higher Pearson correlation coefficients across all subjects. In contrast, the number of attention heads has no significant effect on the encoding results. Increasing the number of layers yields some improvement in brain encoding correlations, but the trend is less consistent than that observed for hidden layer dimensions. The data scaling results show that larger training datasets improve brain encoding performance, with the highest Pearson correlation coefficients observed for the largest dataset (6M). These findings highlight that data scaling has a more significant effect than model scaling on brain encoding performance. Finally, we compared 32-bit and 16-bit floating-point precision: training with 16-bit precision yielded the same brain encoding accuracy as 32-bit while reducing training time by a factor of 1.17, demonstrating its efficiency for high-performance computing tasks.
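As a rough illustration of the final step of such an encoding pipeline (not the paper's actual code), the sketch below fits a ridge model in closed form for a single feature dimension and scores the prediction with a Pearson correlation, using toy numbers in place of VideoGPT features and fMRI voxel activity:

```python
import math

def ridge_1d(x, y, lam=1.0):
    # Closed-form ridge solution for one feature: w = (x . y) / (x . x + lam)
    num = sum(xi * yi for xi, yi in zip(x, y))
    den = sum(xi * xi for xi in x) + lam
    return num / den

def pearson(a, b):
    # Pearson correlation coefficient, the metric used to score brain encoding.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    sa = math.sqrt(sum((ai - ma) ** 2 for ai in a))
    sb = math.sqrt(sum((bi - mb) ** 2 for bi in b))
    return cov / (sa * sb)

# Toy data: one spatiotemporal feature vs. one voxel's activity.
features = [0.5, 1.0, 1.5, 2.0, 2.5]
voxel = [1.1, 1.9, 3.2, 3.9, 5.1]
w = ridge_1d(features, voxel, lam=0.1)
pred = [w * xi for xi in features]
r = pearson(pred, voxel)
```

In the study, this fit is repeated per voxel over high-dimensional feature vectors; the closed-form scalar case above only shows the shape of the computation.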
BlabberSeg: Real-Time Embedded Open-Vocabulary Aerial Segmentation
Ricardo de Azambuja
Real-time aerial image segmentation plays an important role in the environmental perception of Uncrewed Aerial Vehicles (UAVs). We introduce BlabberSeg, an optimized Vision-Language Model built on CLIPSeg for on-board, real-time processing of aerial images by UAVs. BlabberSeg improves the efficiency of CLIPSeg by reusing prompt and model features, reducing computational overhead while achieving real-time open-vocabulary aerial segmentation. We validated BlabberSeg in a safe-landing scenario using the Dynamic Open-Vocabulary Enhanced SafE-Landing with Intelligence (DOVESEI) framework, which uses visual servoing and open-vocabulary segmentation. BlabberSeg reduces computational costs significantly, with a speed increase of 927.41% (16.78 Hz) on an NVIDIA Jetson Orin AGX (64 GB) compared with the original CLIPSeg (1.81 Hz), achieving real-time aerial segmentation with a negligible loss in accuracy (2.1%, measured as the ratio of correctly segmented area relative to CLIPSeg). BlabberSeg's source code is open and available online.
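The core efficiency idea, reusing prompt features that recur every frame rather than recomputing them, can be sketched generically (this is the caching pattern, not BlabberSeg's implementation; `embed_prompt` is a hypothetical stand-in for a text-encoder forward pass):

```python
from functools import lru_cache

calls = {"n": 0}  # counts how often the expensive encoder actually runs

@lru_cache(maxsize=None)
def embed_prompt(prompt: str):
    # Stand-in for an expensive text-encoder forward pass; the cache
    # ensures each distinct prompt is encoded exactly once.
    calls["n"] += 1
    return tuple(float(ord(c)) for c in prompt)  # hypothetical embedding

# Per-frame loop: the same open-vocabulary prompts recur on every frame,
# so all but the first lookup per prompt are cache hits.
for frame in range(100):
    for prompt in ("grass", "water", "pavement"):
        _ = embed_prompt(prompt)
```

With 100 frames and 3 prompts, the encoder runs 3 times instead of 300; amortizing fixed per-frame work in this way is what enables the reported speedup on embedded hardware.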
The Non-Local Model Merging Problem: Permutation Symmetries and Variance Collapse
Ekansh Sharma
Daniel M. Roy
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
Genta Indra Winata
Frederikus Hudi
Patrick Amadeus Irawan
David Anugraha
Rifki Afina Putri
Yutong Wang
Adam Nohejl
Ubaidillah Ariq Prathama
Nedjma OUSIDHOUM
Afifa Amriani
Anar Rzayev
Anirban Das
Ashmari Pramodya
Aulia Adila
Bryan Wilie
Candy Olivia Mawalim
Ching Lam Cheng
Daud Abolade
Emmanuele Chersoni
Enrico Santus
Fariz Ikhwantri
Garry Kuwanto
Hanyang Zhao
Haryo Akbarianto Wibowo
Holy Lovenia
Jan Christian Blaise Cruz
Jan Wira Gotama Putra
Junho Myung
Lucky Susanto
Maria Angelica Riera Machin
Marina Zhukova
Michael Anugraha
Muhammad Farid Adilazuarda
Natasha Santosa
Peerat Limkonchotiwat
Raj Dabre
Rio Alexander Audino
Samuel Cahyawijaya
Shi-Xiong Zhang
Stephanie Yulia Salim
Yi Zhou
Yinxuan Gui
En-Shiun Annie Lee
Shogo Okada
Ayu Purwarianti
Alham Fikri Aji
Taro Watanabe
Derry Tanti Wijaya
Alice Oh
Chong-Wah Ngo
Adversarial Bounding Boxes Generation (ABBG) Attack against Visual Object Trackers
Adversarial perturbations aim to deceive neural networks into predicting inaccurate results. For visual object trackers, adversarial attacks have been developed to generate perturbations by manipulating the outputs. However, transformer trackers predict a specific bounding box instead of an object candidate list, which limits the applicability of many existing attack scenarios. To address this issue, we present a novel white-box approach to attack visual object trackers with transformer backbones using only one bounding box. From the tracker predicted bounding box, we generate a list of adversarial bounding boxes and compute the adversarial loss for those bounding boxes. Experimental results demonstrate that our simple yet effective attack outperforms existing attacks against several robust transformer trackers, including TransT-M, ROMTrack, and MixFormer, on popular benchmark tracking datasets such as GOT-10k, UAV123, and VOT2022STS.
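The generate-candidates-then-score step can be illustrated with a toy sketch (the jitter scheme and L1 loss here are illustrative assumptions, not the paper's exact construction):

```python
import random

def generate_adversarial_boxes(box, n=5, shift=10, seed=0):
    # box = (x, y, w, h); jitter position and size around the tracker's
    # single predicted box to build a list of adversarial targets.
    rng = random.Random(seed)
    x, y, w, h = box
    return [(x + rng.randint(-shift, shift),
             y + rng.randint(-shift, shift),
             max(1, w + rng.randint(-shift, shift)),
             max(1, h + rng.randint(-shift, shift)))
            for _ in range(n)]

def adversarial_loss(pred_box, adv_boxes):
    # Mean L1 distance between the prediction and each adversarial target;
    # in a white-box attack this loss would be backpropagated to the input.
    return sum(sum(abs(p, ) if False else abs(p - a) for p, a in zip(pred_box, b))
               for b in adv_boxes) / len(adv_boxes)

pred = (50, 60, 120, 80)
advs = generate_adversarial_boxes(pred, n=5)
loss = adversarial_loss(pred, advs)
```

The key point mirrored here is that the attack needs only the one predicted box, not a candidate list, to define a differentiable objective.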
Comparative evaluation of methodologies for estimating the effectiveness of non-pharmaceutical interventions in the context of COVID-19: a simulation study
Iris Ganser
Juliette Paireau
David L Buckeridge
Simon Cauchemez
Rodolphe Thiébaut
M. Prague
Learning to Forget using Hypernetworks
Jose Miguel Lara Rangel
Usman Anwar
Stefan Schoepf
Jack Foster
David M. Krueger
Machine unlearning is gaining increasing attention as a way to remove adversarial data poisoning attacks from already trained models and to comply with privacy and AI regulations. The objective is to unlearn the effect of undesired data from a trained model while maintaining performance on the remaining data. This paper introduces HyperForget, a novel machine unlearning framework that leverages hypernetworks (neural networks that generate parameters for other networks) to dynamically sample models that lack knowledge of targeted data while preserving essential capabilities. Leveraging diffusion models, we implement two Diffusion HyperForget Networks and use them to sample unlearned models in proof-of-concept experiments. The unlearned models obtain zero accuracy on the forget set while preserving good accuracy on the retain sets, highlighting the potential of HyperForget for dynamic targeted data removal and a promising direction for developing adaptive machine unlearning algorithms.
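The hypernetwork mechanism itself, one network emitting the parameters of another as a function of a conditioning input, can be shown in miniature (a linear toy, with the "retain"/"forget" embeddings as purely hypothetical labels; HyperForget's actual networks are diffusion-based):

```python
def hypernet(z, H):
    # H holds the hypernetwork's own parameters; it emits the target
    # model's weights as a function of a task/condition embedding z.
    return [sum(hij * zj for hij, zj in zip(row, z)) for row in H]

def target_model(x, w):
    # The target network: a linear map whose weights come from the hypernet.
    return sum(wi * xi for wi, xi in zip(w, x))

# Two hypothetical condition embeddings: "retain" vs. "forget".
H = [[1.0, 0.0],
     [0.0, 1.0]]
w_retain = hypernet([1.0, 2.0], H)   # weights that keep the capability
w_forget = hypernet([0.0, 0.0], H)   # zeroed weights ~ knowledge removed
y_retain = target_model([3.0, 4.0], w_retain)
y_forget = target_model([3.0, 4.0], w_forget)
```

The point of the sketch is only the indirection: sampling different conditioning inputs yields different target-model parameter sets without retraining the target model itself.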
Active Semantic Mapping and Pose Graph Spectral Analysis for Robot Exploration
Local Linearity is All You Need (in Data-Driven Teleoperation)
Matthew E. Taylor
Martin Jagersand
Justus Piater
Samuele Tosatto
One of the critical aspects of assistive robotics is to control a high-dimensional robot from a low-dimensional user input (i.e., a 2D joystick). Data-driven teleoperation seeks to provide an intuitive user interface, called an action map, that maps the low-dimensional input to robot velocities learned from human demonstrations. Action maps are machine learning models trained on robotic demonstration data to map user input directly to desired movements, as opposed to aspects of robot pose ("move to cup or pour content" vs. "move along x- or y-axis"). Many works have investigated nonlinear action maps with multi-layer perceptrons, but recent work suggests that local-linear neural approximations provide better control of the system. However, local-linear models assume actions exist on a linear subspace and may not capture nuanced motions in training data. In this work, we hypothesize that local-linear neural networks are effective because they make the action map odd w.r.t. the user input, enhancing the intuitiveness of the controller. Based on this assumption, we propose two nonlinear means of encoding odd behavior that do not constrain the action map to a local-linear function. However, our analysis reveals that these models effectively behave like local-linear models for the relevant mappings between user joysticks and robot movements. We support this claim in simulation, and show on a real-world use case that there is no statistical benefit of using nonlinear maps in terms of the users' experience. These negative results suggest that further investigation into model architectures beyond local-linear models may offer diminishing returns for improving user experience in data-driven teleoperation systems.
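The odd-symmetry property at the center of the hypothesis (pushing the joystick the opposite way reverses the motion: f(-u) = -f(u)) can be imposed on any map by antisymmetrization. This sketch uses a hypothetical nonlinear map, not one of the paper's two proposed encodings:

```python
def raw_map(u):
    # Hypothetical nonlinear action map (2D joystick -> 2D velocity).
    return [u[0] ** 2 + u[1], u[0] - 0.5 * u[1] ** 3]

def odd_map(u):
    # Antisymmetrize: f_odd(u) = (f(u) - f(-u)) / 2 guarantees
    # f_odd(-u) = -f_odd(u) for any underlying map f.
    pos = raw_map(u)
    neg = raw_map([-ui for ui in u])
    return [(p - n) / 2 for p, n in zip(pos, neg)]

u = [0.3, -0.7]
f_u = odd_map(u)
f_neg_u = odd_map([-ui for ui in u])  # exactly the negation of f_u
```

Note the construction discards the even part of `raw_map` (the `u[0] ** 2` term), which is one way to see why odd-constrained maps can end up behaving much like local-linear ones.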
PhotoBot: Reference-Guided Interactive Photography via Natural Language
Oliver Limoyo
Jimmy Li
Dmitriy Rivkin
Jonathan Kelly
We introduce PhotoBot, a framework for fully automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer. We propose to communicate photography suggestions to the user via reference images that are selected from a curated gallery. We leverage a visual language model (VLM) and an object detector to characterize the reference images via textual descriptions and then use a large language model (LLM) to retrieve relevant reference images based on a user's language query through text-based reasoning. To correspond the reference image and the observed scene, we exploit pre-trained features from a vision transformer capable of capturing semantic similarity across marked appearance variations. Using these features, we compute pose adjustments for an RGB-D camera by solving a perspective-n-point (PnP) problem. We demonstrate our approach using a manipulator equipped with a wrist camera. Our user studies show that photos taken by PhotoBot are often more aesthetically pleasing than those taken by users themselves, as measured by human feedback. We also show that PhotoBot can generalize to other reference sources such as paintings.
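In practice a PnP problem is solved with an off-the-shelf routine (e.g. OpenCV's `solvePnP`). What PnP minimizes is the reprojection residual of a pinhole camera model, which this self-contained sketch evaluates at the trivial identity pose; the intrinsics are made-up numbers:

```python
def project(point3d, f, cx, cy):
    # Pinhole projection: the forward camera model that PnP inverts.
    X, Y, Z = point3d
    return (f * X / Z + cx, f * Y / Z + cy)

def reprojection_error(points3d, points2d, f, cx, cy):
    # PnP searches over camera pose to minimize this mean residual;
    # here the pose is fixed to identity for illustration.
    err = 0.0
    for p3, p2 in zip(points3d, points2d):
        u, v = project(p3, f, cx, cy)
        err += ((u - p2[0]) ** 2 + (v - p2[1]) ** 2) ** 0.5
    return err / len(points3d)

# Synthesize 2D observations from known 3D points, then score the fit.
pts3 = [(0.0, 0.0, 2.0), (0.5, 0.0, 2.0), (0.0, 0.5, 2.0)]
pts2 = [project(p, f=600.0, cx=320.0, cy=240.0) for p in pts3]
err = reprojection_error(pts3, pts2, f=600.0, cx=320.0, cy=240.0)
```

In PhotoBot's setting, the 3D points come from the RGB-D camera and the 2D points from vision-transformer feature correspondences with the reference image; the recovered pose becomes the camera adjustment.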
The Canadian VirusSeq Data Portal and Duotang: open resources for SARS-CoV-2 viral sequences and genomic epidemiology
Erin E. Gill
Baofeng Jia
Carmen Lia Murall
Raphaël Poujol
Muhammad Zohaib Anwar
Nithu Sara John
Justin Richardsson
Ashley Hobb
Abayomi S. Olabode
Alexandru Lepsa
Ana T. Duggan
Andrea D. Tyler
Arnaud N'Guessan
Atul Kachru
Brandon Chan
Catherine Yoshida
Christina K. Yung
David Bujold
Dusan Andric
Edmund Su
Emma J. Griffiths
Gary Van Domselaar
Gordon W. Jolly
Heather K. E. Ward
Henrich Feher
Jared Baker
Jared T. Simpson
Jaser Uddin
Jiannis Ragoussis
Jon Eubank
Jörg H. Fritz
José Héctor Gálvez
Karen Fang
Kim Cullion
Leonardo Rivera
Linda Xiang
Matthew A. Croxen
Mitchell Shiell
Natalie Prystajecky
Pierre-Olivier Quirion
Rosita Bajari
Samantha Rich
Samira Mubareka
Sandrine Moreira
Scott Cain
Steven G. Sutcliffe
Susanne A. Kraemer
Yelizar Alturmessov
Yann Joly
Marc Fiume
Terrance P. Snutch
Cindy Bell
Catalina Lopez-Correa
Julie G. Hussin
Jeffrey B. Joy
Caroline Colijn
Paul M. K. Gordon
William W. L. Hsiao
Art F. Y. Poon
Natalie C. Knox
Mélanie Courtot
Lincoln Stein
Sarah P. Otto
Guillaume Bourque
B. Jesse Shapiro
Fiona S. L. Brinkman
The COVID-19 pandemic led to a large global effort to sequence SARS-CoV-2 genomes from patient samples to track viral evolution and inform the public health response. Millions of SARS-CoV-2 genome sequences have been deposited in global public repositories. The Canadian COVID-19 Genomics Network (CanCOGeN – VirusSeq), a consortium tasked with coordinating expanded sequencing of SARS-CoV-2 genomes across Canada early in the pandemic, created the Canadian VirusSeq Data Portal, with associated data pipelines and procedures, to support these efforts. The goal of VirusSeq was to allow open access to Canadian SARS-CoV-2 genomic sequences and enhanced, standardized contextual data that were unavailable in other repositories and that meet FAIR standards (Findable, Accessible, Interoperable and Reusable). In addition, the portal data submission pipeline contains data quality checking procedures and appropriate acknowledgement of data generators that encourages collaboration. From inception to execution, the portal was developed with a conscientious focus on strong data governance principles and practices. Extensive efforts ensured a commitment to Canadian privacy laws, data security standards, and organizational processes. This portal has been coupled with other resources, such as Viral AI, and was further leveraged by the Coronavirus Variants Rapid Response Network (CoVaRR-Net) to produce a suite of continually updated analytical tools and notebooks. Here we highlight this portal (https://virusseq-dataportal.ca/), including its contextual data not available elsewhere, and the Duotang (https://covarr-net.github.io/duotang/duotang.html), a web platform that presents key genomic epidemiology and modelling analyses on circulating and emerging SARS-CoV-2 variants in Canada.
Duotang presents dynamic changes in variant composition of SARS-CoV-2 in Canada and by province, estimates variant growth, and displays complementary interactive visualizations, with a text overview of the current situation. The VirusSeq Data Portal and Duotang resources, alongside additional analyses and resources computed from the portal (COVID-MVP, CoVizu), are all open source and freely available. Together, they provide an updated picture of SARS-CoV-2 evolution to spur scientific discussions, inform public discourse, and support communication with and within public health authorities. They also serve as a framework for other jurisdictions interested in open, collaborative sequence data sharing and analyses.
Working Backwards: Learning to Place by Picking
Oliver Limoyo
Trevor Ablett
Jonathan Kelly
Francois Hogan