Publications

TrackPGD: Efficient Adversarial Attack using Object Binary Masks against Robust Transformer Trackers

Fatemeh Nourilenjan Nokabadi

Yann Batiste Pequignot

Jean-François Lalonde

Christian Gagné

2025-05-27

Proceedings of the Conference on Robots and Vision (published)

doi.org

openreview.net

ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs

Pooneh Mousavi

Yingzhi Wang

Mirco Ravanelli

Cem Subakan

2025-05-26

ArXiv (preprint)

arxiv.org

Improving Multilingual Math Reasoning for African Languages

Odunayo Ogundepo

Akintunde Oladipo

Kelechi Ogueji

Esther Adenuga

David Ifeoluwa Adelani

Jimmy Lin

Researchers working on low-resource languages face persistent challenges due to limited data availability and restricted access to computational resources. Although most large language models (LLMs) are predominantly trained in high-resource languages, adapting them to low-resource contexts, particularly African languages, requires specialized techniques. Several strategies have emerged for adapting models to low-resource languages in today's LLM landscape, defined by multi-stage pre-training and post-training paradigms. However, the most effective approaches remain uncertain. This work systematically investigates which adaptation strategies yield the best performance when extending existing LLMs to African languages. We conduct extensive experiments and ablation studies to evaluate different combinations of data types (translated versus synthetically generated), training stages (pre-training versus post-training), and other model adaptation configurations. Our experiments focus on mathematical reasoning tasks, using the Llama 3.1 model family as our base model.

2025-05-26

ArXiv (preprint)

arxiv.org

REARANK: Reasoning Re-ranking Agent via Reinforcement Learning

Le Zhang

Bo Wang

Xipeng Qiu

Siva Reddy

Aishwarya Agrawal

We present REARANK, a large language model (LLM)-based listwise reasoning reranking agent. REARANK explicitly reasons before reranking, significantly improving both performance and interpretability. Leveraging reinforcement learning and data augmentation, REARANK achieves substantial improvements over baseline models across popular information retrieval benchmarks, notably requiring only 179 annotated samples. Built on top of Qwen2.5-7B, our REARANK-7B demonstrates performance comparable to GPT-4 on both in-domain and out-of-domain benchmarks and even surpasses GPT-4 on reasoning-intensive BRIGHT benchmarks. These results underscore the effectiveness of our approach and highlight how reinforcement learning can enhance LLM reasoning capabilities in reranking.

2025-05-26

ArXiv (preprint)

arxiv.org

SCAR: Shapley Credit Assignment for More Efficient RLHF

Meng Cao

Shuyuan Zhang

Xiaojun Chang

Doina Precup

2025-05-26

ArXiv (preprint)

arxiv.org

The NaijaVoices Dataset: Cultivating Large-Scale, High-Quality, Culturally-Rich Speech Data for African Languages

Chris Emezue

The NaijaVoices Community

Busayo Awobade

Abraham Owodunni

Handel Emezue

Gloria Monica Tobechukwu Emezue

N. N. Emezue

Sewade Ogun

Bunmi Akinremi

David Ifeoluwa Adelani

Chris Pal

2025-05-26

ArXiv (preprint)

arxiv.org

BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Behavioural Change

Manuela González-González

Soufiane Belharbi

Muhammad Osama Zeeshan

Masoumeh Sharafi

Muhammad Haseeb Aslam

Marco Pedersoli

Alessandro Lameiras Koerich

Simon Bacon

Eric Granger

Recognizing complex emotions linked to ambivalence and hesitancy (A/H) can play a critical role in the personalization and effectiveness of digital behaviour change interventions. These subtle and conflicting emotions are manifested by a discord between multiple modalities, such as facial and vocal expressions, and body language. Although experts can be trained to identify A/H, integrating them into digital interventions is costly and less effective. Automatic learning systems provide a cost-effective alternative that can adapt to individual users and operate seamlessly within real-time and resource-limited environments. However, there are currently no datasets available for the design of ML models to recognize A/H. This paper introduces a first Behavioural Ambivalence/Hesitancy (BAH) dataset collected for subject-based multimodal recognition of A/H in videos. It contains videos from 224 participants of different ages and ethnicities, captured across 9 provinces in Canada. Through our web platform, we recruited participants to answer 7 questions, some of which were designed to elicit A/H, while recording themselves via webcam with microphone. BAH amounts to 1,118 videos for a total duration of 8.26 hours, with 1.5 hours of A/H. Our behavioural team annotated timestamp segments to indicate where A/H occurs, and provided frame- and video-level annotations with the A/H cues. Video transcripts and their timestamps are also included, along with cropped and aligned faces in each frame, and a variety of participant metadata. We include baseline results for BAH at frame- and video-level recognition in multimodal setups, in addition to zero-shot prediction, and for personalization using unsupervised domain adaptation. The limited performance of baseline models highlights the challenges of recognizing A/H in real-world videos. The data, code, and pretrained weights are available.

2025-05-25

ArXiv (preprint)

arxiv.org

Caption This, Reason That: VLMs Caught in the Middle

Zihan Weng

Lucas Gomez

Taylor Whittington Webb

Pouya Bashivan

2025-05-24

ArXiv (preprint)

arxiv.org

Leveraging Per-Instance Privacy for Machine Unlearning

Nazanin Mohammadi Sepahvand

Anvith Thudi

Berivan Isik

Ashmita Bhattacharyya

Nicolas Papernot

Eleni Triantafillou

Daniel M. Roy

Gintare Karolina Dziugaite

2025-05-24

ArXiv (preprint)

arxiv.org

LiSTEN: Learning Soft Token Embeddings for Neural Audio LLMs

Pooneh Mousavi

Shubham Gupta

Cem Subakan

Mirco Ravanelli

Foundation models based on large language models (LLMs) have shown great success in handling various tasks and modalities. However, adapting these models for general-purpose audio-language tasks is challenging due to differences in acoustic environments and task variations. In this work, we introduce LiSTEN (Learning Soft Token Embeddings for Neural Audio LLMs), a framework for adapting LLMs to speech and audio tasks. LiSTEN uses a dynamic prompt selection strategy with learnable key-value pairs, allowing the model to balance general and task-specific knowledge while avoiding overfitting in a multitask setting. Our approach reduces dependence on large-scale ASR or captioning datasets, achieves competitive performance with fewer trainable parameters, and simplifies training by using a single-stage process. Additionally, LiSTEN enhances interpretability by analyzing the diversity and overlap of selected prompts across different tasks.

2025-05-24

ArXiv (preprint)

arxiv.org

Response letter to “Confounding by indication and exposure misclassification may undermine corticosteroid effect estimates in ICU patients with alcohol-related hepatitis”

Guillaume Dumas

Maxime Gasperment

Hafid AIT-OUFELLA

2025-05-24

Annals of Intensive Care (published)

doi.org

Introduction to the special issue on Computational Terminology

Ayla Rigouts Terryn

Patrick Drouin

2025-05-23

Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication (published)

doi.org
