Publications

One Demo Is All It Takes: Planning Domain Derivation with LLMs from A Single Demonstration

Jinbang Huang

Yixin Xiao

Zhanguang Zhang

Jianye Hao

Yingxue Zhang

Pre-trained Large Language Models (LLMs) have shown promise in solving planning problems but often struggle to ensure plan correctness, especially for long-horizon tasks. Meanwhile, traditional robotic task and motion planning (TAMP) frameworks address these challenges more reliably by combining high-level symbolic search with low-level motion planning. However, TAMP relies on the availability of planning domains that typically involve substantial manual effort and domain expertise, limiting its generalizability. We introduce Planning Domain Derivation with LLMs (PDDLLM), a novel approach that combines simulated physical interaction with LLM reasoning to improve planning performance. The method reduces reliance on humans by inferring planning domains from a single annotated task-execution demonstration. Unlike prior domain-inference methods that rely on partially predefined planning domains or on language descriptions of them, PDDLLM constructs domains entirely from scratch and automatically integrates them with low-level motion planning skills, enabling fully automated long-horizon planning. PDDLLM is evaluated on over 1,200 diverse tasks spanning nine environments and benchmarked against six LLM-based planning baselines, demonstrating superior planning performance, lower token costs, and successful deployment on multiple robot platforms.

2025-05-28

thecvf.com/CVPR/2025/Workshop/FMEA (oral presentation)

openreview.net

Artificial Neural Networks for Magnetoencephalography: A review of an emerging field

Vanessa Hadid

Magnetoencephalography (MEG) is a cutting-edge neuroimaging technique that measures the intricate brain dynamics underlying cognitive processes with an unparalleled combination of high temporal and spatial precision. MEG data analytics has always relied on advanced signal processing and mathematical and statistical tools for various tasks ranging from data cleaning to probing the signals' rich dynamics and estimating the neural sources underlying the surface-level recordings. As in most domains, the surge in Artificial Intelligence (AI) has led to the increased use of Machine Learning (ML) methods for MEG data classification. More recently, an emerging trend in this field is using Artificial Neural Networks (ANNs) to address many MEG-related tasks. This review provides a comprehensive overview of how ANNs are being used with MEG data from three vantage points: First, we review work that employs ANNs for MEG signal classification, i.e., for brain decoding. Second, we report on work that has used ANNs as putative models of information processing in the human brain. Finally, we examine studies that use ANNs as techniques to tackle methodological questions in MEG, including artifact correction and source estimation. Furthermore, we assess the current strengths and limitations of using ANNs with MEG and discuss future challenges and opportunities in this field. Finally, by establishing a detailed portrait of the field and providing practical recommendations for the future, this review seeks to provide a helpful reference for both seasoned MEG researchers and newcomers to the field who are interested in using ANNs to enhance the exploration of the complex dynamics of the human brain with MEG.

2025-05-27

Journal of Neural Engineering (published)

doi.org

arxiv.org

Charting the Landscape of African NLP: Mapping Progress and Shaping the Road Ahead

Jesujoba Oluwadara Alabi

Michael A. Hedderich

David Ifeoluwa Adelani

Dietrich Klakow

2025-05-27

ArXiv (preprint)

doi.org

arxiv.org

Combining cortical and spinal stimulation maximizes improvement of gait after spinal cord injury

Roxanne Drainville

Marco Bonizzato

Davide Burchielli

Rose Guay-Hottin

Alexandre Sheasby

Marina Martinez

2025-05-27

bioRxiv (preprint)

doi.org

A Python Toolbox for Representational Similarity Analysis

Jasper JF van den Bosch

Tal Golan

Benjamin Peters

JohnMark Taylor

Mahdiyar Shahbazi

Baihan Lin

Ian Charest

Jörn Diedrichsen

Nikolaus Kriegeskorte

Marieke Mur

Heiko H. Schütt

2025-05-27

bioRxiv (preprint)

doi.org

Quantifying antimicrobial resistance in food-producing animals in North America

Mohamed Mediouni

Abdoulaye Banire Diallo

Vladimir Makarenkov

The global misuse of antimicrobial medication has further exacerbated the problem of antimicrobial resistance (AMR), enriching the pool of genetic mechanisms previously adopted by bacteria to evade antimicrobial drugs. AMR can be either intrinsic or acquired. It can be acquired either by selective genetic modification or by horizontal gene transfer that allows microorganisms to incorporate novel genes from other organisms or environments into their genomes. To avoid an eventual antimicrobial mistreatment, the use of antimicrobials in farm animals has recently been reconsidered in many countries. We present a systematic review of the literature discussing the cases of AMR and the related restrictions applied in North American countries (including Canada, Mexico, and the USA). The Google Scholar, PubMed, Embase, Web of Science, and Cochrane databases were searched to find plausible information on antimicrobial use and resistance in food-producing animals, covering the time period from 2015 to 2024. A total of 580 articles addressing the issue of antibiotic resistance in food-producing animals in North America met our inclusion criteria. Different AMR rates, depending on the bacterium being observed, the antibiotic class being used, and the farm animal being considered, have been identified. We determined that the highest average AMR rates have been observed for pigs (60.63% on average), the medium for cattle (48.94% on average), and the lowest for poultry (28.43% on average). We also found that Cephalosporins, Penicillins, and Tetracyclines are the antibiotic classes with the highest average AMR rates (65.86%, 61.32%, and 58.82%, respectively), whereas the use of Sulfonamides and Quinolones leads to the lowest average AMR (21.59% and 28.07%, respectively). Moreover, our analysis of antibiotic-resistant bacteria shows that Streptococcus suis (S. suis) and S. aureus show the highest average AMR rates (71.81% and 69.48%, respectively), whereas Campylobacter spp. shows the lowest one (29.75%). The highest average AMR percentage, 57.46%, was observed in Mexico, followed by Canada at 45.22%, and the USA at 42.25%, which is most probably due to the presence of various AMR control strategies, such as stewardship programs and AMR surveillance bodies, existing in Canada and the USA. Our review highlights the need for better strategies and regulations to control the spread of AMR in North America.

2025-05-27

Frontiers in Microbiology (published)

doi.org

Rendering-Aware Reinforcement Learning for Vector Graphics Generation

Juan A. Rodriguez

Haotian Zhang

Abhay Puri

Aarash Feizi

Rishav Pramanik

Pascal Wichmann

Arnab Mondal

Mohammad Reza Samsami

Rabiul Awal

Perouz Taslakian

Spandana Gella

Sai Rajeswar

David Vazquez

Chris Pal

Marco Pedersoli

Scalable Vector Graphics (SVG) offer a powerful format for representing visual designs as interpretable code. Recent advances in vision-language models (VLMs) have enabled high-quality SVG generation by framing the problem as a code generation task and leveraging large-scale pretraining. VLMs are particularly suitable for this task as they capture both global semantics and fine-grained visual patterns, while transferring knowledge across vision, natural language, and code domains. However, existing VLM approaches often struggle to produce faithful and efficient SVGs because they never observe the rendered images during training. Although differentiable rendering for autoregressive SVG code generation remains unavailable, rendered outputs can still be compared to original inputs, enabling evaluative feedback suitable for reinforcement learning (RL). We introduce RLRF (Reinforcement Learning from Rendering Feedback), an RL method that enhances SVG generation in autoregressive VLMs by leveraging feedback from rendered SVG outputs. Given an input image, the model generates SVG roll-outs that are rendered and compared to the original image to compute a reward. This visual fidelity feedback guides the model toward producing more accurate, efficient, and semantically coherent SVGs. RLRF significantly outperforms supervised fine-tuning, addressing common failure modes and enabling precise, high-quality SVG generation with strong structural understanding and generalization.

2025-05-27

ArXiv (preprint)

doi.org

arxiv.org

TrackPGD: Efficient Adversarial Attack using Object Binary Masks against Robust Transformer Trackers

Fatemeh Nourilenjan Nokabadi

Yann Batiste Pequignot

Jean-Francois Lalonde

Christian Gagné

2025-05-27

Proceedings of the Conference on Robots and Vision (published)

doi.org

openreview.net

Disentangled Source-Free Personalization for Facial Expression Recognition with Neutral Target Data

Masoumeh Sharafi

Emma Ollivier

Muhammad Osama Zeeshan

Soufiane Belharbi

Marco Pedersoli

Alessandro Lameiras Koerich

Simon Bacon

Eric Granger

2025-05-26

2025 IEEE 19th International Conference on Automatic Face and Gesture Recognition (FG) (published)

doi.org

arxiv.org

The NaijaVoices Dataset: Cultivating Large-Scale, High-Quality, Culturally-Rich Speech Data for African Languages

Chris Emezue

The NaijaVoices Community

Busayo Awobade

Abraham Owodunni

Handel Emezue

Gloria Monica Tobechukwu Emezue

N. N. Emezue

Sewade Ogun

Bunmi Akinremi

David Ifeoluwa Adelani

Chris Pal

The development of high-performing, robust, and reliable speech technologies depends on large, high-quality datasets. However, African languages -- including our focus, Igbo, Hausa, and Yoruba -- remain under-represented due to insufficient data. Popular voice-enabled technologies do not support any of the 2000+ African languages, limiting accessibility for approximately one billion people. While previous dataset efforts exist for the target languages, they lack the scale and diversity needed for robust speech models. To bridge this gap, we introduce the NaijaVoices dataset, a 1,800-hour speech-text dataset with 5,000+ speakers. We outline our unique data collection approach, analyze its acoustic diversity, and demonstrate its impact through finetuning experiments on automatic speech recognition, achieving on average 75.86% (Whisper), 52.06% (MMS), and 42.33% (XLSR) WER improvements. These results highlight NaijaVoices' potential to advance multilingual speech processing for African languages.

2025-05-26

ArXiv (preprint)

doi.org

arxiv.org

BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Behavioural Change

Manuela González-González

Soufiane Belharbi

Muhammad Osama Zeeshan

Masoumeh Sharafi

Muhammad Haseeb Aslam

Marco Pedersoli

Alessandro Lameiras Koerich

Simon Bacon

Eric Granger

Recognizing complex emotions linked to ambivalence and hesitancy (A/H) can play a critical role in the personalization and effectiveness of digital behaviour change interventions. These subtle and conflicting emotions are manifested by a discord between multiple modalities, such as facial and vocal expressions, and body language. Although experts can be trained to identify A/H, integrating them into digital interventions is costly and less effective. Automatic learning systems provide a cost-effective alternative that can adapt to individual users and operate seamlessly within real-time and resource-limited environments. However, there are currently no datasets available for the design of ML models to recognize A/H. This paper introduces a first Behavioural Ambivalence/Hesitancy (BAH) dataset collected for subject-based multimodal recognition of A/H in videos. It contains videos from 224 participants captured across 9 provinces in Canada, of different ages and ethnicities. Through our web platform, we recruited participants to answer 7 questions, some of which were designed to elicit A/H, while recording themselves via webcam with a microphone. BAH amounts to 1,118 videos for a total duration of 8.26 hours with 1.5 hours of A/H. Our behavioural team annotated timestamp segments to indicate where A/H occurs, and provided frame- and video-level annotations with the A/H cues. Video transcripts and their timestamps are also included, along with cropped and aligned faces in each frame, and a variety of participant metadata. We include baseline results for BAH at frame- and video-level recognition in multi-modal setups, in addition to zero-shot prediction, and for personalization using unsupervised domain adaptation. The limited performance of baseline models highlights the challenges of recognizing A/H in real-world videos. The data, code, and pretrained weights are available.

2025-05-25

ArXiv (preprint)

arxiv.org

LiSTEN: Learning Soft Token Embeddings for Neural Audio LLMs

Foundation models based on large language models (LLMs) have shown great success in handling various tasks and modalities. However, adapting these models for general-purpose audio-language tasks is challenging due to differences in acoustic environments and task variations. In this work, we introduce LiSTEN (Learning Soft Token Embeddings for Neural Audio LLMs), a framework for adapting LLMs to speech and audio tasks. LiSTEN uses a dynamic prompt selection strategy with learnable key-value pairs, allowing the model to balance general and task-specific knowledge while avoiding overfitting in a multitask setting. Our approach reduces dependence on large-scale ASR or captioning datasets, achieves competitive performance with fewer trainable parameters, and simplifies training by using a single-stage process. Additionally, LiSTEN enhances interpretability by analyzing the diversity and overlap of selected prompts across different tasks.

2025-05-24

ArXiv (preprint)

arxiv.org
