Samira Ebrahimi Kahou

Affiliate Member
Associate Professor, University of Calgary, Department of Electrical and Software Engineering
Adjunct Professor, École de technologie supérieure, School of Computer Science
Adjunct Professor, McGill University, School of Computer Science
Research Topics
Computer Vision
Deep Learning
Medical Machine Learning
Multimodal Learning
Natural Language Processing
Reinforcement Learning
Representation Learning

Biography

Samira is an Associate Professor at the University of Calgary in the Schulich School of Engineering. She is also an Adjunct Professor at École de technologie supérieure (ÉTS) in the Department of Software Engineering and Information Technology and McGill University in the School of Computer Science. She is an academic member of Mila - Québec Artificial Intelligence Institute and holds a Canada CIFAR AI Chair. Samira received her Ph.D. in Computer Engineering from Polytechnique Montréal/Mila with an award for the best thesis in the department. Samira also worked as a Postdoctoral Fellow at the McGill School of Computer Science and as a Researcher at Microsoft Research Montréal.

Samira and her group work on solving fundamental problems in representation learning for decision making, with a broad focus on explainability, generalization and efficient learning. Her work has been published in top-tier venues, such as NeurIPS, ICLR, ICML, ICCV, CVPR, TMLR and CoRL. Samira received the 2024 Early Career Excellence in Research Award from the Schulich School of Engineering. Her work in multimodal learning has been recognized twice with an ACM ICMI Ten-Year Technical Impact Award: as Runner-up in 2023 and as Winner in 2025.

Current Students

PhD - Université de Montréal
Principal supervisor :
PhD - École de technologie supérieure
Principal supervisor :
PhD - École de technologie supérieure
Principal supervisor :
PhD - McGill University
Co-supervisor :
PhD - McGill University
Principal supervisor :

Publications

The Invisible Hand of Physics: When Video Diffusion Models Know More Than They Show
Parsa Esmati
Katja Hofmann
Majid Mirmehdi
Modern video diffusion models generate increasingly realistic and temporally coherent videos, motivating their use as candidate world simulators. Yet it remains unclear whether these models internally encode physical structure, or merely reproduce motion patterns seen during training. We study this question by probing video diffusion models along latent trajectories corresponding to real videos with known physical plausibility. To obtain such trajectories, we approximately invert the deterministic sampling process by integrating the learned velocity field backward from a clean video latent to noise, giving access to the model's intermediate states and attention maps. Using these recovered trajectories, we show that physical plausibility is linearly decodable from diffusion transformer states across IntPhys and InfLevel, reaching around 81.27% average accuracy and outperforming dedicated representation-learning baselines such as V-JEPA and VideoMAE. Surprisingly, this signal is absent from the VAE latent input and emerges inside the denoising transformer itself, despite the model not being trained with a self-supervised predictive objective. These findings suggest that physically meaningful representations can arise as a byproduct of generative denoising.
Structured Representation Learning with Locally Linear Embeddings and Adaptive Feature Fusion
Neuroscientific research has revealed that the brain encodes complex behaviors by leveraging structured, low-dimensional manifolds and dynamically fusing multiple sources of information through adaptive gating mechanisms. Inspired by these principles, we propose a novel reinforcement learning (RL) framework that encourages the disentanglement of dynamics-specific and reward-specific features, drawing direct parallels to how neural circuits separate and integrate information for efficient decision-making. Our approach leverages locally linear embeddings (LLEs) to capture the intrinsic, locally linear structure inherent in many environments, mirroring the local smoothness observed in neural population activity, while concurrently deriving reward-specific features through the standard RL objective. An attention mechanism, analogous to cortical gating, adaptively fuses these complementary representations on a per-state basis. Experimental results on benchmark tasks demonstrate that our method, grounded in neuroscientific principles, improves learning efficiency and overall performance compared to conventional RL approaches, highlighting the benefits of explicitly modeling local state structures and adaptive feature selection as observed in biological systems.
Bootstrap Sampling Improves Model Soup Performance via Increased Model Diversity for Pneumonia Classification
Sara Early
Omata I. Ehizokhale
Nils D. Forkert
Model soups combine multiple trained neural network checkpoints through weight averaging, often outperforming individual models and achieving performance comparable to deep ensembles without increasing inference cost. However, their effectiveness depends critically on checkpoint diversity, and when models are trained on the same dataset, optimization trajectories may converge toward similar regions of parameter space, limiting this diversity. In this work, we investigate bootstrap resampling as a simple data-level mechanism for increasing checkpoint diversity. Using a binary pneumonia classification task and 644 radiographs from the National Institutes of Health (NIH) ChestXray14 dataset, we train pools of convolutional neural networks under varying bootstrap ratios and construct greedy model soups. While checkpoint models trained on the full dataset achieve the highest mean individual accuracy, they are highly similar and offer little complementary signal, limiting the effectiveness of greedy selection. Bootstrap sampling introduces variability in the training data, producing more diverse checkpoints that, although individually weaker, enable greedy soup construction to combine complementary representations and achieve superior overall performance. The strongest model soup, obtained with 70% bootstrap sampling, achieves a test accuracy of 0.650, representing a 9.8 percentage point improvement over the mean individual checkpoint accuracy (0.551) under the same condition. While absolute performance is limited by the small cohort size and training-from-scratch setting, this result highlights the substantial gains achievable through diversity-driven weight averaging.
Estimation of head motion in structural MRI and its impact on cortical morphometry
Motion-related artifacts are inevitable in Magnetic Resonance Imaging (MRI) and can bias automated neuroanatomical metrics such as cortical thickness. These biases can interfere with statistical analysis, which is a major concern as motion has been shown to be more prominent in certain populations such as children or individuals with ADHD. Manual review cannot objectively quantify motion in anatomical scans, and existing quantitative automated approaches often require specialized hardware or custom acquisition protocols. Here, we train a 3D convolutional neural network to estimate a summary motion metric in retrospective routine research scans by leveraging a large training dataset of synthetically motion-corrupted volumes. We validate our method with one held-out site from our training cohort and with 14 fully independent datasets, including one with manual ratings, achieving a Spearman rank correlation of 0.71 vs. manual labels. We also tested the correlation of our predicted motion score with morphometric measurements known to be impacted by motion, achieving significant correlation on most datasets. Furthermore, our predicted motion correlates with subject age in line with prior studies. Our approach shows good generalization across scanner brands and protocols, enabling objective, scalable motion assessment in structural MRI studies without prospective motion correction. Finally, we provide empirical evidence that our motion estimator significantly improves model fit when studying cortical thickness and volume. Our final model is made openly and freely available through "Agitation," a tool usable as a CLI and as a Python package, and integrated in Nipoppy and Boutiques. By providing reliable motion estimates, our method offers researchers a tool to assess and account for potential biases in cortical morphometric analyses.
Survey on AI Ethics: A Socio-Technical Perspective
Dave Mbiazi
Ivaxi Sheth
Patrik Joslin Kenfack
The past decade has seen significant advances in AI, with deep learning-based models being deployed in diverse scenarios, including safety-critical applications. As these AI systems become deeply embedded in our societal infrastructure, the repercussions of their decisions and actions have significant consequences, making the ethical implications of AI deployment highly relevant and essential. The ethical concerns associated with AI are multifaceted, including challenging issues of fairness, privacy and data protection, responsibility and accountability, safety and robustness, transparency and explainability, and environmental impact. These principles together form the foundations of ethical AI considerations that concern every stakeholder in the AI system lifecycle. In light of the present ethical and future x-risk concerns, governments have shown increasing interest in establishing guidelines for the ethical deployment of AI. This work unifies the current and future ethical concerns of deploying AI into society. While we acknowledge and appreciate the technical surveys for each of the ethical principles concerned, in this paper, we aim to provide a comprehensive overview that not only addresses each principle from a technical point of view but also discusses them from a social perspective.
Attention-Based Multi-Agent RL for Multi-Machine Tending Using Mobile Robots
Abdalwhab Bakheet Mohamed Abdalwhab
David St-Onge
Robotics can help address the growing worker shortage challenge of the manufacturing industry. As such, machine tending is a task collaborative robots can tackle that can also greatly boost productivity. Nevertheless, existing robotics systems deployed in that sector rely on a fixed single-arm setup, whereas mobile robots can provide more flexibility and scalability. We introduce a multi-agent multi-machine-tending learning framework using mobile robots based on multi-agent reinforcement learning (MARL) techniques, with the design of a suitable observation and reward. Moreover, we integrate an attention-based encoding mechanism into the Multi-Agent Proximal Policy Optimization (MAPPO) algorithm to boost its performance for machine-tending scenarios. Our model (AB-MAPPO) outperforms MAPPO in this new challenging scenario in terms of task success, safety, and resource utilization. Furthermore, we provide an extensive ablation study to support our design decisions.
Source-free Domain Adaptation Requires Penalized Diversity
While neural networks are capable of achieving human-like performance in many tasks such as image classification, the impressive performance of each model is limited to its own dataset. Source-free domain adaptation (SFDA) was introduced to address knowledge transfer between different domains in the absence of source data, thus increasing data privacy. Diversity in representation space can be vital to a model's adaptability in varied and difficult domains. In unsupervised SFDA, the diversity is limited to learning a single hypothesis on the source or learning multiple hypotheses with a shared feature extractor. Motivated by the improved predictive performance of ensembles, we propose a novel unsupervised SFDA algorithm that promotes representational diversity through the use of separate feature extractors with Distinct Backbone Architectures (DBA). Although diversity in feature space is increased, the unconstrained mutual information (MI) maximization may potentially introduce amplification of weak hypotheses. Thus we introduce the Weak Hypothesis Penalization (WHP) regularizer as a mitigation strategy. Our work proposes Penalized Diversity (PD) where the synergy of DBA and WHP is applied to unsupervised source-free domain adaptation for covariate shift. In addition, PD is augmented with a weighted MI maximization objective for label distribution shift. Empirical results on natural, synthetic, and medical domains demonstrate the effectiveness of PD under different distributional shifts.
Learning From the Past with Cascading Eligibility Traces
Tokiniaina Raharison Ralambomihanta
Blake A. Richards
Handling Delay in Real-Time Reinforcement Learning
Real-time reinforcement learning (RL) introduces several challenges. First, policies are constrained to a fixed number of actions per second due to hardware limitations. Second, the environment may change while the network is still computing an action, leading to observational delay. The first issue can partly be addressed with pipelining, leading to higher throughput and potentially better policies. However, the second issue remains: if each neuron operates in parallel with an execution time of
GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities
The rapid evolution of software libraries poses a considerable hurdle for code generation, necessitating continuous adaptation to frequent version updates while preserving backward compatibility. While existing code evolution benchmarks provide valuable insights, they typically lack execution-based evaluation for generating code compliant with specific library versions. To address this, we introduce GitChameleon 2.0, a novel, meticulously curated dataset comprising 328 Python code completion problems, each conditioned on specific library versions and accompanied by executable unit tests. GitChameleon 2.0 rigorously evaluates the capacity of contemporary large language models (LLMs), LLM-powered agents, code assistants, and RAG systems to perform version-conditioned code generation that demonstrates functional accuracy through execution. Our extensive evaluations indicate that state-of-the-art systems encounter significant challenges with this task, with enterprise models achieving baseline success rates in the 48-51% range, underscoring the intricacy of the problem. By offering an execution-based benchmark emphasizing the dynamic nature of code libraries, GitChameleon 2.0 enables a clearer understanding of this challenge and helps guide the development of more adaptable and dependable AI code generation methods. We make the dataset and evaluation code publicly available at https://github.com/mrcabbage972/GitChameleonBenchmark.
Learning to Play Atari in a World of Tokens
Model-based reinforcement learning agents utilizing transformers have shown improved sample efficiency due to their ability to model extended context, resulting in more accurate world models. However, for complex reasoning and planning tasks, these methods primarily rely on continuous representations. This complicates modeling of discrete properties of the real world such as disjoint object classes between which interpolation is not plausible. In this work, we introduce discrete abstract representations for transformer-based learning (DART), a sample-efficient method utilizing discrete representations for modeling both the world and learning behavior. We incorporate a transformer-decoder for auto-regressive world modeling and a transformer-encoder for learning behavior by attending to task-relevant cues in the discrete representation of the world model. For handling partial observability, we aggregate information from past time steps as memory tokens. DART outperforms previous state-of-the-art methods that do not use look-ahead search on the Atari 100k sample efficiency benchmark with a median human-normalized score of 0.790 and beats humans in 9 out of 26 games. We release our code at https://pranaval.github.io/DART/.
Prioritizing Samples in Reinforcement Learning with Reducible Loss
Most reinforcement learning algorithms take advantage of an experience replay buffer to repeatedly train on samples the agent has observed in the past. Not all samples carry the same amount of significance and simply assigning equal importance to each of the samples is a naïve strategy. In this paper, we propose a method to prioritize samples based on how much we can learn from a sample. We define the learn-ability of a sample as the steady decrease of the training loss associated with this sample over time. We develop an algorithm to prioritize samples with high learn-ability, while assigning lower priority to those that are hard-to-learn, typically caused by noise or stochasticity. We empirically show that our method is more robust than random sampling and also better than just prioritizing with respect to the training loss, i.e. the temporal difference loss, which is used in prioritized experience replay.