Shahab Bakhtiari

Associate Academic Member
Assistant Professor, Université de Montréal, Department of Psychology
Research Topics
Computational Neuroscience
Computer Vision
Deep Learning
Representation Learning

Biography

Shahab Bakhtiari is an assistant professor in the Department of Psychology at Université de Montréal and an associate academic member of Mila – Quebec Artificial Intelligence Institute. Bakhtiari received his undergraduate and graduate degrees in electrical engineering from the University of Tehran. He then earned a PhD in neuroscience from McGill University and was a postdoctoral researcher at Mila, where he focused on research at the intersection of neuroscience and AI. His research examines visual perception and learning in both biological brains and artificial neural networks. He uses deep learning as a computational framework to model learning and perception in the brain, and aims to leverage our understanding of the nervous system to create more biologically inspired AI.

Current Students

PhD - Université de Montréal
Principal supervisor :
Collaborating researcher
Research Intern - McGill University
PhD - Université de Montréal
Collaborating researcher - Concordia University
PhD - Université de Montréal
Principal supervisor :
Research Intern - Université de Montréal
Undergraduate - Université de Montréal
Postdoctorate - Université de Montréal
Master's Research - Université de Montréal
Postdoctorate - Université de Montréal
PhD - McGill University
Principal supervisor :

Publications

Shaped by meaning, weighted by reliability: New insights into multisensory integration
Elizaveta Sycheva
Léa St-Gelais
Karim Jerbi CoCo Lab
Franco Lepore
Vanessa Hadid
Interpreting Physics in Video World Models
Quentin Garrido
Randall Balestriero
Matthew Kowal
Thomas Fel
Mike Rabbat
A long-standing question in physical reasoning is whether video-based models need to rely on factorized representations of physical variables in order to make physically accurate predictions, or whether they can implicitly represent such variables in a distributed manner. While modern video world models achieve strong performance on intuitive physics benchmarks, it remains unclear which of these representational regimes they implement internally. Here, we present the first interpretability study to directly examine physical representations inside large-scale video encoders. Using layerwise probing, subspace geometry, patch-level decoding, and targeted attention ablations, we characterize where physical information becomes accessible and how it is organized within encoder-based video transformers. Across architectures, we identify a sharp intermediate-depth transition, which we call the Physics Emergence Zone, at which physical variables become accessible. Physics-related representations peak shortly after this transition and degrade toward the output layers. Decomposing motion into explicit variables, we find that scalar quantities such as speed and acceleration are available from early layers onwards, whereas motion direction becomes accessible only at the Physics Emergence Zone. Notably, we find that direction is encoded through a high-dimensional population structure with circular geometry, requiring coordinated multi-feature intervention to control. These findings suggest that modern video models do not use factorized representations of physical variables like a classical physics engine. Instead, they use a distributed representation that is nonetheless sufficient for making physical predictions.
Context-Aware World Models for Task-Agnostic Control
Busra Tugce Gurbuz
Christopher C. Pack
Eilif Benjamin Muller
Why all roads don't lead to Rome: Representation geometry varies across the human visual cortical hierarchy
Zahraa Chorghay
Blake Aaron Richards
The curriculum effect in visual learning: the role of readout dimensionality
Christopher C. Pack
seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models
Current self-supervised algorithms commonly rely on transformations such as data augmentation and masking to learn visual representations. This is achieved by enforcing invariance or equivariance with respect to these transformations after encoding two views of an image. This dominant two-view paradigm often limits the flexibility of learned representations for downstream adaptation by creating performance trade-offs between high-level invariance-demanding tasks such as image classification and more fine-grained equivariance-related tasks. In this work, we propose seq-JEPA, a world modeling framework that introduces architectural inductive biases into joint-embedding predictive architectures to resolve this trade-off. Without relying on dual equivariance predictors or loss terms, seq-JEPA simultaneously learns two architecturally segregated representations: one equivariant to specified transformations and another invariant to them. To do so, our model processes short sequences of different views (observations) of inputs. Each encoded view is concatenated with an embedding of the relative transformation (action) that produces the next observation in the sequence. These view-action pairs are passed through a transformer encoder that outputs an aggregate representation. A predictor head then conditions this aggregate representation on the upcoming action to predict the representation of the next observation. Empirically, seq-JEPA demonstrates strong performance on both equivariant and invariant benchmarks without sacrificing one for the other. Furthermore, it excels at tasks that inherently require aggregating a sequence of observations, such as path integration across actions and predictive learning across eye movements.
Seeing the world as animals do: How to leverage generative AI for ecological neuroscience
Exploiting large-scale neuroimaging datasets to reveal novel insights in vision science
Peter Brotherwood
Catherine Landry
Jasper van den Bosch
Tim Kietzmann
Frédéric Gosselin
Adrien Doerig
Neural responses in space and time to a massive set of natural scenes
Peter Brotherwood
Emmanuel Lebeau
Mathias Salvas-Hébert
Marin Coignard
Frédéric Gosselin
Kendrick Kay
Asymmetric stimulus representations bias visual perceptual learning
Pooya Laamerad
Asmara Awada
Christopher C. Pack
The primate visual cortex contains various regions that exhibit specialization for different stimulus properties, such as motion, shape, and color. Within each region, there is often further specialization, such that particular stimulus features, such as horizontal and vertical orientations, are over-represented. These asymmetries are associated with well-known perceptual biases, but little is known about how they influence visual learning. Most theories would predict that learning is optimal, in the sense that it is unaffected by these asymmetries. However, other approaches to learning would result in specific patterns of perceptual biases. To distinguish between these possibilities, we trained human observers to discriminate between expanding and contracting motion patterns, which have a highly asymmetrical representation in the visual cortex. Observers exhibited biased percepts of these stimuli, and these biases were affected by training in ways that were often suboptimal. We simulated different neural network models and found that a learning rule that involved only adjustments to decision criteria, rather than connection weights, could account for our data. These results suggest that cortical asymmetries influence visual perception and that human observers often rely on suboptimal strategies for learning.
Spatial Distribution Modeling of Pistacia atlantica using Artificial Neural Network in Khohir National Park
Tymour Rostani Shahraji
Reza Akhavan
Reza Ebrahimi Atani
Energy efficiency as a normative account for predictive coding