
Shahab Bakhtiari

Associate Academic Member
Assistant Professor, Université de Montréal, Department of Psychology
Research Topics
Computational Neuroscience
Computer Vision
Deep Learning
Representation Learning

Biography

Shahab Bakhtiari is an assistant professor in the Department of Psychology at Université de Montréal and an associate academic member of Mila – Quebec Artificial Intelligence Institute. Bakhtiari received his undergraduate and graduate degrees in electrical engineering from the University of Tehran. He then earned a PhD in neuroscience from McGill University and was a postdoctoral researcher at Mila, where he focused on research at the intersection of neuroscience and AI. His research examines visual perception and learning in both biological brains and artificial neural networks. He uses deep learning as a computational framework to model learning and perception in the brain, and aims to leverage our understanding of the nervous system to create more biologically inspired AI.

Current Students

PhD - Université de Montréal
Principal supervisor:
Research Intern - McGill University
PhD - Université de Montréal
Principal supervisor:
Undergraduate - Université de Montréal
Postdoctorate - Université de Montréal
Master's Research - Université de Montréal
Postdoctorate - Université de Montréal
Master's Research - McGill University
Principal supervisor:

Publications

Shaped by meaning, weighted by reliability: New insights into multisensory integration
Elizaveta Sycheva
Léa St-Gelais
Karim Jerbi CoCo Lab
Franco Lepore
Vanessa Hadid
Self-Supervised Learning from Structural Invariance
The curriculum effect in visual learning: the role of readout dimensionality
Christopher C. Pack
seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models
Current self-supervised algorithms commonly rely on transformations such as data augmentation and masking to learn visual representations. This is achieved by enforcing invariance or equivariance with respect to these transformations after encoding two views of an image. This dominant two-view paradigm often limits the flexibility of learned representations for downstream adaptation by creating performance trade-offs between high-level invariance-demanding tasks such as image classification and more fine-grained equivariance-related tasks. In this work, we propose seq-JEPA, a world modeling framework that introduces architectural inductive biases into joint-embedding predictive architectures to resolve this trade-off. Without relying on dual equivariance predictors or loss terms, seq-JEPA simultaneously learns two architecturally segregated representations: one equivariant to specified transformations and another invariant to them. To do so, our model processes short sequences of different views (observations) of inputs. Each encoded view is concatenated with an embedding of the relative transformation (action) that produces the next observation in the sequence. These view-action pairs are passed through a transformer encoder that outputs an aggregate representation. A predictor head then conditions this aggregate representation on the upcoming action to predict the representation of the next observation. Empirically, seq-JEPA demonstrates strong performance on both equivariant and invariant benchmarks without sacrificing one for the other. Furthermore, it excels at tasks that inherently require aggregating a sequence of observations, such as path integration across actions and predictive learning across eye movements.
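To make the data flow described in this abstract concrete, here is a minimal NumPy sketch of one seq-JEPA prediction step. It is not the authors' implementation: the linear view encoder, the linear action embedding, the single self-attention layer with mean pooling standing in for the transformer encoder, the linear predictor head, and all sizes and weight names (`W_enc`, `W_act`, `W_pred`, and so on) are hypothetical choices for illustration. Training machinery such as a stop-gradient target encoder is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, not taken from the paper.
OBS_DIM, ACT_DIM, EMB_DIM, ACT_EMB_DIM = 32, 4, 16, 8
D = EMB_DIM + ACT_EMB_DIM  # width of one view-action token

# Random, untrained stand-in weights for each module.
W_enc = rng.normal(scale=0.1, size=(OBS_DIM, EMB_DIM))      # view encoder
W_act = rng.normal(scale=0.1, size=(ACT_DIM, ACT_EMB_DIM))  # action embedding
W_q, W_k, W_v = (rng.normal(scale=0.1, size=(D, D)) for _ in range(3))
W_pred = rng.normal(scale=0.1, size=(D + ACT_EMB_DIM, EMB_DIM))  # predictor head


def encode(views):
    """Encode each view independently: (T, OBS_DIM) -> (T, EMB_DIM)."""
    return np.tanh(views @ W_enc)


def aggregate(tokens):
    """Single-head self-attention over the tokens, mean-pooled to one vector."""
    q, k, v = tokens @ W_q, tokens @ W_k, tokens @ W_v
    scores = q @ k.T / np.sqrt(D)
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return (attn @ v).mean(axis=0)  # aggregate representation, shape (D,)


def seq_jepa_step(views, actions, next_action, next_view):
    """Predict the embedding of next_view; return (prediction, MSE loss)."""
    z = encode(views)                                      # (T, EMB_DIM)
    tokens = np.concatenate([z, actions @ W_act], axis=1)  # (T, D) view-action pairs
    agg = aggregate(tokens)                                # (D,)
    cond = np.concatenate([agg, next_action @ W_act])      # condition on upcoming action
    pred = cond @ W_pred                                   # (EMB_DIM,)
    target = encode(next_view[None, :])[0]  # target embedding (no stop-gradient here)
    return pred, float(np.mean((pred - target) ** 2))


# Usage: a sequence of 3 random views, each paired with a random action.
views = rng.normal(size=(3, OBS_DIM))
actions = rng.normal(size=(3, ACT_DIM))
pred, loss = seq_jepa_step(views, actions, rng.normal(size=ACT_DIM), rng.normal(size=OBS_DIM))
print(pred.shape, loss)
```

Because this stand-in aggregator has no positional encoding, its output does not depend on the order of the view-action tokens; whether the actual model uses positional information is not stated in the abstract.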
Why all roads don't lead to Rome: Representation geometry varies across the human visual cortical hierarchy
Biological and artificial intelligence systems navigate the fundamental efficiency-robustness tradeoff for optimal encoding, i.e., they must efficiently encode numerous attributes of the input space while also being robust to noise. This challenge is particularly evident in hierarchical processing systems like the human brain. With a view towards understanding how systems navigate the efficiency-robustness tradeoff, we turned to a population geometry framework for analyzing representations in the human visual cortex alongside artificial neural networks (ANNs). In the ventral visual stream, we found general-purpose, scale-free representations characterized by a power law-decaying eigenspectrum in most areas. However, certain higher-order visual areas did not have scale-free representations, indicating that scale-free geometry is not a universal property of the brain. In parallel, ANNs trained with a self-supervised learning objective also exhibited scale-free geometry, but not after fine-tuning on a specific task. Based on these empirical results and our analytical insights, we posit that a system's representation geometry is not a universal property and instead depends upon the computational objective.
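As an illustration of what a power law-decaying eigenspectrum means operationally, the sketch below estimates the decay exponent alpha in λ_n ∝ n^(-alpha) with a least-squares line fit in log-log coordinates. It assumes the eigenvalues of a population response covariance matrix have already been computed; the function name is hypothetical, and fitting over all ranks rather than a chosen range of ranks is a simplifying assumption, not the procedure used in the study.

```python
import math


def powerlaw_exponent(eigenvalues):
    """Estimate alpha in lambda_n ~ n**(-alpha) via a log-log least-squares fit."""
    # Keep strictly positive eigenvalues (tiny negative values can arise
    # numerically from covariance estimates) and sort them by rank.
    lams = sorted((lam for lam in eigenvalues if lam > 0), reverse=True)
    if len(lams) < 2:
        raise ValueError("need at least two positive eigenvalues")
    xs = [math.log(rank) for rank in range(1, len(lams) + 1)]
    ys = [math.log(lam) for lam in lams]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var  # the fitted slope is -alpha


# Usage: an exact 1/n spectrum gives alpha close to 1.
print(powerlaw_exponent([n ** -1.0 for n in range(1, 101)]))
```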
Seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models
Eilif Benjamin Muller
Joint-embedding predictive architecture (JEPA) is a self-supervised learning (SSL) paradigm with the capacity for world modeling via action-conditioned prediction. Previously, JEPA world models have been shown to learn action-invariant or action-equivariant representations by predicting one view of an image from another. Unlike JEPA and similar SSL paradigms, animals, including humans, learn to recognize new objects through a sequence of active interactions. To introduce sequential interactions, we propose seq-JEPA, a novel SSL world model equipped with an autoregressive memory module. Seq-JEPA aggregates a sequence of action-conditioned observations to produce a global representation of them. This global representation, conditioned on the next action, is used to predict the latent representation of the next observation. We empirically show the advantages of this sequence of action-conditioned observations and examine our sequential modeling paradigm in two settings: (1) predictive learning across saccades, a method inspired by the role of eye movements in embodied vision. This approach learns self-supervised image representations by processing a sequence of low-resolution visual patches sampled from image saliencies, without any hand-crafted data augmentations. (2) The invariance-equivariance trade-off: seq-JEPA's architecture results in automatic separation of invariant and equivariant representations, with the aggregated autoregressor outputs being mostly action-invariant and the encoder output being equivariant. This is in contrast with many equivariant SSL methods that expect a single representational space to contain both invariant and equivariant features, potentially creating a trade-off between the two. Empirically, seq-JEPA achieves competitive performance on both invariance and equivariance-related benchmarks compared to existing methods.
Importantly, both invariance and equivariance-related downstream performances increase as the number of available observations increases.
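The saccade setting in this abstract can be sketched as choosing a sequence of fixation points from a saliency map and recording the relative shift between consecutive fixations as the action. The greedy most-salient-first ordering, the square inhibition-of-return neighborhood, the function name, and the `radius` parameter are all hypothetical; the abstract does not say how locations are sampled. Low-resolution patches cropped at each fixation would then serve as the observations.

```python
def saccade_sequence(saliency, num_fixations, radius=1):
    """Pick fixations greedily from a 2-D saliency map (list of lists).

    Returns (fixations, actions): fixations are (row, col) positions and each
    action is the (d_row, d_col) shift from one fixation to the next.
    """
    rows, cols = len(saliency), len(saliency[0])
    available = {(r, c) for r in range(rows) for c in range(cols)}
    fixations = []
    while available and len(fixations) < num_fixations:
        # Most salient remaining location; ties go to the smallest row, then column.
        r, c = max(available, key=lambda rc: (saliency[rc[0]][rc[1]], -rc[0], -rc[1]))
        fixations.append((r, c))
        # Inhibition of return: exclude the square neighborhood of this fixation.
        available -= {
            (rr, cc)
            for rr in range(r - radius, r + radius + 1)
            for cc in range(c - radius, c + radius + 1)
        }
    actions = [(r2 - r1, c2 - c1) for (r1, c1), (r2, c2) in zip(fixations, fixations[1:])]
    return fixations, actions


# Usage on a toy 4x4 saliency map.
sal = [[0, 1, 0, 0], [0, 9, 0, 0], [0, 0, 0, 5], [7, 0, 0, 0]]
print(saccade_sequence(sal, 3))  # → ([(1, 1), (3, 0), (2, 3)], [(2, -1), (-1, 3)])
```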
Asymmetric stimulus representations bias visual perceptual learning
Pooya Laamerad
Asmara Awada
Christopher C. Pack
The primate visual cortex contains various regions that exhibit specialization for different stimulus properties, such as motion, shape, and color. Within each region there is often further specialization, such that particular stimulus features, such as horizontal and vertical orientations, are overrepresented. These asymmetries are associated with well-known perceptual biases, but little is known about how they influence visual learning. Most theories would predict that learning is optimal, in the sense that it is unaffected by these asymmetries. But other approaches to learning would result in specific patterns of perceptual biases. To distinguish between these possibilities, we trained human observers to discriminate between expanding and contracting motion patterns, which have a highly asymmetrical representation in visual cortex. Observers exhibited biased percepts of these stimuli, and these biases were affected by training in ways that were often suboptimal. We simulated different neural network models and found that a learning rule that involved only adjustments to decision criteria, rather than connection weights, could account for our data. These results suggest that cortical asymmetries influence visual perception and that human observers often rely on suboptimal strategies for learning.
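The abstract reports that a learning rule adjusting only decision criteria, not connection weights, accounted for the data. A minimal sketch of such a rule under stated assumptions: each trial yields a fixed internal response `r` (standing in for a frozen sensory readout whose weights never change), the observer reports "expanding" when `r` exceeds a criterion, and the criterion is nudged by a fixed `step` after each error. The function name, parameters, and update rule are hypothetical and are not the models simulated in the study.

```python
def train_criterion(trials, criterion=0.0, step=0.1):
    """Criterion-only learning on (internal_response, is_expanding) trials.

    Returns the final criterion and the list of responses
    (True means the observer reported "expanding").
    """
    responses = []
    for r, is_expanding in trials:
        says_expanding = r > criterion
        responses.append(says_expanding)
        if says_expanding and not is_expanding:    # false alarm: be more conservative
            criterion += step
        elif is_expanding and not says_expanding:  # miss: be more liberal
            criterion -= step
        # Correct trials leave the criterion unchanged; no weights are updated.
    return criterion, responses


# Usage: two false alarms raise the criterion, one miss lowers it again.
print(train_criterion([(0.5, False), (0.5, False), (0.05, True)]))
```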
Spatial Distribution Modeling of Pistacia atlantica using Artificial Neural Network in Khohir National Park
Tymour Rostani Shahraji
Reza Akhavan
Reza Ebrahimi Atani
Energy efficiency as a normative account for predictive coding
The functional specialization of visual cortex emerges from training parallel pathways with self-supervised predictive learning
Patrick J Mineault
Timothy P. Lillicrap
Christopher C. Pack
The visual system of mammals is composed of parallel, hierarchical, specialized pathways. Different pathways are specialized insofar as they use representations that are more suitable for supporting specific downstream behaviours. In particular, the clearest example is the specialization of the ventral (“what”) and dorsal (“where”) pathways of the visual cortex. These two pathways support behaviours related to visual recognition and movement, respectively. To date, deep neural networks have mostly been used as models of the ventral, recognition pathway. However, it is unknown whether both pathways can be modelled with a single deep ANN. Here, we ask whether a single model with a single loss function can capture the properties of both the ventral and the dorsal pathways. We explore this question using data from mice, which, like other mammals, have specialized pathways that appear to support recognition and movement behaviours. We show that when we train a deep neural network architecture with two parallel pathways using a self-supervised predictive loss function, we can outperform other models in fitting mouse visual cortex. Moreover, we can model both the dorsal and ventral pathways. These results demonstrate that a self-supervised predictive learning approach applied to parallel pathway architectures can account for some of the functional specialization seen in mammalian visual systems.
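The abstract does not specify the form of the self-supervised predictive loss. The sketch below assumes an InfoNCE-style contrastive objective of the kind used in contrastive predictive coding: features from two parallel pathways are concatenated, the features at time t are mapped to a prediction for time t+1, and the true next step must be scored above the other time steps of the same clip. The single-layer ReLU pathways, the sizes, and the weight names are hypothetical stand-ins for a deep network.

```python
import numpy as np

rng = np.random.default_rng(0)
IN_DIM, PATH_DIM = 20, 8  # hypothetical sizes

# Two parallel pathways with separate (random, untrained) weights.
W_a = rng.normal(scale=0.3, size=(IN_DIM, PATH_DIM))
W_b = rng.normal(scale=0.3, size=(IN_DIM, PATH_DIM))
# Linear predictor from the combined features at t to those at t+1.
W_pred = rng.normal(scale=0.3, size=(2 * PATH_DIM, 2 * PATH_DIM))


def features(frames):
    """Concatenate both pathways' ReLU outputs: (T, IN_DIM) -> (T, 2 * PATH_DIM)."""
    return np.concatenate(
        [np.maximum(frames @ W_a, 0.0), np.maximum(frames @ W_b, 0.0)], axis=1
    )


def info_nce_predictive_loss(frames):
    """Predict features at t+1 from t; other time steps serve as negatives."""
    z = features(frames)
    pred = z[:-1] @ W_pred    # predictions for steps 1..T-1
    target = z[1:]            # actual features at steps 1..T-1
    logits = pred @ target.T  # row i scores prediction i against every target
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))  # correct targets lie on the diagonal


# Usage: loss on a random 6-frame clip of flattened inputs.
print(info_nce_predictive_loss(rng.normal(size=(6, IN_DIM))))
```

Using other time steps of the same clip as negatives keeps the example self-contained; a practical setup could instead draw negatives from a larger batch of clips.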