Shahab Bakhtiari

Associate Academic Member
Assistant Professor, Université de Montréal, Department of Psychology
Research Topics
Representation Learning
Deep Learning
Computational Neuroscience
Computer Vision

Biography

Shahab Bakhtiari is an assistant professor in the Department of Psychology at the Université de Montréal and an associate academic member of Mila – Quebec Artificial Intelligence Institute. He received his undergraduate and graduate degrees in electrical engineering from the University of Tehran. He then completed a PhD in neuroscience at McGill University, followed by a postdoctoral fellowship at Mila, where he focused on research at the intersection of neuroscience and artificial intelligence. His work examines visual perception and learning in biological brains and artificial neural networks. He uses deep learning as a computational framework for modeling learning and perception in the brain, and draws on our understanding of the nervous system to build more biologically inspired artificial intelligence.

Current Students

PhD - UdeM
Principal supervisor:
Research Intern - McGill University
PhD - UdeM
Principal supervisor:
Undergraduate - UdeM
Research Master's - UdeM
Postdoctorate - UdeM
Research Master's - McGill
Principal supervisor:

Publications

Shaped by meaning, weighted by reliability: New insights into multisensory integration
Elizaveta Sycheva
Léa St-Gelais
Karim Jerbi
Franco Lepore
Vanessa Hadid
Self-Supervised Learning from Structural Invariance
The curriculum effect in visual learning: the role of readout dimensionality
Christopher C. Pack
seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models
Current self-supervised algorithms commonly rely on transformations such as data augmentation and masking to learn visual representations. This is achieved by enforcing invariance or equivariance with respect to these transformations after encoding two views of an image. This dominant two-view paradigm often limits the flexibility of learned representations for downstream adaptation by creating performance trade-offs between high-level invariance-demanding tasks such as image classification and more fine-grained equivariance-related tasks. In this work, we propose \emph{seq-JEPA}, a world modeling framework that introduces architectural inductive biases into joint-embedding predictive architectures to resolve this trade-off. Without relying on dual equivariance predictors or loss terms, seq-JEPA simultaneously learns two architecturally segregated representations: one equivariant to specified transformations and another invariant to them. To do so, our model processes short sequences of different views (observations) of inputs. Each encoded view is concatenated with an embedding of the relative transformation (action) that produces the next observation in the sequence. These view-action pairs are passed through a transformer encoder that outputs an aggregate representation. A predictor head then conditions this aggregate representation on the upcoming action to predict the representation of the next observation. Empirically, seq-JEPA demonstrates strong performance on both equivariant and invariant benchmarks without sacrificing one for the other. Furthermore, it excels at tasks that inherently require aggregating a sequence of observations, such as path integration across actions and predictive learning across eye movements.
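The prediction pipeline described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the encoder, aggregator, and predictor are stand-in linear maps with random weights, and mean pooling stands in for the transformer encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

D_OBS, D_EMB, D_ACT = 8, 4, 2  # observation, embedding, and action sizes

# Hypothetical stand-ins for the learned modules (random weights for the sketch):
W_enc = rng.normal(size=(D_OBS, D_EMB))           # view encoder
W_agg = rng.normal(size=(D_EMB + D_ACT, D_EMB))   # stands in for the transformer encoder
W_pred = rng.normal(size=(D_EMB + D_ACT, D_EMB))  # action-conditioned predictor head

def seq_jepa_predict(views, actions, next_action):
    """Predict the embedding of the next view from a sequence of
    (view, action) pairs, following the flow described in the abstract."""
    # Encode each view and concatenate it with the action producing the next view.
    pairs = [np.concatenate([v @ W_enc, a]) for v, a in zip(views, actions)]
    # Aggregate the view-action sequence into a single representation.
    aggregate = np.mean([p @ W_agg for p in pairs], axis=0)
    # Condition the aggregate on the upcoming action to predict the next embedding.
    return np.concatenate([aggregate, next_action]) @ W_pred

views = [rng.normal(size=D_OBS) for _ in range(3)]
actions = [rng.normal(size=D_ACT) for _ in range(3)]
pred = seq_jepa_predict(views, actions, rng.normal(size=D_ACT))
print(pred.shape)  # (4,)
```

In training, the predicted embedding would be compared against the target encoder's embedding of the actual next observation; only the forward pass is sketched here.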
Why all roads don't lead to Rome: Representation geometry varies across the human visual cortical hierarchy
Biological and artificial intelligence systems navigate the fundamental efficiency-robustness tradeoff for optimal encoding, i.e., they must efficiently encode numerous attributes of the input space while also being robust to noise. This challenge is particularly evident in hierarchical processing systems like the human brain. With a view towards understanding how systems navigate the efficiency-robustness tradeoff, we turned to a population geometry framework for analyzing representations in the human visual cortex alongside artificial neural networks (ANNs). In the ventral visual stream, we found general-purpose, scale-free representations characterized by a power law-decaying eigenspectrum in most areas. However, certain higher-order visual areas did not have scale-free representations, indicating that scale-free geometry is not a universal property of the brain. In parallel, ANNs trained with a self-supervised learning objective also exhibited scale-free geometry, but not after fine-tuning on a specific task. Based on these empirical results and our analytical insights, we posit that a system's representation geometry is not a universal property and instead depends upon the computational objective.
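The scale-free geometry analysis mentioned in this abstract can be illustrated on synthetic data. This is a hedged sketch of the general technique (covariance eigenspectrum plus a log-log slope fit), not the paper's analysis pipeline; the synthetic responses are constructed to have a known 1/n eigenspectrum.

```python
import numpy as np

rng = np.random.default_rng(0)

def eigenspectrum_exponent(X):
    """Estimate the power-law decay exponent alpha of the covariance
    eigenspectrum (lambda_n ~ n^-alpha) of responses X (samples x units)."""
    X = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # descending
    n = np.arange(1, len(eigvals) + 1)
    # In log-log space, lambda_n ~ n^-alpha is a line with slope -alpha.
    slope, _ = np.polyfit(np.log(n), np.log(eigvals), 1)
    return -slope

# Synthetic "neural" responses whose units have variances 1/n,
# giving an eigenspectrum that decays as n^-1 (alpha = 1).
n_units = 50
scales = np.sqrt(1.0 / np.arange(1, n_units + 1))
X = rng.normal(size=(20000, n_units)) * scales

print(round(eigenspectrum_exponent(X), 2))
```

An exponent near 1 corresponds to the critical power-law decay; a markedly steeper or shallower fit would indicate a departure from scale-free geometry.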
Seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models
Eilif Benjamin Muller
Joint-embedding predictive architecture (JEPA) is a self-supervised learning (SSL) paradigm with the capacity of world modeling via action-conditioned prediction. Previously, JEPA world models have been shown to learn action-invariant or action-equivariant representations by predicting one view of an image from another. Unlike JEPA and similar SSL paradigms, animals, including humans, learn to recognize new objects through a sequence of active interactions. To introduce \emph{sequential} interactions, we propose \textit{seq-JEPA}, a novel SSL world model equipped with an autoregressive memory module. Seq-JEPA aggregates a sequence of action-conditioned observations to produce a global representation of them. This global representation, conditioned on the next action, is used to predict the latent representation of the next observation. We empirically show the advantages of this sequence of action-conditioned observations and examine our sequential modeling paradigm in two settings: (1) \emph{predictive learning across saccades}; a method inspired by the role of eye movements in embodied vision. This approach learns self-supervised image representations by processing a sequence of low-resolution visual patches sampled from image saliencies, without any hand-crafted data augmentations. (2) \emph{invariance-equivariance trade-off}; seq-JEPA's architecture results in automatic separation of invariant and equivariant representations, with the aggregated autoregressor outputs being mostly action-invariant and the encoder output being equivariant. This is in contrast with many equivariant SSL methods that expect a single representational space to contain both invariant and equivariant features, potentially creating a trade-off between the two. Empirically, seq-JEPA achieves competitive performance on both invariance and equivariance-related benchmarks compared to existing methods.
Importantly, both invariance and equivariance-related downstream performances increase as the number of available observations increases.
Asymmetric stimulus representations bias visual perceptual learning
Pooya Laamerad
Asmara Awada
Christopher C. Pack
The primate visual cortex contains various regions that exhibit specialization for different stimulus properties, such as motion, shape, and color. Within each region there is often further specialization, such that particular stimulus features, such as horizontal and vertical orientations, are overrepresented. These asymmetries are associated with well-known perceptual biases, but little is known about how they influence visual learning. Most theories would predict that learning is optimal, in the sense that it is unaffected by these asymmetries. But other approaches to learning would result in specific patterns of perceptual biases. To distinguish between these possibilities, we trained human observers to discriminate between expanding and contracting motion patterns, which have a highly asymmetrical representation in visual cortex. Observers exhibited biased percepts of these stimuli, and these biases were affected by training in ways that were often suboptimal. We simulated different neural network models and found that a learning rule that involved only adjustments to decision criteria, rather than connection weights, could account for our data. These results suggest that cortical asymmetries influence visual perception and that human observers often rely on suboptimal strategies for learning.
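A criterion-only learning rule of the kind this abstract contrasts with weight updates can be sketched as follows. The Gaussian response distributions, their asymmetric means, and the learning rate are illustrative assumptions, not the paper's model; the key property is that learning moves only the decision criterion, never the representations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D decision variable: "expanding" motion is overrepresented,
# so its responses are drawn from a higher-mean distribution than "contracting".
MU_EXPAND, MU_CONTRACT, SIGMA = 1.5, -0.5, 1.0

def train_criterion(n_trials=5000, lr=0.01):
    """Criterion-only rule: nudge the decision criterion after each error,
    leaving the response distributions ("connection weights") untouched."""
    criterion = 0.0  # the ideal criterion here is (1.5 + (-0.5)) / 2 = 0.5
    for _ in range(n_trials):
        exp_r = rng.normal(MU_EXPAND, SIGMA)
        con_r = rng.normal(MU_CONTRACT, SIGMA)
        if exp_r < criterion:   # missed "expanding": lower the criterion
            criterion -= lr
        if con_r >= criterion:  # false "expanding": raise the criterion
            criterion += lr
    return criterion

c = train_criterion()
expand = rng.normal(MU_EXPAND, SIGMA, 10000)
contract = rng.normal(MU_CONTRACT, SIGMA, 10000)
accuracy = ((expand >= c).mean() + (contract < c).mean()) / 2
print(round(c, 2), round(accuracy, 2))
```

With equal-variance Gaussians the criterion drifts toward the midpoint of the two means, where miss and false-alarm rates balance; with asymmetric representations the same rule can settle at a biased criterion, which is the kind of suboptimality the study probes.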
Spatial Distribution Modeling of Pistacia atlantica using Artificial Neural Network in Khohir National Park
Tymour Rostani Shahraji
Reza Akhavan
Reza Ebrahimi Atani
Energy efficiency as a normative account for predictive coding
The functional specialization of visual cortex emerges from training parallel pathways with self-supervised predictive learning
Patrick J Mineault
Timothy P. Lillicrap
Christopher C. Pack
The visual system of mammals comprises parallel, hierarchical specialized pathways. Different pathways are specialized in so far as they use representations that are more suitable for supporting specific downstream behaviours. The clearest example is the specialization of the ventral ("what") and dorsal ("where") pathways of the visual cortex. These two pathways support behaviours related to visual recognition and movement, respectively. To date, deep neural networks have mostly been used as models of the ventral, recognition pathway. However, it is unknown whether both pathways can be modelled with a single deep ANN. Here, we ask whether a single model with a single loss function can capture the properties of both the ventral and the dorsal pathways. We explore this question using data from mice, which, like other mammals, have specialized pathways that appear to support recognition and movement behaviours. We show that when we train a deep neural network architecture with two parallel pathways using a self-supervised predictive loss function, we can outperform other models in fitting mouse visual cortex. Moreover, we can model both the dorsal and ventral pathways. These results demonstrate that a self-supervised predictive learning approach applied to parallel pathway architectures can account for some of the functional specialization seen in mammalian visual systems.
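The idea of training two parallel pathways under one self-supervised predictive loss can be sketched as follows. This is a simplified illustration, not the paper's architecture: the two pathways are stand-in linear encoders with random weights, and the objective is a plain next-frame prediction error on the joint embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

D_IN, D_EMB, T = 16, 6, 5  # frame size, per-pathway embedding size, sequence length

# Two parallel pathways see the same input but keep separate representations
# (random weights stand in for the trained encoders in this sketch).
W_ventral = rng.normal(size=(D_IN, D_EMB))
W_dorsal = rng.normal(size=(D_IN, D_EMB))
W_pred = rng.normal(size=(2 * D_EMB, 2 * D_EMB))  # predicts the next joint embedding

def predictive_loss(frames):
    """Single self-supervised predictive loss over a frame sequence:
    each joint (ventral, dorsal) embedding must predict the next one."""
    z = np.concatenate([frames @ W_ventral, frames @ W_dorsal], axis=1)  # (T, 2*D_EMB)
    pred = z[:-1] @ W_pred  # predictions for embeddings 1..T-1
    return np.mean((pred - z[1:]) ** 2)

frames = rng.normal(size=(T, D_IN))  # a short synthetic "video"
loss = predictive_loss(frames)
print(loss >= 0.0)  # True
```

The point of the design is that nothing in the loss refers to "what" or "where"; any division of labour between the two pathways has to emerge from the shared predictive objective and the parallel architecture alone.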