
Samira Ebrahimi Kahou

Affiliate Member
Assistant Professor, University of Calgary, Department of Electrical and Software Engineering
Adjunct Professor, École de technologie supérieure, Department of Computer Engineering and Information Technology
Adjunct Professor, McGill University, School of Computer Science
Research Topics
Computer Vision
Deep Learning
Medical Machine Learning
Multimodal Learning
Natural Language Processing
Reinforcement Learning
Representation Learning

Biography

I am an Assistant Professor in the Schulich School of Engineering's Department of Electrical and Software Engineering at the University of Calgary. I am also an adjunct professor in the Department of Computer Engineering and Information Technology of ÉTS and in the School of Computer Science at McGill University. Before joining ÉTS, I was a postdoctoral fellow working with Professor Doina Precup at McGill/Mila. Before my postdoc, I was a researcher at Microsoft Research Montréal.

I received my Ph.D. from Polytechnique Montréal/Mila in 2016 under the supervision of Professor Chris Pal. During my Ph.D. studies, I worked on computer vision and deep learning applied to emotion recognition, object tracking and knowledge distillation.

Current Students

PhD - École de technologie supérieure
PhD - Université de Montréal (Principal supervisor)
Collaborating researcher - McGill University (Co-supervisor)
PhD - École de technologie supérieure (Principal supervisor)
PhD - École de technologie supérieure (Principal supervisor)
PhD - McGill University (Co-supervisor)
Master's Research - École de technologie supérieure
PhD - McGill University (Principal supervisor)

Publications

Simple Video Generation using Neural ODEs
Despite having been studied to a great extent, the task of conditional generation of sequences of frames, or videos, remains extremely challenging. It is a common belief that a key step towards solving this task resides in modelling accurately both spatial and temporal information in video signals. A promising direction to do so has been to learn latent variable models that predict the future in latent space and project back to pixels, as suggested in recent literature. Following this line of work and building on top of a family of models introduced in prior work, Neural ODEs, we investigate an approach that models time-continuous dynamics over a continuous latent space with a differential equation with respect to time. The intuition behind this approach is that these trajectories in latent space could then be extrapolated to generate video frames beyond the time steps for which the model is trained. We show that our approach yields promising results in the task of future frame prediction on the Moving MNIST dataset with 1 and 2 digits.
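A minimal PyTorch sketch of the core idea, assuming a simple MLP dynamics function and a fixed-step Euler integrator (the paper's solver and network shapes may differ; all module names and sizes here are illustrative):

```python
import torch
import torch.nn as nn

class LatentODEFunc(nn.Module):
    """Parameterizes dz/dt = f(z); a hypothetical stand-in for the dynamics net."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.Tanh(), nn.Linear(64, latent_dim)
        )

    def forward(self, z):
        return self.net(z)

def euler_integrate(func, z0, t_steps, dt=0.1):
    """Roll a latent state forward in time with fixed-step Euler integration."""
    zs, z = [z0], z0
    for _ in range(t_steps - 1):
        z = z + dt * func(z)     # z_{t+1} = z_t + dt * f(z_t)
        zs.append(z)
    return torch.stack(zs, dim=1)            # (batch, time, latent_dim)

# Decode each latent state into a 64x64 frame; extrapolating past the training
# horizon just means integrating the latent trajectory for more steps.
decoder = nn.Sequential(nn.Linear(32, 64 * 64), nn.Sigmoid())
z0 = torch.randn(8, 32)                      # encoded first frames (batch of 8)
latents = euler_integrate(LatentODEFunc(), z0, t_steps=20)
frames = decoder(latents).view(8, 20, 64, 64)
print(frames.shape)                          # torch.Size([8, 20, 64, 64])
```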
Deep Learning for Detecting Extreme Weather Patterns
Mayur Mudigonda
Prabhat Ram
Karthik Kashinath
Evan Racah
Ankur Mahesh
Yunjie Liu
Jim Biard
Thorsten Kurth
Sookyung Kim
Burlen Loring
Travis O'Brien
Kenneth E. Kunkel
Michael F. Wehner
William D. Collins
Accounting for Variance in Machine Learning Benchmarks
Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameter choices. This is prohibitively expensive, and corners are cut to reach conclusions. We model the whole benchmarking process, revealing that variance due to data sampling, parameter initialization and hyperparameter choice markedly impacts the results. We analyze the predominant comparison methods used today in the light of this variance. We show the counter-intuitive result that adding more sources of variation to an imperfect estimator better approximates the ideal estimator, at a 51 times reduction in compute cost. Building on these results, we study the error rate of detecting improvements on five different deep-learning tasks/architectures. This study leads us to propose recommendations for performance comparisons.
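The NumPy toy below illustrates the point about sources of variation: fixing all randomness except weight initialization, a common shortcut, understates the variance a benchmark should report. The noise scales are invented and stand in for full training runs:

```python
import numpy as np

rng = np.random.default_rng(0)

def run_pipeline(data_seed, init_seed, hp_seed):
    """Toy stand-in for a full training run: each randomized source of
    variation perturbs the final test score around a 'true' value of 0.80."""
    d = np.random.default_rng(data_seed).normal(0, 0.020)  # data sampling/split
    i = np.random.default_rng(init_seed).normal(0, 0.010)  # weight initialization
    h = np.random.default_rng(hp_seed).normal(0, 0.015)    # hyperparameter draw
    return 0.80 + d + i + h

# Shortcut: vary only the init seed -> understates the total variance.
init_only = [run_pipeline(0, s, 0) for s in range(50)]
# Randomizing every source of variation gives an honest spread.
all_sources = [run_pipeline(*rng.integers(10**6, size=3)) for _ in range(50)]
print(f"init-only std:   {np.std(init_only):.4f}")
print(f"all-sources std: {np.std(all_sources):.4f}")
```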
Multi-Image Super-Resolution for Remote Sensing using Deep Recurrent Networks
Md Rifat Arefin
Alfredo Kalaitzis
Sookyung Kim
High-resolution satellite imagery is critical for various earth observation applications related to environment monitoring, geoscience, forecasting, and land use analysis. However, the acquisition cost of such high-quality imagery, due to the scarcity of providers and the need for high-frequency revisits, restricts its accessibility in many fields. In this work, we present a data-driven, multi-image super-resolution approach to alleviate these problems. Our approach is based on an end-to-end deep neural network that consists of an encoder, a fusion module, and a decoder. The encoder extracts co-registered, highly efficient feature representations from low-resolution images of a scene. A Gated Recurrent Unit (GRU)-based module acts as the fusion module, aggregating features into a combined representation. Finally, a decoder reconstructs the super-resolved image. The proposed model is evaluated on the PROBA-V dataset released in a recent competition held by the European Space Agency. Our results show that it performs among the top contenders and offers a new practical solution for real-world applications.
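A compact PyTorch sketch of this encode-fuse-decode pattern, with invented channel counts and a x3 upscale; the paper's exact architecture differs (for brevity, this toy fuses per-pixel feature sequences with an ordinary GRU rather than a convolutional recurrent module):

```python
import torch
import torch.nn as nn

class MultiImageSR(nn.Module):
    """Encode each low-res view, fuse the sequence with a GRU, decode once."""
    def __init__(self, feat=64):
        super().__init__()
        self.feat = feat
        self.encoder = nn.Conv2d(1, feat, kernel_size=3, padding=1)
        self.fusion = nn.GRU(input_size=feat, hidden_size=feat, batch_first=True)
        self.decoder = nn.Sequential(
            nn.Conv2d(feat, 9, kernel_size=3, padding=1),  # 9 = 3**2 channels
            nn.PixelShuffle(3),                            # x3 spatial upscale
        )

    def forward(self, lr_views):                  # (B, T, 1, H, W)
        B, T, _, H, W = lr_views.shape
        f = self.encoder(lr_views.flatten(0, 1))  # (B*T, feat, H, W)
        # every spatial location becomes a length-T sequence for the GRU
        seq = f.view(B, T, self.feat, H * W).permute(0, 3, 1, 2)
        seq = seq.reshape(B * H * W, T, self.feat)
        _, h = self.fusion(seq)                   # final hidden state = fused features
        fused = h[0].reshape(B, H * W, self.feat).permute(0, 2, 1)
        return self.decoder(fused.reshape(B, self.feat, H, W))  # (B, 1, 3H, 3W)

sr = MultiImageSR()
out = sr(torch.randn(2, 5, 1, 32, 32))            # five low-res views per scene
print(out.shape)                                   # torch.Size([2, 1, 96, 96])
```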
HighRes-net: Recursive Fusion for Multi-Frame Super-Resolution of Satellite Imagery
Michel Deudon
Alfredo Kalaitzis
Israel Goytom
Md Rifat Arefin
Zhichao Lin
Julien Cornebise
Generative deep learning has sparked a new wave of Super-Resolution (SR) algorithms that enhance single images with impressive aesthetic results, albeit with imaginary details. Multi-frame Super-Resolution (MFSR) offers a more grounded approach to the ill-posed problem, by conditioning on multiple low-resolution views. This is important for satellite monitoring of human impact on the planet -- from deforestation, to human rights violations -- that depend on reliable imagery. To this end, we present HighRes-net, the first deep learning approach to MFSR that learns its sub-tasks in an end-to-end fashion: (i) co-registration, (ii) fusion, (iii) up-sampling, and (iv) registration-at-the-loss. Co-registration of low-resolution views is learned implicitly through a reference-frame channel, with no explicit registration mechanism. We learn a global fusion operator that is applied recursively on an arbitrary number of low-resolution pairs. We introduce a registered loss, by learning to align the SR output to a ground-truth through ShiftNet. We show that by learning deep representations of multiple views, we can super-resolve low-resolution signals and enhance Earth Observation data at scale. Our approach recently topped the European Space Agency's MFSR competition on real-world satellite imagery.
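The recursive fusion idea can be sketched in a few lines of PyTorch: a single shared operator fuses encoded views pairwise until one representation remains, so the same weights handle any number of inputs. Shapes and layers here are illustrative, not HighRes-net's:

```python
import torch
import torch.nn as nn

class PairFuse(nn.Module):
    """One shared operator that fuses two encoded views into one."""
    def __init__(self, feat=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * feat, feat, kernel_size=3, padding=1), nn.ReLU(),
        )

    def forward(self, a, b):
        return self.net(torch.cat([a, b], dim=1))

def recursive_fusion(views, fuse):
    """Halve the set of encoded views each round until one remains;
    an odd count is handled by carrying the last view forward."""
    while len(views) > 1:
        nxt = [fuse(views[i], views[i + 1]) for i in range(0, len(views) - 1, 2)]
        if len(views) % 2 == 1:
            nxt.append(views[-1])
        views = nxt
    return views[0]

fuse = PairFuse()
encoded = [torch.randn(1, 64, 32, 32) for _ in range(5)]  # 5 encoded LR views
print(recursive_fusion(encoded, fuse).shape)              # torch.Size([1, 64, 32, 32])
```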
Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction
Alaaeldin El-Nouby
Shikhar Sharma
Hannes Schulz
Layla El Asri
Graham W. Taylor
Conditional text-to-image generation is an active area of research, with many possible applications. Existing research has primarily focused on generating a single image from available conditioning information in one step. One practical extension beyond one-step generation is a system that generates an image iteratively, conditioned on ongoing linguistic input or feedback. This is significantly more challenging than one-step generation tasks, as such a system must understand the contents of its generated images with respect to the feedback history, the current feedback, as well as the interactions among concepts present in the feedback history. In this work, we present a recurrent image generation model which takes into account both the generated output up to the current step as well as all past instructions for generation. We show that our model is able to generate the background, add new objects, and apply simple transformations to existing objects. We believe our approach is an important step toward interactive generation. Code and data are available at: https://www.microsoft.com/en-us/research/project/generative-neural-visual-artist-geneva/.
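A loose PyTorch sketch of such a recurrent generation loop, with invented module shapes: a GRU folds each new instruction into a context state, which then conditions the next refinement of the running canvas:

```python
import torch
import torch.nn as nn

class IterativeDrawer(nn.Module):
    """Illustrative only: summarize instruction history with a GRU and
    condition each refinement of the canvas on that summary."""
    def __init__(self, text_dim=128, hid=128):
        super().__init__()
        self.context = nn.GRUCell(text_dim, hid)
        self.canvas_enc = nn.Conv2d(3, hid, 4, stride=4)            # 32x32 -> 8x8
        self.generator = nn.Sequential(
            nn.ConvTranspose2d(2 * hid, 3, 4, stride=4), nn.Tanh()  # 8x8 -> 32x32
        )

    def step(self, canvas, instruction_emb, h):
        h = self.context(instruction_emb, h)           # fold in latest instruction
        c = self.canvas_enc(canvas)                    # what is drawn so far
        cond = h[:, :, None, None].expand(-1, -1, 8, 8)
        return self.generator(torch.cat([c, cond], dim=1)), h

drawer = IterativeDrawer()
canvas, h = torch.zeros(1, 3, 32, 32), torch.zeros(1, 128)
for instruction_emb in torch.randn(3, 1, 128):         # three turns of feedback
    canvas, h = drawer.step(canvas, instruction_emb, h)
print(canvas.shape)                                    # torch.Size([1, 3, 32, 32])
```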
An Empirical Study of Batch Normalization and Group Normalization in Conditional Computation
Batch normalization has been widely used to improve optimization in deep neural networks. While the uncertainty in batch statistics can act as a regularizer, using these dataset statistics specific to the training set impairs generalization in certain tasks. Recently, alternative methods for normalizing feature activations in neural networks have been proposed. Among them, group normalization has been shown to yield similar, in some domains even superior performance to batch normalization. All these methods utilize a learned affine transformation after the normalization operation to increase representational power. Methods used in conditional computation define the parameters of these transformations as learnable functions of conditioning information. In this work, we study whether and where the conditional formulation of group normalization can improve generalization compared to conditional batch normalization. We evaluate performances on the tasks of visual question answering, few-shot learning, and conditional image generation.
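A PyTorch sketch of the conditional group normalization formulation studied here: activations are group-normalized without learned affine parameters, and a per-sample scale and shift are instead predicted from the conditioning input (the group count and dimensions below are arbitrary):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalGroupNorm(nn.Module):
    """Group-normalize, then apply a per-sample affine transform whose scale
    and shift are predicted from conditioning info (e.g. a question embedding)."""
    def __init__(self, num_channels, cond_dim, num_groups=8):
        super().__init__()
        self.num_groups = num_groups
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * num_channels)

    def forward(self, x, cond):                  # x: (B, C, H, W), cond: (B, D)
        x = F.group_norm(x, self.num_groups)     # normalize, no learned affine
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=1)
        # broadcast the predicted per-channel scale/shift over spatial dims
        return (1 + gamma)[:, :, None, None] * x + beta[:, :, None, None]

cgn = ConditionalGroupNorm(num_channels=64, cond_dim=32)
out = cgn(torch.randn(4, 64, 16, 16), torch.randn(4, 32))
print(out.shape)                                 # torch.Size([4, 64, 16, 16])
```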
Towards Non-saturating Recurrent Units for Modelling Long-term Dependencies
Modelling long-term dependencies is a challenge for recurrent neural networks. This is primarily due to the fact that gradients vanish during training, as the sequence length increases. Gradients can be attenuated by transition operators and are attenuated or dropped by activation functions. Canonical architectures like LSTM alleviate this issue by skipping information through a memory mechanism. We propose a new recurrent architecture (Non-saturating Recurrent Unit; NRU) that relies on a memory mechanism but forgoes both saturating activation functions and saturating gates, in order to further alleviate vanishing gradients. In a series of synthetic and real-world tasks, we demonstrate that the proposed model is the only model that performs among the top 2 models across all tasks with and without long-term dependencies, when compared against a range of other architectures.
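The sketch below conveys only the non-saturating idea in PyTorch, i.e. ReLU in place of saturating tanh/sigmoid activations and an un-gated additive memory write; the actual NRU read/write equations are more involved:

```python
import torch
import torch.nn as nn

class NonSaturatingCell(nn.Module):
    """Loose illustration: non-saturating hidden update plus an additive,
    un-gated memory write. Not the exact NRU formulation."""
    def __init__(self, input_size, hidden_size, memory_size):
        super().__init__()
        self.hidden = nn.Linear(input_size + hidden_size + memory_size, hidden_size)
        self.write = nn.Linear(hidden_size, memory_size)

    def forward(self, x, h, m):
        h = torch.relu(self.hidden(torch.cat([x, h, m], dim=1)))  # non-saturating
        m = m + torch.relu(self.write(h))      # additive memory, no saturating gate
        return h, m

cell = NonSaturatingCell(input_size=16, hidden_size=32, memory_size=64)
h, m = torch.zeros(1, 32), torch.zeros(1, 64)
for x in torch.randn(10, 1, 16):               # a length-10 input sequence
    h, m = cell(x, h, m)
print(h.shape, m.shape)                        # torch.Size([1, 32]) torch.Size([1, 64])
```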
Deep Learning Recognizes Weather and Climate Patterns
Karthik Kashinath
Prabhat Ram
Mayur Mudigonda
Ankur Mahesh
Sookyung Kim
Yunjie Liu
B. Toms
Evan Racah
Jim Biard
Kenneth E. Kunkel
Dean Nesbit Williams
Travis O'Brien
Michael F. Wehner
William D. Collins
Keep Drawing It: Iterative language-based image generation and editing
Alaaeldin El-Nouby
Shikhar Sharma
Hannes Schulz
Layla El Asri
Graham W. Taylor
Conditional text-to-image generation approaches commonly focus on generating a single image in a single step. One practical extension beyond one-step generation is an interactive system that generates an image iteratively, conditioned on ongoing linguistic input/feedback. This is significantly more challenging as such a system must understand and keep track of the ongoing context and history. In this work, we present a recurrent image generation model which takes into account both the generated output up to the current step as well as all past instructions for generation. We show that our model is able to generate the background, add new objects, apply simple transformations to existing objects, and correct previous mistakes. We believe our approach is an important step toward interactive generation.
ChatPainter: Improving Text to Image Generation using Dialogue
Synthesizing realistic images from text descriptions on a dataset like Microsoft Common Objects in Context (MS COCO), where each image can contain several objects, is a challenging task. Prior work has used text captions to generate images. However, captions might not be informative enough to capture the entire image, and may be insufficient for the model to understand which objects in the images correspond to which words in the captions. We show that adding a dialogue that further describes the scene leads to significant improvement in the inception score and in the quality of generated images on the MS COCO dataset.
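The conditioning change can be sketched as follows in PyTorch, with invented encoder and generator shapes: the caption and the dialogue are encoded separately, and their concatenation (plus noise) seeds the image generator:

```python
import torch
import torch.nn as nn

# Sketch only: encode caption and dialogue separately, concatenate with GAN
# noise, and project to a seed feature map that an upsampling stack would grow.
caption_enc = nn.GRU(input_size=300, hidden_size=128, batch_first=True)
dialogue_enc = nn.GRU(input_size=300, hidden_size=128, batch_first=True)
generator_in = nn.Linear(128 + 128 + 100, 4 * 4 * 64)

caption = torch.randn(1, 12, 300)       # 12 caption tokens (e.g. word vectors)
dialogue = torch.randn(1, 40, 300)      # flattened Q&A turns about the scene
_, hc = caption_enc(caption)
_, hd = dialogue_enc(dialogue)
z = torch.randn(1, 100)                 # noise vector
cond = torch.cat([hc.squeeze(0), hd.squeeze(0), z], dim=1)
seed = generator_in(cond).view(1, 64, 4, 4)   # upsampling stack would follow
print(seed.shape)                              # torch.Size([1, 64, 4, 4])
```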
FigureQA: An Annotated Figure Dataset for Visual Reasoning
Adam Atkinson
Ákos Kádár
Adam Trischler
We introduce FigureQA, a visual reasoning corpus of over one million question-answer pairs grounded in over 100,000 images. The images are synthetic, scientific-style figures from five classes: line plots, dot-line plots, vertical and horizontal bar graphs, and pie charts. We formulate our reasoning task by generating questions from 15 templates; questions concern various relationships between plot elements and examine characteristics like the maximum, the minimum, area-under-the-curve, smoothness, and intersection. Resolving such questions often requires reference to multiple plot elements and synthesis of information distributed spatially throughout a figure. To facilitate the training of machine learning systems, the corpus also includes side data that can be used to formulate auxiliary objectives. In particular, we provide the numerical data used to generate each figure as well as bounding-box annotations for all plot elements. We study the proposed visual reasoning task by training several models, including the recently proposed Relation Network as a strong baseline. Preliminary results indicate that the task poses a significant machine learning challenge. We envision FigureQA as a first step towards developing models that can intuitively recognize patterns from visual representations of data.
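A hypothetical mock-up of the template idea in plain Python: because questions are synthesized from the numeric data behind each figure, ground-truth answers come for free. The templates and values below are made up and much simpler than the corpus's 15 templates:

```python
# Numeric data behind one hypothetical bar chart (values invented).
values = {"Red": 61.0, "Blue": 32.5, "Green": 78.2}

qa_pairs = []
# Single-element templates: maximum / minimum.
for name, v in values.items():
    qa_pairs.append((f"Is {name} the maximum?",
                     "Yes" if v == max(values.values()) else "No"))
    qa_pairs.append((f"Is {name} the minimum?",
                     "Yes" if v == min(values.values()) else "No"))
# Two-element template: pairwise comparison.
for a in values:
    for b in values:
        if a != b:
            qa_pairs.append((f"Is {a} greater than {b}?",
                             "Yes" if values[a] > values[b] else "No"))

for question, answer in qa_pairs:
    print(question, "->", answer)
```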