Tegan Maharaj

Core Academic Member
Assistant Professor in Machine Learning, HEC Montréal, Department of Decision Science
Research Topics
Deep Learning
Dynamical Systems
Machine Learning Theory
Multimodal Learning
Representation Learning

Biography

I am an assistant professor at the Department of Decision Science at HEC Montréal.

The goal of my research is to contribute understanding and techniques to the growing science of responsible AI development, while usefully applying AI to high-impact ecological problems related to climate change, epidemiology, AI alignment and ecological impact assessments. My recent research has two themes: (1) using deep models for policy analysis and risk mitigation, and (2) designing data or unit test environments to empirically evaluate learning behaviour or simulate the deployment of AI systems. Please contact me if you are interested in collaborations in these areas.

I am generally interested in studying “what goes into” deep models—not only data, but also the broader learning environment (e.g., task design/specification, loss function and regularization) and the broader societal context of deployment (e.g., privacy considerations, trends and incentives, norms and human biases). I am concerned and passionate about AI ethics and safety, and the application of ML to environmental management, health and social welfare.

Current Students

Master's Research - Université de Montréal
Principal supervisor:
Master's Research - HEC Montréal
PhD - HEC Montréal

Publications

A Closer Look at Memorization in Deep Networks
We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that for appropriately tuned explicit regularization (e.g., dropout) we can degrade DNN training performance on noise datasets without compromising generalization on real data. Our analysis suggests that the notions of effective capacity which are dataset independent are unlikely to explain the generalization performance of deep networks when trained with gradient based methods because training data itself plays an important role in determining the degree of memorization.
Deep Nets Don't Learn via Memorization
We use empirical methods to argue that deep neural networks (DNNs) do not achieve their performance by memorizing training data in spite of overly expressive model architectures. Instead, they learn a simple available hypothesis that fits the finite data samples. In support of this view, we establish that there are qualitative differences when learning noise vs. natural datasets, showing: (1) more capacity is needed to fit noise, (2) time to convergence is longer for random labels, but shorter for random inputs, and (3) that DNNs trained on real data examples learn simpler functions than when trained with noise data, as measured by the sharpness of the loss function at convergence. Finally, we demonstrate that for appropriately tuned explicit regularization, e.g. dropout, we can degrade DNN training performance on noise datasets without compromising generalization on real data.
Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations
We propose zoneout, a novel method for regularizing RNNs. At each timestep, zoneout stochastically forces some hidden units to maintain their previous values. Like dropout, zoneout uses random noise to train a pseudo-ensemble, improving generalization. But by preserving instead of dropping hidden units, gradient information and state information are more readily propagated through time, as in feedforward stochastic depth networks. We perform an empirical investigation of various RNN regularizers, and find that zoneout gives significant performance improvements across tasks. We achieve competitive results with relatively simple models in character- and word-level language modelling on the Penn Treebank and Text8 datasets, and combining with recurrent batch normalization yields state-of-the-art results on permuted sequential MNIST.
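
The zoneout update described in the abstract above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `zoneout`, its parameters, and the choice to use the expected value of the update outside training (by analogy with dropout) are assumptions made for illustration.

```python
import numpy as np

def zoneout(h_prev, h_new, rate, rng, training=True):
    """Combine the previous and candidate hidden states of an RNN step.

    h_prev, h_new: arrays of the same shape (hypothetical names).
    rate: probability that a hidden unit keeps its previous value.
    rng: a numpy.random.Generator used to sample the mask.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    if training:
        # Per-unit binary mask: True means "keep the previous value".
        keep_prev = rng.random(h_prev.shape) < rate
        return np.where(keep_prev, h_prev, h_new)
    # Assumption: outside training, use the expectation of the random update.
    return rate * h_prev + (1.0 - rate) * h_new

# Usage: a toy tanh RNN unrolled for a few steps with zoneout applied.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 8))   # hidden-to-hidden weights (toy values)
U = rng.normal(scale=0.1, size=(8, 4))   # input-to-hidden weights (toy values)
h = np.zeros(8)
for x in rng.normal(size=(5, 4)):        # five random input vectors
    candidate = np.tanh(W @ h + U @ x)   # ordinary RNN update
    h = zoneout(h, candidate, rate=0.15, rng=rng)
```

Unlike dropout, which would set the masked units to zero, the masked units here carry their previous values forward, which is the property the abstract credits for better propagation of state and gradient information through time.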