Publications

Deep LDA-Pruned Nets for Efficient Facial Gender Classification
Qing Tian
James J. Clark
Many real-time tasks, such as human-computer interaction, require fast and efficient facial gender classification. Although deep CNNs have been very effective for a multitude of classification tasks, their high space and time demands make them impractical for personal computers and mobile devices without a powerful GPU. In this paper, we develop a 16-layer, yet lightweight, neural network which boosts efficiency while maintaining high accuracy. Our net is pruned from the VGG-16 model [35] starting from the last convolutional (conv) layer, where we find neuron activations are highly uncorrelated given the gender. Through Fisher's Linear Discriminant Analysis (LDA) [8], we show that this high decorrelation makes it safe to directly discard last conv layer neurons with high within-class variance and low between-class variance. Combined with either Support Vector Machines (SVM) or Bayesian classification, the reduced CNNs are capable of achieving comparable (or even higher) accuracies on the LFW and CelebA datasets than the original net with fully connected layers. On LFW, only four Conv5_3 neurons are able to maintain a comparably high recognition accuracy, which results in a reduction of total network size by a factor of 70 with an 11-fold speedup. Comparisons with a state-of-the-art pruning method [12] (as well as two smaller nets [20, 24]) in terms of accuracy loss and convolutional layer pruning rate are also provided.
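To make the pruning criterion concrete, here is a minimal sketch (not the authors' code) of ranking last-conv-layer neurons by Fisher's criterion and keeping the most discriminative ones for an SVM; the activations matrix `acts` and labels `y` are stand-in data:

```python
import numpy as np
from sklearn.svm import SVC

def fisher_ratios(acts, labels):
    """Per-neuron Fisher criterion: between-class variance over
    within-class variance (higher means more class-discriminative)."""
    overall_mean = acts.mean(axis=0)
    between = np.zeros(acts.shape[1])
    within = np.zeros(acts.shape[1])
    for c in np.unique(labels):
        grp = acts[labels == c]
        between += len(grp) * (grp.mean(axis=0) - overall_mean) ** 2
        within += ((grp - grp.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)

# Stand-in data for pooled last-conv-layer activations and gender labels.
rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 512))        # (n_samples, n_neurons)
y = rng.integers(0, 2, size=200)          # binary gender labels
acts[:, :4] += 3.0 * y[:, None]           # make four neurons discriminative

# Keep the few neurons with the best Fisher ratio (e.g. four, as on LFW),
# then train a lightweight classifier on the reduced representation.
keep = np.argsort(fisher_ratios(acts, y))[::-1][:4]
clf = SVC(kernel="linear").fit(acts[:, keep], y)
```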
Deep Reinforcement Learning at the Edge of the Statistical Precipice
Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance, such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally demanding benchmarks has led to the practice of evaluating only a small number of runs per task, exacerbating the statistical uncertainty in point estimates. In this paper, we argue that reliable evaluation in the few-run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point using a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone versus a more thorough statistical analysis. With the aim of increasing the field's confidence in reported results with a handful of runs, we advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results, as well as present more robust and efficient aggregate metrics, such as interquartile mean scores, to achieve small uncertainty in results. Using such statistical tools, we scrutinize performance evaluations of existing algorithms on other widely used RL benchmarks, including the ALE, Procgen, and the DeepMind Control Suite, again revealing discrepancies in prior comparisons. Our findings call for a change in how we evaluate performance in deep RL, for which we present a more rigorous evaluation methodology, accompanied by an open-source library, rliable, to prevent unreliable results from stagnating the field. This work received an outstanding paper award at NeurIPS 2021.
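As a rough illustration of the advocated statistics (the paper's rliable library implements the full stratified-bootstrap versions aggregated across tasks), here is a minimal single-task sketch of the interquartile mean with a percentile-bootstrap interval over a handful of runs; the scores are made-up numbers:

```python
import numpy as np
from scipy import stats

def iqm(scores):
    """Interquartile mean: mean of the middle 50% of scores; more robust
    than the mean and statistically more efficient than the median."""
    return stats.trim_mean(scores, proportiontocut=0.25, axis=None)

def bootstrap_ci(scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the IQM over training runs."""
    rng = np.random.default_rng(seed)
    boots = [iqm(rng.choice(scores, size=len(scores), replace=True))
             for _ in range(n_boot)]
    return np.percentile(boots, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Normalized final scores from a few runs on one task (illustrative values).
scores = np.array([0.42, 0.55, 0.38, 0.61, 0.47])
print(iqm(scores), bootstrap_ci(scores))
```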
Embedding Signals on Knowledge Graphs with Unbalanced Diffusion Earth Mover's Distance
Alexander Tong
Guillaume Huguet
Dennis Shung
Amine Natik
Manik Kuchroo
Smita Krishnaswamy
In modern relational machine learning it is common to encounter large graphs that arise via interactions or similarities between observations in many domains. Further…
Enjeux juridiques propres au modèle émergent des patients accompagnateurs dans les milieux de soins au Québec (Legal Issues Arising from the Emerging Model of Accompanying Patients in the Quebec Healthcare System)
Léa Boutrouille
Marie-Pascale Pomey
Estimating the Impact of an Improvement to a Revenue Management System: An Airline Application
Greta Laage
William Hamilton
Airlines have been making use of highly complex Revenue Management Systems to maximize revenue for decades. Estimating the impact of changing one component of those systems on an important outcome such as revenue is crucial, yet very challenging: the impact is the difference between the revenue actually generated and the revenue that would have been generated keeping business as usual, and the latter is not observable. We provide a comprehensive overview of counterfactual prediction models and use them in an extensive computational study based on data from Air Canada to estimate such impact. We focus on predicting the counterfactual revenue and compare it to the observed revenue subject to the impact. Our microeconomic application and small expected treatment impact stand out from the usual synthetic control applications. We present accurate linear and deep-learning counterfactual prediction models which achieve 1.1% and 1% error, respectively, and allow us to estimate a simulated effect quite accurately.
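The estimation idea can be sketched in a few lines (a simplification, not the paper's models: Ridge stands in for their linear and deep-learning predictors, and all variable names are hypothetical):

```python
import numpy as np
from sklearn.linear_model import Ridge

def estimate_impact(X, y, cutoff):
    """Fit a counterfactual model on pre-change data, predict the revenue
    business as usual would have generated after the change, and report
    the gap to the observed revenue."""
    model = Ridge(alpha=1.0).fit(X[:cutoff], y[:cutoff])
    counterfactual = model.predict(X[cutoff:])
    return (y[cutoff:] - counterfactual).sum()

# Stand-in data: demand features per departure date and realized revenue,
# with the revenue-management change taking effect at index `cutoff`.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=300)
y[200:] += 0.5                               # small simulated treatment effect
print(estimate_impact(X, y, cutoff=200))     # estimated incremental revenue
```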
Faults in deep reinforcement learning programs: a taxonomy and a detection approach
Amin Nikanjam
Mohammad Mehdi Morovati
Houssem Ben Braiek
Guest Editorial Explainable AI: Towards Fairness, Accountability, Transparency and Trust in Healthcare
Arash Shaban-Nejad
Martin Michalowski
John S. Brownstein
Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program)
Philippe Vincent-Lamarre
Koustuv Sinha
Vincent Larivière
Alina Beygelzimer
Florence d'Alché-Buc
E. Fox
Inspecting the Factuality of Hallucinated Entities in Abstractive Summarization
Meng Cao
Yue Dong
State-of-the-art abstractive summarization systems often generate hallucinations, i.e., content that is not directly inferable from the source text. Despite being assumed incorrect, many of the hallucinated contents are consistent with world knowledge (factual hallucinations). Including these factual hallucinations in a summary can be beneficial in providing additional background information. In this work, we propose a novel detection approach that separates factual from non-factual hallucinations of entities. Our method is based on an entity's prior and posterior probabilities according to pre-trained and fine-tuned masked language models, respectively. Empirical results suggest that our method vastly outperforms three strong baselines in both accuracy and F1 scores and has a strong correlation with human judgements on factuality classification tasks. Furthermore, our approach can provide insight into whether a particular hallucination is caused by the summarizer's pre-training or fine-tuning step.
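A heavily simplified sketch of the prior/posterior idea, using the Hugging Face transformers API with a single-token entity (in the paper the posterior model is fine-tuned to condition on the source document; here the pretrained weights are reused purely as a placeholder):

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
prior_lm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
posterior_lm = prior_lm  # placeholder; should be a source-conditioned model

@torch.no_grad()
def entity_prob(model, context, entity):
    """Probability the masked LM assigns to a single-token entity
    when it is masked out of the summary context."""
    text = context.replace(entity, tok.mask_token)
    ids = tok(text, return_tensors="pt")
    mask_pos = (ids.input_ids == tok.mask_token_id).nonzero()[0, 1]
    logits = model(**ids).logits[0, mask_pos]
    return logits.softmax(-1)[tok.convert_tokens_to_ids(entity)].item()

summary = "the bridge opened in paris last year."
prior = entity_prob(prior_lm, summary, "paris")
posterior = entity_prob(posterior_lm, summary, "paris")
# A low prior but high posterior suggests a source-grounded (factual) entity;
# low probability under both models flags a likely non-factual hallucination.
```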
Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization
Kartik Ahuja
Ethan Caballero
Dinghuai Zhang
Jean-Christophe Gagnon-Audet
The invariance principle from causality is at the heart of notable approaches such as invariant risk minimization (IRM) that seek to address out-of-distribution (OOD) generalization failures. Despite the promising theory, invariance principle-based approaches fail in common classification tasks, where invariant (causal) features capture all the information about the label. Are these failures due to the methods failing to capture the invariance? Or is the invariance principle itself insufficient? To answer these questions, we revisit the fundamental assumptions in linear regression tasks, where invariance-based approaches were shown to provably generalize OOD. In contrast to the linear regression tasks, we show that for linear classification tasks we need much stronger restrictions on the distribution shifts, or otherwise OOD generalization is impossible. Furthermore, even with appropriate restrictions on distribution shifts in place, we show that the invariance principle alone is insufficient. We prove that a form of the information bottleneck constraint along with invariance helps address key failures when invariant features capture all the information about the label and also retains the existing success when they do not. We propose an approach that incorporates both of these principles and demonstrate its effectiveness in several experiments.
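A minimal sketch of how such a combined objective might look (an illustration, not the paper's exact implementation: the IRMv1-style gradient penalty and the variance-of-representation bottleneck surrogate are assumptions, and `phi`, `head`, and `envs` are hypothetical):

```python
import torch
import torch.nn.functional as F

def irm_penalty(logits, y):
    """IRMv1-style penalty: squared gradient of the environment risk
    with respect to a frozen scalar classifier w = 1.0."""
    w = torch.tensor(1.0, requires_grad=True)
    grad = torch.autograd.grad(F.cross_entropy(logits * w, y),
                               [w], create_graph=True)[0]
    return grad.pow(2)

def ib_irm_loss(phi, head, envs, lam_irm=1.0, lam_ib=0.1):
    """Empirical risk + invariance penalty + a bottleneck surrogate
    (variance of the learned representation), summed over environments."""
    risk = penalty = ib = 0.0
    for x, y in envs:                   # one (x, y) batch per environment
        z = phi(x)                      # shared featurizer
        logits = head(z)
        risk = risk + F.cross_entropy(logits, y)
        penalty = penalty + irm_penalty(logits, y)
        ib = ib + z.var(dim=0).mean()   # penalize representation variance
    return risk + lam_irm * penalty + lam_ib * ib

# Toy usage with random data for two environments.
phi = torch.nn.Linear(10, 4)
head = torch.nn.Linear(4, 2)
envs = [(torch.randn(32, 10), torch.randint(0, 2, (32,))) for _ in range(2)]
print(ib_irm_loss(phi, head, envs))
```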