Learning to live with Dale's principle: ANNs with separate excitatory and inhibitory units
Jonathan Cornford
Damjan Kalajdzievski
Marco Leite
Amélie Lamarquette
Dimitri Michael Kullmann
The units in artificial neural networks (ANNs) can be thought of as abstractions of biological neurons, and ANNs are increasingly used in neuroscience research. However, there are many important differences between ANN units and real neurons. One of the most notable is the absence of Dale's principle, which ensures that biological neurons are either exclusively excitatory or inhibitory. Dale's principle is typically left out of ANNs because its inclusion impairs learning. This is problematic, because one of the great advantages of ANNs for neuroscience research is their ability to learn complicated, realistic tasks. Here, by taking inspiration from feedforward inhibitory interneurons in the brain, we show that we can develop ANNs with separate populations of excitatory and inhibitory units that learn just as well as standard ANNs. We call these networks Dale's ANNs (DANNs). We present two insights that enable DANNs to learn well: (1) DANNs are related to normalization schemes, and can be initialized such that the inhibition centres and standardizes the excitatory activity, and (2) updates to inhibitory neuron parameters should be scaled using corrections based on the Fisher Information matrix. These results demonstrate how ANNs that respect Dale's principle can be built without sacrificing learning performance, which is important for future work using ANNs as models of the brain. The results may also have interesting implications for how inhibitory plasticity in the real brain operates.
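As a rough illustration of the idea of separate excitatory and inhibitory populations, the sketch below builds a single layer in which every weight is non-negative and inhibition reaches the excitatory units only through a small feedforward inhibitory population. It is a simplified assumption, not the authors' DANN formulation: the names `dale_layer`, `W_ex`, `W_ix`, `W_ei` and `project_nonneg` are hypothetical, and the normalization-style initialization and Fisher-based update scaling described in the abstract are not implemented.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def nonneg_matrix(rows, cols, rng, scale=1.0):
    """Random matrix whose entries are all non-negative."""
    return [[rng.random() * scale for _ in range(cols)] for _ in range(rows)]

def dale_layer(x, W_ex, W_ix, W_ei):
    """Hypothetical layer: excitatory drive minus feedforward inhibition, then ReLU.

    W_ex: input -> excitatory units (n_e x n_in), entries >= 0
    W_ix: input -> inhibitory units (n_i x n_in), entries >= 0
    W_ei: inhibitory -> excitatory units (n_e x n_i), entries >= 0,
          applied with a minus sign so inhibitory units only ever inhibit
    """
    excitation = matvec(W_ex, x)
    inhibitory_activity = matvec(W_ix, x)
    inhibition = matvec(W_ei, inhibitory_activity)
    return [max(0.0, e - i) for e, i in zip(excitation, inhibition)]

def project_nonneg(W):
    """Clip weights back to >= 0 (e.g. after a gradient step) so signs never flip."""
    return [[max(0.0, w) for w in row] for row in W]

# Small demo with non-negative inputs (assumed to be excitatory firing rates).
rng = random.Random(0)
x = [rng.random() for _ in range(4)]
W_ex = nonneg_matrix(3, 4, rng)
W_ix = nonneg_matrix(2, 4, rng)
W_ei = nonneg_matrix(3, 2, rng, scale=0.5)
out = dale_layer(x, W_ex, W_ix, W_ei)
print(out)
```

In this sketch the sign of each connection is fixed by which matrix it belongs to, and `project_nonneg` is one simple way to keep it fixed during training. Other constraint schemes are possible.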
Predicting Infectiousness for Proactive Contact Tracing
Prateek Gupta
Nasim Rahaman
Martin Weiss
Tristan Deleu
Meng Qu
Victor Schmidt
Pierre-Luc St-Charles
Hannah Alsdurf
Olexa Bilaniuk
Gaetan Caron
Pierre Luc Carrier
Joumana Ghosn
Satya Ortiz Gagne
Bernhard Schölkopf … (see 3 more)
Abhinav Sharma
Andrew Williams
The COVID-19 pandemic has spread rapidly worldwide, overwhelming manual contact tracing in many countries and resulting in widespread lockdowns for emergency containment. Large-scale digital contact tracing (DCT) has emerged as a potential solution to resume economic and social activity while minimizing spread of the virus. Various DCT methods have been proposed, each making trade-offs between privacy, mobility restrictions, and public health. The most common approach, binary contact tracing (BCT), models infection as a binary event, informed only by an individual's test results, with corresponding binary recommendations that either all or none of the individual's contacts quarantine. BCT ignores the inherent uncertainty in contacts and the infection process, which could be used to tailor messaging to high-risk individuals, and prompt proactive testing or earlier warnings. It also does not make use of observations such as symptoms or pre-existing medical conditions, which could be used to make more accurate infectiousness predictions. In this paper, we use a recently proposed COVID-19 epidemiological simulator to develop and test methods that can be deployed to a smartphone to locally and proactively predict an individual's infectiousness (risk of infecting others) based on their contact history and other information, while respecting strong privacy constraints. Predictions are used to provide personalized recommendations to the individual via an app, as well as to send anonymized messages to the individual's contacts, who use this information to better predict their own infectiousness, an approach we call proactive contact tracing (PCT). We find a deep-learning-based PCT method that improves over BCT for equivalent average mobility, suggesting PCT could help in safe re-opening and second-wave prevention.
RNNLogic: Learning Logic Rules for Reasoning on Knowledge Graphs
Meng Qu
Junkun Chen
Louis-Pascal Xhonneux
This paper studies learning logic rules for reasoning on knowledge graphs. Logic rules provide interpretable explanations when used for prediction and can generalize to other tasks, and hence are critical to learn. Existing methods either suffer from the problem of searching in a large search space (e.g., neural logic programming) or ineffective optimization due to sparse rewards (e.g., techniques based on reinforcement learning). To address these limitations, this paper proposes a probabilistic model called RNNLogic. RNNLogic treats logic rules as a latent variable, and simultaneously trains a rule generator as well as a reasoning predictor with logic rules. We develop an EM-based algorithm for optimization. In each iteration, the reasoning predictor is updated to explore some generated logic rules for reasoning. Then in the E-step, we select a set of high-quality rules from all generated rules with both the rule generator and reasoning predictor via posterior inference; and in the M-step, the rule generator is updated with the rules selected in the E-step. Experiments on four datasets prove the effectiveness of RNNLogic.
Spatially Structured Recurrent Modules
Nasim Rahaman
Anirudh Goyal
Muhammad Waleed Gondal
Manuel Wüthrich
Stefan Bauer
Yash Sharma
Bernhard Schölkopf
Capturing the structure of a data-generating process by means of appropriate inductive biases can help in learning models that generalise well and are robust to changes in the input distribution. While methods that harness spatial and temporal structures find broad application, recent work has demonstrated the potential of models that leverage sparse and modular structure using an ensemble of sparingly interacting modules. In this work, we take a step towards dynamic models that are capable of simultaneously exploiting both modular and spatiotemporal structures. To this end, we model the dynamical system as a collection of autonomous but sparsely interacting sub-systems that interact according to a learned topology which is informed by the spatial structure of the underlying system. This gives rise to a class of models that are well suited for capturing the dynamics of systems that only offer local views into their state, along with corresponding spatial locations of those views. On the tasks of video prediction from cropped frames and multi-agent world modelling from partial observations in the challenging Starcraft2 domain, we find our models to be more robust to the number of available views and better capable of generalisation to novel tasks without additional training than strong baselines that perform equally well or better on the training distribution.
Attention Based Pruning for Shift Networks
Ghouthi Boukli Hacene
Carlos Lassance
Vincent Gripon
Matthieu Courbariaux
In many application domains such as computer vision, Convolutional Layers (CLs) are key to the accuracy of deep learning methods. However, it is often required to assemble a large number of CLs, each containing thousands of parameters, in order to reach state-of-the-art accuracy, thus resulting in complex and demanding systems that are poorly fitted to resource-limited devices. Recently, methods have been proposed to replace the generic convolution operator by the combination of a shift operation and a simpler
The patient advisor, an organizational resource as a lever for an enhanced oncology patient experience (PAROLE-onco): a longitudinal multiple case study protocol
Marie-Pascale Pomey
Michèle de Guise
Mado Desforges
Karine Bouchard
Cécile Vialaron
Louise Normandin
Monica Iliescu‐Nelea
Israël Fortin
Isabelle Ganache
Zeev Rosberger
Danielle Charpentier
L. Bélanger
Michel Dorval
Djahanchah Philip Ghadiri
Mélanie Lavoie-Tremblay
A. Boivin
Jean-François Pelletier
Nicolas Fernandez
Alain M. Danino
Abg-CoQA: Clarifying Ambiguity in Conversational Question Answering
Meiqi Guo
Mingda Zhang
Malihe Alikhani
Effective communication is about the dissemination of properly worded meaningful ideas/messages that are comprehensible to both sender and receiver and which ultimately can attract the desired response or feedback. For machines to engage in a conversation, it is therefore essential to enable them to clarify ambiguity and achieve a common ground. We introduce Abg-CoQA, a novel dataset for clarifying ambiguity in Conversational Question Answering systems. Our dataset contains 9k questions with answers where 1k questions are ambiguous, obtained from 4k text passages from five diverse domains. For ambiguous questions, a clarification conversational turn is collected. We evaluate strong language generation models and conversational question answering models on Abg-CoQA. The best-performing system achieves a BLEU-1 score of 12.9% on generating clarification questions, which is 27.9 points behind human performance (40.8%); and an F1 score of 40.1% on question answering after clarification, which is 35.1 points behind human performance (75.2%), indicating there is ample room for improvement.
Accounting for Variance in Machine Learning Benchmarks
Xavier Bouthillier
Pierre Delaunay
Mirko Bronzi
Assya Trofimov
Brennan Nichyporuk
Justin Szeto
Naz Sepah
Edward Raff
Kanika Madan
Vikram Voleti
Vincent Michalski
Dmitriy Serdyuk
Gael Varoquaux
Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameter choices. This is prohibitively expensive, and corners are cut to reach conclusions. We model the whole benchmarking process, revealing that variance due to data sampling, parameter initialization and hyperparameter choice markedly impacts the results. We analyze the predominant comparison methods used today in the light of this variance. We show a counter-intuitive result that adding more sources of variation to an imperfect estimator better approaches the ideal estimator at a 51 times reduction in compute cost. Building on these results, we study the error rate of detecting improvements, on five different deep-learning tasks/architectures. This study leads us to propose recommendations for performance comparisons.
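To make the idea of comparing two algorithms under run-to-run variance concrete, here is a minimal sketch that estimates how often A beats B across independent runs and puts a bootstrap interval around that estimate. It is an assumption-laden illustration, not the procedure from the paper: the function names are hypothetical, and the score lists are made-up placeholders standing in for results obtained with different seeds, data splits and hyperparameter draws.

```python
import random

def prob_outperform(scores_a, scores_b):
    """Fraction of (a, b) run pairs in which A scores higher than B (ties count half)."""
    wins = 0.0
    for a in scores_a:
        for b in scores_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_a) * len(scores_b))

def bootstrap_interval(scores_a, scores_b, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for prob_outperform, resampling runs with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        resampled_a = rng.choices(scores_a, k=len(scores_a))
        resampled_b = rng.choices(scores_b, k=len(scores_b))
        stats.append(prob_outperform(resampled_a, resampled_b))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Placeholder accuracies from several hypothetical runs of each algorithm.
runs_a = [0.81, 0.79, 0.83, 0.80, 0.82]
runs_b = [0.80, 0.78, 0.81, 0.79, 0.80]
estimate = prob_outperform(runs_a, runs_b)
low, high = bootstrap_interval(runs_a, runs_b)
print(estimate, low, high)
```

An interval that comfortably excludes 0.5 suggests the difference is unlikely to be an artifact of a lucky seed. With only a handful of runs per algorithm the interval will usually be wide.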
Active Learning for Capturing Human Decision Policies in a Data Frugal Context
Loïc Grossetête
Alexandre Marois
Bénédicte Chatelais
Daniel Lafond
ADEPT: An Adjective-Dependent Plausibility Task
Ali Emami
Ian Porada
Kaheer Suleman
Adam Trischler
Adversarial Feature Desensitization
Mojtaba Faramarzi
Reza Bayat
Touraj Laleh
Adam Ibrahim
Kartik Ahuja
Neural networks are known to be vulnerable to adversarial attacks -- slight but carefully constructed perturbations of the inputs which can drastically impair the network's performance. Many defense methods have been proposed for improving robustness of deep networks by training them on adversarially perturbed inputs. However, these models often remain vulnerable to new types of attacks not seen during training, and even to slightly stronger versions of previously seen attacks. In this work, we propose a novel approach to adversarial robustness, which builds upon the insights from the domain adaptation field. Our method, called Adversarial Feature Desensitization (AFD), aims at learning features that are invariant towards adversarial perturbations of the inputs. This is achieved through a game where we learn features that are both predictive and robust (insensitive to adversarial attacks), i.e. cannot be used to discriminate between natural and adversarial data. Empirical results on several benchmarks demonstrate the effectiveness of the proposed approach against a wide range of attack types and attack strengths. Our code is available at https://github.com/BashivanLab/afd.
An Analysis of the Adaptation Speed of Causal Models
Rémi Le Priol
Reza Babanezhad Harikandeh