Publications

Naming Autism in the Right Context
Andres Roman-Urrestarazu
Varun Warrier
Integrating Equity, Diversity, and Inclusion throughout the lifecycle of Artificial Intelligence in health
Milka Nyariro
Elham Emami
Health care systems are the infrastructures that are put together to deliver health and social services to the population at large. These organizations are increasingly applying Artificial Intelligence (AI) to improve the efficiency and effectiveness of health and social care. Unfortunately, both health care systems and AI are confronted with a lack of Equity, Diversity, and Inclusion (EDI). This short paper focuses on the importance of integrating EDI concepts throughout the life cycle of AI in health. We discuss the risks that the lack of EDI in the design, development and implementation of AI-based tools might have on the already marginalized communities and populations in the healthcare setting. Moreover, we argue that integrating EDI principles and practice throughout the lifecycle of AI in health has an important role in achieving health equity for all populations. Further research needs to be conducted to explore how studies in AI-health have integrated.
Annotation Cost-Sensitive Deep Active Learning with Limited Data (Student Abstract)
Renaud Bernatchez
Flavie Lavoie-Cardinal
Estimating Social Influence from Observational Data
Caterina De Bacco
David Blei
We consider the problem of estimating social influence, the effect that a person's behavior has on the future behavior of their peers. The key challenge is that shared behavior between friends could be equally explained by influence or by two other confounding factors: 1) latent traits that caused people to both become friends and engage in the behavior, and 2) latent preferences for the behavior. This paper addresses the challenges of estimating social influence with three contributions. First, we formalize social influence as a causal effect, one which requires inferences about hypothetical interventions. Second, we develop Poisson Influence Factorization (PIF), a method for estimating social influence from observational data. PIF fits probabilistic factor models to networks and behavior data to infer variables that serve as substitutes for the confounding latent traits. Third, we develop assumptions under which PIF recovers estimates of social influence. We empirically study PIF with semi-synthetic and real data from Last.fm, and conduct a sensitivity analysis. We find that PIF estimates social influence most accurately compared to related methods and remains robust under some violations of its assumptions.
A Generalized Bootstrap Target for Value-Learning, Efficiently Combining Value and Feature Predictions
Anthony GX-Chen
Veronica Chelu
Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning
Utku Evci
Vincent Dumoulin
Michael Curtis Mozer
3D Infomax improves GNNs for Molecular Property Prediction
Hannes Stärk
Gabriele Corso
Prudencio Tossou
Christian Dallago
Stephan Günnemann
Pietro Lio
Molecular property prediction is one of the fastest-growing applications of deep learning with critical real-world impacts. Including 3D molecular structure as input to learned models improves their predictions for many molecular properties. However, this information is infeasible to compute at the scale required by most real-world applications. We propose pre-training a model to understand the geometry of molecules given only their 2D molecular graph. Using methods from self-supervised learning, we maximize the mutual information between a 3D summary vector and the representations of a Graph Neural Network (GNN) such that they contain latent 3D information. During fine-tuning on molecules with unknown geometry, the GNN still generates implicit 3D information and can use it to inform downstream tasks. We show that 3D pre-training provides significant improvements for a wide range of molecular properties, such as a 22% average MAE reduction on eight quantum mechanical properties. Crucially, the learned representations can be effectively transferred between datasets with vastly different molecules.
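As a rough illustration of the idea of maximizing mutual information between a 2D-graph representation and a 3D summary vector, the sketch below uses an InfoNCE-style contrastive loss, which is one common lower-bound surrogate for mutual information. The choice of loss, the function names `cosine` and `info_nce`, the `temperature` value, and the toy vectors are all assumptions made for illustration; the abstract does not specify the objective at this level of detail.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(z2d, z3d, temperature=0.1):
    """InfoNCE-style loss (hypothetical stand-in for the paper's objective).

    z2d[i] is the representation computed from molecule i's 2D graph and
    z3d[i] is a summary vector of the same molecule's 3D structure.
    The matching pair (i, i) is the positive; every (i, j != i) is a negative.
    A lower loss means the two views are easier to match up.
    """
    n = len(z2d)
    total = 0.0
    for i in range(n):
        logits = [cosine(z2d[i], z3d[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / n


# Toy batch of two molecules with 2-dimensional embeddings.
graph_repr = [[1.0, 0.0], [0.0, 1.0]]
aligned_3d = [[1.0, 0.0], [0.0, 1.0]]   # 3D summaries agree with the 2D view
shuffled_3d = [[0.0, 1.0], [1.0, 0.0]]  # 3D summaries paired with the wrong molecule

loss_aligned = info_nce(graph_repr, aligned_3d)
loss_shuffled = info_nce(graph_repr, shuffled_3d)
```

In this toy batch the aligned pairing gives a much smaller loss than the shuffled one, which is the signal a pre-training step of this kind would use to push latent 3D information into the 2D representation.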
Learning To Cut By Looking Ahead: Cutting Plane Selection via Imitation Learning
Max B. Paulus
Giulia Zarpellon
Andreas Krause
Chris J. Maddison
Cutting planes are essential for solving mixed-integer linear problems (MILPs), because they facilitate bound improvements on the optimal solution value. For selecting cuts, modern solvers rely on manually designed heuristics that are tuned to gauge the potential effectiveness of cuts. We show that a greedy selection rule explicitly looking ahead to select cuts that yield the best bound improvement delivers strong decisions for cut selection, but is too expensive to be deployed in practice. In response, we propose a new neural architecture (NeuralCut) for imitation learning on the lookahead expert. Our model outperforms standard baselines for cut selection on several synthetic MILP benchmarks. Experiments with a B&C solver for neural network verification further validate our approach, and exhibit the potential of learning methods in this setting.
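The greedy lookahead rule described in this abstract can be sketched on a toy problem: for each candidate cut, re-solve the LP relaxation with that cut added and keep the cut that gives the tightest bound. The code below is a minimal sketch under stated assumptions, not the paper's implementation. It handles only two-variable maximization problems, solves each LP by brute-force vertex enumeration, assumes a bounded non-empty feasible region, and takes the validity of each candidate cut for granted. The names `lp_bound_2d` and `lookahead_select` are hypothetical.

```python
from itertools import combinations


def lp_bound_2d(c, constraints, tol=1e-9):
    """Maximize c.x over {x in R^2 : a.x <= b for each (a, b)}.

    Enumerates intersections of constraint pairs and keeps the best
    feasible one. Assumes the region is bounded and non-empty.
    """
    best = None
    for (a1, b1), (a2, b2) in combinations(constraints, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < tol:
            continue  # parallel constraints have no unique intersection
        # Cramer's rule for the 2x2 system a1.x = b1, a2.x = b2.
        x = (b1 * a2[1] - a1[1] * b2) / det
        y = (a1[0] * b2 - b1 * a2[0]) / det
        if all(a[0] * x + a[1] * y <= b + tol for a, b in constraints):
            value = c[0] * x + c[1] * y
            if best is None or value > best:
                best = value
    return best


def lookahead_select(c, constraints, candidate_cuts):
    """Return (index of the cut with the tightest bound, all bounds).

    One LP solve per candidate: this is the cost that makes the rule
    expensive when there are many candidates.
    """
    bounds = [lp_bound_2d(c, constraints + [cut]) for cut in candidate_cuts]
    best_idx = min(range(len(bounds)), key=bounds.__getitem__)
    return best_idx, bounds


# Toy relaxation: maximize x + y with 0 <= x <= 1.5 and 0 <= y <= 1.5.
objective = (1.0, 1.0)
relaxation = [
    ((1.0, 0.0), 1.5),   # x <= 1.5
    ((0.0, 1.0), 1.5),   # y <= 1.5
    ((-1.0, 0.0), 0.0),  # x >= 0
    ((0.0, -1.0), 0.0),  # y >= 0
]
# Candidate cuts, all valid for the integer points of the region.
cuts = [
    ((1.0, 1.0), 2.5),   # x + y <= 2.5
    ((1.0, 0.0), 1.0),   # x <= 1
    ((1.0, 1.0), 2.0),   # x + y <= 2
]

root_bound = lp_bound_2d(objective, relaxation)                  # 3.0
chosen, bounds = lookahead_select(objective, relaxation, cuts)   # 2, [2.5, 2.5, 2.0]
```

Because the rule needs one LP solve per candidate, its cost grows with the size of the cut pool, which is the motivation the abstract gives for imitating the lookahead expert with a learned model.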
Only tails matter: Average-Case Universality and Robustness in the Convex Regime
Leonardo Cunha
Fabian Pedregosa
Damien Scieur
Towards efficient representation identification in supervised learning
Kartik Ahuja
Divyat Mahajan
Vasilis Syrgkanis
Humans have a remarkable ability to disentangle complex sensory inputs (e.g., image, text) into simple factors of variation (e.g., shape, color) without much supervision. This ability has inspired many works that attempt to answer the following question: how do we invert the data generation process to extract those factors with minimal or no supervision? Several works in the literature on non-linear independent component analysis have established a negative result: without some knowledge of the data generation process or appropriate inductive biases, it is impossible to perform this inversion. In recent years, a lot of progress has been made on disentanglement under structural assumptions, e.g., when we have access to auxiliary information that makes the factors of variation conditionally independent. However, existing work requires a lot of auxiliary information, e.g., in supervised classification, it prescribes that the number of label classes should be at least equal to the total dimension of all factors of variation. In this work, we depart from these assumptions and ask: a) How can we get disentanglement when the auxiliary information does not provide conditional independence over the factors of variation? b) Can we reduce the amount of auxiliary information required for disentanglement? For a class of models where auxiliary information does not ensure conditional independence, we show theoretically and experimentally that disentanglement (to a large extent) is possible even when the auxiliary information dimension is much less than the dimension of the true latent representation.
Towards Scaling Difference Target Propagation by Learning Backprop Targets
Maxence Ernoult
Fabrice Normandin
Abhinav Moudgil
Sean Spinney
The development of biologically-plausible learning algorithms is important for understanding learning in the brain, but most of them fail to scale up to real-world tasks, limiting their potential as explanations for learning by real brains. As such, it is important to explore learning algorithms that come with strong theoretical guarantees and can match the performance of backpropagation (BP) on complex tasks. One such algorithm is Difference Target Propagation (DTP), a biologically-plausible learning algorithm whose close relation with Gauss-Newton (GN) optimization has been recently established. However, the conditions under which this connection rigorously holds preclude layer-wise training of the feedback pathway synaptic weights (which is more biologically plausible). Moreover, good alignment between DTP weight updates and loss gradients is only loosely guaranteed and under very specific conditions for the architecture being trained. In this paper, we propose a novel feedback weight training scheme that ensures both that DTP approximates BP and that layer-wise feedback weight training can be restored without sacrificing any theoretical guarantees. Our theory is corroborated by experimental results and we report the best performance ever achieved by DTP on CIFAR-10 and ImageNet 32
VIM: Variational Independent Modules for Video Prediction
Rim Assouel
Lluis Castrejon
Nicolas Ballas
We introduce a variational inference model called VIM, for Variational Independent Modules, for sequential data that learns and infers latent representations as a set of objects and discovers modular causal mechanisms over these objects. These mechanisms, which we call modules, are independently parametrized, define the stochastic transitions of entities and are shared across entities. At each time step, our model infers from a low-level input sequence a high-level sequence of categorical latent variables to select which transition modules to apply to which high-level object. We evaluate this model in video prediction tasks where the goal is to predict multi-modal future events given previous observations. We demonstrate empirically that VIM can model 2D visual sequences in an interpretable way and is able to identify the underlying dynamically instantiated mechanisms of the generation process. We additionally show that the learnt modules can be composed at test time to generalize to out-of-distribution observations.