Publications

Neural Function Modules with Sparse Arguments: A Dynamic Approach to Integrating Information across Layers
Alex Lamb
Anirudh Goyal
A. Slowik
Michael Curtis Mozer
Philippe Beaudoin
Feed-forward neural networks consist of a sequence of layers, in which each layer performs some processing on the information from the previous layer. A downside to this approach is that each layer (or module, as multiple modules can operate in parallel) is tasked with processing the entire hidden state, rather than a particular part of the state which is most relevant for that module. Methods which only operate on a small number of input variables are an essential part of most programming languages, and they allow for improved modularity and code re-usability. Our proposed method, Neural Function Modules (NFM), aims to introduce the same structural capability into deep learning. Most of the work in the context of feed-forward networks combining top-down and bottom-up feedback is limited to classification problems. The key contribution of our work is to combine attention, sparsity, top-down and bottom-up feedback, in a flexible algorithm which, as we show, improves the results in standard classification, out-of-domain generalization, generative modeling, and learning representations in the context of reinforcement learning.
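As an illustration of the kind of mechanism described above, the sketch below shows a module that attends over the outputs of earlier layers and keeps only a sparse top-k subset of them. It is a minimal, assumed construction for exposition, not the authors' NFM implementation; the module name and hyperparameters are hypothetical.

```python
# Illustrative sketch only: attention over earlier layer outputs with hard
# top-k sparsity, loosely in the spirit of combining attention and sparsity
# as described in the abstract. Not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseLayerAttention(nn.Module):
    def __init__(self, dim, k=2):
        super().__init__()
        self.k = k                      # number of earlier layers each module may read
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, h, layer_outputs):
        # h: (batch, dim) current module state
        # layer_outputs: (batch, n_layers, dim) stack of earlier layer outputs
        q = self.query(h).unsqueeze(1)                       # (batch, 1, dim)
        kmat = self.key(layer_outputs)                       # (batch, n_layers, dim)
        v = self.value(layer_outputs)                        # (batch, n_layers, dim)
        scores = (q * kmat).sum(-1) / kmat.shape[-1] ** 0.5  # (batch, n_layers)
        # Hard sparsity: keep only the top-k layers, mask out the rest.
        topk = scores.topk(self.k, dim=-1).indices
        mask = torch.full_like(scores, float("-inf")).scatter(1, topk, 0.0)
        attn = F.softmax(scores + mask, dim=-1)              # zeros on masked layers
        return h + (attn.unsqueeze(-1) * v).sum(1)           # residual update
```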
Quantum Tensor Networks, Stochastic Processes, and Weighted Automata
Sandesh M. Adhikary
Siddarth Srinivasan
Jacob Miller
Byron Boots
Modeling joint probability distributions over sequences has been studied from many perspectives. The physics community developed matrix product states, a tensor-train decomposition for probabilistic modeling, motivated by the need to tractably model many-body systems. But similar models have also been studied in the stochastic processes and weighted automata literature, with little work on how these bodies of work relate to each other. We address this gap by showing how stationary or uniform versions of popular quantum tensor network models have equivalent representations in the stochastic processes and weighted automata literature, in the limit of infinitely long sequences. We demonstrate several equivalence results between models used in these three communities: (i) uniform variants of matrix product states, Born machines and locally purified states from the quantum tensor networks literature, (ii) predictive state representations, hidden Markov models, norm-observable operator models and hidden quantum Markov models from the stochastic process literature, and (iii) stochastic weighted automata, probabilistic automata and quadratic automata from the formal languages literature. Such connections may open the door for results and methods developed in one area to be applied in another.
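To make the shared structure concrete, the toy sketch below scores a sequence with a product of symbol-indexed matrices, interpreting the result either as an (unnormalized) weighted automaton / uniform MPS weight or as a Born-machine squared amplitude. The notation and lack of normalization are simplifying assumptions, not the paper's constructions.

```python
# Minimal numerical sketch (assumed notation): both a stochastic weighted
# automaton and a Born machine score a sequence via a product of
# symbol-indexed matrices; they differ in how that product becomes a probability.
import numpy as np

rng = np.random.default_rng(0)
n_symbols, bond_dim = 3, 4

A = rng.random((n_symbols, bond_dim, bond_dim))  # one matrix per symbol
alpha = rng.random(bond_dim)                     # initial / left boundary vector
omega = rng.random(bond_dim)                     # final / right boundary vector

def matrix_product(sequence):
    """Left-to-right product alpha^T A[x1] ... A[xn] omega."""
    v = alpha.copy()
    for x in sequence:
        v = v @ A[x]
    return v @ omega

def wfa_score(sequence):
    # Weighted automaton / uniform MPS: the product itself is the weight.
    return matrix_product(sequence)

def born_machine_score(sequence):
    # Born machine: probability proportional to the squared amplitude.
    return matrix_product(sequence) ** 2

seq = [0, 2, 1, 1]
print(wfa_score(seq), born_machine_score(seq))
```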
Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence
Nicolas Loizou
Sharan Vaswani
Issam Hadj Laradji
We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly used in the subgradient method. Although computing the Polyak step-size requires knowledge of the optimal function values, this information is readily available for typical modern machine learning applications. Consequently, the proposed stochastic Polyak step-size (SPS) is an attractive choice for setting the learning rate for stochastic gradient descent (SGD). We provide theoretical convergence guarantees for SGD equipped with SPS in different settings, including strongly convex, convex and non-convex functions. Furthermore, our analysis results in novel convergence guarantees for SGD with a constant step-size. We show that SPS is particularly effective when training over-parameterized models capable of interpolating the training data. In this setting, we prove that SPS enables SGD to converge to the true solution at a fast rate without requiring the knowledge of any problem-dependent constants or additional computational overhead. We experimentally validate our theoretical results via extensive experiments on synthetic and real datasets. We demonstrate the strong performance of SGD with SPS compared to state-of-the-art optimization methods when training over-parameterized models.
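A minimal sketch of SGD with a stochastic Polyak step-size on a least-squares problem is given below, using the per-sample step (f_i(x) − f_i*) / (c‖∇f_i(x)‖²) with f_i* taken as 0 for an interpolating model. The cap gamma_max and the constant c are illustrative assumptions rather than the paper's exact variant.

```python
# Hedged sketch of SGD with a stochastic Polyak step-size (SPS) on
# least-squares regression; f_i^* = 0 is assumed (interpolation, no regularizer).
import numpy as np

def sgd_with_sps(X, y, n_epochs=10, c=0.5, gamma_max=1.0, eps=1e-12):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        for i in np.random.permutation(n):
            residual = X[i] @ w - y[i]
            f_i = 0.5 * residual ** 2            # per-sample loss, optimal value 0 assumed
            grad = residual * X[i]
            step = min(f_i / (c * (grad @ grad) + eps), gamma_max)
            w -= step * grad
    return w

# Toy usage: noiseless linear data, so the model can interpolate.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
w_hat = sgd_with_sps(X, X @ w_true)
print(np.linalg.norm(w_hat - w_true))
```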
A Study of Condition Numbers for First-Order Optimization
Charles Guille-Escuret
Baptiste Goujaud
Manuela Girotti
[Strengthening the culture of public health surveillance and population health monitoring].
Arnaud Chiolero
Stéphane Cullati
Public health surveillance is the systematic and ongoing collection, analysis and interpretation of data to produce information useful for decision-making. With the development of data science, surveillance methods are evolving through access to big data. More data does not automatically mean more information. For example, the massive amounts of data on Covid-19 were not easily transformed into information useful for decision-making. Further, data scientists often have difficulty making their analyses useful for decision-making. For the implementation of evidence-based and data-driven public health practice, the culture of public health surveillance and population health monitoring needs to be strengthened.
Price discounting as a hidden risk factor of energy drink consumption
Hiroshi Mamiya
Erica E. M. Moodie
Alexandra M. Schmidt
Yu Ma
Global consumption of caffeinated energy drinks (CED) has been increasing dramatically despite growing evidence of their adverse health effects. Temporary price discounting is a rarely investigated but potentially powerful food marketing tactic influencing purchasing of CED. Using grocery transaction records generated by food stores in Montreal, we investigated the association between price discounting and purchasing of CED across socio-economic status, operationalized by education and income levels in the store neighbourhood. The outcome, log-transformed weekly store-level sales of CED, was modelled as a function of store-level percent price discounting, store- and neighbourhood-level confounders, and interaction terms between discounting and each of the education and income tertiles of the store neighbourhood. The model was separately fit to transactions from supermarkets, pharmacies, supercentres, and convenience stores. There were 18,743, 12,437, 3965, and 49,533 weeks of CED sales from supermarkets, pharmacies, supercentres, and convenience stores, respectively. Percent price discounting was positively associated with log sales of CED for all store types, and the interaction between education and discounting was prominent in supercentres: −0.039 [95% confidence interval (CI): −0.051, −0.028] and −0.039 [95% CI: −0.057, −0.021] for middle- and high-education neighbourhoods relative to low-education neighbourhoods, respectively. Relative to low-income areas, the associations of discounting with log CED sales in supercentres for neighbourhoods in the middle- and high-income tertiles were 0.022 [95% CI: 0.010, 0.033] and 0.015 [95% CI: −0.001, 0.031], respectively. Price discounting is an important driver of CED consumption and has a varying impact across community education and income levels.
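A hedged sketch of the kind of interaction model described above, written with the statsmodels formula API, is shown below; the column names, confounders, and formula details are hypothetical placeholders rather than the study's actual specification.

```python
# Sketch only: weekly log CED sales as a function of percent discounting,
# neighbourhood education and income tertiles, and their interactions with
# discounting, fit separately per store type. Column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

def fit_discount_model(df: pd.DataFrame, store_type: str):
    subset = df[df["store_type"] == store_type]
    formula = (
        "log_weekly_sales ~ pct_discount * (C(education_tertile) + C(income_tertile))"
        " + store_size + neighbourhood_population"   # placeholder confounders
    )
    return smf.ols(formula, data=subset).fit()

# Example usage (assuming a `transactions` DataFrame with these columns):
# result = fit_discount_model(transactions, "supercentre")
# print(result.params.filter(like="pct_discount"))  # main effect and interaction terms
```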
Local Data Debiasing for Fairness Based on Generative Adversarial Training
François Bidet
Sébastien Gambs
Rosin Claude Ngueveu
Alain Tapp
The widespread use of automated decision processes in many areas of our society raises serious ethical issues with respect to the fairness of the process and the possible resulting discrimination. To address this issue, we propose a novel adversarial training approach called GANSan for learning a sanitizer whose objective is to prevent the possibility of any discrimination (i.e., direct and indirect) based on a sensitive attribute by removing the attribute itself as well as the existing correlations with the remaining attributes. Our method GANSan is partially inspired by the powerful framework of generative adversarial networks (in particular Cycle-GANs), which offers a flexible way to learn a distribution empirically or to translate between two different distributions. In contrast to prior work, one of the strengths of our approach is that the sanitization is performed in the same space as the original data, modifying the other attributes as little as possible and thus preserving the interpretability of the sanitized data. Consequently, once the sanitizer is trained, it can be applied to new data locally by an individual on their profile before releasing it. Finally, experiments on real datasets demonstrate the effectiveness of the approach as well as the achievable trade-off between fairness and utility.
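The sketch below illustrates one adversarial training step for a sanitizer and a discriminator in the spirit of the approach described above; the architectures, losses, and loss weights are placeholder assumptions, not the authors' GANSan code.

```python
# Illustrative adversarial "sanitizer" training step (assumed setup): the
# sanitizer maps a profile to a sanitized profile in the same space, while a
# discriminator tries to recover the sensitive attribute from the output.
import torch
import torch.nn as nn

dim = 16                                   # number of non-sensitive attributes (assumed)
sanitizer = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, dim))
discriminator = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

opt_s = torch.optim.Adam(sanitizer.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
mse = nn.MSELoss()

def train_step(x, s, alpha=1.0):
    """x: (batch, dim) non-sensitive attributes; s: (batch, 1) float sensitive attribute in {0, 1}."""
    # 1) Discriminator: predict the sensitive attribute from sanitized data.
    with torch.no_grad():
        x_clean = sanitizer(x)
    d_loss = bce(discriminator(x_clean), s)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Sanitizer: stay close to the original data while fooling the discriminator.
    x_clean = sanitizer(x)
    fool = bce(discriminator(x_clean), 1.0 - s)   # push predictions away from the truth
    s_loss = mse(x_clean, x) + alpha * fool
    opt_s.zero_grad(); s_loss.backward(); opt_s.step()
    return d_loss.item(), s_loss.item()

# Example usage with a synthetic batch:
# x = torch.randn(64, dim); s = torch.randint(0, 2, (64, 1)).float()
# train_step(x, s)
```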
Continuing professional education of Iranian healthcare professionals in shared decision-making: lessons learned
Charo Rodriguez
Jordie Croteau
Alireza Sadeghpour
Amir-Mohammad Navali
France Légaré
Staying Ahead of the Epidemiologic Curve: Evaluation of the British Columbia Asthma Prediction System (BCAPS) During the Unprecedented 2018 Wildfire Season
Sarah B. Henderson
Kathryn T. Morrison
Kathleen E. McLean
Yue Ding
Jiayun Yao
Gavin Shaddick
Parallel inference of hierarchical latent dynamics in two-photon calcium imaging of neuronal populations
Luke Y. Prince
Colleen J Gillon
Dynamic latent variable modelling has provided a powerful tool for understanding how populations of neurons compute. For spiking data, such latent variable modelling can treat the data as a set of point processes, because spiking dynamics occur on a much faster timescale than the computational dynamics being inferred. In contrast, for other experimental techniques, the slow dynamics governing the observed data are similar in timescale to the computational dynamics that researchers want to infer. An example of this is calcium imaging data, where calcium dynamics can have timescales on the order of hundreds of milliseconds. As such, the successful application of dynamic latent variable modelling to modalities like calcium imaging data will rest on the ability to disentangle the deeper- and shallower-level dynamical systems’ contributions to the data. To date, no techniques have been developed to directly achieve this. Here we solve this problem by extending recent advances using sequential variational autoencoders for dynamic latent variable modelling of neural data. Our system, VaLPACa (Variational Ladders for Parallel Autoencoding of Calcium imaging data), solves the problem of disentangling deeper- and shallower-level dynamics by incorporating a ladder architecture that can infer a hierarchy of dynamical systems. Using built-in inductive biases for calcium dynamics, we show that we can disentangle calcium flux from the underlying dynamics of neural computation. First, we demonstrate with synthetic calcium data that we can correctly disentangle an underlying Lorenz attractor from calcium dynamics. Next, we show that we can infer appropriate rotational dynamics in spiking data from macaque motor cortex after it has been converted into calcium fluorescence data via a calcium dynamics model. Finally, we show that our method applied to real calcium imaging data from primary visual cortex in mice allows us to infer latent factors that carry salient sensory information about unexpected stimuli. These results demonstrate that variational ladder autoencoders are a promising approach for inferring hierarchical dynamics in experimental settings where the measured variable has its own slow dynamics, such as calcium imaging data. Our new, open-source tool thereby provides the neuroscience community with the ability to apply dynamic latent variable modelling to a wider array of data modalities.
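The toy forward model below illustrates why calcium imaging mixes two dynamical systems, a fast latent firing rate and a slow calcium indicator. It uses a generic AR(1) calcium model with assumed constants to motivate the disentanglement problem; it is not the VaLPACa architecture itself.

```python
# Toy generative model: latent rate -> Poisson spikes -> slow AR(1) calcium
# decay -> noisy fluorescence. All constants are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
T, dt = 1000, 0.01                      # 10 s at 100 Hz
rate = 5.0 + 4.0 * np.sin(2 * np.pi * 0.5 * np.arange(T) * dt)   # latent rate (Hz)
spikes = rng.poisson(rate * dt)         # spike counts per bin

gamma = np.exp(-dt / 0.4)               # calcium decay, ~400 ms time constant
calcium = np.zeros(T)
for t in range(1, T):
    calcium[t] = gamma * calcium[t - 1] + spikes[t]

fluorescence = calcium + 0.1 * rng.standard_normal(T)   # noisy dF/F-like trace
# The inference problem is to recover `rate` (and its dynamics) from
# `fluorescence`, i.e. to separate the slow calcium kernel from the
# computational dynamics, which is what the ladder architecture above targets.
```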
Comment on Starke et al.: “Computing schizophrenia: ethical challenges for machine learning in psychiatry”: From machine learning to student learning: pedagogical challenges for psychiatry – Corrigendum
Christophe Gauld
Jean‐Arthur Micoulaud‐Franchi
QBSUM: a Large-Scale Query-Based Document Summarization Dataset from Real-world Applications
Mingjun Zhao
Shengli Yan
Xinwang Zhong
Qian Hao
Haolan Chen
Di Niu
Bowei Long
Wei-dong Guo