Publications

Neural Function Modules with Sparse Arguments: A Dynamic Approach to Integrating Information across Layers
Alex Lamb
Anirudh Goyal
A. Slowik
Michael Curtis Mozer
Philippe Beaudoin
Feed-forward neural networks consist of a sequence of layers, in which each layer performs some processing on the information from the previous layer. A downside to this approach is that each layer (or module, as multiple modules can operate in parallel) is tasked with processing the entire hidden state, rather than a particular part of the state which is most relevant for that module. Methods which only operate on a small number of input variables are an essential part of most programming languages, and they allow for improved modularity and code re-usability. Our proposed method, Neural Function Modules (NFM), aims to introduce the same structural capability into deep learning. Most of the work in the context of feed-forward networks combining top-down and bottom-up feedback is limited to classification problems. The key contribution of our work is to combine attention, sparsity, top-down and bottom-up feedback, in a flexible algorithm which, as we show, improves the results in standard classification, out-of-domain generalization, generative modeling, and learning representations in the context of reinforcement learning.
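As an illustration of the kind of mechanism described above, the sketch below shows a module that attends over the outputs of earlier layers and keeps only a sparse top-k subset of them. It is a minimal, assumed construction for exposition, not the authors' NFM implementation; the module name and hyperparameters are hypothetical.

```python
# Illustrative sketch only: attention over earlier layer outputs with hard
# top-k sparsity, loosely in the spirit of combining attention and sparsity
# as described in the abstract. Not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseLayerAttention(nn.Module):
    def __init__(self, dim, k=2):
        super().__init__()
        self.k = k                      # number of earlier layers each module may read
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)

    def forward(self, h, layer_outputs):
        # h: (batch, dim) current module state
        # layer_outputs: (batch, n_layers, dim) stack of earlier layer outputs
        q = self.query(h).unsqueeze(1)                       # (batch, 1, dim)
        kmat = self.key(layer_outputs)                       # (batch, n_layers, dim)
        v = self.value(layer_outputs)                        # (batch, n_layers, dim)
        scores = (q * kmat).sum(-1) / kmat.shape[-1] ** 0.5  # (batch, n_layers)
        # Hard sparsity: keep only the top-k layers, mask out the rest.
        topk = scores.topk(self.k, dim=-1).indices
        mask = torch.full_like(scores, float("-inf")).scatter(1, topk, 0.0)
        attn = F.softmax(scores + mask, dim=-1)              # zeros on masked layers
        return h + (attn.unsqueeze(-1) * v).sum(1)           # residual update
```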
Quantum Tensor Networks, Stochastic Processes, and Weighted Automata
Sandesh M. Adhikary
Siddarth Srinivasan
Jacob Miller
Byron Boots
Modeling joint probability distributions over sequences has been studied from many perspectives. The physics community developed matrix product states, a tensor-train decomposition for probabilistic modeling, motivated by the need to tractably model many-body systems. But similar models have also been studied in the stochastic processes and weighted automata literature, with little work on how these bodies of work relate to each other. We address this gap by showing how stationary or uniform versions of popular quantum tensor network models have equivalent representations in the stochastic processes and weighted automata literature, in the limit of infinitely long sequences. We demonstrate several equivalence results between models used in these three communities: (i) uniform variants of matrix product states, Born machines and locally purified states from the quantum tensor networks literature, (ii) predictive state representations, hidden Markov models, norm-observable operator models and hidden quantum Markov models from the stochastic process literature, and (iii) stochastic weighted automata, probabilistic automata and quadratic automata from the formal languages literature. Such connections may open the door for results and methods developed in one area to be applied in another.
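To make the shared structure concrete, the toy sketch below scores a sequence with a product of symbol-indexed matrices, interpreting the result either as an (unnormalized) weighted automaton / uniform MPS weight or as a Born-machine squared amplitude. The notation and lack of normalization are simplifying assumptions, not the paper's constructions.

```python
# Minimal numerical sketch (assumed notation): both a stochastic weighted
# automaton and a Born machine score a sequence via a product of
# symbol-indexed matrices; they differ in how that product becomes a probability.
import numpy as np

rng = np.random.default_rng(0)
n_symbols, bond_dim = 3, 4

A = rng.random((n_symbols, bond_dim, bond_dim))  # one matrix per symbol
alpha = rng.random(bond_dim)                     # initial / left boundary vector
omega = rng.random(bond_dim)                     # final / right boundary vector

def matrix_product(sequence):
    """Left-to-right product alpha^T A[x1] ... A[xn] omega."""
    v = alpha.copy()
    for x in sequence:
        v = v @ A[x]
    return v @ omega

def wfa_score(sequence):
    # Weighted automaton / uniform MPS: the product itself is the weight.
    return matrix_product(sequence)

def born_machine_score(sequence):
    # Born machine: probability proportional to the squared amplitude.
    return matrix_product(sequence) ** 2

seq = [0, 2, 1, 1]
print(wfa_score(seq), born_machine_score(seq))
```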
Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence
Nicolas Loizou
Sharan Vaswani
Issam Hadj Laradji
We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly used in the subgradient method. Although computing the Polyak step-size requires knowledge of the optimal function values, this information is readily available for typical modern machine learning applications. Consequently, the proposed stochastic Polyak step-size (SPS) is an attractive choice for setting the learning rate for stochastic gradient descent (SGD). We provide theoretical convergence guarantees for SGD equipped with SPS in different settings, including strongly convex, convex and non-convex functions. Furthermore, our analysis results in novel convergence guarantees for SGD with a constant step-size. We show that SPS is particularly effective when training over-parameterized models capable of interpolating the training data. In this setting, we prove that SPS enables SGD to converge to the true solution at a fast rate without requiring the knowledge of any problem-dependent constants or additional computational overhead. We experimentally validate our theoretical results via extensive experiments on synthetic and real datasets. We demonstrate the strong performance of SGD with SPS compared to state-of-the-art optimization methods when training over-parameterized models.
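A minimal sketch of SGD with a stochastic Polyak step-size on a least-squares problem is given below, using the per-sample step (f_i(x) − f_i*) / (c‖∇f_i(x)‖²) with f_i* taken as 0 for an interpolating model. The cap gamma_max and the constant c are illustrative assumptions rather than the paper's exact variant.

```python
# Hedged sketch of SGD with a stochastic Polyak step-size (SPS) on
# least-squares regression; f_i^* = 0 is assumed (interpolation, no regularizer).
import numpy as np

def sgd_with_sps(X, y, n_epochs=10, c=0.5, gamma_max=1.0, eps=1e-12):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        for i in np.random.permutation(n):
            residual = X[i] @ w - y[i]
            f_i = 0.5 * residual ** 2            # per-sample loss, optimal value 0 assumed
            grad = residual * X[i]
            step = min(f_i / (c * (grad @ grad) + eps), gamma_max)
            w -= step * grad
    return w

# Toy usage: noiseless linear data, so the model can interpolate.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
w_hat = sgd_with_sps(X, X @ w_true)
print(np.linalg.norm(w_hat - w_true))
```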
A Study of Condition Numbers for First-Order Optimization
Charles Guille-Escuret
Baptiste Goujaud
Manuela Girotti
[Strengthening the culture of public health surveillance and population health monitoring].
Arnaud Chiolero
Stéphane Cullati
Public health surveillance is the systematic and ongoing collection, analysis and interpretation of data to produce information useful for decision-making. With the development of data science, surveillance methods are evolving through access to big data. More data does not automatically mean more information. For example, the massive amounts of data on Covid-19 were not easily transformed into information useful for decision-making. Further, data scientists often have difficulty making their analyses useful for decision-making. For the implementation of evidence-based and data-driven public health practice, the culture of public health surveillance and population health monitoring needs to be strengthened.
Price discounting as a hidden risk factor of energy drink consumption
Hiroshi Mamiya
Erica E. M. Moodie
Alexandra M. Schmidt
Yu Ma
Global consumption of caffeinated energy drinks (CED) has been increasing dramatically despite growing evidence of their adverse health effects. Temporary price discounting is a rarely investigated but potentially powerful food marketing tactic influencing purchasing of CED. Using grocery transaction records generated by food stores in Montreal, we investigated the association between price discounting and purchasing of CED across socio-economic status, operationalized by education and income levels in the store neighbourhood. The outcome, log-transformed weekly store-level sales of CED, was modelled as a function of store-level percent price discounting, store- and neighbourhood-level confounders, and interaction terms between discounting and each of the education and income tertiles of the store neighbourhood. The model was separately fit to transactions from supermarkets, pharmacies, supercentres, and convenience stores. There were 18,743, 12,437, 3965, and 49,533 weeks of CED sales from supermarkets, pharmacies, supercentres, and convenience stores, respectively. Percent price discounting was positively associated with log sales of CED for all store types, and the interaction between education and discounting was prominent in supercentres: −0.039 [95% confidence interval (CI): −0.051, −0.028] and −0.039 [95% CI: −0.057, −0.021] for middle- and high-education neighbourhoods relative to low-education neighbourhoods, respectively. Relative to low-income areas, the associations of discounting with log CED sales in supercentres for neighbourhoods in the middle- and high-income tertiles were 0.022 [95% CI: 0.010, 0.033] and 0.015 [95% CI: −0.001, 0.031], respectively. Price discounting is an important driver of CED consumption and has a varying impact across community education and income levels.
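A hedged sketch of the kind of interaction model described above, written with the statsmodels formula API, is shown below; the column names, confounders, and formula details are hypothetical placeholders rather than the study's actual specification.

```python
# Sketch only: weekly log CED sales as a function of percent discounting,
# neighbourhood education and income tertiles, and their interactions with
# discounting, fit separately per store type. Column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

def fit_discount_model(df: pd.DataFrame, store_type: str):
    subset = df[df["store_type"] == store_type]
    formula = (
        "log_weekly_sales ~ pct_discount * (C(education_tertile) + C(income_tertile))"
        " + store_size + neighbourhood_population"   # placeholder confounders
    )
    return smf.ols(formula, data=subset).fit()

# Example usage (assuming a `transactions` DataFrame with these columns):
# result = fit_discount_model(transactions, "supercentre")
# print(result.params.filter(like="pct_discount"))  # main effect and interaction terms
```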
Local Data Debiasing for Fairness Based on Generative Adversarial Training
François Bidet
Sébastien Gambs
Rosin Claude Ngueveu
Alain Tapp
The widespread use of automated decision processes in many areas of our society raises serious ethical issues with respect to the fairness of the process and the possible resulting discrimination. To address this issue, we propose a novel adversarial training approach called GANSan for learning a sanitizer whose objective is to prevent the possibility of any discrimination (i.e., direct and indirect) based on a sensitive attribute by removing the attribute itself as well as the existing correlations with the remaining attributes. Our method GANSan is partially inspired by the powerful framework of generative adversarial networks (in particular Cycle-GANs), which offers a flexible way to learn a distribution empirically or to translate between two different distributions. In contrast to prior work, one of the strengths of our approach is that the sanitization is performed in the same space as the original data, modifying the other attributes as little as possible and thus preserving the interpretability of the sanitized data. Consequently, once the sanitizer is trained, it can be applied to new data locally by an individual on their profile before releasing it. Finally, experiments on real datasets demonstrate the effectiveness of the approach as well as the achievable trade-off between fairness and utility.
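The sketch below illustrates one adversarial training step for a sanitizer and a discriminator in the spirit of the approach described above; the architectures, losses, and loss weights are placeholder assumptions, not the authors' GANSan code.

```python
# Illustrative adversarial "sanitizer" training step (assumed setup): the
# sanitizer maps a profile to a sanitized profile in the same space, while a
# discriminator tries to recover the sensitive attribute from the output.
import torch
import torch.nn as nn

dim = 16                                   # number of non-sensitive attributes (assumed)
sanitizer = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, dim))
discriminator = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

opt_s = torch.optim.Adam(sanitizer.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
mse = nn.MSELoss()

def train_step(x, s, alpha=1.0):
    """x: (batch, dim) non-sensitive attributes; s: (batch, 1) float sensitive attribute in {0, 1}."""
    # 1) Discriminator: predict the sensitive attribute from sanitized data.
    with torch.no_grad():
        x_clean = sanitizer(x)
    d_loss = bce(discriminator(x_clean), s)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Sanitizer: stay close to the original data while fooling the discriminator.
    x_clean = sanitizer(x)
    fool = bce(discriminator(x_clean), 1.0 - s)   # push predictions away from the truth
    s_loss = mse(x_clean, x) + alpha * fool
    opt_s.zero_grad(); s_loss.backward(); opt_s.step()
    return d_loss.item(), s_loss.item()

# Example usage with a synthetic batch:
# x = torch.randn(64, dim); s = torch.randint(0, 2, (64, 1)).float()
# train_step(x, s)
```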
Continuing professional education of Iranian healthcare professionals in shared decision-making: lessons learned
Charo Rodriguez
Jordie Croteau
Alireza Sadeghpour
Amir-Mohammad Navali
France Légaré
Staying Ahead of the Epidemiologic Curve: Evaluation of the British Columbia Asthma Prediction System (BCAPS) During the Unprecedented 2018 Wildfire Season
Sarah B. Henderson
Kathryn T. Morrison
Kathleen E. McLean
Yue Ding
Jiayun Yao
Gavin Shaddick
Parallel inference of hierarchical latent dynamics in two-photon calcium imaging of neuronal populations
Luke Y. Prince
Colleen J Gillon
Dynamic latent variable modelling has provided a powerful tool for understanding how populations of neurons compute. For spiking data, such latent variable modelling can treat the data as a set of point processes, because spiking dynamics occur on a much faster timescale than the computational dynamics being inferred. In contrast, for other experimental techniques, the slow dynamics governing the observed data are similar in timescale to the computational dynamics that researchers want to infer. An example of this is calcium imaging data, where calcium dynamics can have timescales on the order of hundreds of milliseconds. As such, the successful application of dynamic latent variable modelling to modalities like calcium imaging data will rest on the ability to disentangle the deeper- and shallower-level dynamical systems’ contributions to the data. To date, no techniques have been developed to directly achieve this. Here we solve this problem by extending recent advances using sequential variational autoencoders for dynamic latent variable modelling of neural data. Our system, VaLPACa (Variational Ladders for Parallel Autoencoding of Calcium imaging data), solves the problem of disentangling deeper- and shallower-level dynamics by incorporating a ladder architecture that can infer a hierarchy of dynamical systems. Using built-in inductive biases for calcium dynamics, we show that we can disentangle calcium flux from the underlying dynamics of neural computation. First, we demonstrate with synthetic calcium data that we can correctly disentangle an underlying Lorenz attractor from calcium dynamics. Next, we show that we can infer appropriate rotational dynamics in spiking data from macaque motor cortex after it has been converted into calcium fluorescence data via a calcium dynamics model. Finally, we show that our method applied to real calcium imaging data from primary visual cortex in mice allows us to infer latent factors that carry salient sensory information about unexpected stimuli. These results demonstrate that variational ladder autoencoders are a promising approach for inferring hierarchical dynamics in experimental settings where the measured variable has its own slow dynamics, such as calcium imaging data. Our new, open-source tool thereby provides the neuroscience community with the ability to apply dynamic latent variable modelling to a wider array of data modalities.
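The toy forward model below illustrates why calcium imaging mixes two dynamical systems, a fast latent firing rate and a slow calcium indicator. It uses a generic AR(1) calcium model with assumed constants to motivate the disentanglement problem; it is not the VaLPACa architecture itself.

```python
# Toy generative model: latent rate -> Poisson spikes -> slow AR(1) calcium
# decay -> noisy fluorescence. All constants are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
T, dt = 1000, 0.01                      # 10 s at 100 Hz
rate = 5.0 + 4.0 * np.sin(2 * np.pi * 0.5 * np.arange(T) * dt)   # latent rate (Hz)
spikes = rng.poisson(rate * dt)         # spike counts per bin

gamma = np.exp(-dt / 0.4)               # calcium decay, ~400 ms time constant
calcium = np.zeros(T)
for t in range(1, T):
    calcium[t] = gamma * calcium[t - 1] + spikes[t]

fluorescence = calcium + 0.1 * rng.standard_normal(T)   # noisy dF/F-like trace
# The inference problem is to recover `rate` (and its dynamics) from
# `fluorescence`, i.e. to separate the slow calcium kernel from the
# computational dynamics, which is what the ladder architecture above targets.
```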
Comment on Starke et al.: “Computing schizophrenia: ethical challenges for machine learning in psychiatry”: From machine learning to student learning: pedagogical challenges for psychiatry – Corrigendum
Christophe Gauld
Jean‐Arthur Micoulaud‐Franchi
QBSUM: a Large-Scale Query-Based Document Summarization Dataset from Real-world Applications
Mingjun Zhao
Shengli Yan
Xinwang Zhong
Qian Hao
Haolan Chen
Di Niu
Bowei Long
Wei-dong Guo