Publications
Beyond Correlation versus Causation: Multi-brain Neuroscience Needs Explanation
This work focuses on decision making for automated driving vehicles in interaction-rich scenarios, such as traffic merges, in a flexibly assertive yet safe manner. We propose a Q-learning-based approach that takes in active intention inferences as additional inputs alongside the directly observed state inputs. The outputs of the Q-function are processed by a modulation function to select a decision; this modulation function controls how assertively or defensively the agent behaves.
2021-03-18
SICE Journal of Control, Measurement, and System Integration (published)
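The abstract leaves the modulation step abstract, but the idea of post-processing Q-values with an assertiveness-controlled trade-off can be sketched roughly as follows. This is only an illustrative reading of the abstract: the function `modulate`, the `assertiveness` parameter, and the risk-penalty form are assumptions, not the paper's actual formulation.

```python
import numpy as np

def modulate(q_values, risk_estimates, assertiveness):
    """Hypothetical modulation: trade off expected return against estimated risk.

    assertiveness in [0, 1]: 1.0 ignores risk (fully assertive),
    0.0 weighs risk maximally (fully defensive).
    """
    scores = q_values - (1.0 - assertiveness) * risk_estimates
    return int(np.argmax(scores))

# Illustrative inputs: Q-values and per-action risk for three merge actions
# (e.g., yield, keep speed, accelerate); inferred-intention features are
# assumed to have been folded into the Q-function upstream.
q_values = np.array([0.2, 0.5, 0.9])
risk_estimates = np.array([0.0, 0.5, 0.8])

print(modulate(q_values, risk_estimates, assertiveness=0.9))  # assertive -> accelerate (2)
print(modulate(q_values, risk_estimates, assertiveness=0.1))  # defensive -> yield (0)
```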
Feed-forward neural networks consist of a sequence of layers, in which each layer performs some processing on the information from the previous layer. A downside to this approach is that each layer (or module, as multiple modules can operate in parallel) is tasked with processing the entire hidden state, rather than a particular part of the state which is most relevant for that module. Methods which only operate on a small number of input variables are an essential part of most programming languages, and they allow for improved modularity and code re-usability. Our proposed method, Neural Function Modules (NFM), aims to introduce the same structural capability into deep learning. Most of the work in the context of feed-forward networks combining top-down and bottom-up feedback is limited to classification problems. The key contribution of our work is to combine attention, sparsity, top-down and bottom-up feedback, in a flexible algorithm which, as we show, improves the results in standard classification, out-of-domain generalization, generative modeling, and learning representations in the context of reinforcement learning.
2021-03-18
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics (published)
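As a rough illustration of the structural idea described above, a module that attends over the state and operates only on a small, sparse subset of it rather than the entire hidden state, here is a minimal PyTorch sketch. The class name `SparseFunctionModule`, the top-k selection, and all dimensions are illustrative assumptions and do not reproduce the NFM architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseFunctionModule(nn.Module):
    """Illustrative module in the spirit of NFM (not the authors' code):
    attends over a set of input slots, keeps only the top-k most relevant,
    and applies its transformation to that sparse selection."""

    def __init__(self, slot_dim, hidden_dim, k=2):
        super().__init__()
        self.k = k
        self.query = nn.Parameter(torch.randn(slot_dim))
        self.mlp = nn.Sequential(
            nn.Linear(slot_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, slot_dim),
        )

    def forward(self, slots):                      # slots: (batch, n_slots, slot_dim)
        scores = torch.einsum("bnd,d->bn", slots, self.query)
        weights = F.softmax(scores, dim=-1)        # attention over slots
        topk = weights.topk(self.k, dim=-1)        # sparse selection of slots
        idx = topk.indices.unsqueeze(-1).expand(-1, -1, slots.size(-1))
        selected = slots.gather(1, idx)            # (batch, k, slot_dim)
        return self.mlp(selected).mean(dim=1)      # module output from the sparse subset

slots = torch.randn(4, 6, 16)                      # 4 examples, 6 slots of width 16
module = SparseFunctionModule(slot_dim=16, hidden_dim=32, k=2)
print(module(slots).shape)                         # torch.Size([4, 16])
```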
Modeling joint probability distributions over sequences has been studied from many perspectives. The physics community developed matrix product states, a tensor-train decomposition for probabilistic modeling, motivated by the need to tractably model many-body systems. But similar models have also been studied in the stochastic processes and weighted automata literature, with little work on how these bodies of work relate to each other. We address this gap by showing how stationary or uniform versions of popular quantum tensor network models have equivalent representations in the stochastic processes and weighted automata literature, in the limit of infinitely long sequences. We demonstrate several equivalence results between models used in these three communities: (i) uniform variants of matrix product states, Born machines and locally purified states from the quantum tensor networks literature, (ii) predictive state representations, hidden Markov models, norm-observable operator models and hidden quantum Markov models from the stochastic process literature, and (iii) stochastic weighted automata, probabilistic automata and quadratic automata from the formal languages literature. Such connections may open the door for results and methods developed in one area to be applied in another.
2021-03-18
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics (published)
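The shared structure behind these equivalences is a sequence score computed as a product of per-symbol matrices contracted with boundary vectors. A minimal NumPy sketch of that contraction is given below, read either as a weighted-automaton score or, squared, as an unnormalized Born-machine probability. The parameter names, the random parameters, and the omission of normalization are simplifications for illustration, not the paper's construction.

```python
import numpy as np

def wa_score(alpha, A, omega, seq):
    """Weighted-automaton / uniform-MPS style score: a product of
    per-symbol matrices contracted with boundary vectors."""
    v = alpha
    for s in seq:
        v = v @ A[s]
    return float(v @ omega)

def born_machine_prob(alpha, A, omega, seq):
    """Born-machine reading of the same contraction: square the amplitude.
    (Normalization over all sequences of this length is omitted.)"""
    return wa_score(alpha, A, omega, seq) ** 2

rng = np.random.default_rng(0)
d, n_symbols = 3, 2                                   # bond dimension, alphabet size
A = {s: rng.normal(size=(d, d)) / d for s in range(n_symbols)}
alpha, omega = rng.normal(size=d), rng.normal(size=d)

print(wa_score(alpha, A, omega, [0, 1, 1, 0]))
print(born_machine_prob(alpha, A, omega, [0, 1, 1, 0]))
```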
We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly used in the subgradient method. Although computing the Polyak step-size requires knowledge of the optimal function values, this information is readily available for typical modern machine learning applications. Consequently, the proposed stochastic Polyak step-size (SPS) is an attractive choice for setting the learning rate for stochastic gradient descent (SGD). We provide theoretical convergence guarantees for SGD equipped with SPS in different settings, including strongly convex, convex and non-convex functions. Furthermore, our analysis results in novel convergence guarantees for SGD with a constant step-size. We show that SPS is particularly effective when training over-parameterized models capable of interpolating the training data. In this setting, we prove that SPS enables SGD to converge to the true solution at a fast rate without requiring the knowledge of any problem-dependent constants or additional computational overhead. We experimentally validate our theoretical results via extensive experiments on synthetic and real datasets. We demonstrate the strong performance of SGD with SPS compared to state-of-the-art optimization methods when training over-parameterized models.
2021-03-18
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics (published)
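A minimal sketch of SGD with a stochastic Polyak step-size on an interpolating least-squares problem is shown below. It assumes the capped form gamma = min((f_i(x) − f_i*) / (c · ||grad f_i(x)||²), gamma_max) with f_i* = 0, which matches the interpolation setting described in the abstract; the constants c and gamma_max and the toy problem are illustrative choices, not the paper's experiments.

```python
import numpy as np

def sgd_sps(X, y, steps=200, c=0.5, gamma_max=10.0, seed=0):
    """SGD with a stochastic Polyak step-size on least squares.

    Per-sample loss f_i(w) = 0.5 * (x_i @ w - y_i)**2, so under interpolation
    f_i^* = 0 and the step-size needs no problem-dependent constants.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        i = rng.integers(n)                 # sample one data point
        r = X[i] @ w - y[i]
        loss = 0.5 * r ** 2                 # f_i(w), with f_i^* = 0
        grad = r * X[i]
        gamma = min(loss / (c * (grad @ grad) + 1e-12), gamma_max)
        w -= gamma * grad
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
w_true = rng.normal(size=5)
y = X @ w_true                              # interpolation: zero optimal per-sample loss
w_hat = sgd_sps(X, y)
print(np.linalg.norm(w_hat - w_true))       # should be small after a few hundred steps
```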
Public health surveillance is the systematic and ongoing collection, analysis and interpretation of data to produce information useful for decision-making. With the development of data science, surveillance methods are evolving through access to big data. More data does not automatically mean more information. For example, the massive amounts of data on COVID-19 were not easily transformed into useful information for decision-making. Further, data scientists often have difficulty making their analyses useful for decision-making. For the implementation of evidence-based and data-driven public health practice, the culture of public health surveillance and population health monitoring needs to be strengthened.
Global consumption of caffeinated energy drinks (CED) has been increasing dramatically despite increasing evidence of their adverse health effects. Temporary price discounting is a rarely investigated but potentially powerful food marketing tactic influencing purchasing of CED. Using grocery transaction records generated by food stores in Montreal, we investigated the association between price discounting and purchasing of CED across socio-economic status, operationalized by education and income levels in the store neighbourhood. The outcome, log-transformed weekly store-level sales of CED, was modelled as a function of store-level percent price discounting, store- and neighbourhood-level confounders, and an interaction term between discounting and each of tertile education and income in the store neighbourhood. The model was separately fit to transactions from supermarkets, pharmacies, supercentres, and convenience stores. There were 18,743, 12,437, 3965, and 49,533 weeks of CED sales from supermarkets, pharmacies, supercentres, and convenience stores, respectively. Percent price discounting was positively associated with log sales of CED for all store types, and the interaction between education and discounting was prominent in supercentres: −0.039 [95% confidence interval (CI): −0.051, −0.028] and −0.039 [95% CI: −0.057, −0.021] for middle- and high-education neighbourhoods relative to low-education neighbourhoods, respectively. Relative to low-income areas, the associations of discounting and log CED sales in supercentres for neighbourhoods with middle- and high-income tertile were 0.022 [95% CI: 0.010, 0.033] and 0.015 [95% CI: −0.001, 0.031], respectively. Price discounting is an important driver of CED consumption and has a varying impact across community education and income.
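For readers who want to see the shape of such a model, here is a hypothetical sketch of the interaction structure described above, written with the statsmodels formula API on synthetic stand-in data. The column names, the single confounder, and the data itself are invented for illustration; the study's actual covariates and estimates are only those reported in the abstract.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data standing in for store-week sales records.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "log_ced_sales": rng.normal(3.0, 0.5, n),
    "pct_discount": rng.uniform(0, 40, n),
    "education_tertile": rng.choice(["low", "middle", "high"], n),
    "store_size": rng.normal(0, 1, n),          # stand-in confounder
})

# Interaction between discounting and neighbourhood education tertile,
# mirroring the structure described in the abstract (confounders abbreviated).
model = smf.ols(
    "log_ced_sales ~ pct_discount * C(education_tertile, Treatment('low')) + store_size",
    data=df,
).fit()
print(model.summary())
```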
The widespread use of automated decision processes in many areas of our society raises serious ethical issues with respect to the fairness of the process and the possible resulting discrimination. To solve this issue, we propose a novel adversarial training approach called GANSan for learning a sanitizer whose objective is to prevent the possibility of any discrimination (i.e., direct and indirect) based on a sensitive attribute by removing the attribute itself as well as the existing correlations with the remaining attributes. Our method GANSan is partially inspired by the powerful framework of generative adversarial networks (in particular Cycle-GANs), which offers a flexible way to learn a distribution empirically or to translate between two different distributions. In contrast to prior work, one of the strengths of our approach is that the sanitization is performed in the same space as the original data by only modifying the other attributes as little as possible, thus preserving the interpretability of the sanitized data. Consequently, once the sanitizer is trained, it can be applied to new data locally by an individual on their profile before releasing it. Finally, experiments on real datasets demonstrate the effectiveness of the approach as well as the achievable trade-off between fairness and utility.
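A rough sketch of the adversarial sanitizer idea, a generator that rewrites a record so a discriminator cannot recover the sensitive attribute while a reconstruction penalty keeps the record close to the original, is given below. The network sizes, the trade-off weight `alpha`, and the "push the prediction toward 0.5" objective are illustrative assumptions, not the GANSan training procedure itself.

```python
import torch
import torch.nn as nn

# Illustrative "sanitizer vs. sensitive-attribute predictor" loop (not the authors' code).
d = 10                                            # non-sensitive attributes
sanitizer = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, d))
discriminator = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 1))

opt_s = torch.optim.Adam(sanitizer.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
alpha = 1.0                                       # assumed fairness/utility trade-off weight

x = torch.randn(256, d)                           # toy records
s = torch.randint(0, 2, (256, 1)).float()         # toy binary sensitive attribute

for step in range(200):
    # 1) Train the discriminator to predict the sensitive attribute
    #    from the sanitized record.
    x_san = sanitizer(x).detach()
    opt_d.zero_grad()
    bce(discriminator(x_san), s).backward()
    opt_d.step()

    # 2) Train the sanitizer to stay close to x while making the
    #    discriminator's prediction uninformative (pushed toward 0.5).
    opt_s.zero_grad()
    x_san = sanitizer(x)
    recon = ((x_san - x) ** 2).mean()
    fool = bce(discriminator(x_san), torch.full_like(s, 0.5))
    (recon + alpha * fool).backward()
    opt_s.step()
```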
Staying Ahead of the Epidemiologic Curve: Evaluation of the British Columbia Asthma Prediction System (BCAPS) During the Unprecedented 2018 Wildfire Season
Dynamic latent variable modelling has provided a powerful tool for understanding how populations of neurons compute. For spiking data, such latent variable modelling can treat the data as a set of point-processes, due to the fact that spiking dynamics occur on a much faster timescale than the computational dynamics being inferred. In contrast, for other experimental techniques, the slow dynamics governing the observed data are similar in timescale to the computational dynamics that researchers want to infer. An example of this is in calcium imaging data, where calcium dynamics can have timescales on the order of hundreds of milliseconds. As such, the successful application of dynamic latent variable modelling to modalities like calcium imaging data will rest on the ability to disentangle the deeper- and shallower-level dynamical systems’ contributions to the data. To date, no techniques have been developed to directly achieve this. Here we solve this problem by extending recent advances using sequential variational autoencoders for dynamic latent variable modelling of neural data. Our system VaLPACa (Variational Ladders for Parallel Autoencoding of Calcium imaging data) solves the problem of disentangling deeper- and shallower-level dynamics by incorporating a ladder architecture that can infer a hierarchy of dynamical systems. Using some built-in inductive biases for calcium dynamics, we show that we can disentangle calcium flux from the underlying dynamics of neural computation. First, we demonstrate with synthetic calcium data that we can correctly disentangle an underlying Lorenz attractor from calcium dynamics. Next, we show that we can infer appropriate rotational dynamics in spiking data from macaque motor cortex after it has been converted into calcium fluorescence data via a calcium dynamics model. Finally, we show that our method applied to real calcium imaging data from primary visual cortex in mice allows us to infer latent factors that carry salient sensory information about unexpected stimuli. These results demonstrate that variational ladder autoencoders are a promising approach for inferring hierarchical dynamics in experimental settings where the measured variable has its own slow dynamics, such as calcium imaging data. Our new, open-source tool thereby provides the neuroscience community with the ability to apply dynamic latent variable modelling to a wider array of data modalities.
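To make the "shallow dynamics on top of deep dynamics" problem concrete, the following toy NumPy sketch generates fluorescence by passing latent rates through Poisson spiking and a slow AR(1) calcium decay, which is the kind of hierarchy a ladder model like VaLPACa is designed to invert. The decay constant, gain, and noise level are invented illustrative values; this is a generative caricature of the problem, not the VaLPACa model itself.

```python
import numpy as np

def calcium_from_rates(rates, decay=0.95, gain=1.0, noise_sd=0.05, seed=0):
    """Toy generative view of the disentanglement problem (illustrative only):
    latent rates -> Poisson spikes (deeper dynamics) -> AR(1) calcium decay
    (shallower dynamics) -> noisy observed fluorescence."""
    rng = np.random.default_rng(seed)
    T, n = rates.shape
    spikes = rng.poisson(rates)                                  # deeper level
    calcium = np.zeros((T, n))
    for t in range(1, T):
        calcium[t] = decay * calcium[t - 1] + gain * spikes[t]   # shallow calcium level
    return calcium + rng.normal(0, noise_sd, size=(T, n))        # observed fluorescence

# Slowly oscillating latent rates for 20 neurons over 500 time bins.
rates = 0.2 + 0.1 * np.sin(np.linspace(0, 8 * np.pi, 500))[:, None] * np.ones((1, 20))
fluorescence = calcium_from_rates(rates)
print(fluorescence.shape)                                        # (500, 20)
```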