Publications
Smart about medications (SAM): a digital solution to enhance medication management following hospital discharge
RNN with Particle Flow for Probabilistic Spatio-temporal Forecasting
Soumyasundar Pal, Liheng Ma, Yingxue Zhang, M. Coates
Spatio-temporal forecasting has numerous applications in analyzing wireless, traffic, and financial networks. Classical statistical models often fall short in handling the complexity and high non-linearity present in time-series data. Recent advances in deep learning allow for better modelling of spatial and temporal dependencies. While most of these models focus on obtaining accurate point forecasts, they do not characterize the prediction uncertainty. In this work, we consider the time-series data as a random realization from a nonlinear state-space model and target Bayesian inference of the hidden states for probabilistic forecasting. We use particle flow as the tool for approximating the posterior distribution of the states, as it is shown to be highly effective in complex, high-dimensional settings. Thorough experimentation on several real-world time-series datasets demonstrates that our approach provides better characterization of uncertainty while maintaining comparable accuracy to state-of-the-art point forecasting methods.
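The core idea, inferring a posterior over the hidden states of a nonlinear state-space model rather than producing only a point forecast, can be conveyed with a simpler sampler. The sketch below uses a bootstrap particle filter rather than the particle flow method the paper actually employs, and the tanh dynamics, noise levels, and toy data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_particle_filter(observations, n_particles=500,
                              transition_std=0.1, obs_std=0.5):
    """Bayesian inference of hidden states in a toy nonlinear
    state-space model: x_t = tanh(x_{t-1}) + noise, y_t = x_t + noise.

    Returns the posterior mean and std of the hidden state at each
    step, i.e. a probabilistic forecast rather than a point forecast.
    """
    particles = rng.normal(0.0, 1.0, n_particles)   # prior over x_0
    means, stds = [], []
    for y in observations:
        # Propagate particles through the (assumed) nonlinear dynamics.
        particles = np.tanh(particles) + rng.normal(0, transition_std, n_particles)
        # Weight particles by the observation likelihood.
        log_w = -0.5 * ((y - particles) / obs_std) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # Posterior summary for this time step.
        mean = np.sum(w * particles)
        var = np.sum(w * (particles - mean) ** 2)
        means.append(mean)
        stds.append(np.sqrt(var))
        # Multinomial resampling to avoid weight degeneracy.
        particles = rng.choice(particles, size=n_particles, p=w)
    return np.array(means), np.array(stds)

# Toy usage: noisy observations of a latent nonlinear signal.
y = np.tanh(np.cumsum(rng.normal(0, 0.1, 100))) + rng.normal(0, 0.5, 100)
mu, sigma = bootstrap_particle_filter(y)
```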
Purpose: A major obstacle to the clinical implementation of quantitative MR is the lengthy acquisition time required to derive multi-contrast parametric maps. We sought to reduce the acquisition time for quantitative susceptibility mapping (QSM) and macromolecular tissue volume (MTV) by acquiring both contrasts simultaneously, leveraging their redundancies. The Joint Virtual Coil concept with generalized autocalibrating partially parallel acquisitions (JVC-GRAPPA) was applied to reduce acquisition time further.
Methods: Three adult volunteers were imaged on a 3T scanner using a multi-echo 3D GRE sequence acquired at three head orientations. MTV, QSM, R2*, T1, and proton density maps were reconstructed. The same sequence (GRAPPA R=4) was performed in subject #1 with a single head orientation for comparison. Fully sampled data were acquired in subject #2, from which retrospective undersampling was performed (R=6 GRAPPA and R=9 JVC-GRAPPA). Prospective undersampling was performed in subject #3 (R=6 GRAPPA and R=9 JVC-GRAPPA) using gradient blips to shift k-space sampling in later echoes.
Results: Subject #1's multi-orientation and single-orientation MTV maps were not significantly different based on RMSE. For subject #2, the retrospectively undersampled JVC-GRAPPA and GRAPPA generated results similar to the fully sampled data. This approach was validated with the prospectively undersampled images in subject #3. Using QSM, R2*, and MTV, the contributions of myelin and iron content to susceptibility were estimated.
Conclusion: We have developed a novel strategy to simultaneously acquire data for the reconstruction of five intrinsically co-registered 1-mm isotropic resolution multi-parametric maps, with a scan time of 6 minutes using JVC-GRAPPA.
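As a minimal illustration of the retrospective undersampling step described above, the sketch below zeroes all but every R-th phase-encode line of a fully sampled k-space. The GRAPPA/JVC-GRAPPA reconstruction itself is far more involved and is omitted; the array shapes here are toy assumptions:

```python
import numpy as np

def retrospective_undersample(kspace, R, pe_axis=0):
    """Keep every R-th phase-encode line of fully sampled k-space and
    zero the rest -- the kind of R=6 retrospective undersampling
    described above (the parallel-imaging reconstruction is omitted).
    """
    mask = np.zeros(kspace.shape[pe_axis], dtype=bool)
    mask[::R] = True
    shape = [1] * kspace.ndim
    shape[pe_axis] = -1          # broadcast the mask along the PE axis
    return kspace * mask.reshape(shape)

# Toy usage on simulated 2D k-space data.
kspace = np.fft.fft2(np.random.rand(128, 128))
kspace_R6 = retrospective_undersample(kspace, R=6)
```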
In this paper, we analyze and extend an online learning framework known as the Context-Attentive Bandit, motivated by various practical applications, from medical diagnosis to dialog systems, where due to observation costs only a small subset of a potentially large number of context variables can be observed at each iteration; however, the agent is free to choose which variables to observe. We derive a novel algorithm, called Context-Attentive Thompson Sampling (CATS), which builds upon the Linear Thompson Sampling approach, adapting it to the Context-Attentive Bandit setting. We provide a theoretical regret analysis and an extensive empirical evaluation demonstrating the advantages of the proposed approach over several baseline methods on a variety of real-life datasets.
2021-06-06
ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (published)
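CATS builds on Linear Thompson Sampling, which can be sketched compactly. The code below is a minimal per-arm LinTS loop; the context-attentive machinery for choosing which context variables to observe is omitted, and the dimensions and stand-in reward signal are toy assumptions:

```python
import numpy as np

class LinearThompsonSampling:
    """Minimal Linear Thompson Sampling for a K-armed contextual bandit.

    CATS extends this idea with a mechanism for selecting which context
    variables to observe under an observation cost; that part is omitted.
    """
    def __init__(self, n_arms, dim, v=1.0):
        self.v = v
        self.B = [np.eye(dim) for _ in range(n_arms)]    # precision matrices
        self.f = [np.zeros(dim) for _ in range(n_arms)]  # reward statistics

    def select_arm(self, context, rng):
        scores = []
        for B, f in zip(self.B, self.f):
            cov = np.linalg.inv(B)
            mu = cov @ f
            # Sample a parameter vector from the arm's posterior.
            theta = rng.multivariate_normal(mu, self.v ** 2 * cov)
            scores.append(context @ theta)
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        self.B[arm] += np.outer(context, context)
        self.f[arm] += reward * context

# Toy usage.
rng = np.random.default_rng(0)
bandit = LinearThompsonSampling(n_arms=3, dim=5)
for _ in range(100):
    x = rng.normal(size=5)
    a = bandit.select_arm(x, rng)
    r = float(x[a % 5] > 0)          # stand-in reward signal
    bandit.update(a, x, r)
```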
Building multi-domain AI agents is a challenging task and an open problem in the area of AI. Within the domain of dialog, the ability to orchestrate multiple independently trained dialog agents, or skills, to create a unified system is of particular significance. In this work, we study the task of online posterior dialog orchestration, where we define posterior orchestration as the task of selecting a subset of skills which most appropriately answer a user input, using features extracted from both the user input and the individual skills. To account for the various costs associated with extracting skill features, we consider online posterior orchestration under a skill execution budget. We formalize this setting as Context Attentive Bandit with Observations (CABO), a variant of context-attentive bandits, and evaluate it on proprietary conversational datasets.
2021-06-06
ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (published)
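To make the budget constraint concrete, here is a hypothetical greedy selection step: given per-skill scores (e.g., from a bandit policy) and per-skill feature-extraction costs, execute skills in order of score-per-cost until the budget is exhausted. This is a stand-in illustration, not the CABO algorithm itself:

```python
import numpy as np

def select_skills_under_budget(scores, costs, budget):
    """Greedy illustration of posterior orchestration under a skill
    execution budget: execute skills in decreasing order of
    score-per-cost while the budget allows. Not the CABO algorithm.
    """
    scores, costs = np.asarray(scores), np.asarray(costs)
    order = np.argsort(-scores / costs)
    chosen, spent = [], 0.0
    for i in order:
        if spent + costs[i] <= budget:
            chosen.append(int(i))
            spent += costs[i]
    return chosen

# Toy usage: five skills with hypothetical scores and costs.
print(select_skills_under_budget(scores=[0.9, 0.4, 0.7, 0.2, 0.6],
                                 costs=[3.0, 1.0, 2.0, 0.5, 1.5],
                                 budget=4.0))
```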
A major bottleneck in the real-world application of machine learning models is their failure to generalize to unseen domains whose data distribution is not i.i.d. with that of the training domains. This failure often stems from learning non-generalizable features in the training domains that are spuriously correlated with the label of the data. To address this shortcoming, there has been a surge of interest in learning good explanations that are hard to vary, studied under the notion of Out-of-Distribution (OOD) Generalization. The search for good explanations that are invariant across different domains can be seen as finding local (or global) minima in the loss landscape that hold true across all of the training domains. In this paper, we propose a masking strategy that determines a continuous weight based on the agreement of the gradients flowing through each edge of the network, in order to control the amount of update received by the edge in each optimization step. In particular, our proposed technique, referred to as "Smoothed-AND (SAND) masking", not only validates the agreement in the direction of gradients but also promotes agreement among their magnitudes, to further ensure the discovery of invariances across training domains. SAND-mask is validated on the DomainBed benchmark for domain generalization and significantly improves the state-of-the-art accuracy on the Colored MNIST dataset, while providing competitive results on other domain generalization datasets.
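The gradient-agreement idea can be sketched as follows: compute per-parameter gradients in each training domain, derive a continuous mask in [0, 1] from the agreement of their signs and magnitudes, and scale the averaged update by that mask. The specific agreement functions below are illustrative assumptions, not the paper's exact SAND formula:

```python
import numpy as np

def agreement_mask(domain_grads, tau=1.0):
    """Continuous per-parameter mask from the agreement of gradients
    across training domains. Illustrative: favors components whose
    domain gradients share both sign and magnitude, loosely in the
    spirit of SAND-mask (not the paper's exact formula).
    """
    g = np.stack(domain_grads)                         # (n_domains, n_params)
    sign_agree = np.abs(np.mean(np.sign(g), axis=0))   # 1 if all signs match
    # Magnitude agreement: low relative variance across domains.
    rel_var = np.var(g, axis=0) / (np.mean(np.abs(g), axis=0) ** 2 + 1e-12)
    mag_agree = np.exp(-tau * rel_var)
    return sign_agree * mag_agree                      # in [0, 1] per parameter

def masked_update(params, domain_grads, lr=0.1):
    mask = agreement_mask(domain_grads)
    return params - lr * mask * np.mean(domain_grads, axis=0)

# Toy usage: two domains whose gradients agree on the first coordinate only,
# so only that coordinate receives a meaningful update.
params = np.zeros(3)
g1 = np.array([1.0, 0.8, -0.5])
g2 = np.array([1.1, -0.9, 0.5])
params = masked_update(params, [g1, g2])
```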
We study how different output layers in a deep neural network learn and forget in continual learning settings. The following three factors can affect catastrophic forgetting in the output layer: (1) weight modifications, (2) interference, and (3) projection drift. In this paper, our goal is to provide more insight into how changing the output layer may address (1) and (2). Some potential solutions to these issues are proposed and evaluated here in several continual learning scenarios. We show that the best-performing type of output layer depends on the data distribution drift and/or the amount of data available. In particular, in some cases where a standard linear layer would fail, changing the parameterization is sufficient to achieve significantly better performance, without introducing a continual-learning algorithm and instead training the model with standard SGD. Our analysis and results shed light on the dynamics of the output layer in continual learning scenarios and suggest a way of selecting the best type of output layer for a given scenario.
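One concrete way to change the output layer's parameterization is to replace a standard linear head with a cosine-similarity head over normalized features and class weights, a re-parameterization commonly considered for mitigating forgetting. The PyTorch sketch below is illustrative and not necessarily the exact variant studied in the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearHead(nn.Module):
    """Standard linear output layer: logits = W h + b."""
    def __init__(self, dim, n_classes):
        super().__init__()
        self.fc = nn.Linear(dim, n_classes)

    def forward(self, h):
        return self.fc(h)

class CosineHead(nn.Module):
    """Cosine-similarity output layer: logits are scaled cosine
    similarities between normalized features and class weights,
    which removes the bias term and decouples logit scale from
    feature/weight norms.
    """
    def __init__(self, dim, n_classes, scale=10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_classes, dim))
        self.scale = scale

    def forward(self, h):
        return self.scale * F.normalize(h, dim=-1) @ F.normalize(self.weight, dim=-1).T

# Toy usage: same features, two parameterizations of the output layer.
h = torch.randn(4, 32)
print(LinearHead(32, 10)(h).shape, CosineHead(32, 10)(h).shape)
```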