Publications
Dynamic planning of redundant robots within a set-based task-priority inverse kinematics framework.
This work presents the dynamic planning of redundant robots by merging a global and a local planner. The global planner is implemented as a sampling-based algorithm that works in the reduced-dimensionality robot workspace, applying the Cartesian constraints only. The output trajectory is then checked within a set-based task-priority inverse kinematics framework to verify that the other task constraints are fulfilled. The inverse kinematics framework is also used in real time as a local motion controller to ensure reactive behaviour, addressing, e.g., mismatches between the a priori information and online perception. During the movement, the motion planner runs in the background to adapt to changes in the environment or, more generally, to continuously optimize the path. The proposed method is experimentally validated with a Kinova Jaco2 7-degree-of-freedom manipulator.
2020-07-31
Conference on Control Technology and Applications (published)
Recently, a model of a decentralized control system with local and remote controllers connected over unreliable channels was presented in [1]. The model has a nonclassical information structure that is not partially nested. Nonetheless, it is shown in [1] that the optimal control strategies are linear functions of the state estimate (which is a nonlinear function of the observations). Their proof is based on a fairly sophisticated dynamic programming argument. In this article, we present an alternative and elementary proof of the result which uses common information-based conditional independence and completion of squares.
2020-07-31
IEEE Transactions on Automatic Control (published)
An online reinforcement learning algorithm called renewal Monte Carlo (RMC) is presented. RMC works for infinite horizon Markov decision processes with a designated start state. RMC is a Monte Carlo algorithm that retains the key advantages of Monte Carlo (viz., simplicity, ease of implementation, and low bias) while circumventing the main drawbacks of Monte Carlo (viz., high variance and delayed updates). Given a parameterized policy
2020-07-31
IEEE Transactions on Automatic Control (published)
Identification of disease subtypes and corresponding biomarkers can substantially improve clinical diagnosis and treatment selection. Discovering these subtypes in noisy, high-dimensional biomedical data is often impossible for humans and challenging for machines. We introduce a new approach to facilitate the discovery of disease subtypes: instead of analyzing the original data, we train a diagnostic classifier (healthy vs. diseased) and extract instance-wise explanations for the classifier's decisions. The distribution of instances in the explanation space of our diagnostic classifier amplifies the different reasons for belonging to the same class, resulting in a representation that is uniquely useful for discovering latent subtypes. We compare our ability to recover subtypes via cluster analysis on model explanations to classical cluster analysis on the original data. In multiple datasets with known ground-truth subclasses, particularly on UK Biobank brain imaging data and transcriptome data from the Cancer Genome Atlas, we show that cluster analysis on model explanations substantially outperforms the classical approach. While we believe clustering in explanation space to be particularly valuable for inferring disease subtypes, the method is more general and applicable to any kind of subtype identification.
Introduction: The need to streamline patient management for coronavirus disease-19 (COVID-19) has become more pressing than ever. Chest X-rays (CXRs) provide a non-invasive (potentially bedside) tool to monitor the progression of the disease. In this study, we present a severity score prediction model for COVID-19 pneumonia for frontal chest X-ray images. Such a tool can gauge the severity of COVID-19 lung infections (and pneumonia in general) and can be used for escalation or de-escalation of care as well as monitoring treatment efficacy, especially in the ICU. Methods: Images from a public COVID-19 database were scored retrospectively by three blinded experts in terms of the extent of lung involvement as well as the degree of opacity. A neural network model that was pre-trained on large (non-COVID-19) chest X-ray datasets is used to construct features for COVID-19 images that are predictive for our task. Results: This study finds that training a regression model on a subset of the outputs from this pre-trained chest X-ray model predicts our geographic extent score (range 0-8) with 1.14 mean absolute error (MAE) and our lung opacity score (range 0-6) with 0.78 MAE. Conclusions: These results indicate that our model's ability to gauge the severity of COVID-19 lung infections could be used for escalation or de-escalation of care as well as monitoring treatment efficacy, especially in the ICU. To enable follow-up work, we make our code, labels, and data available online.
In recent years, the multi-armed bandit (MAB) framework has attracted a lot of attention in various applications, from recommender systems and information retrieval to healthcare and finance. This success is due to its stellar performance combined with attractive properties, such as learning from less feedback. The multi-armed bandit field is currently experiencing a renaissance, as novel problem settings and algorithms motivated by various practical applications are being introduced, building on top of the classical bandit problem. This article aims to provide a comprehensive review of top recent developments in multiple real-life applications of the multi-armed bandit. Specifically, we introduce a taxonomy of common MAB-based applications and summarize the state of the art for each of those domains. Furthermore, we identify important current trends and provide new perspectives pertaining to the future of this burgeoning field.
2020-07-18
2020 IEEE Congress on Evolutionary Computation (CEC) (published)
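For readers unfamiliar with the classical bandit problem that the survey above builds on, the following is a minimal sketch of an epsilon-greedy agent on Bernoulli arms. It is a textbook baseline and is not taken from the article; the function name `epsilon_greedy`, the arm means, and all parameter values are illustrative assumptions.

```python
import random


def epsilon_greedy(arm_means, steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy rule.

    arm_means: hypothetical success probability of each arm (unknown to the agent).
    Returns (pull counts per arm, estimated mean per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms      # number of pulls per arm
    values = [0.0] * n_arms    # running mean of observed rewards per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean.
            arm = max(range(n_arms), key=lambda a: values[a])

        # The only feedback is the reward of the chosen arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0

        counts[arm] += 1
        # Incremental update of the running mean.
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward

    return counts, values, total_reward


if __name__ == "__main__":
    counts, values, total = epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", counts)
    print("estimated means:", [round(v, 3) for v in values])
    print("total reward:", total)
```

The agent sees a reward only for the arm it pulls, which is the "learning from less feedback" property mentioned in the abstract. The fixed exploration rate is the simplest possible design choice; the applications surveyed typically rely on more refined strategies.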
Regulation of gene expression drives protein interactions that govern synaptic wiring and neuronal activity. The resulting coordinated activity among neuronal populations supports complex psychological processes, yet how gene expression shapes cognition and emotion remains unknown. Here we directly bridge the microscale and macroscale by mapping gene expression patterns to functional activation patterns across the cortical sheet. Applying unsupervised learning to the Allen Human Brain Atlas and Neurosynth databases, we identify a ventromedial-dorsolateral gradient of gene assemblies that separate affective and cognitive domains. This topographic molecular-psychological signature reflects the hierarchical organization of the neocortex, including systematic variations in cell type, myeloarchitecture, laminar differentiation, and intrinsic network affiliation. In addition, this molecular-psychological signature is related to individual differences in cognitive performance, strengthens over neurodevelopment, and can be replicated in two independent repositories. Collectively, our results reveal spatially covarying transcriptomic and cognitive architectures, highlighting the influence that molecular mechanisms exert on psychological processes.
Over the last decade, there has been significant progress in the field of machine learning for de novo drug design, particularly in deep generative models. However, current generative approaches face a significant challenge: they neither ensure that the proposed molecular structures can be feasibly synthesized nor provide the synthesis routes of the proposed small molecules, which seriously limits their practical applicability. In this work, we propose a novel forward synthesis framework powered by reinforcement learning (RL) for de novo drug design, Policy Gradient for Forward Synthesis (PGFS), that addresses this challenge by embedding the concept of synthetic accessibility directly into the de novo drug design system. In this setup, the agent learns to navigate through the immense synthetically accessible chemical space by subjecting commercially available small molecule building blocks to valid chemical reactions at every time step of the iterative virtual multi-step synthesis process. The proposed environment for drug discovery provides a highly challenging test-bed for RL algorithms owing to the large state space and high-dimensional continuous action space with hierarchical actions. PGFS achieves state-of-the-art performance in generating structures with high QED and penalized clogP. Moreover, we validate PGFS in an in silico proof-of-concept associated with three HIV targets. Finally, we describe how the end-to-end training conceptualized in this study represents an important paradigm in radically expanding the synthesizable chemical space and automating the drug discovery process.
Multi-task reinforcement learning is a rich paradigm where information from previously seen environments can be leveraged for better performance and improved sample-efficiency in new environments. In this work, we leverage ideas of common structure underlying a family of Markov decision processes (MDPs) to improve performance in the few-shot regime. We use assumptions of structure from Hidden-Parameter MDPs and Block MDPs to propose a new framework, HiP-BMDP, and an approach for learning a common representation and universal dynamics model. To this end, we provide transfer and generalization bounds based on task and state similarity, along with sample complexity bounds that depend on the aggregate number of samples across tasks, rather than the number of tasks, a significant improvement over prior work. To demonstrate the efficacy of the proposed method, we empirically compare and show improvements against other multi-task and meta-reinforcement learning baselines.
The success of adversarial formulations in machine learning has brought renewed motivation for smooth games. In this work, we focus on the class of stochastic Hamiltonian methods and provide the first convergence guarantees for certain classes of stochastic smooth games. We propose a novel unbiased estimator for the stochastic Hamiltonian gradient descent (SHGD) and highlight its benefits. Using tools from the optimization literature we show that SHGD converges linearly to the neighbourhood of a stationary point. To guarantee convergence to the exact solution, we analyze SHGD with a decreasing step-size and we also present the first stochastic variance reduced Hamiltonian method. Our results provide the first global non-asymptotic last-iterate convergence guarantees for the class of stochastic unconstrained bilinear games and for the more general class of stochastic games that satisfy a "sufficiently bilinear" condition, notably including some non-convex non-concave problems. We supplement our analysis with experiments on stochastic bilinear and sufficiently bilinear games, where our theory is shown to be tight, and on simple adversarial machine learning formulations.
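To illustrate the idea behind Hamiltonian methods on a bilinear game, here is a minimal deterministic sketch on the scalar game f(x, y) = a·x·y (min over x, max over y). It is not the paper's stochastic algorithm or its estimator: the scalar game, the function names, the step size, and the number of steps are all assumptions chosen for illustration. The Hamiltonian is taken to be half the squared norm of the game's gradient field, H(x, y) = ½·a²·(x² + y²), and plain gradient descent is run on H.

```python
import math


def simultaneous_gda(x, y, a=1.0, lr=0.1, steps=100):
    """Simultaneous gradient descent-ascent on f(x, y) = a * x * y."""
    for _ in range(steps):
        grad_x = a * y  # df/dx
        grad_y = a * x  # df/dy
        # x descends on f, y ascends on f, both using the old iterate.
        x, y = x - lr * grad_x, y + lr * grad_y
    return x, y


def hamiltonian_gd(x, y, a=1.0, lr=0.1, steps=100):
    """Gradient descent on H(x, y) = 0.5 * a**2 * (x**2 + y**2).

    H is half the squared norm of the vector field (df/dx, -df/dy) = (a*y, -a*x).
    """
    for _ in range(steps):
        h_grad_x = a * a * x  # dH/dx
        h_grad_y = a * a * y  # dH/dy
        x, y = x - lr * h_grad_x, y - lr * h_grad_y
    return x, y


def distance_to_equilibrium(x, y):
    """Distance to the unique equilibrium (0, 0) of the bilinear game."""
    return math.hypot(x, y)


if __name__ == "__main__":
    start = (1.0, 1.0)
    print("start:", distance_to_equilibrium(*start))
    print("GDA  :", distance_to_equilibrium(*simultaneous_gda(*start)))
    print("HGD  :", distance_to_equilibrium(*hamiltonian_gd(*start)))
```

In this toy setting each simultaneous descent-ascent step multiplies the squared distance to the equilibrium by 1 + lr²·a², so the iterates spiral outward, while each Hamiltonian step multiplies both coordinates by 1 − lr·a² and contracts linearly. A stochastic variant would replace the exact gradient of H with a sampled estimate.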