Ce programme soutient les startups spécialisées en IA à tout moment de l'année. Bénéficiez de ressources de pointe et d'un accompagnement sur mesure pour accélérer le développement de votre technologie.
Développez des compétences fondamentales en intelligence artificielle (IA) responsable grâce à des cours autodirigés, animés par des expert·e·s de Mila reconnu·e·s à l’échelle internationale.
Le Fellowship Mila en politiques de l'IA transforme l'expertise approfondie en IA en politiques rigoureuses d'intérêt public. Découvrez la dernière publication Combler la disparité en matière d’expertise : mécanismes de transfert des connaissances pour la réglementation de l’IA par Moritz von Knebel.
Nous utilisons des témoins pour analyser le trafic et l’utilisation de notre site web, afin de personnaliser votre expérience. Vous pouvez désactiver ces technologies à tout moment, mais cela peut restreindre certaines fonctionnalités du site. Consultez notre Politique de protection de la vie privée pour en savoir plus.
Paramètre des cookies
Vous pouvez activer et désactiver les types de cookies que vous souhaitez accepter. Cependant certains choix que vous ferez pourraient affecter les services proposés sur nos sites (ex : suggestions, annonces personnalisées, etc.).
Cookies essentiels
Ces cookies sont nécessaires au fonctionnement du site et ne peuvent être désactivés. (Toujours actif)
Cookies analyse
Acceptez-vous l'utilisation de cookies pour mesurer l'audience de nos sites ?
Lecteur Multimédia
Acceptez-vous l'utilisation de cookies pour afficher et vous permettre de regarder les contenus vidéo hébergés par nos partenaires (YouTube, etc.) ?
Publications
Learning active tactile perception through belief-space control
Robot operating in an open world can encounter
novel objects with unknown physical properties, such as mass,
friction, or size. It is desira… (voir plus)ble to be able to sense those
property through contact-rich interaction, before performing
downstream tasks with the objects. We propose a method for
autonomously learning active tactile perception policies, by
learning a generative world model leveraging a differentiable
bayesian filtering algorithm, and designing an information-
gathering model predictive controller. We test the method on
three simulated tasks: mass estimation, height estimation and
toppling height estimation. Our method is able to discover
policies which gather information about the desired property
in an intuitive manner.
Computer-supported simulation enables a practical alternative for medical training purposes. This study investigates the co-occurrence of fa… (voir plus)cial-recognition-derived emotions and socially shared regulation of learning (SSRL) interactions in a medical simulation training context. Using transmodal analysis (TMA), we compare novice and expert learners’ affective and cognitive engagement patterns during collaborative virtual diagnosis tasks. Results reveal that expert learners exhibit strong associations between socio-cognitive interactions and high-arousal emotions (surprise, anger), suggesting focused, effortful engagement. In contrast, novice learners demonstrate stronger links between socio-cognitive processes and happiness or sadness, with less coherent SSRL patterns, potentially indicating distraction or cognitive overload. Transmodal analysis of multimodal data (facial expressions and discourse) highlights distinct regulatory strategies between groups, offering methodological and practical insights for computer-supported cooperative work (CSCW) in medical education. Our findings underscore the role of emotion-regulation dynamics in collaborative expertise development and suggest the need for tailored scaffolding to support novice learners’ socio-cognitive and affective engagement.
In this paper, we investigate optimal control of network-coupled subsystems, where the coupling between the dynamics of the subsystems is re… (voir plus)presented by the adjacency or Laplacian matrix of a directed graph. Under the assumption that the coupling matrix is normal and the cost coupling is compatible with the dynamics coupling, we use the spectral decomposition of the coupling matrix to decompose the overall system into at most n systems with noise coupled dynamics and decoupled cost, where n is the size of the network. Furthermore, the optimal control input at each subsystem can be computed by solving n1 decoupled Riccati equations where n1 (n1 ≤ n) denotes the number of distinct eigenvalues of the coupling matrix, where complex conjugate pairs are not double-counted. A salient feature of the result is that the solution complexity depends on the number of distinct eigenvalues of the coupling matrix rather than the size of the network. Therefore, the proposed solution framework provides a scalable method for synthesizing and implementing optimal control laws for large-scale network-coupled subsystems.
Building deep reinforcement learning (RL) agents that find a good policy with few samples has proven notoriously challenging. To achieve sam… (voir plus)ple efficiency, recent work has explored updating neural networks with large numbers of gradient steps for every new sample. While such high update-to-data (UTD) ratios have shown strong empirical performance, they also introduce instability to the training process. Previous approaches need to rely on periodic neural network parameter resets to address this instability, but restarting the training process is infeasible in many real-world applications and requires tuning the resetting interval. In this paper, we focus on one of the core difficulties of stable training with limited samples: the inability of learned value functions to generalize to unobserved on-policy actions. We mitigate this issue directly by augmenting the off-policy RL training process with a small amount of data generated from a learned world model. Our method, Model-Augmented Data for Temporal Difference learning (MAD-TD) uses small amounts of generated data to stabilize high UTD training and achieve competitive performance on the most challenging tasks in the DeepMind control suite. Our experiments further highlight the importance of employing a good model to generate data, MAD-TD's ability to combat value overestimation, and its practical stability gains for continued learning.
While traditional High-Level Synthesis (HLS) converts “high-level” C-like programs into hardware automatically, producing high-performan… (voir plus)ce designs still requires hardware expertise. Optimizations such as data partitioning can have a large impact on performance since they directly affect data reuse patterns and the ability to reuse hardware. However, optimizing partitioning is a difficult process since minor changes in the parameter choices can lead to totally unpredictable performance.
Functional array-based languages have been proposed instead of C-based approaches, as they offer stronger performance guarantees. This article proposes to follow a similar approach and exposes a divide-and-conquer primitive at the algorithmic level to let users partition any arbitrary computation. The compiler is then free to explore different partition shapes to maximize both data and hardware reuse automatically. The main challenge remains that the impact of partitioning is only known much later in the compilation flow. This is due to the hard-to-predict effects of the many optimizations applied during compilation.
To solve this problem, the partitioning is expressed using a set of symbolic tunable parameters, introduced early in the compilation pipeline. A symbolic performance model is then used in the last compilation stage to predict performance based on the possible values of the tunable parameters. Using this approach, a design space exploration is conducted on an Intel Arria 10 Field Programmable Gate Arrays (FPGAs), and competitive performance is achieved on the classical VGG and TinyYolo neural networks.
One proposed mechanism to improve exploration in reinforcement learning is through the use of macro-actions. Paradoxically though, in many s… (voir plus)cenarios the naive addition of macro-actions does not lead to better exploration, but rather the opposite. It has been argued that this was caused by adding non-useful macros and multiple works have focused on mechanisms to discover effectively environment-specific useful macros. In this work, we take a slightly different perspective. We argue that the difficulty stems from the trade-offs between reducing the average number of decisions per episode versus increasing the size of the action space. Namely, one typically treats each potential macro-action as independent and atomic, hence strictly increasing the search space and making typical exploration strategies inefficient. To address this problem we propose a novel regularization term that exploits the relationship between actions and macro-actions to improve the credit assignment mechanism by reducing the effective dimension of the action space and, therefore, improving exploration. The term relies on a similarity matrix that is meta-learned jointly with learning the desired policy. We empirically validate our strategy looking at macro-actions in Atari games, and the StreetFighter II environment. Our results show significant improvements over the Rainbow-DQN baseline in all environments. Additionally, we show that the macro-action similarity is transferable to related environments. We believe this work is a small but important step towards understanding how the similarity-imposed geometry on the action space can be exploited to improve credit assignment and exploration, therefore making learning more effective.
With the increasing energy demand and the growing integration of renewable sources of energy, power systems face operational challenges such… (voir plus) as overloads, losses, and stability concerns, particularly as networks operate near their capacity limits. Flexible alternating current transmission system (FACTS) devices are essential to ensure reliable grid operations and enable the efficient integration of renewable energy. This work introduces a mixed-integer second-order cone programming (MISOCP) model for the multi-period scheduling of key FACTS devices in electric transmission systems. The proposed model integrates four key control mechanisms: (i) on-load tap changers (OLTCs) for voltage regulation via discrete taps; (ii) static synchronous compensators (STATCOMs) and (iii) shunt reactors for reactive power compensation; and (iv) thyristor-controlled series capacitors (TCSCs) for adjustable impedance and flow control. The objective is to minimize active power losses using a limited number of control actions while meeting physical and operational constraints at all times throughout the defined time horizon. To ensure tractability, the model employs a second-order cone relaxation of the power flow. Device-specific constraints are handled via binary expansion and linearization: OLTCs and shunt reactors are modelled with discrete variables, STATCOMs through reactive power bounds, and TCSCs using a reformulation-linearization technique (RLT). A multi-period formulation captures the sequential nature of decision making, ensuring consistency across time steps. The model is evaluated on the IEEE 9-bus, 30-bus, and RTS96 test systems, demonstrating its ability to reduce losses, with potential applicability to larger-scale grids.