Publications
Searching for High-Value Molecules Using Reinforcement Learning and Transformers
Reinforcement learning (RL) over text representations can be effective for finding high-value policies that can search over graphs. However, RL requires careful structuring of the search space and algorithm design to be effective in this challenge. Through extensive experiments, we explore how different design choices for text grammar and algorithmic choices for training can affect an RL policy's ability to generate molecules with desired properties. We arrive at a new RL-based molecular design algorithm (ChemRLformer) and perform a thorough analysis using 25 molecule design tasks, including computationally complex protein docking simulations. From this analysis, we discover unique insights in this problem space and show that ChemRLformer achieves state-of-the-art performance while being more straightforward than prior work by demystifying which design choices are actually helpful for text-based molecule design.
With the heightened digitization of the workplace, alongside the rise of remote and hybrid work prompted by the pandemic, there is growing corporate interest in using passive sensing technologies for workplace wellbeing. Existing research on these technologies often focuses on understanding or improving interactions between an individual user and the technology. Workplace settings can, however, introduce a range of complexities that challenge the potential impact and in-practice desirability of wellbeing sensing technologies. Today, there is an inadequate empirical understanding of how everyday workers, including those who are impacted by and those who impact the deployment of workplace technologies, envision their broader socio-ecological impacts. In this study, we conduct storyboard-driven interviews with 33 participants across three stakeholder groups: organizational governors, AI builders, and worker data subjects. Overall, our findings surface how workers envisioned that wellbeing sensing technologies may lead to cascading impacts on their broader organizational culture, interpersonal relationships with colleagues, and individual day-to-day lives. Participants anticipated harms arising from ambiguity and misalignment around scaled notions of "worker wellbeing," underlying technical limitations to workplace-situated sensing, and assumptions regarding how social structures and relationships may shape the impacts and use of these technologies. Based on our findings, we discuss implications for designing worker-centered data-driven wellbeing technologies.
2023-10-04
Proceedings of the ACM on Human-Computer Interaction (published)
The neuroanatomical substrates of autism and ADHD and their link to putative genomic underpinnings
Lisa M. Berg
Caroline Gurr
Johanna Leyhausen
Hanna Seelemeyer
Anke Bletsch
Tim Schaefer
Charlotte M. Pretzsch
Beth Oakley
Eva Loth
Dorothea L. Floris
Jan K. Buitelaar
Christian Beckmann
Tobias Banaschewski
Tony Charman
Emily J. H. Jones
Julian Tillmann
Chris H. Chatham
Thomas Bourgeron
Jumana Sara Bonnie Simon Sarah Sven Carsten Michael Daniel Claudia Yvette Bhismadev Ineke Daisy Flavio Guillaume Sarah Jessica Vincent Pilar David Lindsay Hannah Joerg Rosemary Mark H. Prantik Meng-Chuan Xavier Liogier Michael V. David J. René Andre Luke Maarten Andreas Carolin Nico Laurence Marianne Bob Gahan Antonio M. Barbara Amber Jessica Roberto Antonia San José Emily Will Roberto Heike Jack Steve C. R. Caroline Marcel P. Ahmad … (58 more)
Context: Machine Learning (ML) is integrated into a growing number of systems for various applications. Because the performance of an ML model is highly dependent on the quality of the data it has been trained on, there is a growing interest in approaches to detect and repair data errors (i.e., data cleaning). Researchers are also exploring how ML can be used for data cleaning, hence creating a dual relationship between ML and data cleaning. To the best of our knowledge, there is no study that comprehensively reviews this relationship. Objective: This paper's objectives are twofold. First, it aims to summarize the latest approaches for data cleaning for ML and ML for data cleaning. Second, it provides future work recommendations. Method: We conduct a systematic literature review of the papers published between 2016 and 2022 inclusive. We identify different types of data cleaning activities with and for ML: feature cleaning, label cleaning, entity matching, outlier detection, imputation, and holistic data cleaning. Results: We summarize the content of 101 papers covering various data cleaning activities and provide 24 future work recommendations. Our review highlights many promising data cleaning techniques that can be further extended. Conclusion: We believe that our review of the literature will help the community develop better approaches to clean data.
Differential Chromatin Architecture and Risk Variants in Deep Layer Excitatory Neurons and Grey Matter Microglia Contribute to Major Depressive Disorder
Spurious correlations in the data, where multiple cues are predictive of the target labels, often lead to shortcut learning phenomena, where a model may rely on erroneous, easy-to-learn cues while ignoring reliable ones. In this work, we propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs). We discover that DPMs have the inherent capability to represent multiple visual cues independently, even when they are largely correlated in the training data. We leverage this characteristic to encourage model diversity and empirically show the efficacy of the approach with respect to several diversification objectives. We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
Large Language Models (LLMs) have gained the ability to assimilate human knowledge and facilitate natural language interactions with both humans and other LLMs. However, despite their impressive achievements, LLMs have not made significant advancements in the realm of graph machine learning. This limitation arises because graphs encapsulate distinct relational data, making it challenging to transform them into natural language that LLMs understand. In this paper, we bridge this gap with a novel framework, GraphText, that translates graphs into natural language. GraphText derives a graph-syntax tree for each graph that encapsulates both the node attributes and inter-node relationships. Traversal of the tree yields a graph text sequence, which is then processed by an LLM to treat graph tasks as text generation tasks. Notably, GraphText offers multiple advantages. It introduces training-free graph reasoning: even without training on graph data, GraphText with ChatGPT can achieve performance on par with, or even surpassing, that of supervised-trained graph neural networks through in-context learning (ICL). Furthermore, GraphText paves the way for interactive graph reasoning, allowing both humans and LLMs to communicate with the model seamlessly using natural language. These capabilities underscore the vast, yet-to-be-explored potential of LLMs in the domain of graph machine learning.
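As a rough illustration of the idea described in this abstract (a tree built from node attributes and inter-node relationships, then traversed into text), the following minimal Python sketch may help. It is not the GraphText implementation: the function names, the "center" / "1-hop" grouping, and the indented output format are all assumptions made for illustration only.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical tree type: inner nodes are dicts, leaves are lists of strings.
Tree = Union[Dict[str, "Tree"], List[str]]


def build_syntax_tree(
    node: int,
    edges: List[Tuple[int, int]],
    attrs: Dict[str, Dict[int, str]],
) -> Dict[str, Tree]:
    """Build a toy tree for one node: attribute type -> relation -> values.

    The two relations used here (the node itself and its 1-hop neighbors)
    are an assumed, simplified choice for this sketch.
    """
    # Treat edges as undirected and collect direct neighbors of `node`.
    neighbors = sorted(
        {v for u, v in edges if u == node} | {u for u, v in edges if v == node}
    )
    tree: Dict[str, Tree] = {}
    for attr_name, values in attrs.items():
        tree[attr_name] = {
            "center": [str(values[node])],
            "1-hop": [str(values[n]) for n in neighbors],
        }
    return tree


def traverse(tree: Dict[str, Tree], depth: int = 0) -> List[str]:
    """Depth-first traversal that turns the tree into indented text lines."""
    lines: List[str] = []
    for key, child in tree.items():
        indent = "  " * depth
        if isinstance(child, dict):
            lines.append(f"{indent}{key}:")
            lines.extend(traverse(child, depth + 1))
        else:
            lines.append(f"{indent}{key}: [{', '.join(child)}]")
    return lines


def graph_to_text(
    node: int,
    edges: List[Tuple[int, int]],
    attrs: Dict[str, Dict[int, str]],
) -> str:
    """Return a text sequence that could be placed in an LLM prompt."""
    return "\n".join(traverse(build_syntax_tree(node, edges, attrs)))


if __name__ == "__main__":
    toy_edges = [(0, 1), (0, 2), (1, 2)]
    toy_attrs = {"label": {0: "A", 1: "B", 2: "?"}}
    print(graph_to_text(2, toy_edges, toy_attrs))
```

In this toy setup, the resulting text for node 2 lists its own (unknown) label and the labels of its neighbors, which is the kind of sequence one could append a question to, such as asking the LLM to predict the missing label.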
We introduce a new model for multivariate probabilistic time series prediction, designed to flexibly address a range of tasks including forecasting, interpolation, and their combinations. Building on copula theory, we propose a simplified objective for the recently introduced transformer-based attentional copulas (TACTiS), wherein the number of distributional parameters now scales linearly with the number of variables instead of factorially. The new objective requires the introduction of a training curriculum, which goes hand-in-hand with necessary changes to the original architecture. We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks, while maintaining the flexibility of prior work, such as seamless handling of unaligned and unevenly-sampled time series. Code is made available at https://github.com/ServiceNow/TACTiS.
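To give a sense of why moving from factorial to linear scaling matters, the short Python sketch below compares the two growth rates for a few variable counts. It only illustrates orders of growth: the abstract does not state the exact parameter counts or constants, so the function name and the values shown are assumptions for illustration, not figures from the paper.

```python
import math
from typing import Iterable, List, Tuple


def growth_comparison(num_variables: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (linear, factorial) growth terms for each variable count d.

    Hypothetical helper: it compares d with d! and says nothing about the
    actual number of distributional parameters in any specific model.
    """
    return [(d, math.factorial(d)) for d in num_variables]


if __name__ == "__main__":
    for linear, factorial in growth_comparison([2, 5, 10, 20]):
        print(f"d={linear:>2}  linear ~ {linear:>2}  factorial ~ {factorial}")
```

Even at 10 variables the factorial term already exceeds three million, which suggests why an objective whose parameter count grows linearly is far easier to work with as the number of series increases.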