« L’étude de la synchronisation intercérébrale renouvelle le regard sur nos cerveaux »
François Lassagne
Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization
Dinghuai Zhang
Ricky T. Q. Chen
Cheng-Hao Liu
We tackle the problem of sampling from intractable high-dimensional density functions, a fundamental task that often appears in machine learning and statistics. We extend recent sampling-based approaches that leverage controlled stochastic processes to model approximate samples from these target densities. The main drawback of these approaches is that the training objective requires full trajectories to compute, resulting in sluggish credit assignment issues due to use of entire trajectories and a learning signal present only at the terminal time. In this work, we present Diffusion Generative Flow Samplers (DGFS), a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments, via parameterizing an additional "flow function". Our method takes inspiration from the theory developed for generative flow networks (GFlowNets), allowing us to make use of intermediate learning signals. Through various challenging experiments, we demonstrate that DGFS achieves more accurate estimates of the normalization constant than closely-related prior methods.
Local Search GFlowNets
Minsu Kim
Taeyoung Yun
Dinghuai Zhang
Sungsoo Ahn
Jinkyoo Park
Generative Flow Networks (GFlowNets) are amortized sampling methods that learn a distribution over discrete objects proportional to their rewards. GFlowNets exhibit a remarkable ability to generate diverse samples, yet occasionally struggle to consistently produce samples with high rewards due to over-exploration on wide sample space. This paper proposes to train GFlowNets with local search, which focuses on exploiting high-rewarded sample space to resolve this issue. Our main idea is to explore the local neighborhood via backtracking and reconstruction guided by backward and forward policies, respectively. This allows biasing the samples toward high-reward solutions, which is not possible for a typical GFlowNet solution generation scheme, which uses the forward policy to generate the solution from scratch. Extensive experiments demonstrate a remarkable performance improvement in several biochemical tasks. Source code is available: https://github.com/dbsxodud-11/ls_gfn.
Searching for High-Value Molecules Using Reinforcement Learning and Transformers
Raj Ghugare
Santiago Miret
Adriana Hugessen
Mariano Phielipp
Reinforcement learning (RL) over text representations can be effective for finding high-value policies that can search over graphs. However, RL requires careful structuring of the search space and algorithm design to be effective in this challenge. Through extensive experiments, we explore how different design choices for text grammar and algorithmic choices for training can affect an RL policy's ability to generate molecules with desired properties. We arrive at a new RL-based molecular design algorithm (ChemRLformer) and perform a thorough analysis using 25 molecule design tasks, including computationally complex protein docking simulations. From this analysis, we discover unique insights in this problem space and show that ChemRLformer achieves state-of-the-art performance while being more straightforward than prior work by demystifying which design choices are actually helpful for text-based molecule design.
Sensing Wellbeing in the Workplace, Why and For Whom? Envisioning Impacts with Organizational Stakeholders
Anna Kawakami
Shreya Chowdhary
Shamsi T. Iqbal
Q. Vera Liao
Jina Suh
Koustuv Saha
With the heightened digitization of the workplace, alongside the rise of remote and hybrid work prompted by the pandemic, there is growing corporate interest in using passive sensing technologies for workplace wellbeing. Existing research on these technologies often focuses on understanding or improving interactions between an individual user and the technology. Workplace settings can, however, introduce a range of complexities that challenge the potential impact and in-practice desirability of wellbeing sensing technologies. Today, there is an inadequate empirical understanding of how everyday workers (including those who are impacted by, and impact the deployment of, workplace technologies) envision their broader socio-ecological impacts. In this study, we conduct storyboard-driven interviews with 33 participants across three stakeholder groups: organizational governors, AI builders, and worker data subjects. Overall, our findings surface how workers envisioned wellbeing sensing technologies may lead to cascading impacts on their broader organizational culture, interpersonal relationships with colleagues, and individual day-to-day lives. Participants anticipated harms arising from ambiguity and misalignment around scaled notions of "worker wellbeing," underlying technical limitations to workplace-situated sensing, and assumptions regarding how social structures and relationships may shape the impacts and use of these technologies. Based on our findings, we discuss implications for designing worker-centered data-driven wellbeing technologies.
SUMMIT: Scaffolding Open Source Software Issue Discussion Through Summarization
Saskia Gilmer
Avinash Bhat
Shuvam Shah
Kevin Cherry
Jinghui Cheng
The neuroanatomical substrates of autism and ADHD and their link to putative genomic underpinnings
Lisa M. Berg
Caroline Gurr
Johanna Leyhausen
Hanna Seelemeyer
Anke Bletsch
Tim Schaefer
Charlotte M. Pretzsch
Beth Oakley
Eva Loth
Dorothea L. Floris
Jan K. Buitelaar
Christian Beckmann
Tobias Banaschewski
Tony Charman
Emily J. H. Jones
Julian Tillmann
Chris H. Chatham
Thomas Bourgeron
Jumana Ahmad
Sara Ambrosino
Bonnie Auyeung
Simon Baron-Cohen
Sarah Baumeister
Sven Bölte
Carsten Bours
Michael Brammer
Daniel Brandeis
Claudia Brogna
Yvette de Bruijn
Bhismadev Chakrabarti
Ineke Cornelissen
Daisy Crawley
Flavio Dell’Acqua
Sarah Durston
Jessica Faulkner
Vincent Frouin
Pilar Garcés
David Goyard
Lindsay Ham
Hannah Hayward
Joerg F. Hipp
Rosemary Holt
Mark Johnson
Prantik Kundu
Meng-Chuan Lai
Xavier Liogier D’ardhuy
Michael V. Lombardo
David J. Lythgoe
René Mandl
Andre Marquand
Luke Mason
Maarten Mennes
Andreas Meyer-Lindenberg
Carolin Moessnang
Nico Bast
Laurence O’Dwyer
Marianne Oldehinkel
Bob Oranje
Gahan Pandina
Antonio Persico
Barbara Ruggeri
Amber N. V. Ruigrok
Jessica Sabet
Roberto Sacco
Antonia San José Cáceres
Emily Simonoff
Will Spooren
Roberto Toro
Heike Tost
Jack Waldman
Steve C. R. Williams
Caroline Wooldridge
Marcel P. Zwiers
Declan Murphy
Christine Ecker
Data Cleaning and Machine Learning: A Systematic Literature Review
Pierre-Olivier Côté
Amin Nikanjam
Nafisa Ahmed
Dmytro Humeniuk
Context: Machine Learning (ML) is integrated into a growing number of systems for various applications. Because the performance of an ML model is highly dependent on the quality of the data it has been trained on, there is a growing interest in approaches to detect and repair data errors (i.e., data cleaning). Researchers are also exploring how ML can be used for data cleaning; hence creating a dual relationship between ML and data cleaning. To the best of our knowledge, there is no study that comprehensively reviews this relationship. Objective: This paper's objectives are twofold. First, it aims to summarize the latest approaches for data cleaning for ML and ML for data cleaning. Second, it provides future work recommendations. Method: We conduct a systematic literature review of the papers published between 2016 and 2022 inclusively. We identify different types of data cleaning activities with and for ML: feature cleaning, label cleaning, entity matching, outlier detection, imputation, and holistic data cleaning. Results: We summarize the content of 101 papers covering various data cleaning activities and provide 24 future work recommendations. Our review highlights many promising data cleaning techniques that can be further extended. Conclusion: We believe that our review of the literature will help the community develop better approaches to clean data.
Differential Chromatin Architecture and Risk Variants in Deep Layer Excitatory Neurons and Grey Matter Microglia Contribute to Major Depressive Disorder
Anjali Chawla
Doruk Cakmakci
Wenmin Zhang
Malosree Maitra
Reza Rahimian
Haruka Mitsuhashi
MA Davoli
Jenny Yang
Gary Gang Chen
Ryan Denniston
Deborah Mash
Naguib Mechawar
Matthew Suderman
Corina Nagy
Gustavo Turecki
Learning Reliable Logical Rules with SATNet
Zhaoyu Li
Jinpei Guo
Yuhe Jiang
Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts in Underspecified Visual Tasks
Luca Scimeca
Alexander Rubinstein
Armand Mihai Nicolicioiu
Damien Teney
Spurious correlations in the data, where multiple cues are predictive of the target labels, often lead to shortcut learning phenomena, where a model may rely on erroneous, easy-to-learn cues while ignoring reliable ones. In this work, we propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs). We discover that DPMs have the inherent capability to represent multiple visual cues independently, even when they are largely correlated in the training data. We leverage this characteristic to encourage model diversity and empirically show the efficacy of the approach with respect to several diversification objectives. We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.