Publications

Local Search GFlowNets

Minsu Kim

Taeyoung Yun

Emmanuel Bengio

Dinghuai Zhang

Sungsoo Ahn

Jinkyoo Park

Generative Flow Networks (GFlowNets) are amortized sampling methods that learn a distribution over discrete objects proportional to their re… (voir plus)wards. GFlowNets exhibit a remarkable ability to generate diverse samples, yet occasionally struggle to consistently produce samples with high rewards due to over-exploration on wide sample space. This paper proposes to train GFlowNets with local search, which focuses on exploiting high-rewarded sample space to resolve this issue. Our main idea is to explore the local neighborhood via backtracking and reconstruction guided by backward and forward policies, respectively. This allows biasing the samples toward high-reward solutions, which is not possible for a typical GFlowNet solution generation scheme, which uses the forward policy to generate the solution from scratch. Extensive experiments demonstrate a remarkable performance improvement in several biochemical tasks. Source code is available: https://github.com/dbsxodud-11/ls_gfn.

2023-10-04

ArXiv (prépublication)

doi.org

arxiv.org

Searching for High-Value Molecules Using Reinforcement Learning and Transformers

Raj Ghugare

Santiago Miret

Adriana Hugessen

Mariano Phielipp

Glen Berseth

Reinforcement learning (RL) over text representations can be effective for finding high-value policies that can search over graphs. However,… (voir plus) RL requires careful structuring of the search space and algorithm design to be effective in this challenge. Through extensive experiments, we explore how different design choices for text grammar and algorithmic choices for training can affect an RL policy's ability to generate molecules with desired properties. We arrive at a new RL-based molecular design algorithm (ChemRLformer) and perform a thorough analysis using 25 molecule design tasks, including computationally complex protein docking simulations. From this analysis, we discover unique insights in this problem space and show that ChemRLformer achieves state-of-the-art performance while being more straightforward than prior work by demystifying which design choices are actually helpful for text-based molecule design.

2023-10-04

ArXiv (prépublication)

doi.org

arxiv.org

Sensing Wellbeing in the Workplace, Why and For Whom? Envisioning Impacts with Organizational Stakeholders

Anna Kawakami

Shreya Chowdhary

Shamsi T. Iqbal

Q. Vera Liao

Alexandra Olteanu

Jina Suh

Koustuv Saha

With the heightened digitization of the workplace, alongside the rise of remote and hybrid work prompted by the pandemic, there is growing c… (voir plus)orporate interest in using passive sensing technologies for workplace wellbeing. Existing research on these technologies often focus on understanding or improving interactions between an individual user and the technology. Workplace settings can, however, introduce a range of complexities that challenge the potential impact and in-practice desirability of wellbeing sensing technologies. Today, there is an inadequate empirical understanding of how everyday workers---including those who are impacted by, and impact the deployment of workplace technologies--envision its broader socio-ecological impacts. In this study, we conduct storyboard-driven interviews with 33 participants across three stakeholder groups: organizational governors, AI builders, and worker data subjects. Overall, our findings surface how workers envisioned wellbeing sensing technologies may lead to cascading impacts on their broader organizational culture, interpersonal relationships with colleagues, and individual day-to-day lives. Participants anticipated harms arising from ambiguity and misalignment around scaled notions of "worker wellbeing,'' underlying technical limitations to workplace-situated sensing, and assumptions regarding how social structures and relationships may shape the impacts and use of these technologies. Based on our findings, we discuss implications for designing worker-centered data-driven wellbeing technologies.

2023-10-04

Proceedings of the ACM on Human-Computer Interaction (publié)

doi.org

arxiv.org

SUMMIT: Scaffolding Open Source Software Issue Discussion Through Summarization

Saskia Gilmer

Avinash Bhat

Shuvam Shah

Kevin Cherry

Jinghui Cheng

Jin Guo

2023-10-04

Proceedings of the ACM on Human-Computer Interaction (publié)

doi.org

arxiv.org

The neuroanatomical substrates of autism and ADHD and their link to putative genomic underpinnings

Lisa M. Berg

Caroline Gurr

Johanna Leyhausen

Hanna Seelemeyer

Anke Bletsch

Tim Schaefer

Charlotte M. Pretzsch

Beth Oakley

Eva Loth

Dorothea L. Floris

Jan K. Buitelaar

Christian Beckmann

Tobias Banaschewski

Tony Charman

Emily J. H. Jones

Julian Tillmann

Chris H. Chatham

Thomas Bourgeron

Jumana Sara Bonnie Simon Sarah Sven Carsten Michael Danie Ahmad Ambrosino Auyeung Baron-Cohen Baumeister Böl

Jumana Sara Bonnie Simon Sarah Sven Carsten Michael Daniel Claudia Yvette Bhismadev Ineke Daisy Flavio Guillaume Sarah Jessica Vincent Pilar David Lindsay Hannah Joerg Rosemary Mark H. Prantik Meng-Chuan Xavier Liogier Michael V. David J. René Andre Luke Maarten Andreas Carolin Nico Laurence Marianne Bob Gahan Antonio M. Barbara Amber Jessica Roberto Antonia San José Emily Will Roberto Heike Jack Steve C. R. Caroline Marcel P. Ahmad … (voir 58 de plus)

Jumana Ahmad

Sara Ambrosino

Bonnie Auyeung

Simon Baron-Cohen

Sarah Baumeister

Sven Bölte

Carsten Bours

Michael Brammer

Daniel Brandeis

Claudia Brogna

Yvette de Bruijn

Bhismadev Chakrabarti

Ineke Cornelissen

Daisy Crawley

Flavio Dell’Acqua

Guillaume Dumas

Sarah Durston

Jessica Faulkner

Vincent Frouin

Pilar Garcés

David Goyard

Lindsay Ham

Hannah Hayward

Joerg F. Hipp

Rosemary Holt

Mark Johnson

Prantik Kundu

Meng-Chuan Lai

Xavier Liogier D’ardhuy

Michael V. Lombardo

David J. Lythgoe

René Mandl

Andre Marquand

Luke Mason

Maarten Mennes

Andreas Meyer-Lindenberg

Carolin Moessnang

Nico Bast

Laurence O’Dwyer

Marianne Oldehinkel

Bob Oranje

Gahan Pandina

Antonio Persico

Barbara Ruggeri

Amber N. V. Ruigrok

Jessica Sabet

Roberto Sacco

Antonia San José Cáceres

Emily Simonoff

Will Spooren

Roberto Toro

Heike Tost

Jack Waldman

Steve C. R. Williams

Caroline Wooldridge

Marcel P. Zwiers

Declan Murphy

Christine Ecker

2023-10-04

Molecular Autism (publié)

doi.org

Data Cleaning and Machine Learning: A Systematic Literature Review

Pierre-Olivier Cot'e

Amin Nikanjam

Nafisa Ahmed

Dmytro Humeniuk

Foutse Khomh

Context: Machine Learning (ML) is integrated into a growing number of systems for various applications. Because the performance of an ML mod… (voir plus)el is highly dependent on the quality of the data it has been trained on, there is a growing interest in approaches to detect and repair data errors (i.e., data cleaning). Researchers are also exploring how ML can be used for data cleaning; hence creating a dual relationship between ML and data cleaning. To the best of our knowledge, there is no study that comprehensively reviews this relationship. Objective: This paper's objectives are twofold. First, it aims to summarize the latest approaches for data cleaning for ML and ML for data cleaning. Second, it provides future work recommendations. Method: We conduct a systematic literature review of the papers published between 2016 and 2022 inclusively. We identify different types of data cleaning activities with and for ML: feature cleaning, label cleaning, entity matching, outlier detection, imputation, and holistic data cleaning. Results: We summarize the content of 101 papers covering various data cleaning activities and provide 24 future work recommendations. Our review highlights many promising data cleaning techniques that can be further extended. Conclusion: We believe that our review of the literature will help the community develop better approaches to clean data.

2023-10-03

ArXiv (prépublication)

doi.org

arxiv.org

Differential Chromatin Architecture and Risk Variants in Deep Layer Excitatory Neurons and Grey Matter Microglia Contribute to Major Depressive Disorder

Anjali Chawla

Doruk Cakmakci

Wenmin Zhang

Malosree Maitra

Reza Rahimian

Haruka Mitsuhashi

MA Davoli

Jenny Yang

Gary Gang Chen

Ryan Denniston

Deborah Mash

Naguib Mechawar

Matthew Suderman

Yue Li

Corina Nagy

Gustavo Turecki

2023-10-03

bioRxiv (prépublication)

doi.org

Learning Reliable Logical Rules with SATNet

Zhaoyu Li

Jinpei Guo

Yuhe Jiang

Xujie Si

2023-10-03

ArXiv (prépublication)

doi.org

arxiv.org

Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts in Underspecified Visual Tasks

Luca Scimeca

Alexander Rubinstein

Armand Nicolicioiu

Damien Teney

Yoshua Bengio

Spurious correlations in the data, where multiple cues are predictive of the target labels, often lead to shortcut learning phenomena, where… (voir plus) a model may rely on erroneous, easy-to-learn, cues while ignoring reliable ones. In this work, we propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs). We discover that DPMs have the inherent capability to represent multiple visual cues independently, even when they are largely correlated in the training data. We leverage this characteristic to encourage model diversity and empirically show the efficacy of the approach with respect to several diversification objectives. We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.

2023-10-03

ArXiv (prépublication)

doi.org

arxiv.org

Aberrant functional brain network organization is associated with relapse during 1‐year follow‐up in alcohol‐dependent patients

Justin Böhmer

Pablo Reinhardt

Maria Garbusow

Michael Marxen

Michael N. Smolka

Ulrich S. Zimmermann

Andreas Heinz

Danilo Bzdok

Eva Friedel

Johann D. Kruschwitz

Henrik Walter

2023-10-02

Addiction Biology (publié)

doi.org

GraphText: Graph Reasoning in Text Space

Jianan Zhao

Le Zhuo

Yikang Shen

Meng Qu

Kai Liu

Michael Bronstein

Zhaocheng Zhu

Jian Tang

Large Language Models (LLMs) have gained the ability to assimilate human knowledge and facilitate natural language interactions with both hu… (voir plus)mans and other LLMs. However, despite their impressive achievements, LLMs have not made significant advancements in the realm of graph machine learning. This limitation arises because graphs encapsulate distinct relational data, making it challenging to transform them into natural language that LLMs understand. In this paper, we bridge this gap with a novel framework, GraphText, that translates graphs into natural language. GraphText derives a graph-syntax tree for each graph that encapsulates both the node attributes and inter-node relationships. Traversal of the tree yields a graph text sequence, which is then processed by an LLM to treat graph tasks as text generation tasks. Notably, GraphText offers multiple advantages. It introduces training-free graph reasoning: even without training on graph data, GraphText with ChatGPT can achieve on par with, or even surpassing, the performance of supervised-trained graph neural networks through in-context learning (ICL). Furthermore, GraphText paves the way for interactive graph reasoning, allowing both humans and LLMs to communicate with the model seamlessly using natural language. These capabilities underscore the vast, yet-to-be-explored potential of LLMs in the domain of graph machine learning.

2023-10-02

ArXiv (prépublication)

doi.org

arxiv.org

Imitation Learning from Observation through Optimal Transport

Wei-Di Chang

Scott Fujimoto

David Meger

Gregory Dudek

2023-10-02

ArXiv (prépublication)

doi.org