Publications

Improved baselines for vision-language pre-training
Enrico Fini
Pietro Astolfi
Adriana Romero-Soriano
Jakob Verbeek
Contrastive learning has emerged as an efficient framework to learn multimodal representations. CLIP, a seminal work in this area, achieved impressive results by training on paired image-text data using the contrastive loss. Recent work claims improvements over CLIP using additional non-contrastive losses inspired by self-supervised learning. However, it is sometimes hard to disentangle the contribution of these additional losses from other implementation details, e.g., data augmentation or regularization techniques, used to train the model. To shed light on this matter, in this paper, we first propose, implement and evaluate several baselines obtained by combining contrastive learning with recent advances in self-supervised learning. In particular, we use the loss functions that have proven successful for visual self-supervised learning to align image and text modalities. We find that these baselines outperform a basic implementation of CLIP. However, when a stronger training recipe is employed, the advantage disappears. Indeed, we find that a simple CLIP baseline can also be improved substantially, up to a 25% relative improvement on downstream zero-shot tasks, by using well-known training techniques that are popular in other subfields. Moreover, we discover that it is enough to apply image and text augmentations to make up for most of the improvement attained by prior works. With our improved training recipe for CLIP, we obtain state-of-the-art performance on four standard datasets, and consistently outperform prior work (up to +4% on the largest dataset), while being substantially simpler. The code is available at https://github.com/facebookresearch/clip-rocket
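For readers unfamiliar with the contrastive loss the abstract refers to, the following is a minimal sketch of a symmetric image-text contrastive objective in plain Python. It is not taken from the clip-rocket repository: the function names, the list-of-lists embedding format, and the temperature of 0.07 are assumptions made for illustration only.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    image_embs[k] and text_embs[k] are assumed to describe the same sample.
    The temperature value is an illustrative assumption.
    """
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    n = len(imgs)

    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]

    # Image-to-text direction: each row should put its mass on the diagonal.
    i2t = -sum(_log_softmax(logits[k])[k] for k in range(n)) / n

    # Text-to-image direction: same computation on the transposed matrix.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(_log_softmax(columns[k])[k] for k in range(n)) / n

    return 0.5 * (i2t + t2i)


aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

Correctly paired embeddings yield a loss near zero, while swapping the captions makes the loss large, which is the behaviour the objective is designed to reward.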
« L’étude de la synchronisation intercérébrale renouvelle le regard sur nos cerveaux » ("The study of inter-brain synchronization renews how we look at our brains")
François Lassagne
Posterior Sampling of the Initial Conditions of the Universe from Non-linear Large Scale Structures using Score-Based Generative Models
Matthew Ho
Shirley Ho
Benjamin Wandelt
Reconstructing the initial conditions of the universe is a key problem in cosmology. Methods based on simulating the forward evolution of the universe have provided a way to infer initial conditions consistent with present-day observations. However, due to the high complexity of the inference problem, these methods either fail to sample a distribution of possible initial density fields or require significant approximations in the simulation model to be tractable, potentially leading to biased results. In this work, we propose the use of score-based generative models to sample realizations of the early universe given present-day observations. We infer the initial density field of full high-resolution dark matter N-body simulations from the present-day density field and verify the quality of produced samples compared to the ground truth based on summary statistics. The proposed method is capable of providing plausible realizations of the early universe density field from the initial conditions posterior distribution marginalized over cosmological parameters and can sample orders of magnitude faster than current state-of-the-art methods.
Sensing Wellbeing in the Workplace, Why and For Whom? Envisioning Impacts with Organizational Stakeholders
Anna Kawakami
Shreya Chowdhary
Shamsi T. Iqbal
Q. Vera Liao
A.R. Olteanu
Jina Suh
Koustuv Saha
With the heightened digitization of the workplace, alongside the rise of remote and hybrid work prompted by the pandemic, there is growing corporate interest in using passive sensing technologies for workplace wellbeing. Existing research on these technologies often focuses on understanding or improving interactions between an individual user and the technology. Workplace settings can, however, introduce a range of complexities that challenge the potential impact and in-practice desirability of wellbeing sensing technologies. Today, there is an inadequate empirical understanding of how everyday workers, including those who are impacted by and those who impact the deployment of workplace technologies, envision their broader socio-ecological impacts. In this study, we conduct storyboard-driven interviews with 33 participants across three stakeholder groups: organizational governors, AI builders, and worker data subjects. Overall, our findings surface how workers envisioned that wellbeing sensing technologies may lead to cascading impacts on their broader organizational culture, interpersonal relationships with colleagues, and individual day-to-day lives. Participants anticipated harms arising from ambiguity and misalignment around scaled notions of "worker wellbeing," underlying technical limitations to workplace-situated sensing, and assumptions regarding how social structures and relationships may shape the impacts and use of these technologies. Based on our findings, we discuss implications for designing worker-centered data-driven wellbeing technologies.
SUMMIT: Scaffolding Open Source Software Issue Discussion Through Summarization
Saskia Gilmer
Avinash Bhat
Shuvam Shah
Kevin Cherry
Jinghui Cheng
Jin L.C. Guo
The neuroanatomical substrates of autism and ADHD and their link to putative genomic underpinnings
Lisa M. Berg
Caroline Gurr
Johanna Leyhausen
Hanna Seelemeyer
Anke Bletsch
Tim Schaefer
Charlotte M. Pretzsch
Beth Oakley
Eva Loth
Dorothea L. Floris
Jan K. Buitelaar
Christian Beckmann
Tobias Banaschewski
Tony Charman
Emily J. H. Jones
Julian Tillmann
Chris H. Chatham
Thomas Bourgeron
Jumana Ahmad
Sara Ambrosino
Bonnie Auyeung
Simon Baron-Cohen
Sarah Baumeister
Sven Bölte
Carsten Bours
Michael Brammer
Daniel Brandeis
Claudia Brogna
Yvette de Bruijn
Bhismadev Chakrabarti
Ineke Cornelissen
Daisy Crawley
Flavio Dell’Acqua
Sarah Durston
Jessica Faulkner
Vincent Frouin
Pilar Garcés
David Goyard
Lindsay Ham
Hannah Hayward
Joerg F. Hipp
Rosemary Holt
Mark Johnson
Prantik Kundu
Meng-Chuan Lai
Xavier Liogier D’ardhuy
Michael V. Lombardo
David J. Lythgoe
René Mandl
Andre Marquand
Luke Mason
Maarten Mennes
Andreas Meyer-Lindenberg
Carolin Moessnang
Nico Bast
Larry O’Dwyer
Marianne Oldehinkel
Bob Oranje
Gahan Pandina
Antonio Persico
Barbara Ruggeri
Amber N. V. Ruigrok
Jessica Sabet
Roberto Sacco
Antonia San José Cáceres
Emily Simonoff
Will Spooren
Roberto Toro
Heike Tost
Jack Waldman
Steve C. R. Williams
Caroline Wooldridge
Marcel P. Zwiers
Declan Murphy
Christine Ecker
Differential Chromatin Architecture and Risk Variants in Deep Layer Excitatory Neurons and Grey Matter Microglia Contribute to Major Depressive Disorder
Anjali Chawla
Wenmin Zhang
Malosree Maitra
Reza Rahimian
Haruka Mitsuhashi
MA Davoli
Jenny Yang
Gary Gang Chen
Ryan Denniston
Deborah Mash
Naguib Mechawar
Matthew Suderman
Yuemei Li
Corina Nagy
Gustavo Turecki
Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts in Underspecified Visual Tasks
Alexander Rubinstein
Armand Mihai Nicolicioiu
Damien Teney
Spurious correlations in the data, where multiple cues are predictive of the target labels, often lead to shortcut learning phenomena, where a model may rely on erroneous, easy-to-learn cues while ignoring reliable ones. In this work, we propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs). We discover that DPMs have the inherent capability to represent multiple visual cues independently, even when they are largely correlated in the training data. We leverage this characteristic to encourage model diversity and empirically show the efficacy of the approach with respect to several diversification objectives. We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
Aberrant functional brain network organization is associated with relapse during 1-year follow-up in alcohol-dependent patients
Justin Böhmer
Pablo Reinhardt
Maria Garbusow
Michael Marxen
Michael N. Smolka
U. Zimmermann
Andreas Heinz
Eva Friedel
Johann Kruschwitz
Henrik Walter
Alcohol dependence (AD) is a debilitating disease associated with high relapse rates even after long periods of abstinence. Thus, elucidating neurobiological substrates of relapse risk is fundamental for the development of novel targeted interventions that could promote long-lasting abstinence. In the present study, we analyzed resting-state functional magnetic resonance imaging (rsfMRI) data from a sample of recently detoxified AD patients (n = 93) who were followed up for 12 months after rsfMRI assessment. Specifically, we employed graph theoretic analyses to compare functional brain network topology and functional connectivity between future relapsers (REL, n = 59), future abstainers (ABS, n = 28) and age- and gender-matched controls (CON, n = 83). Our results suggest increased whole-brain network segregation, decreased global network integration and overall blunted connectivity strength in REL compared to CON. Conversely, we found evidence for a comparable network architecture in ABS relative to CON. At the nodal level, REL exhibited decreased integration and decoupling between multiple brain systems compared to CON, encompassing regions associated with higher-order executive functions, sensory and reward processing. Among AD patients, increased coupling between nodes implicated in reward valuation and salience attribution constitutes a particular risk factor for future relapse. Importantly, aberrant network organization in REL was consistently associated with shorter abstinence duration during follow-up, pointing to a putative neural signature of relapse risk in AD. Future research should further evaluate the potential diagnostic value of the identified changes in network topology and functional connectivity for relapse prediction at the individual subject level.
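The abstract does not say which graph measures were used. As a rough illustration of what "segregation" and "integration" can mean for a network, the sketch below computes two commonly used quantities on an unweighted, undirected graph: the mean clustering coefficient (a segregation measure) and global efficiency (an integration measure). The choice of these two measures, the function names, and the adjacency format (a dict mapping each node to a set of neighbours) are assumptions for illustration, not the authors' pipeline.

```python
from collections import deque
from itertools import combinations


def global_efficiency(adj):
    """Average inverse shortest-path length over all ordered node pairs.

    Unreachable pairs contribute 0. Higher values indicate a more
    integrated network.
    """
    nodes = list(adj)
    n = len(nodes)
    total = 0.0
    for src in nodes:
        # Breadth-first search gives shortest hop counts from src.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for node, d in dist.items() if node != src)
    return total / (n * (n - 1))


def mean_clustering(adj):
    """Mean local clustering coefficient.

    For each node, the fraction of neighbour pairs that are themselves
    connected. Higher values indicate a more segregated network.
    """
    coeffs = []
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(global_efficiency(triangle), mean_clustering(triangle))
print(global_efficiency(path), mean_clustering(path))
```

In a real rsfMRI analysis the graph would be derived from a thresholded or weighted connectivity matrix; that step is omitted here.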
AI and Catastrophic Risk
Bootstrapping Adaptive Human-Machine Interfaces with Offline Reinforcement Learning
Jensen Gao
Siddharth Reddy
Anca Dragan
Sergey Levine
Adaptive interfaces can help users perform sequential decision-making tasks like robotic teleoperation given noisy, high-dimensional command signals (e.g., from a brain-computer interface). Recent advances in human-in-the-loop machine learning enable such systems to improve by interacting with users, but tend to be limited by the amount of data that they can collect from individual users in practice. In this paper, we propose a reinforcement learning algorithm to address this by training an interface to map raw command signals to actions using a combination of offline pre-training and online fine-tuning. To address the challenges posed by noisy command signals and sparse rewards, we develop a novel method for representing and inferring the user's long-term intent for a given trajectory. We primarily evaluate our method's ability to assist users who can only communicate through noisy, high-dimensional input channels through a user study in which 12 participants performed a simulated navigation task by using their eye gaze to modulate a 128-dimensional command signal from their webcam. The results show that our method enables successful goal navigation more often than a baseline directional interface, by learning to denoise user command signals and provide shared autonomy assistance. We further evaluate on a simulated Sawyer pushing task with eye gaze control, and the Lunar Lander game with simulated user commands, and find that our method improves over baseline interfaces in these domains as well. Extensive ablation experiments with simulated user commands empirically motivate each component of our method.
Comparison of Radiologists and Deep Learning for US Grading of Hepatic Steatosis
Sara‐Ivana Calce
Pamela Boustros
Cassandra Larocque-Rigney
Laurent Patry-Beaudoin
Yi Hui Luo
Emre Aslan
John Marinos
Talal Alamri
Kim‐Nhien Vu
Jessica Murphy-Lavallée
Jean-Sébastien Billiard
Emmanuel Montagnon
Hongliang Li
Samuel Kadoury
Bich Nguyen
Michael Chassé
Guy Cloutier
An Tang
Background Screening for nonalcoholic fatty liver disease (NAFLD) is suboptimal due to the subjective interpretation of US images. Purpose To evaluate the agreement and diagnostic performance of radiologists and a deep learning model in grading hepatic steatosis in NAFLD at US, with biopsy as the reference standard. Materials and Methods This retrospective study included patients with NAFLD and control patients without hepatic steatosis who underwent abdominal US and contemporaneous liver biopsy from September 2010 to October 2019. Six readers visually graded steatosis on US images twice, 2 weeks apart. Reader agreement was assessed with use of κ statistics. Three deep learning techniques applied to B-mode US images were used to classify dichotomized steatosis grades. Classification performance of human radiologists and the deep learning model for dichotomized steatosis grades (S0, S1, S2, and S3) was assessed with area under the receiver operating characteristic curve (AUC) on a separate test set. Results The study included 199 patients (mean age, 53 years ± 13 [SD]; 101 men). On the test set (n = 52), radiologists had fair interreader agreement (0.34 [95% CI: 0.31, 0.37]) for classifying steatosis grades S0 versus S1 or higher, while AUCs were between 0.49 and 0.84 for radiologists and 0.85 (95% CI: 0.83, 0.87) for the deep learning model. For S0 or S1 versus S2 or S3, radiologists had fair interreader agreement (0.30 [95% CI: 0.27, 0.33]), while AUCs were between 0.57 and 0.76 for radiologists and 0.73 (95% CI: 0.71, 0.75) for the deep learning model. For S2 or lower versus S3, radiologists had fair interreader agreement (0.37 [95% CI: 0.33, 0.40]), while AUCs were between 0.52 and 0.81 for radiologists and 0.67 (95% CI: 0.64, 0.69) for the deep learning model. Conclusion Deep learning approaches applied to B-mode US images provided comparable performance with human readers for detection and grading of hepatic steatosis.
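The two evaluation quantities named in the abstract, κ agreement and AUC, can be illustrated with a small plain-Python sketch. This is not the study's analysis code: the abstract does not state which κ variant was used for six readers, so the two-rater Cohen's κ below is a simplifying assumption, and the function names and toy inputs are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC for binary labels (0/1) via the pairwise-ranking definition.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters grading the same cases.

    Two-rater simplification; a multi-reader study would need a
    multi-rater statistic instead.
    """
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        # Both raters used a single identical category for every case.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example: dichotomized grades (0 = S0, 1 = S1 or higher) and model scores.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
print(cohen_kappa([0, 1, 1, 0], [0, 1, 1, 0]))
```

An AUC of 0.5 corresponds to chance-level ranking and a κ of 0 to chance-level agreement, which is the scale on which the reported values should be read.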
Published under a CC BY 4.0 license. Supplemental material is available for this article. See also the editorial by Tuthill in this issue.