Publications

Refining GPT-3 Embeddings with a Siamese Structure for Technical Post Duplicate Detection
Xingfang Wu
Heng Li
Nobukazu Yoshioka
Hironori Washizaki
Specific inhibition and disinhibition in the higher-order structure of a cortical connectome
Michael W. Reimann
Daniela Egas Santander
András Ecker
Neuronal network activity is thought to be structured around the activation of assemblies, or low-dimensional manifolds describing states of activity. Both views describe neurons acting not independently, but in concert, likely facilitated by strong recurrent excitation between them. The role of inhibition in these frameworks, if considered at all, is often reduced to blanket inhibition with no specificity with respect to which excitatory neurons are targeted. We analyzed the structure of excitation and inhibition in the MICrONS 1 mm³ dataset, an electron microscopic reconstruction of a piece of cortical tissue. We found that excitation was structured around a feed-forward flow in non-random motifs of seven or more neurons. This revealed a structure of information flow from a small number of sources to a larger number of potential targets that became visible only when larger motifs were considered instead of individual pairs. Inhibitory neurons targeted and were targeted by neurons in specific sequential positions of these motifs. Additionally, disynaptic inhibition was strongest between target motifs excited by the same group of source neurons, implying competition between them. The structure of this inhibition was also highly specific and symmetrical, contradicting the idea of non-specific blanket inhibition. None of these trends are detectable in pairwise connectivity alone, demonstrating that inhibition is specifically structured by these large motifs. Further, we found that these motifs represent higher-order connectivity patterns which are present, but to a lesser extent, in a recently released, detailed computational model, and not at all in a distance-dependent control. These findings have important implications for how synaptic plasticity reorganizes neocortical connectivity to implement learning and for the specific role of inhibition in this process.
When Nash Meets Stackelberg
Gabriele Dragotto
Felipe Feijoo
Sriram Sankaranarayanan
Capture the Flag: Uncovering Data Insights with Large Language Models
Issam Hadj Laradji
Perouz Taslakian
Sai Rajeswar
Valentina Zantedeschi
Alexandre Lacoste
David Vazquez
The extraction of a small number of relevant insights from vast amounts of data is a crucial component of data-driven decision-making. However, accomplishing this task requires considerable technical skills, domain expertise, and human labor. This study explores the potential of using Large Language Models (LLMs) to automate the discovery of insights in data, leveraging recent advances in reasoning and code generation techniques. We propose a new evaluation methodology based on a "capture the flag" principle, measuring the ability of such models to recognize meaningful and pertinent information (flags) in a dataset. We further propose two proof-of-concept agents, with different inner workings, and compare their ability to capture such flags in a real-world sales dataset. While the work reported here is preliminary, our results are sufficiently interesting to mandate future exploration by the community.
CODA: an open-source platform for federated analysis and machine learning on distributed healthcare data
Louis Mullie
Jonathan Afilalo
Patrick Archambault
Rima Bouchakri
Kip Brown
Yiorgos Alexandros Cavayas
Alexis F Turgeon
Denis Martineau
François Lamontagne
Martine Lebrasseur
Renald Lemieux
Jeffrey Li
Michaël Sauthier
Pascal St-Onge
An Tang
William Witteman
Michaël Chassé
Extended Lyman-alpha emission towards the SPT2349-56 protocluster at $z=4.3$
Yordanka Apostolovski
Manuel Aravena
Timo Anguita
Matthieu Béthermin
James R. Burgoyne
Scott Chapman
C. Breuck
Anthony R Gonzalez
Max Gronke
Lucia Guaita
Ryley Hill
Sreevani Jarugula
E. Johnston
M. Malkan
Desika Narayanan
Cassie Reuter
Manuel Solimano
Justin Spilker
Nikolaus Sulzenauer
Joaquin Vieira
Joaquin Daniel Vieira
David Vizgan
Axel Wei
Axel Weiß
Deep spectroscopic surveys with the Atacama Large Millimeter/submillimeter Array (ALMA) have revealed that some of the brightest infrared sources in the sky correspond to concentrations of submillimeter galaxies (SMGs) at high redshift. Among these, the SPT2349-56 protocluster system is one of the most extreme examples given its high source density and integrated star formation rate. We conducted a deep Lyman-alpha line emission survey around SPT2349-56 using the Multi-Unit Spectroscopic Explorer (MUSE) at the Very Large Telescope (VLT) in order to characterize this uniquely dense environment. Taking advantage of the deep three-dimensional nature of this survey, we performed a sensitive search for Lyman-alpha emitters (LAEs) toward the core and northern extension of the protocluster, which correspond to the brightest infrared regions in this field. Using a smoothed narrowband image extracted from the MUSE datacube around the protocluster redshift, we searched for possible extended structures. We identify only three LAEs at
Towards Machines that Trust: AI Agents Learn to Trust in the Trust Game
Ardavan S. Nobandegani
Thomas Shultz
Widely considered a cornerstone of human morality, trust shapes many aspects of human social interactions. In this work, we present a theoretical analysis of the
Gemini: A Family of Highly Capable Multimodal Models
Gemini Team Google
Rohan Anil
Sebastian Borgeaud
Yonghui Wu
Jean-Baptiste Alayrac
Jiahui Yu
Radu Soricut
J. Schalkwyk
Andrew M. Dai
Anja Hauth
Katie Millican
David Silver
Slav Petrov
Melvin Johnson
Ioannis Antonoglou
Julian Schrittwieser
Amelia Glaese
Jilin Chen
Emily Pitler
Timothy P. Lillicrap
Angeliki Lazaridou
Orhan Firat
James L. Molloy
Michael Acheson Isard
Paul R. Barham
Tom Hennigan
Benjamin Lee
Malcolm Reynolds
Yuanzhong Xu
Ryan Doherty
Eli Collins
Clemens Meyer
Eliza Rutherford
Erica Moreira
Kareem W. Ayoub
Megha Goel
George Tucker
Enrique Piqueras
M. Krikun
Iain Barr
Nikolay Savinov
Ivo Danihelka
Becca Roelofs
Anais White
Anders Johan Andreassen
Tamara von Glehn
Lakshman N. Yagati
Mehran Kazemi
Lucas Gonzalez
Misha Khalman
Jakub Sygnowski
Alexandre Fréchette
Charlotte Smith
Laura Culp
Lev Proleev
Yi Luan
Xi Chen
James Lottes
Nathan Schucher
Federico Lebron
Alban Rrustemi
Natalie Clay
Phil Crone
Tomas Kocisky
Jeffrey Zhao
Bartek Perz
Dian Yu
Heidi Howard
Adam E. Bloniarz
Jack W. Rae
Han Lu
Laurent Sifre
Marcello Maggioni
Fred Alcober
Dan Garrette
Megan Barnes
Shantanu Thakoor
Jacob Austin
Gabriel Barth-Maron
William Wong
Rishabh Joshi
Rahma Chaabouni
Deeni Fatiha
Arun Ahuja
Ruibo Liu
Yunxuan Li
Sarah Cogan
Jeremy Chen
Chao Jia
Chenjie Gu
Qiao Zhang
Jordan Grimstad
Ale Jakse Hartman
Martin J. Chadwick
Gaurav Singh Tomar
Xavier Garcia
Evan Senter
Emanuel Taropa
Thanumalayan Sankaranarayana Pillai
Jacob Devlin
Michael Laskin
Diego de Las Casas
Dasha Valter
Connie Tao
Lorenzo Blanco
Adrià Puigdomènech Badia
David Reitter
Mianna Chen
Jenny Brennan
Clara E. Rivera
Sergey Brin
Shariq N Iqbal
Gabriela Surita
Jane Labanowski
Abhishek Rao
Stephanie Winkler
Emilio Parisotto
Yiming Gu
Kate Olszewska
Yujing Zhang
Ravichandra Addanki
Antoine Miech
Annie Louis
Laurent El Shafey
Denis Teplyashin
Geoff Brown
Elliot Catt
Nithya Attaluri
Jan Balaguer
Jackie Xiang
Pidong Wang
Zoe C. Ashwood
Anton Briukhov
Albert Webson
Sanjay Ganapathy
Smit Sanghavi
Ajay Kannan
Ming-Wei Chang
Axel Stjerngren
Josip Djolonga
Yuting Sun
Ankur Bapna
Matthew Aitchison
Pedram Pejman
Henryk Michalewski
Tianhe Yu
Cindy Wang
J Christopher Love
Junwhan Ahn
Dawn Bloxwich
Kehang Han
Peter Conway Humphreys
Thibault Sellam
James Bradbury
Varun Godbole
Sina Samangooei
Bogdan Damoc
Alex Kaskasoli
Sébastien M. R. Arnold
Vijay Vasudevan
Shubham Agrawal
Jason Riesa
Dmitry Lepikhin
Richard Tanburn
Srivatsan Srinivasan
Hyeontaek Lim
Sarah Hodkinson
Pranav Shyam
Johan Ferret
Steven Hand
Ankush Garg
T. Paine
Jian Li
Yujia Li
Minh Giang
Alexander Neitz
Zaheer Abbas
Sarah York
Machel Reid
Elizabeth Cole
Aakanksha Chowdhery
Dipanjan Das
Dominika Rogozińska
Vitaly Nikolaev
Pablo G. Sprechmann
Zachary Nado
Lukáš Žilka
Flavien Prost
Luheng He
Marianne Monteiro
Gaurav Mishra
Christopher A. Welty
Joshua Newlan
Dawei Jia
Miltiadis Allamanis
Clara Huiyi Hu
Raoul de Liedekerke
Justin Gilmer
Carl Saroufim
Shruti Rijhwani
Shaobo Hou
Disha Shrivastava
Anirudh Baddepudi
Alex Goldin
Adnan Ozturel
Albin Cassirer
Yunhan Xu
Daniel Sohn
Devendra Singh Sachan
Reinald Kim Amplayo
Craig Swanson
Dessie Petrova
Shashi Narayan
Arthur Guez
Siddhartha Brahma
Jessica Landon
Miteyan Patel
Ruizhe Zhao
Kevin Villela
Luyu Wang
Wenhao Jia
Matthew Rahtz
Mai Giménez
Legg Yeung
Hanzhao Lin
James Keeling
Petko Georgiev
Diana Mincu
Boxi Wu
Salem Haykal
Rachel Saputro
Kiran N. Vodrahalli
James Qin
Zeynep Cankara
Abhanshu Sharma
Nicholas Fernando
Will Hawkins
Behnam Neyshabur
Solomon Kim
Adrian Hutter
Priyanka Agrawal
Alex Castro-Ros
George van den Driessche
Tao Wang
Fan Yang
Shuo-yiin Chang
Paul Komarek
Ross McIlroy
Mario Lučić
Guodong Zhang
Wael Farhan
Michael Sharman
Paul Natsev
Paul Michel
Yong Cheng
Yamini Bansal
Siyuan Qiao
Kris Cao
Siamak Shakeri
Christina Butterfield
Justin Chung
Paul Kishan Rubenstein
Shivani Agrawal
Arthur Mensch
Kedar Soparkar
Karel Lenc
Timothy Chung
Aedan Pope
Lorenzo Maggiore
Jackie Kay
Priya Jhakra
Shibo Wang
Joshua Maynez
Mary Phuong
Taylor Tobin
Andrea Tacchetti
Maja Trebacz
Kevin Robinson
Yash Katariya
Sebastian Riedel
Paige Bailey
Kefan Xiao
Nimesh Ghelani
Lora Aroyo
Ambrose Slone
Neil Houlsby
Xuehan Xiong
Zhen Yang
Elena Gribovskaya
Jonas Adler
Mateo Wirth
Lisa Lee
Music Li
Thais Kagohara
Jay Pavagadhi
Sophie Bridgers
Anna Bortsova
Sanjay Ghemawat
Zafarali Ahmed
Tianqi Liu
Richard Powell
Vijay Bolina
Mariko Iinuma
Polina Zablotskaia
James Besley
Da-Woon Chung
Timothy Dozat
Ramona Comanescu
Xiance Si
Jeremy Greer
Guolong Su
M. Polacek
Raphael Lopez Kaufman
Simon Tokumine
Hexiang Hu
Elena Buchatskaya
Yingjie Miao
Mohamed Elhawaty
Aditya Siddhant
Nenad Tomašev
Jinwei Xing
Christina Greer
Helen Miller
Shereen Ashraf
Aurko Roy
Zizhao Zhang
Ada Ma
Angelos Filos
Milos Besta
Rory Blevins
Ted Klimenko
Chih-Kuan Yeh
Soravit Changpinyo
Jiaqi Mu
Oscar Chang
Mantas Pajarskas
Carrie Muir
Vered Cohen
Charline Le Lan
Krishna S Haridasan
Amit Marathe
Steven Hansen
Sholto Douglas
Rajkumar Samuel
Mingqiu Wang
Sophia Austin
Chang Lan
Jiepu Jiang
Justin Chiu
Jaime Alonso Lorenzo
Lars Lowe Sjosund
Sébastien Cevey
Zach Gleicher
Thi Avrahami
Anudhyan Boral
Hansa Srinivasan
Vittorio Selo
Rhys May
Konstantinos Aisopos
Léonard Hussenot
Livio Baldini Soares
Kate Baumli
Michael B. Chang
Adria Recasens
Benjamin Caine
Alexander Pritzel
Filip Pavetic
Fabio Pardo
Anita Gergely
Justin Frye
Vinay Venkatesh Ramasesh
Dan Horgan
Kartikeya Badola
Nora Kassner
Subhrajit Roy
Ethan Dyer
Víctor Campos
Alex Tomala
Yunhao Tang
Dalia El Badawy
Elspeth White
Basil Mustafa
Oran Lang
Abhishek Jindal
Sharad Mandyam Vikram
Zhitao Gong
Sergi Caelles
Ross Hemsley
Gregory Thornton
Fangxiaoyu Feng
Wojciech Stokowiec
Ce Zheng
Phoebe Thacker
Çağlar Ünlü
Zhishuai Zhang
Mohammad Saleh
James Svensson
Maxwell L. Bileschi
Piyush Pramod Patil
Ankesh Anand
Roman Ring
Katerina Tsihlas
Arpi Vezer
Marco Selvi
Toby Shevlane
Mikel Rodriguez
Tom Kwiatkowski
Samira Daruki
Keran Rong
Allan Dafoe
Nicholas Fitzgerald
Keren Gu-Lemberg
Mina Khan
Lisa Anne Hendricks
Marie Pellat
Vladimir Feinberg
James Cobon-Kerr
Tara N. Sainath
Maribeth Rauh
Sayed Hadi Hashemi
Richard Ives
Yana Hasson
YaGuang Li
Eric Noland
Yuan Cao
Nathan Byrd
Le Hou
Qingze Wang
Thibault Sottiaux
Michela Paganini
Jean-Baptiste Lespiau
Alexandre Moufarek
Samer Hassan
Kaushik Shivakumar
Joost Van Amersfoort
Amol Mandhane
Pratik M. Joshi
Anirudh Goyal
Matthew Tung
Andy Brock
Hannah Rachel Sheahan
Vedant Misra
Cheng Li
Nemanja Rakićević
Mostafa Dehghani
Fangyu Liu
Sid Mittal
Junhyuk Oh
Seb Noury
Eren Sezener
Fantine Huot
Matthew Lamm
Nicola De Cao
Charlie Chen
Gamaleldin Elsayed
Ed Huai-hsin Chi
Mahdis Mahdieh
Ian F. Tenney
Nan Hua
Ivan Petrychenko
Patrick Kane
Dylan Scandinaro
Rishub Jain
Jonathan Uesato
Romina Datta
Adam Sadovsky
Oskar Bunyan
Dominik Rabiej
Shimu Wu
John Zhang
Gautam Vasudevan
Edouard Leurent
Mahmoud Alnahlawi
Ionut-Razvan Georgescu
Nan Wei
Ivy Zheng
Betty Chan
Pam G Rabinovitch
Piotr Stańczyk
Ye Zhang
David Steiner
Subhajit Naskar
Michael Azzam
Matthew Johnson
Adam Paszke
Chung-Cheng Chiu
Jaume Sanchez Elias
Afroz Mohiuddin
Faizan Muhammad
Jin Miao
Andrew Lee
Nino Vieillard
Sahitya Potluri
Jane Park
Elnaz Davoodi
Jiageng Zhang
Jeff Stanway
Drew Garmon
Abhijit Karmarkar
Zhe Dong
Studying the Practices of Testing Machine Learning Software in the Wild
Moses Openja
Armstrong Foundjem
Zhen Ming Jiang
Mouna Abidi
Ahmed E. Hassan
Background: We are witnessing an increasing adoption of machine learning (ML), especially deep learning (DL) algorithms, in many software systems, including safety-critical systems such as health care systems or autonomous driving vehicles. Ensuring the software quality of these systems is still an open challenge for the research community, mainly due to the inductive nature of ML software systems. Traditionally, software systems were constructed deductively, by writing down the rules that govern the behavior of the system as program code. However, for ML software, these rules are inferred from training data. A few recent research advances in the quality assurance of ML systems have adapted different concepts from traditional software testing, such as mutation testing, to help improve the reliability of ML software systems. However, it is unclear whether any of these proposed testing techniques from research are adopted in practice. There is little empirical evidence about the testing strategies of ML engineers. Aims: To fill this gap, we perform the first fine-grained empirical study on ML testing practices in the wild, to identify the ML properties being tested, the testing strategies followed, and their implementation throughout the ML workflow. Method: First, we systematically summarized the different testing strategies (e.g., Oracle Approximation), the tested ML properties (e.g., Correctness, Bias, and Fairness), and the testing methods (e.g., Unit test) from the literature. Then, we conducted a study to understand the practices of testing ML software. Results: In our findings: 1) we identified four major categories of testing strategy, including Grey-box, White-box, Black-box, and Heuristic-based techniques, that are used by ML engineers to find software bugs. 2) We identified 16 ML properties that are tested in the ML workflow.
Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing
Shengchao Liu
Weili Nie
Chengpeng Wang
Jiarui Lu
Zhuoran Qiao
Ling Liu
Chaowei Xiao
Animashree Anandkumar
There is increasing adoption of artificial intelligence in drug discovery. However, existing studies mainly apply machine learning to the chemical structures of molecules and ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions and predict complex biological activities. Here we present a multi-modal molecule structure-text model, MoleculeSTM, by jointly learning molecules' chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct a large multi-modal dataset, namely, PubChemSTM, with over 280,000 chemical structure-text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions, including structure-text retrieval and molecule editing. MoleculeSTM has two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM obtains state-of-the-art generalization ability to novel biochemical concepts across various benchmarks.
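As a rough illustration of what a contrastive learning strategy over paired structure and text embeddings can look like, the sketch below computes a symmetric InfoNCE-style loss in plain Python. It is not taken from the paper: the function name `contrastive_loss`, the temperature value, and the toy two-dimensional embeddings are all assumptions made for illustration, and the actual MoleculeSTM objective and encoders may differ.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def contrastive_loss(struct_emb, text_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss (hypothetical helper, not the paper's code).

    struct_emb[i] and text_emb[i] are assumed to describe the same molecule;
    every other pairing in the batch is treated as a negative.
    """
    s = [_normalize(v) for v in struct_emb]
    t = [_normalize(v) for v in text_emb]
    # Similarity of every structure to every text, scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(si, tj)) / temperature for tj in t]
              for si in s]
    logits_transposed = [list(col) for col in zip(*logits)]
    # Average the structure-to-text and text-to-structure directions.
    return 0.5 * (_diagonal_cross_entropy(logits)
                  + _diagonal_cross_entropy(logits_transposed))


# Toy embeddings (assumed): correctly paired vs. deliberately mismatched.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setup the loss is close to zero when each structure embedding matches its own text embedding and much larger when the pairs are swapped, which is the behavior a contrastive objective is meant to reward.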
Addressing Sample Inefficiency in Multi-View Representation Learning
Kumar Krishna Agrawal
Arna Ghosh
Pseudo-random Instance Generators in C++ for Deterministic and Stochastic Multi-commodity Network Design Problems
Eric Larsen
Serge Bisaillon
Jean-François Cordeau
Network design problems constitute an important family of combinatorial optimization problems for which numerous exact and heuristic algorithms have been developed over the last few decades. Two central problems in this family are the multi-commodity, capacitated, fixed charge network design problem (MCFNDP) and its stochastic counterpart, the two-stage MCFNDP with recourse. These are standard problems that often serve as work benches for devising and testing models and algorithms in stylized but close-to-realistic settings. The purpose of this paper is to introduce two flexible, high-speed generators capable of simulating a wide range of settings for both the deterministic and stochastic MCFNDPs. We hope that, by facilitating systematic experimentation with new and larger sets of instances, these generators will lead to a more thorough assessment of the performance achieved by exact and heuristic solution methods in both deterministic and stochastic settings. We also hope that making these generators available will promote the reproducibility and comparability of published research.
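To make the reproducibility point concrete, here is a minimal sketch of a seeded pseudo-random instance generator. It is written in Python rather than C++ and is not the authors' generator: the function `generate_instance`, its parameters, and the value ranges for costs, capacities, and demands are all hypothetical. It only illustrates how a fixed seed yields the same instance on every run.

```python
import random


def generate_instance(seed, n_nodes=5, n_arcs=8, n_commodities=2):
    """Build a toy multi-commodity network design instance (hypothetical layout).

    A private random.Random(seed) is used so that the same seed always
    reproduces the same instance, independent of any global random state.
    """
    rng = random.Random(seed)

    # All ordered node pairs without self-loops are candidate arcs.
    candidates = [(i, j) for i in range(n_nodes)
                  for j in range(n_nodes) if i != j]

    arcs = []
    for tail, head in rng.sample(candidates, n_arcs):
        arcs.append({
            "from": tail,
            "to": head,
            "fixed_cost": rng.randint(10, 100),  # assumed range
            "unit_cost": rng.randint(1, 10),     # assumed range
            "capacity": rng.randint(5, 50),      # assumed range
        })

    commodities = []
    for _ in range(n_commodities):
        # Sampling two distinct nodes guarantees origin != destination.
        origin, destination = rng.sample(range(n_nodes), 2)
        commodities.append({
            "origin": origin,
            "destination": destination,
            "demand": rng.randint(1, 20),        # assumed range
        })

    return {"arcs": arcs, "commodities": commodities}


first = generate_instance(seed=42)
second = generate_instance(seed=42)
```

Because each call owns its generator, `first` and `second` are identical, so an experiment can be rerun or shared by publishing only the seed and the generator parameters.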