
SVRG meets AdaGrad: painless variance reduction
Benjamin Dubois-Taine
Sharan Vaswani
Reza Babanezhad Harikandeh
Mark Schmidt
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Teven Le Scao
Angela Fan
Christopher Akiki
Ellie Pavlick
Suzana Ilić
Daniel Hesslow
Roman Castagné
Alexandra Luccioni
François Yvon
Matthias Gallé
J. Tow
Alexander M. Rush
Stella Biderman
Albert Webson
Pawan Sasanka Ammanamanchi
Thomas Wang
Benoît Sagot
Niklas Muennighoff
Albert Villanova del Moral
Olatunji Ruwase
Rachel Bawden
Stas Bekman
Angelina McMillan-Major
Iz Beltagy
Huu Nguyen
Lucile Saulnier
Samson Tan
Pedro Ortiz Suarez
Victor Sanh
Hugo Laurençon
Yacine Jernite
Julien Launay
Margaret Mitchell
Colin Raffel
Aaron Gokaslan
Adi Simhi
Aitor Soroa
Alham Fikri Aji
Amit Alfassy
Anna Rogers
Ariel Kreisberg Nitzav
Canwen Xu
Chenghao Mou
Chris Emezue
Christopher Klamm
Colin D. Leong
Daniel Van Strien
Dragomir R. Radev
Eduardo González Ponferrada
Efrat Levkovizh
Ethan Kim
Eyal Bar Natan
Francesco De Toni
Gérard Dupont
Germán Kruszewski
Giada Pistilli
Hady Elsahar
Hamza Benyamina
Hieu Tran
Ian W. Yu
Idris Abdulmumin
Isaac L. Johnson
Itziar Gonzalez-Dios
Javier de la Rosa
Jenny Chim
Jesse Dodge
Jian Zhu
Jonathan Chang
Jörg Frohberg
Josephine L. Tobing
J. Bhattacharjee
Khalid Almubarak
Kimbo Chen
Kyle Lo
Leandro Von Werra
Leon Weber
Long Phan
Loubna Ben Allal
Ludovic Tanguy
Manan Dey
Manuel Romero Muñoz
Maraim Masoud
María Grandury
Mario Šaško
Max Huang
Maximin Coavoux
Mayank Singh
Mike Tian-Jian Jiang
Vu Minh Chien
Mohammad Ali Jauhar
Mustafa Ghaleb
Nishant Subramani
Nora Kassner
Nurulaqilla Khamis
Olivier Nguyen
Omar Espejel
Ona de Gibert
Paulo Villegas
Peter Henderson
Pierre Colombo
Priscilla A. Amuok
Quentin Lhoest
Rheza Harliman
Rishi Bommasani
Roberto Luis López
Rui Ribeiro
Salomey Osei
Sampo Pyysalo
Sebastian Nagel
Shamik Bose
Shamsuddeen Hassan Muhammad
Shanya Sharma Sharma
Shayne Longpre
Somaieh Nikpoor
S. Silberberg
Suhas Pai
Sydney Zink
Tiago Timponi Torrent
Timo Schick
Tristan Thrush
Valentin Danchev
Vassilina Nikoulina
Veronika Laippala
Violette Lepercq
Vrinda Prabhu
Zaid Alyafeai
Zeerak Talat
Arun Raja
Benjamin Heinzerling
Chenglei Si
Elizabeth E Salesky
Sabrina J. Mielke
Wilson Y. Lee
Abheesht Sharma
Andrea Santilli
Antoine Chaffin
Arnaud Stiegler
Debajyoti Datta
Eliza Szczechla
Gunjan Chhablani
Han Wang
Harshit Pandey
Hendrik Strobelt
Jason Alan Fries
Jos Rozen
Leo Gao
Lintang A. Sutawika
M. Saiful Bari
Maged S. Al-shaibani
Matteo Manica
Nihal V. Nayak
Ryan Teehan
Samuel Albanie
Sheng Shen
Srulik Ben-David
Stephen H. Bach
Taewoon Kim
T. Bers
Thibault Févry
Trishala Neeraj
Urmish Thakker
Vikas Raunak
Xiang Tang
Zheng-Xin Yong
Zhiqing Sun
Shaked Brody
Y. Uri
Hadar Tojarieh
Adam Roberts
Hyung Won Chung
Jaesung Tae
Jason Phang
Ofir Press
Conglong Li
D. Narayanan
Hatim Bourfoune
Jared Casper
Jeff Rasley
Max Ryabinin
Mayank Mishra
Minjia Zhang
Mohammad Shoeybi
Myriam Peyrounette
Nicolas Patry
Nouamane Tazi
Omar Sanseviero
Patrick von Platen
Pierre Cornette
Pierre François Lavallée
Rémi Lacroix
Samyam Rajbhandari
Sanchit Gandhi
Shaden Smith
Stéphane Requena
Suraj Patil
Tim Dettmers
Ahmed Baruwa
Amanpreet Singh
Anastasia Cheveleva
Anne-Laure Ligozat
Arjun Subramonian
Aurélie Névéol
Charles Lovering
Dan Garrette
D. Tunuguntla
Ehud Reiter
Ekaterina Taktasheva
E. Voloshina
Eli Bogdanov
Genta Indra Winata
Hailey Schoelkopf
Jan-Christoph Kalo
Jekaterina Novikova
Jessica Zosa Forde
Xiangru Tang
Jungo Kasai
Ken Kawamura
Liam Hazan
Marine Carpuat
Miruna-Adriana Clinciu
Najoung Kim
Newton Cheng
O. Serikov
Omer Antverg
Oskar van der Wal
Rui Zhang
Ruochen Zhang
Sebastian Gehrmann
Shachar Mirkin
S. Pais
Tatiana Shavrina
Thomas Scialom
Tian Yun
Tomasz Limisiewicz
Verena Teresa Rieser
Vitaly Protasov
V. Mikhailov
Yada Pruksachatkun
Yonatan Belinkov
Zachary Bamberger
Zdeněk Kasner
A. Pestana
Amir Feizpour
Ammar Khan
Amy Faranak
A. Santos
Anthony Hevia
Antigona Unldreaj
Arash Aghagol
Arezoo Abdollahi
Aycha Tammour
Azadeh Hajihosseini
Bahareh Behroozi
Benjamin A. Ajibade
B. Saxena
Carlos Muñoz Ferrandis
Danish Contractor
D. Lansky
Davis David
Douwe Kiela
Duong Anh Nguyen
Edward Chwee Kheng Tan
Emi Baylor
Ezinwanne Ozoani
F. Mirza
Frankline Ononiwu
Habib Rezanejad
H.A. Jones
Indrani Bhattacharya
Irene Solaiman
Irina Sedenko
Isar Nejadgholi
J. Passmore
Joshua Seltzer
Julio Bonis Sanz
Karen Fort
Livia Macedo Dutra
Mairon Samagaio
Maraim Elbadri
Margot Mieskes
Marissa Kumar Gerchick
Martha Akinlolu
Michael McKenna
Mike Qiu
M. Ghauri
Mykola Burynok
Nafis Abrar
Nazneen Fatema Rajani
Nour Elkott
N. Fahmy
Olanrewaju Samuel
Ran An
R. Kromann
Ryan Hao
Samira Hassan Alizadeh
Sarmad Shubber
Silas L. Wang
Sourav Roy
Sylvain Viguier
Thanh-Cong Le
Tobi Oyebade
T. Le
Yoyo Yang
Zach Nguyen
Abhinav R. Kashyap
Alfredo Palasciano
Alison Callahan
Anima Shukla
Antonio Miranda-Escalada
Ayush Kumar Singh
Benjamin Beilharz
Bo Wang
Caio Matheus Fonseca De Brito
Chenxi Zhou
Chirag Jain
Chuxin Xu
Clémentine Fourrier
Daniel León Periñán
Daniel Molano
Dian Yu
Enrique Manjavacas
Fabio Barth
Florian Fuhrimann
Gabriel Altay
Giyaseddin Bayrak
Gully Burns
Helena U. Vrabec
I. Bello
Isha Dash
J. Kang
John Michael Giorgi
Jonas Golde
J. Posada
Karthi Sivaraman
Lokesh Bulchandani
Lu Liu
Luisa Shinzato
Madeleine Hahn de Bykhovetz
Maiko Takeuchi
Marc Pamies
M. A. Castillo
Marianna Nezhurina
Mario Sanger
Matthias Samwald
Michael Joseph Cullan
Michael Weinberg
Michiel De Wolf
Mina Mihaljcic
Minna Liu
Moritz Freidank
Myungsun Kang
Natasha Seelam
Nathan Dahlberg
Nicholas Michio Broad
Nikolaus Muellner
Pascale Fung
Patricia Haller
Ramya Chandrasekhar
Patrick Haller
Renata Eisenberg
Robert Martin
Rodrigo Canalli
Rosaline Su
Ruisi Su
Samuel Cahyawijaya
Samuele Garda
Shlok S Deshmukh
Shubhanshu Mishra
Sid Kiblawi
Simon Ott
Sinee Sang-aroonsiri
Srishti Kumar
Stefan Schweter
Sushil Pratap Bharati
Tanmay Laud
Théo Gigant
Tomoya Kainuma
Wojciech Kusa
Yanis Labrak
Yashasvi Bajaj
Yash Venkatraman
Yifan Xu
Ying Xu
Yu Xu
Z. Tan
Zhongli Xie
Zifan Ye
Mathilde Le Bras
Younes Belkada
Thomas Wolf
Flaky Performances when Pretraining on Relational Databases
Shengchao Liu
David Vazquez
Pierre-Andre Noel
Knowledge Distillation for Federated Learning: a Practical Guide
Alessio Mora
Irene Tenison
Paolo Bellavista
Federated Learning (FL) enables the training of Deep Learning models without centrally collecting possibly sensitive raw data. This paves the way for stronger privacy guarantees when building predictive models. The most used algorithms for FL are parameter-averaging based schemes (e.g., Federated Averaging) that, however, have well-known limits: (i) Clients must implement the same model architecture; (ii) Transmitting model weights and model updates implies high communication cost, which scales up with the number of model parameters; (iii) In the presence of non-IID data distributions, parameter-averaging aggregation schemes perform poorly due to client model drifts. Federated adaptations of regular Knowledge Distillation (KD) can solve and/or mitigate the weaknesses of parameter-averaging FL algorithms while possibly introducing other trade-offs. In this article, we provide a review of KD-based algorithms tailored for specific FL issues.
A debriefing tool to acquire non-technical skills in trauma courses
Fabio Botelho
Jason M. Harley
Natalie Yanchar
Simone Abib
Ilana Bank
Multi-Head Adapter Routing for Cross-Task Generalization
Lucas Caccia
Edoardo Ponti
Zhan Su
Matheus Pereira
Parameter-efficient fine-tuning (PEFT) for cross-task generalization consists in pre-training adapters on a multi-task training set before few-shot adaptation to test tasks. Polytropon [Ponti et al., 2023] (
PaReco: patched clones and missed patches among the divergent variants of a software family
Poedjadevie Kadjel Ramkisoen
John Businge
Brent van Bladel
Alexandre Decan
Serge Demeyer
Coen De Roover
Re-using whole repositories as a starting point for new projects is often done by maintaining a variant fork parallel to the original. However, the common artifacts between both are not always kept up to date. As a result, patches are not optimally integrated across the two repositories, which may lead to sub-optimal maintenance between the variant and the original project. A bug existing in both repositories can be patched in one but not the other (we see this as a missed opportunity) or it can be manually patched in both probably by different developers (we see this as effort duplication). In this paper we present a tool (named PaReCo) which relies on clone detection to mine cases of missed opportunity and effort duplication from a pool of patches. We analyzed 364 (source to target) variant pairs with 8,323 patches resulting in a curated dataset containing 1,116 cases of effort duplication and 1,008 cases of missed opportunities. We achieve a precision of 91%, recall of 80%, accuracy of 88%, and F1-score of 85%. Furthermore, we investigated the time interval between patches and found out that, on average, missed patches in the target variants have been introduced in the source variants 52 weeks earlier. Consequently, PaReCo can be used to manage variability in “time” by automatically identifying interesting patches in later project releases to be backported to supported earlier releases.
Bayesian learning of Causal Structure and Mechanisms with GFlowNets and Variational Bayes
Mizu Nishikawa-Toomey
Tristan Deleu
Jithendaraa Subramanian
Bayesian causal structure learning aims to learn a posterior distribution over directed acyclic graphs (DAGs), and the mechanisms that define the relationship between parent and child variables. By taking a Bayesian approach, it is possible to reason about the uncertainty of the causal model. The notion of modelling the uncertainty over models is particularly crucial for causal structure learning since the model could be unidentifiable when given only a finite amount of observational data. In this paper, we introduce a novel method to jointly learn the structure and mechanisms of the causal model using Variational Bayes, which we call Variational Bayes-DAG-GFlowNet (VBG). We extend the method of Bayesian causal structure learning using GFlowNets to learn not only the posterior distribution over the structure, but also the parameters of a linear-Gaussian model. Our results on simulated data suggest that VBG is competitive against several baselines in modelling the posterior over DAGs and mechanisms, while offering several advantages over existing methods, including the guarantee to sample acyclic graphs, and the flexibility to generalize to non-linear causal mechanisms.
Existing eHealth Solutions for Older Adults Living With Neurocognitive Disorders (Mild and Major) or Dementia and Their Informal Caregivers: Protocol for an Environmental Scan
Ambily Jose
Maxime Sasseville
Samantha Dequanter
Ellen Gorus
Anik Giguère
Anne Bourbonnais
Ronald Buyl
Marie-Pierre Gagnon
Background: Dementia is one of the main public health priorities for current and future societies worldwide. Over the past years, eHealth solutions have added numerous promising solutions to enhance the health and wellness of people living with dementia-related cognitive problems and their primary caregivers. Previous studies have shown that an environmental scan identifies the knowledge-to-action gap meaningfully. This paper presents the protocol of an environmental scan to monitor the currently available eHealth solutions targeting dementia and other neurocognitive disorders against selected attributes. Objective: This study aims to identify the characteristics of currently available eHealth solutions recommended for older adults with cognitive problems and their informal caregivers. To inform the recommendations regarding eHealth solutions for these people, it is important to obtain a comprehensive view of currently available technologies and document their outcomes and conditions of success. Methods: We will perform an environmental scan of available eHealth solutions for older adults with cognitive impairment or dementia and their informal caregivers. Potential solutions will be initially identified from a previous systematic review. We will also conduct targeted searches for gray literature on Google and specialized websites covering the regions of Canada and Europe. Technological tools will be scanned based on a preformatted extraction grid. The relevance and efficiency based on the selected attributes will be assessed. Results: We will prioritize relevant solutions based on the needs and preferences identified from a qualitative study among older adults with cognitive impairment or dementia and their informal caregivers. Conclusions: This environmental scan will identify eHealth solutions that are currently available and scientifically appraised for older adults with cognitive impairment or dementia and their informal caregivers. This knowledge will inform the development of a decision support tool to assist older adults and their informal caregivers in their search for adequate eHealth solutions according to their needs and preferences based on trustable information. International Registered Report Identifier (IRRID): DERR1-10.2196/41015
Representational ethical model calibration
Robert Carruthers
Isabel Straw
James K. Ruffle
Daniel Herron
Amy Nelson
Delmiro Fernandez-Reyes
Geraint Rees
Parashkev Nachev
Spectral Regularization: an Inductive Bias for Sequence Modeling
Kaiwen Hou
Guillaume Rabusseau
Adult neurogenesis acts as a neural regularizer
Lina M. Tran
Adam Santoro
Lulu Liu
Sheena A. Josselyn
Paul W. Frankland