Publications

On the Privacy of Selection Mechanisms with Gaussian Noise
Jonathan Lebensold
Borja Balle
Report Noisy Max and Above Threshold are two classical differentially private (DP) selection mechanisms. Their output is obtained by adding noise to a sequence of low-sensitivity queries and reporting the identity of the query whose (noisy) answer satisfies a certain condition. Pure DP guarantees for these mechanisms are easy to obtain when Laplace noise is added to the queries. On the other hand, when instantiated using Gaussian noise, standard analyses only yield approximate DP guarantees despite the fact that the outputs of these mechanisms lie in a discrete space. In this work, we revisit the analysis of Report Noisy Max and Above Threshold with Gaussian noise and show that, under the additional assumption that the underlying queries are bounded, it is possible to provide pure ex-ante DP bounds for Report Noisy Max and pure ex-post DP bounds for Above Threshold. The resulting bounds are tight and depend on closed-form expressions that can be numerically evaluated using standard methods. Empirically we find these lead to tighter privacy accounting in the high privacy, low data regime. Further, we propose a simple privacy filter for composing pure ex-post DP guarantees, and use it to derive a fully adaptive Gaussian Sparse Vector Technique mechanism. Finally, we provide experiments on mobility and energy consumption datasets demonstrating that our Sparse Vector Technique is practically competitive with previous approaches and requires less hyper-parameter tuning.
Weight-Sharing Regularization
Mehran Shakerinava
Motahareh Sohrabi
Simon Lacoste-Julien
Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences
Shreya Shankar
J.D. Zamfirescu-Pereira
Bjorn Hartmann
Aditya G Parameswaran
Effects of gene dosage on cognitive ability: A function-based association study across brain and non-brain processes
Guillaume Huguet
Thomas Renne
Cécile Poulain
Alma Dubuc
Kuldeep Kumar
Sayeh Kazem
Worrawat Engchuan
Omar Shanta
Elise Douard
Catherine Proulx
Martineau Jean-Louis
Zohra Saci
Josephine Mollon
Laura Schultz
Emma E M Knowles
Simon R. Cox
David Porteous
Gail Davies
Paul Redmond
Sarah E. Harris
Gunter Schumann
Aurélie Labbe
Zdenka Pausova
Tomas Paus
Stephen W Scherer
Jonathan Sebat
Laura Almasy
David C. Glahn
Sébastien Jacquemont
Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Usman Anwar
Abulhair Saparov
Javier Rando
Daniel Paleka
Miles Turpin
Peter Hase
Ekdeep Singh Lubana
Erik Jenner
Stephen Casper
Oliver Sourbut
Benjamin L. Edelman
Zhaowei Zhang
Mario Gunther
Anton Korinek
Jose Hernandez-Orallo
Lewis Hammond
Eric J Bigelow
Alexander Pan
Lauro Langosco
Tomasz Korbak
Heidi Zhang
Ruiqi Zhong
Seán Ó hÉigeartaigh
Gabriel Recchia
Giulio Corsi
Alan Chan
Markus Anderljung
Lilian Edwards
Danqi Chen
Samuel Albanie
Jakob Nicolaus Foerster
Florian Tramèr
He He
Atoosa Kasirzadeh
Yejin Choi
This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose
Government Interventions to Avert Future Catastrophic AI Risks
Improving microbial phylogeny with citizen science within a mass-market video game
Roman Sarrazin-Gendron
Parham Ghasemloo Gheidari
Alexander Butyaev
Timothy Keding
Eddie Cai
Jiayue Zheng
Renata Mutalova
Julien Mounthanyvong
Yuxue Zhu
Elena Nazarova
Chrisostomos Drogaris
Kornél Erhart
David Bélanger
Amélie Brouillette
Michael Bouffard
Gabriel Richard
Joshua Davidson
Randy Pitchford
Mathieu Falaise
Sébastien Caisse
Vincent Fiset
Steven Hebert
Daniel McDonald
Dan Hewitt
Rob Knight
Jonathan Huot
Attila Szantner
Seung Kim
Jérôme Waldispühl
Jonathan Moreau-Genest
David Najjab
Steve Prince
Ludger Saintélien
CryCeleb: A Speaker Verification Dataset Based on Infant Cry Sounds
David Budaghyan
Charles Onu
Arsenii Gorin
Cem Subakan
This paper describes the Ubenwa CryCeleb dataset - a labeled collection of infant cries - and the accompanying CryCeleb 2023 task, which is a public speaker verification challenge based on cry sounds. We released more than 6 hours of manually segmented cry sounds from 786 newborns for academic use, aiming to encourage research in infant cry analysis. The inaugural public competition attracted 59 participants, 11 of whom improved the baseline performance. The top-performing system achieved a significant improvement scoring 25.8% equal error rate, which is still far from the performance of state-of-the-art adult speaker verification systems. Therefore, we believe there is room for further research on this dataset, potentially extending beyond the verification task.
Resource-Efficient Separation Transformer
Luca Della Libera
Cem Subakan
Samuele Cornell
Frédéric Lepoutre
François Grondin
Transformers have recently achieved state-of-the-art performance in speech separation. These models, however, are computationally demanding and require a lot of learnable parameters. This paper explores Transformer-based speech separation with a reduced computational cost. Our main contribution is the development of the Resource-Efficient Separation Transformer (RE-SepFormer), a self-attention-based architecture that reduces the computational burden in two ways. First, it uses non-overlapping blocks in the latent space. Second, it operates on compact latent summaries calculated from each chunk. The RE-SepFormer reaches a competitive performance on the popular WSJ0-2Mix and WHAM! datasets in both causal and non-causal settings. Remarkably, it scales significantly better than the previous Transformer-based architectures in terms of memory and inference time, making it more suitable for processing long mixtures.
SKILL: Similarity-aware Knowledge distILLation for Speech Self-Supervised Learning
Luca Zampierin
Ghouthi Boukli Hacene
Bac Nguyen
Towards Practical Tool Usage for Continually Learning LLMs
Jerry Huang
Prasanna Parthasarathi
Mehdi Rezagholizadeh
Sarath Chandar
Large language models (LLMs) show an innate skill for solving language based tasks. But insights have suggested an inability to adjust for information or task-solving skills becoming outdated, as their knowledge, stored directly within their parameters, remains static in time. Tool use helps by offloading work to systems that the LLM can access through an interface, but LLMs that use them still must adapt to nonstationary environments for prolonged use, as new tools can emerge and existing tools can change. Nevertheless, tools require less specialized knowledge, therefore we hypothesize they are better suited for continual learning (CL) as they rely less on parametric memory for solving tasks and instead focus on learning when to apply pre-defined tools. To verify this, we develop a synthetic benchmark and follow this by aggregating existing NLP tasks to form a more realistic testing scenario. While we demonstrate scaling model size is not a solution, regardless of tool usage, continual learning techniques can enable tool LLMs to both adapt faster while forgetting less, highlighting their potential as continual learners.
Why People Contribute Software Documentation
Deeksha M. Arya
Martin P. Robillard