Thomas Mesnard

Alumni

Publications

Gemma 3 Technical Report
Gemma Team
Aishwarya Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
Ramona Merhej
Tatiana Matejovicova
Alexandre Ramé
Morgane Rivière
Louis Rouillard
Geoffrey Cideron
Jean-Bastien Grill
Sabela Ramos
Edouard Yvinec
Michelle Casbon
Etienne Pot
Ivo Penchev
Gael Liu
Kathleen Kenealy
Lucas Beyer
Xiaohai Zhai
Anton Tsitsulin
Róbert Busa-Fekete
Alex Feng
Noveen Sachdeva
Benjamin Coleman
Yi Gao
Basil Mustafa
Iain Barr
Emilio Parisotto
David Tian
Matan Eyal
Colin Cherry
Jan-Thorsten Peter
Danila Sinopalnikov
Surya Bhupatiraju
Mehran Kazemi
Dan Malkin
Ravin Kumar
David Vilar
Idan Brusilovsky
Jiaming Luo
Andreas Steiner
Abe Friesen
Abhanshu Sharma
Abheesht Sharma
Adi Mayrav Gilady
Adrian Goedeckemeyer
Alaa Saade
Alexander Kolesnikov
Alexei Bendebury
Alvin Abdagic
Amit Vadi
András György
André Susano Pinto
Anil Das
Ankur Bapna
Antoine Miech
Antoine Yang
Antonia Paterson
Ashish Shenoy
Ayan Chakrabarti
Bilal Piot
Boxi Wu
Bobak Shahriari
Bryce Petrini
Charlie Chen
Christopher A. Choquette-Choo
CJ Carey
Cormac Brick
Daniel Deutsch
Danielle Eisenbud
Dee Cattle
Derek Cheng
Dimitris Paparas
Divyashree Shivakumar Sreepathihalli
Doug Reid
Dustin Tran
Dustin Zelle
Eric Noland
Erwin Huizenga
Eugene Kharitonov
Frederick Liu
Gagik Amirkhanyan
Glenn Cameron
Hadi Hashemi
Hanna Klimczak-Plucińska
Harman Singh
Harsh Mehta
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
Jetha Chan
Joe Stanton
J. Michael Wieting
Jonathan Lai
Jordi Orbay
Joe Fernandez
Joshua Newlan
Junsong Ji
Jyotinder Singh
Kat Black
Kathy Yu
Kevin Hui
Kiran N. Vodrahalli
Klaus Greff
Linhai Qiu
Marcella Valentine
Marina Coelho
Marvin Ritter
Matt Hoffman
Matthew Watson
Mayank Chaturvedi
Michael Moynihan
Min Ma
Nabila Babar
Natasha Noy
Nathan Byrd
Nick Roy
Nikola Momchev
Nilay Chauhan
Oskar Bunyan
Pankil Botarda
Paul Caron
Paul Kishan Rubenstein
Phil Culliton
Philipp Schmid
Pier Giuseppe Sessa
Pingmei Xu
Piotr Stańczyk
Pouya Dehghani Tafti
Rakesh Shivanna
Renjie Wu
Renke Pan
R. Rokni
Rob Willoughby
Rohith Vallu
Ryan Mullins
Sammy Jerome
Sara Smoot
Sertan Girgin
Shariq Iqbal
Shashir Reddy
Shruti Sheth
Siim Põder
Sijal Bhatnagar
S. Panyam
Sivan Eiger
Susan Zhang
Tianqi Liu
Trevor Yacovone
T. Liechty
Uday Kalra
Utku Evci
Vedant Misra
Vincent Roseberry
Vladimir Feinberg
Vlad Kolesnikov
Woohyun Han
Woosuk Kwon
X. T. Chen
Yinlam Chow
Yuvein Zhu
Zichuan Wei
Z. Egyed
Victor Cotruta
Minh Giang
Phoebe Kirk
Anand Rao
Jessica Lo
Erica Moreira
Luiz Gustavo Martins
Omar Sanseviero
Lucas Gonzalez
Zach Gleicher
Tris Brian Warkentin
Seyed Vahab Mirrokni
Evan Senter
Eli Collins
Joelle Barral
Zoubin Ghahramani
Raia Hadsell
Yossi Matias
D. Sculley
Slav Petrov
Noah Fiedel
Noam M. Shazeer
Oriol Vinyals
Jeffrey Dean
Demis Hassabis
Koray Kavukcuoglu
Clément Farabet
Elena Buchatskaya
Jean-Baptiste Alayrac
Rohan Anil
Dmitry Lepikhin
Sebastian Borgeaud
Olivier Bachem
Armand Joulin
Alek Andreev
Cassidy Hardin
Robert Dadashi
Léonard Hussenot
We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers, and keeping the span on local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
Nash Learning from Human Feedback
Remi Munos
Michal Valko
Daniele Calandriello
Mohammad Gheshlaghi Azar
Mark Rowland
Zhaohan Daniel Guo
Yunhao Tang
Matthieu Geist
Côme Fiegel
Andrea Michi
Marco Selvi
Sertan Girgin
Nikola Momchev
Olivier Bachem
Daniel J Mankowitz
Bilal Piot
Reinforcement learning from human feedback (RLHF) has emerged as the main paradigm for aligning large language models (LLMs) with human preferences. Traditionally, RLHF involves the initial step of learning a reward model from pairwise human feedback, i.e., expressed as preferences between pairs of text generations. Subsequently, the LLM’s policy is fine-tuned to maximize the reward through a reinforcement learning algorithm. In this study, we introduce an alternative pipeline for the fine-tuning of LLMs using pairwise human feedback. Our approach entails the initial learning of a pairwise preference model, which is conditioned on two inputs (instead of a single input in the case of a reward model) given a prompt, followed by the pursuit of a policy that consistently generates responses preferred over those generated by any competing policy, thus defining the Nash equilibrium of this preference model. We term this approach Nash learning from human feedback (NLHF). In the context of a tabular policy representation, we present a novel algorithmic solution, Nash-MD, founded on the principles of mirror descent. This algorithm produces a sequence of policies, with the last iteration converging to the regularized Nash equilibrium. Additionally, we explore parametric representations of policies and introduce gradient descent algorithms for deep-learning architectures. We illustrate the effectiveness of our approach by presenting experimental results on a text summarization task. We believe NLHF offers a compelling avenue for fine-tuning LLMs and enhancing the alignment of LLMs with human preferences.
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Aleksandar Botev
Soham De
Samuel L. Smith
Anushan Fernando
George-Cristian Muraru
Ruba Haroun
Leonard Berrada
Pier Giuseppe Sessa
Robert Dadashi
Léonard Hussenot
Johan Ferret
Sertan Girgin
Olivier Bachem
Alek Andreev
Kathleen Kenealy
Cassidy Hardin
Surya Bhupatiraju
Shreya Pathak
Laurent Sifre
Morgane Rivière
Mihir Kale
J Christopher Love
Juliette Love
Pouya Dehghani Tafti
Armand Joulin
Noah Fiedel
Evan Senter
Yutian Chen
Srivatsan Srinivasan
David Mark Budden
Arnaud Doucet
Sharad Mandyam Vikram
Adam Paszke
Trevor Gale
Sebastian Borgeaud
Charlie Chen
Andy Brock
Antonia Paterson
Jenny Brennan
Meg Risdal
Raj Gundluru
N. Devanathan
Paul Mooney
Nilay Chauhan
Phil Culliton
Luiz Gustavo Martins
Elisa Bandy
David W. Huntsperger
Glenn Cameron
Arthur Zucker
Tris Brian Warkentin
Ludovic Peran
Minh Giang
Zoubin Ghahramani
Clément Farabet
Koray Kavukcuoglu
Demis Hassabis
Raia Hadsell
Yee Whye Teh
Nando de Freitas
We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-trained and instruction tuned variants for both. Our models achieve comparable performance to similarly-sized Gemma baselines despite being trained on fewer tokens.
Hindsight Credit Assignment
Anna Harutyunyan
Will Dabney
Mohammad Gheshlaghi Azar
Bilal Piot
Nicolas Heess
Hado van Hasselt
Greg Wayne
Satinder Singh
Remi Munos
Ghost Units Yield Biologically Plausible Backprop in Deep Neural Networks
João Sacramento
Walter Senn
Generalization of Equilibrium Propagation to Vector Field Dynamics
The biological plausibility of the backpropagation algorithm has long been doubted by neuroscientists. Two major reasons are that neurons would need to send two different types of signal in the forward and backward phases, and that pairs of neurons would need to communicate through symmetric bidirectional connections. We present a simple two-phase learning procedure for fixed point recurrent networks that addresses both these issues. In our model, neurons perform leaky integration and synaptic weights are updated through a local mechanism. Our learning method generalizes Equilibrium Propagation to vector field dynamics, relaxing the requirement of an energy function. As a consequence of this generalization, the algorithm does not compute the true gradient of the objective function, but rather approximates it at a precision which is proven to be directly related to the degree of symmetry of the feedforward and feedback weights. We show experimentally that our algorithm optimizes the objective function.
Extending the Framework of Equilibrium Propagation to General Dynamics