Thomas Mesnard

Geoffrey Cideron

Jean-Bastien Grill

Sabela Ramos

Edouard Yvinec

Michelle Casbon

Etienne Pot

Ivo Penchev

Gael Liu

Francesco Visin

Kathleen Kenealy

Lucas Beyer

Xiaohua Zhai

Anton Tsitsulin

Róbert Busa-Fekete

Alex Feng

Noveen Sachdeva

Benjamin Coleman

Yi Gao

Basil Mustafa

Iain Barr

Emilio Parisotto

David Tian

Matan Eyal

Colin Cherry

Jan-Thorsten Peter

Danila Sinopalnikov

Surya Bhupatiraju

Rishabh Agarwal

Mehran Kazemi

Dan Malkin

Ravin Kumar

David Vilar

Idan Brusilovsky

Jiaming Luo

Andreas Steiner

Abe Friesen

Abhanshu Sharma

Abheesht Sharma

Adi Mayrav Gilady

Adrian Goedeckemeyer

Alaa Saade

Alexander Kolesnikov

Alexei Bendebury

Alvin Abdagic

Amit Vadi

András György

André Susano Pinto

Anil Das

Ankur Bapna

Antoine Miech

Antoine Yang

Antonia Paterson

Ashish Shenoy

Ayan Chakrabarti

Bilal Piot

Boxi Wu

Bobak Shahriari

Bryce Petrini

Charlie Chen

Charline Le Lan

Christopher A. Choquette-Choo

CJ Carey

Cormac Brick

Daniel Deutsch

Danielle Eisenbud

Dee Cattle

Derek Cheng

Dimitris Paparas

Divyashree Shivakumar Sreepathihalli

Doug Reid

Dustin Tran

Dustin Zelle

Eric Noland

Erwin Huizenga

Eugene Kharitonov

Frederick Liu

Gagik Amirkhanyan

Glenn Cameron

Hadi Hashemi

Hanna Klimczak-Plucińska

Harman Singh

Harsh Mehta

Harshal Tushar Lehri

Hussein Hazimeh

Ian Ballantyne

Idan Szpektor

Ivan Nardini

Jean Pouget-Abadie

Jetha Chan

Joe Stanton

J. Michael Wieting

Jonathan Lai

Jordi Orbay

Joe Fernandez

Joshua Newlan

Junsong Ji

Jyotinder Singh

Kat Black

Kathy Yu

Kevin Hui

Kiran N. Vodrahalli

Klaus Greff

Linhai Qiu

Marcella Valentine

Marina Coelho

Marvin Ritter

Matt Hoffman

Matthew Watson

Mayank Chaturvedi

Michael Moynihan

Min Ma

Nabila Babar

Natasha Noy

Nathan Byrd

Nick Roy

Nikola Momchev

Nilay Chauhan

Oskar Bunyan

Pankil Botarda

Paul Caron

Paul Kishan Rubenstein

Phil Culliton

Philipp Schmid

Pier Giuseppe Sessa

Pingmei Xu

Piotr Stańczyk

Pouya Dehghani Tafti

Rakesh Shivanna

Renjie Wu

Renke Pan

R. Rokni

Rob Willoughby

Rohith Vallu

Ryan Mullins

Sammy Jerome

Sara Smoot

Sertan Girgin

Shariq Iqbal

Shashir Reddy

Shruti Sheth

Siim Põder

Sijal Bhatnagar

S. Panyam

Sivan Eiger

Susan Zhang

Tianqi Liu

Trevor Yacovone

T. Liechty

Uday Kalra

Utku Evci

Vedant Misra

Vincent Roseberry

Vladimir Feinberg

Vlad Kolesnikov

Woohyun Han

Woosuk Kwon

X. T. Chen

Yinlam Chow

Yuvein Zhu

Zichuan Wei

Z. Egyed

Victor Cotruta

Minh Giang

Phoebe Kirk

Anand Rao

Jessica Lo

Erica Moreira

Luiz Gustavo Martins

Omar Sanseviero

Lucas Gonzalez

Zach Gleicher

Tris Brian Warkentin

Seyed Vahab Mirrokni

Evan Senter

Eli Collins

Joelle Barral

Zoubin Ghahramani

Raia Hadsell

Yossi Matias

D. Sculley

Slav Petrov

Noah Fiedel

Noam M. Shazeer

Oriol Vinyals

Jeffrey Dean

Demis Hassabis

Koray Kavukcuoglu

Clément Farabet

Elena Buchatskaya

Jean-Baptiste Alayrac

Rohan Anil

Dmitry Lepikhin

Sebastian Borgeaud

Olivier Bachem

Armand Joulin

Alek Andreev

Cassidy Hardin

Robert Dadashi

Léonard Hussenot

We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers, and keeping the span on local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.

2025-03-24

arXiv (preprint)

Nash Learning from Human Feedback

Remi Munos

Michal Valko

Daniele Calandriello

Mohammad Gheshlaghi Azar

Mark Rowland

Zhaohan Daniel Guo

Yunhao Tang

Matthieu Geist

Côme Fiegel

Andrea Michi

Marco Selvi

Sertan Girgin

Nikola Momchev

Olivier Bachem

Daniel J Mankowitz

Doina Precup

Bilal Piot

Reinforcement learning from human feedback (RLHF) has emerged as the main paradigm for aligning large language models (LLMs) with human preferences. Traditionally, RLHF involves the initial step of learning a reward model from pairwise human feedback, i.e., expressed as preferences between pairs of text generations. Subsequently, the LLM’s policy is fine-tuned to maximize the reward through a reinforcement learning algorithm. In this study, we introduce an alternative pipeline for the fine-tuning of LLMs using pairwise human feedback. Our approach entails the initial learning of a pairwise preference model, which is conditioned on two inputs (instead of a single input in the case of a reward model) given a prompt, followed by the pursuit of a policy that consistently generates responses preferred over those generated by any competing policy, thus defining the Nash equilibrium of this preference model. We term this approach Nash learning from human feedback (NLHF). In the context of a tabular policy representation, we present a novel algorithmic solution, Nash-MD, founded on the principles of mirror descent. This algorithm produces a sequence of policies, with the last iteration converging to the regularized Nash equilibrium. Additionally, we explore parametric representations of policies and introduce gradient descent algorithms for deep-learning architectures. We illustrate the effectiveness of our approach by presenting experimental results on a text summarization task. We believe NLHF offers a compelling avenue for fine-tuning LLMs and enhancing the alignment of LLMs with human preferences.

2024-04-30

ICML.cc/2024/Conference (spotlight)

proceedings.mlr.press

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Aleksandar Botev

Soham De

Samuel L. Smith

Anushan Fernando

George-Cristian Muraru

Ruba Haroun

Leonard Berrada

Razvan Pascanu

Pier Giuseppe Sessa

Robert Dadashi

Léonard Hussenot

Johan Ferret

Sertan Girgin

Olivier Bachem

Alek Andreev

Kathleen Kenealy

Cassidy Hardin

Surya Bhupatiraju

Shreya Pathak

Laurent Sifre

Morgane Rivière

Mihir Kale

J Christopher Love

Juliette Love

Pouya Dehghani Tafti

Armand Joulin

Noah Fiedel

Evan Senter

Yutian Chen

Srivatsan Srinivasan

Guillaume Desjardins

David Mark Budden

Arnaud Doucet

Sharad Mandyam Vikram

Adam Paszke

Trevor Gale

Sebastian Borgeaud

Charlie Chen

Andy Brock

Antonia Paterson

Jenny Brennan

Meg Risdal

Raj Gundluru

N. Devanathan

Paul Mooney

Nilay Chauhan

Phil Culliton

Luiz Gustavo Martins

Elisa Bandy

David W. Huntsperger

Glenn Cameron

Arthur Zucker

Tris Brian Warkentin

Ludovic Peran

Minh Giang

Zoubin Ghahramani

Clément Farabet

Koray Kavukcuoglu

Demis Hassabis

Raia Hadsell

Yee Whye Teh

Nando de Freitas

We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-trained and instruction tuned variants for both. Our models achieve comparable performance to similarly-sized Gemma baselines despite being trained on fewer tokens.

2024-03-31

arXiv (published)

Hindsight Credit Assignment

Anna Harutyunyan

Will Dabney

Mohammad Gheshlaghi Azar

Bilal Piot

Nicolas Heess

Hado van Hasselt

Greg Wayne

Satinder Singh

Doina Precup

Remi Munos

2018-12-31

Advances in Neural Information Processing Systems 32 (NeurIPS 2019) (published)

Ghost Units Yield Biologically Plausible Backprop in Deep Neural Networks

Gaetan Vignoud

João Sacramento

Walter Senn

Yoshua Bengio

2018-09-04

2018 Conference on Cognitive Computational Neuroscience (published)

Generalization of Equilibrium Propagation to Vector Field Dynamics

The biological plausibility of the backpropagation algorithm has long been doubted by neuroscientists. Two major reasons are that neurons would need to send two different types of signal in the forward and backward phases, and that pairs of neurons would need to communicate through symmetric bidirectional connections. We present a simple two-phase learning procedure for fixed point recurrent networks that addresses both these issues. In our model, neurons perform leaky integration and synaptic weights are updated through a local mechanism. Our learning method generalizes Equilibrium Propagation to vector field dynamics, relaxing the requirement of an energy function. As a consequence of this generalization, the algorithm does not compute the true gradient of the objective function, but rather approximates it at a precision which is proven to be directly related to the degree of symmetry of the feedforward and feedback weights. We show experimentally that our algorithm optimizes the objective function.

2018-08-13

arXiv (preprint)