Portrait of Rishabh Agarwal

Rishabh Agarwal

Associate Industry Member
Adjunct Professor, McGill University, School of Computer Science
Google DeepMind
Research Topics
Deep Learning
Large Language Models (LLM)
Reinforcement Learning

Biography

I am a research scientist in the Google DeepMind Team in Montréal. I am also an Adjunct Professor at McGill University and an Associate Industry Member at Mila - Quebec Artificial Intelligence Institute. I finished my PhD at Mila under the guidance of Aaron Courville and Marc Bellemare. Previously, I spent a year at Geoffrey Hinton's amazing team in Google Brain, Toronto. Earlier, I graduated in Computer Science and Engineering from IIT Bombay.

My research work mainly revolves around language models and deep reinforcement learning (RL), and includes an outstanding paper award at NeurIPS.

Current Students

PhD - Université de Montréal
Principal supervisor :

Publications

Gemma 3 Technical Report
Gemma Team Aishwarya Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
Ramona Merhej
Sarah Perrin
Tatiana Matejovicova
Alexandre Ram'e
Morgane Rivière
Louis Rouillard
Thomas Mesnard
Geoffrey Cideron
Jean-Bastien Grill
Sabela Ramos
Edouard Yvinec
Michelle Casbon
Etienne Pot
Ivo Penchev
Gael Liu
Francesco Visin … (see 190 more)
Kathleen Kenealy
Lucas Beyer
Xiaohai Zhai
Anton Tsitsulin
Róbert Busa-Fekete
Alex Feng
Noveen Sachdeva
Benjamin Coleman
Yi Gao
Basil Mustafa
Iain Barr
Emilio Parisotto
David Tian
Matan Eyal
Colin Cherry
Jan-Thorsten Peter
Danila Sinopalnikov
Surya Bhupatiraju
Mehran Kazemi
Dan Malkin
Ravin Kumar
David Vilar
Idan Brusilovsky
Jiaming Luo
Andreas Steiner
Abe Friesen
Abhanshu Sharma
Abheesht Sharma
Adi Mayrav Gilady
Adrian Goedeckemeyer
Alaa Saade
Alexander Kolesnikov
Alexei Bendebury
Alvin Abdagic
Amit Vadi
Andr'as Gyorgy
André Susano Pinto
Anil Das
Ankur Bapna
Antoine Miech
Antoine Yang
Antonia Paterson
Ashish Shenoy
Ayan Chakrabarti
Bilal Piot
Boxi Wu
Bobak Shahriari
Bryce Petrini
Charlie Chen
Charline Le Lan
Christopher A. Choquette-Choo
CJ Carey
Cormac Brick
Daniel Deutsch
Danielle Eisenbud
Dee Cattle
Derek Cheng
Dimitris Paparas
Divyashree Shivakumar Sreepathihalli
Doug Reid
Dustin Tran
Dustin Zelle
Eric Noland
Erwin Huizenga
Eugene Kharitonov
Frederick Liu
Gagik Amirkhanyan
Glenn Cameron
Hadi Hashemi
Hanna Klimczak-Pluci'nska
Harman Singh
Harsh Mehta
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
Jean Pouget-Abadie
Jetha Chan
Joe Stanton
J. Michael Wieting
Jonathan Lai
Jordi Orbay
Joe Fernandez
Joshua Newlan
Junsong Ji
Jyotinder Singh
Kat Black
Kathy Yu
Kevin Hui
Kiran N. Vodrahalli
Klaus Greff
Linhai Qiu
Marcella Valentine
Marina Coelho
Marvin Ritter
Matt Hoffman
Matthew Watson
Mayank Chaturvedi
Michael Moynihan
Min Ma
Nabila Babar
Natasha Noy
Nathan Byrd
Nick Roy
Nikola Momchev
Nilay Chauhan
Oskar Bunyan
Pankil Botarda
Paul Caron
Paul Kishan Rubenstein
Phil Culliton
Philipp Schmid
Pier Giuseppe Sessa
Pingmei Xu
Piotr Stańczyk
Pouya Dehghani Tafti
Rakesh Shivanna
Renjie Wu
Renke Pan
R. Rokni
Rob Willoughby
Rohith Vallu
Ryan Mullins
Sammy Jerome
Sara Smoot
Sertan Girgin
Shariq Iqbal
Shashir Reddy
Shruti Sheth
Siim Põder
Sijal Bhatnagar
S. Panyam
Sivan Eiger
Susan Zhang
Tianqi Liu
Trevor Yacovone
T. Liechty
Uday Kalra
Utku Evci
Vedant Misra
Vincent Roseberry
Vladimir Feinberg
Vlad Kolesnikov
Woohyun Han
Woosuk Kwon
X. T. Chen
Yinlam Chow
Yuvein Zhu
Zichuan Wei
Z. Egyed
Victor Cotruta
Minh Giang
Phoebe Kirk
Anand Rao
Jessica Lo
Erica Moreira
Luiz GUStavo Martins
Omar Sanseviero
Lucas Gonzalez
Zach Gleicher
Tris Brian Warkentin
Seyed Vahab Mirrokni
Evan Senter
Eli Collins
Joelle Barral
Zoubin Ghahramani
Raia Hadsell
Yossi Matias
D. Sculley
Slav Petrov
Noah Fiedel
Noam M. Shazeer
Oriol Vinyals
Jeffrey Dean
Demis Hassabis
Koray Kavukcuoglu
Clément Farabet
Elena Buchatskaya
Jean-Baptiste Alayrac
Rohan Anil
Dmitry Lepikhin
Sebastian Borgeaud
Olivier Bachem
Armand Joulin
Alek Andreev
Cassidy Hardin
Robert Dadashi
L'eonard Hussenot
We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters… (see more). This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers, and keeping the span on local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
Gemma 3 Technical Report
Gemma Team Aishwarya Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
Ramona Merhej
Sarah Perrin
Tatiana Matejovicova
Alexandre Ram'e
Morgane Rivière
Louis Rouillard
Thomas Mesnard
Geoffrey Cideron
Jean-Bastien Grill
Sabela Ramos
Edouard Yvinec
Michelle Casbon
Etienne Pot
Ivo Penchev
Gael Liu
Francesco Visin … (see 190 more)
Kathleen Kenealy
Lucas Beyer
Xiaohai Zhai
Anton Tsitsulin
Róbert Busa-Fekete
Alex Feng
Noveen Sachdeva
Benjamin Coleman
Yi Gao
Basil Mustafa
Iain Barr
Emilio Parisotto
David Tian
Matan Eyal
Colin Cherry
Jan-Thorsten Peter
Danila Sinopalnikov
Surya Bhupatiraju
Mehran Kazemi
Dan Malkin
Ravin Kumar
David Vilar
Idan Brusilovsky
Jiaming Luo
Andreas Steiner
Abe Friesen
Abhanshu Sharma
Abheesht Sharma
Adi Mayrav Gilady
Adrian Goedeckemeyer
Alaa Saade
Alexander Kolesnikov
Alexei Bendebury
Alvin Abdagic
Amit Vadi
Andr'as Gyorgy
André Susano Pinto
Anil Das
Ankur Bapna
Antoine Miech
Antoine Yang
Antonia Paterson
Ashish Shenoy
Ayan Chakrabarti
Bilal Piot
Boxi Wu
Bobak Shahriari
Bryce Petrini
Charlie Chen
Charline Le Lan
Christopher A. Choquette-Choo
CJ Carey
Cormac Brick
Daniel Deutsch
Danielle Eisenbud
Dee Cattle
Derek Cheng
Dimitris Paparas
Divyashree Shivakumar Sreepathihalli
Doug Reid
Dustin Tran
Dustin Zelle
Eric Noland
Erwin Huizenga
Eugene Kharitonov
Frederick Liu
Gagik Amirkhanyan
Glenn Cameron
Hadi Hashemi
Hanna Klimczak-Pluci'nska
Harman Singh
Harsh Mehta
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
Jean Pouget-Abadie
Jetha Chan
Joe Stanton
J. Michael Wieting
Jonathan Lai
Jordi Orbay
Joe Fernandez
Joshua Newlan
Junsong Ji
Jyotinder Singh
Kat Black
Kathy Yu
Kevin Hui
Kiran N. Vodrahalli
Klaus Greff
Linhai Qiu
Marcella Valentine
Marina Coelho
Marvin Ritter
Matt Hoffman
Matthew Watson
Mayank Chaturvedi
Michael Moynihan
Min Ma
Nabila Babar
Natasha Noy
Nathan Byrd
Nick Roy
Nikola Momchev
Nilay Chauhan
Oskar Bunyan
Pankil Botarda
Paul Caron
Paul Kishan Rubenstein
Phil Culliton
Philipp Schmid
Pier Giuseppe Sessa
Pingmei Xu
Piotr Stańczyk
Pouya Dehghani Tafti
Rakesh Shivanna
Renjie Wu
Renke Pan
R. Rokni
Rob Willoughby
Rohith Vallu
Ryan Mullins
Sammy Jerome
Sara Smoot
Sertan Girgin
Shariq Iqbal
Shashir Reddy
Shruti Sheth
Siim Põder
Sijal Bhatnagar
S. Panyam
Sivan Eiger
Susan Zhang
Tianqi Liu
Trevor Yacovone
T. Liechty
Uday Kalra
Utku Evci
Vedant Misra
Vincent Roseberry
Vladimir Feinberg
Vlad Kolesnikov
Woohyun Han
Woosuk Kwon
X. T. Chen
Yinlam Chow
Yuvein Zhu
Zichuan Wei
Z. Egyed
Victor Cotruta
Minh Giang
Phoebe Kirk
Anand Rao
Jessica Lo
Erica Moreira
Luiz GUStavo Martins
Omar Sanseviero
Lucas Gonzalez
Zach Gleicher
Tris Brian Warkentin
Seyed Vahab Mirrokni
Evan Senter
Eli Collins
Joelle Barral
Zoubin Ghahramani
Raia Hadsell
Yossi Matias
D. Sculley
Slav Petrov
Noah Fiedel
Noam M. Shazeer
Oriol Vinyals
Jeffrey Dean
Demis Hassabis
Koray Kavukcuoglu
Clément Farabet
Elena Buchatskaya
Jean-Baptiste Alayrac
Rohan Anil
Dmitry Lepikhin
Sebastian Borgeaud
Olivier Bachem
Armand Joulin
Alek Andreev
Cassidy Hardin
Robert Dadashi
L'eonard Hussenot
Gemma 3 Technical Report
Gemma Team Aishwarya Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
Ramona Merhej
Sarah Perrin
Tatiana Matejovicova
Alexandre Ram'e
Morgane Rivière
Louis Rouillard
Thomas Mesnard
Geoffrey Cideron
Jean-Bastien Grill
Sabela Ramos
Edouard Yvinec
Michelle Casbon
Etienne Pot
Ivo Penchev
Gael Liu
Francesco Visin … (see 190 more)
Kathleen Kenealy
Lucas Beyer
Xiaohai Zhai
Anton Tsitsulin
Róbert Busa-Fekete
Alex Feng
Noveen Sachdeva
Benjamin Coleman
Yi Gao
Basil Mustafa
Iain Barr
Emilio Parisotto
David Tian
Matan Eyal
Colin Cherry
Jan-Thorsten Peter
Danila Sinopalnikov
Surya Bhupatiraju
Mehran Kazemi
Dan Malkin
Ravin Kumar
David Vilar
Idan Brusilovsky
Jiaming Luo
Andreas Steiner
Abe Friesen
Abhanshu Sharma
Abheesht Sharma
Adi Mayrav Gilady
Adrian Goedeckemeyer
Alaa Saade
Alexander Kolesnikov
Alexei Bendebury
Alvin Abdagic
Amit Vadi
Andr'as Gyorgy
André Susano Pinto
Anil Das
Ankur Bapna
Antoine Miech
Antoine Yang
Antonia Paterson
Ashish Shenoy
Ayan Chakrabarti
Bilal Piot
Boxi Wu
Bobak Shahriari
Bryce Petrini
Charlie Chen
Charline Le Lan
Christopher A. Choquette-Choo
CJ Carey
Cormac Brick
Daniel Deutsch
Danielle Eisenbud
Dee Cattle
Derek Cheng
Dimitris Paparas
Divyashree Shivakumar Sreepathihalli
Doug Reid
Dustin Tran
Dustin Zelle
Eric Noland
Erwin Huizenga
Eugene Kharitonov
Frederick Liu
Gagik Amirkhanyan
Glenn Cameron
Hadi Hashemi
Hanna Klimczak-Pluci'nska
Harman Singh
Harsh Mehta
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
Jean Pouget-Abadie
Jetha Chan
Joe Stanton
J. Michael Wieting
Jonathan Lai
Jordi Orbay
Joe Fernandez
Joshua Newlan
Junsong Ji
Jyotinder Singh
Kat Black
Kathy Yu
Kevin Hui
Kiran N. Vodrahalli
Klaus Greff
Linhai Qiu
Marcella Valentine
Marina Coelho
Marvin Ritter
Matt Hoffman
Matthew Watson
Mayank Chaturvedi
Michael Moynihan
Min Ma
Nabila Babar
Natasha Noy
Nathan Byrd
Nick Roy
Nikola Momchev
Nilay Chauhan
Oskar Bunyan
Pankil Botarda
Paul Caron
Paul Kishan Rubenstein
Phil Culliton
Philipp Schmid
Pier Giuseppe Sessa
Pingmei Xu
Piotr Stańczyk
Pouya Dehghani Tafti
Rakesh Shivanna
Renjie Wu
Renke Pan
R. Rokni
Rob Willoughby
Rohith Vallu
Ryan Mullins
Sammy Jerome
Sara Smoot
Sertan Girgin
Shariq Iqbal
Shashir Reddy
Shruti Sheth
Siim Põder
Sijal Bhatnagar
S. Panyam
Sivan Eiger
Susan Zhang
Tianqi Liu
Trevor Yacovone
T. Liechty
Uday Kalra
Utku Evci
Vedant Misra
Vincent Roseberry
Vladimir Feinberg
Vlad Kolesnikov
Woohyun Han
Woosuk Kwon
X. T. Chen
Yinlam Chow
Yuvein Zhu
Zichuan Wei
Z. Egyed
Victor Cotruta
Minh Giang
Phoebe Kirk
Anand Rao
Jessica Lo
Erica Moreira
Luiz GUStavo Martins
Omar Sanseviero
Lucas Gonzalez
Zach Gleicher
Tris Brian Warkentin
Seyed Vahab Mirrokni
Evan Senter
Eli Collins
Joelle Barral
Zoubin Ghahramani
Raia Hadsell
Yossi Matias
D. Sculley
Slav Petrov
Noah Fiedel
Noam M. Shazeer
Oriol Vinyals
Jeffrey Dean
Demis Hassabis
Koray Kavukcuoglu
Clément Farabet
Elena Buchatskaya
Jean-Baptiste Alayrac
Rohan Anil
Dmitry Lepikhin
Sebastian Borgeaud
Olivier Bachem
Armand Joulin
Alek Andreev
Cassidy Hardin
Robert Dadashi
L'eonard Hussenot
We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters… (see more). This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers, and keeping the span on local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
Gemma 3 Technical Report
Gemma Team Aishwarya Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
Ramona Merhej
Sarah Perrin
Tatiana Matejovicova
Alexandre Ram'e
Morgane Rivière
Louis Rouillard
Thomas Mesnard
Geoffrey Cideron
Jean-Bastien Grill
Sabela Ramos
Edouard Yvinec
Michelle Casbon
Etienne Pot
Ivo Penchev
Gael Liu
Francesco Visin … (see 190 more)
Kathleen Kenealy
Lucas Beyer
Xiaohai Zhai
Anton Tsitsulin
Róbert Busa-Fekete
Alex Feng
Noveen Sachdeva
Benjamin Coleman
Yi Gao
Basil Mustafa
Iain Barr
Emilio Parisotto
David Tian
Matan Eyal
Colin Cherry
Jan-Thorsten Peter
Danila Sinopalnikov
Surya Bhupatiraju
Mehran Kazemi
Dan Malkin
Ravin Kumar
David Vilar
Idan Brusilovsky
Jiaming Luo
Andreas Steiner
Abe Friesen
Abhanshu Sharma
Abheesht Sharma
Adi Mayrav Gilady
Adrian Goedeckemeyer
Alaa Saade
Alexander Kolesnikov
Alexei Bendebury
Alvin Abdagic
Amit Vadi
Andr'as Gyorgy
André Susano Pinto
Anil Das
Ankur Bapna
Antoine Miech
Antoine Yang
Antonia Paterson
Ashish Shenoy
Ayan Chakrabarti
Bilal Piot
Boxi Wu
Bobak Shahriari
Bryce Petrini
Charlie Chen
Charline Le Lan
Christopher A. Choquette-Choo
CJ Carey
Cormac Brick
Daniel Deutsch
Danielle Eisenbud
Dee Cattle
Derek Cheng
Dimitris Paparas
Divyashree Shivakumar Sreepathihalli
Doug Reid
Dustin Tran
Dustin Zelle
Eric Noland
Erwin Huizenga
Eugene Kharitonov
Frederick Liu
Gagik Amirkhanyan
Glenn Cameron
Hadi Hashemi
Hanna Klimczak-Pluci'nska
Harman Singh
Harsh Mehta
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
Jean Pouget-Abadie
Jetha Chan
Joe Stanton
J. Michael Wieting
Jonathan Lai
Jordi Orbay
Joe Fernandez
Joshua Newlan
Junsong Ji
Jyotinder Singh
Kat Black
Kathy Yu
Kevin Hui
Kiran N. Vodrahalli
Klaus Greff
Linhai Qiu
Marcella Valentine
Marina Coelho
Marvin Ritter
Matt Hoffman
Matthew Watson
Mayank Chaturvedi
Michael Moynihan
Min Ma
Nabila Babar
Natasha Noy
Nathan Byrd
Nick Roy
Nikola Momchev
Nilay Chauhan
Oskar Bunyan
Pankil Botarda
Paul Caron
Paul Kishan Rubenstein
Phil Culliton
Philipp Schmid
Pier Giuseppe Sessa
Pingmei Xu
Piotr Stańczyk
Pouya Dehghani Tafti
Rakesh Shivanna
Renjie Wu
Renke Pan
R. Rokni
Rob Willoughby
Rohith Vallu
Ryan Mullins
Sammy Jerome
Sara Smoot
Sertan Girgin
Shariq Iqbal
Shashir Reddy
Shruti Sheth
Siim Põder
Sijal Bhatnagar
S. Panyam
Sivan Eiger
Susan Zhang
Tianqi Liu
Trevor Yacovone
T. Liechty
Uday Kalra
Utku Evci
Vedant Misra
Vincent Roseberry
Vladimir Feinberg
Vlad Kolesnikov
Woohyun Han
Woosuk Kwon
X. T. Chen
Yinlam Chow
Yuvein Zhu
Zichuan Wei
Z. Egyed
Victor Cotruta
Minh Giang
Phoebe Kirk
Anand Rao
Jessica Lo
Erica Moreira
Luiz GUStavo Martins
Omar Sanseviero
Lucas Gonzalez
Zach Gleicher
Tris Brian Warkentin
Seyed Vahab Mirrokni
Evan Senter
Eli Collins
Joelle Barral
Zoubin Ghahramani
Raia Hadsell
Yossi Matias
D. Sculley
Slav Petrov
Noah Fiedel
Noam M. Shazeer
Oriol Vinyals
Jeffrey Dean
Demis Hassabis
Koray Kavukcuoglu
Clément Farabet
Elena Buchatskaya
Jean-Baptiste Alayrac
Rohan Anil
Dmitry Lepikhin
Sebastian Borgeaud
Olivier Bachem
Armand Joulin
Alek Andreev
Cassidy Hardin
Robert Dadashi
L'eonard Hussenot
We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters… (see more). This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers, and keeping the span on local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
Gemma 3 Technical Report
Gemma Team Aishwarya Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
Ramona Merhej
Sarah Perrin
Tatiana Matejovicova
Alexandre Ram'e
Morgane Rivière
Louis Rouillard
Thomas Mesnard
Geoffrey Cideron
Jean-Bastien Grill
Sabela Ramos
Edouard Yvinec
Michelle Casbon
Etienne Pot
Ivo Penchev
Gael Liu
Francesco Visin … (see 190 more)
Kathleen Kenealy
Lucas Beyer
Xiaohai Zhai
Anton Tsitsulin
Róbert Busa-Fekete
Alex Feng
Noveen Sachdeva
Benjamin Coleman
Yi Gao
Basil Mustafa
Iain Barr
Emilio Parisotto
David Tian
Matan Eyal
Colin Cherry
Jan-Thorsten Peter
Danila Sinopalnikov
Surya Bhupatiraju
Mehran Kazemi
Dan Malkin
Ravin Kumar
David Vilar
Idan Brusilovsky
Jiaming Luo
Andreas Steiner
Abe Friesen
Abhanshu Sharma
Abheesht Sharma
Adi Mayrav Gilady
Adrian Goedeckemeyer
Alaa Saade
Alexander Kolesnikov
Alexei Bendebury
Alvin Abdagic
Amit Vadi
Andr'as Gyorgy
André Susano Pinto
Anil Das
Ankur Bapna
Antoine Miech
Antoine Yang
Antonia Paterson
Ashish Shenoy
Ayan Chakrabarti
Bilal Piot
Boxi Wu
Bobak Shahriari
Bryce Petrini
Charlie Chen
Charline Le Lan
Christopher A. Choquette-Choo
CJ Carey
Cormac Brick
Daniel Deutsch
Danielle Eisenbud
Dee Cattle
Derek Cheng
Dimitris Paparas
Divyashree Shivakumar Sreepathihalli
Doug Reid
Dustin Tran
Dustin Zelle
Eric Noland
Erwin Huizenga
Eugene Kharitonov
Frederick Liu
Gagik Amirkhanyan
Glenn Cameron
Hadi Hashemi
Hanna Klimczak-Pluci'nska
Harman Singh
Harsh Mehta
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
Jean Pouget-Abadie
Jetha Chan
Joe Stanton
J. Michael Wieting
Jonathan Lai
Jordi Orbay
Joe Fernandez
Joshua Newlan
Junsong Ji
Jyotinder Singh
Kat Black
Kathy Yu
Kevin Hui
Kiran N. Vodrahalli
Klaus Greff
Linhai Qiu
Marcella Valentine
Marina Coelho
Marvin Ritter
Matt Hoffman
Matthew Watson
Mayank Chaturvedi
Michael Moynihan
Min Ma
Nabila Babar
Natasha Noy
Nathan Byrd
Nick Roy
Nikola Momchev
Nilay Chauhan
Oskar Bunyan
Pankil Botarda
Paul Caron
Paul Kishan Rubenstein
Phil Culliton
Philipp Schmid
Pier Giuseppe Sessa
Pingmei Xu
Piotr Stańczyk
Pouya Dehghani Tafti
Rakesh Shivanna
Renjie Wu
Renke Pan
R. Rokni
Rob Willoughby
Rohith Vallu
Ryan Mullins
Sammy Jerome
Sara Smoot
Sertan Girgin
Shariq Iqbal
Shashir Reddy
Shruti Sheth
Siim Põder
Sijal Bhatnagar
S. Panyam
Sivan Eiger
Susan Zhang
Tianqi Liu
Trevor Yacovone
T. Liechty
Uday Kalra
Utku Evci
Vedant Misra
Vincent Roseberry
Vladimir Feinberg
Vlad Kolesnikov
Woohyun Han
Woosuk Kwon
X. T. Chen
Yinlam Chow
Yuvein Zhu
Zichuan Wei
Z. Egyed
Victor Cotruta
Minh Giang
Phoebe Kirk
Anand Rao
Jessica Lo
Erica Moreira
Luiz GUStavo Martins
Omar Sanseviero
Lucas Gonzalez
Zach Gleicher
Tris Brian Warkentin
Seyed Vahab Mirrokni
Evan Senter
Eli Collins
Joelle Barral
Zoubin Ghahramani
Raia Hadsell
Yossi Matias
D. Sculley
Slav Petrov
Noah Fiedel
Noam M. Shazeer
Oriol Vinyals
Jeffrey Dean
Demis Hassabis
Koray Kavukcuoglu
Clément Farabet
Elena Buchatskaya
Jean-Baptiste Alayrac
Rohan Anil
Dmitry Lepikhin
Sebastian Borgeaud
Olivier Bachem
Armand Joulin
Alek Andreev
Cassidy Hardin
Robert Dadashi
L'eonard Hussenot
We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters… (see more). This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers, and keeping the span on local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
Gemma 3 Technical Report
Gemma Team Aishwarya Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
Ramona Merhej
Sarah Perrin
Tatiana Matejovicova
Alexandre Ram'e
Morgane Rivière
Louis Rouillard
Thomas Mesnard
Geoffrey Cideron
Jean-Bastien Grill
Sabela Ramos
Edouard Yvinec
Michelle Casbon
Etienne Pot
Ivo Penchev
Gael Liu
Francesco Visin … (see 190 more)
Kathleen Kenealy
Lucas Beyer
Xiaohai Zhai
Anton Tsitsulin
Róbert Busa-Fekete
Alex Feng
Noveen Sachdeva
Benjamin Coleman
Yi Gao
Basil Mustafa
Iain Barr
Emilio Parisotto
David Tian
Matan Eyal
Colin Cherry
Jan-Thorsten Peter
Danila Sinopalnikov
Surya Bhupatiraju
Mehran Kazemi
Dan Malkin
Ravin Kumar
David Vilar
Idan Brusilovsky
Jiaming Luo
Andreas Steiner
Abe Friesen
Abhanshu Sharma
Abheesht Sharma
Adi Mayrav Gilady
Adrian Goedeckemeyer
Alaa Saade
Alexander Kolesnikov
Alexei Bendebury
Alvin Abdagic
Amit Vadi
Andr'as Gyorgy
André Susano Pinto
Anil Das
Ankur Bapna
Antoine Miech
Antoine Yang
Antonia Paterson
Ashish Shenoy
Ayan Chakrabarti
Bilal Piot
Boxi Wu
Bobak Shahriari
Bryce Petrini
Charlie Chen
Charline Le Lan
Christopher A. Choquette-Choo
CJ Carey
Cormac Brick
Daniel Deutsch
Danielle Eisenbud
Dee Cattle
Derek Cheng
Dimitris Paparas
Divyashree Shivakumar Sreepathihalli
Doug Reid
Dustin Tran
Dustin Zelle
Eric Noland
Erwin Huizenga
Eugene Kharitonov
Frederick Liu
Gagik Amirkhanyan
Glenn Cameron
Hadi Hashemi
Hanna Klimczak-Pluci'nska
Harman Singh
Harsh Mehta
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
Jean Pouget-Abadie
Jetha Chan
Joe Stanton
J. Michael Wieting
Jonathan Lai
Jordi Orbay
Joe Fernandez
Joshua Newlan
Junsong Ji
Jyotinder Singh
Kat Black
Kathy Yu
Kevin Hui
Kiran N. Vodrahalli
Klaus Greff
Linhai Qiu
Marcella Valentine
Marina Coelho
Marvin Ritter
Matt Hoffman
Matthew Watson
Mayank Chaturvedi
Michael Moynihan
Min Ma
Nabila Babar
Natasha Noy
Nathan Byrd
Nick Roy
Nikola Momchev
Nilay Chauhan
Oskar Bunyan
Pankil Botarda
Paul Caron
Paul Kishan Rubenstein
Phil Culliton
Philipp Schmid
Pier Giuseppe Sessa
Pingmei Xu
Piotr Stańczyk
Pouya Dehghani Tafti
Rakesh Shivanna
Renjie Wu
Renke Pan
R. Rokni
Rob Willoughby
Rohith Vallu
Ryan Mullins
Sammy Jerome
Sara Smoot
Sertan Girgin
Shariq Iqbal
Shashir Reddy
Shruti Sheth
Siim Põder
Sijal Bhatnagar
S. Panyam
Sivan Eiger
Susan Zhang
Tianqi Liu
Trevor Yacovone
T. Liechty
Uday Kalra
Utku Evci
Vedant Misra
Vincent Roseberry
Vladimir Feinberg
Vlad Kolesnikov
Woohyun Han
Woosuk Kwon
X. T. Chen
Yinlam Chow
Yuvein Zhu
Zichuan Wei
Z. Egyed
Victor Cotruta
Minh Giang
Phoebe Kirk
Anand Rao
Jessica Lo
Erica Moreira
Luiz GUStavo Martins
Omar Sanseviero
Lucas Gonzalez
Zach Gleicher
Tris Brian Warkentin
Seyed Vahab Mirrokni
Evan Senter
Eli Collins
Joelle Barral
Zoubin Ghahramani
Raia Hadsell
Yossi Matias
D. Sculley
Slav Petrov
Noah Fiedel
Noam M. Shazeer
Oriol Vinyals
Jeffrey Dean
Demis Hassabis
Koray Kavukcuoglu
Clément Farabet
Elena Buchatskaya
Jean-Baptiste Alayrac
Rohan Anil
Dmitry Lepikhin
Sebastian Borgeaud
Olivier Bachem
Armand Joulin
Alek Andreev
Cassidy Hardin
Robert Dadashi
L'eonard Hussenot
Gemma 3 Technical Report
Gemma Team Aishwarya Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
Ramona Merhej
Sarah Perrin
Tatiana Matejovicova
Alexandre Ram'e
Morgane Rivière
Louis Rouillard
Thomas Mesnard
Geoffrey Cideron
Jean-Bastien Grill
Sabela Ramos
Edouard Yvinec
Michelle Casbon
Etienne Pot
Ivo Penchev
Gael Liu
Francesco Visin … (see 190 more)
Kathleen Kenealy
Lucas Beyer
Xiaohai Zhai
Anton Tsitsulin
Róbert Busa-Fekete
Alex Feng
Noveen Sachdeva
Benjamin Coleman
Yi Gao
Basil Mustafa
Iain Barr
Emilio Parisotto
David Tian
Matan Eyal
Colin Cherry
Jan-Thorsten Peter
Danila Sinopalnikov
Surya Bhupatiraju
Mehran Kazemi
Dan Malkin
Ravin Kumar
David Vilar
Idan Brusilovsky
Jiaming Luo
Andreas Steiner
Abe Friesen
Abhanshu Sharma
Abheesht Sharma
Adi Mayrav Gilady
Adrian Goedeckemeyer
Alaa Saade
Alexander Kolesnikov
Alexei Bendebury
Alvin Abdagic
Amit Vadi
Andr'as Gyorgy
André Susano Pinto
Anil Das
Ankur Bapna
Antoine Miech
Antoine Yang
Antonia Paterson
Ashish Shenoy
Ayan Chakrabarti
Bilal Piot
Boxi Wu
Bobak Shahriari
Bryce Petrini
Charlie Chen
Charline Le Lan
Christopher A. Choquette-Choo
CJ Carey
Cormac Brick
Daniel Deutsch
Danielle Eisenbud
Dee Cattle
Derek Cheng
Dimitris Paparas
Divyashree Shivakumar Sreepathihalli
Doug Reid
Dustin Tran
Dustin Zelle
Eric Noland
Erwin Huizenga
Eugene Kharitonov
Frederick Liu
Gagik Amirkhanyan
Glenn Cameron
Hadi Hashemi
Hanna Klimczak-Pluci'nska
Harman Singh
Harsh Mehta
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
Jean Pouget-Abadie
Jetha Chan
Joe Stanton
J. Michael Wieting
Jonathan Lai
Jordi Orbay
Joe Fernandez
Joshua Newlan
Junsong Ji
Jyotinder Singh
Kat Black
Kathy Yu
Kevin Hui
Kiran N. Vodrahalli
Klaus Greff
Linhai Qiu
Marcella Valentine
Marina Coelho
Marvin Ritter
Matt Hoffman
Matthew Watson
Mayank Chaturvedi
Michael Moynihan
Min Ma
Nabila Babar
Natasha Noy
Nathan Byrd
Nick Roy
Nikola Momchev
Nilay Chauhan
Oskar Bunyan
Pankil Botarda
Paul Caron
Paul Kishan Rubenstein
Phil Culliton
Philipp Schmid
Pier Giuseppe Sessa
Pingmei Xu
Piotr Stańczyk
Pouya Dehghani Tafti
Rakesh Shivanna
Renjie Wu
Renke Pan
R. Rokni
Rob Willoughby
Rohith Vallu
Ryan Mullins
Sammy Jerome
Sara Smoot
Sertan Girgin
Shariq Iqbal
Shashir Reddy
Shruti Sheth
Siim Põder
Sijal Bhatnagar
S. Panyam
Sivan Eiger
Susan Zhang
Tianqi Liu
Trevor Yacovone
T. Liechty
Uday Kalra
Utku Evci
Vedant Misra
Vincent Roseberry
Vladimir Feinberg
Vlad Kolesnikov
Woohyun Han
Woosuk Kwon
X. T. Chen
Yinlam Chow
Yuvein Zhu
Zichuan Wei
Z. Egyed
Victor Cotruta
Minh Giang
Phoebe Kirk
Anand Rao
Jessica Lo
Erica Moreira
Luiz GUStavo Martins
Omar Sanseviero
Lucas Gonzalez
Zach Gleicher
Tris Brian Warkentin
Seyed Vahab Mirrokni
Evan Senter
Eli Collins
Joelle Barral
Zoubin Ghahramani
Raia Hadsell
Yossi Matias
D. Sculley
Slav Petrov
Noah Fiedel
Noam M. Shazeer
Oriol Vinyals
Jeffrey Dean
Demis Hassabis
Koray Kavukcuoglu
Clément Farabet
Elena Buchatskaya
Jean-Baptiste Alayrac
Rohan Anil
Dmitry Lepikhin
Sebastian Borgeaud
Olivier Bachem
Armand Joulin
Alek Andreev
Cassidy Hardin
Robert Dadashi
L'eonard Hussenot
We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters… (see more). This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers, and keeping the span on local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
Gemma 3 Technical Report
Gemma Team Aishwarya Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
Ramona Merhej
Sarah Perrin
Tatiana Matejovicova
Alexandre Ram'e
Morgane Rivière
Louis Rouillard
Thomas Mesnard
Geoffrey Cideron
Jean-Bastien Grill
Sabela Ramos
Edouard Yvinec
Michelle Casbon
Etienne Pot
Ivo Penchev
Gael Liu
Francesco Visin … (see 190 more)
Kathleen Kenealy
Lucas Beyer
Xiaohai Zhai
Anton Tsitsulin
Róbert Busa-Fekete
Alex Feng
Noveen Sachdeva
Benjamin Coleman
Yi Gao
Basil Mustafa
Iain Barr
Emilio Parisotto
David Tian
Matan Eyal
Colin Cherry
Jan-Thorsten Peter
Danila Sinopalnikov
Surya Bhupatiraju
Mehran Kazemi
Dan Malkin
Ravin Kumar
David Vilar
Idan Brusilovsky
Jiaming Luo
Andreas Steiner
Abe Friesen
Abhanshu Sharma
Abheesht Sharma
Adi Mayrav Gilady
Adrian Goedeckemeyer
Alaa Saade
Alexander Kolesnikov
Alexei Bendebury
Alvin Abdagic
Amit Vadi
Andr'as Gyorgy
André Susano Pinto
Anil Das
Ankur Bapna
Antoine Miech
Antoine Yang
Antonia Paterson
Ashish Shenoy
Ayan Chakrabarti
Bilal Piot
Boxi Wu
Bobak Shahriari
Bryce Petrini
Charlie Chen
Charline Le Lan
Christopher A. Choquette-Choo
CJ Carey
Cormac Brick
Daniel Deutsch
Danielle Eisenbud
Dee Cattle
Derek Cheng
Dimitris Paparas
Divyashree Shivakumar Sreepathihalli
Doug Reid
Dustin Tran
Dustin Zelle
Eric Noland
Erwin Huizenga
Eugene Kharitonov
Frederick Liu
Gagik Amirkhanyan
Glenn Cameron
Hadi Hashemi
Hanna Klimczak-Pluci'nska
Harman Singh
Harsh Mehta
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
Jean Pouget-Abadie
Jetha Chan
Joe Stanton
J. Michael Wieting
Jonathan Lai
Jordi Orbay
Joe Fernandez
Joshua Newlan
Junsong Ji
Jyotinder Singh
Kat Black
Kathy Yu
Kevin Hui
Kiran N. Vodrahalli
Klaus Greff
Linhai Qiu
Marcella Valentine
Marina Coelho
Marvin Ritter
Matt Hoffman
Matthew Watson
Mayank Chaturvedi
Michael Moynihan
Min Ma
Nabila Babar
Natasha Noy
Nathan Byrd
Nick Roy
Nikola Momchev
Nilay Chauhan
Oskar Bunyan
Pankil Botarda
Paul Caron
Paul Kishan Rubenstein
Phil Culliton
Philipp Schmid
Pier Giuseppe Sessa
Pingmei Xu
Piotr Stańczyk
Pouya Dehghani Tafti
Rakesh Shivanna
Renjie Wu
Renke Pan
R. Rokni
Rob Willoughby
Rohith Vallu
Ryan Mullins
Sammy Jerome
Sara Smoot
Sertan Girgin
Shariq Iqbal
Shashir Reddy
Shruti Sheth
Siim Põder
Sijal Bhatnagar
S. Panyam
Sivan Eiger
Susan Zhang
Tianqi Liu
Trevor Yacovone
T. Liechty
Uday Kalra
Utku Evci
Vedant Misra
Vincent Roseberry
Vladimir Feinberg
Vlad Kolesnikov
Woohyun Han
Woosuk Kwon
X. T. Chen
Yinlam Chow
Yuvein Zhu
Zichuan Wei
Z. Egyed
Victor Cotruta
Minh Giang
Phoebe Kirk
Anand Rao
Jessica Lo
Erica Moreira
Luiz GUStavo Martins
Omar Sanseviero
Lucas Gonzalez
Zach Gleicher
Tris Brian Warkentin
Seyed Vahab Mirrokni
Evan Senter
Eli Collins
Joelle Barral
Zoubin Ghahramani
Raia Hadsell
Yossi Matias
D. Sculley
Slav Petrov
Noah Fiedel
Noam M. Shazeer
Oriol Vinyals
Jeffrey Dean
Demis Hassabis
Koray Kavukcuoglu
Clément Farabet
Elena Buchatskaya
Jean-Baptiste Alayrac
Rohan Anil
Dmitry Lepikhin
Sebastian Borgeaud
Olivier Bachem
Armand Joulin
Alek Andreev
Cassidy Hardin
Robert Dadashi
L'eonard Hussenot
We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters… (see more). This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers, and keeping the span on local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
Gemma 3 Technical Report
Gemma Team Aishwarya Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
Ramona Merhej
Sarah Perrin
Tatiana Matejovicova
Alexandre Ram'e
Morgane Rivière
Louis Rouillard
Thomas Mesnard
Geoffrey Cideron
Jean-Bastien Grill
Sabela Ramos
Edouard Yvinec
Michelle Casbon
Etienne Pot
Ivo Penchev
Gael Liu
Francesco Visin … (see 190 more)
Kathleen Kenealy
Lucas Beyer
Xiaohai Zhai
Anton Tsitsulin
Róbert Busa-Fekete
Alex Feng
Noveen Sachdeva
Benjamin Coleman
Yi Gao
Basil Mustafa
Iain Barr
Emilio Parisotto
David Tian
Matan Eyal
Colin Cherry
Jan-Thorsten Peter
Danila Sinopalnikov
Surya Bhupatiraju
Mehran Kazemi
Dan Malkin
Ravin Kumar
David Vilar
Idan Brusilovsky
Jiaming Luo
Andreas Steiner
Abe Friesen
Abhanshu Sharma
Abheesht Sharma
Adi Mayrav Gilady
Adrian Goedeckemeyer
Alaa Saade
Alexander Kolesnikov
Alexei Bendebury
Alvin Abdagic
Amit Vadi
Andr'as Gyorgy
André Susano Pinto
Anil Das
Ankur Bapna
Antoine Miech
Antoine Yang
Antonia Paterson
Ashish Shenoy
Ayan Chakrabarti
Bilal Piot
Boxi Wu
Bobak Shahriari
Bryce Petrini
Charlie Chen
Charline Le Lan
Christopher A. Choquette-Choo
CJ Carey
Cormac Brick
Daniel Deutsch
Danielle Eisenbud
Dee Cattle
Derek Cheng
Dimitris Paparas
Divyashree Shivakumar Sreepathihalli
Doug Reid
Dustin Tran
Dustin Zelle
Eric Noland
Erwin Huizenga
Eugene Kharitonov
Frederick Liu
Gagik Amirkhanyan
Glenn Cameron
Hadi Hashemi
Hanna Klimczak-Pluci'nska
Harman Singh
Harsh Mehta
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
Jean Pouget-Abadie
Jetha Chan
Joe Stanton
J. Michael Wieting
Jonathan Lai
Jordi Orbay
Joe Fernandez
Joshua Newlan
Junsong Ji
Jyotinder Singh
Kat Black
Kathy Yu
Kevin Hui
Kiran N. Vodrahalli
Klaus Greff
Linhai Qiu
Marcella Valentine
Marina Coelho
Marvin Ritter
Matt Hoffman
Matthew Watson
Mayank Chaturvedi
Michael Moynihan
Min Ma
Nabila Babar
Natasha Noy
Nathan Byrd
Nick Roy
Nikola Momchev
Nilay Chauhan
Oskar Bunyan
Pankil Botarda
Paul Caron
Paul Kishan Rubenstein
Phil Culliton
Philipp Schmid
Pier Giuseppe Sessa
Pingmei Xu
Piotr Stańczyk
Pouya Dehghani Tafti
Rakesh Shivanna
Renjie Wu
Renke Pan
R. Rokni
Rob Willoughby
Rohith Vallu
Ryan Mullins
Sammy Jerome
Sara Smoot
Sertan Girgin
Shariq Iqbal
Shashir Reddy
Shruti Sheth
Siim Põder
Sijal Bhatnagar
S. Panyam
Sivan Eiger
Susan Zhang
Tianqi Liu
Trevor Yacovone
T. Liechty
Uday Kalra
Utku Evci
Vedant Misra
Vincent Roseberry
Vladimir Feinberg
Vlad Kolesnikov
Woohyun Han
Woosuk Kwon
X. T. Chen
Yinlam Chow
Yuvein Zhu
Zichuan Wei
Z. Egyed
Victor Cotruta
Minh Giang
Phoebe Kirk
Anand Rao
Jessica Lo
Erica Moreira
Luiz GUStavo Martins
Omar Sanseviero
Lucas Gonzalez
Zach Gleicher
Tris Brian Warkentin
Seyed Vahab Mirrokni
Evan Senter
Eli Collins
Joelle Barral
Zoubin Ghahramani
Raia Hadsell
Yossi Matias
D. Sculley
Slav Petrov
Noah Fiedel
Noam M. Shazeer
Oriol Vinyals
Jeffrey Dean
Demis Hassabis
Koray Kavukcuoglu
Clément Farabet
Elena Buchatskaya
Jean-Baptiste Alayrac
Rohan Anil
Dmitry Lepikhin
Sebastian Borgeaud
Olivier Bachem
Armand Joulin
Alek Andreev
Cassidy Hardin
Robert Dadashi
L'eonard Hussenot
We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters… (see more). This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers, and keeping the span on local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
The dominant paradigm for RLHF is online and on-policy RL: synchronously generating from the large language model (LLM) policy, labelling wi… (see more)th a reward model, and learning using feedback on the LLM's own outputs. While performant, this paradigm is computationally inefficient. Inspired by classical deep RL literature, we propose separating generation and learning in RLHF. This enables asynchronous generation of new samples while simultaneously training on old samples, leading to faster training and more compute-optimal scaling. However, asynchronous training relies on an underexplored regime, online but off-policy RLHF: learning on samples from previous iterations of our model. To understand the challenges in this regime, we investigate a fundamental question: how much off-policyness can we tolerate for asynchronous training to speed up learning but maintain performance? Among several RLHF algorithms we tested, we find that online DPO is most robust to off-policy data, and robustness increases with the scale of the policy model. We study further compute optimizations for asynchronous RLHF but find that they come at a performance cost, giving rise to a trade-off. Finally, we verify the scalability of asynchronous RLHF by training LLaMA 3.1 8B on an instruction-following task 40% faster than a synchronous run while matching final performance.
Faster, More Efficient RLHF through Off-Policy Asynchronous Learning
To achieve state-of-the-art chatbots, large language models are finetuned with reinforcement learning (RL), frequently to optimize human fee… (see more)dback (RLHF). This process is computationally expensive and can take weeks. Offline approaches, like DPO, learn on a static dataset and are efficient but not performant. The dominant paradigm, online and on-policy---synchronously generating from the model, labelling with a reward model, and learning on feedback from the model's own outputs---is performant but not efficient. Following prior work in the generall deep RL setting, we propose separating the actor and learner in RLHF. This enables the asynchronously generation of new samples while learning on prior samples, thus leading to overall faster training and better scaling. But this requires a novel regime for RLHF, online but off-policy: learning on samples from a previous version of our model. We ask a fundamental question: how much off-policyness can we tolerate for asynchronous training to speed up learning but maintain performance? We find that a contrastive loss, Online DPO, is most robust to off-policy data and that robustness increases with the scale of the policy model. We show even further compute optimizations but demonstrate that they come at a performance cost, giving rise to a trade-off. Finally, we verify our design choices by training LLaMA 3.1 8B with RLHF as a helpful chatbot in half the time of a synchronous run while matching final performance.
Generating Complex Question Decompositions in the Face of Distribution Shifts.
Kelvin Han
Claire Gardent
Marah Ihab Abdin
Jyoti Aneja
Hany Hassan Awadalla
Ammar Ahmed Awadallah
Ahmad Awan
Nguyen Bach
Amit Bahree
Arash Bakhtiari
Jianmin Bao
Harkirat Singh Behl
Alon Benhaim
Misha Bilenko
Johan Bjorck
Sébastien Bubeck
Martin Cai
Qin Cai
Vishrav Chaudhary
Dong Chen … (see 342 more)
Weizhu Chen
Yen-Chun Chen 0001
Yi-ling Chen
Hao Cheng
Parul Chopra
Xiyang Dai
Matthew Dixon
Ronen Eldan
Victor Fragoso
Jianfeng Gao
Mei Gao
Min Gao
Amit Garg
Allison Del Giorno
Abhishek Goswami
S. Gunasekar
Emman Haider
Jun-heng Hao
Russell J. Hewett
Wen-Wei Hu
Jamie Huynh
Dan Iter
Sam Ade Jacobs
Mojan Javaheripi
Xin Jin
Nikos Karampatziakis
Piero Kauffmann
Mahoud Khademi
Dongwoo Kim
Young Jin Kim
Lev Kurilenko
James R. Lee
Yin Tat Lee
Yuanzhi Li
Yunsheng Li
Chen Liang
Lars Lidén
Xihui
Zeqi Lin
Ce Lin
Liyuan Liu
Mengchen Liu
Liu Weishung
Xiaodong Liu
Chong Liu
Piyush Luo
Ali Madan
David Mahmoudzadeh
Matt Majercak
Caio Mazzola
César Teodoro
Arindam Mendes
Hardik Mitra
Anh Modi
Brandon Nguyen
Norick Barun
Daniel Patra
Thomas Perez-Becker
Portet Reid
Heyang Pryzant
Marko Qin
Liliang Radmilac
Gustavo Ren
Corby de Rosa
Sambudha Rosset
Roy Olatunji
Olli Ruwase
Amin Saarikivi
Adil Saied
Michael Salim
Shital Santacroce
Ning Shah
Shang Hiteshi
Yelong Sharma
Swadheen Shen
Xia Shukla
Masahiro Song
Andrea Tanaka
Praneetha Tupini
Michael Wu
Bin Wyatt
Can Xiao
Jiahang Xu
Weijiang Xu
Jilong Xu
Sonali Xue
Fan Yadav
Jianwei Yang
Yifan Yang
Ziyi Yang
Donghan Yang
Yu Lu
Chenruidong Yuan
Cyril Zhang
Jianwen Zhang
Zhang
Li Lyna
Yi Zhang
Yue Zhang
Yunan Zhang 0001
Zhang Xiren
Zhou
Phi-3
Priyanka Agrawal
Chris Alberti
Fantine Huot
Joshua Maynez
Ji Ma
Kuzman Ganchev
Viraat Aryabumi
John Dang
Dwarak Talupuru
Saurabh Dash
David Cairuz
Hangyu Lin
Bharat Venkitesh
Madeline Smith
Jon Ander Campos
Yi Chern Tan
Kelly Marchisio
Max Bartolo
Sebastian Ruder
Acyr F. Locatelli
Julia Kreutzer
Nick Frosst
Aidan Gomez
Phil Blunsom
Marzieh Fadaee
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
Prafulla Dhariwal
Arvind Neelakantan
Pranav Shyam
Girish Sastry
Amanda Askell
Sandhini Agarwal
Ariel Herbert-Voss
Gretchen Krueger
T. Henighan
Rewon Child
Aditya Ramesh
Daniel M. Ziegler
Jeffrey Wu
Clemens Winter
Chris Hesse
Mark Chen
Eric Sigler
Ma-teusz Litwin
Scott Gray
Benjamin Chess
J. Clark
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei Gemma Team
Morgane Rivière
Shreya Pathak Pier
Giuseppe Sessa
Cassidy Hardin
Surya Bhupati-raju
L'eonard Hussenot
Thomas Mesnard
Bobak Shahriari
Alexandre Ramé
Johan Ferret
Peter Liu
Pouya Dehghani Tafti
Abe Friesen
Michelle Casbon
Sabela Ramos
Ravin Kumar
Charline Le Lan
Sammy Jerome
Anton Tsitsulin
Nino Vieillard
Piotr Stańczyk
Sertan Girgin
Nikola Momchev
Matt Hoffman
Shantanu Thakoor
Jean-Bastien Grill
Behnam Neyshabur
Olivier Bachem
Alanna Wal-ton
Aliaksei Severyn
Alicia Parrish
Aliya Ah-mad
Allen Hutchison
Alvin Abdagic
Amanda Carl
Amy Shen
Andy Brock
Andy Coenen
Anthony Laforge
Antonia Paterson
Ben Bastian
Bilal Piot
Boxi Wu
Brandon Royal
Charlie Chen
Chintu Kumar
Chris Perry
Christoper A. Welty
Christopher A. Choquette-Choo
Danila Sinopalnikov
David Wein-berger
Dimple Vijaykumar
Dominika Rogozi´nska
D. Herbison
Elisa Bandy
Emma Wang
Eric Noland
Erica Moreira
Evan Senter
Evgenii Elty-shev
Francesco Visin
Gabriel Rasskin
Gary Wei
Glenn Cameron
Gus Martins
Hadi Hashemi
Hanna Klimczak-Pluci´nska
Harleen Batra
Harsh Dhand
Ivan Nardini
Jacinda Mein
Jack Zhou
James Svens-son
Jeff Stanway
Jetha Chan
J. Zhou
Joana Carrasqueira
Joana Iljazi
Jocelyn Becker
Joe Fer-nandez
Joost Van Amersfoort
Josh Gordon
Josh Lipschultz
Joshua Newlan
Junsong Ji
Kareem Mo-hamed
Kartikeya Badola
Kat Black
Katie Mil-lican
Keelin McDonell
Kelvin Nguyen
Kiranbir Sodhia
Kish Greene
Lars Lowe Sjoesund
Lauren Usui
Laurent Sifre
L. Heuermann
Leti-cia Lago
Lilly McNealus
Livio Baldini
Soares Logan
Lucas Kilpatrick
Luciano Dixon
Martins Machel
Manvinder Reid
Mark Singh
Martin Görner Iverson
Mateo Wirth Mat Velloso
Matt Davi-dow
Matt Miller
Matthew Rahtz
Matthew Watson
Meg Risdal
Mehran Kazemi
Michael Moynihan
Ming Zhang
Minsuk Kahng
Minwoo Park
Mofi Rahman
Mohit Khatwani
Natalie Dao
Nenshad Bardoliwalla
N. Devanathan
Neta Dumai
Nilay Chauhan
O. Wahltinez
Pankil Botarda
Parker Barnes
Paul R. Barham
Paul Michel
Peng-chong Jin
Petko Georgiev
Phil Culliton
Pradeep Kup-pala
Ramona Comanescu
Ramona Merhej
Reena Jana
R. Rokni
Ryan Mullins
Samaneh Saadat
S. M. Carthy
Sarah Cogan
Sarah Perrin
S'ebastien M. R. Arnold
Se-bastian Krause
Shengyang Dai
S. Garg
Shruti Sheth
S. Ronstrom
Susan Chan
Timothy Jordan
Bing Yu
Tom Eccles
Tom Hennigan
Tomas Kocisky
Tulsee Doshi
Vihan Jain
Vikas Yadav
Vilobh Meshram
Vishal Dharmadhikari
Warren Barkley
Wei Wei
Wenming Ye
Woohyun Han
Woosuk Kwon
Xiang Xu
Zhe Shen
Zhitao Gong
Zichuan Wei
Victor Cotruta
Phoebe Kirk
Anand Rao
Minh Giang
Ludovic Peran
Tris Brian Warkentin
Eli Collins
Joelle Barral
Zoubin Ghahramani
Raia Hadsell
D. Sculley
Jeanine Banks
Anca Dragan