Publications

Gemma 3 Technical Report

Gemma Team

Aishwarya Kamath

Johan Ferret

Shreya Pathak

Nino Vieillard

Ramona Merhej

Tatiana Matejovicova

Alexandre Ramé

Morgane Rivière

Louis Rouillard

Geoffrey Cideron

Jean-Bastien Grill

Sabela Ramos

Edouard Yvinec

Michelle Casbon

Etienne Pot

Ivo Penchev

Gael Liu

Kathleen Kenealy

Lucas Beyer

Xiaohai Zhai

Anton Tsitsulin

Róbert Busa-Fekete

Alex Feng

Noveen Sachdeva

Benjamin Coleman

Yi Gao

Basil Mustafa

Iain Barr

Emilio Parisotto

David Tian

Matan Eyal

Colin Cherry

Jan-Thorsten Peter

Danila Sinopalnikov

Surya Bhupatiraju

Mehran Kazemi

Dan Malkin

Ravin Kumar

David Vilar

Idan Brusilovsky

Jiaming Luo

Andreas Steiner

Abe Friesen

Abhanshu Sharma

Abheesht Sharma

Adi Mayrav Gilady

Adrian Goedeckemeyer

Alaa Saade

Alexander Kolesnikov

Alexei Bendebury

Alvin Abdagic

Amit Vadi

András Gyorgy

André Susano Pinto

Anil Das

Ankur Bapna

Antoine Miech

Antoine Yang

Antonia Paterson

Ashish Shenoy

Ayan Chakrabarti

Bilal Piot

Boxi Wu

Bobak Shahriari

Bryce Petrini

Charlie Chen

Christopher A. Choquette-Choo

CJ Carey

Cormac Brick

Daniel Deutsch

Danielle Eisenbud

Dee Cattle

Derek Cheng

Dimitris Paparas

Divyashree Shivakumar Sreepathihalli

Doug Reid

Dustin Tran

Dustin Zelle

Eric Noland

Erwin Huizenga

Eugene Kharitonov

Frederick Liu

Gagik Amirkhanyan

Glenn Cameron

Hadi Hashemi

Hanna Klimczak-Plucińska

Harman Singh

Harsh Mehta

Harshal Tushar Lehri

Hussein Hazimeh

Ian Ballantyne

Idan Szpektor

Ivan Nardini

Jetha Chan

Joe Stanton

J. Michael Wieting

Jonathan Lai

Jordi Orbay

Joe Fernandez

Joshua Newlan

Junsong Ji

Jyotinder Singh

Kat Black

Kathy Yu

Kevin Hui

Kiran N. Vodrahalli

Klaus Greff

Linhai Qiu

Marcella Valentine

Marina Coelho

Marvin Ritter

Matt Hoffman

Matthew Watson

Mayank Chaturvedi

Michael Moynihan

Min Ma

Nabila Babar

Natasha Noy

Nathan Byrd

Nick Roy

Nikola Momchev

Nilay Chauhan

Oskar Bunyan

Pankil Botarda

Paul Caron

Paul Kishan Rubenstein

Phil Culliton

Philipp Schmid

Pier Giuseppe Sessa

Pingmei Xu

Piotr Stańczyk

Pouya Dehghani Tafti

Rakesh Shivanna

Renjie Wu

Renke Pan

R. Rokni

Rob Willoughby

Rohith Vallu

Ryan Mullins

Sammy Jerome

Sara Smoot

Sertan Girgin

Shariq Iqbal

Shashir Reddy

Shruti Sheth

Siim Põder

Sijal Bhatnagar

S. Panyam

Sivan Eiger

Susan Zhang

Tianqi Liu

Trevor Yacovone

T. Liechty

Uday Kalra

Utku Evci

Vedant Misra

Vincent Roseberry

Vladimir Feinberg

Vlad Kolesnikov

Woohyun Han

Woosuk Kwon

X. T. Chen

Yinlam Chow

Yuvein Zhu

Zichuan Wei

Z. Egyed

Victor Cotruta

Minh Giang

Phoebe Kirk

Anand Rao

Jessica Lo

Erica Moreira

Luiz Gustavo Martins

Omar Sanseviero

Lucas Gonzalez

Zach Gleicher

Tris Brian Warkentin

Seyed Vahab Mirrokni

Evan Senter

Eli Collins

Joelle Barral

Zoubin Ghahramani

Raia Hadsell

Yossi Matias

D. Sculley

Slav Petrov

Noah Fiedel

Noam M. Shazeer

Oriol Vinyals

Jeffrey Dean

Demis Hassabis

Koray Kavukcuoglu

Clément Farabet

Elena Buchatskaya

Jean-Baptiste Alayrac

Rohan Anil

Dmitry Lepikhin

Sebastian Borgeaud

Olivier Bachem

Armand Joulin

Alek Andreev

Cassidy Hardin

Robert Dadashi

Léonard Hussenot

We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers, and keeping the span on local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.

2025-03-25

ArXiv (preprint)

MeshUp: Multi-Target Mesh Deformation via Blended Score Distillation

Hyunwoo Kim

Itai Lang

Noam Aigerman

Thibault Groueix

Vladimir Kim

Rana Hanocka

We propose MeshUp, a technique that deforms a 3D mesh towards multiple target concepts, and intuitively controls the region where each concept is expressed. Conveniently, the concepts can be defined as either text queries, e.g., "a dog" and "a turtle," or inspirational images, and the local regions can be selected as any number of vertices on the mesh. We can effectively control the influence of the concepts and mix them together using a novel score distillation approach, referred to as the Blended Score Distillation (BSD). BSD operates on each attention layer of the denoising U-Net of a diffusion model as it extracts and injects the per-objective activations into a unified denoising pipeline from which the deformation gradients are calculated. To localize the expression of these activations, we create a probabilistic Region of Interest (ROI) map on the surface of the mesh, and turn it into 3D-consistent masks that we use to control the expression of these activations. We demonstrate the effectiveness of BSD empirically and show that it can deform various meshes towards multiple objectives. Our project page is at https://threedle.github.io/MeshUp.

2025-03-25

2025 International Conference on 3D Vision (3DV) (published)

A scalable gene network model of regulatory dynamics in single cells

Paul Bertin

Joseph D Viviano

Alejandro Tejada-Lapuerta

Weixu Wang

Stefan Bauer

Fabian J. Theis

Yoshua Bengio

2025-03-25

ArXiv (preprint)

Capturing Individual Human Preferences with Reward Features

André Barreto

Vincent Dumoulin

Yiran Mao

Nicolas Perez-Nieves

Bobak Shahriari

Yann Dauphin

Doina Precup

Hugo Larochelle

Reinforcement learning from human feedback usually models preferences using a reward model that does not distinguish between people. We argue that this is unlikely to be a good design choice in contexts with high potential for disagreement, like in the training of large language models. We propose a method to specialise a reward model to a person or group of people. Our approach builds on the observation that individual preferences can be captured as a linear combination of a set of general reward features. We show how to learn such features and subsequently use them to quickly adapt the reward model to a specific individual, even if their preferences are not reflected in the training data. We present experiments with large language models comparing the proposed architecture with a non-adaptive reward model and also adaptive counterparts, including models that do in-context personalisation. Depending on how much disagreement there is in the training data, our model either significantly outperforms the baselines or matches their performance with a simpler architecture and more stable training.

2025-03-21

ArXiv (preprint)