
Thomas Mesnard

Alumni

Publications

Gemma 3 Technical Report
Gemma Team
Aishwarya Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
Ramona Merhej
Tatiana Matejovicova
Alexandre Ramé
Morgane Rivière
Louis Rouillard
Geoffrey Cideron
Jean-Bastien Grill
Sabela Ramos
Edouard Yvinec
Michelle Casbon
Etienne Pot
Ivo Penchev
Gael Liu
Kathleen Kenealy
Lucas Beyer
Xiaohua Zhai
Anton Tsitsulin
Róbert Busa-Fekete
Alex Feng
Noveen Sachdeva
Benjamin Coleman
Yi Gao
Basil Mustafa
Iain Barr
Emilio Parisotto
David Tian
Matan Eyal
Colin Cherry
Jan-Thorsten Peter
Danila Sinopalnikov
Surya Bhupatiraju
Mehran Kazemi
Dan Malkin
Ravin Kumar
David Vilar
Idan Brusilovsky
Jiaming Luo
Andreas Steiner
Abe Friesen
Abhanshu Sharma
Abheesht Sharma
Adi Mayrav Gilady
Adrian Goedeckemeyer
Alaa Saade
Alexander Kolesnikov
Alexei Bendebury
Alvin Abdagic
Amit Vadi
András György
André Susano Pinto
Anil Das
Ankur Bapna
Antoine Miech
Antoine Yang
Antonia Paterson
Ashish Shenoy
Ayan Chakrabarti
Bilal Piot
Boxi Wu
Bobak Shahriari
Bryce Petrini
Charlie Chen
Christopher A. Choquette-Choo
CJ Carey
Cormac Brick
Daniel Deutsch
Danielle Eisenbud
Dee Cattle
Derek Cheng
Dimitris Paparas
Divyashree Shivakumar Sreepathihalli
Doug Reid
Dustin Tran
Dustin Zelle
Eric Noland
Erwin Huizenga
Eugene Kharitonov
Frederick Liu
Gagik Amirkhanyan
Glenn Cameron
Hadi Hashemi
Hanna Klimczak-Plucińska
Harman Singh
Harsh Mehta
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
Jetha Chan
Joe Stanton
J. Michael Wieting
Jonathan Lai
Jordi Orbay
Joe Fernandez
Joshua Newlan
Junsong Ji
Jyotinder Singh
Kat Black
Kathy Yu
Kevin Hui
Kiran N. Vodrahalli
Klaus Greff
Linhai Qiu
Marcella Valentine
Marina Coelho
Marvin Ritter
Matt Hoffman
Matthew Watson
Mayank Chaturvedi
Michael Moynihan
Min Ma
Nabila Babar
Natasha Noy
Nathan Byrd
Nick Roy
Nikola Momchev
Nilay Chauhan
Oskar Bunyan
Pankil Botarda
Paul Caron
Paul Kishan Rubenstein
Phil Culliton
Philipp Schmid
Pier Giuseppe Sessa
Pingmei Xu
Piotr Stańczyk
Pouya Dehghani Tafti
Rakesh Shivanna
Renjie Wu
Renke Pan
R. Rokni
Rob Willoughby
Rohith Vallu
Ryan Mullins
Sammy Jerome
Sara Smoot
Sertan Girgin
Shariq Iqbal
Shashir Reddy
Shruti Sheth
Siim Põder
Sijal Bhatnagar
S. Panyam
Sivan Eiger
Susan Zhang
Tianqi Liu
Trevor Yacovone
T. Liechty
Uday Kalra
Utku Evci
Vedant Misra
Vincent Roseberry
Vladimir Feinberg
Vlad Kolesnikov
Woohyun Han
Woosuk Kwon
X. T. Chen
Yinlam Chow
Yuvein Zhu
Zichuan Wei
Z. Egyed
Victor Cotruta
Minh Giang
Phoebe Kirk
Anand Rao
Jessica Lo
Erica Moreira
Luiz Gustavo Martins
Omar Sanseviero
Lucas Gonzalez
Zach Gleicher
Tris Brian Warkentin
Seyed Vahab Mirrokni
Evan Senter
Eli Collins
Joelle Barral
Zoubin Ghahramani
Raia Hadsell
Yossi Matias
D. Sculley
Slav Petrov
Noah Fiedel
Noam M. Shazeer
Oriol Vinyals
Jeffrey Dean
Demis Hassabis
Koray Kavukcuoglu
Clément Farabet
Elena Buchatskaya
Jean-Baptiste Alayrac
Rohan Anil
Dmitry Lepikhin
Sebastian Borgeaud
Olivier Bachem
Armand Joulin
Alek Andreev
Cassidy Hardin
Robert Dadashi
Léonard Hussenot
Evaluating Numeracy of Language Models as a Natural Language Inference Task
Rahmad Mahendra
Damiano Spina
Lawrence Cavedon
Karin Verspoor
While recent advancements in large language models (LLMs) have enhanced their capabilities to solve mathematical problems, other aspects of numeracy remain underexplored. In this paper, we propose a benchmark to evaluate the ability of language models to perform basic numeracy tasks. We frame numeracy as a Natural Language Inference (NLI) task to assess the models’ ability to understand both numbers and language contexts. We evaluate 49 language models (LMs), including fine-tuned LMs on NLI datasets, instruction-tuned LLMs, and specialized math-LLMs. Our findings reveal three main insights: (1) LLMs only clearly outperform smaller LMs in arithmetic tasks, indicating that mathematical reasoning cannot be generalized to other numeracy skills such as number comparison and normalization; (2) while most language models achieve fair to good accuracy for NLI entailment cases, they still struggle to predict contradiction and neutral cases; and (3) the robustness of language models’ numeracy capabilities needs improvement, particularly in understanding the semantics and pragmatics of numbers in linguistic contexts.
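As a concrete illustration of the NLI framing described in this abstract, the toy Python sketch below builds premise-hypothesis pairs about a quantity and assigns the three NLI labels (entailment, contradiction, neutral). The sentences and the labeling rule are invented for illustration and are not from the benchmark itself.

```python
# Toy sketch (not code from the paper) of casting a numeracy check as a
# three-way NLI problem: the premise states a quantity; hypotheses either
# follow from it (entailment), conflict with it (contradiction), or are
# merely compatible with it (neutral).

def label(premise_value: float, hypothesis: str, hypothesis_value: float) -> str:
    """Toy oracle that decides the NLI label for a simple comparison hypothesis."""
    if "more than" in hypothesis:
        return "entailment" if premise_value > hypothesis_value else "contradiction"
    if "exactly" in hypothesis:
        return "entailment" if premise_value == hypothesis_value else "contradiction"
    # Anything the premise neither implies nor rules out is neutral.
    return "neutral"

premise = "The stadium holds 45,000 spectators."
examples = [
    ("The stadium holds more than 40,000 spectators.", 40_000),
    ("The stadium holds exactly 54,000 spectators.", 54_000),
    ("The stadium sold out last weekend.", float("nan")),
]

for hypothesis, value in examples:
    print(f"P: {premise}\nH: {hypothesis}\n-> {label(45_000, hypothesis, value)}\n")
```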
Generating Complex Question Decompositions in the Face of Distribution Shifts.
Kelvin Han
Claire Gardent
Gemma 2: Improving Open Language Models at a Practical Size
Gemma Team
Morgane Rivière
Shreya Pathak
Pier Giuseppe Sessa
Cassidy Hardin
Surya Bhupatiraju
Léonard Hussenot
Bobak Shahriari
Alexandre Ramé
Johan Ferret
Peter Liu
Pouya Dehghani Tafti
Abe Friesen
Michelle Casbon
Sabela Ramos
Ravin Kumar
Sammy Jerome
Anton Tsitsulin
Nino Vieillard
Piotr Stańczyk
Sertan Girgin
Nikola Momchev
Matt Hoffman
Shantanu Thakoor
Jean-Bastien Grill
Behnam Neyshabur
Alanna Walton
Aliaksei Severyn
Alicia Parrish
Aliya Ahmad
Allen Hutchison
Alvin Abdagic
Amanda Carl
Amy Shen
Andy Brock
Andy Coenen
Anthony Laforge
Antonia Paterson
Ben Bastian
Bilal Piot
Boxi Wu
Brandon Royal
Charlie Chen
Chintu Kumar
Chris Perry
Christopher A. Welty
Christopher A. Choquette-Choo
Danila Sinopalnikov
David Weinberger
Dimple Vijaykumar
Dominika Rogozińska
D. Herbison
Elisa Bandy
Emma Wang
Eric Noland
Erica Moreira
Evan Senter
Evgenii Eltyshev
Gabriel Rasskin
Gary Wei
Glenn Cameron
Gus Martins
Hadi Hashemi
Hanna Klimczak-Plucińska
Harleen Batra
Harsh Dhand
Ivan Nardini
Jacinda Mein
Jack Zhou
James Svensson
Jeff Stanway
Jetha Chan
Jin Zhou
Joana Carrasqueira
Joana Iljazi
Jocelyn Becker
Joe Fernandez
Joost Van Amersfoort
Josh Gordon
Josh Lipschultz
Joshua Newlan
Junsong Ji
Kareem Mohamed
Kat Black
Katie Millican
Keelin McDonell
Kelvin Nguyen
Kiranbir Sodhia
Kish Greene
Lars Lowe Sjoesund
Lauren Usui
Laurent Sifre
L. Heuermann
Leticia Lago
Lilly McNealus
Livio Baldini Soares
Logan Kilpatrick
Lucas Dixon
Luciano Martins
Machel Reid
Manvinder Singh
Mark Iverson
Martin Gorner
Mat Velloso
Mateo Wirth
Matt Davidow
Matt Miller
Matthew Rahtz
Matthew Watson
Meg Risdal
Mehran Kazemi
Michael Moynihan
Ming Zhang
Minsuk Kahng
Minwoo Park
Mofi Rahman
Mohit Khatwani
Natalie Dao
Nenshad Bardoliwalla
N. Devanathan
Neta Dumai
Nilay Chauhan
O. Wahltinez
Pankil Botarda
Parker Barnes
Paul R. Barham
Paul Michel
Pengchong Jin
Petko Georgiev
Phil Culliton
Pradeep Kuppala
Ramona Comanescu
Ramona Merhej
Reena Jana
R. Rokni
Ryan Mullins
Samaneh Saadat
S. McCarthy
Sébastien M. R. Arnold
Sebastian Krause
Shengyang Dai
S. Garg
Shruti Sheth
S. Ronstrom
Susan Chan
Timothy Jordan
Ting Yu
Tom Eccles
Tom Hennigan
Tomas Kocisky
Tulsee Doshi
Vihan Jain
Vikas Yadav
Vilobh Meshram
Vishal Dharmadhikari
Warren Barkley
Wei Wei
Wenming Ye
Woohyun Han
Woosuk Kwon
Xiang Xu
Zhe Shen
Zhitao Gong
Zichuan Wei
Victor Cotruta
Phoebe Kirk
Anand Rao
Minh Giang
Ludovic Peran
Tris Brian Warkentin
Eli Collins
Joelle Barral
Zoubin Ghahramani
Raia Hadsell
D. Sculley
Jeanine Banks
Anca Dragan
Slav Petrov
Oriol Vinyals
Jeffrey Dean
Demis Hassabis
Koray Kavukcuoglu
Clément Farabet
Elena Buchatskaya
Sebastian Borgeaud
Noah Fiedel
Armand Joulin
Kathleen Kenealy
Robert Dadashi
Alek Andreev
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.
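The knowledge-distillation objective mentioned in this abstract replaces the usual one-hot next-token cross-entropy with matching the teacher's full next-token distribution. The sketch below is a minimal, self-contained illustration of that per-position KL objective; the shapes, names, and toy data are assumptions for illustration and not the Gemma training code.

```python
# Minimal sketch of token-level knowledge distillation: the student is trained
# to match the teacher's next-token distribution via KL(teacher || student),
# averaged over sequence positions, instead of predicting the one-hot token.
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits):
    """Mean KL(teacher || student) over positions; both arrays are (seq_len, vocab)."""
    p_teacher = softmax(teacher_logits)
    log_p_student = np.log(softmax(student_logits) + 1e-12)
    log_p_teacher = np.log(p_teacher + 1e-12)
    kl_per_position = (p_teacher * (log_p_teacher - log_p_student)).sum(axis=-1)
    return kl_per_position.mean()

rng = np.random.default_rng(0)
seq_len, vocab = 8, 32  # toy sizes
teacher_logits = rng.normal(size=(seq_len, vocab))
student_logits = rng.normal(size=(seq_len, vocab))
print("distillation loss:", distillation_loss(student_logits, teacher_logits))
```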
Nash Learning from Human Feedback
Remi Munos
Michal Valko
Daniele Calandriello
Mohammad Gheshlaghi Azar
Mark Rowland
Zhaohan Daniel Guo
Yunhao Tang
Matthieu Geist
Côme Fiegel
Andrea Michi
Marco Selvi
Sertan Girgin
Nikola Momchev
Olivier Bachem
Daniel J Mankowitz
Bilal Piot
Reinforcement learning from human feedback (RLHF) has emerged as the main paradigm for aligning large language models (LLMs) with human preferences. Traditionally, RLHF involves the initial step of learning a reward model from pairwise human feedback, i.e., expressed as preferences between pairs of text generations. Subsequently, the LLM's policy is fine-tuned to maximize the reward through a reinforcement learning algorithm. In this study, we introduce an alternative pipeline for the fine-tuning of LLMs using pairwise human feedback. Our approach entails the initial learning of a pairwise preference model, which is conditioned on two inputs (instead of a single input in the case of a reward model) given a prompt, followed by the pursuit of a policy that consistently generates responses preferred over those generated by any competing policy, thus defining the Nash equilibrium of this preference model. We term this approach Nash learning from human feedback (NLHF). In the context of a tabular policy representation, we present a novel algorithmic solution, Nash-MD, founded on the principles of mirror descent. This algorithm produces a sequence of policies, with the last iteration converging to the regularized Nash equilibrium. Additionally, we explore parametric representations of policies and introduce gradient descent algorithms for deep-learning architectures. We illustrate the effectiveness of our approach by presenting experimental results on a text summarization task. We believe NLHF offers a compelling avenue for fine-tuning LLMs and enhancing the alignment of LLMs with human preferences.
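To make the Nash-equilibrium objective in this abstract concrete, the toy sketch below approximates the Nash policy of a fixed tabular pairwise preference model using multiplicative-weights self-play. This is a simplified stand-in for illustration only, not the paper's Nash-MD algorithm; the preference table is invented.

```python
# Toy tabular illustration of the NLHF objective: given a preference model
# P[i, j] = Pr(response i is preferred to response j), find a policy that no
# single response beats more than half the time, i.e. the Nash equilibrium of
# the symmetric zero-sum game with payoff A = P - 1/2.
import numpy as np

# Hypothetical preference table over 3 candidate responses (rows vs columns).
P = np.array([
    [0.5, 0.7, 0.2],
    [0.3, 0.5, 0.6],
    [0.8, 0.4, 0.5],
])
A = P - 0.5          # zero-sum payoff: A[i, j] > 0 means i beats j on average
eta = 0.5            # step size
pi = np.ones(3) / 3  # start from the uniform policy
avg = np.zeros(3)

for _ in range(2000):
    pi = pi * np.exp(eta * (A @ pi))  # exponentiated-gradient / multiplicative-weights update
    pi = pi / pi.sum()
    avg += pi
avg /= 2000

print("approximate Nash policy:", np.round(avg, 3))
print("win rate of each pure response vs the policy:", np.round(P @ avg, 3))
```

Averaging the iterates is needed here because plain multiplicative-weights self-play can cycle; the Nash-MD algorithm described in the abstract instead regularizes toward a reference policy so that the last iterate converges to the regularized Nash equilibrium.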
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Aleksandar Botev
Soham De
Samuel L. Smith
Anushan Fernando
George-Cristian Muraru
Ruba Haroun
Leonard Berrada
Pier Giuseppe Sessa
Robert Dadashi
Léonard Hussenot
Johan Ferret
Sertan Girgin
Olivier Bachem
Alek Andreev
Kathleen Kenealy
Cassidy Hardin
Surya Bhupatiraju
Shreya Pathak
Laurent Sifre
Morgane Rivière
Mihir Kale
J Christopher Love
Juliette Love
Pouya Dehghani Tafti
Armand Joulin
Noah Fiedel
Evan Senter
Yutian Chen
Srivatsan Srinivasan
David Mark Budden
Arnaud Doucet
Sharad Mandyam Vikram
Adam Paszke
Trevor Gale
Sebastian Borgeaud
Charlie Chen
Andy Brock
Antonia Paterson
Jenny Brennan
Meg Risdal
Raj Gundluru
N. Devanathan
Paul Mooney
Nilay Chauhan
Phil Culliton
Luiz Gustavo Martins
Elisa Bandy
David W. Huntsperger
Glenn Cameron
Arthur Zucker
Tris Brian Warkentin
Ludovic Peran
Minh Giang
Zoubin Ghahramani
Clément Farabet
Koray Kavukcuoglu
Demis Hassabis
Raia Hadsell
Yee Whye Teh
Nando de Freitas
We introduce RecurrentGemma, a family of open language models which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-trained and instruction tuned variants for both. Our models achieve comparable performance to similarly-sized Gemma baselines despite being trained on fewer tokens.
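The fixed-size state mentioned in this abstract is the key efficiency property of the linear-recurrence layers in Griffin. The sketch below illustrates the idea with a generic gated linear recurrence; it is not the actual RG-LRU parameterization used in RecurrentGemma, and the gate values are invented, but it shows why per-step inference memory stays constant instead of growing like a Transformer KV cache.

```python
# Minimal illustration of a fixed-state gated linear recurrence:
#   h_t = a_t * h_{t-1} + b_t * x_t  (elementwise over the hidden dimension)
# The whole recurrent state is a single vector of size `hidden`, so memory per
# decoding step does not grow with sequence length.
import numpy as np

def linear_recurrence(x, a, b):
    """x, a, b: (seq_len, hidden) arrays; |a_t| < 1 keeps the state stable."""
    seq_len, hidden = x.shape
    h = np.zeros(hidden)              # the entire recurrent state
    outputs = np.empty_like(x)
    for t in range(seq_len):
        h = a[t] * h + b[t] * x[t]    # constant-size state update per token
        outputs[t] = h
    return outputs

rng = np.random.default_rng(0)
seq_len, hidden = 16, 4
x = rng.normal(size=(seq_len, hidden))
a = rng.uniform(0.8, 0.99, size=(seq_len, hidden))  # input-dependent decay gates (toy values)
b = 1.0 - a                                         # simple complementary input gate
y = linear_recurrence(x, a, b)
print("last hidden state:", np.round(y[-1], 3))
```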
Nash Learning from Human Feedback
Remi Munos
Michal Valko
Daniele Calandriello
Mohammad Gheshlaghi Azar
Mark Rowland
Zhaohan Daniel Guo
Yunhao Tang
Matthieu Geist
Andrea Michi
Marco Selvi
Sertan Girgin
Nikola Momchev
Olivier Bachem
Daniel J Mankowitz
Bilal Piot
Reinforcement learning from human feedback (RLHF) has emerged as the main paradigm for aligning large language models (LLMs) with human preferences. Typically, RLHF involves the initial step of learning a reward model from human feedback, often expressed as preferences between pairs of text generations produced by a pre-trained LLM. Subsequently, the LLM's policy is fine-tuned by optimizing it to maximize the reward model through a reinforcement learning algorithm. However, an inherent limitation of current reward models is their inability to fully represent the richness of human preferences and their dependency on the sampling distribution. In this study, we introduce an alternative pipeline for the fine-tuning of LLMs using pairwise human feedback. Our approach entails the initial learning of a preference model, which is conditioned on two inputs given a prompt, followed by the pursuit of a policy that consistently generates responses preferred over those generated by any competing policy, thus defining the Nash equilibrium of this preference model. We term this approach Nash learning from human feedback (NLHF). In the context of a tabular policy representation, we present a novel algorithmic solution, Nash-MD, founded on the principles of mirror descent. This algorithm produces a sequence of policies, with the last iteration converging to the regularized Nash equilibrium. Additionally, we explore parametric representations of policies and introduce gradient descent algorithms for deep-learning architectures. To demonstrate the effectiveness of our approach, we present experimental results involving the fine-tuning of a LLM for a text summarization task. We believe NLHF offers a compelling avenue for preference learning and policy optimization with the potential of advancing the field of aligning LLMs with human preferences.