Guillaume Desjardins

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Aleksandar Botev

Soham De

Samuel L. Smith

Anushan Fernando

George-Cristian Muraru

Ruba Haroun

Leonard Berrada

Razvan Pascanu

Pier Giuseppe Sessa

Robert Dadashi

Léonard Hussenot

Johan Ferret

Sertan Girgin

Olivier Bachem

Alek Andreev

Kathleen Kenealy

Thomas Mesnard

Cassidy Hardin

Surya Bhupatiraju

Shreya Pathak

Laurent Sifre

Morgane Rivière

Mihir Kale

J Christopher Love

Juliette Love

Pouya Dehghani Tafti

Armand Joulin

Noah Fiedel

Evan Senter

Yutian Chen

Srivatsan Srinivasan

Guillaume Desjardins

David Mark Budden

Arnaud Doucet

Sharad Mandyam Vikram

Adam Paszke

Trevor Gale

Sebastian Borgeaud

Charlie Chen

Andy Brock

Antonia Paterson

Jenny Brennan

Meg Risdal

Raj Gundluru

N. Devanathan

Paul Mooney

Nilay Chauhan

Phil Culliton

Luiz Gustavo Martins

Elisa Bandy

David W. Huntsperger

Glenn Cameron

Arthur Zucker

Tris Brian Warkentin

Ludovic Peran

Minh Giang

Zoubin Ghahramani

Clément Farabet

Koray Kavukcuoglu

Demis Hassabis

Raia Hadsell

Yee Whye Teh

Nando de Freitas

We introduce RecurrentGemma, a family of open language models that uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-size state, which reduces memory use and enables efficient inference on long sequences. We provide two sizes of models, containing 2B and 9B parameters, and provide pre-trained and instruction-tuned variants for both. Our models achieve performance comparable to similarly sized Gemma baselines despite being trained on fewer tokens.
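The abstract attributes the memory savings to Griffin's fixed-size state. As a rough illustration only (not the Griffin or RecurrentGemma code), the Python sketch below scans a simplified diagonal linear recurrence whose state holds one float per channel regardless of input length, and compares it with the key/value cache of an attention layer, which grows with sequence length unless a local window caps it. The function names `gated_linear_recurrence` and `attention_cache_floats`, the fixed per-channel `decay` (a real gated recurrence would compute it from the input), and the example window of 2048 tokens are hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Sequence


def gated_linear_recurrence(
    xs: Sequence[Sequence[float]],
    decay: Sequence[float],
) -> List[List[float]]:
    """Scan a diagonal linear recurrence over a sequence of input vectors.

    Per channel: h_t = a * h_{t-1} + sqrt(1 - a^2) * x_t, with 0 < a < 1.
    Simplification (assumption): `decay` is fixed per channel rather than
    computed from the input by learned gates.
    """
    state = [0.0] * len(decay)  # fixed-size state: one float per channel
    outputs: List[List[float]] = []
    for x in xs:
        # Each channel decays its old state and mixes in the scaled new input.
        state = [
            a * h + math.sqrt(1.0 - a * a) * xi
            for a, h, xi in zip(decay, state, x)
        ]
        outputs.append(list(state))
    return outputs


def attention_cache_floats(seq_len: int, dim: int, window: Optional[int] = None) -> int:
    """Floats one attention layer caches for keys and values while decoding.

    Global attention (window=None) keeps every past token; local attention
    keeps at most `window` tokens, so its cache stops growing.
    """
    kept = seq_len if window is None else min(seq_len, window)
    return 2 * kept * dim


if __name__ == "__main__":
    outs = gated_linear_recurrence([[1.0, 2.0]] * 1000, decay=[0.5, 0.9])
    print(len(outs), len(outs[-1]))                  # → 1000 2
    print(attention_cache_floats(10_000, 64))        # → 1280000
    print(attention_cache_floats(10_000, 64, 2048))  # → 262144
```

Under these simplifying assumptions, the recurrence's memory cost does not depend on sequence length, and restricting attention to a local window bounds its cache as well, which is one way to see how combining the two can keep inference memory fixed on long sequences.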

2024-04-10

ArXiv (preprint)

doi.org

arxiv.org

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Soham De

Samuel L. Smith

Anushan Fernando

Aleksandar Botev

George-Cristian Muraru

Albert Gu

Ruba Haroun

Leonard Berrada

Yutian Chen

Srivatsan Srinivasan

Guillaume Desjardins

Arnaud Doucet

David Mark Budden

Yee Whye Teh

Razvan Pascanu

Nando de Freitas

Caglar Gulcehre

2024-02-28

ArXiv (preprint)

arxiv.org

Theano: A Python framework for fast computation of mathematical expressions

Rami Al-Rfou

Guillaume Alain

Amjad Almahairi

Christof Angermueller

Dzmitry Bahdanau

Nicolas Ballas

Frédéric Bastien

Justin Bayer

Anatoly Belikov

Alexander Belopolsky

Josh Bleecher Snyder

Nicolas Boulanger-Lewandowski

Xavier Bouthillier

Alexandre De Brébisson

Olivier Breuleux … (see 92 more)

Pierre-Luc Carrier

Paul Christiano

Myriam Côté

Yann N. Dauphin

Julien Demouth

Sander Dieleman

Samira Ebrahimi Kahou

Ziye Fan

Mathieu Germain

Matt Graham

Balázs Hidasi

Arjun Jain

Kai Jia

Mikhail Korobov

Vivek Kulkarni

Alex Lamb

Pascal Lamblin

Eric Larsen

César Laurent

Sean Lee

Simon Lefrancois

Simon Lemieux

Nicholas Léonard

Zhouhan Lin

Jesse A. Livezey

Cory Lorenz

Jeremiah Lowin

Qianli Ma

Pierre-Antoine Manzagol

Robert T. McGibbon

Mehdi Mirza

Alberto Orlandi

Christopher Pal

Razvan Pascanu

Mohammad Pezeshki

Colin Raffel

Daniel Renshaw

Matthew Rocklin

Adriana Romero

Markus Roth

Peter Sadowski

John Salvatier

François Savard

Jan Schlüter

John Schulman

Gabriel Schwartz

Iulian Vlad Serban

Dmitriy Serdyuk

Samira Shabanian

Etienne Simon

Sigurd Spieckermann

S. Ramana Subramanyam

Gijs van Tulder

Sebastian Urban

Dustin J. Webb

Matthew Willson

Lijun Xue

Theano is a Python library that allows users to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers, especially in the machine learning community, and has shown steady performance improvements. Theano has been actively and continuously developed since 2008; multiple frameworks have been built on top of it, and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.

2015-12-31

arXiv (preprint)

doi.org

arxiv.org

Theano: Deep Learning on GPUs with Python

Frédéric Bastien

Pascal Lamblin

Ian G Goodfellow

In this paper, we present Theano, a framework in the Python programming language for defining, optimizing and evaluating expressions involving high-level operations on tensors. Theano offers most of NumPy’s functionality, but adds automatic symbolic differentiation, GPU support, and faster expression evaluation. Theano is a general mathematical tool, but it was developed with the goal of facilitating research in deep learning. The Deep Learning Tutorials introduce recent advances in deep learning, and showcase how Theano

2011-12-31

(published)

www.semanticscholar.org

Theano: A CPU and GPU Math Compiler in Python

James Bergstra

Olivier Breuleux

Frédéric Bastien

Pascal Lamblin

Theano is a compiler for mathematical expressions in Python that combines the convenience of NumPy's syntax with the speed of optimized native machine language. The user composes mathematical expressions in a high-level description that mimics NumPy's syntax and semantics, while being statically typed and functional (as opposed to imperative). These expressions allow Theano to provide symbolic differentiation. Before performing computation, Theano optimizes the choice of expressions, translates them into C++ (or CUDA for GPU), and compiles them into dynamically loaded Python modules, all automatically. Common machine learning algorithms implemented with Theano are from 1.6× to 7.5× faster than competitive alternatives (including those implemented with C/C++, NumPy/SciPy and MATLAB) when compiled for the CPU, and between 6.5× and 44× faster when compiled for the GPU. This paper illustrates how to use Theano, outlines the scope of the compiler, provides benchmarks on both CPU and GPU processors, and explains its overall design.
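The abstract describes a pipeline: compose a symbolic expression graph, differentiate it symbolically, optimize it, then generate and compile code. The toy sketch below mimics those steps in pure Python; it is not Theano's API. The node classes and the functions `grad`, `simplify`, `to_source`, and `compile_expr` are hypothetical, constant folding stands in for Theano's graph optimizations, and generating Python source for `eval` stands in for emitting and compiling C++ or CUDA.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Var:
    name: str  # assumed to be a valid Python identifier


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Var, Const, Add, Mul]


def grad(e: Expr, wrt: str) -> Expr:
    """Build a new graph for the symbolic derivative of e with respect to `wrt`."""
    if isinstance(e, Var):
        return Const(1.0 if e.name == wrt else 0.0)
    if isinstance(e, Const):
        return Const(0.0)
    if isinstance(e, Add):
        return Add(grad(e.left, wrt), grad(e.right, wrt))
    # Product rule: d(u * v) = du * v + u * dv
    return Add(Mul(grad(e.left, wrt), e.right), Mul(e.left, grad(e.right, wrt)))


def simplify(e: Expr) -> Expr:
    """Fold constants and drop identities (a tiny stand-in for graph optimization)."""
    if isinstance(e, (Var, Const)):
        return e
    l, r = simplify(e.left), simplify(e.right)
    if isinstance(l, Const) and isinstance(r, Const):
        return Const(l.value + r.value if isinstance(e, Add) else l.value * r.value)
    if isinstance(e, Add):
        if l == Const(0.0):
            return r
        if r == Const(0.0):
            return l
        return Add(l, r)
    if l == Const(0.0) or r == Const(0.0):
        return Const(0.0)
    if l == Const(1.0):
        return r
    if r == Const(1.0):
        return l
    return Mul(l, r)


def to_source(e: Expr) -> str:
    """Emit the graph as a fully parenthesized Python expression."""
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Const):
        return repr(e.value)
    op = "+" if isinstance(e, Add) else "*"
    return f"({to_source(e.left)} {op} {to_source(e.right)})"


def compile_expr(e: Expr, *args: str) -> Callable[..., float]:
    """Turn generated source into a callable (only safe for graphs we built ourselves)."""
    return eval(f"lambda {', '.join(args)}: {to_source(e)}")


if __name__ == "__main__":
    x, y = Var("x"), Var("y")
    f = Add(Mul(x, x), Mul(Const(3.0), y))  # f(x, y) = x*x + 3*y
    df_dx = simplify(grad(f, "x"))          # differentiate, then optimize
    print(to_source(df_dx))                 # → (x + x)
    print(compile_expr(df_dx, "x", "y")(3.0, 5.0))  # → 6.0
```

Keeping the derivative as an ordinary graph, rather than a number, is what lets the optimization and code-generation steps apply to gradients exactly as they apply to the original expression.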

2009-12-31

Proceedings of the Python in Science Conference (published)

doi.org
