
Francesco Visin

Alumni

Publications

Gemma 3 Technical Report
Gemma Team
Aishwarya Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
Ramona Merhej
Tatiana Matejovicova
Alexandre Ramé
Morgane Rivière
Louis Rouillard
Geoffrey Cideron
Jean-Bastien Grill
Sabela Ramos
Edouard Yvinec
Michelle Casbon
Etienne Pot
Ivo Penchev
Gael Liu
Kathleen Kenealy
Lucas Beyer
Xiaohua Zhai
Anton Tsitsulin
Róbert Busa-Fekete
Alex Feng
Noveen Sachdeva
Benjamin Coleman
Yi Gao
Basil Mustafa
Iain Barr
Emilio Parisotto
David Tian
Matan Eyal
Colin Cherry
Jan-Thorsten Peter
Danila Sinopalnikov
Surya Bhupatiraju
Mehran Kazemi
Dan Malkin
Ravin Kumar
David Vilar
Idan Brusilovsky
Jiaming Luo
Andreas Steiner
Abe Friesen
Abhanshu Sharma
Abheesht Sharma
Adi Mayrav Gilady
Adrian Goedeckemeyer
Alaa Saade
Alexander Kolesnikov
Alexei Bendebury
Alvin Abdagic
Amit Vadi
András György
André Susano Pinto
Anil Das
Ankur Bapna
Antoine Miech
Antoine Yang
Antonia Paterson
Ashish Shenoy
Ayan Chakrabarti
Bilal Piot
Boxi Wu
Bobak Shahriari
Bryce Petrini
Charlie Chen
Christopher A. Choquette-Choo
CJ Carey
Cormac Brick
Daniel Deutsch
Danielle Eisenbud
Dee Cattle
Derek Cheng
Dimitris Paparas
Divyashree Shivakumar Sreepathihalli
Doug Reid
Dustin Tran
Dustin Zelle
Eric Noland
Erwin Huizenga
Eugene Kharitonov
Frederick Liu
Gagik Amirkhanyan
Glenn Cameron
Hadi Hashemi
Hanna Klimczak-Plucińska
Harman Singh
Harsh Mehta
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
Jetha Chan
Joe Stanton
J. Michael Wieting
Jonathan Lai
Jordi Orbay
Joe Fernandez
Joshua Newlan
Junsong Ji
Jyotinder Singh
Kat Black
Kathy Yu
Kevin Hui
Kiran N. Vodrahalli
Klaus Greff
Linhai Qiu
Marcella Valentine
Marina Coelho
Marvin Ritter
Matt Hoffman
Matthew Watson
Mayank Chaturvedi
Michael Moynihan
Min Ma
Nabila Babar
Natasha Noy
Nathan Byrd
Nick Roy
Nikola Momchev
Nilay Chauhan
Oskar Bunyan
Pankil Botarda
Paul Caron
Paul Kishan Rubenstein
Phil Culliton
Philipp Schmid
Pier Giuseppe Sessa
Pingmei Xu
Piotr Stańczyk
Pouya Dehghani Tafti
Rakesh Shivanna
Renjie Wu
Renke Pan
R. Rokni
Rob Willoughby
Rohith Vallu
Ryan Mullins
Sammy Jerome
Sara Smoot
Sertan Girgin
Shariq Iqbal
Shashir Reddy
Shruti Sheth
Siim Põder
Sijal Bhatnagar
S. Panyam
Sivan Eiger
Susan Zhang
Tianqi Liu
Trevor Yacovone
T. Liechty
Uday Kalra
Utku Evci
Vedant Misra
Vincent Roseberry
Vladimir Feinberg
Vlad Kolesnikov
Woohyun Han
Woosuk Kwon
X. T. Chen
Yinlam Chow
Yuvein Zhu
Zichuan Wei
Z. Egyed
Victor Cotruta
Minh Giang
Phoebe Kirk
Anand Rao
Jessica Lo
Erica Moreira
Luiz Gustavo Martins
Omar Sanseviero
Lucas Gonzalez
Zach Gleicher
Tris Brian Warkentin
Seyed Vahab Mirrokni
Evan Senter
Eli Collins
Joelle Barral
Zoubin Ghahramani
Raia Hadsell
Yossi Matias
D. Sculley
Slav Petrov
Noah Fiedel
Noam M. Shazeer
Oriol Vinyals
Jeffrey Dean
Demis Hassabis
Koray Kavukcuoglu
Clément Farabet
Elena Buchatskaya
Jean-Baptiste Alayrac
Rohan Anil
Dmitry Lepikhin
Sebastian Borgeaud
Olivier Bachem
Armand Joulin
Alek Andreev
Cassidy Hardin
Robert Dadashi
Léonard Hussenot
We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers, and keeping the span of local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction-finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
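The KV-cache saving described in the abstract can be illustrated with a back-of-the-envelope sketch. This is not the Gemma 3 implementation; the layer count, interleaving ratio, and window length below are hypothetical numbers chosen only to show why mixing short-window local layers with a few full-context global layers shrinks the cache.

```python
# Illustrative sketch (not the Gemma 3 implementation): estimate the total
# number of cached key/value positions across layers, assuming local layers
# only cache a short sliding window while global layers cache the full context.

def kv_cache_positions(n_layers, locals_per_global, context_len, window_len):
    """Total cached positions, with `locals_per_global` local attention
    layers interleaved between consecutive global attention layers."""
    n_global = n_layers // (locals_per_global + 1)
    n_local = n_layers - n_global
    return n_global * context_len + n_local * window_len

# Hypothetical configuration: 32 layers, 128K-token context, 1K local window.
all_global = kv_cache_positions(32, 0, 128_000, 128_000)  # every layer global
mixed = kv_cache_positions(32, 5, 128_000, 1_000)         # 5 local per global

print(f"cache shrinks to {mixed / all_global:.1%} of the all-global baseline")
```

Raising the local:global ratio or shortening the window shrinks the cache further, at the cost of fewer layers that can attend over the whole context.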
PixelVAE: A Latent Variable Model for Natural Images
Natural image modeling is a landmark challenge of unsupervised learning. Variational Autoencoders (VAEs) learn a useful latent representation and model global structure well but have difficulty capturing small details. PixelCNN models details very well, but lacks a latent code and is difficult to scale for capturing large structures. We present PixelVAE, a VAE model with an autoregressive decoder based on PixelCNN. Our model requires very few expensive autoregressive layers compared to PixelCNN and learns latent codes that are more compressed than a standard VAE while still capturing most non-trivial structure. Finally, we extend our model to a hierarchy of latent variables at different scales. Our model achieves state-of-the-art performance on binarized MNIST, competitive performance on 64 × 64 ImageNet, and high-quality samples on the LSUN bedrooms dataset.
ReSeg: A Recurrent Neural Network-based Model for Semantic Segmentation
Adriana Romero
Matteo Matteucci
Marco Ciccone
We propose a structured prediction architecture, which exploits the local generic features extracted by Convolutional Neural Networks and the capacity of Recurrent Neural Networks (RNNs) to retrieve distant dependencies. The proposed architecture, called ReSeg, is based on the recently introduced ReNet model for image classification. We modify and extend it to perform the more challenging task of semantic segmentation. Each ReNet layer is composed of four RNNs that sweep the image horizontally and vertically in both directions, encoding patches or activations, and providing relevant global information. Moreover, ReNet layers are stacked on top of pre-trained convolutional layers, benefiting from generic local features. Upsampling layers follow ReNet layers to recover the original image resolution in the final predictions. The proposed ReSeg architecture is efficient, flexible and suitable for a variety of semantic segmentation tasks. We evaluate ReSeg on several widely-used semantic segmentation datasets: Weizmann Horse, Oxford Flower, and CamVid; achieving state-of-the-art performance. Results show that ReSeg can act as a suitable architecture for semantic segmentation tasks, and may have further applications in other structured prediction problems. The source code and model hyperparameters are available on https://github.com/fvisin/reseg.
Theano: A Python framework for fast computation of mathematical expressions
Rami Al-Rfou
Amjad Almahairi
Christof Angermueller
Frédéric Bastien
Justin Bayer
Anatoly Belikov
Alexander Belopolsky
Josh Bleecher Snyder
Pierre-Luc Carrier
Paul Christiano
Myriam Côté
Yann N. Dauphin
Julien Demouth
Sander Dieleman
Ziye Fan
Mathieu Germain
Matt Graham
Balázs Hidasi
Arjun Jain
Kai Jia
Mikhail Korobov
Vivek Kulkarni
Pascal Lamblin
Eric Larsen
Sean Lee
Simon Lefrancois
Jesse A. Livezey
Cory Lorenz
Jeremiah Lowin
Qianli Ma
Robert T. McGibbon
Mehdi Mirza
Alberto Orlandi
Christopher Pal
Colin Raffel
Daniel Renshaw
Matthew Rocklin
Adriana Romero
Markus Roth
Peter Sadowski
John Salvatier
Jan Schlüter
John Schulman
Gabriel Schwartz
Iulian Vlad Serban
Samira Shabanian
Sigurd Spieckermann
S. Ramana Subramanyam
Gijs van Tulder
Sebastian Urban
Dustin J. Webb
Matthew Willson
Lijun Xue
Theano is a Python library that allows one to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano has been actively and continuously developed since 2008; multiple frameworks have been built on top of it and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.