Ross Goroshin

Core Industry Member
Adjunct professor, Université de Montréal, Department of Computer Science and Operations Research
Google DeepMind

Research Topics

Applied AI
Computer Vision
Deep Learning
Dynamical Systems
Representation Learning

Biography

Ross Goroshin is a Research Scientist at Google DeepMind in Montreal and a Core Industry Member at Mila - Quebec Artificial Intelligence Institute. He holds a PhD in Computer Science from NYU, where he was advised by Yann LeCun. He also earned a B.Eng. in Electrical Engineering from Concordia University and an M.S. in Electrical Engineering from Georgia Tech. His research focuses on computer vision, self-supervised learning, and optimal control.

In addition to his roles at Google DeepMind and Mila, Ross serves as an adjunct professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal.

Publications

TAPNext: Tracking Any Point (TAP) as Next Token Prediction
Artem Zholus
Carl Doersch
Yi Yang
Skanda Koppula
Viorica Patraucean
Xu Owen He
Ignacio Rocco
Mehdi S. M. Sajjadi
Scaling 4D Representations
João Carreira
Dilara Gokay
Michael King
Chuhan Zhang
Ignacio Rocco
Aravindh Mahendran
T. Keck
Joseph Heyward
Skanda Koppula
Etienne Pot
Goker Erdogan
Yana Hasson
Yi Yang
Klaus Greff
Guillaume Le Moing
Sjoerd van Steenkiste
Daniel Zoran
Drew A. Hudson
Pedro Vélez
Luisa F. Polanía
Luke Friedman
Chris Duvarney
Kelsey Allen
Jacob Walker
Rishabh Kabra
Eric Aboussouan
Jennifer Sun
Thomas Kipf
Carl Doersch
Viorica Pătrăucean
Dima Damen
Pauline Luc
Mehdi S. M. Sajjadi
Andrew Zisserman
Scaling has not yet been convincingly demonstrated for pure self-supervised learning from video. However, prior work has focused evaluations on semantic-related tasks
TRecViT: A Recurrent Video Transformer
Viorica Pătrăucean
Xu Owen He
Joseph Heyward
Chuhan Zhang
Mehdi S. M. Sajjadi
George-Cristian Muraru
Artem Zholus
Mahdi Karami
Yutian Chen
Simon Kayode Osindero
João Carreira
We propose a novel block for video modelling. It relies on a time-space-channel factorisation with dedicated blocks for each dimension: gated linear recurrent units (LRUs) perform information mixing over time, self-attention layers perform mixing over space, and MLPs over channels. The resulting architecture TRecViT performs well on sparse and dense tasks, trained in supervised or self-supervised regimes. Notably, our model is causal and outperforms or is on par with a pure attention model ViViT-L on large scale video datasets (SSv2, Kinetics400), while having
BootsTAP: Bootstrapped Training for Tracking-Any-Point
Carl Doersch
Yi Yang
Dilara Gokay
Pauline Luc
Skanda Koppula
Ankush Gupta
Joseph Heyward
Ignacio Rocco
João Carreira
Andrew Zisserman
To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes. This can be formalized as Tracking-Any-Point (TAP), which requires the algorithm to track any point on solid surfaces in a video, potentially densely in space and time. Large-scale ground-truth training data for TAP is only available in simulation, which currently has a limited variety of objects and motion. In this work, we demonstrate how large-scale, unlabeled, uncurated real-world data can improve a TAP model with minimal architectural changes, using a self-supervised student-teacher setup. We demonstrate state-of-the-art performance on the TAP-Vid benchmark, surpassing previous results by a wide margin: for example, TAP-Vid-DAVIS performance improves from 61.3% to 67.4%, and TAP-Vid-Kinetics from 57.2% to 62.5%. For visualizations, see our project webpage at https://bootstap.github.io/
Satellite Sunroof: High-res Digital Surface Models and Roof Segmentation for Global Solar Mapping
Vishal Batchu
A. Wilson
Betty Peng
Carl D. Elkin
Umangi Jain
Christopher Van Arsdale
Varun Gulshan
The transition to renewable energy, particularly solar, is key to mitigating climate change. Google's Solar API aids this transition by estimating solar potential from aerial imagery, but its impact is constrained by geographical coverage. This paper proposes expanding the API's reach using satellite imagery, enabling global solar potential assessment. We tackle challenges involved in building a Digital Surface Model (DSM) and roof instance segmentation from lower-resolution and single oblique views using deep learning models. Our models, trained on aligned satellite and aerial datasets, produce 25 cm DSMs and roof segments. With ~1 m DSM MAE on buildings, ~5° roof pitch error and ~56% IoU on roof segmentation, they significantly enhance the Solar API's potential to promote solar adoption.
Course Correcting Koopman Representations
Mahan Fathi
Clement Gehring
Jonathan Pilault
David Kanaa
Block-State Transformers
Jonathan Pilault
Mahan Fathi
Orhan Firat