Joyce Chai

Independent visiting researcher

Supervisor

Siva Reddy

Co-supervisor

Yoshua Bengio

Research Topics

Agency

AGI (Artificial General Intelligence)

Conversational AI

Multimodal Learning

Natural Language Processing

Reasoning

Publications

S5 Framework: A Review of Self-Supervised Shared Semantic Space Optimization for Multimodal Zero-Shot Learning

Yonatan Bisk

Ari Holtzman

Jesse Thomason

Jacob

Yoshua Bengio

Joyce Chai

Angeliki Lapata

Jonathan Lazaridou

Aleksandr May

Nicolas Nisnevich

P. Pinto

Joseph Turian

Ting Chen

Simon Kornblith

Mohammad Norouzi

Yen-Chun Chen

Linjie Li

Licheng Yu

Ahmed El

Faisal Kholy

Zhe Ahmed

Yu Gan

Cheng

Zihan Dai

Hanxiao Liu

Quoc V. Le

Jia Deng

Wei Dong

Richard Socher

Li-Jia Li

K. Liu

Jacob Devlin

Ming-Wei Chang

Kenton Lee

Jesse Dodge

Maarten Sap

Ana Marasovic

Gabriel Agnew

Dirk Ilharco

Groeneveld Matt

Li Dong

Nan Yang

Wenhui Wang

Furu Wei

Yu Liu

Jianfeng Wang

Ming Gao

Zhou

Xiaoyi Dong

Jia Bao

Ting Zhang

Dongdong

Weiming Chen

Lu Zhang

Dong Yuan

Fang Chen

Da-cheng Juan

Chuntian Lu

Zhen Li

Futang Peng

Aleksei Timofeev

Yi-Ting Chen

Yaxi Gao

Tom

Andrew Duerig

Tomkins Sujith

Ravi

Lukasz Kaiser

Aidan N. Gomez

Noam M. Shazeer

Niki Vaswani

Llion Parmar

Jones Jakob

Uszkoreit

Alex G. Kendall

Yarin Gal

Roberto Cipolla

Salman H. Khan

Muzammal Naseer

Munawar Hayat

Waqas Zamir

Fahad Shahbaz

Khan

Ranjay Krishna

Yuke Zhu

Oliver Groth

Justin John-

Kenji Hata

Joshua Kravitz

Stephanie Chen

Mike Lewis

Yinhan Liu

Marjan Naman Goyal

Abdelrahman Ghazvininejad

Omer Mohamed

Levy

Luke Zettlemoyer

Bohan Li

Hao Zhou

Jun-Tao He

Mingxuan Wang

Liunian Harold

Mark Li

Da Yatskar

Yin

Cho-Jui

Kai-Wei Chang

In this review, we aim to inspire research into Self-Supervised Shared Semantic Space (S5) multimodal learning problems. We equip non-expert researchers with a framework for making informed modeling decisions, consisting of an extensive literature review, an actionable modeling checklist, and a series of novel zero-shot evaluation tasks. The core idea of our S5 checklist is to learn contextual multimodal interactions at various levels of granularity via a shared Transformer encoder trained with a denoising loss term, which is regularized by a contrastive loss term to induce a semantic alignment prior on the contextual embedding space. Essentially, we aim to model human concept understanding and thus learn to “put a name to a face”. This ultimately enables interpretable zero-shot S5 generalization on a variety of novel downstream tasks. In summary, this review provides the background and actionable strategies needed to train cutting-edge S5 multimodal networks.

Seeing things or seeing scenes: Investigating the capabilities of V&L models to align scene descriptions to images

Matt D Anderson

Erich W Graf

James H Elder

Peter Anderson

Xiaodong He

Chris Buehler

Mark Teney

Stephen Johnson

Gould Lei

Emily M. Bender

Timnit Gebru

Angelina McMillan-

Alexander Koller

Yonatan Bisk

Ari Holtzman

Jesse Thomason

Yoshua Bengio

Joyce Chai

Angeliki Lazaridou

Jonathan May

Aleksandr

Thomas Unterthiner

Mostafa Dehghani

Georg Minderer

Sylvain Heigold

Jakob Gelly

Uszkoreit Neil

Houlsby

Lisa Anne Hendricks

Gabriel Ilharco

Rowan Zellers

Ali Farhadi

John M. Henderson

Thomas L. Griffiths

Melissa L.-H. Võ

Jeremy M. Wolfe

Jianfeng Wang

Xiaowei Hu

Xiu-

Pengchuan Zhang

Roy Schwartz

Bolei Zhou

Àgata Lapedriza

Jianxiong Xiao

Hang Zhao

Xavier Puig

Sanja Fidler

Images can be described in terms of the objects they contain, or in terms of the types of scene or place that they instantiate. In this paper, we investigate to what extent pretrained Vision and Language (V&L) models can learn to align descriptions of both types with images. We compare three state-of-the-art models: VisualBERT, LXMERT, and CLIP. We find that (i) V&L models are susceptible to stylistic biases acquired during pretraining, and (ii) only CLIP performs consistently well on both object- and scene-level descriptions. A follow-up ablation study shows that CLIP uses object-level information in the visual modality to align with scene-level textual descriptions.
