Publications

A self-attention-based CNN-Bi-LSTM model for accurate state-of-charge estimation of lithium-ion batteries

Zeinab Sherkatghanad

Amin Ghazanfari

Vladimir Makarenkov

2024-05-01

Journal of Energy Storage (published)

doi.org

SelfIE: Self-Interpretation of Large Language Model Embeddings

Haozhe Chen

Carl Vondrick

Chengzhi Mao

How do large language models (LLMs) obtain their answers? The ability to explain and control an LLM's reasoning process is key for reliability, transparency, and future model development. We propose SelfIE (Self-Interpretation of Embeddings), a framework that enables LLMs to interpret their own embeddings in natural language by leveraging their ability to respond to inquiries about a given passage. Capable of interpreting open-world concepts in the hidden embeddings, SelfIE reveals LLM internal reasoning in cases such as making ethical decisions, internalizing prompt injection, and recalling harmful knowledge. SelfIE's text descriptions of hidden embeddings also open up new avenues to control LLM reasoning. We propose Supervised Control, which allows editing open-ended concepts while only requiring gradient computation of an individual layer. We extend RLHF to hidden embeddings and propose Reinforcement Control, which erases harmful knowledge in LLMs without supervision targets.

2024-05-01

ICML.cc/2024/Conference (poster)

doi.org

openreview.net

Stochastic positional embeddings improve masked image modeling

Amir Bar

Florian Bordes

Assaf Shocher

Mahmoud Assran

Pascal Vincent

Nicolas Ballas

Trevor Darrell

Amir Globerson

Yann LeCun

Masked Image Modeling (MIM) is a promising self-supervised learning approach that enables learning from unlabeled images. Despite its recent success, learning good representations through MIM remains challenging because it requires predicting the right semantic content in accurate locations. For example, given an incomplete picture of a dog, we can guess that there is a tail, but we cannot determine its exact location. In this work, we propose to incorporate location uncertainty into MIM by using stochastic positional embeddings (StoP). Specifically, we condition the model on stochastic masked token positions drawn from a Gaussian distribution. StoP reduces overfitting to location features and guides the model toward learning features that are more robust to location uncertainties. Quantitatively, StoP improves MIM performance on a variety of downstream tasks, including…

2024-05-01

ICML.cc/2024/Conference (poster)

openreview.net

Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

Jesse Farebrother

Jordi Orbay

Quan Vuong

Adrien Ali Taiga

Yevgen Chebotar

Ted Xiao

Alex Irpan

Sergey Levine

Pablo Samuel Castro

Aleksandra Faust

Aviral Kumar

Rishabh Agarwal

2024-05-01

ICML.cc/2024/Conference (oral)

doi.org

openreview.net

A Tensor Decomposition Perspective on Second-order RNNs

Maude Lizaire

Michael Rizvi-Martel

Marawan Gamal

Guillaume Rabusseau

2024-05-01

ICML.cc/2024/Conference (spotlight)

proceedings.mlr.press

openreview.net

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

Xing Han Lu

Zdeněk Kasner

Siva Reddy

2024-05-01

ICML.cc/2024/Conference (spotlight)

doi.org

openreview.net

Does Generative AI speak Nigerian-Pidgin?: Issues about Representativeness and Bias for Multilingualism in LLMs

David Ifeoluwa Adelani

A. Seza Doğruöz

Iyanuoluwa Shode

Aremu Anuoluwapo

2024-04-30

ArXiv (preprint)

arxiv.org

Modeling Caption Diversity in Contrastive Vision-Language Pretraining

Samuel Lavoie

Polina Kirichenko

Mark Ibrahim

Mahmoud Assran

Andrew Gordon Wilson

Aaron Courville

Nicolas Ballas

There are a thousand ways to caption an image. Contrastive Language-Image Pretraining (CLIP), on the other hand, works by mapping an image and its caption to a single vector, limiting how well CLIP-like models can represent the diverse ways to describe an image. In this work, we introduce Llip, Latent Language Image Pretraining, which models the diversity of captions that could match an image. Llip's vision encoder outputs a set of visual features that are mixed into a final representation by conditioning on information derived from the text. We show that Llip outperforms non-contextualized baselines like CLIP and SigLIP on a variety of tasks, even with large-scale encoders. With a ViT-G/14 encoder, Llip improves zero-shot classification by an average of 2.9% across zero-shot classification benchmarks. Specifically, Llip attains a zero-shot top-1 accuracy of 83.5% on ImageNet, outperforming a similarly sized CLIP by 1.4%. We also demonstrate a 6.0% improvement in zero-shot retrieval on MS-COCO. We provide a comprehensive analysis of the components introduced by the method and demonstrate that Llip leads to richer visual representations.

2024-04-30

ArXiv (preprint)

doi.org

arxiv.org

Semantically Consistent Video Inpainting with Conditional Diffusion Models

Dylan Green

William Harvey

Saeid Naderiparizi

Matthew Niedoba

Yunpeng Liu

Xiaoxuan Liang

Jonathan Wilder Lavington

Ke Zhang

Vasileios Lioutas

Setareh Dabiri

Adam Ścibior

Berend Zwartsenberg

Frank Wood

Current state-of-the-art methods for video inpainting typically rely on optical flow or attention-based approaches to inpaint masked regions by propagating visual information across frames. While such approaches have led to significant progress on standard benchmarks, they struggle with tasks that require the synthesis of novel content that is not present in other frames. In this paper, we reframe video inpainting as a conditional generative modeling problem and present a framework for solving such problems with conditional video diffusion models. We highlight the advantages of using a generative approach for this task, showing that our method is capable of generating diverse, high-quality inpaintings and synthesizing new content that is spatially, temporally, and semantically consistent with the provided context.

2024-04-30

ArXiv (preprint)

doi.org

openreview.net

8-inch Wafer-scale Epitaxial Monolayer MoS2.

Hua Yu

Liangfeng Huang

Lanying Zhou

Yalin Peng

Xiuzhen Li

Peng Yin

Jiaojiao Zhao

M. Zhu

Shuopei Wang

Jieying Liu

Hongyue Du

Jian Tang

Songge Zhang

Yuchao Zhou

Nianpeng Lu

Kaihui Liu

Na Li

Guangyu Zhang

Large-scale, high-quality, and uniform monolayer MoS2 films are crucial for their applications in next-generation electronics and optoelectronics. Epitaxy is a mainstream technique for achieving high-quality MoS2 films and has been demonstrated at wafer scales of up to 4 inches. In this study, we report the epitaxial growth of 8-inch wafer-scale, highly oriented monolayer MoS2 on sapphire with excellent spatial homogeneity, using a specially designed vertical chemical vapor deposition (VCVD) system. Field-effect transistors (FETs) fabricated on the as-grown 8-inch wafer-scale monolayer MoS2 film exhibited high performance, with an average mobility of 53.5 cm² V⁻¹ s⁻¹ and an on/off ratio of 10⁷. In addition, batch fabrication of logic devices and 11-stage ring oscillators was demonstrated, showcasing excellent electrical functionality. Our work may pave the way for MoS2 in practical industry-scale applications.

2024-04-29

Advanced Materials (published)

doi.org