Portrait of Chengzhi Mao

Chengzhi Mao

Core Academic Member
Assistant Professor, McGill University, Department of Electrical and Computer Engineering

Biography

Chengzhi Mao is an Assistant Professor in the Department of Electrical and Computer Engineering at McGill University and a Core Academic Member of Mila – Quebec Artificial Intelligence Institute. He completed his PhD in the Department of Computer Science at Columbia University, advised by Professors Carl Vondrick and Junfeng Yang. Before that, he earned a bachelor's degree in electrical engineering from Tsinghua University. His research focuses on machine learning and computer vision. By training machines to incorporate context through interaction and reasoning, he aims to build robust models capable of generalization. His other interests include language, robotics, causal inference, and human-centered artificial intelligence.

Publications

SelfIE: Self-Interpretation of Large Language Model Embeddings
Haozhe Chen
Carl Vondrick
How do large language models (LLMs) obtain their answers? The ability to explain and control an LLM's reasoning process is key for reliability, transparency, and future model development. We propose SelfIE (Self-Interpretation of Embeddings), a framework that enables LLMs to interpret their own embeddings in natural language by leveraging their ability to respond to inquiries about a given passage. Capable of interpreting open-world concepts in the hidden embeddings, SelfIE reveals LLM internal reasoning in cases such as making ethical decisions, internalizing prompt injection, and recalling harmful knowledge. SelfIE's text descriptions of hidden embeddings also open up new avenues for controlling LLM reasoning. We propose Supervised Control, which allows editing open-ended concepts while only requiring gradient computation of individual layers. We also extend RLHF to hidden embeddings and propose Reinforcement Control, which erases harmful knowledge from an LLM without supervision targets.
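As a rough sketch of the self-interpretation idea (not the SelfIE implementation), the snippet below patches a hidden state captured from one forward pass into a placeholder position of an interpretation prompt and lets the model describe it in words. The gpt2 checkpoint, prompt wording, layer index, and placeholder token are all illustrative assumptions.

```python
# A rough sketch of self-interpretation (not the SelfIE implementation):
# patch a captured hidden state into the embedding of a placeholder token
# inside an interpretation prompt, then let the model describe it in words.
# Model, prompt, layer index, and placeholder token are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def interpret_embedding(hidden_vec: torch.Tensor) -> str:
    """Ask the model to describe a single hidden-state vector in words."""
    prompt = 'The passage " X " means:'  # " X" marks the injection slot
    ids = tok(prompt, return_tensors="pt").input_ids
    slot = (ids[0] == tok.encode(" X")[0]).nonzero()[0].item()

    # Replace the placeholder's input embedding with the captured hidden state.
    embeds = model.get_input_embeddings()(ids)
    embeds[0, slot] = hidden_vec.to(embeds.dtype)

    out = model.generate(inputs_embeds=embeds, max_new_tokens=20,
                         pad_token_id=tok.eos_token_id)
    return tok.decode(out[0], skip_special_tokens=True)

# Capture a hidden state from an ordinary forward pass, then interpret it.
with torch.no_grad():
    src = tok("Paris is the capital of France", return_tensors="pt")
    hidden = model(**src, output_hidden_states=True).hidden_states[6][0, -1]
print(interpret_embedding(hidden))
```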
Towards Causal Deep Learning for Vulnerability Detection
Md Mahbubur Rahman
Ira Ceka
Saikat Chakraborty
Baishakhi Ray
Wei Le
Deep learning vulnerability detection has shown promising results in recent years. However, an important challenge that still blocks it from being very useful in practice is that the model is not robust under perturbation and cannot generalize well to out-of-distribution (OOD) data, e.g., when a trained model is applied to unseen projects in the real world. We hypothesize that this is because the model has learned non-robust features, e.g., variable names, that have spurious correlations with labels. When the perturbed and OOD datasets no longer share the same spurious features, the model's predictions fail. To address this challenge, in this paper we introduce causality into deep learning vulnerability detection. Our approach, CausalVul, consists of two phases. First, we design novel perturbations to discover spurious features that the model may use to make predictions. Second, we apply causal learning algorithms, specifically do-calculus, on top of existing deep learning models to systematically remove the use of spurious features and thus promote causality-based prediction. Our results show that CausalVul consistently improved model accuracy, robustness, and OOD performance for all the state-of-the-art models and datasets we experimented with. To the best of our knowledge, this is the first work to introduce do-calculus-based causal learning to software engineering models and show that it is indeed useful for improving model accuracy, robustness, and generalization. Our replication package is located at https://figshare.com/s/0ffda320dcb96c249ef2.
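As a minimal illustration of the kind of semantics-preserving perturbation described above (not CausalVul's actual machinery), one can consistently rename variables in a code sample and check whether a detector's prediction flips. The regex-based renaming and the `predict` callable below are assumed placeholders.

```python
# A minimal sketch of a semantics-preserving perturbation for probing spurious
# features (not CausalVul's actual machinery): consistently rename variables in
# a code sample and check whether a vulnerability detector changes its label.
# The regex-based renaming and the `predict` callable are assumed placeholders.
import re
from typing import Callable, Dict

def rename_variables(code: str, mapping: Dict[str, str]) -> str:
    """Apply a consistent identifier renaming (whole-word matches only)."""
    for old, new in mapping.items():
        code = re.sub(rf"\b{re.escape(old)}\b", new, code)
    return code

def prediction_is_robust(code: str, mapping: Dict[str, str],
                         predict: Callable[[str], int]) -> bool:
    # A detector that relies on causal features rather than variable names
    # should keep the same label under this renaming.
    return predict(code) == predict(rename_variables(code, mapping))
```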
INViTE: INterpret and Control Vision-Language Models with Text Explanations
Haozhe Chen
Junfeng Yang
Carl Vondrick
Large-scale pre-trained vision foundation models, such as CLIP, have become de facto backbones for various vision tasks. However, due to their black-box nature, understanding the underlying rules behind these models' predictions and controlling model behaviors have remained open challenges. We present INViTE: a framework for INterpreting Vision Transformer's latent tokens with Text Explanations. Given a latent token, INViTE retains its semantic information up to the final layer using the transformer's local operations and retrieves the closest text for explanation. INViTE enables understanding of the model's visual reasoning procedure without any additional model training or data collection. Based on the obtained interpretations, INViTE allows for model editing that controls the model's reasoning behaviors and improves robustness against biases and spurious correlations. Our code is available at https://github.com/tonychenxyz/vit-interpret.
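The sketch below illustrates only the generic idea of reading out an intermediate vision-transformer token against text embeddings, not the INViTE procedure itself; the checkpoint, layer, token index, random input, and candidate captions are arbitrary placeholders.

```python
# A rough sketch (not the INViTE method): read out one intermediate ViT token
# of CLIP by mapping it through the final layer norm and projection, then
# retrieve the closest caption in the text embedding space. Checkpoint, layer,
# token index, and captions are arbitrary placeholders.
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

captions = ["a photo of a dog", "a photo of a car", "a photo of a tree"]
pixel_values = torch.randn(1, 3, 224, 224)  # stand-in for a real image

with torch.no_grad():
    vision_out = model.vision_model(pixel_values=pixel_values,
                                    output_hidden_states=True)
    token = vision_out.hidden_states[8][0, 10]  # one latent token, layer 8
    # Treat the latent token as if it were the final [CLS] token.
    feat = model.visual_projection(model.vision_model.post_layernorm(token))

    text_inputs = processor(text=captions, return_tensors="pt", padding=True)
    text_feats = model.get_text_features(**text_inputs)
    sims = F.cosine_similarity(feat.unsqueeze(0), text_feats, dim=-1)

print("closest text:", captions[sims.argmax().item()])
```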
Raidar: geneRative AI Detection viA Rewriting
Carl Vondrick
Hao Wang
Junfeng Yang
We find that large language models (LLMs) are more likely to modify human-written text than AI-generated text when tasked with rewriting. This tendency arises because LLMs often perceive AI-generated text as high quality, leading to fewer modifications. We introduce a method to detect AI-generated content by prompting LLMs to rewrite text and calculating the editing distance of the output. We dub our geneRative AI Detection viA Rewriting method Raidar. Raidar significantly improves the F1 detection scores of existing AI-content detection models -- both academic and commercial -- across various domains, including news, creative writing, student essays, code, Yelp reviews, and arXiv papers, with gains of up to 29 points. Operating solely on word symbols without high-dimensional features, our method is compatible with black-box LLMs and is inherently robust to new content. Our results illustrate the unique imprint of machine-generated text through the lens of the machines themselves.
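A minimal sketch of the rewrite-then-measure idea, assuming a generic black-box rewrite call: the prompt wording, the difflib similarity ratio (standing in for a formal edit distance), and the decision threshold are illustrative choices, not the paper's recipe.

```python
# A minimal sketch of rewrite-based AI-text detection in the spirit of Raidar.
# The prompt wording, the similarity measure (difflib ratio instead of a formal
# edit distance), and the decision threshold are illustrative assumptions,
# not the paper's exact recipe.
from difflib import SequenceMatcher
from typing import Callable

REWRITE_PROMPT = "Rewrite the following text and keep its meaning:\n\n"  # assumed prompt

def rewrite_similarity(text: str, rewrite: Callable[[str], str]) -> float:
    """Return a similarity score between the input and its LLM rewrite.

    `rewrite` is any black-box LLM call mapping a prompt to a completion;
    higher similarity means the model changed little, which the rewrite-based
    detection idea associates with machine-generated input.
    """
    rewritten = rewrite(REWRITE_PROMPT + text)
    return SequenceMatcher(None, text, rewritten).ratio()

def looks_ai_generated(text: str, rewrite: Callable[[str], str],
                       threshold: float = 0.85) -> bool:
    # Threshold chosen arbitrarily for illustration; in practice it would be
    # calibrated on labeled human-written and AI-generated text.
    return rewrite_similarity(text, rewrite) >= threshold
```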
Interpreting and Controlling Vision Foundation Models via Text Explanations
Haozhe Chen
Junfeng Yang
Carl Vondrick
Robust Perception through Equivariance
Lingyu Zhang
Abhishek Vaibhav Joshi
Junfeng Yang
Hao Wang
Carl Vondrick
Doubly Right Object Recognition: A Why Prompt for Visual Rationales
Revant Teotia
Amrutha Sundar
Sachit Menon
Junfeng Yang
Xin Wang
Carl Vondrick
Many visual recognition models are evaluated only on their classification accuracy, a metric for which they obtain strong performance. In this paper, we investigate whether computer vision models can also provide correct rationales for their predictions. We propose a “doubly right” object recognition benchmark, where the metric requires the model to simultaneously produce both the right labels and the right rationales. We find that state-of-the-art visual models, such as CLIP, often provide incorrect rationales for their categorical predictions. However, by transferring the rationales from language models into visual representations through a tailored dataset, we show that we can learn a “why prompt,” which adapts large visual representations to produce correct rationales. Visualizations and empirical experiments show that our prompts significantly improve performance on doubly right object recognition, in addition to zero-shot transfer to unseen tasks and datasets.
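For concreteness, here is a minimal sketch of a doubly-right style metric, where a prediction counts only if both the label and the rationale match; the data structures are assumed for illustration.

```python
# A minimal sketch of a "doubly right" style metric: a prediction counts only
# when both the predicted label and the predicted rationale are correct.
# The data structures are assumed for illustration.
from dataclasses import dataclass
from typing import Sequence

@dataclass
class LabeledExample:
    label: str
    rationale: str

def doubly_right_accuracy(preds: Sequence[LabeledExample],
                          gold: Sequence[LabeledExample]) -> float:
    hits = sum(p.label == g.label and p.rationale == g.rationale
               for p, g in zip(preds, gold))
    return hits / len(gold)

# Example: a right label with a wrong rationale does not count.
preds = [LabeledExample("zebra", "it has stripes"),
         LabeledExample("zebra", "it has a trunk")]
gold = [LabeledExample("zebra", "it has stripes"),
        LabeledExample("zebra", "it has stripes")]
print(doubly_right_accuracy(preds, gold))  # 0.5
```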
Robustifying Language Models with Test-Time Adaptation
Noah Thomas McDermott
Junfeng Yang
Large-scale language models have achieved state-of-the-art performance on a number of language tasks. However, they fail on adversarial language examples, which are sentences optimized to fool the language models while carrying similar semantic meanings for humans. While prior work focuses on making the language model robust at training time, retraining for robustness is often unrealistic for large-scale foundation models. Instead, we propose to make language models robust at test time. By dynamically adapting the input sentence with predictions from masked words, we show that we can reverse many language adversarial attacks. Since our approach does not require any training, it works for novel tasks at test time and can adapt to novel adversarial corruptions. Visualizations and empirical results on two popular sentence classification datasets demonstrate that our method can repair adversarial language attacks over 65% of the time.
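A minimal sketch of the masked-word repair idea, assuming a BERT fill-mask model, whole-word masking, and a simple keep-or-replace rule; none of these specific choices are claimed to be the paper's.

```python
# A minimal sketch of masked-word repair at test time, assuming a BERT
# fill-mask model, whole-word masking, and a simple keep-or-replace rule.
# None of these specific choices are claimed to be the paper's.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def repair_sentence(sentence: str) -> str:
    """Re-predict each word in turn; replace it only if the masked language
    model no longer finds the original word plausible at that position."""
    words = sentence.split()
    repaired = list(words)
    for i in range(len(words)):
        masked = " ".join(words[:i] + [fill_mask.tokenizer.mask_token] + words[i + 1:])
        candidates = [p["token_str"] for p in fill_mask(masked, top_k=5)]
        if words[i].lower() not in {c.lower() for c in candidates}:
            repaired[i] = candidates[0]
    return " ".join(repaired)

# The repaired sentence is then fed to the downstream classifier in place of
# the (possibly adversarially perturbed) original input.
```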
Understanding Zero-shot Adversarial Robustness for Large-Scale Models
Scott Geng
Junfeng Yang
Xin Wang
Carl Vondrick
Pretrained large-scale vision-language models like CLIP have exhibited strong generalization over unseen tasks. Yet imperceptible adversarial perturbations can significantly reduce CLIP's performance on new tasks. In this work, we identify and explore the problem of adapting large-scale models for zero-shot adversarial robustness. We first identify two key factors during model adaptation -- training losses and adaptation methods -- that affect the model's zero-shot adversarial robustness. We then propose a text-guided contrastive adversarial training loss, which aligns the text embeddings and the adversarial visual features with contrastive learning on a small set of training data. We apply this training loss to two adaptation methods: model finetuning and visual prompt tuning. We find that visual prompt tuning is more effective in the absence of text guidance, while finetuning wins when text guidance is available. Overall, our approach significantly improves the zero-shot adversarial robustness over CLIP, seeing an average improvement of 31 points over ImageNet and 15 zero-shot datasets. We hope this work can shed light on understanding the zero-shot adversarial robustness of large-scale models.
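A rough sketch of what text-guided contrastive adversarial training can look like: an inner PGD step that breaks image-text alignment, followed by an outer step that restores it. The encoders, InfoNCE form, step sizes, and epsilon budget below are generic placeholders, not the paper's settings.

```python
# A rough sketch of text-guided contrastive adversarial training (generic
# placeholders, not the paper's settings): an inner PGD loop perturbs the
# images to break image-text alignment, and an outer step would then minimize
# the same contrastive loss on the adversarial images.
import torch
import torch.nn.functional as F

def contrastive_loss(img_feat, txt_feat, tau=0.07):
    # Symmetric InfoNCE over a batch of matched image/text pairs.
    img_feat = F.normalize(img_feat, dim=-1)
    txt_feat = F.normalize(txt_feat, dim=-1)
    logits = img_feat @ txt_feat.t() / tau
    labels = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

def adversarial_images(image_encoder, txt_feat, images, eps=4/255, alpha=1/255, steps=3):
    """Inner PGD loop: maximize the contrastive loss w.r.t. the pixels."""
    delta = torch.zeros_like(images, requires_grad=True)
    for _ in range(steps):
        loss = contrastive_loss(image_encoder(images + delta), txt_feat)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.grad.zero_()
    return (images + delta).detach()

# Outer step (per batch): adv = adversarial_images(...); zero the model grads,
# then minimize contrastive_loss(image_encoder(adv), txt_feat) with respect to
# the parameters being tuned (e.g., a visual prompt or the encoder itself).
```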
Test-time Defense against Adversarial Attacks: Detection and Reconstruction of Adversarial Examples via Masked Autoencoder
Yun-Yun Tsai
Ju-Chin Chao
Albert Wen
Zhaoyuan Yang
Tapan Shah
Junfeng Yang
Existing defense methods against adversarial attacks can be categorized into training-time and test-time defenses. Training-time defense, i.e., adversarial training, requires a significant amount of extra training time and often cannot generalize to unseen attacks. On the other hand, test-time defense via test-time weight adaptation requires access to perform gradient descent on (part of) the model weights, which could be infeasible for models with frozen weights. To address these challenges, we propose DRAM, a novel defense method to Detect and Reconstruct multiple types of Adversarial attacks via a Masked autoencoder (MAE). We demonstrate how to use MAE losses to build a KS-test that detects adversarial attacks. Moreover, the MAE losses can be used to repair adversarial samples from unseen attack types. In this sense, DRAM neither requires model weight updates at test time nor augments the training set with more adversarial samples. Evaluating DRAM on the large-scale ImageNet data, we achieve the best detection rate, 82% on average across eight types of adversarial attacks, compared with other detection baselines. For reconstruction, DRAM improves the robust accuracy by 6% ∼ 41% for Standard ResNet50 and 3% ∼ 8% for Robust ResNet50, compared with other self-supervision tasks such as rotation prediction and contrastive learning.
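A minimal sketch of the detection side, assuming a precomputed set of clean-data MAE losses and a generic `mae_loss` callable; the two-sample KS test and significance level below are illustrative, not the paper's exact configuration.

```python
# A minimal sketch of the detection side: compare the distribution of MAE
# reconstruction losses on incoming test images against losses recorded on
# known-clean data with a two-sample Kolmogorov-Smirnov test. The `mae_loss`
# callable and the significance level are illustrative assumptions, not the
# paper's exact configuration.
from typing import Callable, Sequence
import numpy as np
from scipy.stats import ks_2samp

def batch_looks_adversarial(
    test_images: Sequence,                # incoming test batch
    clean_losses: np.ndarray,             # MAE losses precomputed on clean data
    mae_loss: Callable[[object], float],  # MAE reconstruction loss of one image
    alpha: float = 0.05,                  # assumed significance level
) -> bool:
    """Flag the batch if its loss distribution differs significantly
    from the clean reference distribution."""
    test_losses = np.array([mae_loss(x) for x in test_images])
    result = ks_2samp(test_losses, clean_losses)
    return result.pvalue < alpha
```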