Publications

Systematic Rectification of Language Models via Dead-end Analysis

Meng Cao

Mehdi Fatemi

Samira Shabanian

With adversarial or otherwise normal prompts, existing large language models (LLM) can be pushed to generate toxic discourses. One way to re… (see more)duce the risk of LLMs generating undesired discourses is to alter the training of the LLM. This can be very restrictive due to demanding computation requirements. Other methods rely on rule-based or prompt-based token elimination, which are limited as they dismiss future tokens and the overall meaning of the complete discourse. Here, we center detoxification on the probability that the finished discourse is ultimately considered toxic. That is, at each point, we advise against token selections proportional to how likely a finished text from this point will be toxic. To this end, we formally extend the dead-end theory from the recent reinforcement learning (RL) literature to also cover uncertain outcomes. Our approach, called rectification, utilizes a separate but significantly smaller model for detoxification, which can be applied to diverse LLMs as long as they share the same vocabulary. Importantly, our method does not require access to the internal representations of the LLM, but only the token probability distribution at each decoding step. This is crucial as many LLMs today are hosted in servers and only accessible through APIs. When applied to various LLMs, including GPT-3, our approach significantly improves the generated discourse compared to the base LLMs and other techniques in terms of both the overall language and detoxification performance.

2023-02-01

ICLR.cc/2023/Conference (poster)

doi.org

openreview.net

The clinical value of Aspergillus-specific IgG antibody test in the diagnosis of nonneutropenic invasive pulmonary aspergillosis.

Yajie Lu

Lulu Liu

Hongxing Li

Bilin Chen

Yu-hui Gu

Li Wang

Chunlai Feng

Cheng Chen

Yanbin Chen

Wenkui Sun

X. Cui

Min Cao

Yujian Tao

Jinjin Zhong

Huanhuan Zhong

Yueyan Ni

Yuchen Cai

M. Song

X. Liu

Yi Shi Li Liu … (see 1 more)

Xin Su

2023-02-01

Clinical Microbiology and Infection (published)

doi.org

The Hidden Uniform Cluster Prior in Self-Supervised Learning

Mahmoud Assran

Randall Balestriero

Quentin Duval

Florian Bordes

Ishan Misra

Piotr Bojanowski

Pascal Vincent

Michael Rabbat

Nicolas Ballas

A successful paradigm in representation learning is to perform self-supervised pretraining using tasks based on mini-batch statistics (e.g.,… (see more) SimCLR, VICReg, SwAV, MSN). We show that in the formulation of all these methods is an overlooked prior to learn features that enable uniform clustering of the data. While this prior has led to remarkably semantic representations when pretraining on class-balanced data, such as ImageNet, we demonstrate that it can hamper performance when pretraining on class-imbalanced data. By moving away from conventional uniformity priors and instead preferring power-law distributed feature clusters, we show that one can improve the quality of the learned representations on real-world class-imbalanced datasets. To demonstrate this, we develop an extension of the Masked Siamese Networks (MSN) method to support the use of arbitrary features priors.

2023-02-01

ICLR.cc/2023/Conference (poster)

doi.org

openreview.net

Understanding Zero-shot Adversarial Robustness for Large-Scale Models

Chengzhi Mao

Scott Geng

Junfeng Yang

Xin Wang

Carl Vondrick

Pretrained large-scale vision-language models like CLIP have exhibited strong generalization over unseen tasks. Yet imperceptible adversaria… (see more)l perturbations can significantly reduce CLIP's performance on new tasks. In this work, we identify and explore the problem of adapting large-scale models for zero-shot adversarial robustness. We first identify two key factors during model adaption--training losses and adaptation methods--that affect the model's zero-shot adversarial robustness. We then propose a text-guided contrastive adversarial training loss, which aligns the text embeddings and the adversarial visual features with contrastive learning on a small set of training data. We apply this training loss to two adaption methods, model finetuning and visual prompt tuning. We find that visual prompt tuning is more effective in the absence of texts, while finetuning wins in the existence of text guidance. Overall, our approach significantly improves the zero-shot adversarial robustness over CLIP, seeing an average improvement of 31 points over ImageNet and 15 zero-shot datasets. We hope this work can shed light on understanding the zero-shot adversarial robustness of large-scale models.

2023-02-01

ICLR.cc/2023/Conference (poster)

doi.org

openreview.net

A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games

Samuel Sokota

Ryan D'Orazio

J Zico Kolter

Nicolas Loizou

Marc Lanctot

Ioannis Mitliagkas

Noam Brown

Christian Kroer

2023-02-01

ICLR.cc/2023/Conference (poster)

openreview.net

Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?

Mansheej Paul

Feng Chen

Brett W. Larsen

Jonathan Frankle

Surya Ganguli

Gintare Karolina Dziugaite

Modern deep learning involves training costly, highly overparameterized networks, thus motivating the search for sparser networks that can s… (see more)till be trained to the same accuracy as the full network (i.e. matching). Iterative magnitude pruning (IMP) is a state of the art algorithm that can find such highly sparse matching subnetworks, known as winning tickets. IMP operates by iterative cycles of training, masking smallest magnitude weights, rewinding back to an early training point, and repeating. Despite its simplicity, the underlying principles for when and how IMP finds winning tickets remain elusive. In particular, what useful information does an IMP mask found at the end of training convey to a rewound network near the beginning of training? How does SGD allow the network to extract this information? And why is iterative pruning needed? We develop answers in terms of the geometry of the error landscape. First, we find that

2023-02-01

ICLR.cc/2023/Conference (notable)

doi.org

openreview.net

Where to Begin? On the Impact of Pre-Training and Initialization in Federated Learning

John Nguyen

Jianyu Wang

Kshitiz Malik

Maziar Sanjabi

Michael Rabbat

AI Meta

2023-02-01

ICLR.cc/2023/Conference (notable)

doi.org

openreview.net

BARVINN: Arbitrary Precision DNN Accelerator Controlled by a RISC-V CPU

Mohammadhossein Askarihemmat

Sean Wagner

Olexa Bilaniuk

Yassine Hariri

Yvon Savaria

Jean-Pierre David

2023-01-31

Proceedings of the 28th Asia and South Pacific Design Automation Conference (published)

doi.org

arxiv.org

Graph-based Time-Series Anomaly Detection: A Survey

Thi Kieu Khanh Ho

Ali Karami

Narges Armanfard

With the recent advances in technology, a wide range of systems continue to collect a large amount of data over time and thus generate time … (see more)series. Time-Series Anomaly Detection (TSAD) is an important task in various time-series applications such as e-commerce, cybersecurity, vehicle maintenance, and healthcare monitoring. However, this task is very challenging as it requires considering both the intra-variable dependency and the inter-variable dependency, where a variable can be defined as an observation in time series data. Recent graph-based approaches have made impressive progress in tackling the challenges of this field. In this survey, we conduct a comprehensive and up-to-date review of Graph-based TSAD (G-TSAD). First, we explore the significant potential of graph representation learning for time-series data. Then, we review state-of-the-art graph anomaly detection techniques in the context of time series and discuss their strengths and drawbacks. Finally, we discuss the technical challenges and potential future directions for possible improvements in this research field.

2023-01-31

ArXiv (preprint)

doi.org

arxiv.org

Single-cell multi-omic topic embedding reveals cell-type-specific and COVID-19 severity-related immune signatures

Manqi Zhou

Hao Zhang

Zilong Bai

Dylan Mann-Krzisnik

Fei Wang

Yue Li

The advent of single-cell multi-omics sequencing technology makes it possible for re-searchers to leverage multiple modalities for individua… (see more)l cells and explore cell heterogeneity. However, the high dimensional, discrete, and sparse nature of the data make the downstream analysis particularly challenging. Most of the existing computational methods for single-cell data analysis are either limited to single modality or lack flexibility and interpretability. In this study, we propose an interpretable deep learning method called multi-omic embedded topic model (moETM) to effectively perform integrative analysis of high-dimensional single-cell multimodal data. moETM integrates multiple omics data via a product-of-experts in the encoder for efficient variational inference and then employs multiple linear decoders to learn the multi-omic signatures of the gene regulatory programs. Through comprehensive experiments on public single-cell transcriptome and chromatin accessibility data (i.e., scRNA+scATAC), as well as scRNA and proteomic data (i.e., CITE-seq), moETM demonstrates superior performance compared with six state-of-the-art single-cell data analysis methods on seven publicly available datasets. By applying moETM to the scRNA+scATAC data in human bone marrow mononuclear cells (BMMCs), we identified sequence motifs corresponding to the transcription factors that regulate immune gene signatures. Applying moETM analysis to CITE-seq data from the COVID-19 patients revealed not only known immune cell-type-specific signatures but also composite multi-omic biomarkers of critical conditions due to COVID-19, thus providing insights from both biological and clinical perspectives.

2023-01-31

bioRxiv (preprint)

doi.org

Technical Note—Risk-Averse Regret Minimization in Multistage Stochastic Programs

Mehran Poursoltani

Érick Delage

Angelos Georghiou

2023-01-30

Operational Research (published)

doi.org

Leveraging the Third Dimension in Contrastive Learning

Sumukh K Aithal

Anirudh Goyal

Alex Lamb

Yoshua Bengio

Michael Curtis Mozer

Self-Supervised Learning (SSL) methods operate on unlabeled data to learn robust representations useful for downstream tasks. Most SSL metho… (see more)ds rely on augmentations obtained by transforming the 2D image pixel map. These augmentations ignore the fact that biological vision takes place in an immersive three-dimensional, temporally contiguous environment, and that low-level biological vision relies heavily on depth cues. Using a signal provided by a pretrained state-of-the-art monocular RGB-to-depth model (the \emph{Depth Prediction Transformer}, Ranftl et al., 2021), we explore two distinct approaches to incorporating depth signals into the SSL framework. First, we evaluate contrastive learning using an RGB+depth input representation. Second, we use the depth signal to generate novel views from slightly different camera positions, thereby producing a 3D augmentation for contrastive learning. We evaluate these two approaches on three different SSL methods -- BYOL, SimSiam, and SwAV -- using ImageNette (10 class subset of ImageNet), ImageNet-100 and ImageNet-1k datasets. We find that both approaches to incorporating depth signals improve the robustness and generalization of the baseline SSL methods, though the first approach (with depth-channel concatenation) is superior. For instance, BYOL with the additional depth channel leads to an increase in downstream classification accuracy from 85.3\% to 88.0\% on ImageNette and 84.1\% to 87.0\% on ImageNet-C.

2023-01-27

ArXiv (preprint)

doi.org

openreview.net

NLP in the era of generative AI, cognitive sciences, and societal transformation

AI Policy Compass

Student Life and Resources

Publications

NLP in the era of generative AI, cognitive sciences, and societal transformation

AI Policy Compass

Student Life and Resources

Popular keywords:

Publications