Nicholas Meade

Investigating Adversarial Trigger Transfer in Large Language Models

Arkil Patel

2024-04-24

ArXiv (preprint)

Universal Adversarial Triggers Are Not Universal

Arkil Patel

2024-04-24

ArXiv (preprint)

StarCoder: may the source be with you!

Raymond Li

Loubna Ben allal

Yangtian Zi

Niklas Muennighoff

Denis Kocetkov

Chenghao Mou

Marc Marone

Christopher Akiki

Jia LI

Jenny Chim

Qian Liu

Evgenii Zheltonozhskii

Terry Yue Zhuo

Thomas Wang

Olivier Dehaene

Mishig Davaadorj

Joel Lamy-Poirier

Joao Monteiro

Oleh Shliazhko

Nicolas Gontier …

Armel Zebaze

Ming-Ho Yee

Logesh Kumar Umapathi

Jian Zhu

Ben Lipkin

Muhtasham Oblokulov

Zhiruo Wang

Rudra Murthy

Jason T Stillerman

Siva Sankalp Patel

Dmitry Abulkhanov

Marco Zocca

Manan Dey

Zhihan Zhang

N. Fahmy

Urvashi Bhattacharyya

Wenhao Yu

Swayam Singh

Sasha Luccioni

Paulo Villegas

M. Kunakov

Jan Ebert

Fedor Zhdanov

Manuel Romero

Tony Lee

Nadav Timor

Jennifer Ding

Claire S Schlesinger

Hailey Schoelkopf

Jana Ebert

Tri Dao

Mayank Mishra

Alex Gu

Jennifer Robinson

Sean Hughes

Carolyn Jane Anderson

Brendan Dolan-Gavitt

Danish Contractor

Daniel Fried

Dzmitry Bahdanau

Yacine Jernite

Carlos Muñoz Ferrandis

Sean M. Hughes

Thomas Wolf

Arjun Guha

Leandro Von Werra

Harm de Vries

The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B-parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.

2023-12-17

TMLR (accepted)

openreview.net

Using In-Context Learning to Improve Dialogue Safety

Devamanyu Hazarika

Di Jin

Yang Liu

Dilek Hakkani-Tur

2023-10-07

EMNLP 2023 Conference (published)

openreview.net

Evaluating the Faithfulness of Importance Measures in NLP by Recursively Masking Allegedly Important Tokens and Retraining

To explain NLP models, a popular approach is to use importance measures, such as attention, which indicate which input tokens are important for making a prediction. However, an open question is how accurately these explanations reflect a model's logic, a property called faithfulness. To answer this question, we propose Recursive ROAR, a new faithfulness metric. It works by recursively masking allegedly important tokens and then retraining the model. The principle is that this should result in worse model performance than masking random tokens. The result is a performance curve as a function of the masking ratio. Furthermore, we propose a summarizing metric using relative area-between-curves (RACU), which allows for easy comparison across papers, models, and tasks. We evaluate 4 different importance measures on 8 different datasets, using both LSTM-attention models and RoBERTa models. We find that the faithfulness of importance measures is both model-dependent and task-dependent. This conclusion contradicts previous evaluations in both the computer vision and the faithfulness-of-attention literature.

2022-12-01

Findings of the Association for Computational Linguistics: EMNLP 2022 (published)

An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models

Elinor Poole-Dayan

Recent work has shown that pre-trained language models capture social biases from the large amounts of text they are trained on. This has attracted attention to developing techniques that mitigate such biases. In this work, we perform an empirical survey of five recently proposed bias mitigation techniques: Counterfactual Data Augmentation (CDA), Dropout, Iterative Nullspace Projection, Self-Debias, and SentenceDebias. We quantify the effectiveness of each technique using three intrinsic bias benchmarks while also measuring the impact of these techniques on a model’s language modeling ability, as well as its performance on downstream NLU tasks. We experimentally find that: (1) Self-Debias is the strongest debiasing technique, obtaining improved scores on all bias benchmarks; (2) current debiasing techniques perform less consistently when mitigating non-gender biases; and (3) improvements on bias benchmarks such as StereoSet and CrowS-Pairs by using debiasing strategies are often accompanied by a decrease in language modeling ability, making it difficult to determine whether the bias mitigation was effective.

2022-05-01

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (published)

An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models

Elinor Poole-Dayan

2021-10-16

ArXiv (preprint)

Evaluating the Faithfulness of Importance Measures in NLP by Recursively Masking Allegedly Important Tokens and Retraining

2021-10-15

ArXiv (preprint)